Last week in Stream Processing & Analytics – 13.02.2017

This is the 53rd edition of my blog series around Stream Processing and Analytics!

Since last week, I have been updating the following two lists with each week's presentations and videos:

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Spark Streaming

Apache Kafka / Kafka Streams

Apache Flink

Apache NiFi / Hortonworks HDF

New Presentations

New Videos

New Releases

New Code Samples

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 12/26/2016

Happy Holidays!

This is the Christmas edition and therefore rather compact! It’s the 46th installment of my blog series around Stream Processing and Analytics.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Spark Streaming

Apache Flink

StreamSets

Apache NiFi / Hortonworks DataFlow (HDF)

New Presentations

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 5/9/2016

This is the 13th installment of my blog series around Stream Processing and Analytics.

Last week, the new release of Oracle Stream Explorer was released under a new name: Oracle Stream Analytics. I have written my own blog article about it. This new version is an impressive release with over 15 new major features, and it really deserves the name change. Oracle Stream Analytics simplifies stream processing and enables self-service streaming analytics applications for business users. It is based on the idea of a “streaming Excel sheet”, allowing business analysts to work the way they are used to from Excel, except that instead of working on static data, the data constantly changes based on the incoming stream(s).

For those who were not able to attend the Hadoop Summit in Dublin last month (like myself), all the sessions and slides are now available online for free!

Apart from that, the week was a bit quieter than previous weeks. As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Storm

Apache Spark Streaming

Apache Flink

Apache Beam

Apache Apex

Apache Kafka

StreamSets

Microsoft Azure Stream Analytics

Oracle Stream Analytics

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 2/29/2016

This is the third installment of my blog series around Stream Processing and Analytics. It’s the first time I have had to squeeze it into my regular schedule ;-).

As promised in last week's blog article, this week's edition includes the presentations and videos related to Stream Processing from the Spark Summit East 2016.

News and Blog Posts

General

Comparisons

Apache Storm

Apache Spark Streaming

Apache Flink

Apache NiFi / Hortonworks DataFlow

Apache Kafka

Apache Beam / Google DataFlow SDK

StreamSets

Microsoft Azure Stream Analytics

IBM Bluemix

IBM Quarks

New Presentations

New Videos

New Books

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!


Last week in Stream Processing & Analytics 2/21/2016

This is the second installment of my blog series around Stream Processing and Analytics. From now on I will publish at the end of the week (on Sunday or Monday evening), which is why the title “Last week in …” is more appropriate 😉

This might not surprise you, but I just realized that it’s not feasible to cover every single news item or blog article published in a given week. I also decided not to include links to anything older than the last 7 days (with one or two exceptions where I thought the content was too valuable to leave out). I’m also not covering every product or framework out there. The idea is to concentrate mainly on innovation in the open source space around “Streaming Analytics”, but also to cover some products from commercial vendors such as Oracle and IBM.

Spark Summit East took place last week, and Spark 2.0 will bring some interesting new features in the area of Spark Streaming, notably Structured Streaming, which unifies streaming, interactive and batch queries and supports SQL queries on streaming data. The presentations could be followed remotely via live stream, which I enjoyed on the second day, but unfortunately the slides and videos from the event are not yet available. I will include them in next week's post.

Kafka Connect, a new feature in Kafka 0.9+ that makes building and managing stream data pipelines easier, especially in the area of data capture, was officially announced last week. It covers the data integration part of the Kafka Stream Data Platform. There is now quite a variety of options for handling the data ingestion part of a data processing system, such as Flume, Apache NiFi, StreamSets and Kafka Connect, complementary in some areas and overlapping in others. It’s definitely an area worth investigating, as controlled data ingestion becomes even more important in the world of the Internet of Things (IoT).

Last week, Cloudera announced support for Kafka 0.9 with Release 2.0 of their Kafka distribution. At the same time, they published an interesting article on the Cloudera VISION blog highlighting the maturity and importance of Kafka in modern data processing infrastructure: “While Kafka remains a young technology in the now 10-year-old Hadoop ecosystem, it has unequivocally reached the point of being enterprise-grade software, suitable for mission critical deployments”.

Last but not least, two new projects were announced:

  • Apache Arrow, a new open source project providing a columnar in-memory data layer. According to Ted Dunning, “an industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead”.
  • IBM Quarks, an open source development tool that makes it easier for developers to create Internet of Things (IoT) applications that analyze data at the edge of their networks.

News and Blog Posts

General

Comparisons

Apache Storm

Apache Spark Streaming

Apache Samza

Apache Flink

Apache NiFi / Hortonworks DataFlow

Apache Kafka / Confluent Platform

StreamSets

Microsoft Azure Stream Analytics

IBM Bluemix

IBM Quarks

Apache Arrow

New Presentations

New Videos

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!