Last week in Stream Processing & Analytics 2/29/2016

This is the third installment of my blog series around Stream Processing and Analytics. It’s the first time I have had to squeeze it into my regular schedule ;-).

As promised in last week’s blog article, this week’s edition includes the presentations and videos related to Stream Processing from the Spark Summit 2016.

News and Blog Posts

General

Comparisons

Apache Storm

Apache Spark Streaming

Apache Flink

Apache NiFi / Hortonworks DataFlow

Apache Kafka

Apache Beam / Google DataFlow SDK

StreamSets

Microsoft Azure Stream Analytics

IBM Bluemix

IBM Quarks

New Presentations

New Videos

New Books

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

 

Last week in Stream Processing & Analytics 2/21/2016

This is the second installment of my blog series around Stream Processing and Analytics. I will now publish at the end of the week (on Sunday or Monday evening), which is why the title “Last week in …” is more appropriate 😉

This might not surprise you, but I just realized that it’s not feasible to cover every single news item or blog article published in a given week. I also decided not to link to anything older than the last 7 days (with one or two exceptions where I thought the content was too valuable to leave out). I’m also not covering every product and framework out there. The idea is to concentrate mainly on the innovation in the open source space around “Streaming Analytics”, but to also cover some products from commercial vendors such as Oracle and IBM.

The Spark Summit East took place last week, and there will be some interesting new Spark Streaming features in Spark 2.0, notably Structured Streaming, which unifies streaming, interactive and batch queries and supports SQL queries on streaming data. It was possible to follow a live stream of the presentations remotely, which I enjoyed on the 2nd day, but unfortunately the presentations and videos from the event are not yet available. I will include them in next week’s post.

Kafka Connect was officially announced last week. It is a new feature in Kafka 0.9+ that makes building and managing stream data pipelines easier, especially in the area of data capture. It supports the data integration part of the Kafka Stream Data Platform. There is now quite a variety of options for handling the data ingestion part of a data processing system, such as Flume, Apache NiFi, StreamSets and Kafka Connect, which are complementary in some areas and overlapping in others. It’s definitely an area worth investigating, and controlled data ingestion becomes even more important in the world of the Internet of Things (IoT).

Last week Cloudera announced support for Kafka 0.9 with release 2.0 of their Kafka distribution. At the same time they also published an interesting article on the Cloudera VISION blog, highlighting the maturity and importance of Kafka in modern data processing infrastructure: “While Kafka remains a young technology in the now 10-year-old Hadoop ecosystem, it has unequivocally reached the point of being enterprise-grade software, suitable for mission critical deployments”.

Last but not least, two new projects were announced:

  • Apache Arrow, a new open source project for columnar in-memory data. According to Ted Dunning, “an industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead”.
  • IBM Quarks, an open source development tool that makes it easier for developers to create Internet of Things (IoT) applications to analyze data on the edge of their networks.

News and Blog Posts

General

Comparisons

Apache Storm

Apache Spark Streaming

Apache Samza

Apache Flink

Apache NiFi / Hortonworks DataFlow

Apache Kafka / Confluent Platform

StreamSets

Microsoft Azure Stream Analytics

IBM Bluemix

IBM Quarks

Apache Arrow

New Presentations

New Videos

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

 

This week in Stream Processing & Streaming Analytics 2/13/2016

Update 14.2: Have added Google Dataflow / Apache Beam

Inspired by “This Week in Cassandra”, I will start collecting and documenting, once a week, the latest news in the world of “Stream Processing & Streaming Analytics” platforms and frameworks. I will look at new projects/subprojects, blog posts, and new and upcoming features around Open Source, Oracle, IBM and others.

So far, I’m not planning to do this as a live discussion, as the folks at DataStax do. But maybe that could be something for the future.

Before starting with the news of the last week, here are some classics / evergreens on the topic of Stream Processing / Streaming Analytics:

News and Blog Posts

General

Comparisons

Apache Storm

Apache Spark Streaming

Apache Samza

Apache Flink

Google Dataflow / Apache Beam

Apache NiFi

Apache Kafka / Confluent Platform

Oracle Stream Xplorer / Oracle Event Processing

Others

New Presentations

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!