Last Week in Stream Processing & Analytics – 7.8.2017

This is the 76th edition of my blog series around Stream Processing and Analytics!

As every week, I have also updated the following two lists with the presentations/videos of the current week:

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Kafka / Kafka Streams / Confluent Platform

Apache Flink

Apache NiFi / Hortonworks HDF

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

 

Last week in Stream Processing & Analytics – 5.6.2017

This is the 69th edition of my blog series around Stream Processing and Analytics!

As every week, I have also updated the following two lists with the presentations/videos of the current week:

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Apache Kafka / Kafka Streams / Confluent Platform

Spark Streaming

Storm

Heron

Apache Flink

Apache NiFi

Apache Ignite

New Presentations

New Videos

New Podcast

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics – 17.5.2017

This is the 66th edition of my blog series around Stream Processing and Analytics!

As every week, I have also updated the following two lists with the presentations/videos of the current week:

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Kafka / Kafka Streams / Confluent Platform

Spark Streaming

Apache Beam

Apache Flink

StreamSets

Apache NiFi

New Presentations

New Video

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 11/21/2016

This is the 41st installment of my blog series around Stream Processing and Analytics.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Kafka

Apache Storm

Apache Flink

Apache NiFi / Hortonworks Data Flow (HDF)

New Presentations

New Videos

New Podcasts

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 9/13/2016

This is the 31st installment of my blog series around Stream Processing and Analytics.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Spark Streaming

Apache Kafka

Apache Flink

Apache Apex

StreamSets

Apache NiFi / Hortonworks Data Flow (HDF)

New Presentations

New Videos

New Releases

New Books

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 5/24/2016

This is the 15th installment of my blog series around Stream Processing and Analytics.

I’m once more a day late with this article. If it weren’t because of work, I could argue that I was waiting for the next version of Kafka to be released 🙂 That happened last night, with Kafka 0.10 and the Confluent Platform 3.0. Here are two of the many tweets announcing the release:

Here are my 3 favorite new features in this new release:

  • Kafka Streams: a library that turns Apache Kafka into a full-featured, modern stream processing system. It includes a high-level language for describing common stream operations (such as joining, filtering, and aggregating records), allowing developers to quickly develop powerful streaming applications. Kafka Streams offers a true event-at-a-time processing model, handles out-of-order data, allows stateful and stateless processing, and can easily be deployed on many different systems.
  • Timestamps in Messages: every message in Kafka now has a timestamp field that indicates the time at which the message was produced. This enables stream processing by event-time in Kafka Streams.
  • Rack Awareness: isolates replicas so they are guaranteed to span multiple racks or availability zones. It allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing resilience and availability.
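
To make the event-time idea from the first two features more concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API; the function name, the record layout and the sample events are assumptions made up for illustration. It processes one record at a time and assigns each record to a tumbling window based on the timestamp carried in the record itself, so a record that arrives late still lands in the window it belongs to.

```python
from collections import defaultdict


def tumbling_window_counts(records, window_ms):
    """Count records per (key, window start), using each record's own timestamp.

    Hypothetical helper for illustration only, not part of any Kafka library.
    """
    counts = defaultdict(int)
    for key, timestamp_ms in records:  # one event at a time
        # Align the event timestamp to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


# Made-up events as (key, event timestamp in ms).
# The last event arrives after a newer one, i.e. out of order.
events = [
    ("user-a", 1_000),
    ("user-b", 1_500),
    ("user-a", 61_000),
    ("user-a", 59_000),
]

result = tumbling_window_counts(events, window_ms=60_000)
```

Because the window is derived from the event timestamp rather than from arrival order, the late `user-a` event at 59,000 ms is counted in the first one-minute window and not in the second. A real stream processor additionally has to decide how long to keep old windows open for late records, which this sketch ignores.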

On a side note: Camus, a tool in the Kafka ecosystem used for ingesting Kafka messages into HDFS, is now deprecated. Kafka Connect is the new recommended way to export data from Kafka to HDFS and Hive.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Beam / Google Dataflow

Apache Storm

Apache Apex

Apache Flink

Apache Kafka

Apache Quarks

Apache NiFi / Hortonworks HDF

Microsoft Azure Stream Analytics

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 2/21/2016

This is the second installment of my blog series around Stream Processing and Analytics. From now on I will publish at the end of the week (on Sunday evening or Monday evening), which is why the title “Last week in …” is more appropriate 😉

This might not surprise you, but I have realized that it is just not feasible to cover every single news item or blog article published in a given week. I have also decided not to link to anything older than the last 7 days (with one or two exceptions where I thought the content was too valuable to leave out). Nor am I covering every product or framework out there. The idea is to concentrate mainly on the innovation in the open source space around “Streaming Analytics”, but to also cover some products from commercial vendors such as Oracle and IBM.

Spark Summit East took place last week, and there will be some interesting new features in the area of Spark Streaming in Spark 2.0, notably Structured Streaming, which unifies streaming, interactive and batch queries and supports SQL queries on streaming data. The presentations were streamed live, and I enjoyed following the second day remotely, but unfortunately the presentations and videos from the event are not yet available. I will include them in next week’s post.

Kafka Connect was officially announced last week. It is a new feature in Kafka 0.9+ that makes building and managing stream data pipelines easier, especially in the area of data capture. It supports the data integration part of the Kafka Stream Data Platform. There is now quite a variety of ways to handle the data ingestion part of a data processing system, such as Flume, Apache NiFi, StreamSets and Kafka Connect, which are complementary in some areas and overlapping in others. It’s definitely an area worth investigating, and controlled data ingestion becomes even more important in the world of Internet of Things (IoT).

Last week Cloudera announced support for Kafka 0.9 with Release 2.0 of their Kafka distribution. At the same time they also published an interesting article on the Cloudera VISION blog, highlighting the maturity and importance of Kafka in modern data processing infrastructure: “While Kafka remains a young technology in the now 10-year-old Hadoop ecosystem, it has unequivocally reached the point of being enterprise-grade software, suitable for mission critical deployments”.

Last but not least, two new projects have been announced:

  • Apache Arrow, a new open source project with the goal of delivering “an industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead”, according to Ted Dunning.
  • IBM Quarks, an open source development tool that makes it easier for developers to create Internet of Things (IoT) applications to analyze data on the edge of their networks.

News and Blog Posts

General

Comparisons

Apache Storm

Apache Spark Streaming

Apache Samza

Apache Flink

Apache NiFi / Hortonworks DataFlow

Apache Kafka / Confluent Platform

StreamSets

Microsoft Azure Stream Analytics

IBM Bluemix

IBM Quarks

Apache Arrow

New Presentations

New Videos

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!