Last week in Stream Processing & Analytics 5/30/2016

This is the 16th installment of my blog series around Stream Processing and Analytics.

Last week saw the open sourcing of Twitter Heron, yet another stream processing system created by Twitter. Heron has been powering all of Twitter’s real-time analytics for over two years. Heron is backward compatible with Apache Storm, the stream processing solution Twitter used previously. Joe Stein has already written an article showing how to get started with Heron on Apache Mesos.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Heron

Apache Beam / Google Dataflow

Apache Spark Streaming

Apache Kafka

Apache NiFi / Hortonworks HDF

New Presentations

New Videos

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 5/24/2016

This is the 15th installment of my blog series around Stream Processing and Analytics.

I’m once again a day late with this article. If it weren’t for work, I could simply argue that I was waiting for the next version of Kafka to be released :-) That happened last night, with the release of Kafka 0.10 and Confluent Platform 3.0. Here are two of the many tweets announcing the release:

Here are my three favorite new features in this release:

  • Kafka Streams: a library that turns Apache Kafka into a full-featured, modern stream processing system. It includes a high-level language for describing common stream operations (such as joining, filtering, and aggregating records), allowing developers to quickly develop powerful streaming applications. Kafka Streams offers a true event-at-a-time processing model, handles out-of-order data, allows stateful and stateless processing and can easily be deployed on many different systems.
  • Timestamps in Messages: every message in Kafka now has a timestamp field that indicates the time at which the message was produced. This enables stream processing by event-time in Kafka Streams.
  • Rack Awareness: isolates replicas so they are guaranteed to span multiple racks or availability zones. It allows all of Kafka’s durability guarantees to be applied to these larger architectural units, significantly increasing resilience and availability.
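Message timestamps are what make event-time processing possible. As a minimal sketch of the idea (plain Python, not the Kafka Streams API; the record shape and the function name `count_per_tumbling_window` are hypothetical), the following groups records into fixed, non-overlapping windows by the timestamp each record carries rather than by the order in which it arrives:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (key, event_timestamp_ms). In Kafka 0.10 the
# timestamp would come from the message itself; here it is just a tuple field.
Record = Tuple[str, int]


def count_per_tumbling_window(
    records: Iterable[Record], window_ms: int
) -> Dict[Tuple[str, int], int]:
    """Count records per (key, window start), bucketing by event time.

    The bucket is derived from the record's own timestamp, so a record that
    arrives late (out of order) still lands in the window it belongs to.
    """
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, ts in records:
        window_start = ts - (ts % window_ms)  # align to the window boundary
        counts[(key, window_start)] += 1
    return dict(counts)


if __name__ == "__main__":
    # The third record arrives after a later one but carries an earlier timestamp.
    arrivals = [("sensor-1", 1_000), ("sensor-1", 61_000), ("sensor-1", 59_000)]
    print(count_per_tumbling_window(arrivals, window_ms=60_000))
    # {('sensor-1', 0): 2, ('sensor-1', 60000): 1}
```

The late record still ends up in the first window. A real stream processor such as Kafka Streams also has to decide how long to keep a window open for late data; this sketch simply keeps every window forever.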

On a side note: Camus, a tool in the Kafka ecosystem used for ingesting Kafka messages into HDFS, is now deprecated. Kafka Connect is now the recommended way to export data from Kafka to HDFS and Hive.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Beam / Google Dataflow

Apache Storm

Apache Apex

Apache Flink

Apache Kafka

Apache Quarks

Apache NiFi / Hortonworks HDF

Microsoft Azure Stream Analytics

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 5/16/2016

This is the 14th installment of my blog series around Stream Processing and Analytics.

If you were not able to attend the Kafka Summit in San Francisco, you might like this tweet from Confluent Inc.

I listed most of the presentations below in the section “New Presentations”, but I did not do the same for the videos; you can find them via the link in the tweet. I have already started watching some of the presentations and they are really worth it! I will list my top 3 sessions next week.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Apache Spark Streaming

Apache Flink

Apache Kafka

Apache NiFi / Hortonworks HDF

Microsoft Azure Stream Analytics

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 5/9/2016

This is the 13th installment of my blog series around Stream Processing and Analytics.

Last week, the new release of Oracle Stream Explorer was published under a new name: Oracle Stream Analytics. I have written my own blog article about it. This new version is an impressive release with over 15 new major features! It really deserves the name change. Oracle Stream Analytics simplifies stream processing and enables Self Service Streaming Analytics applications for business people. It is based on the idea of a “streaming Excel sheet”, allowing business analysts to work the way they are used to in Excel, but instead of working on static data, the data constantly changes based on the incoming stream(s).

For those not able to attend the Hadoop Summit in Dublin last month (like myself), all the sessions and slides are now available online for free!

Apart from that, the week was a bit quieter than previous weeks. As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Storm

Apache Spark Streaming

Apache Flink

Apache Beam

Apache Apex

Apache Kafka

StreamSets

Microsoft Azure Stream Analytics

Oracle Stream Analytics

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Oracle Stream Analytics (OSA): the new Oracle Stream Explorer

A few days ago, Oracle released the new version of Oracle Stream Explorer and renamed it to Oracle Stream Analytics (OSA). This new version is an impressive release with over 15 new major features! It really deserves the name change.

Enhanced Patterns Library

The existing patterns have been enhanced substantially and now include Spatial, Statistical, General Industry and Anomaly Detection patterns, the latter using streaming machine learning.

New Geo-spatial pattern

This pattern can be used to analyze streams containing geo-location data and determine how events relate to pre-defined geo-fences in your maps.
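Deciding whether a position lies inside a geo-fence is essentially a point-in-polygon test. The sketch below uses the classic ray-casting algorithm. It is not how Oracle Stream Analytics implements the pattern (the product internals are not described here), the function names are hypothetical, and it treats longitude/latitude as planar coordinates, which is only a reasonable approximation for small fences:

```python
from typing import List, Tuple

# (x, y) pairs; here (longitude, latitude) treated as planar coordinates.
Point = Tuple[float, float]


def inside_geofence(point: Point, fence: List[Point]) -> bool:
    """Ray casting: cast a horizontal ray to the right of `point` and count
    how many polygon edges it crosses. An odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(fence)
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            # x position where this edge intersects the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fence_transition(prev: Point, curr: Point, fence: List[Point]) -> str:
    """Classify the move between two consecutive positions of one object."""
    was_in = inside_geofence(prev, fence)
    is_in = inside_geofence(curr, fence)
    if is_in and not was_in:
        return "enter"
    if was_in and not is_in:
        return "exit"
    return "inside" if is_in else "outside"


if __name__ == "__main__":
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    print(fence_transition((15.0, 5.0), (5.0, 5.0), square))  # enter
```

Comparing the result for two consecutive positions of the same object is what turns a plain yes/no test into "enter" and "exit" events on a stream.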

Integrated Expression Builder

The Expression Builder allows you to add calculated/derived fields to the Live Output Stream of an exploration, an important step towards the “streaming Excel sheet” idea of Oracle Stream Analytics.

It provides the ability to apply and insert mathematical and statistical calculations into the active live output stream. Once a new expression has been defined and validated, a column will be added next to the column of relevance. This new column can then be used in subsequent filters and explorations.
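Conceptually, a calculated field works like a formula column in a spreadsheet: it is evaluated for every event that passes by, and later steps can filter on it. A minimal Python sketch of that idea, with hypothetical function and field names (it does not use any Oracle Stream Analytics API):

```python
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, float]


def add_derived_column(
    stream: Iterable[Event], name: str, expr: Callable[[Event], float]
) -> Iterator[Event]:
    """Attach a calculated column to every event, like a spreadsheet formula
    column that is re-evaluated for each new row."""
    for event in stream:
        enriched = dict(event)  # copy so the incoming event is not mutated
        enriched[name] = expr(event)
        yield enriched


def filter_stream(
    stream: Iterable[Event], predicate: Callable[[Event], bool]
) -> Iterator[Event]:
    """Pass on only the events for which the predicate holds."""
    for event in stream:
        if predicate(event):
            yield event


if __name__ == "__main__":
    readings = [{"celsius": 20.0}, {"celsius": 40.0}]
    with_f = add_derived_column(readings, "fahrenheit", lambda e: e["celsius"] * 9 / 5 + 32)
    hot = filter_stream(with_f, lambda e: e["fahrenheit"] > 100)
    print(list(hot))  # [{'celsius': 40.0, 'fahrenheit': 104.0}]
```

Because both steps are generators, each event flows through the new column and the filter as soon as it arrives, which mirrors the "live" nature of the output stream.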

Support for Business Rules in Explorations

The Business Rules section of the Stream Analytics canvas provides the ability to apply the more traditional IF-THEN-ELSE constraints and clauses on the various properties of the event shape.

This capability enables the user to combine streaming query analytics based on temporal criteria with a collection of business rules that can affect the information in existing or new columns.
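Combining a temporal criterion with IF-THEN-ELSE rules can be sketched as follows. This is an illustration only: the `speed` field, the thresholds and the status values are hypothetical, and a count-based window over the last three events stands in for whatever temporal criteria the product actually offers:

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Union

Value = Union[float, str]


def apply_speed_rules(
    stream: Iterable[Dict[str, float]], window: int = 3
) -> Iterator[Dict[str, Value]]:
    """Enrich each event with a rolling average (the temporal part) and a
    status column set by IF-THEN-ELSE rules (the business-rule part)."""
    recent: Deque[float] = deque(maxlen=window)  # keeps only the last `window` speeds
    for event in stream:
        recent.append(event["speed"])
        avg = sum(recent) / len(recent)
        out: Dict[str, Value] = dict(event)
        out["avg_speed"] = avg
        if event["speed"] > 120 and avg > 100:
            out["status"] = "ALERT"  # fast now and fast on average
        elif event["speed"] > 120:
            out["status"] = "WARN"  # a single spike
        else:
            out["status"] = "OK"
        yield out


if __name__ == "__main__":
    speeds = [50.0, 130.0, 130.0, 130.0]
    for result in apply_speed_rules({"speed": s} for s in speeds):
        print(result["status"])  # prints OK, WARN, ALERT, ALERT (one per line)
```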

New streaming end point connections/targets

Oracle Stream Analytics supports new Event Stream sources and targets, such as MQTT, Apache Kafka and Twitter.

Kafka in particular is becoming more and more important in modern Big Data architectures, so I’m really pleased to see it supported now.

We can now use Oracle GoldenGate to capture changes on any database table as they happen (CDC = change data capture), send these captured change events into Kafka using GoldenGate for Big Data, and consume them from OSA to apply streaming analytics on them.

Scaling-Out with Spark Streaming

An Oracle Event Processing (OEP) server is no longer the only runtime option. With Oracle Stream Analytics, you can deploy and execute streaming applications on a Spark Streaming infrastructure.

The figure below shows how you can select one of the two possible runtime environments (Spark is grayed out because it is not yet configured in my environment).

[Figure: selecting the runtime environment, OEP server or Spark Streaming]

Better Insights with Catalog Topology Viewer and Navigation

The topology is a graphical representation of the connected entities that illustrates the dependencies and connections between them. The Topology Viewer helps you identify the dependencies a selected entity has on other entities. Understanding these dependencies helps you be careful when deleting or undeploying an entity.
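Finding everything that would be affected by deleting an entity amounts to walking the dependency graph in the reverse direction. A minimal sketch, assuming a hypothetical catalog that records, for each entity, which entities it reads from (the entity names and the function name are made up):

```python
from collections import deque
from typing import Deque, Dict, List, Set


def dependents_of(entity: str, reads_from: Dict[str, List[str]]) -> Set[str]:
    """Return every entity that directly or indirectly depends on `entity`.

    `reads_from` maps each entity to the entities it consumes, e.g. an
    exploration to its source stream. Deleting `entity` would affect all
    entities returned here.
    """
    # Invert the edges so we can walk from a source to its consumers.
    consumers: Dict[str, List[str]] = {}
    for consumer, sources in reads_from.items():
        for source in sources:
            consumers.setdefault(source, []).append(consumer)

    found: Set[str] = set()
    queue: Deque[str] = deque([entity])
    while queue:  # breadth-first traversal; `found` guards against cycles
        current = queue.popleft()
        for consumer in consumers.get(current, []):
            if consumer not in found:
                found.add(consumer)
                queue.append(consumer)
    return found


if __name__ == "__main__":
    catalog = {
        "exploration-A": ["stream-1"],
        "exploration-B": ["exploration-A"],
        "kafka-target": ["exploration-B"],
        "exploration-C": ["stream-2"],
    }
    print(sorted(dependents_of("stream-1", catalog)))
    # ['exploration-A', 'exploration-B', 'kafka-target']
```

An empty result means nothing else depends on the entity, so removing it should not break other parts of the topology.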

I’m really pleased with this new release and I’m looking forward to seeing more enhancements and improvements in future releases. As already mentioned, the product really deserves the name change, but I also hope it’s the last one for the next couple of years 😉. Oracle Stream Analytics simplifies stream processing and will enable Self Service Streaming Analytics applications for business people.

Find more information on Oracle Stream Analytics in the Documentation available here.

Stay tuned for an update of the Docker support I already had for Stream Explorer. I’m currently in the process of updating it for Oracle Stream Analytics so you can quickly set up your own playground environment.

Last week in Stream Processing & Analytics 5/2/2016

This is the 12th installment of my blog series around Stream Processing and Analytics.

The most important event last week was probably the first Kafka Summit, held in San Francisco. At the summit, Confluent shared results from a recent survey that clearly show the rise of Kafka across the enterprise and the growth of stream processing:

  • According to the results, Apache Kafka is most commonly used for stream processing (72% of respondents). In addition, nearly seven of ten (68%) Kafka users surveyed say they plan to incorporate more stream processing over the next 6 to 12 months. Of those:
    • 74% say they will add it to new applications in development now
    • 69% will add it to existing applications
    • 60% will build net-new applications using stream processing and Kafka

The survey also reveals how Kafka is being used:

  • According to survey respondents, Kafka powers a wide variety of applications today, ranging from application monitoring (60%), data warehousing (51%) and asynchronous applications (47%) to system monitoring (39%), recommendation/decisioning engines (35%), security/fraud detection (26%), Internet of Things applications (20%) and dynamic pricing applications (12%), to name a few.

We have been using Kafka for almost 3 years now and I have never regretted choosing it. Unfortunately, the trip to California for just that one day was a bit too far for me, but I read that at least some of the presentations should be available on video later. Find below the presentations that are already available on SlideShare.

So that’s it for this week. As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Apache Storm

Apache Spark Streaming

Apache Flink

Apache Apex

Apache Kafka

Apache NiFi / Hortonworks DataFlow

Striim

StreamSets

StreamingAnalytix

IBM Streaming Analytics

Microsoft Azure Stream Analytics

MapR

New Presentations

New Videos

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

Last week in Stream Processing & Analytics 4/25/2016

This is the 11th installment of my blog series around Stream Processing and Analytics.

First, two interesting tweets I found last week. The first one, by Steve Wilkes, gets straight to the point:

The second one, by Neha Narkhede, reveals some impressive metrics about the usage of Kafka at LinkedIn: 1.4 trillion messages a day on 1,400 brokers. Kafka is really a game changer!
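A quick back-of-the-envelope breakdown shows what that means per second and per broker, assuming (unrealistically) that the load is spread evenly over the day and across all brokers:

```latex
\begin{aligned}
\frac{1.4 \times 10^{12}\ \text{messages/day}}{1400\ \text{brokers}} &= 10^{9}\ \text{messages per broker per day} \\
\frac{1.4 \times 10^{12}\ \text{messages/day}}{86\,400\ \text{s/day}} &\approx 1.62 \times 10^{7}\ \text{messages/s in total} \\
\frac{10^{9}\ \text{messages/day}}{86\,400\ \text{s/day}} &\approx 1.16 \times 10^{4}\ \text{messages/s per broker}
\end{aligned}
```

Real traffic has peaks well above these averages, so the actual per-broker peak rate is higher.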

Last but not least, I would like to quote from Mark Palmer’s latest article, 8 Predictions for the Internet of Analytics, which I really enjoyed reading:

  • Streaming analytics will become a fundamental topic in computer science. Forrester’s Streaming Analytics Wave defines a set of computer science criteria to define streaming analytics: time windowing, aggregation, correlation, and integration with interactive analytics. These fundamentals are not well understood by the computer science community, are not yet taught in school, and are therefore not yet well known.
  • Data streams will be as important as data lakes. Data lakes contain data at rest; data streams contain data in motion. But most IT applications today are designed around data at rest. In the coming decade, data streams will become as important as data at rest.
  • Streaming analytics and traditional analytics will become increasingly intertwined. In order to apply analytics to streams, you need to know what to look for. Traditional analytics help you look through the rearview mirror at the past, and predict important conditions. Streaming analytics are about looking forward, through your windshield, looking at real-time conditions, and acting.
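Time windowing and aggregation, the first two of those criteria, can be illustrated with a sliding time window that keeps a running average. A minimal Python sketch, assuming events arrive in timestamp order (the function name and the event shape are hypothetical):

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_average(
    events: Iterable[Tuple[int, float]], window_s: int
) -> Iterator[Tuple[int, float]]:
    """For each (timestamp, value) event, emit the average of all values
    whose timestamp falls in the window (ts - window_s, ts].

    Assumes events arrive in timestamp order and window_s > 0.
    """
    buf: Deque[Tuple[int, float]] = deque()
    total = 0.0
    for ts, value in events:
        buf.append((ts, value))
        total += value
        # Evict events that have slid out of the window.
        while buf[0][0] <= ts - window_s:
            _, old = buf.popleft()
            total -= old
        yield ts, total / len(buf)


if __name__ == "__main__":
    readings = [(0, 10.0), (5, 20.0), (12, 30.0)]
    print(list(sliding_average(readings, window_s=10)))
    # [(0, 10.0), (5, 15.0), (12, 25.0)]
```

Keeping a running total and evicting expired events from the front of the deque means each event is added and removed once, instead of re-summing the whole window for every new event. With floating-point values the running total can drift slightly over very long streams.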

So that’s it for this week. As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Apache Beam

Apache Storm

Apache Spark Streaming

Apache Flink

Apache Apex

Apache Kafka

Apache NiFi / Hortonworks DataFlow

Apache Metron

StreamSets

New Presentations

New Videos

New Podcasts

New Releases / Components

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!