gschmutz 21:22 on April 25, 2016
Tags: apex ( 18 ), beam ( 25 ), flink ( 92 ), kafka ( 248 ), nifi ( 238 ), spark-streaming ( 219 ), storm ( 46 ), Stream Processing ( 241 ), streaming-analytics ( 84 ), streamsets ( 74 )

Last week in Stream Processing & Analytics 4/25/2016

This is the 11th installment of my blog series around Stream Processing and Analytics.

First, two interesting tweets I found last week. The first one, by Steve Wilkes, gets straight to the point:

The one key takeaway from my #DataWest16 presentation https://t.co/ehXsWQfxI2 #bigdata @striimteam #streaming pic.twitter.com/l2OyXQNtqw

— Steve Wilkes (@BXCellent) April 21, 2016

The second one by Neha Narkhede reveals some impressive metrics about the usage of Kafka @ LinkedIn. 1.4 trillion messages a day on 1400 brokers. Kafka is really a game changer!

.@LinkedIn's use of @apachekafka:1.4 trillion msg/day, 1400 brokers.Powers database replication, change capture etc https://t.co/mVCR7SwjZR

— Neha Narkhede (@nehanarkhede) April 21, 2016

Last but not least, I would like to quote from Mark Palmer’s latest article, 8 Predictions for the Internet of Analytics, which I really enjoyed reading:

Streaming analytics will become a fundamental topic in computer science. Forrester’s Streaming Analytics Wave defines a set of computer science criteria to define streaming analytics: time windowing, aggregation, correlation, and integration with interactive analytics. These fundamentals are not well understood by the computer science community, are not yet taught in school, and are therefore not yet well known.
Data streams will be as important as data lakes. Data lakes contain data at rest; data streams contain data in motion. But most IT applications today are designed around data at rest. In the coming decade, data streams will become as important as data at rest.
Streaming analytics and traditional analytics will become increasingly intertwined. In order to apply analytics to streams, you need to know what to look for. Traditional analytics help you look through the rearview mirror at the past, and predict important conditions. Streaming analytics are about looking forward, through your windshield, looking at real-time conditions, and acting.

So that’s it for this week. As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Streaming Analytics Market worth $1,955.7 Million by 2020 by MarketsandMarkets
8 Predictions for the Internet of Analytics by Mark Palmer
Why Your Business Should Be Worried About Data Drift by Peter Daisyme
The Architecture Files, Ep. 3: OK EVENT LOG by Ian Varley
The Next Market Leaders Will Power Their Businesses from IoAT Data Sources by Mark Lochbihler
The IoAT and Big Data means big business by Andy Leaver
9 must watch big data technologies along with Hadoop by Kumar Chinnakali
Spark, Kafka & machine learning: 10 big data start-ups taking analytics to the next level by James Nunns
Streaming Analytics Will Transform The IoT Into The Internet Of Analytics by Rowan Curran
Hadoop, Kafka creators big on big data streaming analytics by Jack Vaughan
Keeping Hadoop Properly Fed a Challenge for Many Data Pros by David Weldon
Top Challenge With Real-time Analytics: Education by David Weldon

Comparison

Look out, Spark and Storm, here comes Apache Apex by Ian Pointer
Is Flink the shiny(err..) toy on the block? by Vikas Hazrati

Apache Beam

Apache Beam’s Ambitious Goal: Unify Big Data Development by Alex Woodie
Apache Beam : Next Step in Big Data Unification by Madhukara Phatak

Apache Storm

Distributed, configuration based ETL in Apache STORM by David Woodhead
Announcing Apache Storm 1.0.0 by Sriharsha Chintalapani & Kanishk Mahajan

Apache Spark Streaming

Fast, Scalable, Streaming Applications with MapR Streams, Spark Streaming, and MapR-DB by Carol McDonald
Spark Streaming and Twitter Sentiment Analysis by Nicolas Perez
Stateful Distributed Stream Processing by investigativeprogramming

Apache Flink

Apache Flink Gets Real with Continuous Stream Processing by Susan Hall

Apache Apex

The Apache Software Foundation Announces Apache Apex as a Top-Level Project by Datanami

Apache Kafka

First CAKE Meetup Overview and Video by Patrick Jaromin
Deploying Apache Kafka on AWS Elastic Block Store (EBS) by David Tucker
Open Source Kafka Connect adds more than a Dozen Connectors by Confluent
Kafka Ecosystem at LinkedIn by Joel Koshy

Apache NiFi / Hortonworks DataFlow

Royal Mail starts to deliver on Hortonworks’ ‘data in motion’ promise by Jessica Twentyman
NiFi OCR – Using Apache NiFi to read children’s books by Jeremy Dyer
Converting CSV To Avro with Apache NiFi by Jeremy Dyer

Apache Metron

Apache Metron Tech Preview 1 – Come and Get It! by George Vetticaden & James Sirota

StreamSets

Is Data Drift Polluting Your Data Lake? by StreamSets
Podcast with Girish Pancha from StreamSets about Performance Management of Data Flows by Girish Pancha

New Presentations

The Future of Apache Storm by Taylor Goetz
Streaming SQL by Julian Hyde
IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks by Greg Brosman
Apache Kafka as a Service by Oliver Deakin
Deep Dive and Best Practices for Real-Time Streaming Applications by Roy Ben-Alta
Apache Flink: Counting Elements in Streams by Jamie Grier
Spark Streaming in 10 minutes by Sven Tessmann
Getting Started with AWS IoT by Ian Massingham
Productization of Big Data Streaming Analytics by Mike Gualtieri & Amol Kekre
Visual Dataflows with Apache NiFi – and how they interact with AWS by Kay Lerch
Apache Flink Introduction by Ahmed Nader

New Videos

Distributed Stream Processing with Apache Kafka by Jay Kreps
Combining Stream Processing and In-Memory Data Grids for Near-Real Time Aggregation and Notifications by Oliver Malassi
Getting Started with Streaming Analytics and the IoT by Adrian Bowles
IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks by Greg Brosman & Darin Nee
One Million Clicks per Minute with Kafka and Clojure by Devon Peticolas

New Podcasts

Podcast with Girish Pancha from StreamSets about Performance Management of Data Flows by Girish Pancha

New Releases / Components

Upcoming Events

4/25/2016 (San Francisco, US) – Kafka Summit
4/26/2016 (New York, US) – Building Data Pipelines for Solr with Apache NiFi (Meetup)
4/27/2016 (Laurel, US) – Apache NiFi: Because It Aint Data Science Without the Data (Meetup)
4/28/2016 (Renens, CH) – Spark Streaming: Dealing with State (Meetup)
5/4/2016 (online) – How to Achieve High Throughput for Real-Time Applications with SMACK, Apache Kafka and Spark Streaming (Datastax Webinar)
5/4/2016 (Fairfield, US) – Data Flow using Apache NiFi (Meetup)
5/5/2016 (online) – Apache Spark 2.0 presented by Databricks co-founder Reynold Xin (Databricks Webinar)
5/5/2016 (online) – Apache Storm and Twitter Heron: Stream Processing at Scale and Monitoring Performance (Opsclarity Webinar)
5/9/2016 (Santa Clara, US) – Make Streaming IoT Analytics Work For You: The Devil is in the Details (Meetup)
5/9/2016 (Vancouver, CA) – Streams Track (Big data 2016 conference)
5/10/2016 (New York, US) – The Big Heist: Attempting to steal distributed systems from complexity and chaos (Scala Days)
5/10/2016 (Vancouver, CA) – Using Kafka and Kudu for Fast, Low-Latency SQL Analytics on Streaming Data (Big data 2016 conference)
5/10/2016 (Flemington, US) – Apache NiFi – Deep Dive (Meetup)
5/16-20/2016 (San Francisco, US) – Pipelines By the Bay (Pipelines By the Bay Conference)
5/16/2016 (online) – Productionizing your Streaming Jobs (Databricks Webinar)
5/20/2016 (Madrid, SP) – Workshop Apache Flink (Meetup)
6/1-3/2016 (London, UK) – Real-time conference sessions (Strata + Hadoop World)
6/5-7/2016 (Berlin, GE) – Berlin Buzzwords
6/13/2016 (Amsterdam, NL) – GOTO Night: Stream Processing with Apache Flink and Mining Github (Meetup)
6/13/2016 (New York, US) – Apache Beam (Stream Processing @ Scale Track at QCon New York)

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

gschmutz 20:53 on April 20, 2016
Tags: beam ( 25 ), flink ( 92 ), kafka ( 248 ), metron, nifi ( 238 ), spark-streaming ( 219 ), storm ( 46 ), Stream Processing ( 241 ), streaming-analytics ( 84 ), streamsets ( 74 ), striim ( 2 )

Last week in Stream Processing & Analytics 4/18/2016

This is the 10th installment of my blog series around Stream Processing and Analytics.

This edition is two days later than planned: I was traveling and again had trouble with my power supply 😦

So what happened in the world of Stream Processing? For me the most interesting news last week was the release of Storm 1.0.

Apache Storm 1.0 released! https://t.co/fXWKposkt5

— P. Taylor Goetz (@ptgoetz) April 12, 2016

I have been a Storm user for more than 3 years now, and this is a significant release that delivers several features for enterprise readiness, operational simplicity and ease of use. I really like that Storm now has native windowing and state management support, automatic back pressure support, and new connectors for Cassandra, Elasticsearch and Kafka.

Nathan Marz, the founder and creator of Storm, also tweeted about it:

From humble beginnings at a tiny startup more than 5 years ago, Storm has matured to have a big impact across the whole world

— Nathan Marz (@nathanmarz) April 12, 2016

And Ian Hellström already updated his stream processing overview chart with Storm 1.0.0.

[Image: Ian Hellström’s stream processing overview chart, updated with Storm 1.0.0]

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Streaming Analytics Will Transform The Internet Of Things Into The Internet Of Analytics by Rowan Curran
Hadoop Summit 2016: Data in motion governs Hortonworks roadmap by Brian McKenna
Amazing Big Data At NASA: Real Time Analytics 150 Million Miles From Earth by Bernard Marr
15 “True” Streaming Analytics Platforms For Real-Time Everything by Mike Gualtieri
Case Studies: How are Enterprises Using Streaming Analytics by insideBIGDATA
Streaming Analytics Explained by Andrew Brust
Making Sense of Stream Processing – A Must Read by Leeor Engel
Streaming Analytics: Predictions of the Future by BI News
Real-Time Event Streaming: What Are Your Options? by Ankur Desai
Counting in Streams: A Hierarchy of Needs by Kostas Tzoumas
Streaming Data Who’s Who: Kafka, Kinesis, Flume, and Storm by Jake Dolezal

Apache Beam

Apache Beam wants to be uber-API for big data by Ian Pointer

Apache Storm

Tutorial: Deploying Apache Storm on Docker Swarm by Baqend Tech Blog
Storm 1.0.0 released
Apache Storm Reaches 1.0, Brings Improved Performance, Many New Features by Sergio De Simone
Apache Storm 1.0 packs a punch by Serdar Yegulalp
Apache Storm New UI Features by Kishorkumar Patil
Apache Storm 1.0 milestone includes native streaming window API by Christina Mulligan

Apache Spark Streaming

Real-Time Global Anomaly Detection in IoT with EMC Elastic Cloud Storage (ECS) by Claudio Fahey
Real-Time Global Anomaly Detection in IoT with EMC Elastic Cloud Storage (ECS) – Part 2 by Claudio Fahey
Real-Time Global Anomaly Detection in IoT with EMC Elastic Cloud Storage (ECS) – Part 3 by Claudio Fahey
Spark Streaming added to IBM BigInsights examples project by Chris Snow
Real-Time Aggregation on Streaming Data Using Spark Streaming and Kafka by Anant Asthana
About an example using kafka, spark-streaming, mongodb and twitter by Aironman

Apache Flink

Counting in streams: A hierarchy of needs by Kostas Tzoumas

Apache Kafka

Join Us For The Inaugural Stream Data Hackathon by Ewen Cheslack-Postava
How We Monitor and Run Kafka At Scale by Rajiv Kurian
Decoupling the Data Pipeline with Kafka – A (Very) Simple Real Life Example by Robin Moffatt

Apache NiFi / Hortonworks DataFlow

Inspecting your NiFi DistributedMapCacheServer with Groovy by Matt Burgess
OAuth 1.0A with Apache NiFi (Twitter API example) by Pierre Villard

Apache Metron

Kicking off the Apache Metron Tech Preview 1 Blog Series by George Vetticaden & James Sirota
Apache Metron User Personas and Core Functional Themes by George Vetticaden & James Sirota
Apache Metron for Big Data Cyber Security Analytics by Hortonworks

Striim

Real-Time Financial Transaction Monitoring by Steve Wilkes

StreamSets

Ingesting JSON Data Into Apache Kudu with StreamSets Data Collector by Pat Patterson
Announcing Data Collector ver 1.3.0.0 by Kirit Basu
Startup Spotlight: StreamSets’ Big Data Integration by Loraine Lawson
Visualize Apache Log Data in Minecraft with StreamSets Data Collector by Pat Patterson

IBM Quarks

Quarks: Sending events from Raspberry Pi to Watson IoT Platform by Samantha Chan

New Presentations

How to Build Data Pipelines for Real-Time Applications with SMACK & Apache Kafka by Patrick McFadin
Large-Scale Stream Processing in the Hadoop Ecosystem by Gyula Fora & Marton Balassi
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming by Michael Rainey
Spark Streaming + Twitter : Analytics to one’s taste by jpaniego
Unified Stream & Batch Processing with Apache Flink by Ufuk Celebi
Querying the Internet of Things: Streaming SQL on Kafka/Samza and Storm/Trident by Julian Hyde
Streaming Data Processing with Apache Storm by Rishabh Jain
Apache NiFi in the Hadoop Ecosystem by Bryan Bende
Fast Data Intelligence in the IoT – real-time data analytics with Spark by Bas Geerdink
Introduction to Real-Time Data Processing by Yogi Devendra

New Videos

Ingest and Stream Processing What will you choose by Pat Patterson & Ted Malaska
Apache NiFi in the Hadoop Ecosystem by Bryan Bende
Large-Scale Stream Processing in the Hadoop Ecosystem by Gyula Fora & Marton Balassi
The Future of Apache Storm by Taylor Goetz
Streaming outlier analysis for fun and scalability by Casey Stella
Unified Stream and Batch Processing with Apache Flink by Ufuk Celebi
Querying the Internet of Things Streaming SQL on Kafka Samza and Storm Trident by Julian Hyde
StreamSets Data Collector with Apache Kafka and Kudu by Pat Patterson
Real time Search on Terabytes of Data Per Day Lessons Learned by Joey Echeverria
Telematics with Hadoop and Nifi by Adam Morton & Simon Elliston Ball
High Performance Stream Processing by Stephane Maldini, Glenn Renfro & David Turanski
Introduction to Apache Quarks by Kathy Saunders, Dan Debrunner, Will Marshall, Susan Cline and Kathey Marsden
Distributed Real-Time Stream Processing by Petr Zapletal
Overview of Apache Flink the 4G of Big Data Analytics Frameworks by Slim Baltagi

New Releases / Components

Upcoming Events

4/19/2016 (Chicago, US) – Apache Flink 1.0: A new era for Real-Time Streaming Analytics (Meetup)
4/19/2016 (Santa Clara, US) – Mining MySQL’s Binary Log with Apache Kafka and Kafka Connect (Percona Live)
4/20/2016 (London, UK) – Kafka, Samza and HBase at the Home Office / Kafka as a Service on IBM’s Bluemix (Unified Log London Meetup)
4/20/2016 (Copenhagen, DK) – High performance data flow with a GUI, and guts (Meetup)
4/21/2016 (Austin, US) – Accelerating Data Ingestion & Streaming Analytics with HDF/NiFi (Meetup)
4/21/2016 (Leidschendam, NL) – Apache Kafka: A guide to integrate and build data processing pipelines (Meetup)
4/21/2016 (Palo Alto, US) – Data in Motion: Simplifying Security & Building Custom Integrations (Meetup)
4/25/2016 (San Francisco, US) – Kafka Summit
4/26/2016 (New York, US) – Building Data Pipelines for Solr with Apache NiFi (Meetup)
4/27/2016 (Laurel, US) – Apache NiFi: Because It Aint Data Science Without the Data (Meetup)
4/28/2016 (Renens, CH) – Spark Streaming: Dealing with State (Meetup)
5/4/2016 (online) – How to Achieve High Throughput for Real-Time Applications with SMACK, Apache Kafka and Spark Streaming (Datastax Webinar)
5/9/2016 (Vancouver, CA) – Streams Track (Big data 2016 conference)
5/10/2016 (New York, US) – The Big Heist: Attempting to steal distributed systems from complexity and chaos (Scala Days)
5/10/2016 (Vancouver, CA) – Using Kafka and Kudu for Fast, Low-Latency SQL Analytics on Streaming Data (Big data 2016 conference)
5/10/2016 (Flemington, US) – Apache NiFi – Deep Dive (Meetup)
5/16-20/2016 (San Francisco, US) – Pipelines By the Bay (Pipelines By the Bay Conference)
6/1-3/2016 (London, UK) – Real-time conference sessions (Strata + Hadoop World)
6/5-7/2016 (Berlin, GE) – Berlin Buzzwords
6/13/2016 (Amsterdam, NL) – GOTO Night: Stream Processing with Apache Flink and Mining Github (Meetup)
6/13/2016 (New York, US) – Apache Beam (Stream Processing @ Scale Track at QCon New York)

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

gschmutz 21:50 on April 11, 2016
Tags: flink ( 92 ), kafka ( 248 ), nifi ( 238 ), spark-streaming ( 219 ), storm ( 46 ), Stream Processing ( 241 ), streaming-analytics ( 84 ), streamsets ( 74 )

Last week in Stream Processing & Analytics 4/11/2016

This is the 9th installment of my blog series around Stream Processing and Analytics.

First, I have to mention a blog article I somehow missed last month. It nicely compares the various streaming frameworks available from the Apache Software Foundation.

Source: An Overview of Apache Streaming Technologies (Databaseline Blog)

Last week Forrester published its updated Forrester Wave for Big Data Streaming Analytics products. Forrester Research defines Big Data Streaming Analytics as:

Software that can filter, aggregate, enrich, and analyze a high throughput of data from multiple, disparate live data sources and in any data format to identify simple and complex patterns to provide applications with context to detect opportune situations, automate immediate actions, and dynamically adapt.

Here are the Leaders, Strong Performers and Contenders as seen by Forrester:

[Image: Forrester Wave chart of Big Data Streaming Analytics vendors]

Source: The Forrester Wave: Big Data Streaming Analytics, Q1 2016

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Why Time-Value of Data Matters by Steve Wilkes
Pondering the Time Value of Data by Steve Wilkes
What Spark’s Structured Streaming really means by Ian Pointer
Data in the emerging world of stream processing by Andrew Brust
Sensors, Heartbeats and Analytics – What Gets Our Blood Racing? by Richard Buckle
Big Data’s Hidden Scourge: Data Drift by Girish Pancha
Now available – The Forrester Wave™: Big Data Streaming Analytics, Q1 2016 by Kimberly Madia
Exploring the Potential of Streaming Analytics in Healthcare by Sherrie Mersdorf
Selecting a Streaming Architecture by insideBIGDATA
Inside Wargaming.net’s Data-driven, Real-time Rules Engine by Wargaming.net
Let’s Get Real: Acting on Data in Real Time by Jack Norris
IoT Spotlight: Predictive Maintenance and the Promised Land of Zero Unexpected Downtime by Ankur Desai
Fast data is for people by Timothy McGovern

Apache Storm

Monitoring and Troubleshooting Apache Storm by Sachin Agarwal

Apache Spark Streaming

Spark Streaming and Twitter Sentiment Analysis by Nicolas A Perez
About how to interact with Mongo and Spark Streaming using scala by Alonso Isidoro
What Spark’s Structured Streaming really means by Ian Pointer

Apache Flink

Introducing Complex Event Processing (CEP) with Apache Flink by Till Rohrmann
Introduction to Flink Streaming – Part 5 : Window API in Flink by Madhukar
Introduction to Flink Streaming – Part 6 : Anatomy of Window API by Madhukar
Introduction to Flink Streaming – Part 7 : Implementing Session Windows using Custom Trigger by Madhukar

Apache Kafka

Hello World, Kafka Connect + Kafka Streams by Michal Haris
Log Compaction | Highlights in the Apache Kafka and Stream Processing Community | April 2016 by Gwen Shapira
The Real-Time Rise of Apache Kafka by Alex Woodie
Kafka as a Message Broker in the IoT World – Part 1 by Wipro Digital
Kafka as a Message Broker in the IoT World – Part 2 by Wipro Digital
Spring Integration Kafka Support 2.0.0.M1 is now available by Artem Bilan

Apache NiFi / Hortonworks DataFlow

Analyze Flickr user interests using Apache NiFi and Spark by Pierre Villard
SQL in NiFi with ExecuteScript by Matt Burgess
URL shortener service with Apache NiFi by Pierre Villard
Windows Share + Nifi + HDFS – A Practical Guide by Chris Gambino

StreamSets

Data in Motion: Simplifying Security & Building Custom Integrations by Pat Patterson

New Presentations

Apache Apex Kafka Input Operator by Siyuan Hua
From stream to recommendation using apache beam with cloud pubsub and cloud dataflow by Igor Maravic & Neville Li
Building a scalable architecture for processing streaming data on AWS by Siva Raghupathy & Manjeet Chayel
Designing a scalable real-time data platform using Akka, Spark Streaming, and Kafka by Alex Silva
Fast data made easy with Apache Kafka and Apache Kudu (incubating) by Ted Malaska & Jeff Holoman
High-performance clickstream analytics with Apache Phoenix and HBase by Arun Thangamani
Putting Kafka into overdrive by Todd Palino & Gwen Shapira
Real-time fraud detection using process mining with Spark Streaming by Bolke de Bruin & Hylke Hendriksen
Real-Time Hadoop: The Ideal Messaging System for Hadoop by Ted Dunning
The Internet of Things: How to do it. Seriously! by Chris Rawles
Transactional Streaming: If you can compute it, you can probably stream it by John Hugg
Stream Processing with Kafka and Samza by Diego Pacheco
Flying Faster with Heron by Karthik Ramasamy
SMACK Stack – Fast Data Done Right by Stefan Siprell
Beyond ETL: End to End Streaming Architectures by Sean Anderson & Amandeep Khurana
IOT Ingestion & Analytics using Apache Apex by Pramod Immaneni
Introduction to Apache Apex by Pramod Immaneni

New Videos

Log Analytics Optimization by Hortonworks
Stream Processing with Apache Flink by Robert Metzger
Logging for Microservices using StreamSets Data Collector by Virag Kothari
Let’s get real: Acting on data in real time by Jack Norris
How to Build Data Pipelines for Real-Time Applications with SMACK & Apache Kafka by Patrick McFadin
From the source: learn about Apache Flink from a project committer by Max Michels
Flying Faster with Heron by Karthik Ramasamy
Real-time Stream Computing & Analytics @Uber by Sudhir Tonse
AWS IoT Real Time Stream Processing with AWS Lambda by Vyom Nagrani
Apache Apex Fault Tolerance & Processing Semantics by Thomas Weise & Pramod Immaneni

New Releases / Components

Upcoming Events

4/12/2016 (Philadelphia, US) – Demystifying Stream Processing with Apache Kafka (Emerging Technologies for the Enterprise conference)
4/12/2016 (Dublin, IE) – Data Flow using Apache NiFi
4/13/2016 (London, UK) – Real time search and insights with Apache Kafka (Meetup)
4/13/2016 (Online) – 8 Priorities for Modernizing Your Data Integration and Analytics Strategy (Striim Webinar)
4/14/2016 (Chicago, US) – Jay Kreps: Apache Kafka and the Confluent Platform – Overview and Roadmap (Meetup)
4/14/2016 (Online) – Getting Started with Streaming Analytics and the IoT (Smart Data Webinar Series)
4/14/2016 (Online) – Optimizing Log Analytics (Hortonworks Webcast)
4/14/2016 (San Jose, US) – IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks (Meetup)
4/14/2016 (Las Vegas, US) – Oracle GoldenGate and Apache Kafka: A Deep Dive into Real-Time Data Streaming (Collaborate16)
4/15/2016 (Bellevue, US) – Seattle Apache Kafka Meetup (Meetup)
4/18/2016 (Munich, GE) – Batch- & Stream-Processing mit Google Dataflow (Smart Data Developer Conference)
4/19/2016 (Chicago, US) – Apache Flink 1.0: A new era for Real-Time Streaming Analytics (Meetup)
4/19/2016 (Santa Clara, US) – Mining MySQL’s Binary Log with Apache Kafka and Kafka Connect (Percona Live)
4/20/2016 (London, UK) – Kafka, Samza and HBase at the Home Office / Kafka as a Service on IBM’s Bluemix (Unified Log London Meetup)
4/20/2016 (Copenhagen, DK) – High performance data flow with a GUI, and guts (Meetup)
4/21/2016 (Austin, US) – Accelerating Data Ingestion & Streaming Analytics with HDF/NiFi (Meetup)
4/21/2016 (Leidschendam, NL) – Apache Kafka: A guide to integrate and build data processing pipelines (Meetup)
4/21/2016 (Palo Alto, US) – Data in Motion: Simplifying Security & Building Custom Integrations (Meetup)
4/25/2016 (San Francisco, US) – Kafka Summit
4/26/2016 (New York, US) – Building Data Pipelines for Solr with Apache NiFi (Meetup)
4/27/2016 (Laurel, US) – Apache NiFi: Because It Aint Data Science Without the Data (Meetup)
5/9/2016 (Vancouver, CA) – Streams Track (Big data 2016 conference)
5/10/2016 (New York, US) – The Big Heist: Attempting to steal distributed systems from complexity and chaos (Scala Days)
5/10/2016 (Vancouver, CA) – Using Kafka and Kudu for Fast, Low-Latency SQL Analytics on Streaming Data (Big data 2016 conference)
5/10/2016 (Flemington, US) – Apache NiFi – Deep Dive (Meetup)
6/1-3/2016 (London, UK) – Real-time conference sessions (Strata + Hadoop World)
6/5-7/2016 (Berlin, GE) – Berlin Buzzwords
6/13/2016 (Amsterdam, NL) – GOTO Night: Stream Processing with Apache Flink and Mining Github (Meetup)
6/13/2016 (New York, US) – Apache Beam (Stream Processing @ Scale Track at QCon New York)

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!

gschmutz 06:50 on April 6, 2016
Tags: flink ( 92 ), kafka ( 248 ), oracle stream explorer ( 2 ), storm ( 46 ), Stream Processing ( 241 ), streaming-analytics ( 84 ), streamsets ( 74 )

Last week in Stream Processing & Analytics 4/4/2016

This is the 8th installment of my blog series around Stream Processing and Analytics. It’s two days late due to some technical difficulties: I forgot my power adapter at home on Monday while traveling 😉 87 EUR and one day later, I’m back in business 🙂

As expected, there were quite a lot of topics around stream processing and streaming analytics at the Strata conference last week.

Jay Kreps and Neha Narkhede from Confluent both mentioned it on Twitter:

Seemed like every other presentation at Strata this year was on streaming data. Very exciting!

— Jay Kreps (@jaykreps) April 1, 2016

Exciting to see so much interest in real-time and stream processing at #StrataHadoop

— Neha Narkhede (@nehanarkhede) April 1, 2016

And Jack Vaughan summarized it in his blog article: “Moving streams of data is a must in many modern applications. As a result, streaming analytics applications with Spark Streaming, Kafka and other components are coming to the big data forefront.”

Definitely very interesting times ahead 🙂

As usual, find below what I noticed last week:

News and Blog Posts

General

Implementing Lambda architecture to track real-time updates by Andy Chu
Can Event Streaming Make My Business More Productive? by Ankur Desai
Streaming Analytics: Predictions of the Future by Harrine Freeman
Cognitive Computing is Not the Next Step in Analytics by Mark Palmer
Seven things to watch for at Strata + Hadoop World 2016 in San Jose by Josh Klahr
Big Data: Applying Machine Learning to Event Processing by Kai Waehner
The Forrester Wave™: Big Data Streaming Analytics, Q1 2016 by Mike Gualtieri et al.
Streaming analytics put data in motion at Strata + Hadoop 2016 by Jack Vaughan
Embeddable Data Transformation for Real-time Streams by Joey Echeverria
Adopt complex event processing architecture across hybrid clouds by George Lawton
Use Cases for Real Time Stream Processing Systems by Shiv Shet

Comparison

Apache Showdown: Flink vs. Spark by Javier Lopez
To Flink or Spark? That is the question for stream data processors by Marlene Den Bleyker

Apache Storm

Apache Storm StreamParse Python by Walker R

Apache Flink

data Artisans raises a Series A by Kostas Tzoumas
Apache Flink creators snag $6M for stream-processing startup by Paul Gillin
Flink Concepts by Flink

Apache Kafka

Announcing Confluent University: World-Class Apache Kafka Training by Ian Wrigley
Confluent Introduces Partner Program to Support Rapidly Growing Apache Kafka Ecosystem by Confluent

Google Cloud Dataflow / Apache Beam

Apache Beam Presentation Materials by Frances Perry & Tyler Akidau

MapR Streams

MapR Introduces New Stream Processing Quick Start Solution

Apache NiFi / Hortonworks DataFlow

Hortonworks DataFlow Optimizes Log Analytics From the Edge by Hortonworks
Hortonworks DataFlow 1.2 Released by Haimo Liu
Parsing XML Logs With Nifi – Part 1 of 3 by Chris Gambino

Oracle Stream Explorer

Real-time model scoring for streaming data – a prototype based on Oracle Stream Explorer and Oracle R Enterprise by Alexandru Ardel

StreamSets

Integrating StreamSets with Salesforce Wave Analytics by Pat Patterson

New Presentations

Stream Processing in the Cloud With Data Microservices by Marius Bogoevici
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis by Evan Chan & Helena Edelson
Taking Spark Streaming to the Next Level with Datasets and DataFrames by Tathagata Das
Real-Time Spark: From Interactive Queries to Streaming by Michael Armbrust
Putting Kafka Into Overdrive by Todd Palino & Gwen Shapira
Apache Flink – Counting elements in streams by Kostas Tzoumas
10 Big Data Technologies you Didn’t Know About by Jesus Rodriguez
Embeddable data transformation for real-time streams by Joey Echeverria
Real-time Distributed Stream Processing @ Scale by Jerome Boulon
Streaming Analytics on AWS by Dimitri Tchikatilov
Hadoop application architectures – Fraud detection tutorial by Gwen Shapira, Jonathan Seidman, Ted Malaska & Mark Grover
Lessons learned building a scalable self-serve, real-time, multitenant monitoring service at Yahoo by Mridul Jain & Sumeet Singh
Kafka Streams – Stream Processing Made Simple with Kafka by Guozhang Wang
Data in Motion – Data at Rest – Hortonworks a Modern Architecture by Mats Johansson

New Podcasts

Stream Processing at Uber with Danny Yuan

New Videos

Integrating StreamSets with Salesforce Wave Analytics by Pat Patterson
Developing Real-Time Data Pipelines with Apache Kafka by Joe Stein
Streaming Big Data with StreamSets and Cloudera by Arvind Prabhakar & Matthew Schumpert
Rethinking Streaming Analytics for Scale by Helena Edelson
Doug Cutting: Apache Hadoop – The Next 10 Years by the Hive

New Books

Apache Spark Analytics Made Simple

New Releases / Components

Upcoming Events

4/5/2016 (New York, US) – Stream Processing in the Cloud with Data Microservices (Meetup)
4/6/2016 (Berlin, GE) – Big Data & Real Time Analytics at Idealo.de! (Meetup)
4/7/2016 (online) – Building a Real-time Streaming Platform Using Kafka Streams and Kafka Connect (Webcast with Jay Kreps)
4/7/2016 (London, UK) – From the source: learn about Apache Flink from a project committer (Meetup)
4/13/2016 (London, UK) – Real time search and insights with Apache Kafka (Meetup)
4/14/2016 (Chicago, US) – Jay Kreps: Apache Kafka and the Confluent Platform – Overview and Roadmap (Meetup)
4/14/2016 (Online) – Getting Started with Streaming Analytics and the IoT (Smart Data Webinar Series)
4/18/2016 (Munich, GE) – Batch- & Stream-Processing mit Google Dataflow (Smart Data Developer Conference)
4/19/2016 (Chicago, US) – Apache Flink 1.0: A new era for Real-Time Streaming Analytics (Meetup)
4/20/2016 (London, UK) – Kafka, Samza and HBase at the Home Office / Kafka as a Service on IBM’s Bluemix (Unified Log London Meetup)
4/21/2016 (Leidschendam, NL) – Apache Kafka: A guide to integrate and build data processing pipelines (Meetup)
4/25/2016 (San Francisco, US) – Kafka Summit
4/26/2016 (New York, US) – Building Data Pipelines for Solr with Apache NiFi (Meetup)
5/9/2016 (Vancouver, CA) – Streams Track (Big data 2016 conference)
5/10/2016 (New York, US) – The Big Heist: Attempting to steal distributed systems from complexity and chaos (Scala Days)
5/10/2016 (Vancouver, CA) – Using Kafka and Kudu for Fast, Low-Latency SQL Analytics on Streaming Data (Big data 2016 conference)
5/10/2016 (Flemington, US) – Apache NiFi – Deep Dive (Meetup)
6/1-3/2016 (London, UK) – Real-time conference sessions (Strata + Hadoop World)
6/5-7/2016 (Berlin, GE) – Berlin Buzzwords
6/13/2016 (Amsterdam, NL) – GOTO Night: Stream Processing with Apache Flink and Mining Github (Meetup)
6/13/2016 (New York, US) – Apache Beam (Stream Processing @ Scale Track at QCon New York)

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!