This is the 11th installment of my blog series on Stream Processing and Analytics.
First, two interesting tweets I found last week. The first one, by Steve Wilkes, brings it straight to the point:
The second one, by Neha Narkhede, reveals some impressive metrics about the usage of Kafka at LinkedIn: 1.4 trillion messages a day on 1,400 brokers. Kafka is really a game changer!
Last but not least, I would like to quote from Mark Palmer’s latest article, 8 Predictions for the Internet of Analytics, which I really enjoyed reading:
- Streaming analytics will become a fundamental topic in computer science. Forrester’s Streaming Analytics Wave defines a set of computer science criteria to define streaming analytics: time windowing, aggregation, correlation, and integration with interactive analytics. These fundamentals are not well understood by the computer science community, are not yet taught in school, and are therefore not yet well known.
- Data streams will be as important as data lakes. Data lakes contain data at rest; data streams contain data in motion. But most IT applications today are designed around data at rest. In the coming decade, data streams will become as important as data at rest.
- Streaming analytics and traditional analytics will become increasingly intertwined. In order to apply analytics to streams, you need to know what to look for. Traditional analytics help you look through the rearview mirror at the past, and predict important conditions. Streaming analytics are about looking forward, through your windshield, looking at real-time conditions, and acting.
So that’s it for this week. As usual, find below the new blog articles, presentations, videos and software releases from last week:
News and Blog Posts
General
- Streaming Analytics Market worth $1,955.7 Million by 2020 by MarketsandMarkets
- 8 Predictions for the Internet of Analytics by Mark Palmer
- Why Your Business Should Be Worried About Data Drift by Peter Daisyme
- The Architecture Files, Ep. 3: OK EVENT LOG by Ian Varley
- The Next Market Leaders Will Power Their Businesses from IoAT Data Sources by Mark Lochbihler
- The IoAT and Big Data means big business by Andy Leaver
- 9 must watch big data technologies along with Hadoop by Kumar Chinnakali
- Spark, Kafka & machine learning: 10 big data start-ups taking analytics to the next level by James Nunns
- Streaming Analytics Will Transform The IoT Into The Internet Of Analytics by Rowan Curran
- Hadoop, Kafka creators big on big data streaming analytics by Jack Vaughan
- Keeping Hadoop Properly Fed a Challenge for Many Data Pros by David Weldon
- Top Challenge With Real-time Analytics: Education by David Weldon
Comparison
- Look out, Spark and Storm, here comes Apache Apex by Ian Pointer
- Is Flink the shiny(err..) toy on the block? by Vikas Hazrati
Apache Beam
- Apache Beam’s Ambitious Goal: Unify Big Data Development by Alex Woodie
- Apache Beam : Next Step in Big Data Unification by Madhukara Phatak
Apache Storm
- Distributed, configuration based ETL in Apache STORM by David Woodhead
- Announcing Apache Storm 1.0.0 by Sriharsha Chintalapani & Kanishk Mahajan
Apache Spark Streaming
- Fast, Scalable, Streaming Applications with MapR Streams, Spark Streaming, and MapR-DB by Carol McDonald
- Spark Streaming and Twitter Sentiment Analysis by Nicolas Perez
- Stateful Distributed Stream Processing by investigativeprogramming
Apache Flink
- Apache Flink Gets Real with Continuous Stream Processing by Susan Hall
Apache Apex
Apache Kafka
- First CAKE Meetup Overview and Video by Patrick Jaromin
- Deploying Apache Kafka on AWS Elastic Block Store (EBS) by David Tucker
- Open Source Kafka Connect adds more than a Dozen Connectors by Confluent
- Kafka Ecosystem at LinkedIn by Joel Koshy
Apache NiFi / Hortonworks DataFlow
- Royal Mail starts to deliver on Hortonworks’ ‘data in motion’ promise by Jessica Twentyman
- NiFi OCR – Using Apache NiFi to read children’s books by Jeremy Dyer
- Converting CSV To Avro with Apache NiFi by Jeremy Dyer
Apache Metron
- Apache Metron Tech Preview 1 – Come and Get It! by George Vetticaden & James Sirota
StreamSets
- Is Data Drift Polluting Your Data Lake? by StreamSets
- Podcast with Girish Pancha from StreamSets about Performance Management of Data Flows by Girish Pancha
New Presentations
- The Future of Apache Storm by Taylor Goetz
- Streaming SQL by Julian Hyde
- IoT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks by Greg Brosman
- Apache Kafka as a Service by Oliver Deakin
- Deep Dive and Best Practices for Real-Time Streaming Applications by Roy Ben-Alta
- Apache Flink: Counting Elements in Streams by Jamie Grier
- Spark Streaming in 10 minutes by Sven Tessmann
- Getting Started with AWS IoT by Ian Massingham
- Productization of Big Data Streaming Analytics by Mike Gualtieri & Amol Kekre
- Visual Dataflows with Apache NiFi – and how they interact with AWS by Kay Lerch
- Apache Flink Introduction by Ahmed Nader
New Videos
- Distributed Stream Processing with Apache Kafka by Jay Kreps
- Combining Stream Processing and In-Memory Data Grids for Near-Real Time Aggregation and Notifications by Oliver Malassi
- Getting Started with Streaming Analytics and the IoT by Adrian Bowles
- IOT Big Data Ingestion and Processing in Hadoop by Silver Spring Networks by Greg Brosman & Darin Nee
- One Million Clicks per Minute with Kafka and Clojure by Devon Peticolas
New Podcasts
- Podcast with Girish Pancha from StreamSets about Performance Management of Data Flows by Girish Pancha
New Releases / Components
Upcoming Events
- 4/25/2016 (San Francisco, US) – Kafka Summit
- 4/26/2016 (New York, US) – Building Data Pipelines for Solr with Apache NiFi (Meetup)
- 4/27/2016 (Laurel, US) – Apache NiFi: Because It Ain't Data Science Without the Data (Meetup)
- 4/28/2016 (Renens, CH) – Spark Streaming: Dealing with State (Meetup)
- 5/4/2016 (online) – How to Achieve High Throughput for Real-Time Applications with SMACK, Apache Kafka and Spark Streaming (DataStax Webinar)
- 5/4/2016 (Fairfield, US) – Data Flow using Apache NiFi (Meetup)
- 5/5/2016 (online) – Apache Spark 2.0 presented by Databricks co-founder Reynold Xin (Databricks Webinar)
- 5/5/2016 (online) – Apache Storm and Twitter Heron: Stream Processing at Scale and Monitoring Performance (OpsClarity Webinar)
- 5/9/2016 (Santa Clara, US) – Make Streaming IoT Analytics Work For You: The Devil is in the Details (Meetup)
- 5/9/2016 (Vancouver, CA) – Streams Track (Big data 2016 conference)
- 5/10/2016 (New York, US) – The Big Heist: Attempting to steal distributed systems from complexity and chaos (Scala Days)
- 5/10/2016 (Vancouver, CA) – Using Kafka and Kudu for Fast, Low-Latency SQL Analytics on Streaming Data (Big data 2016 conference)
- 5/10/2016 (Flemington, US) – Apache NiFi – Deep Dive (Meetup)
- 5/16-20/2016 (San Francisco, US) – Pipelines By the Bay (Pipelines By the Bay Conference)
- 5/16/2016 (online) – Productionizing your Streaming Jobs (Databricks Webinar)
- 5/20/2016 (Madrid, ES) – Workshop Apache Flink (Meetup)
- 6/1-3/2016 (London, UK) – Real-time conference sessions (Strata + Hadoop World)
- 6/5-7/2016 (Berlin, DE) – Berlin Buzzwords
- 6/13/2016 (Amsterdam, NL) – GOTO Night: Stream Processing with Apache Flink and Mining Github (Meetup)
- 6/13/2016 (New York, US) – Apache Beam (Stream Processing @ Scale Track at QCon New York)
Please let me know if this is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!