Last week in Stream Processing & Analytics 6/6/2016


This is the 17th installment of my blog series around Stream Processing and Analytics.

I really liked Darryl Taft’s article 10 Best Practices for Managing Modern Data in Motion, which lists 10 tips for managing data in motion. All of them are important, but here are my 5 favorite ones:

  1. Replace specifying schema with capturing intent: An intent-driven focus on big data helps decrease the effort and time needed to develop and implement pipelines.
  2. Sanitize before Storing: Sanitizing data as close to the source as possible makes data scientists more productive.
  3. Expect and deal with Data Drift: Implementing the right kinds of tools and processes can help mitigate the effects of data drift.
  4. Don’t just count packages, inspect the contents: Analyzing the value of your data can be more important than just measuring throughput and latency.
  5. Decouple for Continual Modernization: Decoupling the stages of data movement allows you to upgrade each as you see fit.
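To make tips 2 and 3 a bit more concrete, here is a minimal Python sketch. It assumes that records arrive as plain dicts and that "drift" means fields appearing, disappearing, or changing type compared to an expected schema. The schema and the function names (`sanitize`, `detect_drift`) are hypothetical illustrations. They do not come from the article or from any particular tool.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical expected schema for illustration: field name -> Python type.
EXPECTED_SCHEMA: Dict[str, type] = {"sensor_id": str, "temperature": float, "ts": int}


def sanitize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Light sanitizing near the source: trim whitespace in keys and string
    values, and drop fields whose value is None."""
    return {
        key.strip(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
        if value is not None
    }


def detect_drift(record: Dict[str, Any], schema: Dict[str, type]) -> List[Tuple[str, str]]:
    """Return (field, issue) pairs where the record deviates from the schema,
    instead of rejecting the record outright."""
    issues: List[Tuple[str, str]] = []
    for field, expected_type in schema.items():
        if field not in record:
            issues.append((field, "missing"))
        elif not isinstance(record[field], expected_type):
            issues.append((field, f"type {type(record[field]).__name__}"))
    # Fields the schema does not know about are also a sign of drift.
    for field in record:
        if field not in schema:
            issues.append((field, "unexpected"))
    return issues


if __name__ == "__main__":
    incoming = {" sensor_id": "s-1 ", "temperature": "21.5", "ts": 1465171200, "unit": "C"}
    clean = sanitize(incoming)
    print(detect_drift(clean, EXPECTED_SCHEMA))
```

Run as a script, this prints `[('temperature', 'type str'), ('unit', 'unexpected')]`. The sketch reports drift rather than dropping the record, so a pipeline can keep flowing while the changes are surfaced. Real-world sanitizing and drift handling would of course go well beyond trimming whitespace and comparing types.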

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Apache Storm

Heron

Apache Spark Streaming

Apache Kafka

Apache NiFi / Hortonworks HDF

StreamSets

Apache Quarks

New Presentations

New Videos

New Releases

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!