Please let me know if that is of interest. Please tweet your projects, blog posts, and presentations & videos to @gschmutz to get them listed in next week’s edition!

gschmutz 19:55 on June 27, 2022
Tags: kafka ( 248 ), kafka-streams ( 176 ), nifi ( 238 ), spark ( 153 ), spark-streaming ( 219 ), Stream Processing ( 241 )

Last Week in Stream Data Integration & Stream Analytics – 27.6.2022

This is the 246th edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

Event-Driven Architecture. Part 1: Pros and cons with examples by Dan Siwiec
CDC Strategies for Real Time Data Lakes/Data Platform by Sambhavgupta

Kafka

Tuning Apache Kafka and Confluent Platform for Graviton2 using Amazon Corretto by Mike Cook
Kafka Streams Introduction by Rob Golder
Announcing ksqlDB 0.26 by Tom Nguyen
Kafka Monthly Digest: May 2022 by Mickael Maison
Rack awareness in Kafka Streams by Levani Kokhreidze
How to Elastically Scale Apache Kafka Clusters on Confluent Cloud by Aashish Kohli
How Confluent Treats Incidents in the Cloud by Tim Ellis
Legacy Modernization and Hybrid Multi-Cloud with Kafka in Healthcare by Kai Waehner
Architectural Microservices Patterns: SAGA, Outbox and CQRS with Kafka by Ali Gelenler
A practical guide for migrating Kafka Schema Registry between data centers by Shlomi Király
Debezium to Snowflake: Lessons learned building data replication in production by Omar Ghalawinji
Kafka Streams Spring Boot Demo by Rob Golder
Streaming Analytics With KSQL vs. a Real-Time Analytics Database by Lewis Gavin
Building a scalable webhook delivery system using Kafka, SQS & S3 by Eyal Ringort
Microservices 101: Transactional Outbox and Inbox by Krzysztof Atłasik
Streaming data with Postgres, Debezium and Kafka by Clifford Frempong
IDC Perspective: Accelerate Data Streaming Adoption With Confluent by Mekhala Roy
Autonomous Networks — The Telco and Media Growth Engine by Eric Dozier & Justin Lee
Applying Data Pipeline Principles in Practice: Exploring the Kafka Summit Keynote Demo by Yeva Byzek
Managing Hybrid Cloud Data with Cloud-Native Kubernetes APIs by Karthikeyan Srinivasan

Apache NiFi

Apache NiFi and Apache NiFi Registry on Kubernetes by GetInData TechTeam

StreamSets

Insights from the road: Shifting to a higher gear with StreamSets by Sanjay Brahmawar

New Videos

Securing Kafka Connect Pipelines with Client-Side Field Level Cryptography by Hans-Peter Grahsl
Streaming Updates through Complex Operations in Kafka Streams at Scale by Victor Künstler
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools by Confluent Streaming Audio
Keep Your Cache Always Fresh with Debezium! by Gunnar Morling
Geo-replicated Kafka Streams Apps by Ryanne Dolan
Common Apache Kafka Mistakes to Avoid by Confluent Streaming Audio

New Presentations

Securing Kafka Connect Pipelines with Client-Side Field Level Cryptography by Hans-Peter Grahsl

New Podcasts

Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB by Confluent Streaming Audio
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools by Confluent Streaming Audio
Data Mesh Architecture: A Modern Distributed Data Model by Confluent Streaming Audio
How I Became a Developer Advocate by Confluent Streaming Audio
Common Apache Kafka Mistakes to Avoid by Confluent Streaming Audio

New Releases

Apache NiFi 1.16.2

gschmutz 21:05 on May 18, 2022

Last Week in Stream Data Integration & Stream Analytics – 18.5.2022

This is the 245th edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

Need for Speed: Evaluating Real-time Analytics Systems by Gang Tao
Powering real-time data analytics with Druid at Twitter by Ruchin Kabra & Chunxu Tang

Kafka

How Walmart Uses Apache Kafka for Real-Time Replenishment at Scale by Suman Pattnaik
Confluent at a Fully Disconnected Edge by Joseph Morais & Braeden Quirante
Cloud-native Core Banking Modernization with Apache Kafka by Kai Waehner
Kafka Streaming: Live Streaming Kafka Application to Cassandra by Isaac Omolayo
What’s New in Apache Kafka 3.2.0 by Bruno Cadonna
Stream Processing vs. Batch Processing: What to Know by Jean-Sébastien Brunner

Flink

Getting into Low-Latency Gears with Apache Flink – Part One by Jun Qin & Nico Kruber

Apache NiFi

Apache NiFi: Importing and exporting parameters by Maarten Smeets

StreamSets

Redis Pipeline: How to Publish and Subscribe Data from Redis to Your Destination by Wilson Shamim

New Videos

Apache Kafka 3.2 – New Features & Improvements by Danica Fine

New Podcasts

Scaling Apache Kafka Clusters on Confluent Cloud ft. Ajit Yagaty and Aashish Kohli by Confluent Streaming Audio (#214)
Data Journey with Victoria Bukta (Shopify) – Apache Iceberg and Data Ingestion

New Releases

gschmutz 15:52 on May 9, 2022

Last Week in Stream Data Integration & Stream Analytics – 9.5.2022

This is the 244th edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

On Data Lakes and Stream Ingestion by Vojtech Tuma
Unbundling the Modern Streaming Stack by Dunith Dhanushka

Kafka

Zero Copy. One Of Reason Behind Why Kafka So Fast by ANKIT
Accelerate Cloud Database Modernizations and Migrations with Confluent by Peter Kennedy
Process Formula 1 telemetry with Quarkus and OpenShift Streams for Apache Kafka by Paolo Patierno
Designing and testing a highly available Kafka cluster on Kubernetes by Douglas Hellinger
Announcing ksqlDB 0.25.1 by Tolga Dur
Fine-tune Kafka performance with the Kafka optimization theorem by Bilgin Ibryam
From the Cellar to the Cloud – How Aedifion is Driving Next-Generation Building Automation with Confluent by Dr. Jan Henrik Ziegeldorf
Transactions vs. Analytics in Apache Kafka by Kai Waehner
How To Deploy Apache Kafka With Kubernetes by Alvin Lee
Kafka Monthly Digest: April 2022 by Mickael Maison
How to Remove Apache Kafka Brokers the Easy Way by Stanislav Kozlovski
Spinning Your Drones With Cadence and Apache Kafka® – Architecture, Order and Delivery Workflows by Paul Brebner
Why Apache Kafka is dropping ZooKeeper for KRaft by Paul Krill
My top 5 tools to manage/develop with Apache Kafka by Rafael Zimmermann
Learn Apache Kafka with Python and docker by Rafael Zimmermann
Kafka Consumer Group Rebalance (1 of 2) by Rob Golder
Kafka Consumer Group Rebalance (2 of 2) by Rob Golder

Spark

Speed Up Streaming Queries With Asynchronous State Checkpointing by Craig Ng
Streaming Windows Event Logs into the Cybersecurity Lakehouse by Derek King

Flink

Announcing the Release of Apache Flink 1.15 by Joe Moser & Yun Gao

Apache NiFi

Apache NiFi: Having fun with Jolt transformations by Maarten Smeets

StreamSets

The Role of ETL in Data Integration by Sean Anderson

New Presentations

Keep Your Cache Always Fresh With Debezium by Gunnar Morling

New Videos

Build a Data Streaming App for Sound & Vision | Coding in Motion | Coding Workshop by Kris Jenkins

New Podcasts

Optimizing Apache Kafka’s Internals with Its Co-Creator Jun Rao by Confluent Streaming Audio (#211)
Build a Data Streaming App with Apache Kafka and JS – Coding in Motion by Confluent Streaming Audio (#212)
Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic by Confluent Streaming Audio (#213)

New Releases

gschmutz 19:20 on April 27, 2022

Last Week in Stream Data Integration & Stream Analytics – 27.04.2022

This is the 243rd edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

The Architect’s Guide to Real-Time Retail by Sumeet Puri

Kafka

Kafka Visualization by SoftwareMill
Real-Time Apache Kafka Monitoring and Metrics with Health+ by Jesse Miller
Building a Bridge to the Cloud with Confluent CLI v2 by Brian Strauch
Debezium Change Data Capture without Kafka Connect by Kestra
Real-time event driven system for law enforcement with Kafka streams — Elasticsearch and Slack by BigM
Kafka Summit London 2022: The Full Recap by Robin Moffatt
Playing with OKafka – creating the simplest of producers and consumers by Mark Nelson
Introducing Current 2022: The Next Generation of Kafka Summit by Ben Stopford

Spark

Real-time Monitoring of Apache Spark Streaming Jobs with Power BI by Patrick Pichler
Spark Data Streaming with MongoDB by Abhishek Jaiswal

StreamSets

The Convergence of Data and Application Integration Is Here by Suraj Kumar, Arvind Prabhakar & Raji Narayanan

New Presentations

Testing Kafka Containers with Testcontainers: There and Back Again by Viktor Gamov

New Videos

Using Event-Driven Design with Apache Kafka Streaming Applications ft. Bobby Calderwood by Confluent Streaming Audio (#210)

New Podcasts

Using Event-Driven Design with Apache Kafka Streaming Applications ft. Bobby Calderwood by Confluent Streaming Audio (#210)
Kafka Event Streaming ft. Anna McDonald by Counting Sand

New Releases

Debezium 1.9.1.Final Released

gschmutz 20:27 on April 19, 2022

Last Week in Stream Data Integration & Stream Analytics – 19.04.2022

This is the 242nd edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

Kafka

Harness Trusted, Quality Data Streams with Confluent Platform 7.1 by Hasan Jilani
Kafka Summit London 2022: Welcoming the Apache Kafka Community Back to In-Person Events! by Robin Moffatt
How Apache Kafka Works: An Introduction to Kafka’s Internals by Dave Klein
How Netflix Content Engineering makes a federated graph searchable by Alex Hutter & Falguni Jhaveri & Senthil Sayeebaba
Machine Learning and Data Science with Kafka in Healthcare by Kai Waehner
Building a Real-Time Data Pipeline with Oracle CDC and MarkLogic Using CFK and Confluent Cloud by Geetha Anne
RBAC at Scale, Oracle CDC Source Connector, and More – Q2’22 Confluent Cloud Launch by Ben Echols & Jacklyne Keomany
Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn by Vaibhav Maheshwari
Presto® on Apache Kafka® At Uber Scale by Yang Yang & Yupeng Fu & Hitarth Trivedi

Spark

Delivering Real-Time Data to Retailers with Delta Live Tables by Saurabh Shukla & Bryan Smith & Rob Saker & Sam Steiny

Apache NiFi

Apache NiFi: Automating tasks using NiPyAPI by Maarten Smeets

StreamSets

The Next Chapter for StreamSets by Girish Pancha
Software AG and StreamSets: Stronger Together for Our Customers by Scott Little & Rowan Scranage

New Videos

Handling 2 Million Apache Kafka Messages Per Second at Honeycomb by Confluent Streaming Audio (#204)

New Podcasts

Confluent Platform 7.1: New Features + Updates by Confluent Streaming Audio (#208)
Monitoring Extreme-Scale Apache Kafka Using eBPF at New Relic by Confluent Streaming Audio (#209)

New Releases

gschmutz 18:52 on April 11, 2022

Last Week in Stream Data Integration & Stream Analytics – 11.04.2022

This is the 241st edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

Real-time data ingestion in Grab by Shuguang Xiang & Irfan Hanif & Feng Cheng
The Evolution To Streaming Graph from Graph Databases by Rob Malnati
Key Concepts to Help You Get Started With Streaming Graph by Allan Konar
Can Streaming Graphs Clean Up the Data Pipeline Mess? by Alex Woodie

Comparison

Why you want RabbitMQ not Kafka by Eric Fossas
Kafka vs. Kinesis: A Deep Dive Comparison by Raji Narayanan

Kafka

Why ZooKeeper Was Replaced with KRaft – The Log of All Logs by Guozhang Wang
How Confluent Can Help Optimize and Modernize Your SIEM for Better Cybersecurity by Jeff Bean & Will LaForest
Deploying Self-Managed Connectors on EKS Fargate by Braeden Quirante & Joseph Morais
Why you should not query a database in your stream processors by Brice LEPORINI
Introducing Stream Processing Use Case Recipes Powered by ksqlDB by Michael Drogalis & Sophia Jiang
Build a real-time data analytics pipeline with Airbyte, Kafka, and Pinot by Dunith Dhanushka
Migrating to a Multi-Cluster Managed Kafka with 0 Downtime by Natan Silnitsky
Apache Kafka in the Healthcare Industry by Kai Waehner
Testing your Apache Kafka Data with Confidence by James
Legacy Modernization and Hybrid Multi-Cloud with Kafka in Healthcare by Kai Waehner
Read-only Incremental Snapshots for MySQL by Kate Galieva
Multi-Cluster Deployment Options for Apache Kafka: Pros and Cons by Nikita Gorbachevski
Building a Dependable Real-Time Betting App with Confluent Cloud and Ably by Ben Gamble
Comparison: Apache Camel vs. Apache Kafka? by Kai Waehner
Why we dropped event sourcing with Kafka Streams when given a second chance by Mateusz Jadczyk
Beautify kcat consumer output by piping to jq by Francesco Tisiot
Kafka Monthly Digest: March 2022 by Mickael Maison
Migrating Data to Azure Synapse with Confluent’s Fully Managed Connector to Unlock Real-Time Advanced Analytics by Jacob Bogie & Dustin Vannoy
Securing Kafka® Infrastructure at Uber by Prateek Agarwal, Ryan Turner & KK Sriramadhesikan
The State of Data Streaming by Alexander Heckmann

Flink

Apache Flink Kubernetes Operator 0.1.0 Release Announcement by Gyula Fora

Apache NiFi

Apache NiFi: Avoid these common pitfalls by Maarten Smeets

StreamSets

What is Streaming Analytics? Use Cases, Examples, and Architecture by Sean Anderson
Python Pipeline: Here’s How to Build Your Python Package and install it in StreamSets by Wilson Shamim
Streaming Kafka to Snowflake: A Strategic & Technical Walkthrough by Thomas Bennett
When, Why, and How to Use Change Data Capture (CDC) by Raji Narayanan
How Your Data Ingestion Framework Turns Strategy into Action by Sean Anderson

New Presentations

Apache Flink Adoption at Shopify by Yaroslav Tkachenko
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apache Kafka by Kai Waehner

New Videos

Building a real-time streaming WebSockets server with TypeScript and Apache Kafka by Kris Jenkins
Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole by Confluent Streaming Audio (#205)
Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs by Confluent Streaming Audio (#206)
Let’s Talk Streaming Graph! by thatDot team
Scaling an Apache Kafka Based Architecture at Therapie Clinic by Confluent Streaming Audio (#207)

New Podcasts

Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole by Confluent Streaming Audio (#205)
A Bootiful Podcast: Event streaming guru Jan Svoboda on Apache Kafka Design Patterns by A Bootiful Podcast
Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs by Confluent Streaming Audio (#206)
JMS vs. Kafka: Technology Smackdown with Clement Escoffier and Kai Waehner by Coding over Cocktails
Scaling an Apache Kafka Based Architecture at Therapie Clinic by Confluent Streaming Audio (#207)

New Releases

gschmutz 20:54 on March 15, 2022

Last Week in Stream Data Integration & Stream Analytics – 11.03.2022

This is the 240th edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

The different types of events in event-driven systems by Frank de Jonge
Why Conduit? An evolutionary leap forward for real-time data integration by Lyric Hartley
An exciting example that shows the depths of CDC technology by Zsombor Chikán
DDD – Events Are Complex by Phillip Johnson
Use a message envelope by Frank de Jonge
How to ensure uniqueness in Event Sourcing by Oskar Dudycz
Designing applications at scale: Microservices and events — Part 1 by Jose Antonio Diaz Mata

Comparison

Multi-Tenancy Systems: Apache Pulsar vs. Kafka by Yabin Meng
Pulsar or Kafka? And the lessons from doing our own testing! by Tyler Owen

Kafka

Stateful Serverless Architectures with ksqlDB and AWS Lambda by Bill Bejeck
Redis™ Streams vs Apache Kafka by Paul Brebner
Bringing Your Own Monitoring (BYOM) with Confluent Cloud by Niyi Odumosu
How Storyblocks Enabled a New Class of Event-Driven Microservices with Confluent by Chas DeVeas
Building Real-Time Data Systems the Hard Way by Kris Jenkins
Real-Time Supply Chain with Apache Kafka in the Food and Retail Industry by Kai Waehner
Neil’s Apache Kafka Resource Guide by Neil Buesing
Kafka + Parquet: Maximize speed, minimize storage by Amanda Martin
Kafka: Schema Registry PEM authentication by Elliot West
KIPs Under Discussion by Tom Cooper
How to Make Apache Kafka Clients Go Fast(er) on Confluent Cloud by Yeva Byzek
Kafka Monthly Digest: February 2022 by Mickael Maison
Querying Kafka topics using Presto by Khandelwal Praful
Kafka Dynamic Configuration & Multiple Error Handler by Bayiralican
Solving Concurrency in Event-Driven Microservices by Hugo Rocha
Announcing ksqlDB 0.24.0 by Tom Nguyen
An Introduction to Data Mesh by Rick Spurgeon

Flink

Scala Free in One Fifteen by Seth Wiesman
Keep the SQL: Move from batch to streaming with Apache Kafka® and Apache Flink® by Francesco Tisiot

Druid

Streaming fast: Druid’s event-based database practices for sub-second trillion row response by Chris Mellor

Apache NiFi

Processing Batch Messages with Apache NiFi by Ben Yaakobi

StreamSets

Software AG acquires StreamSets to further accelerate rapid growth in hybrid integration by Software AG

New Videos

Building a Telegram bot with Apache Kafka and ksqlDB by Robin Moffatt
Serverless Stream Processing with Apache Kafka ft. Bill Bejeck by Confluent Streaming Audio (#202)
Why Data Mesh? ft. Ben Stopford by Confluent Streaming Audio (#203)

New Podcasts

Intro to Event Sourcing with Apache Kafka ft. Anna McDonald by Confluent Streaming Audio (#198)
The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps by Confluent Streaming Audio (#201)
#55 Apache Kafka – Like Functional Programming but for Data (With Anna McDonald) by Happy Path Programming
Serverless Stream Processing with Apache Kafka ft. Bill Bejeck by Confluent Streaming Audio (#202)
Why Data Mesh? ft. Ben Stopford by Confluent Streaming Audio (#203)
Handling 2 Million Apache Kafka Messages Per Second at Honeycomb by Confluent Streaming Audio (#204)

New Releases

gschmutz 07:48 on February 16, 2022

Last Week in Stream Data Integration & Stream Analytics – 16.02.2022

This is the 239th edition of my blog series around Stream Data Integration and Stream Analytics!

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

The Four Innovation Phases of Netflix’s Trillions Scale Real-time Data Infrastructure by Zhenzhong Xu
Streaming data vs. real-time data — what’s the difference? by Richard Wang

Comparison

Kafka vs Kinesis: Comparing Across Five Dimensions by Stéphane Maarek
Redis™ Pub/Sub vs Apache Kafka®: Redis Pub/Sub Extras, Use Cases and Comparison With Apache Kafka by Paul Brebner
Redis™ Pub/Sub vs Apache Kafka®: An Introduction and Connected vs Disconnected Delivery by Paul Brebner

Kafka

Building Reference Architectures for User-Facing Analytics by Dunith Dhanushka
Streaming ETL SFDC Data for Real-Time Customer Analytics by Shay Lin & Sharath Vandanapu & Keshav Mathur
Defrag Your Data Architecture by Jeff Ferguson
ksqlDB —real-time SQL magic in the cybersecurity scenario— part 1 by Maciej Szymczyk
Building an Enterprise CDC Solution by Dario Cazas Pernas
Kafka Topic Naming by Erman Terciyanlı
Kafka Idempotent Consumer With DynamoDB by Rob Golder
Kafka Monthly Digest: January 2022 by Mickael Maison
Apache Kafka as Data Hub for Crypto, DeFi, NFT, Metaverse – Beyond the Buzz by Kai Waehner
OPC UA, MQTT, and Apache Kafka – The Trinity of Data Streaming in IoT by Kai Waehner
Event-Driven Architectures with Kotlin on Serverless Kafka by Tobias Wissmueller
Spinning Apache Kafka® Microservices With Cadence Workflows by Paul Brebner
Building Self-driving Kafka clusters using open source components by Suman Karumuri & George Luong
Sailing through Kafka Streams by Data Affair

Spark

We Just Cut 85% of Our Data Streaming Pipelines Cost! (Part 2) by Yigal Pinhasi
We Just Cut 85% of Our Data Streaming Pipelines Cost! (Part 1) by Yigal Pinhasi
Structured Streaming: A Year in Review by Steven Yu and Ray Zhu

New Videos

Building a Telegram bot with Apache Kafka and ksqlDB by Robin Moffatt

New Podcasts

Intro to Event Sourcing with Apache Kafka ft. Anna McDonald by Confluent Streaming Audio (#198)

New Releases

Debezium 1.9.0.Alpha2 Released

gschmutz 21:48 on January 31, 2022

Last Week in Stream Data Integration & Stream Analytics – 31.01.2022

This is the 238th edition of my blog series around Stream Data Integration and Stream Analytics! This time (once again) the title “Last Month in …” would be more accurate 😉

As usual, find below the new blog articles, presentations, videos and software releases from last week. Happy reading and stay safe!

News and Blog Posts

General

Improving data quality with Event Sourcing by Mattias Holmqvist
More Enterprises are Advancing Along the Path to Pervasive Event-Driven Architecture by Denis King

Kafka

I Interviewed Nearly 200 Apache Kafka Experts and I Learned These 10 Things by Tim Berglund
When NOT to use Apache Kafka? by Kai Waehner
Creating Positive Behavioral Changes at Scale with Confluent and ksqlDB at Omada by Ryan Quan
The Link To Cloud: How to Build a Seamless and Secure Hybrid Data Bridge with Cluster Linking by Luke Knepper & Rajini Sivaram
Auto-Balancing and Optimizing Apache Kafka Clusters with Improved Observability and Elasticity in Confluent Platform 7.0 by Aishwarya Gune & Marc Selwan
Kafka Monthly Digest: December 2021 by Mickael Maison
5 Common Pitfalls When Using Apache Kafka by Danica Fine & Hiro Kuwabara
Announcing ksqlDB 0.23.1 by Natea Eshetu Beshada
Pipelining Kafka Events into Snowflake with Dockerized Kafka Connect by Adam McQuistan
Autoscaling on Kubernetes with KEDA and Kafka by Piotr Minkowski
Announcing the Confluent Q1 ’22 Launch by Ben Echols & Greg Murphy
A Great Day Out with … Apache Kafka by Gunnar Morling & Hans-Peter Grahsl
What’s New in Apache Kafka 3.1.0 by David Jacot
Adopt Data Streaming with The Definitive Guide by Jakub Korab
Confluent Streaming for Databricks: Build Scalable Real-time Applications on the Lakehouse by Hiral Jasani
Kafka for Real-Time Replication between Edge and Hybrid Cloud by Kai Waehner
IoT Reference Architecture and Implementation Guide Using Confluent and MongoDB Realm by Venkatesh Shanbhag & Vasanth Sanna Mariyappa & Rankesh Kumar
AWS and Confluent Announce Deepened Strategic Collaboration by Erica Schultz
Outbox with Debezium and Kafka — The hidden challenges by Victor Perepelitsky
When to use Apache Camel vs. Apache Kafka? by Kai Waehner
The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and Debezium by Gary A. Stafford
Realtime data streaming with Apache Kafka, Apache Pinot, Apache Druid and Apache Superset by Bruno Cardoso Farias
Dapr and Kafka-easy binding by Lucas Jellema

Flink

How We Improved Scheduler Performance for Large-scale Jobs – Part One by Zhilong Hong, Zhu Zhu, Daisy Tsang & Till Rohrmann
How We Improved Scheduler Performance for Large-scale Jobs – Part Two by Zhilong Hong, Zhu Zhu, Daisy Tsang & Till Rohrmann

StreamSets

The JSON Validator: A Custom StreamSets Processor That Ensures Data Quality by Hannah Recker

Apache NiFi

Koo’s data platform — part 1: Apache Kafka and NiFi by Phaneesh Gururaj

New Presentations

Kappa vs Lambda Architectures and Technology Comparison by Kai Waehner

New Videos

Kappa vs Lambda Architectures and Technology Comparison by Kai Waehner
Make your Kafka cluster production ready: How many disks do I need? by Strimzi
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0 by Kai Waehner
Apache Kafka 3.1 – Overview of Latest Features, Updates, and KIPs by Danica Fine

New Podcasts

Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik by Confluent Streaming Audio (#193)
From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine by Confluent Streaming Audio (#194)
Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra by Confluent Streaming Audio (#195)
Apache Kafka 3.1 – Overview of Latest Features, Updates, and KIPs by Confluent Streaming Audio (#196)
Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela by Confluent Streaming Audio (#197)

New Releases
