March 27, 2025 · 5 min read

Checkpoint Chronicle - March 2025

Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your editor-in-chief for this edition is Hans-Peter Grahsl. Feel free to send me any choice nuggets that you think we should feature in future editions.

Stream Processing, Streaming SQL, and Streaming Databases

  • I'd been checking the Apache Flink website regularly of late, until I finally stumbled upon the long-awaited Flink 2.0.0 announcement. Grab a coffee and take some time to read through the many highlights, including but not limited to disaggregated state management, materialized tables, an optimized batch execution mode, and deeper integration with Apache Paimon.
  • A recent Flink Community post on the Alibaba blog provides insights into how Flink helps big companies in the retail and e-commerce industries with real-time personalization to improve customer experience.
  • The folks at Responsive started a really insightful article series a while ago. This time, Almog Gavra walks us through the lifecycle of a Kafka Streams application. He explains several good practices and why it’s important to wire up exception handlers and the various types of listeners in your Kafka Streams apps.
  • Giannis Polyzos discusses the concept of custom triggers in Apache Flink and shows how to take control of windowed computations by going beyond the built-in triggers, which only cover standard behaviour (a minimal custom trigger is sketched after this list).
  • My latest article walks you through the process of creating a real-time, multi-stage data pipeline by combining the flexibility of custom Flink jobs written in Java with the convenience and declarative nature of Flink SQL.
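
To make the custom-trigger idea a little more concrete, here is a minimal sketch of a Flink DataStream trigger that fires a time window early once a configurable number of elements has arrived, while still firing at the end of the window. It is not code from Giannis’s article; the class name and the count threshold are purely illustrative.

```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

/** Illustrative trigger: fires after maxCount elements or at the end of the event-time window. */
public class CountOrEndOfWindowTrigger extends Trigger<Object, TimeWindow> {

    private final long maxCount;
    private final ReducingStateDescriptor<Long> countDescriptor = new ReducingStateDescriptor<>(
            "element-count", (ReduceFunction<Long>) Long::sum, LongSerializer.INSTANCE);

    public CountOrEndOfWindowTrigger(long maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx)
            throws Exception {
        // Keep the regular end-of-window firing in place.
        ctx.registerEventTimeTimer(window.maxTimestamp());
        ReducingState<Long> count = ctx.getPartitionedState(countDescriptor);
        count.add(1L);
        if (count.get() >= maxCount) {
            count.clear();
            return TriggerResult.FIRE; // emit early, but keep the window contents
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        ctx.getPartitionedState(countDescriptor).clear();
    }
}
```

You would plug such a trigger into a keyed window via .trigger(new CountOrEndOfWindowTrigger(1000)), replacing the default event-time trigger.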

Event Streaming

  • Apache Kafka 4.0 was released earlier this March and shipped tons of good stuff. It’s the first major release to run entirely on KRaft (ZooKeeper has been removed), it makes the next-generation consumer group protocol (KIP-848) generally available, and it offers early access to traditional queue semantics. Check out all the details in the official release announcement; a minimal consumer sketch using the new group protocol follows this list.
  • Speaking of Queues for Kafka (KIP-932), several recent articles have covered this new feature set. While Andrew Schofield provides a concise overview here, Gunnar Morling dives deep into the fundamental underpinnings in his new “Let's Take a Look at…” series.
  • Jack Vanlightly examines how Kafka’s replication protocol embraces disaggregation—a separation of control and data planes—unlike more monolithic consensus protocols such as Raft.
  • David Arthur wrote about “Build Timeouts” in the context of CI for the Apache Kafka project and shared how they combine the timeout command with thread dumps to tackle the problem of stuck builds.
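
To illustrate how little is needed on the client side to adopt the new consumer group protocol, here is a minimal consumer sketch that opts in via the group.protocol setting introduced by KIP-848. The broker address, group id, and topic name are placeholders, and a Kafka 4.0 broker is assumed.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NewGroupProtocolConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "chronicle-demo");          // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Opt in to the KIP-848 rebalance protocol; "classic" keeps the old behaviour.
        props.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, "consumer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n",
                            record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

The rest of the consumer code stays the same; with the new protocol, partition assignment is computed on the broker side, so client-side assignor settings no longer apply.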

Data Ecosystem

  • Alireza Sadeghi recently shared a comprehensive overview of the open-source data engineering landscape. It’s a great resource for keeping track of what’s happening in this rapidly evolving space. While the review article is only published once a year, the accompanying repo provides ongoing updates.
  • Interested in how the interplay between a columnar format, a high-performance RPC framework, and a SQL-based interface helps overcome the inefficiencies of older, row-based data access protocols? Learn more in Dipankar Mazumdar’s article “What is Apache Arrow Flight, Flight SQL & ADBC?”
  • Vu Trinh put together a really insightful walkthrough after spending 8 hours learning about Parquet. The article distills a lot of detail in a very approachable manner, helping you understand not only the structure of the Parquet file format but also its read/write protocol (a metadata-inspection sketch follows this list).
  • In a short video celebrating Apache Polaris 0.9, the project’s first official release, Danica Fine takes a look back at the project’s origins and shares what to expect in the future.
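
If you want to poke at the structures Vu describes, a small program along these lines (using the parquet-hadoop library; the file path is passed as an argument) prints the schema, row groups, and column chunks recorded in a Parquet file’s footer. It is a rough sketch rather than anything taken from the article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetLayoutInspector {

    public static void main(String[] args) throws Exception {
        Path path = new Path(args[0]); // path to an existing Parquet file
        try (ParquetFileReader reader =
                ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration()))) {
            ParquetMetadata footer = reader.getFooter();
            System.out.println("Schema:\n" + footer.getFileMetaData().getSchema());
            int i = 0;
            for (BlockMetaData rowGroup : footer.getBlocks()) {
                System.out.printf("Row group %d: %d rows, %d bytes%n",
                        i++, rowGroup.getRowCount(), rowGroup.getTotalByteSize());
                for (ColumnChunkMetaData column : rowGroup.getColumns()) {
                    System.out.printf("  column %s: codec=%s, values=%d, compressed size=%d%n",
                            column.getPath(), column.getCodec(),
                            column.getValueCount(), column.getTotalSize());
                }
            }
        }
    }
}
```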

RDBMS and Change Data Capture

  • In “Life Altering Postgresql Patterns” Ethan McCue shares 11 useful tips for working with Postgres, from using UUIDs as primary keys all the way to returning JSON objects from queries.
  • Andrea Peruffo blogged about how to write single message transformations (SMTs) in Go for Debezium. Built on top of TinyGo, Chicory, and WASM, this new integration path allows developers to extend CDC processing capabilities with custom filtering and routing logic implemented in Go (for contrast, the conventional Java-based SMT interface is sketched after this list).
  • Agus Mahari put together this beginner-friendly article explaining step by step how to set up a CDC pipeline, powered by Debezium and Kafka Connect, between different relational databases and Redpanda.
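
For contrast with the Go/WASM path described above, this is roughly what a single message transformation looks like when written against Kafka Connect’s standard Java Transformation interface. The transform itself (dropping tombstone records, i.e. records with a null value) is a made-up example for illustration, not code from Andrea’s post.

```java
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

/** Illustrative SMT that filters out tombstone records (records with a null value). */
public class DropTombstones<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        // Returning null removes the record from the pipeline.
        return record.value() == null ? null : record;
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef(); // no configuration options
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // nothing to configure
    }

    @Override
    public void close() {
        // no resources to release
    }
}
```

A Java SMT like this is then referenced from the connector configuration via the transforms property.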

Paper of the Month

In “Styx: Transactional Stateful Functions on Streaming Dataflows”, Kyriakos Psarakis et al. introduce a dataflow-based Stateful Functions-as-a-Service (SFaaS) runtime that supports multi-partition transactions while providing serializable isolation guarantees. They evaluated Styx against a range of workloads to demonstrate that it outperforms alternative solutions in throughput by at least one order of magnitude.

Events & Call for Papers (CfP)

New Releases

That’s all for this month! We hope you’ve enjoyed the newsletter and would love to hear about any feedback or suggestions you’ve got.

Hans-Peter (LinkedIn / Bluesky / X / Mastodon / Email)

📫 Email signup 👇

Did you enjoy this issue of Checkpoint Chronicle? Would you like the next edition delivered directly to your email to read from the comfort of your own home?

Simply enter your email address here and we'll send you the next issue as soon as it's published—and nothing else, we promise!

Hans-Peter Grahsl

Hans-Peter Grahsl is a Staff Developer Advocate at Decodable. He is an open-source community enthusiast who is particularly passionate about event-driven architectures, distributed stream processing systems, and data engineering. For his code contributions, conference talks, and blog posts at the intersection of the Apache Kafka and MongoDB communities, Hans-Peter has received multiple community awards. He likes to code and is a regular speaker at developer conferences around the world.

