February 22, 2024

Checkpoint Chronicle - February 2024

By
Robin Moffatt

Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your hosts and esteemed curators of said content are Gunnar Morling and Robin Moffatt (your editor-in-chief for this edition). Feel free to send our way any choice nuggets that you think we should feature in future editions.

Stream Processing, Streaming SQL, and Streaming Databases

Event Streaming

Change Data Capture

Data Platforms and Architecture

Data Ecosystem

  • The Modern Data Stack is a moniker that’s been ubiquitous for several years now and one to which any data tool vendor worth its salt would try to hitch its wagon. That is, until last week, when Tristan Handy at dbt wondered out loud whether the term "Modern Data Stack" [is] Still a Useful Idea? This spawned a series of response articles from names synonymous with the space, including Joe Reis and Benn Stancil.
  • DocStore is a distributed database built at Uber, offering strong consistency, caching with Redis, CDC—and the ability to serve over 40 million reads per second.
  • Part of my fun with Flink catalogs (that I mention above) was reacquainting myself with the Hive Metastore. My former colleague Oz Katz has a good article exploring the options in this space now and looking at how some of the new ones aren’t entirely open, or have elements of vendor lock-in.
  • Real-time analytics is a hot space with many active projects and vendors. Whilst both Vimeo and Lyft have embraced ClickHouse (moving from Apache Phoenix on HBase and Apache Druid respectively), Uber uses Apache Pinot at scale.
  • Daniel Beach is a data engineer at Rippleshot and a prolific blogger. A few of his articles that I’ve enjoyed recently are Config Driven Pipelines, Are Data Contracts For Real?, and Batch vs Near-Realtime vs Streaming.

Papers of the Month

Murat Demirbas has a fascinating blog in which he analyses papers that have been published. Two papers that caught my eye recently are:

Events & Call for Papers (CfP)

New Releases

There are also a couple of releases that are almost there but not quite at the time of going to press 🙂

  • flink-connector-jdbc-3.1.2 RC3 vote has passed, and so the release is imminent (this will add support for Flink 1.18 to the connector)
  • Apache Kafka 3.7 RC4 vote is underway. This release includes a bunch of new stuff, such as a Docker image for Kafka (KIP-975), Kafka Connect support for creating connectors in a stopped state (KIP-980), and, in Kafka Streams, support for rack-aware task assignment (KIP-925), plus a bunch of improvements to Interactive Queries v2 (KIP-968, KIP-985, KIP-992).

That’s all for this month! We hope you’ve enjoyed the newsletter and would love to hear about any feedback or suggestions you’ve got.

Gunnar (LinkedIn / X / Mastodon / Email)
Robin (LinkedIn / X / Mastodon / Email)


Robin Moffatt

Robin is a Principal DevEx Engineer at Decodable. He has been speaking at conferences since 2009 including QCon, Devoxx, Strata, Kafka Summit, and Øredev. You can find many of his talks online and his articles on the Decodable blog as well as his own blog.

Outside of work, Robin enjoys running, drinking good beer, and eating fried breakfasts—although generally not at the same time.
