November 14, 2021
4 min read

The Future of Real Time Data Engineering

By Eric Sammer

Historically, real-time data engineering has been the land of low-level Java and C++ developers agonizing over frameworks, data serialization, messaging guarantees, distributed state checkpointing algorithms, and pipeline recovery semantics. Concerns over these low-level details aren't limited to the people building these systems; they extend to those building the pipelines that run on top of them. Even the most modern systems require pipeline developers to hold a PhD in distributed systems (or to have at least dropped out of a good PhD program) to build fast, correct, and manageable pipelines.

The truth is, while it's fun to get into the details (we love it here at Decodable), most people need to quickly build pipelines that "just work." When a node fails, a partition goes offline, a poison pill event shows up, back-pressure builds in the system, the power suddenly goes out, or the data volume scales up, "just working" turns out to be a tall order.
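
As a small illustration of one of these failure modes: a common way to keep a poison pill event from halting a pipeline is to route it to a dead-letter destination instead of failing the job and replaying the same bad event after every restart. The minimal Python sketch below assumes JSON-encoded events; the `process` function and the in-memory dead-letter list are hypothetical and are not Decodable's API or any particular framework's.

```python
from __future__ import annotations

import json
from typing import Iterable


def process(raw_events: Iterable[bytes]) -> tuple[list[dict], list[bytes]]:
    """Parse raw events, routing unparseable ones to a dead-letter list.

    Hypothetical sketch: a "poison pill" here is any event that does not
    decode to a JSON object. Rather than crashing (and re-reading the same
    bad event after a restart), the event is set aside for later inspection.
    """
    parsed: list[dict] = []
    dead_letters: list[bytes] = []
    for raw in raw_events:
        try:
            event = json.loads(raw)
        except ValueError:
            # Covers both json.JSONDecodeError and UnicodeDecodeError,
            # which are subclasses of ValueError.
            dead_letters.append(raw)
            continue
        if not isinstance(event, dict):
            # Valid JSON, but not the record shape we expect.
            dead_letters.append(raw)
            continue
        parsed.append(event)
    return parsed, dead_letters


good, bad = process([b'{"user": 1}', b"not json", b"[1, 2]", b'{"user": 2}'])
print(good)  # [{'user': 1}, {'user': 2}]
print(bad)   # [b'not json', b'[1, 2]']
```

In a real system the dead-letter list would be a durable destination (for example, a separate topic) so that bad events can be inspected and replayed once fixed.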

Even if infrastructure challenges were a solved problem for teams (they typically aren't), teams would still be left with the real-world complexities of real-time data engineering. Whether it's coordinating a schema change, establishing the practices to make such changes safely, testing pipelines against real data, performing safe deployments, creating a safe way to share data across teams, or simply knowing that their pipelines are alive and healthy, teams inevitably end up building their own tooling for developing and managing real-time pipelines. Yet in most cases, teams only need to filter, restructure, parse, aggregate, or enrich data to drive a microservice or ML pipeline, or to populate a table in a database or a dataset in an object store.

Decodable grew out of this need to up-level, simplify, and accelerate the development and management of real-time pipelines. Application developers, data engineers, and data scientists should be able to build and deploy a production pipeline in minutes, just as they do with batch data pipelines. That means making the infrastructure effectively disappear, leaving what feels like a serverless data engineering process.

Rather than worrying about the low-level details, we think it should be possible to think in terms of connections to data sources and sinks, streams that connect to them, and pipelines that process those streams.
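
To make that mental model concrete, here is a minimal in-memory Python sketch of the three concepts: a source connection feeds a stream, a pipeline filters and restructures that stream into another, and a sink connection delivers the result. Every name here (`Stream`, `SourceConnection`, `SinkConnection`, `Pipeline`, `paid_orders_only`) and the sample order records are hypothetical illustrations, not Decodable's API; real streams would be durable and the connections would talk to external systems.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional

# Hypothetical record type for illustration: a plain dict per event.
Record = dict


@dataclass
class Stream:
    """In-memory stand-in for a durable, named stream of records."""
    name: str
    records: list[Record] = field(default_factory=list)

    def drain(self) -> list[Record]:
        """Remove and return all buffered records."""
        drained, self.records = self.records, []
        return drained


@dataclass
class SourceConnection:
    """Reads records from an external system (here, an iterable) into a stream."""
    external: Iterable[Record]
    stream: Stream

    def poll(self) -> None:
        self.stream.records.extend(self.external)


@dataclass
class SinkConnection:
    """Delivers records from a stream to an external system (here, a list)."""
    stream: Stream
    delivered: list[Record] = field(default_factory=list)

    def flush(self) -> None:
        self.delivered.extend(self.stream.drain())


@dataclass
class Pipeline:
    """Consumes one stream and writes transformed records to another.

    The transform returns None to filter a record out.
    """
    input_stream: Stream
    output_stream: Stream
    transform: Callable[[Record], Optional[Record]]

    def run(self) -> None:
        for record in self.input_stream.drain():
            result = self.transform(record)
            if result is not None:
                self.output_stream.records.append(result)


def paid_orders_only(record: Record) -> Optional[Record]:
    """Filter out unpaid orders and restructure the rest."""
    if record["status"] != "paid":
        return None
    return {"order_id": record["id"], "amount": record["amount_cents"] / 100}


# Wire source -> stream -> pipeline -> stream -> sink and run one pass.
orders_raw = Stream("orders_raw")
orders_clean = Stream("orders_clean")
source = SourceConnection(
    [
        {"id": 1, "amount_cents": 1250, "status": "paid"},
        {"id": 2, "amount_cents": 300, "status": "cancelled"},
    ],
    orders_raw,
)
pipeline = Pipeline(orders_raw, orders_clean, paid_orders_only)
sink = SinkConnection(orders_clean)

source.poll()
pipeline.run()
sink.flush()
print(sink.delivered)  # [{'order_id': 1, 'amount': 12.5}]
```

The point of the separation is that the pipeline author only writes something like `paid_orders_only`; moving records in and out, buffering, and delivery belong to the connections and streams around it.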

The experience of building and deploying those pipelines should work with existing tools and processes rather than forcing you to learn an entirely new way of working. Finally, you shouldn't need to re-platform your existing microservices or applications. We know we need to be a good citizen within your existing data platform, so we use no proprietary formats and work with de facto open source systems and standards.

Building pipelines should be fast, safe, and even fun. If you're a data engineer working in the batch world and need to handle real-time data, we've got you covered. If you're an application engineer building microservices or other data-powered infrastructure, Decodable is for you.

Create an account today to get rolling, and join our Slack community to talk directly to the team and folks just like you.

Eric Sammer

Eric Sammer is a data analytics industry veteran who has started two companies: Rocana (acquired by Splunk in 2017) and Decodable. He is an author, engineer, and leader on a mission to help companies move and transform data to achieve new and useful business results. Eric is a speaker on topics including data engineering, ML/AI, real-time data processing, entrepreneurship, and open source. He has spoken at events including the RTA Summit and Current, on podcasts with Software Engineering Daily and Sam Ramji, and has appeared in various industry publications.