Lineage Graphs for Real-Time Data Pipelines with Decodable


In this article we look at another vital feature of any fully managed data streaming platform: data lineage. We briefly explore how Decodable’s real-time data platform uses interactive lineage graphs to give enterprises the insight they need to manage the ever-growing complexity of their end-to-end data flows.

Lack of End-to-End Visibility Hampers Productivity

A much-cited problem enterprises struggle with when shifting batch workloads to real-time data streaming is the lack of actionable insight into their end-to-end data flows. Not only do complex data pipelines involve multiple heterogeneous components, such as ingestion and storage systems, but the stream processing itself often consists of numerous steps. In general, it’s quite hard for data teams to get the visibility and control they need to stay productive in complex environments.

Enter the Lineage Graph

It therefore came as no surprise that, throughout the previous year, our customers repeatedly expressed interest in another vital ingredient of a data streaming platform: a central place for viewing and interacting with their data pipelines end-to-end. While this feature request never met any disagreement internally, it took our product and engineering team some time to figure out how best to address it, integrate it into the Decodable web UI, and still meet expectations for the developer experience.

When the prototype of what we call the lineage graph was up and running in the Decodable platform for the very first time, the entire team lit up. We felt we had unlocked a new level of visibility and control that made data platforms feel both powerful and genuinely fun. More importantly, customers started to embrace the feature right after launch:

“The Decodable pipeline UI has improved so much with the navigability across different pipelines, streams, the last exception displayed, and the lineage displayed.”

Lineage Graph Example

Here is a lineage graph example for a basic end-to-end streaming data flow:

Source Connection: data changes from two Postgres tables are ingested into separate streams
Pipeline: a SQL pipeline is used to join the two input streams and write the result into a new output stream
Sink Connection: the output stream is written into a Redpanda topic
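
The three steps above form a small directed graph, and lineage questions ("what feeds this sink?", "what depends on this stream?") become graph traversals. The following is a minimal sketch of that idea in plain Python. It does not use any Decodable API, and every node name in it is a hypothetical placeholder chosen to mirror the example.

```python
from collections import defaultdict, deque

# Hypothetical node names mirroring the example flow above; these are
# illustrative placeholders, not real Decodable resource identifiers.
EDGES = [
    ("postgres_source", "customers_stream"),
    ("postgres_source", "orders_stream"),
    ("customers_stream", "join_pipeline"),
    ("orders_stream", "join_pipeline"),
    ("join_pipeline", "enriched_orders_stream"),
    ("enriched_orders_stream", "redpanda_sink"),
]


def build_adjacency(edges, reverse=False):
    """Map each node to its direct successors (or predecessors if reversed)."""
    graph = defaultdict(list)
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        graph[src].append(dst)
    return graph


def reachable(edges, start, reverse=False):
    """Return every node reachable from `start`, in breadth-first order."""
    graph = build_adjacency(edges, reverse)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Everything that depends on the orders stream.
downstream = reachable(EDGES, "orders_stream")
# Everything the Redpanda sink depends on.
upstream = reachable(EDGES, "redpanda_sink", reverse=True)

print(downstream)
print(upstream)
```

Walking the edges forward answers impact questions (a change to `orders_stream` affects the pipeline, its output stream, and the sink), while walking them backward answers provenance questions.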

For connections and pipelines a status icon reflects the current state—here a green circle signalling a successfully running resource.

Inspecting Input/Output Metrics

Each running resource can be further inspected from within the lineage graph view by opening the metrics pane. Source connections provide output metrics and sink connections show input metrics, while pipelines expose both, as is the case for the SQL pipeline that joins the two streams.

Diagnosing Processing Issues

Whenever things aren’t going according to plan, having visual support and quick access to errors and exceptions is essential for fast issue resolution. The lineage graph reflects any such runtime issues directly where they occur. In this example, the SQL pipeline suddenly stopped processing data due to a schema violation when writing into the output stream. The error message is exposed as part of the metrics pane, which is accessible with a single click.

Thanks to the lineage graph, both the input and output streams for the erroneous SQL pipeline of this end-to-end data flow are immediately visible. This makes it possible to inspect their respective schemas and identify the mismatch: specific fields are defined as nullable on the source side but as not null on the sink side.
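
To make the kind of mismatch described here concrete, below is a minimal sketch of a nullability check between two schemas. It is plain Python rather than anything Decodable runs internally, and the `Field` type, the field names, and the type strings are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical schema field model, not a Decodable type.
    name: str
    type: str
    nullable: bool


def nullability_mismatches(source, sink):
    """Names of fields that may be null upstream but are NOT NULL downstream."""
    sink_by_name = {f.name: f for f in sink}
    return [
        f.name
        for f in source
        if f.nullable
        and f.name in sink_by_name
        and not sink_by_name[f.name].nullable
    ]


# Made-up schemas: the input stream allows a null customer name,
# while the output stream declares the same field as not null.
input_schema = [
    Field("order_id", "BIGINT", nullable=False),
    Field("customer_name", "STRING", nullable=True),
]
output_schema = [
    Field("order_id", "BIGINT", nullable=False),
    Field("customer_name", "STRING", nullable=False),
]

mismatches = nullability_mismatches(input_schema, output_schema)
print(mismatches)  # ['customer_name']
```

A record with a null `customer_name` would pass through the input stream but violate the output schema, which is the class of failure the example describes.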

Build Data Flows Interactively

Decodable’s lineage graph also supports activities that go beyond the mere inspection of existing data flows. Depending on the resource type, each node in the lineage graph offers a type-specific context menu with a set of actions to choose from. For instance, any node representing a Decodable stream lets you add a new input (a source connection or a SQL pipeline) that writes into the selected stream or, conversely, add a new output (a sink connection or a SQL pipeline) that reads from it.
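
In graph terms, the two stream actions differ only in edge direction: a new input points into the stream, a new output points out of it. The sketch below illustrates this with a toy model. The `Kind` enum, the action labels, and the node names are hypothetical, and only the stream actions are modeled because those are the only ones described here.

```python
from enum import Enum


class Kind(Enum):
    CONNECTION = "connection"
    STREAM = "stream"
    PIPELINE = "pipeline"


# Only the stream actions are described in the text; menus for other
# node kinds are left out here rather than guessed.
MENU = {
    Kind.STREAM: ["add input", "add output"],
}


def actions_for(kind):
    """Return the modeled context-menu actions for a node kind."""
    return MENU.get(kind, [])


def apply_action(edges, stream, action, new_node):
    """Return a new edge list with `new_node` attached to `stream`."""
    if action == "add input":
        # The new source connection or pipeline writes into the stream.
        return edges + [(new_node, stream)]
    if action == "add output":
        # The new sink connection or pipeline reads from the stream.
        return edges + [(stream, new_node)]
    raise ValueError(f"unknown action: {action}")


edges = [("join_pipeline", "enriched_orders_stream")]
edges = apply_action(edges, "enriched_orders_stream", "add output", "new_sink")
edges = apply_action(edges, "enriched_orders_stream", "add input", "backfill_pipeline")
print(edges)
```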

If you want to see the lineage graph in action, check out this quick demo video.

Summary

In this article we have discussed another essential enterprise feature of data streaming platforms: data lineage. We've looked at Decodable’s interactive lineage graph to understand how it aids end-to-end visibility and control when building real-time applications. In particular, a simple example showed how the lineage graph helps data teams stay productive and fix problems quickly when issues occur in data flows that are otherwise hard to grasp.

Interested in exploring the lineage graph for yourself? Sign up for a free Decodable trial today!


Hans-Peter Grahsl

Hans-Peter Grahsl is a Staff Developer Advocate at Decodable. He is an open-source community enthusiast and in particular passionate about event-driven architectures, distributed stream processing systems and data engineering. For his code contributions, conference talks and blog post writing at the intersection of the Apache Kafka and MongoDB communities, Hans-Peter received multiple community awards. He likes to code and is a regular speaker at developer conferences around the world.

