In this article, we look at another vital feature of any fully managed data streaming platform: data lineage. We briefly explore how Decodable’s real-time data platform uses interactive lineage graphs to give enterprises the insight they need to manage the ever-growing complexity of their end-to-end data flows.
Lack of End-to-End Visibility Hampers Productivity
An often-cited struggle for enterprises shifting their batch workloads to real-time data streaming is the lack of actionable insights needed to stay on top of their end-to-end data flows. Not only do complex data pipelines involve multiple heterogeneous components, such as ingestion and storage systems, but the stream processing itself often consists of numerous steps in many real-world use cases. In general, it’s quite hard to get the level of visibility and control that data teams need to stay productive in complex environments.
Enter the Lineage Graph
So it didn’t come as a surprise that, throughout the previous year, our customers repeatedly expressed interest in another vital data streaming platform ingredient: a central place for viewing and interacting with their data pipelines end-to-end. While this feature request never met with any disagreement internally, it took our product and engineering team some time to figure out how best to address this customer need, integrate it into the Decodable web UI, and still meet expectations for the developer experience.
When the prototype of what we call the lineage graph was up and running in the Decodable platform for the very first time, the entire team lit up. We felt like we had unlocked a new level of visibility and control that made data platforms feel both powerful and genuinely fun. More importantly, right after the launch, customers immediately started to embrace the feature:
The Decodable pipeline UI has improved so much with the navigability across different pipelines, streams, the last exception displayed, and the lineage displayed.
Lineage Graph Example
Here is a lineage graph example for a basic end-to-end streaming data flow:
- Source Connection: data changes from two Postgres tables are ingested into separate streams
- Pipeline: a SQL pipeline is used to join the two input streams and write the result into a new output stream
- Sink Connection: the output stream is written into a Redpanda topic
For connections and pipelines a status icon reflects the current state—here a green circle signalling a successfully running resource.
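Conceptually, a lineage graph like this one is a directed graph whose nodes are connections, streams, and pipelines. The following Python sketch models the example flow and walks it upstream and downstream. It is only an illustration of the idea, not Decodable's implementation or API: all resource names are hypothetical, and it assumes a single source connection feeding both streams.

```python
from collections import defaultdict, deque

# Hypothetical resource names for the example flow; not taken from Decodable.
NODES = {
    "pg-source": "connection",
    "customers": "stream",
    "orders": "stream",
    "join-pipeline": "pipeline",
    "enriched-orders": "stream",
    "redpanda-sink": "connection",
}

# Each edge points in the direction the data flows.
EDGES = [
    ("pg-source", "customers"),
    ("pg-source", "orders"),
    ("customers", "join-pipeline"),
    ("orders", "join-pipeline"),
    ("join-pipeline", "enriched-orders"),
    ("enriched-orders", "redpanda-sink"),
]


def reachable(start, edges, reverse=False):
    """Return every node reachable from `start` via breadth-first search."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        adjacency[src].append(dst)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def upstream(node):
    """Everything the given resource depends on, directly or indirectly."""
    return reachable(node, EDGES, reverse=True)


def downstream(node):
    """Everything affected by the given resource, directly or indirectly."""
    return reachable(node, EDGES)
```

Calling `downstream("customers")` on this sketch yields the pipeline, the output stream, and the sink connection, which is the kind of impact question a lineage view answers at a glance.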
Inspecting Input/Output Metrics
Each running resource can be further inspected from within the lineage graph view by opening the metrics pane. While source connections provide insights into output metrics and sink connections show input metrics, pipelines expose both input and output metrics as shown below for the SQL pipeline to join two streams:
Diagnosing Processing Issues
Whenever things aren’t going according to plan, having visual support and quick access to errors and exceptions is essential for fast issue resolution. The lineage graph reflects any such runtime issues directly where they occur. In the example below, the SQL pipeline suddenly stopped processing data due to a schema violation when writing into the output stream. The error message is conveniently exposed as part of the metrics pane that is quickly accessible in a single click.
Thanks to the lineage graph, both the input and output streams of the failing SQL pipeline in this end-to-end data flow are immediately visible. This makes it easy to inspect their respective schemas and identify the mismatch: specific fields are defined as nullable on the source side but as not null on the sink side.
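The kind of check involved here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Field` type, the field names, and the `nullability_mismatches` helper are all hypothetical and do not reflect how Decodable represents or validates schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified view of one schema field."""
    name: str
    type: str
    nullable: bool


def nullability_mismatches(input_fields, output_fields):
    """Names of fields that may be NULL on input but are NOT NULL on output."""
    output_by_name = {field.name: field for field in output_fields}
    return [
        field.name
        for field in input_fields
        if field.nullable
        and field.name in output_by_name
        and not output_by_name[field.name].nullable
    ]


# Illustrative schemas only; the field names are made up.
input_schema = [
    Field("order_id", "BIGINT", nullable=False),
    Field("customer_name", "STRING", nullable=True),
]
output_schema = [
    Field("order_id", "BIGINT", nullable=False),
    Field("customer_name", "STRING", nullable=False),
]

mismatches = nullability_mismatches(input_schema, output_schema)
```

With these example schemas, `mismatches` contains only `customer_name`, the one field that can carry a null value into a stream that forbids it.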
Build Data Flows Interactively
Decodable’s lineage graph also supports activities that go beyond the mere inspection of existing data flows. Depending on the resource type, each node in the lineage graph offers a type-specific context menu with a set of actions to choose from. For instance, any node representing a Decodable stream lets you add a new input (a source connection or a SQL pipeline) that writes into the selected stream or, conversely, add a new output (a sink connection or a SQL pipeline) that reads from it.
If you want to see the lineage graph in action, check out this quick demo video.
Summary
In this article, we have discussed another essential enterprise feature of data streaming platforms: data lineage. We've looked into Decodable’s interactive lineage graph to understand how it aids end-to-end visibility and control when building real-time applications. In particular, we've seen a simple example showing how the lineage graph helps data teams stay productive and fix problems quickly when issues occur in data flows that are otherwise hard to grasp.
Interested in exploring the lineage graph for yourself? Sign up for a free Decodable trial today!