Analytics engineers recognize the value of managing data transformation in a consistent, unified, trackable, and testable manner. Keeping track of the transformations that occur throughout the data stack, and the SQL that drives them, can be complex. This complexity often results in duplicated effort, confusion, and, worse, bad data that breaks downstream workflows. The open source project dbt has emerged as a key tool for bringing order to what have been difficult, chaotic transformation processes.
How does dbt work?
dbt was originally designed for the batch transformation that occurs once data is loaded into a data warehouse or data lake. To date, the vast majority of transformations managed by dbt are batch, transforming data-at-rest. However, as data teams increasingly embrace the Kappa architecture to simplify their data stacks (and reduce costs), real-time stream processing is becoming the norm. Decodable was created to transform data-in-flight at scale in a way that is simple, reliable, and consistent. Decodable’s dbt adapter brings real-time streaming transformation, powered by Apache Flink®, together with batch transformation in a single dbt environment.
Decodable’s adapter is available today, designed to let you manage Flink-powered streaming data transformation the way you already manage batch transformations. Our adapter makes Decodable the first managed streaming platform with dbt support. Importantly, Decodable is also releasing our dbt adapter as open source, available for contributions and feedback from the community. Documentation and source for the dbt-decodable project are available on GitHub, in the dbt documentation, and on the Decodable website.
Why dbt for Flink-powered transformation?
Instead of developing and maintaining Flink SQL in an unmanaged collection of files, dbt introduces a workflow in which queries and assumptions about the data are stored in model files. These files can then be managed via git or another version control system. This workflow:
- Makes it easy to clone an entire project with hundreds of queries and test them with dbt run.
- Facilitates cross-team collaboration on projects by leveraging the capabilities of version control, including making changes via pull requests or working on experimental branches.
- Provides the ability to define assumptions about the data, which makes it safer to implement and validate changes.
- Allows teams to run dbt in their CI/CD workflows for automated deployments.
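To make the idea of model files concrete, here is a minimal sketch of what a project might keep under version control. The model name clean_users, the source stream raw_users, and the column names are all hypothetical, and the not_null check is a standard dbt generic test; which tests the adapter actually supports should be confirmed in its readme.

```shell
# Sketch only: clean_users, raw_users and the columns are hypothetical names.
mkdir -p models

# A dbt model is a SELECT statement stored in a .sql file.
cat > models/clean_users.sql <<'EOF'
-- Hypothetical streaming SQL model: normalize email addresses
select
  user_id,
  lower(email) as email
from raw_users
where email is not null
EOF

# Assumptions about the data are declared next to the model in YAML.
cat > models/schema.yml <<'EOF'
version: 2
models:
  - name: clean_users
    columns:
      - name: email
        tests:
          - not_null
EOF

# With a configured profile, running "dbt run" from the project root
# would then deploy every model found under models/.
```

Because both files are plain text, they can be reviewed in a pull request like any other code change.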
While dbt was designed primarily for traditional batch processing, it works very well with streaming SQL. As an example, a team could be working in a Decodable development account on a number of streaming SQL queries. Using the dbt adapter, they can share and collaborate on queries and stream definitions. Once these have been developed and tested, dbt can be used to quickly and easily apply them to a production account.
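One way to picture the move from development to production is a small wrapper around dbt's --target option. This is a sketch under assumptions: the helper name deploy_to is hypothetical, and the target names dev and prod are placeholders that would have to match outputs defined in your own dbt profile, each pointing at the corresponding Decodable account.

```shell
# Hypothetical helper: deploy the models to one dbt target, then run the tests.
# The target name is passed through as-is and must exist in profiles.yml.
deploy_to() {
  target="$1"
  dbt run --target "$target" && dbt test --target "$target"
}

# Typical use (target names are assumptions):
#   deploy_to dev    # iterate against the development account
#   deploy_to prod   # promote once the queries behave as expected
```

Chaining the two commands with && means the tests only run if the deployment itself succeeded, and a CI/CD job calling the helper fails if either step fails.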
Getting started with Decodable and dbt
If you’re ready to try the Decodable dbt adapter, start with the open-source dbt-decodable project. You’ll also need a Decodable account (a free trial is available) and, of course, dbt itself. Installing the latest version of the adapter with pip install dbt-decodable, optionally inside a virtual environment, also pulls dbt into your environment.
The readme and related documentation can help you configure appropriate dbt profiles, understand the adapter’s currently supported features, and more. If you’d like a tour of Decodable with one of our streaming experts, you can always request a personalized 1:1 demo of the product and see how simple we’ve made it to take advantage of real-time data streaming.