October 21, 2022 · 4 min read

Record Replay, for When You Want to Begin Again

As a foundational aspect of tolerating failures and recovering from errors, data stream processing systems must be able to “replay” a stream, for the purpose of reproducing or correcting the results of a prior run. This becomes especially important when scaling systems to support large production workloads.

Decodable has built upon the underlying capabilities of Flink to provide a platform that enables you to start a pipeline, pipeline preview, or connection from the earliest record available so that all records up to the current moment can be reprocessed. It is also possible to start from the latest position, instead of resuming where the last run left off, skipping forward past unprocessed records. Essentially this gives you a “time machine” of sorts, allowing you to go backwards or forwards in time to the moment when you want to start processing records.

Common use cases include re-processing data after discovering an error in a pipeline’s SQL code, or “warming up” a new downstream datastore or application with data that had previously been sent to a different destination.

Pipeline Processing

When starting a pipeline, you have the option of setting the start position. By enabling Force start, the pipeline is configured to begin processing available data from either the earliest or latest position, instead of resuming where the last run left off.
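The three start behaviors can be sketched with a toy in-memory model. All names below are illustrative, not Decodable’s actual API; in the real platform, the “resume” position comes from Flink’s checkpoint/savepoint mechanism.

```python
# Toy model of start-position semantics for a stream processor.
# Hypothetical names for illustration only -- not Decodable's API.

class Stream:
    def __init__(self, records):
        self.records = list(records)  # retained records, oldest first

class Consumer:
    def __init__(self):
        self.checkpoint = 0  # offset of the next unread record

    def start(self, stream, force_start=False, position="earliest"):
        """Return the offset to begin reading from."""
        if not force_start:
            return self.checkpoint      # resume where the last run left off
        if position == "earliest":
            return 0                    # replay everything still retained
        return len(stream.records)      # skip ahead; process new records only

stream = Stream(["r0", "r1", "r2", "r3"])
consumer = Consumer()
consumer.checkpoint = 2  # pretend a previous run already processed r0 and r1

print(consumer.start(stream))                                       # 2 (resume)
print(consumer.start(stream, force_start=True))                     # 0 (earliest)
print(consumer.start(stream, force_start=True, position="latest"))  # 4 (latest)
```

Note that “earliest” is bounded by retention: only records the stream still holds can be replayed.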

If the requirements of a pipeline’s SQL change, or a bug was missed during development, incorrect output will be sent to downstream pipelines, datastores, or applications. When this occurs, developers can modify the SQL to address the issue and then restart the pipeline from the desired start position to produce the correct output stream.

As another example, a table that was accidentally deleted after being populated by a real-time data stream can be reconstituted by replaying the records from the input stream through the pipeline. Under the On Demand plan, all streams retain data for 7 days; however, the total size of all streams may not exceed 100 GB.

Pipeline Preview

As a pipeline is being developed, running a preview can help ensure your SQL code is generating the expected output. In the same way as a pipeline, you can choose to start processing from the earliest data available in the input stream (as specified by the FROM clause of the pipeline’s SQL). Doing so allows you to re-process the same records again and again as you make changes to your code, which may help make developing and debugging easier in some scenarios. By default, the preview will start processing the latest available data from the stream.

Connection Processing

Starting a connection offers the same start-position option as starting a pipeline: by enabling Force start, the connection is configured to begin processing available data from either the earliest or latest position, instead of resuming where the last run left off.

Connections come in two flavors: source and sink. Source connections read from an external system and write to a Decodable stream, while sink connections read from a Decodable stream and write to an external system. It’s important to keep in mind that only select external systems, primarily those that are Kafka-based, support replaying a record stream. For those that do, Decodable is able to take advantage of that functionality.
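What makes Kafka-style systems replayable is that they store records in an append-only log addressed by offsets, and retention expires old records without renumbering the rest. A minimal sketch of that idea (illustrative only; not Kafka’s or Decodable’s actual implementation):

```python
# Minimal append-only log with offset-addressed reads, in the spirit of a
# Kafka topic partition. Illustrative sketch only.

class Partition:
    def __init__(self):
        self.records = {}    # offset -> record
        self.next_offset = 0

    def append(self, record):
        self.records[self.next_offset] = record
        self.next_offset += 1

    def expire_before(self, offset):
        """Drop records below `offset` (retention), without renumbering."""
        for o in list(self.records):
            if o < offset:
                del self.records[o]

    def earliest(self):
        return min(self.records) if self.records else self.next_offset

    def read_from(self, offset):
        return [self.records[o] for o in sorted(self.records) if o >= offset]

p = Partition()
for r in ["a", "b", "c", "d"]:
    p.append(r)
p.expire_before(1)                 # "a" has aged out of retention

print(p.read_from(p.earliest()))   # ['b', 'c', 'd'] -- replay from earliest
print(p.read_from(p.next_offset))  # []              -- start from latest
```

Systems without such a log (for example, a plain message queue that deletes records on acknowledgment) have nothing left to replay, which is why the feature depends on the external system’s support.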

For sink connections, support for setting a start position is exactly the same as for pipelines, since the input for both of these is a Decodable stream.

Let Us Know!

We hope you are as excited about these processing capabilities as we are. How would you take advantage of these features? What additional functionality would you like to see? Join our community Slack or contact us at support@decodable.co and let us know; we would love to hear from you!


David Fabritius
