Decodable is a real-time stream processing platform built on Apache Flink®, providing engineers with an easy-to-use SQL-based developer experience when transforming streaming data. Decodable makes the power of Flink accessible to everyone and provides the fastest route to real-time data success.
Today we’re excited to announce the availability of a direct Snowflake connector that leverages the new Snowflake Snowpipe Streaming API, which is designed to reduce data latency and make ingesting real-time data into Snowflake less expensive. Data engineers can now ingest real-time data into Snowflake directly, with minimal configuration overhead, using Decodable’s simple, fully managed connector. Like all Decodable connectors, it is maintained and managed by our team of data platform experts, making streaming pipelines simple and quick to set up – helping organizations build and use real-time data pipelines in minutes, not months.
Watch the demo video on our YouTube channel, Streaming Data into Snowflake with Decodable.
Decodable and Snowpipe with Amazon S3
Prior to the availability of the Decodable direct connector to Snowflake (which leverages the Snowpipe Streaming API), a common workflow for sinking data when using Apache Flink was via standard Snowpipe. This workflow requires companies to land their data in cloud storage (e.g., Amazon S3), set up automatic new-file notifications (e.g., via SQS), and create a Snowpipe object that receives those notifications. It must also include a way to manage failures through a separate cloud messaging system, as well as a way to remove processed files from cloud storage. In the case of OLTP data, it additionally requires a merging step to reflect upstream changes to a primary key. This workflow can be complex, with significant ongoing maintenance overhead, but it can be useful if systems other than Snowflake need access to the streaming data landed in cloud storage.
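To make the moving parts concrete, the Snowflake side of that standard Snowpipe setup looks roughly like the following sketch (the stage, pipe, table, and integration names here are illustrative placeholders, not from any specific deployment):

```sql
-- External stage over the S3 bucket where Flink lands its files
CREATE STAGE raw_events_stage
  URL = 's3://my-bucket/events/'
  STORAGE_INTEGRATION = my_s3_integration;

-- Pipe that copies each newly landed file into the target table,
-- triggered by S3 event notifications delivered via SQS
CREATE PIPE raw_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_events
  FROM @raw_events_stage
  FILE_FORMAT = (TYPE = 'JSON');
```

Everything around these two objects – the S3 event notifications, failure handling, and cleanup of processed files – still has to be wired up and maintained outside of Snowflake.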
Key benefits of the Snowpipe Streaming API
In contrast, the new direct Decodable Snowflake connector leverages the Snowpipe Streaming API for append-stream data and is fully hands-off. It brings the ease of use developers expect when ingesting data into Snowflake and modernizes Flink’s ability to interface with it, bypassing the need for the S3-based Snowpipe workflow. The connector manages all failures and intermediate steps (e.g., merging in the OLTP case); all that’s needed is a simple, one-time configuration in Decodable. With the new connector, streaming data is available immediately in Snowflake to any downstream systems that may require it.
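As a rough illustration of what that one-time setup enables (the stream and field names here are hypothetical, invented for this sketch), a SQL pipeline in Decodable can shape data into a stream that a Snowflake sink connection then delivers directly, with no S3 staging in between:

```sql
-- Hypothetical pipeline: filter and shape raw events into a stream
-- that a Snowflake sink connection (using Snowpipe Streaming) reads from.
INSERT INTO snowflake_ready_orders
SELECT
  order_id,
  customer_id,
  amount,
  CAST(order_ts AS TIMESTAMP(3)) AS order_ts
FROM raw_orders
WHERE amount > 0;
```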
The new connector also has the benefit of being a true streaming sink. Via the Snowpipe Streaming API, append-stream data is streamed into Snowflake tables in real time, eliminating the need to balance data ingestion latency against the cost of Snowflake warehouse usage. Because the streaming feature is built to handle real-time latency cost-effectively, most Snowflake customers ingesting append-stream data are likely to want to use it.
Does the Snowflake Snowpipe API really lower costs and reduce latency?
Sinking streaming data into S3 and then ingesting it into Snowflake in a second step using standard Snowpipe can be costly because of the multiple steps and the additional infrastructure required. The new Decodable Snowflake connector bypasses those unnecessary steps and infrastructure – but the precise savings from eliminating that complexity will depend on the specific use case. Once the Snowpipe Streaming API has been publicly available for a few months, Snowflake customers will be able to see directly how it impacts their costs.
Decodable has been using and testing the new API for several months at the time of this writing. Our experience is that yes, costs associated with the new API are much lower than the older Snowpipe S3-focused workflow. In fact, we had to check our Snowflake invoices to make sure we were actually being billed appropriately due to how much lower our costs seemed to be.
Additionally, data latency in Snowflake is clearly lower. By avoiding the batch-processing step of moving into S3 and then into Snowflake, data is far less stale and available far faster for all our Snowflake use cases. Here at Decodable, we find the new API and our new connector deliver as promised.
It is important to note that, even with the new API and Decodable’s connector, users ingesting OLTP/change-stream data will still need to make a tradeoff between latency and cost. Snowflake continues to require merges to support CDC data, so a dedicated virtual warehouse in which to run those merges is still required. As a result, change-stream data will not necessarily be real-time without incurring additional costs.
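The merge step behind that tradeoff can be sketched in SQL roughly as follows (table and column names are hypothetical). Because MERGE statements like this execute on a virtual warehouse, each merge cycle incurs warehouse cost, which is why change-stream freshness and cost pull against each other:

```sql
-- Hypothetical merge of buffered change records into the target table.
-- Running MERGE requires an active virtual warehouse, which is where
-- the latency/cost tradeoff for CDC data comes from.
MERGE INTO customers AS t
USING customer_changes AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.email = s.email
WHEN NOT MATCHED AND s.op <> 'DELETE' THEN
  INSERT (customer_id, name, email) VALUES (s.customer_id, s.name, s.email);
```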
Getting Started with Snowflake and Decodable
Create a free Decodable trial account today to try out the new Snowflake connector with your Snowflake account and the Snowpipe Streaming API. To learn more about how to use Snowflake with Decodable, or to read about our many other available database connectors, please refer to the Decodable documentation.