January 30, 2023

Five Drivers Behind the Rapid Rise of Apache Flink

By Robert Metzger

This article originally appeared at Datanami.

In 2022 alone, venture capitalists invested at least $55 million in startups building companies around Apache Flink, the open source project that's used to process data streams at large scale and deliver real-time analytical insights. In 2023, Confluent announced the acquisition of a Flink startup for a rumored $100m. Investors have high confidence that Flink is the right technology for stream processing.

And it's not just new companies: AWS offers Flink as a hosted service, and Alibaba does the same with an even more advanced platform. More cloud providers are going to offer hosted Flink services in the future.

In this article, we will explore why Apache Flink, a project that has been around for over a decade, is suddenly hot. Originally called Stratosphere, the project saw its first commits in 2010, and we'll come back to why this matters later.

Let's look at five drivers behind the sudden attention Flink is enjoying:

VC Money Is Attracting Attention

Looking back a bit, we can see that these recent investments represent a renewed interest in Flink, since a few VC-backed Flink companies existed before.

Data Artisans (founded in 2014), which was later renamed Ververica and acquired by Alibaba (for a rumored $103m after $6.5m in total funding), was the first startup to receive funding for Flink. In 2016, eventador.io was started; it raised a total of $3.8m before its acquisition by Cloudera. Aiven.io was also founded in 2016 and has raised a total of $420m for a whole range of open source projects offered in its services, among them, you guessed it, Apache Flink.

If there's an industry that specializes in looking into the future, taking risks and shaping what comes next, it's venture capital. Many people follow venture activity, so it's instructive to examine the recent investment rounds in companies building streaming solutions atop Flink. Notable examples include Decodable ($25.5m), Immerok ($17m) and DeltaStream ($10m). These and others have drawn a lot of attention to Flink, as each puts the technology at the core of its offering.

Flink's Proven, Has Strong Community

Flink is used deep down in the technology stacks that companies use to power internal real-time analytics infrastructures. It is the foundation supporting the money-makers in modern architectures: real-time ads, recommendations, fraud detection, quality-of-service monitoring, and more. If it's tied to revenue generation or providing users with up-to-date information, there's a good chance Flink is making it happen.

Prominent, large-scale users include Stripe, Uber, Shopify, and many others like Pinterest and Netflix; Flink's "powered by" page reads like a "who's who" of modern-day tech. What matters to these users, and to future users of Flink, is that they can be confident that Flink is battle-tested at scale. This assurance is critical, because deploying Flink is not easy: it is often a multi-month project for one or several teams across an organization's data science and data infrastructure groups.

Another strong argument for Flink is its vibrant, diverse and vendor-independent open-source community at the Apache Software Foundation. The project is backed by a variety of organizations, and it has a rich ecosystem of adjacent projects such as the Flink CDC Connectors or the Kubernetes operator.

No Real Alternatives to Flink

Some folks might be offended by this assumption, but hear me out first: For certain use-cases, there are no real alternatives to Flink. As soon as you want to do stream processing with very large state or high throughput, or if you want to be independent of specific data streaming platforms like Apache Kafka, then Flink is the only choice.

If you open the history books, you'll see that there have been plenty of attempts to build open source and/or source-available stream processors: Apache Storm, Apache Samza, Apache Apex, Kafka Streams, ksqlDB, Materialize, Apache Spark Streaming.

Some of those projects are now in maintenance mode, while others remain quite active or are still getting started. But in my opinion none of them offers the breadth and depth of Flink in terms of deployment options, range of use-cases covered and adaptability to different requirements (be it large state, low latency, reactive application development, etc.).

There are of course new projects showing up on the horizon with interesting takes on the future, for example Materialize (not open source until end of 2026) or RisingWave. Let's see which direction they take and how they stand the test of time (and of large production use-cases).

Broader Market Has Finally Caught Up

As a long-term contributor to Flink, I believe that the project has always been "hot," but of course I'm biased 😉 What has changed is that the market of users has finally caught up.

While a small group of engineers and companies saw the need for a technology like Flink 10 years ago, the world wasn't ready for it. Only the largest and most sophisticated companies had the data volumes and scale to really need it. It's no coincidence that Netflix, Uber, Alibaba, Pinterest, Twitter and others have been talking about their use of Flink for many years already.

Besides the need, there is also the ability to use Flink. Flink is a specialist system for distributed systems engineers. You can easily shoot yourself in the foot by using an inefficient serializer, sending too much data over the network, or misconfiguring RocksDB. This is another explanation for why Flink has to this point mostly been adopted at large tech companies with substantial infrastructure engineering horsepower.

However, we now see this rapidly changing. Companies like Decodable offer Flink as a hosted service with a Snowflake-like experience, accessible through widely known tools like SQL, with the platform taking care of the heavy lifting in terms of infrastructure and operations. Of course, it is more than just SQL: it's also connectors, schema definitions, developer experience and much more. But the important thing is that customers don't need to fiddle with Flink configurations, state backends, or checkpointing timeouts.

I believe we are at the point where the broader market understands that what's needed is a set of technologies that allow users to make decisions faster and gain insights into their data instantly. Flink is the right technology to solve that problem.

Streaming SQL

As I mentioned under driver number 4, deploying and operating Flink in a production system requires specialists. Writing a Flink application in Java is not a trivial endeavor, and productionizing it is even harder.

You may think what you want about the SQL syntax and the scattered landscape of SQL dialects, but SQL is the lingua franca for analytics, and it is still taught today to the next generation of data analysts and data scientists. Both the database research community (for example: "One SQL to Rule Them All: An Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables") and the open source community (with Flink SQL or ksqlDB) have agreed that stream processing with SQL is possible, even preferable.

With SQL also being understood by modern stream processors, a huge new population of engineers has access to streaming technologies.

Besides wider reach, there's another argument to be made for Streaming SQL. With managed services such as Decodable, a complex stateful operation can be expressed with a few lines of code. Instead of a team spending weeks building a microservice for a problem, a Flink SQL streaming application in combination with a REST connector from providers like Decodable solves the same problem with little initial and ongoing cost. Of course, in-house Flink SQL platforms or other vendors can also support this use-case. The point I want to make here is that Streaming SQL is not only about "democratizing access to streaming," as the marketing team would say; it also dramatically reduces the time and complexity of getting to production, even for complex use-cases.
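To make the "few lines of code" point a bit more concrete, here is a minimal sketch of one small stateful operation, a per-user count over one-minute tumbling windows. The leading comment shows roughly what such a query might look like in Flink SQL; the table and column names (clicks, user_id, event_time) are hypothetical and not taken from any real deployment. The Python function below it is only a toy, in-memory illustration of the logic a hand-written service would have to carry, and it deliberately ignores what a real stream processor handles for you: out-of-order events, fault tolerance, and state that outgrows memory.

```python
# Roughly the Flink SQL for this operation (hypothetical table and columns):
#
#   SELECT window_start, user_id, COUNT(*) AS clicks
#   FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(event_time), INTERVAL '1' MINUTE))
#   GROUP BY window_start, window_end, user_id;
#
# Below: a toy, single-process version of the same counting logic.
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size for this illustration


def tumbling_window_counts(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each timestamp to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical click events: (timestamp in seconds, user id).
clicks = [(0, "alice"), (15, "bob"), (59, "alice"), (60, "alice"), (125, "bob")]

for (window_start, user), n in sorted(tumbling_window_counts(clicks).items()):
    print(window_start, user, n)
```

Even this simplified version already has to manage state by hand; everything it leaves out is where the weeks of microservice work tend to go.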

Conclusion

Flink is hot because the community of data scientists and infrastructure engineers has decided that the future is Flink. We have all the ingredients: well-funded startups, well-resourced enterprises loaded with engineering talent, a battle-tested and open-source technology, and a huge market that is rapidly emerging from an early state into one that is looking to modernize data stacks to become real-time.

The bottom line is that Flink is hot and getting hotter. If you are looking to get started with Flink all by itself, check out this introduction. If you don't want to learn Flink but just want to benefit from it, there are as-a-service offerings that you can sign up for, no credit card required.


Robert Metzger

Robert oversees the core Apache Flink-based data platform at Decodable powering the SaaS stream processing platform. Beyond this role, he's a committer and the PMC Chair of the Apache Flink project. He co-created Flink and has contributed many core components of the project over the years. He previously co-founded and successfully exited data Artisans (now Ververica), the company that created and commercialized Flink.
