This article originally appeared at The New Stack.
Apache Flink has emerged as a linchpin of stream processing, used by many organizations, large and small, to orchestrate the delivery of real-time insights from streaming data.
Indeed, Flink’s importance reverberates through the data-science industry, not only as powerful software capable of both batch and stream processing at scale, but also as a catalyst for change. Flink is ushering in a long-imagined era in which data can finally be harnessed for on-target insights and informed, instantaneous decision-making.
As we get underway in 2024, let’s take a brief look back at the milestones achieved by the Apache Flink community and ecosystem in the year just past.
Project Statistics: The Heartbeat of a Thriving Community
Numbers often tell the story, and Flink’s statistics are no exception. The community has witnessed a surge in vitality, with new committers and PMC members joining the ranks. The project’s GitHub repository reached a milestone in April, crossing the 20,000-star mark.
Also of note is the announcement that Apache Flink received the SIGMOD Systems Award 2023 for its significant impact. The announcement stated that “Apache Flink greatly expanded the use of stream data-processing.” (SIGMOD is one of the world’s leading database research conferences.)
The award puts a spotlight on Flink’s contributions and on the work of its more than 1,400 contributors, and it solidifies Flink’s status as a leader in the data processing arena.
Flink’s Continuous Stream of New Features
The Flink community has been a hotbed of activity over the last year, churning out a continuous stream of releases. Flink 1.17, released in March, saw the completion of 7 FLIPs (Flink Improvement Proposals) and more than 600 issues. The equally impressive Flink 1.18, released in October, completed 18 FLIPs and more than 700 issues.
Together, these two releases delivered many new features and improvements to the community, including enhancements to Flink’s incremental checkpointing, a wide range of Flink SQL additions (for instance, point-in-time queries and operator-level state TTL configurability), better cloud-native elasticity through enhancements to Flink’s adaptive scheduler, support for Java 17 and much more.
The Flink community also delivered four releases of the Kubernetes Operator (1.4, 1.5, 1.6 and 1.7). These brought significant improvements, including support for auto-scaling (the ability to scale a pipeline based on incoming data load and the utilization of the dataflow), more robust rollback management in failure scenarios, more flexible savepoint handling and much more.
In addition, Flink sub-projects ML 2.2.0 and StateFun 3.3.0 logged significant progress. Flink Table Store moved out into its own Apache project, Apache Paimon (in the Apache Incubator at the time of writing), and several connectors were extracted into separate projects with separate versioning.
Flink’s Ascent in the Streaming Wars Creates Stiff Market Competition
As Flink solidifies its position as the victor in the streaming wars, the industry landscape is undergoing a subtle transformation.
Key players recognize that data streaming is becoming a commodity and stream processing is where differentiation is going to happen.
As a result, a competitive ecosystem of vendors offering Flink as a service is emerging. These managed platforms, including those from Confluent (which acquired Immerok), Decodable, DeltaStream and Ververica, among others, promise to help organizations operationalize Flink with an eye to scalability, security and developer experience. Of course, the hyperscale cloud providers have been quick to throw their hats in the ring: AWS, Microsoft Azure and Google Cloud Platform now offer Flink services as well.
Chronicles of Success and Adoption
Key industry events are a great place to witness the market evolution. The slate of speakers at Current 2023 and Flink Forward 2023 paints a picture of Flink’s success and widespread adoption by industry giants like Alibaba, Apple, Bloomberg, BMW, Cisco, Deliveroo, DoorDash, IBM, Indeed, LinkedIn, Lyft, NASA, Netflix, Stripe, Uber and Warner Bros. Discovery. These organizations and many more are stepping up to the mic, eager to share how Flink is not just a tool but an integral part of business operations.
What Lies Ahead in 2024
Let’s conclude with a peek into the crystal ball, as I’m sure 2024 will have many exciting developments in store for Flink. The forecasted events I am most excited about are these:
- FLIP-319: Integrate with Kafka’s Support for Proper 2PC Participation is a high priority. Building on Kafka’s work toward supporting two-phase commit transactions (KIP-939), this FLIP aims to improve the Flink Kafka sink with regard to exactly-once guarantees (no more data loss in case of Kafka transaction timeouts) and maintainability. (Currently, the sink relies on Java reflection to adjust some parts of transaction handling in the Kafka client.)
- Flink 2.0 is officially on the roadmap. Planned changes include:
  - Disaggregated state backend
  - Removal of deprecated APIs: DataSet, Queryable State, Configuration Options, REST API
  - Java 17 by default
  - State compatibility for SQL jobs
- The team behind Flink CDC (Change Data Capture) has proposed donating it to the Apache Software Foundation. This would make a highly popular, third-party, Flink-related project part of the Apache Flink project proper, fostering collaboration and community growth.
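For readers unfamiliar with the two-phase commit idea behind the FLIP-319 item above, here is a minimal sketch in plain Python. It is a toy model only, not Flink’s or Kafka’s API: `ToyTransactionalLog`, `ToyTwoPhaseSink` and all of their methods are hypothetical names invented for illustration, and the checkpoint ID doubles as the transaction ID purely for simplicity. The point it tries to show is that records are prepared (durable but invisible) when a checkpoint is taken, published only once the checkpoint completes, and resolved one way or the other after a failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTransactionalLog:
    """Hypothetical stand-in for a broker that can hold prepared transactions."""
    committed: List[str] = field(default_factory=list)
    prepared: Dict[int, List[str]] = field(default_factory=dict)

    def prepare(self, txn_id: int, records: List[str]) -> None:
        # Phase 1: store the records durably, but keep them invisible to readers.
        self.prepared[txn_id] = list(records)

    def commit(self, txn_id: int) -> None:
        # Phase 2: publish the prepared records. Committing an unknown or
        # already committed transaction is a no-op, so retries are safe.
        self.committed.extend(self.prepared.pop(txn_id, []))

    def abort(self, txn_id: int) -> None:
        # Discard a prepared transaction without publishing it.
        self.prepared.pop(txn_id, None)


class ToyTwoPhaseSink:
    """Hypothetical sink that ties one transaction to each checkpoint."""

    def __init__(self, log: ToyTransactionalLog) -> None:
        self.log = log
        self.buffer: List[str] = []

    def write(self, record: str) -> None:
        self.buffer.append(record)

    def pre_commit(self, checkpoint_id: int) -> None:
        # Called when a checkpoint is taken: hand the buffered records over
        # as a prepared transaction and start a fresh buffer.
        self.log.prepare(checkpoint_id, self.buffer)
        self.buffer = []

    def checkpoint_complete(self, checkpoint_id: int) -> None:
        # Called once the checkpoint is known to have succeeded everywhere.
        self.log.commit(checkpoint_id)

    def recover(self, last_completed_checkpoint: int) -> None:
        # After a failure: commit transactions that belong to completed
        # checkpoints, abort the rest, and drop unprepared records (they
        # would be replayed from the source in a real pipeline).
        for txn_id in sorted(self.log.prepared):
            if txn_id <= last_completed_checkpoint:
                self.log.commit(txn_id)
            else:
                self.log.abort(txn_id)
        self.buffer = []
```

In this toy, a transaction that was prepared for a completed checkpoint is never lost, because recovery can still commit it. That mirrors the kind of guarantee the FLIP is after, although the real design has to deal with timeouts, transaction identifiers and coordination details that are left out here.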
Looking Forward to Another Year of Progress
Some might say that Flink, after being created nearly a decade ago, got off to a slow start. But few would dispute that Flink is now hitting its stride. 2023 was filled with progress and an undeniable swell of momentum.
With exciting developments ahead for Flink, and for stream processing at large, now is a great time to join the community and contribute your talents to the innovations, collaborations and triumphs that will reshape the landscape of data processing.