
Medidata accelerates clinical insights with Decodable
See how Medidata streamlined data pipelines to power real-time decision-making.

$622M
Life Sciences
Data and Engineering
CDC -> Apache Flink -> Iceberg -> Snowflake
Medidata’s clinical trial operations rely on vast amounts of data flowing in from diverse sources, including electronic data capture (EDC) systems, file-based systems, and third-party platforms. This fragmented data landscape created inconsistencies, with data arriving at different times and in different formats, which made it difficult for teams to consolidate, transform, and harmonize data for analytics.
Siloed ETL pipelines ran on batch schedules, so ingestion latency varied from minutes to hours, or even days if data missed a processing window. As a result, clinical teams lacked timely access to crucial trial data, which slowed decision-making and could ultimately affect patient outcomes. Medidata needed a solution that could ingest data in real time without the unpredictable latency of batch processing.
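To make the batch-latency problem concrete, the following Python sketch models a record passing through two chained nightly batch jobs. The schedule, processing times, and 30-second streaming delay are hypothetical values chosen for illustration; the case study does not publish Medidata’s actual figures. The point is how missing a single cutoff by five minutes pushes visibility out by more than a day.

```python
from datetime import datetime, timedelta

# Hypothetical batch stage: (first_run, interval, processing_time).
Stage = tuple[datetime, timedelta, timedelta]

def next_run(t: datetime, first_run: datetime, interval: timedelta) -> datetime:
    """First scheduled run at or after time t, for runs at first_run + k * interval."""
    if t <= first_run:
        return first_run
    # Ceiling division: how many intervals must pass before a run lands at or after t.
    k = -(-(t - first_run) // interval)
    return first_run + k * interval

def batch_ready_time(arrival: datetime, stages: list[Stage]) -> datetime:
    """When a record arriving at `arrival` becomes visible after a chain of batch jobs."""
    t = arrival
    for first_run, interval, processing in stages:
        # The record waits for the stage's next run, then for that run to finish.
        t = next_run(t, first_run, interval) + processing
    return t

day = timedelta(days=1)
stages = [
    (datetime(2024, 1, 1, 2, 0), day, timedelta(hours=1)),  # hypothetical nightly ingest at 02:00
    (datetime(2024, 1, 1, 4, 0), day, timedelta(hours=1)),  # hypothetical nightly transform at 04:00
]

on_time = datetime(2024, 1, 1, 1, 55)  # arrives just before the 02:00 cutoff
late = datetime(2024, 1, 1, 2, 5)      # misses the 02:00 cutoff by five minutes

print(batch_ready_time(on_time, stages) - on_time)  # 3:05:00
print(batch_ready_time(late, stages) - late)        # 1 day, 2:55:00

# A streaming pipeline processes each event as it arrives; assume a
# hypothetical 30-second end-to-end delay for comparison.
streaming_delay = timedelta(seconds=30)
print(streaming_delay)  # 0:00:30
```

A streaming pipeline has no cutoff to miss, so its delay does not depend on when a record happens to arrive.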
While Medidata sought to modernize its data platform with Apache Iceberg and Flink, most of its engineering team lacked experience in these technologies. Managing the underlying infrastructure—configuring pipelines, tuning performance, and ensuring stability at scale—would be resource-intensive. Without a fully managed solution, onboarding new users and scaling across the organization remained a challenge.
By adopting Decodable, Medidata transitioned from fragmented ETL pipelines to a fully managed, real-time streaming architecture powered by Apache Flink. This shift eliminated data inconsistencies across clinical applications, ensuring that trial data arrives in a structured, harmonized format regardless of its source. “We’ve broken down the old model where apps had to iron out data contracts and delivery schedules,” explained Mike Araujo, Staff Engineer at Medidata. “Now, everyone's consuming from the same lake, where all data arrives in near real-time, ensuring consistency, durability, and scalability.” Medidata also needed a production-grade SLA that could support clinical operations with guaranteed uptime and compliance. Decodable’s infrastructure provides reliable, low-latency data delivery while keeping all sensitive clinical data within Medidata’s bring-your-own-cloud (BYOC) deployment, which gives Medidata tight security and governance without sacrificing operational flexibility.
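One way to picture harmonization is mapping each source’s format onto a single shared schema before the data lands in the lake. The Python sketch below does this for two hypothetical inputs, a nested EDC event and a flat CSV file drop. The schema, field names, and mapping functions are assumptions for illustration, not Medidata’s data model; in a Flink pipeline the same mapping would usually be expressed as a SQL or DataStream transformation rather than standalone Python.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Observation:
    """Hypothetical harmonized schema that every downstream consumer reads."""
    study_id: str
    subject_id: str
    measure: str
    value: float
    observed_at: datetime  # always normalized to UTC
    source: str

def from_edc_event(event: dict) -> Observation:
    """Map a hypothetical nested EDC payload with epoch-millisecond timestamps."""
    return Observation(
        study_id=event["study"]["id"],
        subject_id=event["subject"]["id"],
        measure=event["item"]["code"].lower(),
        value=float(event["item"]["value"]),
        observed_at=datetime.fromtimestamp(event["ts_ms"] / 1000, tz=timezone.utc),
        source="edc",
    )

def from_file_row(row: dict) -> Observation:
    """Map a hypothetical flat CSV row with ISO-8601 timestamps and UTC offsets."""
    return Observation(
        study_id=row["STUDY"],
        subject_id=row["SUBJECT"],
        measure=row["MEASURE"].lower(),
        value=float(row["VALUE"]),
        observed_at=datetime.fromisoformat(row["OBSERVED_AT"]).astimezone(timezone.utc),
        source="file",
    )

edc_event = {
    "study": {"id": "S-001"},
    "subject": {"id": "1001"},
    "item": {"code": "SYSBP", "value": "120"},
    "ts_ms": 1709280000000,  # 2024-03-01 08:00 UTC
}
csv_text = (
    "STUDY,SUBJECT,MEASURE,VALUE,OBSERVED_AT\n"
    "S-001,1002,SysBP,118,2024-03-01T09:00:00+01:00\n"
)

# Both sources end up as identical Observation records with UTC timestamps.
harmonized = [from_edc_event(edc_event)]
harmonized += [from_file_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
for obs in harmonized:
    print(obs.source, obs.subject_id, obs.measure, obs.value, obs.observed_at.isoformat())
```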
Medidata no longer relies on batch processing windows that could delay insights for days. Streaming ingestion pipelines ensure clinical data is processed quickly, significantly improving decision-making for trial operations. Decodable’s intuitive developer toolkit enables engineers to easily build, manage, and scale streaming pipelines without the steep learning curve traditionally associated with Flink. According to Mike, “Definitely not to be overlooked is the excellent support and expertise that the Decodable team has provided. They are an incredibly responsive group. A lot of this is new technology—we don’t have Kubernetes experts everywhere, we don’t have Flink experts everywhere, we don’t have Iceberg experts everywhere. So, having the support of the Decodable team, available at all times, has really helped escalate our streaming vision in our lakehouse."
Before Decodable, Medidata’s engineering team struggled with ensuring data consistency and proactive monitoring across thousands of streaming jobs. Decodable’s seamless integration with Medidata’s observability stack (Prometheus, Grafana, Sumo Logic) enables real-time monitoring of streaming pipelines, allowing teams to detect and resolve issues before they impact operations. “We went from a system where we were forced to react—where customers would identify a problem before we even saw it—to a fully proactive approach,” said Mike. With this consistent level of observability and automation, Medidata can ensure high-quality data delivery across all their clinical applications.
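Proactive monitoring comes down to alerting on pipeline health before a customer notices a problem. The Python sketch below shows one such check, flagging jobs whose output is stale or that are rejecting records. The metric fields, job names, and 15-minute threshold are hypothetical; in a setup like the one described, equivalent thresholds would more likely be written as alert rules over the Flink metrics collected by Prometheus than as application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class JobSnapshot:
    """Hypothetical per-job health metrics scraped from a streaming job."""
    name: str
    last_event_time: datetime  # event time of the newest record written downstream
    invalid_records: int       # records that failed validation since the last scrape

def find_unhealthy_jobs(snapshots, now, max_lag, max_invalid=0):
    """Return alert messages for jobs whose data is stale or failing validation."""
    alerts = []
    for snap in snapshots:
        lag = now - snap.last_event_time
        if lag > max_lag:
            alerts.append(f"{snap.name}: data is {lag} behind (limit {max_lag})")
        if snap.invalid_records > max_invalid:
            alerts.append(f"{snap.name}: {snap.invalid_records} invalid records")
    return alerts

now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
snapshots = [
    JobSnapshot("edc-vitals", now - timedelta(minutes=2), 0),   # healthy
    JobSnapshot("lab-results", now - timedelta(minutes=40), 0), # stale output
    JobSnapshot("ae-reports", now - timedelta(minutes=1), 3),   # rejecting records
]
for alert in find_unhealthy_jobs(snapshots, now, max_lag=timedelta(minutes=15)):
    print(alert)
```

Because the check runs over every job’s snapshot, the same rule applies unchanged whether there are three jobs or thousands.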
Medidata has transformed its clinical data operations. Previously, days-long wait times for critical insights hindered trial efficiency and slowed decision-making. “The time-to-insight has gone from days to minutes. We are taking in data from our EDC systems, streaming it in near real-time using Flink, and syncing it to Iceberg. Customers see their data within minutes of ingestion,” explained Mike. Now, clinical teams can base decisions about patient outcomes, trial modifications, and operations on the freshest available data. Medidata’s data teams are also more productive, because they can focus on higher-value tasks instead of manually managing batch-based ETL pipelines.
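The EDC-to-Iceberg flow starts from change data capture (CDC): each insert, update, or delete in a source database becomes an event, and the sink applies those events so the lake table mirrors the source. The Python sketch below assumes Debezium-style events (an `op` code of `c`, `u`, `d`, or `r` for snapshot reads, plus `before` and `after` row images), which the case study does not specify, and uses a plain dict as a stand-in for a table that supports row-level updates and deletes, as Iceberg format v2 does. The enrollment rows and the `id` key are hypothetical.

```python
from typing import Any

Row = dict[str, Any]

def apply_change(table: dict[Any, Row], event: dict, key_field: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, or update: the 'after' image is the row's new state (upsert).
        row = event["after"]
        table[row[key_field]] = row
    elif op == "d":
        # Delete: 'after' is empty, so take the key from the 'before' image.
        table.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unknown change operation: {op!r}")

# Hypothetical change stream for a subject-enrollment table, in source order.
events = [
    {"op": "r", "before": None, "after": {"id": 1, "status": "screening"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "screening"}},
    {"op": "u", "before": {"id": 1, "status": "screening"}, "after": {"id": 1, "status": "enrolled"}},
    {"op": "d", "before": {"id": 2, "status": "screening"}, "after": None},
]

enrollment: dict[Any, Row] = {}
for event in events:
    apply_change(enrollment, event)
print(enrollment)  # {1: {'id': 1, 'status': 'enrolled'}}
```

Events must be applied in source order per key, and the sketch assumes primary keys never change, since an update that altered the key would leave the old row behind.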
Medidata also benefited from Decodable’s deep Flink expertise and responsive support, allowing the company to rapidly deploy thousands of streaming jobs without needing an extensive in-house Flink team. With Decodable’s developer-friendly, low-code integration, teams across Medidata were able to adopt real-time streaming pipelines at scale. According to Mike, “We’ve scaled into the thousands—something that wasn’t achievable before without building a wholesale orchestration system. Now, all our Decodable jobs emit Flink metrics, integrate with our monitoring stack, and allow us to proactively identify and resolve issues before they impact customers.”
The team has achieved significant cost savings by optimizing its data streaming infrastructure. With Decodable’s flexible, event-driven architecture, Medidata no longer needs to over-provision resources for batch-based ETL workloads, leading to more efficient compute utilization and reduced operational costs. “In comparison to some other managed solutions that we’ve tried, the Decodable platform has offered all sorts of different options for how you can scale and manage what’s running. You get more efficient allocation of your physical resources,” said Mike. “You also get more bang for your buck because you can decide when and how you want to run things.”
Mike Araujo, Staff Engineer, Medidata Solutions
Head of Product, Decodable
Medidata, a Dassault Systèmes company, provides cloud-based solutions for clinical trials, helping life sciences organizations accelerate drug development and improve patient outcomes. With over 25 years of experience, Medidata has supported more than 35,000 trials and 10 million patients worldwide. As a Staff Engineer, Mike Araujo leads the development of Medidata’s next-generation data platform, which delivers real-time data processing and analytics to optimize clinical operations and decision-making.