Claims adjudication is the process insurance companies use to determine their financial responsibility for paying a provider for a medical claim. The insurance company can decide to pay the claim in full, deny the claim, or reduce the amount paid to the provider. The cost of performing claims administration and adjudication is approximately 5 percent of total revenue, the largest category of payer administrative expenses outside of general administration. These costs are driven mostly by the complexity of prevailing adjudication processes, which continue to rely on manual scanning for validation and involve significant time delays in servicing patients.
Claims adjudication data refers to the information collected and analyzed while evaluating and processing insurance claims, such as dates, amounts, and types of services or goods claimed. It can be used to identify common issues faced by policyholders in order to refine policies and services, to analyze claim trends in order to control costs and forecast payouts, and to spot patterns or anomalies that may indicate fraudulent claims. A variety of errors or inconsistencies in claims data can signal the need for review by a claims examiner, including mismatched coding, improper documentation, omission of required data, and noncompliance.
Below is a sample of raw claims data, where each record represents a given facility (hospital, clinic, etc.) on a given day and can include hundreds of individual claims.
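A single raw record might look something like the following sketch; the field names and values here are illustrative assumptions rather than the actual schema:

```json
{
  "visit_time": "2023-03-14T09:30:00Z",
  "facility_id": "FAC-0042",
  "facility_type": "hospital",
  "claims": [
    { "claim_id": "CLM-10001", "patient_id": "PAT-5731", "procedure_code": "87426", "amount": 125.00 },
    { "claim_id": "CLM-10002", "patient_id": "PAT-5731", "procedure_code": "87426", "amount": 125.00 }
  ]
}
```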
In its current form, it is far too complex and detailed for claims examiners to use for identifying potential issues. By using one or more Decodable pipelines, which are streaming SQL queries that process data, we can transform the raw data into a form that is best suited for how it will be consumed.
The Standard Claims Adjudication Pipeline
A standard claims adjudication pipeline typically involves several key components and tools that are integrated to collect, process, and analyze claims data. A typical architecture starts with Apache Kafka serving as the backbone for ingesting data from claims systems or provider applications. Data streams are then formatted for downstream use, typically by converting raw records into structured or semi-structured formats. The stream is then connected to a processing system like Apache Flink, which processes the data in real time, applying transformations such as validation, enrichment, aggregation, and anomaly detection. This architecture enables insurers to act on claims data immediately, offering insights that can speed up adjudication decisions, control costs, and improve fraud detection.
While this pipeline is powerful, setting up and maintaining such a streaming system comes with significant challenges. Flink and Kafka are complex tools that require deep technical expertise to configure and operate, as well as ongoing investment in monitoring, performance tuning, and scaling the infrastructure. Security and stability are major concerns in these architectures, particularly when handling sensitive patient data. Organizations must ensure that their claims adjudication pipelines comply with standards and regulations such as SOC 2 Type II and GDPR, which mandate strict data handling and protection procedures. As business requirements change or data volumes grow, the system must be continuously updated, which introduces additional overhead in terms of time and resources.
The complexity of these systems and the need for continuous oversight create significant barriers to entry for many organizations looking to build and maintain real-time claims adjudication pipelines.
Claims Adjudication with Decodable
Here at Decodable, we’ve built a solution that goes beyond the foundational technologies, addressing the broader requirements of real-time stream processing for ELT, ETL, and data replication. This includes ensuring a solid developer experience, providing extensive and flexible connectivity, managing schemas, scaling across different workloads and use cases, providing observability, maintaining security, data governance, and compliance, and offering ongoing support.
As a fully managed service, our platform takes care of the stream processing infrastructure and the deployment of Flink jobs so you can focus on the business logic for your data pipelines. That means there are no servers for you to manage, no clusters to create, size, or monitor, and no software dependencies to update or maintain within our platform.
In this example, we'll walk through how the Decodable data service can be used to identify patients who may have claims that require review due to the same exam or procedure being performed more than once on the same day at the same facility.
Pipeline Architecture
For this example, two separate pipelines are used in series, with the output of each one being used as the input for the next. While it is possible to perform all the desired processing in a single large, complex pipeline, it is most often desirable to split them into smaller, more manageable processing steps. This results in pipelines that are easier to test and maintain. Each stage in the sequence of pipelines is used to bring the data closer to its final desired form using SQL queries.
Decodable processes data using SQL, which should feel familiar to anyone who has worked with relational database systems. The primary differences you'll notice are that:
- You activate a pipeline to start it, and deactivate a pipeline to stop it
- All pipeline queries specify a source and a sink
- Certain operations, notably JOINs and aggregations, must include windows
Unlike relational databases, all pipelines write their results into an output data stream (or sink). As a result, all pipelines are a single statement in the form `INSERT INTO <sink> SELECT ... FROM <source>`, where sink and source are streams you've defined.
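For instance, a minimal pipeline might look like the following; `my_sink` and `my_source` are placeholder stream names used purely for illustration:

```sql
-- A minimal pipeline sketch; `my_sink` and `my_source` are placeholder stream names.
INSERT INTO my_sink
SELECT claim_id, patient_id, amount
FROM my_source
WHERE amount > 0
```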
Unnest Data Stream Array
For this example, each record of the raw claims stream contains data about the date of the patient’s visit and the facility they visited, as well as a claims field, which contains an array of claims data that needs to be unnested (or demultiplexed) into multiple records. To accomplish this, a cross join is performed between the claims-raw data stream and the results of using the unnest function on the claims field.
For example, if a given input record contains an array of 100 insurance claims, this pipeline will transform each input record into 100 separate output records for processing by subsequent pipelines.
When the pipeline is running, the effects of unnesting the input records can be seen in the Overview tab which shows real-time data flow statistics. The input metrics will show a given number of records per second, while the output metrics will show a higher number based on how many elements are in the claims array.
Pipeline: Extract Claims Data
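A sketch of what this extraction pipeline could look like is shown below; the stream names (`claims-raw`, `claims-extracted`) and claim fields are assumptions based on the description above rather than the exact schema used in the example:

```sql
-- A sketch of the extract pipeline; stream and field names are illustrative.
-- Each element of the `claims` array becomes its own output record.
INSERT INTO `claims-extracted`
SELECT
  r.visit_time,
  r.facility_id,
  c.claim_id,
  c.patient_id,
  c.procedure_code,
  c.amount
FROM `claims-raw` r
CROSS JOIN UNNEST(r.claims) AS c (claim_id, patient_id, procedure_code, amount)
```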
After creating a new pipeline and entering the SQL query, clicking the Run Preview button will verify its syntax and then fire up a new executable environment to process the next 10 records coming in from the source stream and display the results. Decodable handles all the heavy lifting on the backend, allowing you to focus on working directly with your data streams to ensure that you are getting the results you need.
Aggregate And Filter Claims Data
In this final pipeline stage, an inner select query leverages the SQL tumble group window function to group records into non-overlapping, contiguous windows with a fixed duration of 1 day. Grouping these records by patient and procedure allows the total number of records for that grouping within each window to be calculated. Then, in the outer select query, the results of the inner query are filtered to include only those records that indicate a noncompliant duplication of services.
Pipeline: Aggregate And Filter
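A sketch of this aggregation and filtering stage is shown below; the stream names, the `visit_time` time attribute, and the inclusion of `facility_id` (to reflect the "same facility" condition of the use case) are assumptions for illustration:

```sql
-- A sketch of the aggregate-and-filter pipeline; names are illustrative.
-- The inner query counts procedures per patient, procedure, and facility
-- within 1-day tumbling windows; the outer query keeps only duplicates.
INSERT INTO `claims-for-review`
SELECT window_start, facility_id, patient_id, procedure_code, procedure_count
FROM (
  SELECT
    TUMBLE_START(visit_time, INTERVAL '1' DAY) AS window_start,
    facility_id,
    patient_id,
    procedure_code,
    COUNT(*) AS procedure_count
  FROM `claims-extracted`
  GROUP BY
    TUMBLE(visit_time, INTERVAL '1' DAY),
    facility_id,
    patient_id,
    procedure_code
) AS windowed
WHERE procedure_count > 1
```

Any record with a count greater than 1 represents the same procedure billed more than once for the same patient on the same day at the same facility, flagging it for examiner review.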
Conclusion
At this point, a sink connection (one that writes a stream to an external system, such as AWS S3, Kafka, Kinesis, Postgres, Pulsar, or Redpanda) can be created to allow the results to be consumed by your own applications and services.
As we can see from this example, a sophisticated business problem can be addressed in a very straightforward way using Decodable pipelines. There are no Docker containers to create and no SQL server infrastructure to set up or maintain; all that is needed is a working familiarity with writing the SQL queries themselves.
You can watch demonstrations of several examples on the Decodable YouTube channel.
Additional documentation for all of Decodable’s services is available here.
Please consider joining us on our community Slack.