Building a Financial Analysis Pipeline

Financial analysis is used to evaluate economic trends, set financial policy, build long-term plans for business activity, and identify investments or prioritize projects. It is done by synthesizing records of financial activity, which are often available as real-time data streams. One of the most common analysis techniques is to calculate changes in the data over time, which can then be compared against another data set or against the same data's historical performance. Insights into market fluctuations can then be used to drive business decisions.

Financial analysis data consists of quantitative and qualitative information used to evaluate an organization's financial health and performance, including income streams and cost structures, cash flow, debt-to-equity ratios, investment transactions, and more. It can be used to assess business profitability, inform decisions on investments, identify financial vulnerabilities, and mitigate risks. Below we can see a sample of raw currency data from Coinbase.

{
  "type": "ticker",
  "sequence": 27937593412,
  "product_id": "ETH-USD",
  "price ": 3031.1,
  "open_24h": 3289.45,
  "volume_24h": 205991.38400573,
  "low_24h": 2977.77,
  "high_24h": 3300.77,
  "volume_30d": 4819218.05843514,
  "best_bid": 3031.01,
  "best_ask": 3031.1,
  "side": "buy",
  "time": "2022-04-11T18:55:26.637Z",
  "trade_id": 254979310,
  "last_size": 0.02833571
}

In its current form, it is purely transactional, without any information about how the market is behaving over time. To gain an understanding of how the currency is performing, it would be better to have the price information aggregated over time. By using one or more Decodable pipelines, which are streaming SQL queries that process data, we can transform the raw data into a form that is best suited for how it will be used.

The Standard Financial Analysis Pipeline

A standard financial analysis pipeline typically involves several key components and tools that have been integrated to collect, process, and analyze financial activity. A typical architecture starts by implementing Apache Kafka to serve as the backbone for ingesting data from exchanges, payment systems, or internal applications. Data streams are then formatted for downstream use, typically by converting raw records into structured or semi-structured formats. The stream is then connected to a processing system like Apache Flink, which processes the data in real time, applying analytics such as windowed aggregation, enrichment, and change detection over time. This architecture enables businesses to act on financial data immediately, offering insights that can drive investment, pricing, and overall business decisions.
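To make the operational surface concrete, below is a minimal sketch of the kind of Flink SQL DDL you would write and maintain yourself in this architecture to attach Flink to a Kafka topic. The topic name, broker address, and schema here are illustrative assumptions, not details taken from this example:

-- Hypothetical Flink source table backed by a Kafka topic.
-- Topic, broker address, and field list are illustrative assumptions.
CREATE TABLE coinbase_ticker (
  product_id STRING,
  price DECIMAL(18, 8),
  side STRING,
  `time` TIMESTAMP(3),
  -- Declare event time and tolerate 5 seconds of out-of-order data
  WATERMARK FOR `time` AS `time` - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'coinbase-ticker',
  'properties.bootstrap.servers' = 'broker:9092',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

Every such table definition, connector version, and watermark strategy is something your team has to own, monitor, and upgrade.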

While this pipeline is powerful, setting up and maintaining such a streaming system comes with significant challenges. Flink and Kafka are complex tools requiring deep technical expertise to configure and scale, along with ongoing investment in monitoring, performance tuning, and infrastructure upkeep. Security and stability are major concerns in these architectures, particularly when handling sensitive financial data. Organizations must ensure that their financial analysis pipelines comply with standards and regulations like SOC 2 Type II and GDPR, which mandate strict data handling and protection procedures. As business requirements change or data volumes grow, the system must be continuously updated, which introduces additional overhead in terms of time and resources.

The complexity of these systems and the need for continuous oversight create significant barriers to entry for many organizations looking to build and maintain real-time financial analysis pipelines.

Financial Analysis with Decodable

Here at Decodable, we’ve built a solution that goes beyond the foundational technologies, addressing the broader requirements of real-time stream processing for ELT, ETL, and data replication. This includes ensuring a solid developer experience, providing extensive and flexible connectivity, managing schemas, scaling across different workloads and use cases, providing observability, maintaining security, data governance, and compliance, and offering ongoing support.

As a fully managed service, our platform takes care of the stream processing infrastructure and the deployment of Flink jobs so you can focus on the business logic for your data pipelines. That means there are no servers for you to manage, no clusters to create, size, or monitor, and no software dependencies to update or maintain within our platform.

In this example, we’ll walk through how the Decodable data service is used to clean, transform, enrich, and aggregate real-time currency data describing Coinbase transactions. The processed data can then be sent onward to a team of financial analysts or data scientists.

Pipeline Architecture

For this example, two separate pipelines are used in series, with the output of each one serving as the input for the next. While it is possible to perform all the desired processing in a single large, complex pipeline, it is most often desirable to split the work into smaller, more manageable processing steps, which results in pipelines that are easier to test and maintain. Each stage in the sequence brings the data closer to its final desired form using SQL queries.

Decodable uses SQL to process data, which should feel familiar to anyone who has worked with relational database systems. The primary differences you’ll notice are that:

  • You activate a pipeline to start it, and deactivate a pipeline to stop it
  • All pipeline queries specify a source and a sink
  • Certain operations, notably JOINs and aggregations, must include windows

Unlike queries in a relational database, all pipelines write their results into an output data stream (or sink). As a result, every pipeline is a single statement of the form INSERT INTO <sink> SELECT ... FROM <source>, where sink and source are streams you’ve defined.
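For example, a minimal pipeline that simply copies every record from one stream to another might look like the following (both stream names are hypothetical):

-- Hypothetical minimal pipeline: every record read from raw_stream
-- is written unchanged to cleaned_stream.
INSERT INTO cleaned_stream
SELECT *
FROM raw_stream

Everything in the SELECT between the two stream names is ordinary SQL, and that is where the cleaning, enrichment, and aggregation in the following steps happens.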

Transform Input Data Stream

As with most data pipelines, the first step is to apply a variety of transformations to clean the input data. For this example, the data will be modified as follows:

  • the time field will be converted from a string to a timestamp, which will enable more sophisticated processing in subsequent pipelines
  • the type field will be renamed so that it doesn’t conflict with the SQL reserved keyword; this is done purely to simplify the SQL used in further processing
  • several fields are cast from floats to decimal types in order to ensure the desired level of precision

Pipeline: Coinbase Exchange Transactions
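Below is a sketch of what this pipeline's SQL might look like, based on the raw ticker record shown earlier. The stream names (coinbase_raw and coinbase_transactions), the timestamp format string, and the decimal precision are illustrative assumptions:

-- Sketch of the cleansing pipeline; names and precision are assumptions.
INSERT INTO coinbase_transactions
SELECT
  `type` AS update_type,                               -- rename away from the reserved keyword
  `sequence`,
  product_id,
  CAST(price      AS DECIMAL(18, 8)) AS price,         -- cast floats to decimals for precision
  CAST(open_24h   AS DECIMAL(18, 8)) AS open_24h,
  CAST(volume_24h AS DECIMAL(18, 8)) AS volume_24h,
  CAST(low_24h    AS DECIMAL(18, 8)) AS low_24h,
  CAST(high_24h   AS DECIMAL(18, 8)) AS high_24h,
  CAST(volume_30d AS DECIMAL(18, 8)) AS volume_30d,
  CAST(best_bid   AS DECIMAL(18, 8)) AS best_bid,
  CAST(best_ask   AS DECIMAL(18, 8)) AS best_ask,
  side,
  -- convert the ISO-8601 string into a proper timestamp
  TO_TIMESTAMP(`time`, 'yyyy-MM-dd''T''HH:mm:ss.SSS''Z''') AS `time`,
  trade_id,
  last_size
FROM coinbase_raw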

After you create a new pipeline and copy in the SQL query, clicking the Run Preview button will verify its syntax and then fire up a new execution environment to process the next 10 records coming in from the source stream and display the results. Decodable handles all the heavy lifting on the backend, allowing you to focus on working directly with your data streams to ensure that you are getting the results you need.

Filter And Aggregate

Decodable’s pipelines are able to leverage powerful SQL group window functions such as tumble. In this example, an inner nested select query collects and filters buy-side pricing updates for BTC-USD across a non-overlapping, continuous window with a fixed duration of 5 seconds. These records are aggregated and then enriched with data from the previous record using the lag window function, which provides access to the record at a specified offset before the current one (here, simply the previous record). This allows the pipeline to produce a final result, well suited for analysis, that summarizes how the data is changing over time.

Pipeline: Coinbase Transaction Summary
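Below is a hedged sketch of what this summary pipeline might look like. The stream names, the choice of an average price as the aggregate, and the assumption that the stream's time field is its watermarked event-time attribute are all illustrative; the shape (a filtered tumble-window aggregation wrapped by a lag over the previous window) follows the description above:

-- Sketch of the summary pipeline; names and the aggregate are assumptions.
INSERT INTO coinbase_summary
SELECT
  window_time,
  product_id,
  avg_price,
  -- change relative to the previous 5-second window
  avg_price - LAG(avg_price) OVER (
    PARTITION BY product_id ORDER BY window_time
  ) AS price_change
FROM (
  SELECT
    window_time,
    product_id,
    AVG(price) AS avg_price
  FROM TABLE(
    TUMBLE(TABLE coinbase_transactions, DESCRIPTOR(`time`), INTERVAL '5' SECONDS))
  WHERE side = 'buy' AND product_id = 'BTC-USD'   -- buy-side updates only
  GROUP BY window_start, window_end, window_time, product_id
) AS buys

Each output record then carries both the aggregate for its own 5-second window and the change relative to the previous window, which is exactly the over-time view the raw ticker feed lacks.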

Conclusion

Clicking the Run Preview button will begin the 5-second tumble interval and then display the output data stream of this final stage of the multi-stage pipeline. At this point, a sink connection (one that writes a stream to an external system, such as AWS S3, Kafka, Kinesis, Postgres, Pulsar, or Redpanda) can be created to allow the results to be consumed by your own applications and services.

As we can see from this example, a sophisticated business problem can be addressed in a very straightforward way using Decodable pipelines. There are no Docker containers to create and no SQL server infrastructure to set up or maintain; all that is needed is a working familiarity with writing the SQL queries themselves.

You can watch a demonstration of this example on the Decodable YouTube channel.

Additional documentation for all of Decodable’s services is available here.

Please consider joining us on our community Slack.

