Apache Druid
Apache Druid is a real-time analytics database designed for fast OLAP queries on large data sets. Druid powers use cases where real-time and streaming ingestion, fast query performance, and high uptime are important.
In this blog, we will ingest COVID-19 global statistics into Apache Druid, cleansing the data along the way so that Druid can work with it easily. Once Druid has the COVID data, we will create a real-time dashboard in Apache Superset, an open-source business intelligence tool. Real-time OLAP databases such as Druid operate faster with pre-processed datasets, which reduces their workload and enables more focused query execution. Decodable is the ideal solution for performing this pre-processing.
Solution Architecture
In this example, a Python script gathers COVID statistics and batches them up before sending them to Decodable’s REST API. Decodable then performs a set of cleansing tasks to get the data into a format optimized for Apache Druid.

The script also runs every hour to pick up updates and send them to Decodable. All of these settings can be externalized and passed into the script. The code for the Python script follows:
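The original script is not reproduced here, but a minimal sketch of the approach might look like the following. The original relies on the decodablepy helper module (installed below); since its API isn’t shown in this post, the sketch posts directly to the REST connection endpoint with requests instead. The source feed URL, payload shape, script name, and argument order are all assumptions.

```python
import sys
import time

import requests

# Hypothetical source feed; the real script's data source may differ.
COVID_SUMMARY_URL = "https://api.covid19api.com/summary"


def fetch_covid_stats():
    """Pull the latest per-country COVID-19 summary statistics."""
    response = requests.get(COVID_SUMMARY_URL, timeout=30)
    response.raise_for_status()
    return response.json().get("Countries", [])


def send_to_decodable(records, endpoint, token):
    """POST a batch of records to a Decodable REST connection endpoint."""
    response = requests.post(
        endpoint,
        json={"events": records},  # payload shape is an assumption
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()


if __name__ == "__main__":
    # token, account name, and REST endpoint are passed on the command line
    token, account, endpoint = sys.argv[1:4]
    while True:
        batch = fetch_covid_stats()
        send_to_decodable(batch, endpoint, token)
        print(f"Sent {len(batch)} records for account {account}")
        time.sleep(60 * 60)  # check for updates every hour
```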
The script depends on a Python module called decodablepy, which you can install with pip using the command below.
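```bash
pip install decodablepy
```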
To run the script, execute the command below:
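The exact command depends on what you named the script; a representative invocation, with the script name and placeholders being illustrative, is:

```bash
# Placeholders are illustrative; the endpoint value can be looked up with
# the Decodable CLI or copied from the Decodable UI.
python covid_to_decodable.py <API_TOKEN> <ACCOUNT_NAME> <REST_ENDPOINT_URL>
```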
This script takes three command line arguments: a token for the Decodable REST API, your Decodable account name, and the endpoint of the REST connection you deployed. The endpoint value can be retrieved with the Decodable CLI or, if you don’t have the CLI available, copied from the Decodable UI.
Decodable SQL Pipeline
The Decodable pipeline does the work of cleansing the data for Druid. In the SQL below, we select fields from the raw COVID data, dropping the fields that are not needed. Then we use Decodable’s built-in temporal function TO_TIMESTAMP(). This function takes the column containing the timestamp data as its first parameter; the second parameter tells the function how to parse that string into a timestamp. The function changes the data type from string to timestamp.
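The pipeline SQL isn’t reproduced here verbatim; a sketch along the lines described, assuming a raw input stream named covid_raw, a cleansed output stream named covid_cleansed, and per-country field names like those in the public summary feed, might look like this:

```sql
-- Stream and field names are assumptions; adjust them to your schema.
INSERT INTO covid_cleansed
SELECT
  Country,
  CountryCode,
  NewConfirmed,
  TotalConfirmed,
  NewDeaths,
  TotalDeaths,
  -- The format string must match the raw value, e.g. 2022-05-01T00:00:00Z
  TO_TIMESTAMP(`Date`, 'yyyy-MM-dd''T''HH:mm:ss''Z''') AS report_time
FROM covid_raw
```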
Configuring Druid
Decodable uses Apache Kafka to feed the cleansed data to Apache Druid. We build a Kafka connection in Decodable and write our cleansed data to it. We then configure Apache Druid to read from the corresponding Kafka topic.
To configure Druid, we need to complete the configuration below.
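The configuration is a set of Kafka consumer properties entered as JSON. A representative sketch, assuming the Kafka cluster is secured with SASL/PLAIN over TLS (the exact security settings depend on your cluster), looks like this:

```json
{
  "bootstrap.servers": "{{ BOOTSTRAP SERVERS }}",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"{{ USERNAME }}\" password=\"{{ PASSWORD }}\";"
}
```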
You will need to replace all of the parameters in the curly brackets {{ }} with the values that correspond to your setup. This configuration will be used in Apache Druid. From the top of the Druid UI, click on the “Load data” button.

Click on “Start a new spec”. Then click on the Apache Kafka tile.

Next, paste the configuration you created into the “Consumer properties” box and enter the name of the topic that Decodable is writing to.

Click “Apply”, then the “Next” button. Follow the wizard to create a datasource, accepting all of the default values until you are done.
At the top of the Druid UI, click on “Query”. You should see your datasource listed on the left side. To test it, run the Druid query below.
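Any simple query against the new datasource works as a smoke test; for example, assuming the datasource is named covid_cleansed:

```sql
-- Replace the datasource name with the one created by the wizard
SELECT *
FROM "covid_cleansed"
LIMIT 10
```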
You should see results similar to this:

Apache Superset Configuration

Use the SQLAlchemy URI below to point Superset to your Druid instance.
druid://{{ USERNAME }}:{{ PASSWORD }}@{{ DRUID HOST }}:{{ DRUID PORT }}/druid/v2/sql
Proceed to create a dataset from Apache Druid, choosing the COVID data that Decodable ingested. Next, create a World Map chart. Choose the column containing the country code (CountryCode) as the Country Column, and set the Country Field Type to “code ISO 3166-1 alpha-2 (cca2)”. This allows Superset to map each country to its corresponding statistics.
To show only the statistics for the last day, you can filter the data by applying this WHERE clause in the Filters input:
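A minimal sketch of such a filter, assuming the report time was mapped to Druid’s primary timestamp column __time:

```sql
__time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```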
Update and save the chart, then assign it to an existing dashboard or create a new one. Your dashboard should look similar to the image below.

Summary
In this post, we captured COVID-19 statistics and persisted them to Apache Druid, a real-time OLAP (RTOLAP) database. Decodable pre-processed the records into a flat format and cleansed them to reduce the workload for Apache Druid, so that it can focus on serving real-time queries with low latency. If you have any questions or would like help implementing this solution, please contact us at support@decodable.co.
Video Demo
You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.
Learn more:
- Read the docs
- Check out our other blogs
- Subscribe to our YouTube Channel
- Join the community Slack