Apache Druid
Apache Druid is a real-time analytics database designed for fast OLAP queries on large data sets. Druid powers use cases where real-time and streaming ingestion, fast query performance, and high uptime are important.
In this blog, we will ingest COVID-19 global statistics into Apache Druid, cleansing the data along the way so that Druid can work with it easily. Once Druid has the COVID data, we will create a real-time dashboard in Apache Superset, an open-source business intelligence tool. Real-time OLAP databases such as Druid operate faster with pre-processed datasets, which reduces their workload and enables more focused query execution. Decodable is the ideal solution for performing this pre-processing.
Solution Architecture
In this example, a Python script gathers COVID statistics and batches them up before sending them to Decodable’s REST API. Decodable then performs a set of cleansing tasks to get the data into a format optimized for Apache Druid.

The script also runs every hour to pick up updates and send them to Decodable. All of these settings can be externalized and passed into the script. The code for the Python script follows:
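The original script is not reproduced here, but a minimal sketch of the approach might look like the following. The original relies on the decodablepy helper module (installed below); since its API isn’t shown in this post, the sketch posts directly to the REST connection endpoint with requests instead. The source feed URL, payload shape, script name, and argument order are all assumptions.

```python
import sys
import time

import requests

# Hypothetical source feed; the real script's data source may differ.
COVID_SUMMARY_URL = "https://api.covid19api.com/summary"


def fetch_covid_stats():
    """Pull the latest per-country COVID-19 summary statistics."""
    response = requests.get(COVID_SUMMARY_URL, timeout=30)
    response.raise_for_status()
    return response.json().get("Countries", [])


def send_to_decodable(records, endpoint, token):
    """POST a batch of records to a Decodable REST connection endpoint."""
    response = requests.post(
        endpoint,
        json={"events": records},  # payload shape is an assumption
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()


if __name__ == "__main__":
    # token, account name, and REST endpoint are passed on the command line
    token, account, endpoint = sys.argv[1:4]
    while True:
        batch = fetch_covid_stats()
        send_to_decodable(batch, endpoint, token)
        print(f"Sent {len(batch)} records for account {account}")
        time.sleep(60 * 60)  # check for updates every hour
```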
The script depends on a Python module called decodablepy, which you can install with pip using the command below.
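```bash
pip install decodablepy
```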
To run the script, execute the command below:
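The exact command depends on what you named the script; a representative invocation, with the script name and placeholders being illustrative, is:

```bash
# Placeholders are illustrative; the endpoint value can be looked up with
# the Decodable CLI or copied from the Decodable UI.
python covid_to_decodable.py <API_TOKEN> <ACCOUNT_NAME> <REST_ENDPOINT_URL>
```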
This script takes three command line arguments: a token for the Decodable REST API, your Decodable account name, and the endpoint of the REST connection you deployed. The endpoint value can be retrieved with the Decodable CLI or, if you don’t have the CLI available, copied from the Decodable UI.
Decodable SQL Pipeline
The Decodable pipeline does the work of cleansing the data for Druid. In the SQL below, we select fields from the raw COVID data, dropping the fields that are not needed. Then we use Decodable’s built-in temporal function TO_TIMESTAMP(). This function takes the column containing the timestamp data as its first parameter; the second parameter tells the function how to parse that string into a timestamp. The function changes the data type from string to timestamp.
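The pipeline SQL isn’t reproduced here verbatim; a sketch along the lines described, assuming a raw input stream named covid_raw, a cleansed output stream named covid_cleansed, and per-country field names like those in the public summary feed, might look like this:

```sql
-- Stream and field names are assumptions; adjust them to your schema.
INSERT INTO covid_cleansed
SELECT
  Country,
  CountryCode,
  NewConfirmed,
  TotalConfirmed,
  NewDeaths,
  TotalDeaths,
  -- The format string must match the raw value, e.g. 2022-05-01T00:00:00Z
  TO_TIMESTAMP(`Date`, 'yyyy-MM-dd''T''HH:mm:ss''Z''') AS report_time
FROM covid_raw
```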
Configuring Druid
Decodable uses Apache Kafka to feed the cleansed data to Apache Druid. We build a Kafka connection in Decodable and write our cleansed data to it. We then configure Apache Druid to read from the corresponding Kafka topic.
To configure Druid, we need to complete the configuration below.
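The configuration is a set of Kafka consumer properties entered as JSON. A representative sketch, assuming the Kafka cluster is secured with SASL/PLAIN over TLS (the exact security settings depend on your cluster), looks like this:

```json
{
  "bootstrap.servers": "{{ BOOTSTRAP SERVERS }}",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "PLAIN",
  "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"{{ USERNAME }}\" password=\"{{ PASSWORD }}\";"
}
```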
You will need to replace all of the parameters in the curly brackets {{ }} with the values that correspond to your setup. This configuration will be used in Apache Druid. From the top of the Druid UI, click on the “Load data” button.

Click on “Start a new spec”. Then click on the Apache Kafka tile.

Next, paste the configuration you created into the “Consumer properties” box and enter the name of the topic that Decodable is writing to.

Click “Apply”, then the “Next” button. Follow the wizard to create a datasource, accepting all of the default values until you are done.
At the top of the Druid UI, click on “Query”. You should see your datasource listed on the left side. To test it, run the Druid query below.
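Any simple query against the new datasource works as a smoke test; for example, assuming the datasource is named covid_cleansed:

```sql
-- Replace the datasource name with the one created by the wizard
SELECT *
FROM "covid_cleansed"
LIMIT 10
```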
You should see results similar to this:

Apache Superset Configuration

Use the SQLAlchemy URI below to point Superset to your Druid instance.
druid://{{ USERNAME }}:{{ PASSWORD }}@{{ DRUID HOST }}:{{ DRUID PORT }}/druid/v2/sql
Proceed to create a dataset from Apache Druid, choosing the COVID data that Decodable ingested. Next, create a World Map chart. Choose the column containing the country code (CountryCode) as the Country Column, and set the Country Field Type to “code ISO 3166-1 alpha-2 (cca2)”. This allows Superset to map each country to its corresponding statistics.
To show only the statistics for the last day, you can filter the data by applying this WHERE clause in the Filters input:
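A minimal sketch of such a filter, assuming the report time was mapped to Druid’s primary timestamp column __time:

```sql
__time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
```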
Update and save the chart, then assign it to an existing dashboard or create a new one. Your dashboard should look similar to the image below.

Summary
In this post, we captured COVID-19 statistics and persisted them to Apache Druid, a real-time OLAP (RTOLAP) database. Decodable pre-processed the records into a flat format and cleansed them to reduce the workload for Apache Druid, so that it can focus on serving real-time queries with low latency. If you have any questions or would like help implementing this solution, please contact us at support@decodable.co.
Video Demo
You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.
Learn more:
- Read the docs
- Check out our other blogs
- Subscribe to our YouTube Channel
- Join the community Slack