January 7, 2025

Resource Tagging for Real-Time Data Pipelines with Decodable

In this article, we take a look at resource tagging by exploring how Decodable’s fully managed data streaming platform helps enterprises implement their real-time data flows effectively while staying on top of the ever-growing number of resources.

Managing Resources at Scale is Hard

Managing resources of any type at scale becomes incredibly challenging over time. In large enterprise environments in particular, resources related to end-to-end data flows (such as data source and sink connections or stream processing jobs) often number in the hundreds of thousands, encompassing diverse types, purposes, and owners. Without a structured way to categorize and describe these resources, locating specific items becomes time-consuming and costly. Traditional organizational methods, such as hierarchical folders, are rigid and quickly become unwieldy as the system grows. This lack of flexibility makes it difficult to apply bulk actions or monitor usage effectively. At some point, ensuring visibility, accountability, and efficiency across large resource pools becomes nearly impossible.

Tagging For the Win

Tagging is a simple yet powerful method for organizing and managing resources of any type. One of the primary benefits of tagging is its flexibility. Unlike traditional hierarchical categorization systems—typically composed of a closed set of pre-defined terms to choose from—tagging allows for multiple, arbitrary labels to be applied to a single resource, reflecting its multifaceted nature. By associating tags with resources, users can quickly categorize and describe them based on shared attributes.

Tags also enhance the discoverability of resources. They serve as metadata that can be indexed and searched, enabling users to effectively and efficiently locate resources based on specific criteria by means of filtering, sorting, and potentially suggesting related resources. In collaborative environments, tagging allows teams to share a common vocabulary, making shared resources more accessible and helping users maintain consistency.

Tags in Decodable are expressed as key/value pairs and are conceptually inspired by how labels are used in Kubernetes. When tagging Decodable resources, the following rules apply to keys and values:

Keys and values are interpreted as case-sensitive strings.
Keys must start with a letter, followed by up to 62 characters that can be letters, numbers, dashes, or underscores.
Keys must be unique for a given resource.
Values are optional and can be arbitrary strings up to 255 characters.
A single resource can have at most 20 tags (i.e. key/value pairs) attached to it.
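
If you generate tags from scripts, it can help to check them against these rules before applying anything. The following is a minimal sketch, not part of the Decodable CLI: the function names is_valid_tag_key and is_valid_tag_value are hypothetical, and it assumes that "letters" means ASCII letters and that length is counted in characters as the shell sees them.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the tag rules listed above.
# Assumption: "letters" means ASCII A-Z and a-z.

# Key: one letter, followed by up to 62 letters, digits, dashes, or underscores.
is_valid_tag_key() {
  # Total length must be between 1 and 63 characters.
  [ "${#1}" -ge 1 ] && [ "${#1}" -le 63 ] || return 1
  case "$1" in
    [!A-Za-z]*) return 1 ;;        # first character is not a letter
    *[!A-Za-z0-9_-]*) return 1 ;;  # contains a disallowed character
  esac
  return 0
}

# Value: optional (may be empty), at most 255 characters.
is_valid_tag_value() {
  [ "${#1}" -le 255 ]
}
```

A script could then guard each pair with something like is_valid_tag_key "$key" && is_valid_tag_value "$value" before writing it into a resource definition.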

Decodable resources can be queried from within the web UI as well as the CLI based on a subset of SQL expressions like the following:

Checking for (in)equality of a tag key’s value:

my_key = 'some_value'
my_key != 'some_value' (inequality can also be expressed using <>)

Checking for partial matches (e.g. pre-/suffix) of a tag key’s value:

my_key LIKE 'some_%'
my_key LIKE '%_value'

Checking for a tag key’s value to be (not) in a defined list of values:

my_key IN ('some_value_a','some_value_b')
my_key NOT IN ('some_value_a','some_value_b')

Checking for tag key existence irrespective of whether there is a value or not:

my_key
my_key=''

Multiple conditions can be combined using the boolean operators AND, OR, and NOT. You can use parentheses to explicitly define operator precedence where necessary.
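
When such expressions are assembled in scripts, a small helper can keep the quoting manageable. The sketch below builds an IN condition from a key and a list of values and combines it with an equality check using AND. The function name tag_in_expr is hypothetical, and the sketch assumes the values contain no single quotes, since escaping rules are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print "<key> IN ('v1','v2',...)" for a key and its values.
# Assumption: values contain no single quotes.
# The body runs in a subshell so its variables do not leak into the caller.
tag_in_expr() (
  key=$1
  shift
  list=""
  for v in "$@"; do
    # Append a comma only if the list already has an entry.
    list="${list:+$list,}'$v'"
  done
  printf "%s IN (%s)" "$key" "$list"
)

# Combine an equality check with the generated IN condition.
expr="domain='ecommerce' AND $(tag_in_expr purpose use-case-1 use-case-2)"
echo "$expr"
```

The resulting string could then be passed to the CLI's tag query option, for example as decodable query -t "$expr".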

Working with Tags in the Web UI

Inspecting Tags

Tags can be inspected in the Decodable web UI in two different places. First, for each of the resource kinds (Connections, Streams, Pipelines, and Secrets), attached tags are shown in the tabular listing.

Second, in the main view (i.e. the Monitoring tab) of a specific resource like a Pipeline, tags are shown right below the other metadata such as the name, ID, and description.

For a single Secret, you can explore and maintain tags in the “Edit Secret” dialog.

Adding / removing tags

After creating Decodable resources like Pipelines, Connections, or Streams, tags can be added by clicking the “Add” button in the tags section found in the main view of a resource.

To remove any of the existing tags for a resource, click the “x” symbol to the right of the tag in question.

For Secrets, tags can be added or removed directly in the “Edit Secret” dialog during or after the creation of a specific Secret.

Tag-based Search

For each Decodable resource kind, the tabular listing exposes a “Search tags” field to match resources based on the entered SQL expression.

Decodable's web UI provides a simple graphical interface to manipulate individual tags for single resources and to query resources based on tags. That said, certain tag-related use cases are better addressed by other means. Adding or removing multiple tags at once, tagging several resources right at creation time, and applying operations to resources matching tag-based search expressions are all examples where declarative resource management and the Decodable CLI shine.

Working with Tags in Declarative Resource Definitions

Defining tags

Each Decodable resource kind supports tags as part of its YAML metadata block. This means that, unlike in the web UI, you can define multiple tags right at creation time of the resource in question.

---
kind: <connection | pipeline | stream | secret>
metadata:
  name: <name_your_resource>
  description: <resource_description>
  tags:
    <tag_key_1>: <tag_value_1>
    # ...
    <tag_key_20>: <tag_value_20>
spec_version: <v1|v2>
spec:
  <resource_specifications>

Using the Decodable CLI, any such YAML manifests can be applied using decodable apply your_resource_definitions.yaml.

Modifying tags

At the moment, the CLI doesn’t expose separate commands to directly add or remove tags from resources. You can instead query for resources to retrieve their declarative definitions, make the necessary changes to their tags and then apply the resulting resource definitions.

Let’s assume the command decodable query --name "ecom_pg_source" --export returns the following declarative resource definition for an existing Decodable connection:

---
kind: connection
metadata:
  name: ecom_pg_source
  description: "postgres connection ecommerce db"
  tags:
    context: demo
    domain: ecommerce
    team: dev-rel
spec_version: v2
spec:
  <resource_specification_here>

If you want to change the context tag’s value to prod and add a new tag (key: purpose, value: use-case-1) to this resource, you can, for example, use the yq tool to post-process the exported YAML and then apply the result like so:

decodable query --name "ecom_pg_source" --export | yq '(.metadata.tags.context = "prod" | .metadata.tags.purpose = "use-case-1")' | decodable apply -

If you now query the same resource again using decodable query --name "ecom_pg_source" --export, you should see the modified tags:

---
kind: connection
metadata:
  name: ecom_pg_source
  description: "postgres connection ecommerce db"
  tags:
    context: prod
    domain: ecommerce
    purpose: use-case-1
    team: dev-rel
spec_version: v2
spec:
  <resource_specifications>

Execute operations based on tags

Tag-based search has already been briefly shown as part of the web UI experience. The CLI can likewise query for resources based on SQL expressions which refer to tags. You can even take things further and apply operations to the resources which match the tag-based query.

Let’s say you want to deactivate (i.e. stop) all of your resources which have a tag domain=ecommerce and for which another tag with key purpose has either use-case-1 or use-case-2 as a value. You can do this conveniently with a single CLI command like this:

decodable query -t "domain='ecommerce' AND purpose IN ('use-case-1','use-case-2')" --operation deactivate

Assuming this query matched four Decodable resources (two Connections and two Pipelines), all of which were running before the above command was executed, you should get output signaling that these resources have been deactivated:

---
kind: connection
name: ecom_os_sink_customers
id: 314b4b93
result: deactivated
---
kind: connection
name: ecom_os_sink_orders
id: 26777c7c
result: deactivated
---
kind: pipeline
name: ecom_pipeline_agg_join
id: e9d1b75f
result: deactivated
---
kind: pipeline
name: ecom_pipeline_basic_join
id: f5f93a56
result: deactivated

Whenever you need to get something done that’s directly supported neither by declarative resource management nor by the CLI itself, you can always combine some of the available building blocks with a bit of custom scripting. A concrete example would be an easy way to clear all streams which are part of one specific end-to-end data flow as you iteratively develop a new use case. While the CLI allows you to clear streams, there is currently no option to combine this with tag-based search. However, you could do the following to achieve your goal:

1. Come up with a CLI query to identify all required resources of kind Stream based on a tag query expression:

decodable query --kind stream --keep-ids -t "domain='ecommerce' AND purpose='use-case-3'"

2. Run the output through a YAML processing tool like yq to extract only the plain IDs for all matched streams:

yq -o=csv '.metadata.id'

3. Put a loop around this and run the CLI command to clear all matched streams based on their respective IDs:

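#!/bin/bash
# Clear every stream whose tags match the query, one stream ID at a time.
for sid in $(decodable query --kind stream --keep-ids -t "domain='ecommerce' AND purpose='use-case-3'" | yq -o=csv '.metadata.id'); do
  # Quote the ID so it is always passed as a single argument.
  decodable stream clear "$sid"
done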

Check out this quick demo of tagging in action.

Summary

This article highlighted the importance of resource tagging, an essential enterprise feature of data streaming platforms. We have explored how Decodable’s fully managed data streaming platform provides first-class support for tagging to keep real-time data flows manageable at scale. We have seen how tags can be conveniently used in the web UI or via the Decodable CLI as part of declarative resource management.

Interested in trying resource tagging for yourself? Sign up for a free Decodable trial today and start tagging resources!

Hans-Peter Grahsl

Hans-Peter Grahsl is a Staff Developer Advocate at Decodable. He is an open-source community enthusiast and in particular passionate about event-driven architectures, distributed stream processing systems and data engineering. For his code contributions, conference talks and blog post writing at the intersection of the Apache Kafka and MongoDB communities, Hans-Peter received multiple community awards. He likes to code and is a regular speaker at developer conferences around the world.
