So you’ve built your first real-time ETL pipeline with Decodable: congratulations! Now what?
Well, now comes the bit that those cool kids over in the SRE world like to call “GitOps” or “Infrastructure as Code” (IaC), or describe as “shifting left.” In more formalised terms, it’s about being a grown-up and adopting good Software Development Lifecycle (SDLC) practices. This is all about taking what you’ve built and handling it as we would any other software artifact, including:
- putting it in source control
- optionally, continuous integration (CI) to the main development environment
- optionally, continuous deployment (CD) to production
This is where declarative resource management comes in. A core part of the Decodable CLI’s capabilities, it takes the approach of declaring in a YAML document the resources that should be present, and letting Decodable figure out how to make it so. Contrast this with the imperative approach, in which you are responsible for specifying each resource and orchestrating the order of deployment.
For example, to create a sink connection, the stream from which it reads needs to exist first. In a declarative world, you create a YAML document that specifies both the stream and the sink, and Decodable figures out the rest. Done imperatively, you wouldn’t be able to create the sink until the stream existed. If you’re doing this by hand that may be fine, but as soon as you start to try to script it, the dependencies get complicated pretty quickly.
So as we’re going to see, declarative resource management enables you to tightly integrate Decodable with industry best-practice IaC/GitOps strategies. It also has further uses, including:
- Bulk administration tasks, such as removing all connections that match a particular pattern, perhaps ones connecting to a server that’s been decommissioned.
- Replicating resources from one Decodable environment to another. This could be to template parallel sandbox environments for multiple developers, or to manually deploy changes to an environment if full CI/CD isn’t in place.
- Ad-hoc development—some developers will simply prefer to interact with Decodable declaratively rather than imperatively.
What does declarative resource management look like?
Let’s say we’d like to create a connection. Since we’re thinking declaratively instead of imperatively we create a YAML file that defines the state that we’d like our Decodable account to be in:
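A minimal sketch of what that file (call it <span class="inline-code">datagen01.yaml</span>) might look like, defining a datagen connection and the stream it writes to. The exact field names are per the documented resource spec, so treat this as illustrative rather than exact:

```yaml
---
kind: connection
metadata:
  name: datagen01
spec_version: v1
spec:
  connector: datagen
  type: source
  stream_name: envoy01
  properties:
    data.type: envoy   # simulate Envoy access logs
    delay: 500         # generate a record every 500 ms
---
kind: stream
metadata:
  name: envoy01
spec_version: v1
spec:
  schema_v2:
    fields:
      - name: value
        type: STRING
```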
Now we apply this resource definition:
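```bash
decodable apply datagen01.yaml
```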
And thus it is so:
We can see the same thing through the Web UI, if you prefer:
🍅 You say tomato, I say potato? 🥔
So now contrast the above with the imperative approach. Here we tell Decodable what to do. From the CLI this would be to run a <span class="inline-code">decodable connection create […]</span> command. Pretty much the same thing? Well, no. Because when we run this we see the problem straight away:
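```bash
decodable connection create […]
# this fails: a connection named datagen01 already exists
```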
Because we’ve said “do this thing,” the Decodable CLI has gone and created the connection. Or rather, tried to, and failed, because it already exists (since we created it above).
If this were part of a deployment script we’d now have a failure to deal with. Did we expect the connection to exist? Are we trying to update it and need to delete it first? What if it’s already running? All of this would fall to us to figure out and code around for the different eventualities. By working declaratively, instead, we say that we want this thing to be in this state, and Decodable then figures out how to make it so.
This may seem somewhat contrived for a single connection—but you’ll very quickly see how it becomes not just beneficial, but essential.
A good example of this difference is making a change to an existing connection. The connection that we created uses a datagen connector, with configuration telling it to simulate Envoy access logs generated every 500 milliseconds. Let’s say we want to change that interval to 1000 milliseconds.
Done declaratively, we update the value for <span class="inline-code">delay</span> in our <span class="inline-code">datagen01.yaml</span> file. Since we’re just dealing with text files, we can even express this as a diff:
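```diff
     properties:
       data.type: envoy
-      delay: 500
+      delay: 1000
```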
and then we apply the file again, just as before:
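```bash
decodable apply datagen01.yaml
```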
Note that it’s exactly the same command that we ran before.
If we want to do this imperatively, we need to do something like this:
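```bash
# Sketch only: the exact subcommands and flags here are illustrative.
decodable connection stop datagen01     # can't delete a running connection
decodable connection delete datagen01   # remove the existing definition
decodable connection create […]         # recreate it with delay=1000
```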
…and that’s before we get onto any kind of error handling, logic branching (we’re assuming that the connection exists; if it doesn’t, we need to skip the deletion, and so on), and execution control (for example, we can’t delete a running connection, so we’d have to stop it first).
Storing Decodable resources in source control
You may have spotted that the YAML file we created above to define the connection is perfectly suited to storing in source control.
Just as any software engineer worth their salt these days wouldn’t dream of not storing their code in a source control tool such as git, the same should be true in the world of data engineering. The definition of pipelines and connections is our version of ‘code’, and being able to store it in source control is important. It’s part of a broader adoption of Infrastructure as Code (IaC), with popular examples including Terraform and Kubernetes’ <span class="inline-code">kubectl</span> tool.
How you curate your resources is a matter of personal preference and organizational standards. You might decide that the connection and its stream are one logical unit and store it as such (as shown in the example above):
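```
decodable/
└── datagen01.yaml        # connection and its stream in one file
```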
You could alternatively split resources into individual files, which would make a git diff easier to read particularly with a larger number of objects:
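```
decodable/
├── datagen01-connection.yaml
└── envoy01-stream.yaml
```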
Or put different resources in their own folders:
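```
decodable/
├── connections/
│   └── datagen01.yaml
└── streams/
    └── envoy01.yaml
```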
The Decodable CLI will take one or more YAML files as input, so however you arrange them, declarative resource management works the same way:
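```bash
# Pass whichever files your layout produces; the result is the same.
decodable apply decodable/**/*.yaml   # with bash's globstar enabled
```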
Handling secrets
There’s one important part that we’ve not covered yet that’s going to be relevant here: how secrets are handled. These store the authentication details used by connections and pipelines. Secrets are useful for two key reasons: they avoid having to hard-code sensitive information, and they enable seamless movement between environments, since the authentication details will (hopefully) be different in each.
Secrets are a resource defined in YAML just like a connection, but they have a special way of specifying their value. You can provide the value as a reference to an environment variable, or the contents of a file. You can also hardcode it…but you almost certainly don’t want to do that!
Here’s an example of a snippet of a connection resource definition referencing a secret:
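```yaml
# Snippet only; the connector and property names are illustrative.
spec:
  connector: postgres-cdc
  properties:
    username: etl_ro_user     # literal username
    password: etl-ro-secret   # resolved as a reference to the secret below
```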
Whilst <span class="inline-code">etl_ro_user</span> is the literal value of the user that will authenticate to Postgres (in this example), <span class="inline-code">etl-ro-secret</span> is a reference to a secret of that name. Its definition might look like this:
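The field names here are approximate; see the documented spec for the exact way to reference an environment variable:

```yaml
---
kind: secret
metadata:
  name: etl-ro-secret
spec_version: v1
spec:
  value_env_var: ETL_PW   # read from this environment variable at apply time
```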
When this is used, the Decodable CLI will fetch the value of the environment variable <span class="inline-code">ETL_PW</span> and set this as the secret in Decodable. Here’s what it looks like in action:
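```bash
export ETL_PW='<the actual password>'   # value never appears in the YAML
decodable apply etl-ro-secret.yaml      # filename illustrative
```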
CI/CD
With our code in source control, the next step on our GitOps journey is to automate its deployment. The Decodable CLI can be run as part of a GitHub Actions workflow. Common patterns that you might look to adopt here would be:
- Staging/Integration: Commit the resource YAML files to a feature branch. When you merge this into the shared development branch of your repository, a workflow runs to deploy the resulting set of resources to a shared development or staging environment for integration/smoke testing.
- Production release: When you merge into the production branch, a workflow runs to update the production Decodable environment to reflect the resources defined in that branch of source control.
A workflow for deployment to production might look something like this:
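This is an illustrative sketch; the CLI installation and authentication steps are placeholders, so follow the Decodable documentation for the exact setup:

```yaml
name: Deploy to production
on:
  push:
    branches: [production]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Install and authenticate the Decodable CLI here, per the docs
      # (for example using API credentials held as GitHub secrets).

      - name: Apply resource definitions
        run: decodable apply ./decodable/*.yaml
        env:
          ETL_PW: ${{ secrets.ETL_PW }}
```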
This uses the approach described in the documentation to configure authentication for your Decodable account, and then invokes <span class="inline-code">decodable apply</span> with every YAML file under <span class="inline-code">./decodable</span> as an argument. The value for secrets (just one shown here, called <span class="inline-code">ETL_PW</span>) is set via an environment variable from a GitHub secret.
The excellent thing about declarative resource management is that if nothing has changed in the resource definition then no change will be made to the resource on Decodable. This means that you can pass in the whole set of resource definitions without worrying about them getting dropped and recreated; they’re simply left alone unless there is a change to make. For example:
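```bash
# Re-running apply over unchanged definitions is a no-op for those resources.
decodable apply ./decodable/*.yaml
```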
It’s also worth pointing out that execution intent is also part of the resource definition. What does that mean? It means that the execution state of a connection or pipeline (whether it should be active, and if so, the task size and count with which to run the job) is dictated through the declarative process. After applying a set of YAML files, resources that were running and should be will still be running. Those that aren’t running but should be, perhaps because they’ve just been created, will be started.
Creating YAML files
So far, I’ve shown you a lot of existing YAML files and how these can be used to declare the resources that should exist on the target Decodable account. But how are these YAML files created?
One option is to lovingly craft them by hand, following the documented spec. That’s fine if writing raw YAML is your thing. Quite possibly though, you’ll want to get the definition of existing resources on your Decodable account. This could be so that you can then modify it to suit the definition of a new resource (and it’s quicker than writing it from scratch). But this is also a great way to quickly get a dump of some, or all, your resources in a form that you can then put into source control (like we saw above).
This provides a very efficient workflow for creating new resources: you can first define new connections and pipelines using the web UI, exploring and setting all their options, and then export them as declarative resources so that you can put them into revision control, apply them in other environments, and so on.
At its simplest invocation, the <span class="inline-code">decodable query</span> command returns all information about all resources:
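```bash
decodable query
```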
You’ll notice a <span class="inline-code">status</span> field there, which is useful for runtime details but not if we’re putting the definition into source control. The query command provides the <span class="inline-code">--export</span> option specifically for this:
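```bash
decodable query --export
```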
Since it’s YAML that’s output, you can just redirect this into a target file, and then you have a file ready to be run with <span class="inline-code">decodable apply</span>:
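```bash
decodable query --export > resources.yaml   # filename is arbitrary
decodable apply resources.yaml
```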
What if you don’t want all of the resources returned by the query? A rich set of filter options is available, such as <span class="inline-code">--name</span>, <span class="inline-code">--kind</span>, and more:
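```bash
decodable query --export --kind connection --name datagen01
```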
You can also do some neat stuff using <span class="inline-code">yq</span> to post-process the YAML, giving you even more flexibility. Here’s an example of getting the resource definitions for all connections that use the Kafka connector:
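```bash
# Assumes each exported document records its connector at .spec.connector.
decodable query --export --kind connection \
  | yq 'select(.spec.connector == "kafka")'
```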
Copying resources between environments
It might have occurred to you by now that if the <span class="inline-code">decodable query</span> command returns YAML that’s consumable by <span class="inline-code">decodable apply</span>, this could be a nice way to copy resources between environments—and you’d be right!
You can configure multiple profiles in the Decodable CLI. For each one, make sure you run a <span class="inline-code">decodable login</span>. Let’s imagine my default profile is for my development environment, and I want to copy some resources across to the staging environment. First up, I’ll make sure I’m authenticated to it:
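```bash
decodable login --profile staging   # profile flag shown for illustration
```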
Now I can copy resources from my default profile to the staging one:
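```bash
decodable query --export --name datagen01 \
  | decodable apply --profile staging -
```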
Note the use of the <span class="inline-code">-</span> to tell the apply command to read from <span class="inline-code">stdin</span>. Now I have the connection and stream in my staging environment:
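```bash
decodable query --profile staging --name datagen01
```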
Of course, it’d probably be better practice to write the resource YAML to a file that’s committed to source control, and from there apply it to the target Decodable environment. But for quick ad-hoc resource migration, this works just great.
Templating Resources
Another advantage of declarative resource management in Decodable is that it makes it easy to use templating to apply many resources of similar but slightly varying definitions. For example, you might have multiple Postgres databases which are logically identical but split across geographical locations. You want to set up each one as a connection in Decodable.
One option is to copy and paste 98% of the same content each time and hard-code a set of files that are almost-but-not-quite the same (the locations here are purely illustrative):
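```
pg-us-east.yaml
pg-us-west.yaml
pg-eu-central.yaml
pg-ap-south.yaml
```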
You now have a bunch of files to manage, all of which need updating if something changes.
The better option here is to template it and generate the YAML dynamically. Tools that are commonly used for this include jsonnet and kustomize. Here’s an example with jsonnet. First we create the template connection in a function called <span class="inline-code">makeConnection</span>:
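```jsonnet
// Sketch only: the resource fields and connector name are illustrative.
local makeConnection(location) = {
  kind: 'connection',
  metadata: {
    name: 'pg-' + location,
  },
  spec_version: 'v1',
  spec: {
    connector: 'postgres-cdc',
    properties: {
      hostname: 'postgres.' + location + '.example.com',
      username: 'etl_ro_user',
      password: 'etl-ro-secret',
    },
  },
};
```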
and then in the same file invoke that function for the list of locations for which we want to create connections:
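```jsonnet
[
  makeConnection(location)
  for location in ['us-east', 'us-west', 'eu-central', 'ap-south']
]
```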
Now we invoke jsonnet and pass the output through <span class="inline-code">yq</span> to format it as YAML:
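```bash
# split_doc emits one YAML document per array element (yq v4.25+);
# filenames are illustrative.
jsonnet connections.jsonnet | yq -P '.[] | split_doc' > connections.yaml
```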
With that we have a file of connection resource definitions that we can use with <span class="inline-code">decodable apply</span>, in this case four different connections, one for each location, using a different hostname and connection name for each:
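```yaml
---
kind: connection
metadata:
  name: pg-us-east
spec_version: v1
spec:
  connector: postgres-cdc
  properties:
    hostname: postgres.us-east.example.com
    username: etl_ro_user
    password: etl-ro-secret
# …and likewise for us-west, eu-central, and ap-south
```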
Get started today!
Declarative resource management in Decodable is, as you’ve seen above, fantastically powerful. Not only does it enable SDLC best practices such as source control and GitOps, it also has numerous other uses, including ad-hoc resource management, migration between environments, and even bulk deletion of resources.
You can find full details in the documentation. Sign up with Decodable for free today to give it a try.