SQL is incredibly powerful, but it can be hard to remember what output you get from different parts of a SQL query. Here's my mental cheat-sheet for writing and reading non-trivial SQL queries.
The Tricks
1. The absence of 'group by' means you will get one output record per input, after predicates like 'where' clauses are applied.
2. If a query has a 'group by', you will get one record per group.
3. 'Over' clauses generate a field for each output record. They operate on a window defined by either a) the 'partition by' expression - which works like a 'group by' - or b) a range of records defined by 'rows between'.
4. Subqueries can be put in a bunch of different places in SQL. They can also appear in predicates in the 'where' clause, where their output acts as part of the filtering condition.
5. In a select list, a subquery (usually) runs once, before the outer query, and its output is inserted as a field.
6. In a 'from', a subquery's output acts as an input table. You can alias the subquery and reference its fields.
Here are a few examples of SQL tricks demonstrating these tricks, written for clarity not performance:
Example: Tweet Likes / All Likes
Example: Current Order Price vs Average Over The Last 10
Example: HTTP Errors By Path And Status, And By Path
There are lots (and lots) of intermediate SQL bits that can greatly simplify life. I've oversimplified a bunch of things and focused on some common cases. Performance can vary greatly based on the DBMS, data, indexes, and config, but hopefully, this helps! Join me on the Decodable community slack with questions, suggestions or comments!
You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.
Learn more:
- Read the docs
- Check out our other blogs
- Subscribe to our YouTube Channel
Join the community Slack