At 2nd Watch, we’re always keeping an eye on up-and-coming technologies. We investigate, test, and test some more to make sure we fully understand the benefits and potential drawbacks of any technology we may recommend to our clients. One unique tool we’ve recently spent quality time with is data build tool (dbt).
What is dbt?
Before loading data into a centralized data warehouse, it must be cleaned up, made consistent, and combined as necessary. In other words, data must be transformed – the “T” in ETL (extract, transform, load) and ELT. This allows an organization to develop valuable, trustworthy insights through analytics and reporting.
Dbt enables data analysts and data engineers to automate the testing and deployment of the data transformation process. This is especially useful because many companies have increasingly complex business logic behind their reporting data. The dbt tool keeps a record of all changes made to the underlying logic and makes it easy to trace data and update or fix the pipeline through version control.
Where does dbt fit in the market?
Dbt has few well-adopted direct competitors in the enterprise space, as no tool on the market offers quite the same functionality. Dbt does not extract or load data to/from a warehouse; it focuses only on transforming data after it has been ingested.
Some complementary tools are Great Expectations, Flyway, and Apache Airflow. Let’s take a closer look:
Airflow assists with ETL by creating automated processes, including pipelines and other operations commonly found in the orchestration workflow. It can integrate into a data warehouse, run commands, and operate off of a DAG similar to dbt’s; but it isn’t designed for full querying work. The dbt tool has a fleshed out front-end interface for query development and coding, whereas Airflow focuses more on the actual flow of data in its interface.
Flyway is a version control system that tracks updates made to tables in a data warehouse. It doesn’t allow for editing, merely easing the migration process for teams with different sets of code. Flyway advances documentation in a separate environment, while dbt manages this via integrations with services like GitHub and DevOps.
Great Expectations allows you to create comprehensive tests that run against your database, but it isn’t integrated with other ETL features. Unlike dbt, it doesn’t allow for any editing of the actual database.
What should you know about the dbt tool?
Dbt has a free open source version and a paid cloud version in which they manage all of the infrastructure in a SaaS offering. In 2020, they introduced the Integrated Developer Environment, which coincided with dbt pricing updates. Read more about the dbt cloud environment and dbt pricing here.
Dbt’s key functions include the following:
- Dbt tests data quality, integration, and code performance. Quality is built into the tool, and the others can be coded and run in dbt (automatically in some cases).
- Create test programs that check for missing/incomplete entries, unique constraints, and accepted values within specific columns.
- Manually run scripts that will then run automated tests and deploy changes after passing said tests. Notifications can be programmed to be sent out if a certain test fails.
- Dbt has a built-in package manager that allows analysts and engineers to publish both public and private repositories. These can then be referenced by other users.
- Deploy a dbt project after merging updated code in git.
- Updates to the server can run on a set schedule in git.
- Dbt automatically creates a visual representation of how data flows throughout an organization.
- Easily create documentation through schema files.
- Documents are automatically generated and accessible through dbt, with the ability to send files in deployment. Maps are created to show the flow of data through each table in the ETL process.
One other thing to know about dbt is that you can use Jinja, a coding language, in conjunction with SQL to establish macros and integrate other functions outside of SQL’s capabilities. Jinja is particularly helpful when you have to repeat calculations or need to condense code. Using Jinja will enhance SQL within any dbt project, and our dbt consultants are available to help you harness Jinja’s possibilities within dbt.
Where could dbt fit in your tech ecosystem?
As previously mentioned, dbt has a free open source version and a paid cloud version, giving your company flexibility in budget and functionality to build the right tech stack for your organization. Dbt fits nicely with an existing modern technology stack with native connections to tools such as Stitch, Fivetran, Redshift, Snowflake, BigQuery, Looker, and Mode.
With dbt, data analysts and data engineers are able to more effectively transform data in your data warehouses by easily testing and deploying changes to the transformation process, and they gain a visual representation of the dependencies at each stage of the process. Dbt allows you to see how data flows throughout your organization, potentially enhancing the results you see from other data and analytics technologies.
Are you ready to discuss implementing an enterprise data initiative? Contact with on of our data consulting experts.