Item: dbt
Rating: 9
Author: Judy Campion

Use Cases and Deployment Scope

We use dbt to transform source data into meaningful report data that can be easily consumed in dashboards, giving our management the insights and ability to steer the company. We use Fivetran and other tools to land the data in our Snowflake data warehouse, and then dbt to transform that data and make it usable.

Pros and Cons

Text-based integration with GitHub - it's very easy to see changes to code over time.
Leverages SQL, which makes for a fast learning curve for most developers.
Removes the complexity of deploying to multiple environments.
Adds powerful templating, making dynamic SQL easy.
Data lineage and documentation.
Easy to add automated testing for data quality.
Easy to switch output between tables and views by setting a flag.
Excellent documentation, Slack community, training, and support.
Packages (libraries) exist with helpful code readily available.
Failsafe - dbt core is open source so our investment in code is sound even if they hike the prices.
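
To illustrate the automated testing point above: a dbt data test is, in essence, a query that returns the rows violating a rule, and the test passes when no rows come back. The sketch below mimics that idea in plain Python with an in-memory SQLite database. It is not dbt code; the `orders` table, its columns, and the helper `failing_rows` are hypothetical names chosen for the example.

```python
import sqlite3

def failing_rows(conn, sql):
    """Run a test query; any returned row counts as a failure."""
    return conn.execute(sql).fetchall()

# Hypothetical sample data with one duplicate id and one missing customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 12), (3, None)],
)

# "not null" style check: rows where customer_id is missing.
NOT_NULL_SQL = "SELECT order_id FROM orders WHERE customer_id IS NULL"

# "unique" style check: order_id values that appear more than once.
UNIQUE_SQL = """
    SELECT order_id
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
"""

results = {
    "not_null_customer_id": failing_rows(conn, NOT_NULL_SQL),
    "unique_order_id": failing_rows(conn, UNIQUE_SQL),
}

for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} rows)"
    print(f"{name}: {status}")
```

With the sample data both checks report a failure, which is the point: the failing rows themselves tell you what to fix.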

Field-level lineage (currently at table level)
Documentation inheritance - if a field is documented the downstream field of the same name could inherit the doc info
Adding Python model support (in beta now)

Most Important Features

SQL-based (can hire and scale quickly)
GitHub integration (can see changes)
dbt Core is open source - our investment in business logic stays sound even if it gets pricey
Powerful - ability to do dynamic work easily (enhances SQL)
Data lineage visibility

Return on Investment

In 3 months we re-wrote the data warehouse (15-20 sources) in dbt with 3 developers.
We have been using it continually for the past year with no issues.
Sorry, I don't have ROI numbers but the impact was huge.

Alternatives Considered

SnapLogic

SnapLogic is great at the Extraction and Load processes of ETL. It can pull data from anywhere, even behind firewalls. So if you need to get data from various APIs, databases, files, S3, SFTP, etc., it is easy to do so. However, it requires special knowledge to build and maintain pipelines, and it's not easy to find SnapLogic developers. It has a nice GUI so you can see the steps (snaps) in the pipelines. If you modify a pipeline, it's very hard to tell where the modification happened (you could compare heavily nested JSON, if you have saved it, to get an idea, but it's not very user-friendly or visible), which makes group development tricky. Also, looping/reuse can be complicated to accomplish. Deploying to multiple environments required us to keep copies of the pipelines in various projects, which allowed them to get out of sync with each other.

dbt doesn't do any Extraction or Load; it just does T(ransformation). So, if you don't need to get data from many places (APIs, files, S3, databases) or can have another tool (Fivetran / Stitch) do that piece, then dbt is excellent. dbt uses text files in GitHub, so it's very easy to compare and see the changes each developer makes at any point in time. dbt is based on SQL, with Jinja (a templating language with Python-like syntax) that lets you do dynamic things easily to generate the resulting SQL. It is very easy to work as a team with dbt, easy to hire SQL developers, and easy for them to become productive. It also has lineage, documentation, and testing built in. It's set up to enable deployment to multiple environments from the same source base (or a specific branch), so you know the code is consistent among environments. The code must exist in GitHub before it's deployed, removing the potential for human error during deployment.
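
The "dynamic SQL" point can be sketched roughly as follows. In dbt this would be written as a Jinja loop inside the model file; the version below uses plain Python string formatting instead, only to show the same idea of generating repetitive SQL from a list. The `payments` table, its columns, and the function `build_pivot_sql` are made up for illustration.

```python
def build_pivot_sql(methods):
    """Generate one SUM(CASE ...) column per payment method."""
    columns = [
        f"    SUM(CASE WHEN payment_method = '{m}' THEN amount ELSE 0 END) AS {m}_amount"
        for m in methods
    ]
    return (
        "SELECT\n"
        "    order_id,\n"
        + ",\n".join(columns)
        + "\nFROM payments\nGROUP BY order_id"
    )

# Adding a new payment method means adding one list entry, not another
# hand-written column.
sql = build_pivot_sql(["card", "cash", "voucher"])
print(sql)
```

Because the generated SQL lives in a text file under version control, a change to the list shows up as a one-line diff.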

Key Insights

Do you think dbt delivers good value for the price?

Yes

Are you happy with dbt's feature set?

Yes

Did dbt live up to sales and marketing promises?

Yes

Did implementation of dbt go as expected?

Yes

Would you buy dbt again?

Yes

Likelihood to Recommend

If you can load your data first into your warehouse, dbt is excellent. It does the T(ransformation) part of ELT brilliantly but does not do the E(xtract) or L(oad) part. If you know SQL or your development team knows SQL, it's a framework and extension around that. So, it's easy to learn and easy to hire people with that technical skill (as opposed to specific Informatica, SnapLogic, etc. experience). dbt uses plain text files and integrates with GitHub. You can easily see the changes made between versions. In GUI-based tools it was always hard to tell what someone had changed. Each "model" is essentially a "SELECT" statement. You never need to do a "CREATE TABLE" or "CREATE VIEW" - it's all done for you, leaving you to work on the business logic. Instead of saying "FROM specific_db.schema.table" you indicate "FROM {{ ref('my_other_model') }}". From these references dbt builds an internal dependency graph (a DAG) that you can view. When you deploy, the dependencies work like magic in your various environments. They also have great documentation, an active Slack community, training, and support. I like the enhancements they have been making and I believe they are headed in a good direction.
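
Here is a rough sketch of why the dependencies "work like magic". In a dbt model the reference is written inside Jinja braces, and those references are enough to derive both a build order and environment-specific table names. The code below is a simplified Python illustration of that idea, not dbt's actual implementation; the model names, the `analytics_dev` schema, and the helper functions are all hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files: name -> SELECT statement using ref().
MODELS = {
    "stg_orders": "SELECT * FROM raw.orders",
    "stg_customers": "SELECT * FROM raw.customers",
    "customer_orders": (
        "SELECT * FROM {{ ref('stg_orders') }} o "
        "JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.id"
    ),
    "revenue_report": "SELECT * FROM {{ ref('customer_orders') }}",
}

REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

def dependencies(sql):
    """Return the set of model names referenced via ref() in one model."""
    return set(REF_PATTERN.findall(sql))

def build_order(models):
    """Order models so each one comes after everything it references."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())

def compile_sql(sql, schema):
    """Swap each ref() for a schema-qualified name in the target environment."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)

order = build_order(MODELS)
print(order)
print(compile_sql(MODELS["revenue_report"], "analytics_dev"))
```

The same model text compiles against a different schema per environment, which is why no hard-coded "specific_db.schema.table" names are needed.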

dbt - an excellent transformation tool for the masses
