dbt - an excellent transformation tool for the masses
Overall Satisfaction with dbt
We use dbt to transform source data into meaningful report data that is easy to consume in dashboards, giving our management the insights they need to steer the company. We use Fivetran and other tools to land the data in our Snowflake data warehouse, and then dbt to transform that data and put it to use.
Pros
- Text-based integration with GitHub - it's very easy to see changes to code over time.
- Builds on SQL, so the learning curve is short for most developers.
- Removes complexity of deployment to multiple environments.
- Adds powerful templating, making dynamic SQL easy.
- Data lineage and documentation.
- Easy to add automated testing for data quality.
- Easy to switch output between tables and views by setting a flag.
- Excellent documentation, Slack community, training, and support.
- Packages (libraries) exist with helpful code readily available.
- Failsafe - dbt core is open source so our investment in code is sound even if they hike the prices.
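
To make the templating and table/view points above concrete, here is a minimal sketch of what a dbt model file can look like. The model name, the `stg_payments` upstream model, and the column names are hypothetical and not taken from our project; only `config`, `ref`, and the Jinja loop are standard dbt features.

```sql
-- models/payment_totals.sql (hypothetical file name)
-- Switching the output between a view and a table is a one-word change here.
{{ config(materialized='view') }}

-- Hypothetical list of values to pivot into columns.
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{% if not loop.last %},{% endif %}
    {% endfor %}
from {{ ref('stg_payments') }}  -- ref() is also what lets dbt build table-level lineage
group by order_id
```

When dbt compiles this, the loop expands into one `sum(...)` column per payment method, so adding a method means editing the list rather than copying SQL.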
Cons
- No field-level lineage yet (lineage is currently at table level).
- No documentation inheritance - if a field is documented, a downstream field of the same name should be able to inherit the doc info.
- Python model support is still in beta.
Most Important Features
- SQL-based (can hire and scale quickly)
- GitHub integration (can see changes)
- dbt Core is open source - our investment in business logic stays sound in case it gets pricey
- Powerful - ability to do dynamic work easily (enhances SQL)
- Data lineage visibility
Return on Investment
- In 3 months we rewrote the data warehouse (15-20 sources) in dbt with 3 developers.
- We have been using it continuously for the past year with no issues.
- Sorry, I don't have ROI numbers, but the impact was huge.
SnapLogic is great at the Extraction and Load processes of ETL. It can pull data from anywhere, even behind firewalls, so if you need to get data from various APIs, databases, files, S3, SFTP, etc., it is easy to do so. However, it requires special knowledge to build and maintain pipelines, and it's not easy to find SnapLogic developers. It has a nice GUI, so you can see the steps (snaps) in the pipelines. But if you modify a pipeline, it's very hard to tell where the modification happened (you could compare heavily nested JSON, if you have saved it, to get an idea, but it's not very user-friendly or visible), which makes group development tricky. Looping and reuse can also be complicated to accomplish. Deploying to multiple environments required us to keep copies of the pipelines in various projects, which opened the possibility of them getting out of sync with each other.

dbt doesn't do any Extraction or Load; it only does the T(ransformation). So if you don't need to get data from many places (APIs, files, S3, databases), or can have another tool (Fivetran / Stitch) do that piece, then dbt is excellent. dbt uses text files in GitHub, so it's very easy to compare and see the changes each developer makes at any point in time. dbt is based on SQL, with Jinja (a Python-like templating language) that lets you do dynamic things easily to generate the resulting SQL. It is very easy to work as a team with dbt, easy to hire SQL developers, and easy for them to become productive. It also has lineage, documentation, and testing built in. It's set up to deploy to multiple environments from the same source base (or a specific branch), so you know the code is consistent among environments. The code must exist in GitHub before it's deployed, which removes the potential for human error during deployment.
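
As an illustration of one code base serving several environments, a model can branch on the active dbt target. This is only a sketch under assumptions: the target name `dev`, the `stg_orders` model, and the `order_date` column are hypothetical, and the `dateadd` call assumes Snowflake syntax.

```sql
-- models/recent_orders.sql (hypothetical file name)
select *
from {{ ref('stg_orders') }}  -- resolves to the schema of whichever environment is being built
{% if target.name == 'dev' %}
-- Assumption: a target called 'dev' exists; limit data there to keep runs fast.
where order_date >= dateadd(day, -3, current_date)
{% endif %}
```

The same file is deployed everywhere; only the target selected at run time changes what SQL is generated.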
Do you think dbt delivers good value for the price?
Yes
Are you happy with dbt's feature set?
Yes
Did dbt live up to sales and marketing promises?
Yes
Did implementation of dbt go as expected?
Yes
Would you buy dbt again?
Yes