Likelihood to Recommend If you can load your data first into your warehouse, dbt is excellent. It does the T(ransformation) part of ELT brilliantly but does not do the E(xtract) or L(oad) part. If you know SQL or your development team knows SQL, it's a framework and extension around that. So, it's easy to learn and easy to hire people with that technical skill (as opposed to specific Informatica,
SnapLogic , etc. experience). dbt uses plain text files and integrates with GitHub. You can easily see the changes made between versions. In GUI-based UIs it was always hard to tell what someone had changed. Each "model" is essentially a "SELECT" statement. You never need to do a "CREATE TABLE" or "CREATE VIEW" - it's all done for you, leaving you to work on the business logic. Instead of saying "FROM specific_db.schema.table" you indicate "FROM ref('my_other_model')". It creates an internal dependency diagram you can view in a DAG. When you deploy, the dependencies work like magic in your various environments. They also have great documentation, an active slack community, training, and support. I like the enhancements they have been making and I believe they are headed in a good direction.
Read full review Majorly for all Batch and Streaming Scenarios we are designing StreamSets pipelines, few best suited and tried out use cases below : 1. JDBC to ADLS data transfer based on source refresh frequency. 2. Kafka to GCS. 3. Kafka to Azure Event. 4. Hub HDFS to ADLS data transfer. 5. Schema generation to generate Avro. The easy to design Canvas, Scheduling Jobs, Fragment creation and utilization, an inbuilt wide range of Stage availability makes it an even more favorable tool for me to design data engineering pipelines.
Read full review Pros user experience makes it easy to work with SQL and version control customer success team and the dbt (data build tool) community help establish best practices thorough and clear documentation Read full review A easy to use canvas to create Data Engineering Pipeline. A wide range of available Stages ie. Sources, Processors, Executors, and Destinations. Supports both Batch and Streaming Pipelines. Scheduling is way easier than cron. Integration with Key-Vaults for Secrets Fetching. Read full review Cons Slow load times of the dbt cloud environment (they're working on it via a new UI though) More out-of-the-box solutions for managing procedures, functions, etc would be nice to have, but honestly, it's pretty easy to figure out how to adapt dbt macros Read full review Monitoring/Visualization can be improvised and enhanced a lot (e.g. to monitor a Job to see what happened 7 days back with data transfer). The logging mechanism can be simplified (Logs can be filtered with "ERROR", "DEBUG", "ALL" etc but still takes some time to get familiar for understanding). Auto Scalability for heavy load transfer (Taking much time for >5 million record transfer from JDBC to ADLS destination in Avro file transfer). There should be a concept of creating Global variables which is missing. Read full review Alternatives Considered Most ETL pipeline products have a T layer, but dbt just does it better. The transformation is on steroids compared to the others. Also, just allows much more Adhoc solutions for very specific projects. Those ETL tools are probably better on the T part if you don't need too many transforms - also dbt is pretty much free dependent on how you work it, also extremely scalable.
Read full review StreamSets is a one-stop solution to design Data engineering Pipelines and doesn't require deep Programming knowledge, It's so user-friendly that anyone in Team can contribute to the Idea of pipeline design. In
Hadoop One has to be programming proficient to use its various components like Hive, HDFS, Kafka, etc but in StreamSets all these stages are built-in and ready to use with minor configuration.
Read full review Return on Investment Simplified our BI layer for faster load times Increased the quality of data reaching our end users Makes complex transformations manageable Read full review Simplified Improvised Overall data ingestion and Integration Process. Support to various Hetrogenous Source systems like RDBMS< Kafka, Salesforce, Key Vault. Secure, easy to launch Integration tool. Read full review ScreenShots