StreamSets: A Powerful Data Engineering + DataOps Tool
Updated May 06, 2022

Abhishek Katara | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with StreamSets DataOps Platform

As part of a healthcare service provider account, our data engineering team used StreamSets to design data pipelines that hydrate/load on-prem data (from various RDBMS sources) into the cloud, i.e. Azure and GCP. Data scientists and analysts then use these datasets to generate patterns and insights for customers' healthcare benefits.

We use StreamSets heavily, not only for our batch use cases but also for real-time ones, such as consuming from a Kafka topic and streaming the data to Azure Event Hub.
  • An easy-to-use canvas for creating data engineering pipelines.
  • A wide range of available stages, i.e. sources, processors, executors, and destinations.
  • Supports both batch and streaming pipelines.
  • Scheduling is much easier than with cron.
  • Integration with key vaults for fetching secrets.
  • Monitoring/visualization could be improved a lot (e.g. monitoring a job to see what happened with a data transfer 7 days ago).
  • The logging mechanism could be simplified (logs can be filtered by "ERROR", "DEBUG", "ALL", etc., but it still takes some time to become familiar with them).
  • Auto-scaling for heavy loads is needed (transferring more than 5 million records from JDBC to an ADLS destination as Avro files takes a long time).
  • There is no concept of global variables; there should be a way to create them.
  • Simplified and improved the overall data ingestion and integration process.
  • Support for various heterogeneous source systems such as RDBMS, Kafka, Salesforce, and Key Vault.
  • Secure, easy-to-launch integration tool.
  • Cloudera Distribution Hadoop (CDH)
StreamSets is a one-stop solution for designing data engineering pipelines and doesn't require deep programming knowledge. It's so user-friendly that anyone on the team can contribute to pipeline design. In Hadoop, one has to be proficient in programming to use its various components such as Hive, HDFS, Kafka, etc., whereas in StreamSets all these stages are built in and ready to use with minor configuration.

Do you think StreamSets DataOps Platform delivers good value for the price?

Yes

Are you happy with StreamSets DataOps Platform's feature set?

Yes

Did StreamSets DataOps Platform live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of StreamSets DataOps Platform go as expected?

I wasn't involved with the implementation phase

Would you buy StreamSets DataOps Platform again?

Yes

We design StreamSets pipelines for nearly all of our batch and streaming scenarios. A few well-suited, tried-and-tested use cases are listed below:
1. JDBC to ADLS data transfer based on source refresh frequency.
2. Kafka to GCS.
3. Kafka to Azure Event Hub.
4. HDFS to ADLS data transfer.
5. Schema generation to generate Avro.
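
To illustrate what the schema generation use case involves, here is a minimal standalone Python sketch that infers an Avro record schema from one sample record. It is not StreamSets code and does not reflect how StreamSets implements this internally; the function name `infer_avro_schema`, the type mapping, and the sample field names are all hypothetical, and the choice to treat a missing value as a nullable string is an assumption made only for this example.

```python
import json
from typing import Any

# Assumed mapping from Python types to Avro primitive type names.
# Exact type lookup is used so that bool is not mistaken for int.
_AVRO_PRIMITIVES = {
    bool: "boolean",
    int: "long",
    float: "double",
    str: "string",
    bytes: "bytes",
}


def infer_avro_schema(name: str, record: dict[str, Any]) -> dict[str, Any]:
    """Build an Avro record schema (as a dict) from one flat sample record."""
    fields = []
    for field_name, value in record.items():
        if value is None:
            # No sample value to inspect: assume a nullable string (illustrative choice).
            avro_type: Any = ["null", "string"]
        elif type(value) in _AVRO_PRIMITIVES:
            avro_type = _AVRO_PRIMITIVES[type(value)]
        else:
            raise TypeError(f"unsupported type for field {field_name!r}: {type(value).__name__}")
        fields.append({"name": field_name, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    # Hypothetical sample row, e.g. as read from a JDBC source.
    sample = {"id": 1, "label": "example", "amount": 12.5, "active": True, "note": None}
    print(json.dumps(infer_avro_schema("SampleRow", sample), indent=2))
```

A real pipeline would also need to handle nested records, arrays, logical types such as dates and decimals, and schema evolution, all of which this sketch leaves out.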

The easy-to-design canvas, job scheduling, fragment creation and reuse, and the wide range of built-in stages make it an even more favorable tool for me to design data engineering pipelines.

StreamSets Feature Ratings

Visualization Dashboards: 7
Low Latency: 8
Integrated Development Tools: 10
Data wrangling and preparation: 10
Data Enrichment: 10