StreamSets: A Powerful Data Engineering + DataOps Tool
Overall Satisfaction with StreamSets DataOps Platform
As part of a healthcare service provider account, our data engineering team used StreamSets to design data pipelines that hydrate/load on-premises data (from various RDBMS sources) into the cloud, i.e., Azure and GCP. Data scientists and analysts then use these datasets to find patterns and generate insights that improve customers' healthcare benefits.
We use StreamSets heavily not only for our batch use cases but also for real-time ones, such as consuming from a Kafka topic and streaming the data to Azure Event Hub.
Pros
- An easy-to-use canvas for creating data engineering pipelines.
- A wide range of available stages, i.e., sources, processors, executors, and destinations.
- Supports both batch and streaming pipelines.
- Scheduling is much easier than with cron.
- Integration with key vaults for fetching secrets.
Cons
- Monitoring/visualization could be improved and enhanced considerably (e.g., monitoring a job to see what happened with a data transfer 7 days ago).
- The logging mechanism could be simplified (logs can be filtered by "ERROR", "DEBUG", "ALL", etc., but it still takes some time to get familiar with them).
- Auto-scaling for heavy loads is lacking (transferring more than 5 million records from a JDBC source to an ADLS destination as Avro files takes a long time).
- There is no concept of global variables, which should be added.
Return on Investment
- Simplified and improved the overall data ingestion and integration process.
- Support for various heterogeneous source systems like RDBMS, Kafka, Salesforce, and Key Vault.
- A secure, easy-to-launch integration tool.
Alternatives Considered
- Cloudera Distribution of Hadoop (CDH)
StreamSets is a one-stop solution for designing data engineering pipelines and doesn't require deep programming knowledge. It's so user-friendly that anyone on the team can contribute ideas to the pipeline design. With Hadoop, one has to be proficient in programming to use its various components like Hive, HDFS, and Kafka, but in StreamSets all of these stages are built in and ready to use with minor configuration.
Do you think IBM StreamSets delivers good value for the price?
Yes
Are you happy with IBM StreamSets's feature set?
Yes
Did IBM StreamSets live up to sales and marketing promises?
I wasn't involved with the selection/purchase process
Did implementation of IBM StreamSets go as expected?
I wasn't involved with the implementation phase
Would you buy IBM StreamSets again?
Yes