Overview
What is StreamSets?
StreamSets, based in San Francisco, offers its DataOps Platform, a subscription-based streaming analytics platform that includes StreamSets Data Collector for data source management, Control Hub for data movement architecture management, StreamSets Data Collector Edge for IoT management, DataFlow Performance Manager (DPM), and…
StreamSets: A Powerful Data Engineering + DataOps Tool
Pricing
Entry-level setup fee?
- No setup fee
Offerings
- Free Trial
- Free/Freemium Version
- Premium Consulting/Integration Services
Alternatives Pricing
What is Striim?
Striim is an enterprise-grade platform that offers continuous real-time data ingestion, high-speed in-flight stream processing, and sub-second delivery of data to cloud and on-premises endpoints.
What is Cloudera Data Platform?
Cloudera Data Platform (CDP), launched September 2019, is designed to combine the best of Hortonworks and Cloudera technologies to deliver an enterprise data cloud. CDP includes the Cloudera Data Warehouse and machine learning services as well as a Data Hub service for building custom business…
Product Details
StreamSets Technical Details
| Attribute | Value |
|---|---|
| Operating Systems | Unspecified |
| Mobile Application | No |
Reviews and Ratings
(8)
Community Insights
Business Problems Solved
Users have found StreamSets to be a versatile and user-friendly platform that solves a variety of data integration challenges. One key use case is the ability to develop on-premises and deploy to the cloud, which helps users control their cloud budget. The platform has also been praised for its seamless integration with Apache Kafka and Apache NiFi, simplifying the process of connecting these tools with a data lake.
StreamSets has proven valuable for real-time data consumption, filtering, tagging, and system monitoring, as well as anomaly detection based on traffic patterns. Users have relied on the platform for data movement, migration, and ingestion, reducing downtime and simplifying the process. StreamSets has also been widely used to extract data from various source systems, including IoT devices, enabling users to gain insights from previously inaccessible data sources.
Users appreciate the tool's ability to handle different data formats elegantly and to save time compared with hand-coded ETL. It has been used effectively for big data ETL problems, offering fast transfer, support for various sources and destinations, and prompt support. StreamSets has also been used in AI/ML tasks such as building transformations for knowledge graphs.
Overall, StreamSets has proven reliable and efficient at ingesting data from various sources, meeting the needs of users across industries and providing flexibility in designing pipelines with minimal coding.
Recommendations
Users have made several recommendations for StreamSets based on their experiences.
Firstly, they suggest trying out the data collector, as it is free to download and install. This allows users to explore the capabilities of the tool without any financial commitment.
Secondly, users recommend using Docker for local testing and deployment in a development environment. This suggestion helps streamline the process and ensure smooth integration with other systems.
Lastly, users praise StreamSets as one of the best ETL/ELT tools for data ingestion. They mention its ability to handle large volumes of data efficiently. Additionally, users appreciate the high level of customization offered by StreamSets, allowing them to tailor it to meet their specific enterprise needs. They also commend the support team for their dedication in tweaking the software for missing components.
To optimize performance, users advise analyzing data transfer requirements carefully and configuring the data conversion nodes appropriately. They emphasize the need for sufficient memory to support these requirements.
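As a rough illustration of the sizing advice above, the following Python sketch estimates how much memory a set of pipelines might need to hold in-flight batches. Everything in it is an assumption made for illustration: the function name, its parameters, and the overhead factor are hypothetical and are not StreamSets settings or an official formula. Real sizing should come from measuring the actual workload.

```python
def estimate_batch_memory_mb(records_per_batch, avg_record_bytes,
                             concurrent_pipelines, overhead_factor=2.0):
    """Back-of-envelope estimate of memory held by in-flight batches, in MB.

    All parameters are hypothetical illustration inputs:
      records_per_batch    -- records processed per batch in each pipeline
      avg_record_bytes     -- average serialized size of one record
      concurrent_pipelines -- number of pipelines running at the same time
      overhead_factor      -- assumed multiplier for in-memory object overhead
    """
    inputs = (records_per_batch, avg_record_bytes,
              concurrent_pipelines, overhead_factor)
    if any(value <= 0 for value in inputs):
        raise ValueError("all inputs must be positive")

    # Raw payload size across all pipelines, before any overhead.
    raw_bytes = records_per_batch * avg_record_bytes * concurrent_pipelines

    # Apply the assumed overhead and convert bytes to megabytes.
    return raw_bytes * overhead_factor / (1024 * 1024)


if __name__ == "__main__":
    needed = estimate_batch_memory_mb(
        records_per_batch=1000,
        avg_record_bytes=2048,
        concurrent_pipelines=4,
    )
    print(f"Estimated in-flight batch memory: {needed:.1f} MB")
```

An estimate like this is only a starting point for a conversation about memory; wide records, large lookups, or format conversion steps can push actual usage well above it.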
Overall, these recommendations highlight StreamSets' value as a versatile tool for fast-paced Data Engineering pipeline development and reliable data ingestion, especially when dealing with large amounts of data.
Reviews
We use StreamSets heavily, not only for our batch use cases but also for real-time use cases such as consuming from a Kafka topic and streaming data to Azure Event Hub.
- An easy-to-use canvas for creating data engineering pipelines.
- A wide range of available stages, i.e., Sources, Processors, Executors, and Destinations.
- Supports both batch and streaming pipelines.
- Scheduling is much easier than with cron.
- Integration with key vaults for fetching secrets.
- Monitoring and visualization could be improved considerably (e.g., to monitor a job and see what happened with a data transfer 7 days ago).
- The logging mechanism could be simplified (logs can be filtered by "ERROR", "DEBUG", "ALL", etc., but it still takes some time to become familiar with them).
- Auto-scaling for heavy loads is needed (transferring more than 5 million records from JDBC to an ADLS destination as Avro files takes a long time).
- There is no concept of global variables; there should be a way to create them.
1. JDBC to ADLS data transfer based on source refresh frequency.
2. Kafka to GCS.
3. Kafka to Azure Event Hub.
4. HDFS to ADLS data transfer.
5. Schema generation to generate Avro.
The easy-to-use design canvas, job scheduling, fragment creation and reuse, and a wide range of built-in stages make it an even more favorable tool for me to design data engineering pipelines.
- Visualization Dashboards: 7.0 (70%)
- Low Latency: 8.0 (80%)
- Integrated Development Tools: 10.0 (100%)
- Data wrangling and preparation: 10.0 (100%)
- Data Enrichment: 10.0 (100%)
- Simplified and improved the overall data ingestion and integration process.
- Support for various heterogeneous source systems such as RDBMS, Kafka, Salesforce, and Key Vault.
- Secure, easy-to-launch integration tool.
- Cloudera Distribution Hadoop (CDH)