Apache Sqoop is a tool for transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
N/A
DemandTools
Score 9.8 out of 10
N/A
DemandTools for AppExchange is a data quality toolset for Salesforce.com CRM-centric customers.
The product comprises 11 individual modules to control, standardize, verify, deduplicate, import and manipulate Salesforce and/or Force.com data.
Sqoop is great for moving data between a JDBC-compliant database and a Hadoop environment. It is built for users who need a few simple CLI options to import a selection of database tables into Hadoop, run analysis on large datasets that the source database could not handle due to resource constraints, and then export the results back into that database (or another one). Sqoop falls short when extra, customized processing is needed between the database extract and the Hadoop load; in that case Apache Spark's JDBC utilities might be preferred.
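As a rough illustration of the import side of that workflow, a single sqoop import call is usually enough. This is a minimal sketch; the connection string, credentials, table, and target directory are placeholders, not values from the review:

    # Pull one table from a JDBC-compliant database into HDFS.
    # Host, database, table, and user below are hypothetical.
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4
    # If the bundled drivers do not cover the database, a vendor JDBC driver
    # jar can be added to Sqoop's lib directory and selected with --driver.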
DemandTools is perfect for any system that constantly adds new records to its database. For example, in higher education, we are constantly purchasing search names from various vendors and DemandTools allows us to make sure we are not doubling up on the same records. It saves us money in the long run as we are not mailing out multiple copies of our brochures to the same person.
Sqoop2 development seems to have stalled. I have set it up outside of a Cloudera CDH installation, and I actually prefer its "Sqoop Server" model to the CLI-only client that is Sqoop1. This works especially well in a microservices environment, where there is only one place to maintain the JDBC drivers used by Sqoop.
I wish I could make changes to my existing scenarios using save rather than having to create a whole new scenario. Maybe you can; I just haven't been able to.
Some features aren't intuitive, and the tool takes a while to learn.
It's a great product. The only thing that holds us back is it was frustrating working with their sales team. We also don't like that when Validity purchased DemandTools they immediately started charging us quite a bit while it had been free for non-profit users when CRMFusion owned it. They also don't let you buy it for just 1 or 2 seats, you have to pay, I believe, in batches of 100 seats.
Support can be slow, so do not expect a quick turnaround for urgent issues. Help for specific queries is not there; product tech support is offered, but it would be great if query support, even paid, were available. The training webinars help with the basics but not much if you need advanced functionality.
The trainings are free live webinars that give you a solid base for getting started with the program. The only weakness is they don't have any advanced classes.
I had just started using Salesforce about 3 months before I did the implementation myself, and it was easy to do just following their step-by-step instructions.
Sqoop comes preinstalled on the major Hadoop vendor distributions as the recommended product for importing data from relational databases. The ability to extend it with additional JDBC drivers makes it very flexible for whatever environment it is installed in.
Spark also has a useful JDBC reader, can manipulate data in more ways than Sqoop, and can load the results into many other systems than just Hadoop (a minimal PySpark sketch follows below).
Kafka Connect JDBC is more for streaming database updates, often alongside change-data-capture tools such as Oracle GoldenGate or Debezium.
StreamSets and Apache NiFi both provide a more "flow-based programming" approach to graphically laying out connectors between various systems, including JDBC and Hadoop.
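For contrast, the Spark alternative mentioned above reads over JDBC straight into a DataFrame, which is where the extra, customized processing is easier than with Sqoop. A minimal PySpark sketch, with placeholder connection details and assuming the matching JDBC driver jar is already on Spark's classpath:

    # Read a table over JDBC, transform it, and land it in Hadoop as Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-import").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:mysql://db.example.com/sales")  # hypothetical database
        .option("dbtable", "orders")
        .option("user", "etl_user")
        .option("password", "secret")
        .load()
    )

    # Any custom processing goes here before writing to the data lake.
    orders.filter("status = 'SHIPPED'").write.parquet("/data/curated/orders")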
Though RingLead has much more to offer in terms of automation (and it is in Salesforce rather than a separate program), DemandTools still has our heart. DemandTools is very cost-friendly, and we were able to increase its value by programming in additional saved scenarios. I decreased admin time in the DemandTools suite by approximately 25% after implementing saved scenarios for every recurring update or list upload.
When combined with Cloudera's HUE, it can enable non-technical users to easily import relational data into Hadoop.
Being able to manipulate large datasets in Hadoop, and then load them into a type of "materialized view" in an external database system, has yielded great insights into the Hadoop data lake without continuously running large batch jobs (a minimal export sketch follows below).
Sqoop isn't very user-friendly for those uncomfortable with a CLI.
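The "materialized view" style export mentioned above is likewise a single CLI call, which also illustrates why the tool can feel unfriendly to anyone who avoids the command line. This is a minimal sketch with placeholder values; the target table is assumed to exist already:

    # Push an aggregated result set from HDFS back into a relational table.
    # Connection details, table, and directory below are hypothetical.
    sqoop export \
      --connect jdbc:postgresql://db.example.com/analytics \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/curated/order_summary \
      --input-fields-terminated-by ','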