Apache Flume vs. Apache Hadoop

Apache Flume

9 Reviews and Ratings

Apache Hadoop

271 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Flume	Score 7.1 out of 10	N/A	Apache Flume is a product enabling the flow of logs and other data into a Hadoop environment.	N/A
Hadoop	Score 7.5 out of 10	N/A	Hadoop is open-source software from Apache that supports distributed processing and data storage. Hadoop is popular for its scalability, reliability, and ability to run on commodity hardware.	N/A

Pricing

	Apache Flume	Apache Hadoop
Editions & Modules	No answers on this topic	No answers on this topic

Offerings

Pricing Offerings	Apache Flume	Hadoop
Free Trial	No	No
Free/Freemium Version	No	Yes
Premium Consulting/Integration Services	No	No
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	-	-

Community Pulse
Considered Both Products

Apache Flume

Verified User, Anonymous (chose Apache Flume; incentivized): Apache Flume is on par with Scribe with similar functions. Apache Kafka is general purpose while Apache Flume is specific to log aggregation. Google Pub/Sub and IBM MQ are costlier than Apache Flume (open source) and have a lot more cost associated with them. Apama …

Juan Francisco Tavira, Global Technology Centre - Middleware (chose Apache Flume; incentivized): Apache Flume is a very good solution when your project is not very complex at transformation and enrichment, and good if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL like Ab Initio or Attunity so you need to know exactly …

Apache Hadoop

Verified User, Anonymous (chose Hadoop; incentivized): Its open-source nature, its community support, and its being configurable.

Chantel Moreno, Finance & Accounting Professional (chose Hadoop; incentivized): Different departments of my organization have been getting the benefit from Apache Hadoop as it serves the purpose of saving lives when large amounts of data are unable to be converted and processed in a timely manner from a node or a simple computer. Hadoop also has an easier …

Peter Suter, Senior Software Engineer (GUI) (chose Hadoop; incentivized): I feel that this is a highly reliable and scalable computing technology that is highly capable of processing large data sets across multiple servers and thousands of machines in a well-defined and distributed manner. Apache Hadoop can automatically scale up the number …

Verified User, Anonymous (chose Hadoop; incentivized): Spark is a good alternative to Hadoop that can have faster querying and processing performance and can offer more flexibility in terms of applications that it can support. Google BigQuery has also been a great alternative and is especially great in terms of ease of use. The …

Joe Hughes, Senior DevOps Engineer (chose Hadoop; incentivized): MariaDB - Better to be already in the cloud you will use it for. Issues have improved as it has matured over the years. CockroachDB - Not nearly as performant (even out of the box) as Apache Hadoop. More configurations required just to make it work. In-memory caching is an issue.

Blake Baron, Senior Financial Analyst (chose Hadoop; incentivized): Hadoop utilizes a SQL structure, which is great. You pay less for the services, but it's definitely less of an enterprise-level option and more just a good place to store your seldom-used data. Teradata and AWS are a lot faster in returning queries than Hadoop, but you pay …

Gene Baker, Vice President, Chief Architect, Development Manager and Software Engineer (chose Hadoop; incentivized): Hands down, Hadoop is less expensive than the other platforms we considered. Cloudera was easier to set up but the expense ruled it out. MS-SQL didn't have the performance we saw with the Hadoop clusters and was more expensive. We considered MS-SQL mainly for its ability …

Verified User, Anonymous (chose Hadoop; incentivized): When comparing the sophistication of IBM GPFS (Spectrum Scale) to Hadoop, it is clear that Spectrum Scale is a much better choice. That is maybe something you don't want to hear, but in all of our research, this has been the final decision of the client.

Kunal Sonalkar, Data Research Analyst (chose Hadoop; incentivized): Apache Spark can be considered as an alternative because of its similar capabilities around processing and storing big data. The reason we went with Hadoop was the literature available online and integration capability with platforms like R Studio. The popularity of Hadoop has …

Kartik Chavan, Peer Educator (Tutor) & Supplemental Instructions (SI) Leader (chose Hadoop; incentivized): For real-time streaming, use Spark; can provide a stark contrast to the way MR works. Hadoop offers a scalable, cost-effective and highly available solution for big data storage and processing. Amazon Redshift is somewhat closer to Hadoop. But to analyze petabytes of data Hadoop …

Bharadwaj (Brad) Chivukula, Sr. Engineering Manager/Delivery Manager (chose Hadoop; incentivized): For real-time streaming, use Spark; can provide a stark contrast to the way MR works. Use Hive for querying purposes.

Johanes Siregar, Big Data Analytics - Data Engineer (chose Hadoop; incentivized): Hadoop offers a scalable, cost-effective and highly available solution for big data storage and processing. The use of a non-proprietary physical layer greatly reduces dependency on technology. It also offers elastic dimensioning capability when deployed on virtual machines or …

Verified User, Anonymous (chose Hadoop; incentivized): I haven't worked with other Big Data aggregation services like Hadoop. As far as I know, Hadoop is the leading choice in this field with good cause. There is a lot of community support, custom modules, paid consultants, free and paid training. All this makes it an ideal choice …

Gyan Dwibedy, Chief Data & Analytic Officer (chose Hadoop; incentivized): NoSQL databases were evaluated along with an MPP platform. Hadoop performs very well compared to the other platforms. Also, since a lot of investment goes into Hadoop, there is a good chance of getting what one needs from the developer community.

Vinay Suneja, Senior Consultant Level II (chose Hadoop; incentivized): Amazon Redshift is somewhat closer to Hadoop. But to analyze petabytes of data Hadoop has better performance.

Mark Gargiulo, Senior Automation Engineer (chose Hadoop; incentivized): As I am new to the Hadoop ecosystem I have not used or evaluated any other similar products at this time. This was handed to me from a previous much older installation that was very underutilized. Our new platform will be working the new cluster much harder with jobs that run …

Muhammad Fazalul Rahman, Research Assistant (chose Hadoop; incentivized): Hadoop was a cheaper alternative to Amazon. Since I had to pay for every minute I use with Amazon, I had to make sure multiple times that the code was good enough before I purchased with Amazon. But since Hadoop was available on the cluster, I had the opportunity to code on the …

Piyush Routray, Senior Software Developer (chose Hadoop; incentivized): Hadoop, being open source, is cheaper to use and do POCs for clients. Cloudera, Hortonworks and MapR also compete to contribute to open source Hadoop and keep their product conceptually similar to Hadoop.

Tushar Kulkarni (chose Hadoop; incentivized): Apache Spark has an in-memory processing model, making it powerful for lightning-fast data processing. Apache Spark also exposes APIs in Scala and Python, which are among the most commonly used programming languages in the data analytics and data processing domains.

Verified User, Anonymous (chose Hadoop; incentivized): Not used any other product than Hadoop and I don't think our company will switch to any other product, as Hadoop is providing excellent results. Our company is growing rapidly, Hadoop helps to keep up our performance and meet customer expectations. We also use HDFS which …

Mrugen Deshmukh, Senior Software Engineer (chose Hadoop; incentivized): Hadoop provides storage for large data sets and a powerful processing model to crunch and transform huge amounts of data. It does not assume the underlying hardware or infrastructure and enables the users to build data processing infrastructure from commodity hardware. All the …

Sudhakar Kamanboina, Software Engineer (chose Hadoop; incentivized): Hadoop has a master-slave architecture and comes with more features than Splunk.

Gaurav Kasliwal, Software Development Engineer (chose Hadoop; incentivized): Fast and scalable. More reliable as compared to the other products I have used.

Verified User, Anonymous (chose Hadoop; incentivized): Processing of big data has been the ultimate need behind my choosing Hadoop. Big data is massive and messy, and it's coming at you uncontrolled. Data are gathered to be analyzed to discover patterns and correlations that could not be initially apparent, but might be useful in …

Verified User, Anonymous (chose Hadoop): Hadoop solves a lot of problems (involving unstructured data and huge volumes of data) better than traditional database systems. And it is completely free and open source (so lots of cost savings). Data analysis is very fast when compared to old systems, resulting in more …

Best Alternatives
	Apache Flume	Apache Hadoop
Small Businesses	No answers on this topic	No answers on this topic
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	Cloudera Manager Score 9.9 out of 10
Enterprises	IBM Analytics Engine Score 7.1 out of 10	IBM Analytics Engine Score 7.1 out of 10
All Alternatives	View all alternatives	View all alternatives

User Ratings
	Apache Flume	Apache Hadoop
Likelihood to Recommend	8.0 (0 ratings)	8.0 (0 ratings)
Likelihood to Renew	- (0 ratings)	9.6 (0 ratings)
Usability	- (0 ratings)	8.0 (0 ratings)
Performance	- (0 ratings)	8.0 (0 ratings)
Support Rating	5.0 (0 ratings)	7.5 (0 ratings)
Online Training	- (0 ratings)	6.1 (0 ratings)

User Testimonials
	Apache Flume	Apache Hadoop
Likelihood to Recommend	Apache Flume is well suited to small-batch and near-real-time processing projects that take data from one point to another with local processing (that is, no external enrichment). Filtering, transforming and pushing to multiple destinations are common ground for Flume. It is not so nice to use if your data needs external enrichment (taking data from external databases or web services), as transactions and (micro)batches may lead to reprocessing, and it relies on the application to avoid duplicates. (Juan Francisco Tavira, Global Technology Centre - Middleware; incentivized)	Apache Hadoop (and its subsequent add-ons) is well suited to larger, unstructured data flows, such as aggregation of web traffic or advertising. Geospatial algorithms and their outputs are well suited to this kind of aggregation, as structuring that data is challenging, but leaving it unstructured and performing queries as needed is a better fit for most business models. With the advent of data science, I would expect Hadoop fits a LOT of their initial outputs quite well. (Joe Hughes, Senior DevOps Engineer; incentivized)
Pros	Multiple sources of data (sources) and destinations (sinks) that allow you to move data from and to any relevant data storage. It is very easy to set up and run. Very open to personalization: you can create filters, enrichment, new sources and destinations. (Juan Francisco Tavira, Global Technology Centre - Middleware; incentivized)	HDFS is reliable and solid, and in my experience with it, there are very few problems using it. Enterprise support from different vendors makes it easier to 'sell' inside an enterprise. It provides high scalability and redundancy. Horizontal scaling and distributed architecture. (Bharadwaj (Brad) Chivukula, Sr. Engineering Manager/Delivery Manager; incentivized)
Cons	It is very specific to log data ingestion, so it is pretty hard to use for anything else besides log data. Data replication is not built in and needs to be added on top of Apache Flume (not a hard job to do, though). (Verified User, Anonymous; incentivized)	Hadoop is a batch-oriented processing framework; it lacks real-time or stream processing. Hadoop's HDFS file system is not a POSIX-compliant file system and does not work well with small files, especially those smaller than the default block size. Hadoop cannot be used for running interactive jobs or analytics. (Mrugen Deshmukh, Senior Software Engineer; incentivized)
Likelihood to Renew	No answers on this topic	Hadoop is organization-independent and can be used for various purposes ranging from archiving to reporting, and can make use of economical commodity hardware. There is also a lot of saving in terms of licensing costs, since most of the Hadoop ecosystem is available as open source and is free. (Bhushan Lakhe, Senior Vice President)
Usability	No answers on this topic	Great! Hadoop has an easy-to-use interface that mimics most other data warehouses. You can access your data via SQL and have it display in a terminal before exporting it to your business intelligence platform of choice. Of course, for smaller data sets, you can also export it to Microsoft Excel. (Blake Baron, Senior Financial Analyst; incentivized)
Support Rating	Apache Flume is open source, so support is limited. Nevertheless, it has great documentation and best-practice documents from its end users, so it is not hard to use, set up and configure. (Verified User, Anonymous; incentivized)	We went with a third party for support, i.e., a consultant. Had we gone with Azure or Cloudera, we would have obtained support directly from the vendor. My rating is more about the third party we selected and doesn't reflect the overall support available for Hadoop. I think we could have done better in our selection process; however, we were trying to use an already approved vendor within our organization. There is plenty of self-help available for Hadoop online. (Gene Baker, Vice President, Chief Architect, Development Manager and Software Engineer; incentivized)
Online Training	No answers on this topic	Hadoop is a complex topic and best suited to classroom training. Online training is a waste of time and money. (Bhushan Lakhe, Senior Vice President)
Alternatives Considered	Apache Flume is on par with Scribe with similar functions. Apache Kafka is general purpose while Apache Flume is specific to log aggregation. Google Pub/Sub and IBM MQ are costlier than Apache Flume (open source) and have a lot more cost associated with them. Apama Streaming Analytics and Tibco Streaming are more comprehensive streaming solutions than Apache Flume so for deeper performance guarantees, it is easier to use Apache Flume. (Verified User, Anonymous; incentivized)	I feel that this is a highly reliable and scalable computing technology that is highly capable of processing large data sets across multiple servers and thousands of machines in a well-defined and distributed manner. Apache Hadoop can automatically scale up the number of servers and machines that are needed to process, store, and analyze data sets. It also handles explosions in data with big data technology. Apache Hadoop is good at handling all node failures as well. (Peter Suter, Senior Software Engineer (GUI); incentivized)
Return on Investment	Positive impact on ROI due to a reduction in manual labor to generate and maintain compliance reports based on logs. Positive impact on the business objective by reducing the need to provision compute for the log-aggregation IT stack in advance, adding it on an as-needed basis instead. (Verified User, Anonymous; incentivized)	Being open source makes it a popular choice for handling large chunks of data. It was free earlier but is now licensed; still, the enterprise edition is a fine-tuned version that is easier for new users and administrators to use. Our investment is worth every single penny. The initial cost is higher, as you might need to hire administrators to set up the cluster and make it scalable, but once that is done it's pretty easy. (Verified User, Anonymous; incentivized)
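
The Cons row above notes that HDFS does not work well with files smaller than the default block size. A rough way to see why is to count the metadata objects the NameNode must track. The sketch below is only an illustration under stated assumptions: a 128 MiB block size and roughly 150 bytes of NameNode heap per file or block object (a commonly quoted rule of thumb, not a measured value). The function names are hypothetical and are not part of any Hadoop API.

```python
import math

# Assumptions for illustration only, not values read from a real cluster.
BLOCK_SIZE = 128 * 1024 * 1024   # assumed HDFS block size (128 MiB)
BYTES_PER_OBJECT = 150           # assumed rule-of-thumb heap cost per file/block object


def namenode_objects(file_sizes):
    """Count file objects plus block objects for a list of file sizes in bytes."""
    blocks = sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)
    return len(file_sizes) + blocks


def estimated_heap_bytes(file_sizes):
    """Very rough NameNode heap estimate under the assumptions above."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


ONE_MIB = 1024 * 1024
small_files = [ONE_MIB] * 10_000     # 10,000 separate 1 MiB files
merged_file = [ONE_MIB * 10_000]     # the same data stored as one file

print("small files:", namenode_objects(small_files), "objects")
print("merged file:", namenode_objects(merged_file), "objects")
```

Under these assumptions the same volume of data costs far more metadata when it is stored as many small files, which is why small inputs are often consolidated into larger files before being written to HDFS.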