Apache Flume vs. Apache Hive

Apache Flume

9 Reviews and Ratings

Apache Hive

95 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Flume	Score 7.1 out of 10	N/A	Apache Flume is a product enabling the flow of logs and other data into a Hadoop environment.	N/A
Apache Hive	Score 8.0 out of 10	N/A	Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.	N/A

Pricing

	Apache Flume	Apache Hive
Editions & Modules	No answers on this topic	No answers on this topic
Free Trial	No	No
Free/Freemium Version	No	No
Premium Consulting/Integration Services	No	No
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	-	-

Community Pulse
Considered Both Products

Chose Apache Flume
Verified User, Anonymous: Apache Flume is on par with Scribe with similar functions. Apache Kafka is general purpose, while Apache Flume is specific to log aggregation. Google Pub/Sub and IBM MQ are costlier than Apache Flume (open source) and have a lot more cost associated with them. Apama …
Juan Francisco Tavira, Global Technology Centre - Middleware: Apache Flume is a very good solution when your project is not very complex in transformation and enrichment, and good if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL tool like Ab Initio or Attunity, so you need to know exactly …

Chose Apache Hive
Verified User, Anonymous: Apache Hive was built by Facebook to query huge, distributed datasets. Unlike Apache Hive, Apache Spark is an in-memory computation engine, which is why it is significantly quicker than Apache Hive at querying large amounts of data. In contrast to Apache HBase, Apache Hive is …
Prasanna Kumar TR, Developer and Site Contributor: Apache Hive was more flexible than MS SQL Server. Elasticsearch was a little complex. Google BigQuery cost more.
Verified User, Anonymous: Community support and ease of use (not deployment). It enables querying and analyzing large amounts of data stored in HDFS, on the petabyte scale. It has a query language called HQL that transforms SQL queries into MapReduce jobs that run on Hadoop, and it is wonderful for the …
Verified User, Anonymous: Apache Spark is similar in the sense that it too can be used to query and process large amounts of data through its DataFrame interface. Hive is better for short-term querying while Spark is better for persistent and long-term analysis. Another product is Impala. For our …
Camilo Palacios, IT Administrator: We have used a simple but necessary function: merging certain data tables which, although they may come from different areas, complement each other or are necessary. You can use metadata if what you need is to validate the origin of your information and what impact it has, …
Omkar Marne, Research Application Software Engineer: Apache Hadoop is built on top of the Hadoop file system, so it gives its best when integrated with Hadoop. Data analysis and query optimization become very easy when used with Hadoop to perform extract, transform, load operations. As Hadoop is a big data system and handles large …
Pablo Gonzalez, Internet Marketing Manager: We have used the system to migrate data, either for new versions or because we will use another operating program. The software helps us to synchronize programs between different operating systems, a history of information can be kept constant, it can be sent to third parties …
Verified User, Anonymous: Queries are easy to write and the interface is similar to SQL, so learning overhead is reduced. Multi-user and data type support is provided. It can be easily scaled for very large amounts of analytics. It is very flexible in terms of file formats.
Verified User, Anonymous: Snowflake, Splunk Cloud, Talend Open Studio, Azure Data Factory and Apache Spark
Verified User, Anonymous: Due to its effective query resolution time, performance, and user-friendly framework compared to other products.
Surendranatha Reddy Chappidi, Senior Data Engineer: Azure Synapse Analytics (Azure SQL Data Warehouse) and Databricks Lakehouse Platform (Unified Analytics Platform)
Akshay Kashyap, Consultant: Apache Hive is a query language developed by Facebook to query over a large distributed dataset. Apache [Hive] is a query engine that runs on top of HDFS, so it utilizes the resources of the HDFS Hadoop setup, while Apache Spark is an in-memory compute engine, and that's why [it is] much …
Manjeet Singh, Senior Manager - Engineering: Besides Hive, I have used Google BigQuery, which is costly but has very high computation speed. Amazon Redshift is another product I used in my recent organisation. Both Redshift and BigQuery are managed solutions, whereas Hive needs to be managed.
Verified User, Anonymous: Hive and Spark have the same parent company, hence they share a lot of common features. Hive follows SQL syntax while Spark has support for the RDD and DataFrame APIs. The DataFrame API supports SQL syntax and also has custom functions to perform the same functionality. Spark is faster and …
Verified User, Anonymous: Apache Hive decouples the query layer from the storage layer, so it is more flexible and expandable.
Ananth Gouri, Assistant Professor: One of the major advantages of using Presto, or the main reason why people use Presto (Teradata), is the fact that it can support multiple data sources, which is lacking in the case of Apache Hive. But still, most people who come from a structured data-based background …
Nicolas Hubert, Machine Learning Engineer: Easy to understand, well supported by the community, good documentation. However, it is possible that SAP Business Warehouse could be a good fit, too, even maybe better. I did not have the chance to try it though. We selected Apache Hive because it was far less expensive and …
Kartik Chavan, Data Science Trainee: I considered Hive because it is the best-suited option when it comes to larger data access. Besides, learning HiveQL is comparatively easy.
Verified User, Anonymous: I have used Storm for real-time processing, but that only addresses a few data points. But for larger access to data, Hive is well suited.
Verified User, Anonymous: [We selected Apache Hive because] it's from Apache and open source, so it's free.
Tejaswar Rao, Associate Consultant: Faster response time, and it can also handle complex analytical queries. Able to write custom functions using Python and Hive. Able to connect using Hadoop components and also using R.
Bharadwaj (Brad) Chivukula, Sr. Technical Manager/Delivery Manager: For storing bulk amounts of data in a tabular manner, where there's no need of a primary key, or just in case redundant data is received, it will not cause a problem. For small amounts of data, it does run MR, so beware. If your intention is to use it as a …
Sameer Gupta, Senior Data Analyst: I wasn't part of the evaluation process for Apache Hive. This was already implemented when I joined the company. I have worked with other big data platforms and I personally think most of them are quite comparable to one another. It really depends on what the company is going …
Verified User, Anonymous: Hive is SQL compliant, which makes it easy for the data folks compared to Pig.
Verified User, Anonymous: Apache Pig is probably the most direct technology to compare to Hive and has several different use cases to Hive. If you want to simplify processing tasks that run using MapReduce then Apache Pig may be a better tool for the job. However if you are going to be running many …

Best Alternatives
	Apache Flume	Apache Hive
Small Businesses	No answers on this topic	Google BigQuery Score 8.7 out of 10
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	Cloudera Enterprise Data Hub Score 9.0 out of 10
Enterprises	IBM Analytics Engine Score 7.1 out of 10	Oracle Exadata Score 9.8 out of 10

User Ratings
	Apache Flume	Apache Hive
Likelihood to Recommend	8.0 (0 ratings)	8.0 (0 ratings)
Likelihood to Renew	- (0 ratings)	10.0 (0 ratings)
Usability	- (0 ratings)	8.5 (0 ratings)
Support Rating	5.0 (0 ratings)	7.0 (0 ratings)

User Testimonials
	Apache Flume	Apache Hive
Likelihood to Recommend	Apache Flume is well suited to small-batch and near-real-time processing projects that take data from one point to another with local processing (that is, no external enrichment). Filtering, transforming, and pushing to multiple destinations are common ground for Flume. It is less pleasant to use if your data needs external enrichment (taking data from external databases or web services), as transactions and (micro)batches may lead to reprocessing, and it relies on the application to avoid duplicates. (Juan Francisco Tavira, Global Technology Centre - Middleware)	Apache Hive shines for ad-hoc analysis and plugging into BI tools. Its SQL-like syntax makes it easy to use not only for engineers but also for data analysts. In our experience, there are probably more desirable tools if you are planning on integrating Hive into your processing pipeline. (Verified User, Anonymous)
Pros	Multiple data sources and destinations (sinks) that allow you to move data from and to any relevant data storage; it is very easy to set up and run; very open to customization: you can create filters, enrichment, new sources and destinations. (Juan Francisco Tavira, Global Technology Centre - Middleware)	Hive syntax is almost like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive; able to run MapReduce jobs using JSON parsing and generate dynamic partitions in Parquet file format; simplifies your experience with Hadoop, especially for non-technical/coding partners. (Bharadwaj (Brad) Chivukula, Sr. Technical Manager/Delivery Manager)
Cons	It is very specific to log data ingestion, so it is pretty hard to use for anything else; data replication is not built in and needs to be added on top of Apache Flume (not a hard job, though). (Verified User, Anonymous)	Use Hive for analytical workloads and write-once, read-many scenarios; updates and deletes are not preferred. Behind the scenes Hive creates MapReduce jobs. Hive performance is slow compared to Apache Spark: MapReduce writes intermediate outputs to disk, whereas Spark operates in memory and uses a DAG. (Verified User, Anonymous)
Likelihood to Renew	No answers on this topic	I do not know of a second data warehouse solution that integrates with HDFS as well as Hive does. (Yinghua Hu, Senior Data Scientist)
Usability	No answers on this topic	Hive is a very good big data analysis and ad-hoc query platform, which also supports scaling. BI processes can be easily integrated with Hadoop via Hive. It can deal with much larger data sets than a traditional RDBMS can. It is a "must-have" component of the big data domain. (Verified User, Anonymous)
Support Rating	Apache Flume is open source, so support is limited. Nevertheless, it has great documentation and best-practice documents from its end users, so it is not hard to use, set up, and configure. (Verified User, Anonymous)	Apache Hive is a FOSS project. There is little need to comment on the support of open source and its developer community, but it has tremendous developer support and excellent documentation. Much support can be gathered from the community. (Ananth Gouri, Assistant Professor)
Alternatives Considered	Apache Flume is on par with Scribe with similar functions. Apache Kafka is general purpose, while Apache Flume is specific to log aggregation. Google Pub/Sub and IBM MQ are costlier than Apache Flume (open source) and have a lot more cost associated with them. Apama Streaming Analytics and TIBCO Streaming are more comprehensive streaming solutions than Apache Flume so for deeper performance guarantees, it is easier to use Apache Flume. (Verified User, Anonymous)	We have used a simple but necessary function: merging certain data tables which, although they may come from different areas, complement each other or are necessary. Using metadata to validate the origin of your information and what impact it has is also feasible. (Camilo Palacios, IT Administrator)
Return on Investment	Positive impact on ROI due to a reduction in manual labor to generate and maintain compliance reports based on logs; positive impact on the business objective by reducing the need to provision compute for the log aggregation IT stack in advance, adding it on an as-needed basis instead. (Verified User, Anonymous)	Good ROI for being able to access data easily across the network: we have large amounts of data and this is a good system to access it; good ROI for being easy for new employees to learn, with little time spent, which saves costs; good ROI for being able to integrate with Spark and other applications, so data can be analyzed through programs. (Verified User, Anonymous)
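
The Flume reviews above describe a flow of events from a source, through local filtering and transformation, out to multiple destinations (sinks). A minimal Python sketch of that idea follows. It is a toy model only, not Apache Flume's API: `run_pipeline`, `keep`, `transform`, and the in-memory lists are all hypothetical names chosen for illustration.

```python
# Toy model of the source -> filter -> sinks flow described in the reviews.
# This is NOT Apache Flume's API; every name here is hypothetical.
from typing import Callable, Iterable, List

Event = str
Sink = Callable[[Event], None]


def run_pipeline(source: Iterable[Event],
                 keep: Callable[[Event], bool],
                 transform: Callable[[Event], Event],
                 sinks: List[Sink]) -> int:
    """Push every event that passes `keep` to all sinks; return how many were delivered."""
    delivered = 0
    for event in source:
        if not keep(event):
            continue  # local filtering, no external lookups
        out = transform(event)
        for sink in sinks:
            sink(out)  # fan out to every destination
        delivered += 1
    return delivered


# Two in-memory lists stand in for real destinations such as HDFS.
archive: List[Event] = []
alerts: List[Event] = []

log_lines = ["INFO start", "ERROR disk full", "DEBUG tick", "ERROR timeout"]
count = run_pipeline(
    source=log_lines,
    keep=lambda line: line.startswith("ERROR"),
    transform=str.lower,
    sinks=[archive.append, alerts.append],
)
```

Because the filter and transform here use only the event itself, a replayed batch produces the same output; the reviewer's caution about external enrichment is that a replay could repeat external calls and leave duplicate handling to the application.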