Open source Hadoop: smart choice, smart price
Use Cases and Deployment Scope
We use Apache Hadoop to handle data that arrives continuously and in real time from devices in different geographical locations across the globe. We then run Spark jobs and notebooks to ingest and process the data before loading it into other external systems for further processing.
Pros
- Its ability to handle massive volumes of data is what makes Hadoop a go-to open source product
- Its open source nature makes it quite configurable
- Its community support is superb
Cons
- Its setup is quite complex and requires good knowledge of the product
- Fine-tuning its configuration requires in-depth knowledge of the product
- Its logging could be improved
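
To illustrate the kind of fine-tuning involved, here is a minimal sketch of memory and storage settings an administrator typically adjusts. The property names are standard Hadoop/YARN configuration keys; the values are hypothetical and would depend on the cluster's hardware and workload:

```xml
<!-- yarn-site.xml: hypothetical per-node memory budget -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>65536</value> <!-- RAM YARN may allocate on each node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value> <!-- largest single container allowed -->
</property>

<!-- hdfs-site.xml: hypothetical storage tuning -->
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- copies of each block kept across the cluster -->
</property>
<property>
  <name>dfs.blocksize</name>
  <value>268435456</value> <!-- 256 MB blocks suit large sequential files -->
</property>
```

Getting values like these wrong (for example, container sizes that don't divide the node's memory evenly) is a common source of wasted capacity, which is why in-depth product knowledge matters here.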
Likelihood to Recommend
When you have real-time data in massive volumes, close to terabytes daily, it becomes imperative to have a system that can ingest it without losing any of it. Having Hadoop in place makes our product more robust, and its stability comes in handy.

The only challenge in running huge clusters is that they require large amounts of disk space and memory to work efficiently.
