Azure Data Lake Storage Gen2 is a highly scalable and cost-effective data lake solution for big data analytics. It combines the power of a high-performance file system with massive scale and economy to help you speed your time to insight. Data Lake Storage Gen2 extends Azure Blob Storage capabilities and is optimized for analytics workloads.
N/A
Apache Derby
Score 7.0 out of 10
N/A
Apache Derby is an embedded relational database management system, originally developed by IBM and called IBM Cloudscape.
Azure Data Lake is an absolutely essential piece of a modern data and analytics platform. Over the past 2 years, our usage of Azure Data Lake as a reporting source has continued to grow and far exceeds more traditional sources like MS SQL, Oracle, etc.
If you need a SQL-capable database-like solution that is file-based and embeddable in your existing Java Virtual Machine processes, Apache Derby is an open-source, zero cost, robust and performant option. You can use it to store structured relational data but in small files that can be deployed right alongside with your solution, such as storing a set of relational master data or configuration settings inside your binary package that is deployed/installed on servers or client machines.
Apache Derby is SMALL. Compared to an enterprise scale system such as MSSQL, it's footprint is very tiny, and it works well as a local database.
The SPEED. I have found that Apache Derby is very fast, given the environment I was developing in.
Based in JAVA (I know that's an obvious thing to say), but Java allows you to write some elegant Object Oriented structures, thus allowing for fast, Agile test cases against the database.
Derby is EASY to implement and can be accessed from a console with little difficulty. Making it appropriate for everything from small embedded systems (i.e. just a bash shell and a little bit of supporting libraries) to massive workstations.
Azure Data Lake Storage from a functionality perspective is a much easier solution to work with. It's implementation from Amazon EMR went smooth, and continued usage is definitely better. However, Amazon EMR was significantly cheaper overall between the high transaction fees and cost of storage due to growth. The two both have their advantages and disadvantages, but the functionality of Azure Data Lake Storage outweighed it's cost
SQLite is another open-source zero-cost file-based SQL-capable database solution and is a good alternative to Apache Derby, especially for non-Java-based solutions. We chose Apache Derby as it is Java-based, and so is the solution we embedded it in. However, SQLite has a similar feature set and is widely used in the industry to serve the same purposes for native solutions such as C or C++-based products.
Instead of having separate pools of storage for data we are now operating on a single layer platform which has cut down on time spent on maintaining those separate pools.
We have had more of an ROI with the scalability as we are able to control costs of storage when need be.
We are able to operate in a more streamlined approach as we are able to stay within the Azure suite of products and integrate seamlessly with the rest of the applications in our cloud-based infrastructure
Being Open source, the resources spent on the purchase of the product are ZERO.
Contrary to popular belief, open source software CAN provide support, provided that the developers/contributors are willing to answer your emails.
Overall, the ROI was positive: being able to experiment with an open source technology that could perform on par with the corporate products was promising, and gave us much information about how to proceed in the future.