High Performance Data Warehouse
Use Cases and Deployment Scope
We use Amazon Redshift as our data warehouse and run multiple ETL syncs into it from clickstream events as well as transactional databases.
We run analytics on top of this data and use it to build and train data science models.
We are in the gaming industry, and we use it to solve business problems such as increasing user gameplay, revenue, registrations, and user acquisition.
Pros
- Fast data retrieval from tables, even with complex joins, thanks to columnar storage and advanced query optimization techniques such as parallel execution
- Reliable integration with Amazon MSK through Amazon Redshift Streaming Ingestion (low-latency streaming ingestion), as well as with AWS Glue and Amazon S3
- Concurrency scaling and workload management, which help segregate load distribution based on roles
- Decoupled storage and compute with the RA3 instance type
- Distributed clusters using Amazon Redshift data sharing, i.e. a centralised write cluster with multiple read-only clusters
Cons
- Data governance could be better
- Data catalog and data discovery could be improved
- Data lineage could be improved
Likelihood to Recommend
Well suited for data integration using Amazon MSK and for seamless integration with transactional databases.
Fast data retrieval with complex joins, along with the ability to define distribution keys and sort keys to improve performance.
The VACUUM and ANALYZE commands for ongoing performance maintenance are the cherry on top.
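To make the distribution key, sort key, and maintenance points concrete, here is a minimal Python sketch that builds the Redshift SQL for a table. It assumes a hypothetical `clickstream_events` table with `user_id`, `event_time`, and `event_type` columns. None of these names come from our actual schema, and the helper functions are illustrative only. It covers only the statement text (`DISTSTYLE KEY`, `DISTKEY`, `COMPOUND SORTKEY`, `VACUUM`, `ANALYZE`) and does not connect to a cluster.

```python
"""Sketch: generate Redshift DDL with a distribution key and sort key,
plus VACUUM/ANALYZE maintenance statements.
Table and column names are hypothetical examples."""


def create_table_sql(table, columns, dist_key, sort_keys):
    """Build a CREATE TABLE statement using key distribution and a compound sort key."""
    names = {name for name, _ in columns}
    # Fail early if the chosen keys are not actual columns of the table.
    if dist_key not in names:
        raise ValueError(f"dist key {dist_key!r} is not a column")
    missing = [key for key in sort_keys if key not in names]
    if missing:
        raise ValueError(f"sort keys not in columns: {missing}")
    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


def maintenance_sql(table):
    """Return the routine maintenance statements for a table."""
    # VACUUM re-sorts rows and reclaims space from deleted rows;
    # ANALYZE refreshes the statistics used by the query planner.
    return [f"VACUUM {table};", f"ANALYZE {table};"]


if __name__ == "__main__":
    # Hypothetical clickstream schema, for illustration only.
    columns = [
        ("user_id", "BIGINT"),
        ("event_time", "TIMESTAMP"),
        ("event_type", "VARCHAR(64)"),
    ]
    print(create_table_sql("clickstream_events", columns, "user_id", ["event_time"]))
    for stmt in maintenance_sql("clickstream_events"):
        print(stmt)
```

In this example, distributing on `user_id` places rows with the same user on the same slice, which can avoid data redistribution when joining to other tables distributed on the same key. Sorting on `event_time` helps Redshift skip blocks in time-range queries. The right keys depend on the actual join and filter patterns of the workload.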