Good platform to analyze data
December 25, 2020

Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Overall Satisfaction with Treasure Data

I am part of a team involved in developing a customer data platform for an international consumer goods producer.
The client decided to adopt Treasure Data alongside other technologies. Using Treasure Data, we collect data points from different sources and rearrange/propagate them according to the data model designed by the business team.
The propagation of data is executed by scheduled workflows.
A native segmentation tool exists; on top of this, it was necessary to develop workflows responsible for customer deduplication, housekeeping of data points, validation rules, data quality indexes, etc.
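As a rough illustration of the validation rules and data quality indexes mentioned above, here is a minimal Python sketch. The field names, the completeness metric, and the rules themselves are hypothetical examples, not the client's actual data model or production logic:

```python
# Hypothetical data-quality check for a customer record.
# Field names and rules are illustrative only.

REQUIRED_FIELDS = ["email", "first_name", "last_name", "country"]

def completeness_index(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

def validate(record: dict) -> list:
    """Return a list of validation-rule violations for one record."""
    errors = []
    if "@" not in record.get("email", ""):
        errors.append("invalid email")
    if not record.get("country"):
        errors.append("missing country")
    return errors
```

In a real setup, checks like these would run inside scheduled workflows over the data collected from the various sources, and the resulting indexes would be stored for monitoring.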

  • Simple to set up and to use
  • Support is fully available
  • Cloud technology --> almost nothing is on prem
• Documentation is not always fully up to date --> better off reaching out to support for topics that are not covered
  • Small bugs on the graphical user interface
• If two people are editing the same project simultaneously, the last one to save the workflow overwrites the changes of the first
• When there are Treasure Data updates, old functions may be deprecated or existing functions may no longer work as before --> this may impact existing workflows/queries
• As many developers work in the same environment, jobs are queued because a limited number of computation cores is available --> if we want to increase it, our client needs to pay for more cores
• As data volumes grow, some workflows become too expensive and need to be rethought and made more efficient --> this means re-designing existing workflows and also requires constant support from Treasure Data, which analyzes the queries and identifies points of improvement that allow the client to pay less
• For GDPR, there is a workflow that automatically sends a PDF report, translated into the customer's native language, containing all of their information/data
• To schedule different workflows, a custom scheduler was created instead of using the native scheduling functions
• Used deduplication rules running in Python to identify potential duplicate accounts belonging to the same customer
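The Python deduplication rules mentioned above could look something like the following minimal sketch, which groups records by a normalized email key to flag potential duplicate accounts. The function names and the matching logic are assumptions for illustration, not the actual production rules:

```python
# Hypothetical deduplication rule: flag customer records that share
# the same normalized email address as potential duplicate accounts.
from collections import defaultdict

def normalize_email(email: str) -> str:
    """Lower-case and strip whitespace so 'Ann@B.com ' matches 'ann@b.com'."""
    return email.strip().lower()

def find_potential_duplicates(records):
    """Return groups of record ids whose emails normalize to the same key."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_email(rec["email"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Real rules would typically combine several signals (name, phone, address) rather than email alone, but the group-by-normalized-key pattern is the same.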
When there are many different data sources, the amount of data is huge, and we need to create segments, Treasure Data is the right tool, as it has a growing list of connectors and supports Hive.
It may not be the right tool for running simple Python code --> not as easy as SQL (Presto or Hive).