Lets my team focus on the value we can bring, not building data pipelines
July 13, 2018

Anonymous | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with Treasure Data

Treasure Data is used by the Data and Analytics team to act as the basis for our entire data infrastructure.

Treasure Data makes the process of ingesting, organising, processing and then outputting data extremely easy, centralised and reliable. It allows our small data team to focus on the outcomes that the data supports, the use-cases, instead of dev-ops.

As such, the data inside of TD is used by everyone in the organisation in some form or another, from making data available to Looker, our BI tool, to pushing audiences out to advertising platforms, to generating complicated reporting for specific management stakeholders. The ease of not having to build connectors into services, or a workflow management system, is a major benefit to us.
  • Workflow Management -> Easy to integrate saved queries, centralised, good debugging, powerful Directed Acyclic Graph functionality
  • Support -> Absolutely outstanding support; I have never had an issue that has been put in the "too hard, cannot fix, work around" pile.
  • Effective Management of Hadoop Clusters -> Interactive querying of our Hadoop clusters, never having to think about the dev-ops, availability or CPU load of our queries is an incredible force multiplier for us
  • The breadth of connectors to APIs is good, but some of the connectors are at best confusing, and at worst outright hostile to users. Some of the errors and connection settings are incomprehensible.
  • I would like to be able to run simple arbitrary scripts in the workflows, though I understand why this is hard.
  • I would like a breakdown of utilisation by query, to allow me to understand which elements of my workflows are inefficient enough to be causing problems. While TD's compute power is (to me) effectively infinite, this isn't a blocking problem; on the understanding that it is actually finite, however, this is important, and it will only become more so the more workflows we set up.
  • It enables us to do things that no one else our size can do, and which make bigger vendors like Google (with BigQuery) visibly uncomfortable when I tell them we can already do everything they consider core USPs.
  • The most direct impact on ROI is supporting the creation and syndication of audiences to ad platforms, based upon behavioural data that can only be produced by stitching together several sources (Shopify, Zendesk, and Segment).
  • The JS API allows us to easily connect Node and Python scripts, which perform scraping and data analysis tasks, directly to our storage. This was surprisingly easy and painless.
  • Easy access to the GA reporting connectors allows you to hack together more complicated reports than a single report would allow, by querying the endpoint frequently and retaining the data.
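The script-to-storage access mentioned above can be sketched in a few lines. This is a minimal illustration assuming Treasure Data's official Python client (td-client-python); the database name and SQL below are hypothetical examples, not from the review.

```python
# Minimal sketch of reading Treasure Data query results from a Python script.
# Assumes the td-client-python library (`pip install td-client`); database,
# table, and query below are illustrative.

def fetch_rows(client, database, sql):
    """Submit a Presto query through a tdclient.Client and return all rows."""
    job = client.query(database, sql, type="presto")  # submit the job to TD
    job.wait()                                        # poll until it finishes
    return list(job.result())                         # materialise the rows

# Usage (requires a real API key):
#   import tdclient
#   with tdclient.Client(apikey="YOUR_TD_API_KEY") as client:
#       rows = fetch_rows(client, "sample_datasets",
#                         "SELECT method, COUNT(1) FROM www_access GROUP BY method")
```

Because the cluster, not the calling script, does the heavy lifting, the same pattern works from a scraper, a notebook, or a scheduled job.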
Treasure Data provides a combination of out-of-the-box connectors and end-to-end functionality (ingestion, storage, interactive querying, workflows, and outputs all in one place) that no other solution we've found seems to do well. The fully managed nature of Workflow, combined with robust connectors, allows us to spend the minimum time possible ensuring data pipeline integrity and the most time actually creating value as a Data Team.
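Treasure Data's workflow engine is based on the open-source Digdag project, so the saved-query DAGs described above are plain text definitions. A rough sketch, with hypothetical database, query, and table names:

```
# daily_audience.dig -- illustrative Treasure Workflow (Digdag) definition;
# the database, SQL files, and table names are made-up examples.
timezone: UTC

schedule:
  daily>: 02:00:00              # run the whole DAG once a day

_export:
  td:
    database: analytics         # default database for all td> tasks

+build_sessions:
  td>: queries/build_sessions.sql   # saved query checked into the project
  create_table: sessions_daily      # write the result to a table

+build_audience:
  td>: queries/build_audience.sql   # runs only after +build_sessions succeeds
  create_table: ad_audience
```

Tasks run in the order declared unless marked parallel, which gives the debuggable, centralised DAG behaviour praised above without any infrastructure to host.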
The only scenarios where I would not suggest Treasure Data are organisations too small to pay the fee, which must resort to open-source solutions, and organisations so large and sophisticated that the limitations around workflows and connectors matter more than the overwhelming efficiency savings.

TD is good for any team which does not wish to invest significant resources in developing and maintaining their data infrastructure. Even teams with dedicated Data Engineers should benefit from those engineers working on more interesting issues than "keeping the lights on".