Easy interface to use Hive
September 12, 2017

Anonymous | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User

Overall Satisfaction with Treasure Data

We use it mainly as a logging tool and to take snapshots of important collections. It is also our gateway between the production database and Redshift, which is used by less technical users. Any task that requires use of big data goes through the Treasure Data pipeline. Currently, it is only used by the Engineering team.
  • It uses Hive, which allows you to analyze TBs of data in a reasonable amount of time.
  • Since some tables may have "duplicate" records with respect to some columns, TD provides functions that allow you to pick essential data from the different records that represent "the same event".
  • When you need faster queries, there's also Presto. Presto does not have the overhead of Hive.
  • When exploring a table, there should be a faster way to query it, e.g., a "Query Table" button that opens the query page with boilerplate SQL prewritten.
  • When a database has too many tables, the Query page becomes unresponsive while it loads a lot of data from all the tables. There should be a way to opt out of that behavior.
  • Bad error messages when a query fails. (For example, I've received errors about a parenthesis when the real issue was that I hadn't assigned an alias to a subquery.)
  • I often use the same tables. There should be a tab with "My Most Used Tables" or something like that, so I can get faster access to what I need in order to do work.
  • The API throws 404s in some instances where it should return 403s or 401s. This makes debugging hard when new team members haven't been granted the same level of access as older ones.
  • This question might be more appropriate for team leads.
  • There is a data visualization tool we use that can easily pull data from TD via Presto. This speeds up development time, although it is not as performant as pushing the data to Redshift beforehand.
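The deduplication point above can be sketched in standard SQL. This is a generic ROW_NUMBER() approach run against SQLite for illustration, not Treasure Data's own helper functions; the table and column names are hypothetical. It also happens to show the subquery-alias requirement mentioned in the error-message complaint.

```python
import sqlite3

# Hypothetical "events" table where the same logical event appears
# more than once; we keep only the most recent record per event_id.
# A generic ROW_NUMBER() sketch of the idea, not TD's own functions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id INTEGER, payload TEXT, time INTEGER);
INSERT INTO events VALUES
  (1, 'first write',  100),
  (1, 'latest write', 200),
  (2, 'only write',   150);
""")

rows = conn.execute("""
SELECT event_id, payload
FROM (
  SELECT event_id, payload,
         ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY time DESC) AS rn
  FROM events
) t            -- the subquery alias that, when forgotten, triggers
WHERE rn = 1   -- the confusing parenthesis error described above
ORDER BY event_id
""").fetchall()

print(rows)  # [(1, 'latest write'), (2, 'only write')]
```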
We still use all of the above. They are part of an ecosystem of data software products, and each has its own purpose. As I mentioned before, the ease of writes to TD and the ability to query vast amounts of data in a reasonable time are why we will not be letting TD go any time soon.
It is a great solution for storing (and querying) a large amount of data. Its API is mostly good, though I would love more documentation: there's a chunker I still have no idea how to use, and a row handler I basically ended up copying from a colleague's code when I needed my own. Building reports from TD is pretty simple.

It is not a good fit as a production database: response times can be lengthy, which would drive users away.

Using Treasure Data

It took me just a few minutes before I managed to get started using their system. Aside from minor bumps, I think it's pretty straightforward and it's a great tool.
Pros:
  • Like to use
  • Relatively simple
  • Easy to use
  • Well integrated
  • Consistent
  • Quick to learn
  • Convenient
  • Feel confident using
  • Familiar
Cons:
  • None
  • The Query interface is pretty straightforward. It even highlights syntax errors as they happen.
  • You can query from a Python shell via an API call, which lets you build dynamic queries that depend on other data (such as dates or particular IDs you might be interested in).
  • When the results are huge, they are not easy to process via the API. There are row handlers and chunkers, but they are not well documented (or, if they are, the docs are not easily accessible, as I couldn't find them).
  • The Query interface attempts to load (metadata for?) all the tables in the database, which makes the website unresponsive for a bit.
  • Poor error messages when the query was not typed directly in the interface.
  • There are no line numbers in the query editor. Line numbers would make it easier to pinpoint where syntax errors are.
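The "dynamic queries" point above can be illustrated with a minimal sketch. Only the string construction is shown; submitting the query would go through TD's API client, which is omitted here. The function, table, and column names are hypothetical, and the use of TD_TIME_RANGE assumes Treasure Data's time-range UDF.

```python
from datetime import date

def build_query(table, ids, start, end):
    """Build a SQL string from runtime data (hypothetical table and
    column names). In practice the string would then be submitted to
    Treasure Data via its API; that step is not shown here."""
    # int() coerces each id, guarding against SQL injection via the list
    id_list = ", ".join(str(int(i)) for i in ids)
    return (
        f"SELECT id, event, time FROM {table} "
        f"WHERE id IN ({id_list}) "
        f"AND TD_TIME_RANGE(time, '{start.isoformat()}', '{end.isoformat()}')"
    )

q = build_query("events", [101, 202], date(2017, 9, 1), date(2017, 9, 12))
print(q)
```

This is what makes API-driven querying convenient: the IDs and date range can come from another query's results or from application state, rather than being typed into the web console.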