TrustRadius: an HG Insights company

Dataiku

Score: 8.5 out of 10

25 Reviews and Ratings

What is Dataiku?

The Dataiku platform unifies data work from analytics to Generative AI. It supports enterprise analytics with visual, cloud-based tooling for data preparation, visualization, and workflow automation.

Top Performing Features

  • Extend Existing Data Sources

    Use R or Python to create custom connectors for any APIs or databases

    Category average: 8.9

  • Automatic Data Format Detection

    Automatic detection of data formats and schemas

    Category average: 9.2

  • Visualization

    The product’s support and tooling for analysis and visualization of data.

    Category average: 8.2

Areas for Improvement

  • Flexible Model Publishing Options

    Publish models as REST APIs, hosted interactive web apps or as scheduled jobs for generating reports or running ETL tasks.

    Category average: 9.2

  • Security, Governance, and Cost Controls

    Built-in controls to mitigate compliance and audit risk with user activity tracking

    Category average: 8.5

  • MDM Integration

    Integration with MDM and metadata dictionaries

    Category average: 7.8

Data Scientist and Customer Friendly

Use Cases and Deployment Scope

We use Dataiku to build automated flows for specific data science problems, mostly ETL scripts to clean the data and to monitor for data drift. Because it's visually easy to use, it's also great for less technical users, who can follow along with the flows and what we are doing when we demo it.

Pros

  • Great ETL, super easy to clean datasets
  • Visually easy to understand and use
  • Love having multiple sections/flows
  • When samples are shown, they are intuitive

Cons

  • I think the drift detection functionality could be better
  • Sometimes I feel like operations are slow (but that might be on my hardware side)

Return on Investment

  • On one team, when a data engineer was sick, a lot of the ETL work still got done using Dataiku.
  • It just sped things up and freed us up for more analysis.

Alternatives Considered

Azure Data Factory and Azure Databricks

Dataiku DSS: Click or code--the choice is yours!

Pros

  • The intuitiveness of this tool is very good.
  • Click or Code - If you are a coder, you can code. If you are a manager, you can wrangle data visually.
  • The set of APIs gives a developer a lot of control and flexibility.

Cons

  • The integrated windows of frontend and backend in web applications make it cumbersome for the developer.
  • When dealing with multiple data flows, it becomes really confusing, though they have introduced a feature (Zones) to cater to this issue.
  • Bundling, exporting, and importing projects sometimes create issues related to the code environment. If the code environment is not available, we should at least be able to import the schema of the flow.

Return on Investment

  • So far it has had a positive impact. Multiple departments are coming to us with their business problems.
  • I can't specifically say about ROI as I'm a developer, though I have heard this solution is economical compared to other AI/ML enterprise tools.
  • By using this tool, my client has let go of software that was used earlier, and we have created a simpler framework to replace that software.

Alternatives Considered

Alteryx Analytics Gallery, KNIME Analytics Platform and H2O

Dataiku DSS - One-Stop Solution for All Data Science Applications

Use Cases and Deployment Scope

Dataiku DSS is used in my team for various tasks, which range from data preprocessing to machine learning model creation. It provides a one-stop solution to fetch data from different sources such as Amazon S3, SQL Server databases, etc. and merge them onto a single platform. We use Dataiku DSS to perform data imputation, data cleaning, and feature engineering to prepare datasets for creating machine learning models. We also extract business insights (data analytics) using various statistical methods and visual representations such as scatter plots, histograms, boxplots, etc. Furthermore, we create optimized ML models, which are used to predict/forecast target variables and drive business decisions.

Pros

  • Allows users to collaborate and monitor individual tasks
  • Caters to both types of analysts, coders and non-coders, alike
  • Integrates graphs and plots with visualization tools such as Tableau

Cons

  • Its community support is very limited at the moment
  • Complex to integrate with automation tools such as Blue Prism

Most Important Features

  • Very friendly interface for users
  • All data analytics services provided on a single platform
  • Keeps track of all models created and every action performed on a dataset

Return on Investment

  • Customer satisfaction
  • Timely project delivery

Alternatives Considered

Anaconda

Other Software Used

Microsoft SQL Server, Anaconda, Blue Prism Intelligent RPA

Low-Code Open-Source Data Analytics Platform!

Pros

  • Low-code platform.
  • Open source version includes most valuable modules.
  • User friendly documentation.

Cons

  • End product deployment.

Return on Investment

  • Given its open source status, the only cost is the learning curve, which is minimal compared to the time saved on data exploration.
  • The platform also eases tracking of the data processing workflow, unlike Excel.
  • Built-in data visualizations cover many use cases with minimal customization, which saves time.

Alternatives Considered

RStudio, Domino and TIBCO Spotfire

Other Software Used

RStudio, Domino, TIBCO Spotfire

Dataiku - a complete Data Analytics and AI/ML solution

Pros

  • Very intuitive and easy-to-use UI, which lets many types of users collaborate easily by visualizing the same workflow.
  • Many building blocks can be reused immediately, avoiding a lot of non-standard boilerplate implementation.
  • Data pre-analysis and feature engineering assistance increases the productivity and efficiency of data scientists.
  • Many data connectors support a wide range of data storage, such as SQL, Teradata, Hadoop Hive, etc.
  • Support from research through to final MaaS solution deployment.

Cons

  • The visualization of the flow still has a lot of room to improve when the flow is complex.
  • The "non-coding" template/building block for deep learning lacks many important configurable parameters.
  • Lacks a unified way to apply "design patterns" to Python code (if we want to develop our own modules or building blocks).

Return on Investment

  • Dataiku provides a consistent platform, covering almost all needs from data analytics to AI/ML.
  • This platform "glues" all departments, business flows, and IT data sources together, making the data more exploitable.

Alternatives Considered

Anaconda

Other Software Used

Chameleon, Cloudera DataFlow (formerly Hortonworks DataFlow), Sparx Systems Enterprise Architect