Data Science Platforms
Best Data Science Platforms
- Top Rated Data Science Platforms include: Anaconda, KNIME Analytics Platform, RapidMiner Studio, and Databricks Unified Analytics Platform.
- Other Data Science Platforms on the TrustMap include: IBM Watson Studio, Data Science Workbench, and TIBCO Statistica.
- A complete list of Data Science Platforms is available here.
TrustMaps are two-dimensional charts that compare products based on satisfaction ratings and research frequency by prospective buyers. Products must have 10 or more ratings to appear on this TrustMap, and those above the median line are considered Top Rated.
Data Science Software Overview
What is Data Science Software?
Data science technology can help organizations turn their data into a valuable resource in the creation of business value. Data science tools are capable of handling data volumes that are too big for traditional databases or statistical tools.
Data science tools create value by mining large amounts of structured and unstructured data to identify patterns that can help an organization manage costs more effectively and achieve competitive advantage.
Data science tools incorporate a variety of component technologies such as machine learning, data mining, data modeling, and visualization.
Why did Data Science Technology Emerge?
Data science is an emerging response to the unprecedented volumes of data that are available to businesses for decision-making purposes. The desire to derive social and economic value from this newly available data is limited only by the lack of expertise and tools. Data science tools have emerged to fill this gap.
Data Science Software Features & Capabilities
Data science platform features vary considerably from one product to the next. Feature sets also differ between bona fide data science platforms and machine learning tools, which really cover only a subset of data science. Common capabilities include:
Data access and ingestion
Data visualization and discovery
Integration with big data tools like Hadoop
What Skills are Required?
The skill sets required for true data science are in short supply. This puts pressure on tool developers to reduce complexity in order to increase the potential user pool. In general, data scientists have advanced analytics skills similar to those of actuaries, who calculate insurance risks and premiums. They are quantitative researchers who typically have strong programming skills.
What do Data Scientists Do?
Although data scientists write code, they are quite different to developers. One of the main differences is that data scientists are primarily engaged in research, rather than in building products.
The job performed by data scientists is largely experimental. They build models that are designed to be predictive and then run them to see how well they perform. Then they tweak them and run them again.
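The build, run, tweak, and re-run cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular platform works: the data is synthetic, the "model" is a single usage threshold, and every name (make_data, accuracy, best_threshold) is hypothetical.

```python
import random

def make_data(n, seed=0):
    """Synthetic (usage_hours, churned) pairs. Assumption: low usage makes churn likelier."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        usage = rng.uniform(0, 40)
        churn_prob = 0.8 if usage < 10 else 0.1
        rows.append((usage, rng.random() < churn_prob))
    return rows

def predict(usage, threshold):
    """Toy model: predict churn when usage falls below the threshold."""
    return usage < threshold

def accuracy(rows, threshold):
    """Fraction of rows where the prediction matches the observed outcome."""
    correct = sum(predict(usage, threshold) == churned for usage, churned in rows)
    return correct / len(rows)

train = make_data(500, seed=1)
holdout = make_data(200, seed=2)

# Experiment loop: try one setting, measure it, tweak the setting, run again.
results = {}
for threshold in (2, 5, 10, 15, 20, 30):
    results[threshold] = accuracy(train, threshold)

# Keep the best-performing setting and check it on data the loop never saw.
best_threshold = max(results, key=results.get)
holdout_accuracy = accuracy(holdout, best_threshold)
print(best_threshold, round(holdout_accuracy, 3))
```

Scoring the chosen setting on a separate holdout set is a design choice worth keeping even in a toy: it shows whether the tweaking improved the model or merely fitted the training data.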
The kinds of problems data scientists try to solve use huge amounts of unstructured data. Some examples are models that accurately predict customer churn or models to optimize vacation rental pricing based on location, time of year, etc.
Collaboration with Business
Agile development processes require a user or client to articulate the business need and provide constant feedback on what is being built. Data science teams are similar. They are most effective when they work closely with the people who actually use the models that they build. Without this alignment, the models built may not effectively solve the business problem that the business unit is struggling with.
Types of Data Science Platforms
Some platforms are primarily about model development and contain coding language capabilities to this end. Data Science models are typically quite complex and require advanced coding skills and often specialized hardware. Data scientists also frequently utilize many machines concurrently by spreading work across them.
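The idea of spreading work across machines can be illustrated with a small split-and-combine sketch. It is only a local stand-in: a thread pool plays the role of the machines, and the function names (partition, partial_stats, distributed_mean) are hypothetical. Cluster tools such as Apache Spark apply the same general pattern at far larger scale.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def partial_stats(chunk):
    """Work done by one worker: the count and sum of its own chunk."""
    return len(chunk), sum(chunk)

def distributed_mean(values, n_workers=4):
    """Compute a mean by combining per-chunk results from several workers."""
    chunks = partition(values, n_workers)
    # Each worker handles one chunk; threads stand in for separate machines here.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count

values = list(range(1, 101))
mean = distributed_mean(values)
print(mean)  # 50.5
```

Each worker returns a small summary rather than its raw data, so only the summaries need to be brought together at the end.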
A number of platforms do not contain languages for writing code. Instead, they allow products like R, SAS, or Python to execute model code, and they function as a system of record for all the models being developed by an entire data science team.
Data scientists use a wide variety of tools, some of which, such as Python, Apache Spark, and Hadoop, are open source. Data scientists also use data mining tools, NoSQL databases, statistical computing tools like R, and others.
Enterprise data science platforms that keep track of projects and automate some of the code writing are relatively new arrivals. These platforms charge for an annual subscription. The price can vary depending on whether the software is running in the cloud or on-premise. There are also likely to be usage fees for compute time.
Data Science Products
IBM Data Science Experience is a collaborative, cloud-based environment providing data scientists with a variety of tools including RStudio, Jupyter, Python, Scala, Spark, IBM Watson Machine Learning, and more.
Alteryx Analytics is a business intelligence and predictive analytics offering.
MATLAB is a predictive analytics and computing platform based on a proprietary programming language. MATLAB is used across industry and academia.
RapidMiner Studio is a data science and data mining platform from RapidMiner in Cambridge, Massachusetts.
Anaconda is an open source Python distribution / data discovery & analytics platform.
TIBCO Statistica offers a comprehensive and open data science portfolio featuring the latest techniques – via code-free and open source programming. According to the vendor, key features and business benefits include: • Accelerate innovation and solve your organization’s most complex problems by s...
Databricks in San Francisco offers the Databricks Unified Analytics Platform, a data science platform and Apache Spark cluster manager.
IBM Streaming Analytics is a fully managed service that frees you from time-consuming installation, administration, and management tasks, giving you more time to develop streaming applications. It is powered by IBM Streams, an advanced analytic platform that you can use to ingest, analyze, and co...
Wolfram's flagship product Mathematica is a modern technical computing application featuring a flexible symbolic coding language and a wide array of graphing and data visualization capabilities.
Swiss company KNIME offers their KNIME Analytics Platform for big data and predictive analytics.
Microsoft R (formerly Revolution R) is a big data R distribution for servers, Hadoop clusters, and data warehouses. Microsoft acquired original developer Revolution Analytics in 2015. Microsoft R is available in two editions: Microsoft R Open (formerly Revolution R Open) and Microsoft R Enterp...
SAP Predictive Analytics is, as the name would suggest, a statistical analysis and data mining platform that can be deployed with SAP HANA.
Cloudera Data Science Workbench enables secure self-service data science for the enterprise. It is a collaborative environment where developers can work with a variety of libraries and frameworks.
Alpine Data Labs in San Francisco, California offers TouchPoints, their latest predictive analytics platform.
Angoss Software in Toronto offers KnowledgeSTUDIO, billed as an easy-to-use data mining and predictive analytics platform. Angoss Software was acquired by Datawatch Corporation in January 2018, and Angoss products are now supported by Datawatch.
FICO Model Builder is a predictive analytics and modeling tool from FICO.
Dataiku is a French startup and its product, DSS, is a challenger to market incumbents and features some visual tools to assist in building workflows.
Domino Data Lab in San Francisco offers the Domino data science platform. It accelerates the development and delivery of models and increases data scientist productivity.
Wolfram Data Science Platform enables a full spectrum of data science analysis and visualization and automatically generates rich interactive reports.
DataScience.com is a scalable, collaborative platform with features that cater to data science, business, and IT teams. It was acquired by Oracle in 2018.
SAS Enterprise Miner is a data science and statistical modeling solution enabling the creation of predictive and descriptive models on very large data sources across the organization.
Teradata offers its data science platform, the Teradata Analytics Platform, a unified analytic and data framework integrating SQL, R, Python, and a wide array of tools (RStudio, Jupyter, Teradata Studio, etc.) to provide a flexible solution for machine learning and advanced analytics.
IBM CPLEX Optimization Studio is a mathematical decision optimization application for building applications or deploying optimization models.