Business Intelligence is Just One Flavor of Analytics
Business Intelligence (BI) is a critical tool for understanding what is happening in a business, even if recently it has been somewhat eclipsed by emerging technologies that extend the promise of BI-type analytics far beyond its roots in business reporting and dashboards.
Both BI and more recent technologies like data science platforms are predicated on gaining business insight based on data analysis. The primary function of BI tools is to explore past trends in a business’s historical data and compare data across time, or evaluate the impact of certain events on the bottom line. Machine learning and data science tools, on the other hand, use algorithms to understand and predict a business’s future performance.
The kind of analytics provided by BI tools is often referred to as “descriptive analytics”, meaning that it describes in some detail what has already happened. But this is just one flavor of analytics. BI today is best considered as one component of a continuum of analytics capabilities that are different from, but complementary to, each other.
From Descriptive to Prescriptive
There are several other types of analytics in addition to descriptive analytics. For example, diagnostic analytics tries to find out why something happened. Predictive analytics uses statistical models and forecasting techniques to answer the question “what could happen?”
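To make the distinction concrete, here is a minimal sketch that applies both a descriptive and a predictive calculation to the same series. The revenue figures are hypothetical, the function names are invented for illustration, and a straight-line trend is only the simplest possible forecasting technique; real predictive tools use far richer models.

```python
# Hypothetical quarterly revenue figures (illustrative only, not real data).
revenue = [100.0, 110.0, 120.0, 130.0]


def quarter_over_quarter_change(values):
    """Descriptive: percentage change between consecutive periods."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]


def fit_linear_trend(values):
    """Fit y = slope * t + intercept by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(values, periods_ahead=1):
    """Predictive: extend the fitted trend line beyond the last observation."""
    slope, intercept = fit_linear_trend(values)
    t_future = len(values) - 1 + periods_ahead
    return slope * t_future + intercept


changes = quarter_over_quarter_change(revenue)  # what has already happened
next_quarter = forecast(revenue)                # what could happen next
```

The first function only summarizes the past, which is what a BI dashboard does; the second uses the same history to estimate a value that has not been observed yet.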
The newest analytics kid on the block is called prescriptive analytics, and this class of tools uses optimization and simulation algorithms to provide guidance to the business by asking “what should we do?”
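As a rough illustration of the “what should we do?” question, the sketch below searches a set of candidate prices for the one with the highest expected revenue. The demand curve, its coefficients, and the function names are all assumptions made up for this example; a real prescriptive tool would estimate demand from data and use a proper optimization or simulation engine rather than a simple search.

```python
def expected_demand(price):
    """Hypothetical linear demand curve: units sold fall as the price rises."""
    return max(0.0, 1000 - 20 * price)


def expected_revenue(price):
    return price * expected_demand(price)


def recommend_price(candidate_prices):
    """Prescriptive step: pick the candidate with the highest expected revenue."""
    return max(candidate_prices, key=expected_revenue)


candidates = range(0, 51)  # whole-dollar prices from 0 to 50
best_price = recommend_price(candidates)
```

The output is a recommended action (a price to charge) rather than a description of the past or a forecast of the future.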
But the technology required to answer this “what should we do?” question is still emerging and can be confusing, to say the least. A plethora of terms gets thrown around to describe this domain, frequently generating more heat than light.
Let’s look at some of these terms and explain what they mean:
1. Big Data
Big data is really the base on which advanced analytics sits. Business data is no longer only collected from operational and transactional internal systems, but also from physical devices like sensors and machines, and from human sources like social media, image designers, etc.
Many large companies now have terabytes of unstructured data like streaming data, video data, machine data from devices, etc., which can be used to improve business decision-making and business outcomes. Without this mass of data, data science and machine learning would not be possible (or necessary).
2. Artificial Intelligence (AI)
Artificial Intelligence has been around since the 1950s and is really a superset of machine learning and deep learning. This is the most general category name. The essence of AI is developing computer systems that can perform tasks that are normally thought to require human intelligence. Today, as we will see, machine learning and deep learning are the specific applications within AI that make this possible.
3. Machine Learning
Machine learning is an application of Artificial Intelligence that allows software to become more accurate in predicting outcomes without being explicitly programmed. The idea is that a model or algorithm takes in data from the world, and that data is fed back into the model so that it improves over time. It’s called machine learning because the model “learns” as it is fed more and more data. A classic example of a machine learning system is a self-driving car. Another good example is Amazon’s recommendation engine, which gets better the more data it has to process.
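The idea of a model that improves as it is fed more data can be sketched in a few lines. The example below is an assumption-laden toy, not how any production system works: the “real world” is a made-up straight-line relationship, the observations are noise-free, and the model is updated one observation at a time with online gradient descent. All names and constants are hypothetical.

```python
import math

# Hypothetical "real world" relationship the model has to discover.
TRUE_SLOPE, TRUE_INTERCEPT = 3.0, 2.0


def observe(i):
    """Return the i-th (x, y) observation; x cycles through 0, 0.25, ..., 1."""
    x = (i % 5) / 4
    return x, TRUE_SLOPE * x + TRUE_INTERCEPT


def train(num_examples, learning_rate=0.1):
    """Update a linear model one observation at a time (online gradient descent)."""
    slope, intercept = 0.0, 0.0
    for i in range(num_examples):
        x, y = observe(i)
        error = (slope * x + intercept) - y  # prediction minus observed value
        slope -= learning_rate * error * x
        intercept -= learning_rate * error
    return slope, intercept


def parameter_error(model):
    """Distance between the learned parameters and the true ones."""
    slope, intercept = model
    return math.hypot(slope - TRUE_SLOPE, intercept - TRUE_INTERCEPT)


# The same model, trained on progressively more observations.
errors = {n: parameter_error(train(n)) for n in (10, 100, 1000)}
```

Nobody programs the values 3 and 2 into the model; it moves toward them because each new observation nudges its parameters, so the error shrinks as the number of observations grows.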
Machine learning is sometimes considered to be identical to the software category Predictive Analytics. This is not exactly correct. Predictive Analytics is less a specific technology than it is a data practice: it is the analysis of historical information to find patterns which allow predictions to be made about future events. This is often achieved by using machine learning technology. A narrower business application of machine learning techniques is in the area of Predictive Sales Analytics tools, which are used for lead scoring and predictive sales forecasting.
4. Deep Learning
Deep learning is one of the hottest topics in technology today and is really a very specific subtype of machine learning. Deep learning technology is inspired by the structure and function of the brain. What this means is that deep learning systems use artificial neural networks whose layers are stacked on top of each other, so that they can make connections a little like neurons in the brain.
The premise is that computers can be made to mimic our own decision-making. Deep learning enables computers to perform recognition tasks like handwritten text recognition, language translation, and speech recognition. For example, when the software is analyzing, say, stop signs, one layer is focused on colors, while another might be focused on shapes, and yet another tries to assess whether any given image is, in fact, a stop sign. Eventually, the software becomes capable of determining whether or not a given image is a stop sign with the same degree of accuracy as a human.
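The stacking of layers can be shown with a deliberately tiny network. In the sketch below the weights are hand-picked rather than learned, and the task (output 1 when exactly one of two inputs is on) is a textbook toy rather than image recognition; both are assumptions chosen to keep the example short. A real deep learning system learns its weights from data and has many more layers and neurons.

```python
def relu(value):
    """Common activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, value)


def identity(value):
    return value


def dense_layer(inputs, weights, biases, activation):
    """One layer: a weighted sum of the inputs per neuron, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked (not learned) weights, chosen purely for illustration.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def network(inputs):
    """Stack two layers: the output layer only sees the hidden layer's results."""
    hidden = dense_layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    output = dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, identity)
    return output[0]


results = {pair: network(pair) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

Each layer works on what the previous layer produced: the hidden layer extracts two simple features of the inputs, and the output layer combines them into the final answer, much as the stop sign layers above build on one another.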
Google’s acquisition of the deep learning company DeepMind in 2014 has already led to several real-world applications based on this technology. For example, Google uses deep learning for image recognition, which enables it to automatically index images from all over the web and provide accurate classification to improve image search. An extension of this is image enhancement, where missing image detail can be extrapolated from what the engine knows about other similar images to restore the missing details. This is also the technology that drives Google Translate.
5. Data Science
Data science is not synonymous with machine learning but is closely related. Data scientists use machine learning (and deep learning) algorithms when they build data models to try to answer questions or discover predictive patterns in huge volumes of data. The unprecedented volumes of data that are now available to data scientists for decision-making purposes have led to the use of powerful Graphics Processing Units (GPUs) that were originally designed for graphics-intensive gaming software, but are now frequently used to accelerate applications in cars, drones, robots and many artificial intelligence technologies.
An evolving set of software tools has also emerged. The most fundamental tool is probably Python, but there are other important tools like MATLAB, RapidMiner, Hadoop, etc. These tools are used by data scientists to build predictive models and then refine them over time.
One well-known practical application of a data science predictive model is the Aerosolve model built by Airbnb, and now available as an open source project. The model uses machine learning techniques to predict the optimal price for a vacation rental based on a broad swath of variables including location, time of year, major events occurring at the location (like SXSW for example), guest reviews, rental photographs, and other variables.
Do Machine Learning Engineers and Data Scientists Do the Same Job?
Both categories of engineers do similar jobs and have skills that are equally sought after in the modern technology marketplace, but there are important differences.
Data scientists are somewhat like a cross between actuaries and computer programmers. They do statistical analysis and research to determine which machine learning approach to use, and then they write code to construct an appropriate model or algorithm. This is frequently written in R or Python.
The machine learning engineer works in tandem with the data scientist to scale the model and make it work efficiently in production. This sometimes involves rewriting the model in a more scalable language like Java or C++. Hence the machine learning engineer role is more of a standard computer science role.
Machine learning engineers and data scientists use overlapping sets of tools, and given the recent excitement around machine learning and data science, there has been an explosion of platforms and tools in this general area. Here are some examples of tools often used in each role:
Data Science tools
- Anaconda (Python distribution)
- Apache Pig
- IBM SPSS
- Microsoft Revolution R
- Dataiku DSS
- Cloudera Data Science Workbench
- Domino Data Lab
- Wolfram Data Science Platform
- RapidMiner