Companies collect large quantities of operational data as a by-product of doing business. Huge quantities of data are stored in finance, procurement, sales, and marketing systems, and in multiple other data repositories. Being able to analyze and understand this data is extremely important to running the business. For example, Enterprise Resource Planning (ERP) systems typically contain data concerning the supply chain and inventory levels in addition to financial data. HR systems contain all employee records, including demographic data, salary levels, and performance reviews. Customer Relationship Management (CRM) systems contain customer, sales pipeline, forecasting, and sometimes customer support case data.
The problem is that all this operational data is typically not accessible in one place for analysis in order to make decisions and provide strategic guidance to the business as a whole. For example, inventory data from an ERP system could be combined with sales forecasting information to understand how to optimize inventories in response to demand. This is the problem that business intelligence systems were designed to solve.
Traditional business intelligence solutions solve this problem by putting data into a common store called a warehouse. The data is then normalized - removing redundancy and duplication - making it easier to run queries and retrieve data for reporting. Newer data discovery and visualization platforms solve the problem differently, by either connecting directly to the various data sources, or storing data in-memory for analysis and visualization. There are many different types of business intelligence technology, not all of which depend on the business warehouse paradigm. Many new approaches have emerged, and the following sections describe some of the major classes of business intelligence technology.
On-premise full-stack BI solutions have been around the longest and are now being eclipsed by newer, more flexible technology. However, these tools still have a very large installed base, and they remain very effective for managing structured data from many sources and structuring it for standard reporting across the enterprise. They have a number of key components, although not every solution necessarily includes each component of the stack:
- Data warehouse: A relational database designed specifically for data analysis instead of standard transactional processing. It acts as the conduit between operational data stores and the gaining of insight based on composite data. Slices of data from the warehouse—usually summary data for a single department like sales or finance—are stored in a “data mart” for quicker access.
- Extract, Transform, Load (ETL): The first important task is to extract the data from the various data sources and load it into a data warehouse, where it is normalized (organized into tables while the data is cleaned and redundancies and inconsistencies are removed). Once it has been appropriately structured, it is available for querying and analysis.
- OLAP or ad-hoc query tools: OLAP (Online Analytical Processing) and its close cousin ROLAP (Relational OLAP) are technologies that allow users to query data across multiple dimensions, whether to build standard reports or to answer a specific business question.
- Presentation layer: Dashboards, scorecards and reports presenting the data to users in a visually appealing way that is easy to understand.
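The warehouse, ETL, and OLAP components described above can be sketched in miniature. The following is an illustrative Python example using an in-memory SQLite database as a stand-in for a warehouse; the source rows, table, and field names are all hypothetical, and a real pipeline would of course involve far more sources and a proper dimensional schema:

```python
import sqlite3

# Hypothetical source records, standing in for an extract from an ERP system
# (the data and field layout are invented for illustration).
erp_rows = [
    ("2024-01", "Widgets", 1200.0),
    ("2024-01", "Widgets", 1200.0),   # duplicate record to be removed
    ("2024-02", "Gadgets", 875.50),
]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Load target: a simple fact table, organized for analysis rather than
# mirroring the source system's transactional layout.
cur.execute("CREATE TABLE sales_fact (period TEXT, product TEXT, amount REAL)")

# Transform: de-duplicate before loading into the warehouse.
seen = set()
for row in erp_rows:
    if row not in seen:
        seen.add(row)
        cur.execute("INSERT INTO sales_fact VALUES (?, ?, ?)", row)

# OLAP-style ad-hoc query: total sales rolled up by one dimension (period).
cur.execute(
    "SELECT period, SUM(amount) FROM sales_fact GROUP BY period ORDER BY period"
)
print(cur.fetchall())   # → [('2024-01', 1200.0), ('2024-02', 875.5)]
```

An OLAP tool generalizes exactly this kind of GROUP BY aggregation across many dimensions at once (period, product, region, channel), typically against a star schema with separate dimension tables.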
These tools are useful for organizations that wish to deliver relatively stable operational reports in a consistent format to front-line staff across the organization to help them monitor their progress or understand where performance is lagging. The advantage of this kind of enterprise reporting capability is the consistency of the data sets being used across the entire organization, which makes it easy to create alignment. It is notoriously difficult to achieve alignment if there is no common agreement about the accuracy of data, and stakeholders have different sets of data showing contradictory information. This is typically what people mean when they refer to a “single source of the truth”.
However, on-premise, full-stack BI systems are difficult and expensive to build and implement, and often difficult to learn and use. They also lack flexibility and are hard to change once they have been built. It has been relatively common in recent years for publications and analysts to bemoan the high failure rates of BI projects, and full-stack deployments are often the culprits. Implementation times for these tools can be long, because setting up the data warehouse and creating the schema are inherently IT-intensive, complex tasks. Even when they are finally up and running, the ROI can be low, often because of usability problems.
However, it should be pointed out that this does not have to be so. In recent years, a new category of data warehouse automation tools has emerged to mitigate these problems. Products like TimeXtender, Kalido, WhereScape, and Attunity go some way to making data warehouse creation and maintenance a far more agile and collaborative experience. These tools are capable of automating the creation of a data warehouse schema, indexes, cubes, etc. They can also create business metadata for specific business intelligence tools. In this way, they can dramatically simplify and speed up both data warehouse development and subsequent maintenance.
Additionally, not all tools in this category are legacy tools. There are more modern approaches to providing end-to-end capabilities using newer technology. A good example is Sisense, which uses a more flexible version of OLAP cubes, called "elasticubes", and leverages in-chip processing (working from the CPU cache and memory rather than disk) to eliminate some of the speed limitations of disk storage. The vendor claims speed increases of more than 50x over the competition.
Full-stack BI tools built on a data warehouse can still provide immense value to larger organizations with the resources to deploy and manage them, and the deep pockets required to invest in them.
Best Fit For
- Organizations whose primary need is for alignment and consistency of data across a very large organization and the provision of accurate reports to line of business managers and operational employees. These tools provide “a single version of the truth” as a basis for decision making across an entire enterprise.
- Organizations with access to a highly skilled IT division, which includes ETL developers, report developers, data architects, data administrators and—very importantly—corporate trainers. (However, some newer products that attempt to radically simplify both deployment and usage need far less IT oversight).
The primary reason for choosing open-source BI tools is often perceived cost. Commercial BI tools are still largely seen as having superior technology, while open-source tools are viewed as offering good-enough technology at a fraction of the price. But although downloading the software can be completely free, large-scale open-source deployments can still turn out to be a significant investment once development costs are factored in. Also, there are very often commercial versions of the products that offer capabilities the core free product does not. These typically include enterprise-level features such as integrated security, connectivity to multiple data sources, and administration tools.
It is also important to bear in mind that these are developer-led tools and are designed with a developer mindset, which often means that significant development resources will be required to deploy and integrate them in an existing corporate environment.
There is, however, renewed interest in open-source BI tools today, partly fuelled by the extraordinary success of open-source products like Hadoop and Revolution Analytics' R distribution (Revolution Analytics was recently acquired by Microsoft), which have raised awareness of the open-source approach.
- Actuate (commercial product built on open-source technology)
- Jaspersoft (acquired by TIBCO in 2014)
Best Fit For
- Open-Source BI can be a good choice for organizations that have the technical expertise required to integrate the code base and make it work effectively within the organization. Typically these tool sets are very complete, due to the large number of developers working on the code base.
- Open-source reporting engines are a particularly logical case for application vendors looking for a reporting engine to integrate into their product.
Cloud full-stack BI products are a subset of full-stack products. They tend to include a data store, an ETL and semantic layer, and a range of front-end presentation tools sitting on top. The difference is largely in the deployment model (cloud versus on-premise). However, it makes sense to consider these solutions independently since they have some unique characteristics. For example, they are far easier to deploy, and do not require nearly as much IT oversight as traditional full-stack BI products.
Increasingly, traditional full-stack BI providers are offering cloud versions, but most are single-tenant, i.e. a single instance of the software supports a single customer. Cloud-only full-stack BI products like Birst and GoodData are true multi-tenant SaaS products deployed on public clouds, and they offer all the advantages of true SaaS products: lower cost, frequent updates, and no data center infrastructure required. Tableau introduced a cloud version of its product, Tableau Online, in 2013.
Cloud BI platforms are often positioned as splitting the difference between legacy and discovery tools, offering the ease of use of discovery tools together with the data integration capabilities of legacy systems.
Cloud BI has been talked up as the next big thing in the BI world for some time now, but adoption has been slower than expected. One of the major obstacles has been concern over data security: corporations have been reluctant to put sensitive data in the cloud. However, as cloud-based operational systems like Salesforce, NetSuite, Zendesk, SuccessFactors and a multitude of others become ubiquitous, and more operational data is located in the cloud, cloud BI adoption is becoming much more mainstream. This familiarity with the cloud paradigm for enterprise business systems, in conjunction with massive and growing demand for analytics by business users, has provided a basic comfort level with analytics in the cloud. It was inevitable that as operational data moved to the cloud, analytics would soon follow, and cloud BI is fast becoming ubiquitous, despite some reservations among companies in highly regulated industries.
Best Fit For
- Organizations that have come from the Internet world and have been using SaaS applications like Salesforce and SuccessFactors to run their businesses. These organizations are likely to have fewer security concerns around storing their data in the cloud.
- Organizations of all sizes that want a version of the full-stack products that is much easier to deploy and less IT-centric, allowing “single version of the truth” reporting across a department or a whole company.
- Smaller organizations with a limited budget that want a fully featured system at far lower initial cost due to the absence of any capital outlay for on-premises infrastructure.
Data discovery and visualization tools are designed for data analysts and more technical business users. The focus of these tools is not so much reporting and monitoring as ad-hoc analysis of multiple data sources. They provide data analysts with an intuitive way to sift through large volumes of disparate data to expose patterns and outliers hidden within. They replace the rows and columns of traditional data presentations with graphical pictures and charts.
These tools have taken the BI world by storm, largely because of the low cost of implementation and because they do not require IT support. Ease of use is another key feature encouraging rapid adoption. They allow end users with some comfort level in data analysis to access multiple different data sources, perform data mash-ups, and display the results in visually compelling ways. For example, a company might produce a visualization of expenses by department across a large enterprise to help home in on outliers and figure out the reasons for any disparity.
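The expenses-by-department example can be illustrated with a small sketch. The department names and figures below are invented; the point is the kind of outlier detection that a discovery tool makes instantly visible in a bar chart:

```python
import statistics

# Hypothetical monthly expense figures by department (illustrative data only).
expenses = {
    "Sales": 42000,
    "Marketing": 39500,
    "Engineering": 41200,
    "Finance": 40100,
    "Facilities": 97800,   # an outlier worth investigating
}

# A median-based rule is used here because a single large outlier would
# inflate the mean and standard deviation, masking itself.
median = statistics.median(expenses.values())
outliers = [dept for dept, amount in expenses.items() if amount > 1.5 * median]
print(outliers)   # → ['Facilities']
```

A discovery tool performs this sifting visually and interactively across millions of rows, but the analytical question ("which department is out of line, and why?") is the same.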
- IBM Watson Analytics
- QlikView and Qlik Sense
- MicroStrategy Analytics Desktop
- SAP BusinessObjects Lumira
- Tableau Online, Desktop & Server
- TIBCO Spotfire
Best Fit For
- Business analysts who require access to data from disparate systems, the ability to blend that data without IT assistance, and visually compelling output that helps them understand the data and tell a story.
- These are not the right tools for providing a reporting infrastructure across an entire company, and very few companies use them as their corporate BI standard. However, it is also rare for at least one of these tools not to be used at an individual or departmental level.
Big data does not describe a single technology or initiative, but rather a broad trend that is affecting all kinds of organizations. Big data technology emerged in response to the enormous volumes of data that have inundated organizations in recent years, and that are beyond the capacity of traditional business intelligence tools to process and manage. The problem that big data technology vendors are trying to solve is how to actually use this data to improve business outcomes. Terabytes of digital information are collected from physical devices like RFID sensors and machines, along with human-sourced communications like text, images, and video. Most existing BI systems cannot easily comprehend this kind of data, as they were designed to make sense of highly structured data organized in tables and stored in a data warehouse. That leaves a vast quantity of potentially very useful data out in the cold. This is the driver behind the rapid ascension of Hadoop and NoSQL data stores like MongoDB and Cassandra, and the constellation of products that have accrued around them.
The value that big data technology can bring to the enterprise is varied and profound. Here are some typical use cases among many:
- IT data center optimization: Running a large, complex modern data center is not an easy task. A large data center can produce terabytes of plain-text log files. Big data systems can analyze this massive volume of logs to understand the root cause of a system breakdown or of sub-optimal performance, deciphering what is happening across the stack with every single transaction. Without big data systems, this is practically impossible.
- Fraud detection: Fraud detection is all about building models to identify customers engaging in fraudulent behavior. The problem with building these models, however, is the underlying data. Because the volume of transactional data is so intimidatingly large, models are usually constructed on subsets or segments of the entire data set. Partial data and high latency can seriously reduce the predictive power of these models. Big data tools allow models to be built on the entire data set with very low latency, thus vastly improving the power and accuracy of the predictions.
- Call center analytics: Big data models can help to understand customer loyalty decay, and to remediate customer dissatisfaction at key touch points to increase customer loyalty.
- Social media analytics: Analysis of torrents of data in the form of social media streams can provide insight into what customers are saying about a company and its products along with those of competitors. While this sentiment analysis is important, the real power of social analytics is linking this sentiment data to transactional data to understand how sales promotions, loyalty programs and competitor activities correlate to this social sentiment.
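Several of the use cases above share a common computational shape: parse enormous numbers of records and aggregate them by key. The toy Python sketch below shows that shape on a handful of invented log lines; at real scale, the same map-and-count pattern is what engines like Hadoop MapReduce distribute across a cluster:

```python
from collections import Counter

# Hypothetical data-center log lines (invented for illustration). A real
# deployment would stream terabytes of these through a distributed engine
# rather than a single Python process.
logs = [
    "2024-03-01T10:00:01 app=checkout level=ERROR msg=timeout",
    "2024-03-01T10:00:02 app=search level=INFO msg=ok",
    "2024-03-01T10:00:03 app=checkout level=ERROR msg=timeout",
    "2024-03-01T10:00:04 app=billing level=WARN msg=retry",
]

def parse(line):
    # "Map" step: extract a (app, level) key from each raw log line.
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    return (fields["app"], fields["level"])

# "Reduce" step: count occurrences of each key.
counts = Counter(parse(line) for line in logs)
print(counts[("checkout", "ERROR")])   # → 2
```

The value of big data platforms is not the aggregation logic itself, which is simple, but the ability to run it over the *entire* data set, continuously and with low latency, instead of over samples.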
- Hadoop Infrastructure
- Big Data Analytics
- SQL on Hadoop
Best Fit For
- Companies who need to analyze very high volumes of data, from very diverse data sources, to solve pressing and complex business problems.
- Data-rich organizations with an IT department closely connected to business units, and with a strong desire to use their data to gain competitive advantage.
Several BI vendors also sell their products to ISVs, which embed analytics capabilities in their own products. While this is not a separate class of products, this specific use case has become increasingly important as companies grapple with ever-growing data volumes and become more familiar with data discovery and visualization tools. Many software vendors realize that built-in analytics capabilities are critically important to the success of their products in the marketplace, and they face a critical build-or-buy decision. Providing tested BI technology can give their customers significant competitive advantage, while allowing the vendor rapid time to market and better cost management than building capabilities in a domain where it has limited expertise.
In the early days of this model, companies embedded proprietary code in their products using APIs provided by BI vendors. However, web-based solutions no longer need to be embedded, but reside adjacent to the application, greatly simplifying deployment and administration.
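One common pattern for this adjacent, web-based model is a signed dashboard URL that the host application renders in an iframe, with the BI service verifying the signature before serving content. The sketch below is a generic illustration only: the endpoint, parameter names, and signing scheme are hypothetical, and a real integration would follow the specific BI vendor's embedding API documentation.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical shared secret issued by the BI vendor to the host application.
SECRET = b"shared-secret-from-bi-vendor"

def signed_embed_url(dashboard_id: str, user: str, ttl: int = 300) -> str:
    """Build a short-lived, HMAC-signed URL for an embedded dashboard."""
    params = {
        "dashboard": dashboard_id,
        "user": user,
        "expires": str(int(time.time()) + ttl),  # expiry limits replay
    }
    query = urlencode(sorted(params.items()))
    signature = hmac.new(SECRET, query.encode(), hashlib.sha256).hexdigest()
    return f"https://bi.example.com/embed?{query}&sig={signature}"

# The host application places this URL in an <iframe>; the BI service
# recomputes the HMAC and checks the expiry before serving the dashboard.
print(signed_embed_url("sales-overview", "analyst@example.com"))
```

The appeal of this pattern is that the dashboard lives and is administered on the BI vendor's side; the host application only needs to generate URLs, which keeps deployment and upgrades simple.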
The market initially developed around developer-focused products like Logi Analytics, TIBCO Jaspersoft, OpenText and Pentaho (Hitachi Data Systems), several of them open source. The emergence of cloud hosting infrastructure now allows vendors to make their solutions available in platform-as-a-service cloud environments.
This use case is quickly becoming pervasive in the marketplace. GoodData, for example, is now almost exclusively focused on this market. In addition to GoodData, embedded solutions are also available from Logi Analytics, TIBCO Jaspersoft, OpenText, Pentaho (Hitachi Data Systems), Birst, Qlik, Tableau, Looker and Sisense.