TR: What distinguishes an organization that is successful with business intelligence (BI)?
BD: The important thing to remember is that it’s usually not about tools; it’s more about organizational maturity and the ability to get decisions made and get things done. Many organizations I encounter are somewhat internally dysfunctional, and this ability to get things done is weak or even non-existent.
TR: What steps should an organization take to establish a BI program or strategy?
BD: I prefer to work on larger enterprise-wide programs where organizations are looking at bringing coherence to their data across departments and divisions. These large cross-functional efforts are best thought of as a process containing multiple projects. The leader of such a major process change needs the ability to understand multiple stakeholders with different needs and different timeframes. Managing through what I call a “staged implementation roadmap” is essential. In order to do that successfully you need strong program management skills. It’s important to be politically aware and understand things like the consequences of deciding to work with one department or division before another.
TR: Who drives – IT or business?
BD: Perhaps I should answer that by drawing a distinction between who should drive and who is usually driving when I get there! There is no question that a major project should always be driven by a high-level executive on the business side – either a CEO or some direct report who can take a cross-enterprise view. The reality is that IT is often reluctantly in the driving seat, and trying to get out of it as fast as they can. I try to transition leadership from IT and get the business people to step up to their responsibilities. IT is a co-owner in a process like this, but should not drive.
TR: What is your review of the BI technology landscape today?
BD: The BI technology landscape has gone crazy. A large yellow elephant (Hadoop) has caused a certain amount of frenzy, and the idea of running cheap or free, open-source software platforms on inexpensive clusters of commodity hardware is driving much of that craziness in the marketplace. In some sense, whether you decide to use a more traditional relational technology or Hadoop is not really the issue. However, these emerging platforms are, by definition, emerging and untested. There is a ton of hype that is really distorting the marketplace today.
TR: How do you see the emergence of in-memory visualization tools like Tableau and QlikView? Where do they fit in the traditional relational technology landscape?
BD: It’s not an either/or decision. These platforms are in some ways the natural inheritors of the data mart world view. These visualization tools emerged as an answer to a set of business users who needed easier-to-use, more flexible tools with a quicker time to value. They are largely agnostic as to the data platform that sits underneath the data (although they are quasi-relational in approach). They emerged as the answer to an end-user need, and they answered that need very well indeed. However, many are now trying to get in bed with IT departments as they learn how to do enterprise-level projects that go beyond the immediate needs of an individual analyst, and this will ultimately slow them down.
A few years ago it would have been said that Business Objects or Cognos, for example, were the data mart vendors delivering valuable tools to end users, but as they have scaled their products across the enterprise, few would say that of them today. QlikView and Tableau are certainly more nimble and have newer technology platforms, but they are replicating the same inevitable journey that MicroStrategy, Business Objects and Cognos made many years ago.
The newer players will carve a substantial niche in the market, but I don’t necessarily see them replacing the older platforms. Vendors like Cognos and Business Objects solve real business problems and have large installed bases – they will be around for a long time to come.
TR: What do you think about the emergence of cloud BI tools?
BD: This is more of an implementation question than anything else. When I look at BI from a high-level architecture standpoint, it doesn’t make much difference where the bits and bytes reside. Of course, there are financial considerations and major concerns with cloud security, etc., but there is still a significant market for cloud-based tools. There are two sorts of organization that are “cloud-friendly”.
Firstly, organizations that have come from the internet world and have been using Salesforce and other cloud applications to run their businesses for years. For these organizations, cloud technology is part of their DNA and they are obvious targets for cloud BI vendors. The second class of potential clients are cost-sensitive organizations who see that cloud BI is less expensive – it has no initial capital expenditure, and much lower IT expense since there is no need to provide any on-premise infrastructure.
These two constituencies are large enough that cloud BI is likely to be quite successful over the medium- and long-term.
TR: Let’s talk about some of the work that you have done regarding the evolution of BI architectures, particularly as developed in your book, “Business unIntelligence”.
BD: My starting point is data architecture. Are there different types and categories of data? If so, what does that mean in how we manage them?
In the 1960s, when computers were being born, the focus was on the automation of legally binding processes. A customer buys things from me, I need a contract governing the terms of sale, etc. It’s all about the company processes and legally binding agreements with customers and suppliers. The essential question was, “how many of these widgets did we actually sell?” The desire was to build a single version of the truth. This has remained the basic focus of business systems ever since. Business systems generate structured, consistent data, which business intelligence systems then parse and analyze.
About ten or twelve years ago, the big data movement – especially social media data – started to happen. The important thing became looking at data from Facebook, Twitter and other social platforms to predict sentiment and behavior. This is human-sourced information rather than operational data, and it is used to ask questions like, “what might people be thinking of doing next?” This is not legal/operational data, so we don’t really need a single source of truth – we don’t need the same level of consistency or quality because this is human data.
After all, people lie or change their minds, so you can treat this data more loosely than profit-related operational data. Social media listening tools that can gauge sentiment and predict buyer behavior emerged to make sense of this data. A whole ecosystem of tools and infrastructure has grown up to deal with this new class of very important data.
The third big change is the “Internet of Things”, or machine-generated data from sensors in automobiles or phones, for example, or computers of various kinds. This kind of data has a structure which is somewhere between highly structured operational data and social media information. It’s structured because it comes from machines, but it changes a lot; it evolves much more quickly than operational data. This kind of data also has its own set of tools and technology architectures. The most obvious of these are the modern NoSQL tools like MongoDB.
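The contrast between the three pillars can be sketched with a few illustrative records; all the field names below are hypothetical, chosen only to show how rigid, free-form, and schema-evolving data differ:

```python
# Illustrative records for the three data pillars; every field name is hypothetical.

# 1. Operational data: rigidly structured, where consistency matters
#    (the "single version of the truth").
sale = {"order_id": 1001, "customer_id": 42, "widget_qty": 3, "unit_price": 9.99}

# 2. Human-sourced information: free text, with no guarantee of truth
#    or consistency -- people lie or change their minds.
post = {"user": "@someone", "text": "thinking about switching brands...", "likes": 17}

# 3. Machine-generated data: structured because it comes from machines,
#    but the schema evolves quickly -- a newer reading may add fields.
reading_v1 = {"sensor_id": "a1", "ts": 1700000000, "temp_c": 21.5}
reading_v2 = {"sensor_id": "a1", "ts": 1700000060, "temp_c": 21.7, "humidity": 0.43}

# A document store such as MongoDB accepts both reading versions in the
# same collection; a relational table would need a schema migration first.
for r in (reading_v1, reading_v2):
    print(sorted(r.keys()))
```

The point of the sketch is only the middle ground the third pillar occupies: more structure than a tweet, less stability than an order table.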
These three data pillars replace the single pillar of operational/informational data, which was based on the paradigm of an operations application layer with a data warehouse or data mart layer. We’ve moved from a world of moving data through layered architectures to a world of vertical pillars of different kinds of data – each with its own architecture and tool sets.
TR: Will these different data “pillars” merge over time, or will they remain distinct?
BD: I believe that these pillars will remain separate. I do not believe that all of the data will end up in a data lake or warehouse. We will have a number of these pillars – it could be 5 or 6 data pillars that, for the foreseeable future, will be separate. The relational environment (whether in-memory or on disk) is optimized for dealing with how things relate to one another. This is crucially important for the process/legal side of the business and it is unlikely to ever go away.
One question we might ask is, “will this data ever get implemented in a Hadoop environment?” The answer is that many vendors are already trying to do this today. But it’s still a relational framework built on top of a different technology.
TR: Are there other data pillars we need to think about?
BD: The three already mentioned – relational, Hadoop and NoSQL – are the main ones. Another could be streaming data and processing tools. Streaming data moves so fast that it never even lands on a disk. There exist specialized “complex event processing” tools that can be seen as yet another pillar. We might consider enterprise content management environments as another pillar. How many we define depends on the business and data management needs of a particular organization at a specific time.
The number of data pillars is likely to expand and there will be overlap between them; they tend to blend a bit at the edges because vendors like to invade each other’s spaces. But from an architectural point of view I try to make them as distinct as possible.
TR: Is there anything important we have not talked about?
BD: One important topic that we have not discussed is data virtualization. The discussion of pillars that we have just had implies that different kinds of data reside on different platforms. Users want to pull and join data from these disparate platforms with minimal fuss, and this has growing importance in the BI world. This implies real-time access to any data source with no ETL or prior data modeling.
Vendors in the data integration space like Informatica have offerings here. There are also some specialized vendors, like Composite Software (now a business unit within Cisco) and Denodo, which originated in Spain but now operates internationally. Database vendors like Teradata and IBM are also building such functionality into their products.
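The core idea of virtualization, joining data in place at query time with no ETL step, can be sketched in a few lines. The two sources and all field names here are hypothetical stand-ins for, say, a relational warehouse table and a NoSQL collection:

```python
# A minimal sketch of data virtualization: join two "live" sources at
# query time, with no ETL and no prior physical consolidation.
# Both sources and every field name are hypothetical.

def relational_source():
    # Stands in for a query against a relational warehouse.
    yield {"customer_id": 42, "region": "EMEA"}
    yield {"customer_id": 43, "region": "APAC"}

def nosql_source():
    # Stands in for a query against a document store.
    yield {"customer_id": 42, "sentiment": "positive"}

def virtual_join(left, right, key):
    # Build a lookup from the right source, then stream the left source
    # through it; nothing is persisted anywhere.
    lookup = {doc[key]: doc for doc in right}
    for row in left:
        if row[key] in lookup:
            yield {**row, **lookup[row[key]]}

result = list(virtual_join(relational_source(), nosql_source(), "customer_id"))
print(result)
```

Real virtualization products add query optimization, pushdown, and security on top, but the user-visible contract is the same: one joined view over disparate platforms, built on demand.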