Whether web analyst, data scientist, ordinary user or manager, everyone is trying to understand how to exploit all the available data and to determine the real benefits for the company. In a few years, the amount of information has gone from scant to abundant. Four challenges stand out: the explosion of information, the growth of exchanges on social networks, the proliferation of devices for consulting information, and demographic change.
This brings new perspectives, but also many questions about whether traditional technologies can exploit this massive amount of data. The new paradigm can be summed up in one sentence: an abundance of data without any real explanation or context is difficult to transform into actionable information.
Every example of the data explosion shows that data is being generated faster and faster. It is therefore important to know how to process this information to identify trends and new business opportunities, in areas such as fighting crime, reorganizing cities, improving customer knowledge, innovating faster in the life sciences, promoting the collaborative economy, etc.
Fundamentals: Business Intelligence versus Big Data
Before getting to the heart of this post, the choice between Business Intelligence (BI) and Big Data technologies, let’s start with a reminder of the basics of Business Intelligence.
BI is a set of tools and techniques for collecting, cleaning up and enriching structured or semi-structured data for storage in SQL databases, often in multidimensional form. The data is therefore managed in standardized formats to make information easier to access and faster to process.
The goal of BI is to produce performance indicators to understand the past and analyze the present, in order to extrapolate a long-term vision and define the future competitive advantages of the business. BI is used by a large number of internal or external users to support activities ranging from day-to-day business operations to strategic monitoring.
The 4 Vs, to better understand Big Data
Let’s try to better understand Big Data through its traditional definition, the 4 Vs (volume, variety, velocity, veracity), by taking an example. A customer database contains the following information: last name, first name, gender, age, occupation, status, and so on. All of this information is stored in a traditional data warehouse. If we apply the 4 Vs to decide whether this application should migrate to a Big Data infrastructure, the answer would be no.
The volume of data is no longer a problem in itself: today we can speak of large data warehouses. The variety of sources is handled by new technologies and by the low cost of integrating additional sources. The velocity is managed by application data buses, which allow an increase in the volume of data per unit of time. The veracity of the data, finally, is a fixed requirement of data analysis, whatever the infrastructure.
Two different analysis methodologies
Let’s explore the data in more depth by introducing new dimensions of analysis: the detection of events, the chronological order in which they are collected, the time elapsed between events, and the situations or contexts that qualify the events that occurred.
An example makes this concrete:
- 1st case: a consumer watches an advertisement, visits the website the next day, calls a consultant two days later and makes a purchase the day after that.
- 2nd case: a consumer buys a product, visits the website the same day, then calls a consultant three months later and looks at an advertisement the following month.
These two cases show the need to understand the events as well as their sequence. Even though the customer has bought the same product in both examples, the customer experience and the customer journey to be analyzed are radically different.
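The two journeys above can be sketched in a few lines of Python. This is only an illustration: the event names, the calendar dates and the `describe` helper are assumptions made for the example, not part of any particular tool. It shows that both journeys contain the same set of events, while the order and the time between events differ.

```python
from datetime import date

# Hypothetical event logs for the two cases; dates are invented for illustration.
journeys = {
    "case_1": [
        (date(2024, 1, 1), "ad_view"),
        (date(2024, 1, 2), "site_visit"),
        (date(2024, 1, 4), "advisor_call"),
        (date(2024, 1, 5), "purchase"),
    ],
    "case_2": [
        (date(2024, 1, 1), "purchase"),
        (date(2024, 1, 1), "site_visit"),
        (date(2024, 4, 1), "advisor_call"),
        (date(2024, 5, 1), "ad_view"),
    ],
}


def describe(events):
    """Return the ordered event names and the gaps in days between consecutive events."""
    # sorted() is stable, so same-day events keep their recorded order.
    ordered = sorted(events, key=lambda e: e[0])
    names = [name for _, name in ordered]
    gaps = [(b[0] - a[0]).days for a, b in zip(ordered, ordered[1:])]
    return names, gaps


names_1, gaps_1 = describe(journeys["case_1"])
names_2, gaps_2 = describe(journeys["case_2"])

# Seen as an unordered set, the two journeys look identical...
print(set(names_1) == set(names_2))
# ...but the sequence and the elapsed time tell two different stories.
print(names_1, gaps_1)
print(names_2, gaps_2)
```

An indicator that only counts events per customer would not distinguish the two cases; the sequence and the gaps are what separate them.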
Now consider the case of a customer who speaks to a customer service advisor.
- 1st case: the customer visits the website twice during the day and, at the end of the day, calls an advisor.
- 2nd case: the customer visits the website twice during the day and finds the answer to his question without contacting customer service.
Different interpretation of the data
The interpretation of the information will be different even if in both cases the customer has obtained the correct answer to his question.
In these two examples, we can easily measure the difference between Business Intelligence and Big Data. In the first example, marketing implements precise sequences to capture the customer and lock him into a path defined by business rules. However, the volatile, spontaneous, hybrid and undecided customer constantly breaks the rules, the pre-established pathways and the inbound and outbound marketing processes.
To understand this behaviour, it is necessary to destructure the information and to treat it with a question-oriented approach. Big Data technologies make it possible to store the same data in different contexts, applying distinct processing and a series of differentiated algorithms in order to address several problems simultaneously.
One can also run learning operations on the data without preconceived ideas, as well as observation phases to detect the famous weak signals (partial or fragmentary information provided by the environment). All the information collected, the degrees of personalization and the types of recommendations will then have to be reproducible so that they can be modeled, and thus industrialized, on a large scale. The knowledge gained will inform the company’s strategy, organizations, people, and processes.
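A minimal sketch of this question-oriented approach, in plain Python. The raw event records, the customer identifiers and the two functions are hypothetical and reuse the customer service example above. The same unaggregated data is kept once and processed in two different ways, each answering a different question.

```python
from collections import Counter

# Hypothetical raw events (customer, hour of day, event), kept unaggregated.
raw_events = [
    ("customer_a", 9, "site_visit"),
    ("customer_a", 14, "site_visit"),
    ("customer_a", 18, "advisor_call"),
    ("customer_b", 10, "site_visit"),
    ("customer_b", 15, "site_visit"),
]


def channel_load(events):
    """Question 1: how many events does each channel handle?"""
    return Counter(event for _, _, event in events)


def self_served(events):
    """Question 2: which customers used the website without ever calling?"""
    visitors = {c for c, _, e in events if e == "site_visit"}
    callers = {c for c, _, e in events if e == "advisor_call"}
    return visitors - callers


print(channel_load(raw_events))
print(self_served(raw_events))
```

Both customers got their answer, yet the second question separates the one who needed an advisor from the one who did not, without changing how the data is stored.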
No direct link between BI and Big Data
My personal view is that there is no direct link between BI and Big Data. The analytical techniques are radically different and are practiced with new know-how and technologies. The new paradigm breaks with current thinking and tends to revolutionize the very approach to data analysis.
The question therefore goes well beyond the technological debate around SQL, NoSQL, column-oriented, in-memory databases and any other variant. The interest of Big Data lies less in the subjects treated than in the way of understanding and solving problems in cross-cutting domains (marketing, logistics, risk management…) or in specialized fields (health, energy, distribution…). This is the heart of the Big Data challenge: to know human activity, to understand its context, and to establish the relationships between activity data in order to provide, at a given instant, an individualized and personalized real-time service.