What are the Differences between Data Analytics, Data Science, and Big Data Analytics?
Is Data Analytics reserved for large companies? These days, we talk a lot about Data Analytics (DA), Business Intelligence (BI), Data Mining, Data Science, Big Data, and so on. But do we know what these buzzwords mean?
Year after year, the notions behind these words evolve, and each analytics professional (data specialist or software publisher) gives them a scope specific to their own profession and working environment. For example, where some talk about Data Analytics, others see Data Mining, while for others still the two concepts merge.
Let’s try to identify the main concepts so that everyone sees a little more clearly, and especially so that SMEs understand that Data Analytics is not reserved for large companies.
Business Intelligence (BI) and Data Analytics
We have been talking about data analysis for a long time; now we talk more often about Data Analytics. Data Analysis and Data Analytics are easily confused, both in their boundaries and in their semantics, because analytics proceeds by way of analysis. In fact, the terms Data Analytics and Data Analysis lead to the same English Wikipedia page.
Data Analytics (DA) consists of examining raw data (often in large volumes) in order to extract information that is comprehensible to humans and that would be difficult to observe directly. Graphic representation (Data Visualization, or Dataviz) makes this information intelligible and supports interpretation and decision-making.
We can distinguish Data Analytics from Advanced Data Analytics (such as Data Mining), or even Big Data Analytics, by restricting Data Analytics to what is already known. In other words, DA does not identify new hidden relationships between data or events. For example, DA makes it possible to find the equipment responsible for a consumption peak by comparing the start-up phases of all equipment with consumption values, whereas advanced analytics will identify a previously unknown cause that is independent of equipment start-ups.
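The consumption example can be sketched in a few lines of Python. This is only an illustration under assumptions: the readings, the equipment names, the 90 kW threshold and the five-minute window are all invented, and `equipment_near_peaks` is a hypothetical helper rather than a function from any BI tool.

```python
from datetime import datetime, timedelta

# Hypothetical meter readings: (timestamp, consumption in kW).
readings = [
    (datetime(2024, 1, 8, 6, 0), 40.0),
    (datetime(2024, 1, 8, 6, 15), 42.0),
    (datetime(2024, 1, 8, 6, 30), 95.0),
    (datetime(2024, 1, 8, 6, 45), 50.0),
    (datetime(2024, 1, 8, 7, 0), 98.0),
]

# Hypothetical start-up log: (equipment name, start-up time).
startups = [
    ("compressor", datetime(2024, 1, 8, 6, 28)),
    ("oven", datetime(2024, 1, 8, 6, 58)),
    ("conveyor", datetime(2024, 1, 8, 6, 10)),
]


def equipment_near_peaks(readings, startups, threshold_kw, window):
    """Map each peak timestamp to the equipment started shortly before it.

    A reading counts as a peak when it reaches `threshold_kw`; equipment is
    attached to the peak when it started at most `window` before the reading.
    """
    result = {}
    for ts, kw in readings:
        if kw < threshold_kw:
            continue
        result[ts] = [
            name
            for name, started in startups
            if timedelta(0) <= ts - started <= window
        ]
    return result


peaks = equipment_near_peaks(
    readings, startups, threshold_kw=90.0, window=timedelta(minutes=5)
)
for ts, names in peaks.items():
    print(ts.isoformat(), names)
```

The question asked here is a known one (which start-up coincides with which peak), which is what keeps this on the Data Analytics side of the line drawn above.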
Business Intelligence (BI) has been around for more than 20 years, but it is becoming more and more popular with the spread of software that offers visualization and analysis environments for business specialists (who do not necessarily have the mathematics and computer science skills to process data themselves). As BI tools become more advanced, the line between analytics and advanced analytics is increasingly blurred, and the boundaries of analytics are being pushed back.
Data Visualization or Dataviz
Data Visualization must enable decision-making and communication. Beyond a certain volume of data, it is no longer possible to analyze tables of raw data because the effort of abstraction is too great, which is why Business Intelligence (BI) tools integrate Dataviz. Dataviz displays the data graphically and allows the creation of groups, rankings and filters. But here, the volumes and data sources have nothing in common with a spreadsheet file. Spreadsheets also do not handle automated collection and processing, automatic updating, or data quality checks.
Data Science
Data Science is a term that some people use as a synonym for Data Analytics. Others consider that Data Science, literally the science of data, combines analytics (Data Analytics) with Machine Learning, Data Mining, Artificial Intelligence (AI) and a whole set of mathematical and computational methods.
Data Mining
The term Data Mining can be taken literally: it involves drilling into, exploring or delving into the data. Unlike Data Analytics, which provides information only from known elements, Data Mining allows us to establish associations and relationships between data (we speak of patterns) that are hidden or not obvious, very often by combining large volumes of data spread across multiple relational databases. These patterns provide usable information for decision-making.
Data Mining is an essential component of advanced Data Analytics and Big Data Analytics. One of the forms of Data Mining is predictive analytics.
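To make the idea of a pattern more concrete, here is a minimal sketch that counts which pairs of items appear together most often in a set of transactions. It is a toy co-occurrence count, not a full association-rule algorithm; the transactions, the `frequent_pairs` helper and the 40% support threshold are assumptions made for the illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; in practice these would come from
# several databases joined together.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose share of transactions reaches `min_support`."""
    counts = Counter()
    for basket in transactions:
        # Sort the items so that each pair is always counted under one key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


patterns = frequent_pairs(transactions, min_support=0.4)
for pair, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(pair, support)
```

Nothing in the input says that bread and butter go together; the association emerges from the data, which is the difference from an analysis that starts from a known relationship.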
Big Data – Big Data is not big
You might think that Big Data simply means big volumes of data. Yet even without Big Data, it is now possible to store and exploit very large volumes of data from a wide variety of sources in large data warehouses. Current technologies can handle large volumes of data using analytical Business Intelligence methods without resorting to Big Data.
Big Data is indeed about large volumes of data, but one of its specific features is that it considers both structured and unstructured data, and it is precisely unstructured data that the usual analytics tools cannot handle. Storage also differs: the data is no longer stored in Data Warehouses but in Data Lakes. This new form of storage is also what allows Big Data analytics to be applied, which is another specific feature of Big Data. More than the volumes, what makes Big Data is the nature of the data, the way in which it is stored, and the analysis techniques applied with dedicated know-how and technologies.
Big Data analytics
The value of Big Data is that it shakes up conventional analysis by bringing agility to the way problems are understood and solved and to the handling of heterogeneous data. It is possible to apply BI methods, including data visualization, and advanced analytics methods such as Data Mining, but there is also a set of methods and techniques specific to Big Data, if only because BI tools cannot take unstructured data into account.
Big Data analytics offers new dimensions of analysis, such as taking into account the chronology and the context of events. Unlike Data Analytics, Big Data Analytics applies different kinds of processing to address multiple issues simultaneously and is not locked into a predefined relationship schema.
Descriptive, predictive and prescriptive data analytics
Descriptive analytics provides information about what has been done and thus helps to understand what has happened.
The purpose of predictive analytics is to provide models that predict what might happen. It relies on Data Mining, which provides statistical models. One common technique is regression analysis, which estimates the relationship between related variables so that the value of one can be predicted from the others.
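As an illustration of regression, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The temperature and energy figures are invented and deliberately lie on a perfect line to keep the arithmetic easy to follow; `fit_line` is a hypothetical helper, not part of any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: outdoor temperature (°C) and daily energy use (kWh).
temperatures = [0, 5, 10, 15, 20]
energy_use = [100, 90, 80, 70, 60]

slope, intercept = fit_line(temperatures, energy_use)

# Predict the energy use for a temperature not present in the history.
predicted = slope * 25 + intercept
print(f"slope={slope}, intercept={intercept}, predicted={predicted}")
```

Real data is noisy, so in practice the fitted line only approximates the observations and the prediction comes with an error margin.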
Prescriptive analytics helps to choose the best of several possible actions in order to guide what will happen.
Structured, semi-structured and unstructured data
Structured data is formatted and organized according to a structure that allows it to be processed to extract information. It is stored in databases, which may themselves form more complex sets known as data warehouses.
Unstructured data is data that does not fit any predefined structure. It comes in two types:
- Unstructured textual data from emails, documents such as letters, presentations, chats, etc.
- Unstructured non-textual data from images, audio files, video files.
This data can be of digital or physical origin, and must be stored in digital form to allow extraction by semantic analysis. Unstructured data cannot be stored in the structure of a relational database or a data warehouse.
Finally, semi-structured data is not organized in a database, but associated metadata describes the data so that it can be processed. For example, the author and the date of creation of a file (Word, MP4, …) are metadata, as well as descriptions of web pages found in a search engine.
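The role of metadata can be shown with a short sketch. The files below are described in JSON, a common format for semi-structured data; the file names, authors and dates are invented, and `files_by_author` is a hypothetical helper. The content of the files is never read: only the metadata is used to select and order them.

```python
import json
from datetime import date

# Hypothetical catalogue of files with their metadata.
documents_json = """
[
  {"name": "minutes.docx", "metadata": {"author": "alice", "created": "2023-02-01"}},
  {"name": "demo.mp4", "metadata": {"author": "bob", "created": "2023-05-17"}},
  {"name": "offer.docx", "metadata": {"author": "alice", "created": "2023-04-09"}}
]
"""


def files_by_author(documents, author):
    """Return the names of the files by `author`, oldest first."""
    matches = [d for d in documents if d["metadata"].get("author") == author]
    matches.sort(key=lambda d: date.fromisoformat(d["metadata"]["created"]))
    return [d["name"] for d in matches]


documents = json.loads(documents_json)
alice_files = files_by_author(documents, "alice")
print(alice_files)
```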
Data Warehouse, Datamart and Data Lake
The proliferation of databases and the increasing need for analysis have led to the creation of data warehouses, which centralize data and facilitate its management. The notion is not new, but the word itself is recent. Rather than querying operational databases (with the risk of degrading their performance), analytical processing is applied to a database created and administered for this purpose. BI software that integrates a Data Warehouse gives free access to all the data.
The Datamart is a subset of the Data Warehouse. Whereas the Data Warehouse is the central repository for all data, the Datamart addresses the needs of a single group of users and is therefore organized for a single mode of data retrieval.
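One simple way to picture the relationship is a warehouse table and a view that exposes only what one team needs. The sketch below uses an in-memory SQLite database as a stand-in for a warehouse; the `sales` table, the `marketing_mart` view and the figures are assumptions, and real Datamarts are often separate physical stores rather than views.

```python
import sqlite3

# In-memory database standing in for the central warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, department TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "marketing", 120.0),
        ("north", "finance", 80.0),
        ("south", "marketing", 200.0),
        ("south", "finance", 50.0),
    ],
)

# The "Datamart": a subset of the warehouse, organized for one user group.
conn.execute(
    """
    CREATE VIEW marketing_mart AS
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE department = 'marketing'
    GROUP BY region
    """
)

mart_rows = conn.execute(
    "SELECT region, total FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```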
The Data Lake is a form of storage specific to Big Data. It stores in a single place raw or only lightly processed data of various kinds (structured, semi-structured and unstructured) and allows Big Data analytics processing to be applied to it.
What you should remember
Beyond the buzzwords, Data Analytics takes different forms and can be carried out at different levels. While Big Data analytics requires the intervention of specialists and the implementation of a complex IT architecture and tools, Data Analytics through Business Intelligence and business analysis software makes it possible to make the data talk: at a minimum to confirm theories, and to start uncovering hidden relationships between data.