Data Science and Big Data – The Top 6 Questions Answered
Traditional Business Intelligence (BI) is undergoing unprecedented change with the arrival of Data Science. With Big Data, we can now design dashboards whose indicators monitor business processes and support decision-making.
1.How to define good indicators?
What guarantees that an indicator is sound? A good set of indicators should have two properties:
- It should cover all aspects of an activity, both internal and external (known as endogenous and exogenous indicators), without omitting any dimension
- Its indicators should be independent of each other (also called orthogonal)
Data Science can help deliver both of these properties, notably through factorial algorithms. As databases become more comprehensive and information processing more mature, it is increasingly important to consider Data Science before introducing new indicators for an activity.
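As a rough illustration of the independence requirement, the following Python sketch checks whether candidate indicators are close to orthogonal by computing pairwise Pearson correlations. This is a simpler check than the factorial algorithms mentioned above, not a replacement for them. The indicator names, the sample values and the 0.9 threshold are all hypothetical, chosen only for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(indicators, threshold=0.9):
    """Return (name_a, name_b, r) for pairs whose |r| reaches the threshold.

    The threshold is an assumption for this sketch, not a standard value.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(indicators.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical monthly values, invented for illustration only.
indicators = {
    "revenue": [100, 120, 130, 150, 170, 200],
    "orders": [50, 60, 65, 75, 85, 100],  # exactly revenue / 2
    "support_tickets": [12, 9, 14, 8, 13, 10],
}

flagged = redundant_pairs(indicators)
print(flagged)
```

With this invented data, only the revenue/orders pair is flagged, because orders is exactly half of revenue. One of the two could be dropped from a dashboard without losing information, while support_tickets carries a largely independent signal.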
2.What is Data Science?
We can give Data Science the following working definition:
Data Science builds on Statistics, Machine Learning (ML) and Artificial Intelligence (AI). Its objective is to extract intelligence from data and make it actionable for the business. Many of the techniques it uses are out of reach for traditional Business Intelligence, data visualization or data mining.
3.Data Science vs. Big Data – any differences?
Big Data brings data collection and storage to the business, whereas Data Science brings agility and extracts intelligence from that data.
More precisely, Big Data focuses on data collection, data storage and setting up the database architecture (including a Data Lake). Data Science focuses on data processing. It generally does not operate directly on the Data Lake but on a specialized, structured store (optionally located within the Data Lake), called a Data Hub.
Big Data by itself cannot generate knowledge or intelligence; that requires combining Big Data with Data Science. In other words, Big Data provides the data and Data Science provides the knowledge.
4.Is Data Science really a new science?
You may have heard that “We did Data Science such as predictive analytics before the rise of Big Data”. This statement is true, because Statistics have been used in business for many years. Most of the algorithms used today were developed in the 70s or even earlier. According to Datafloq (https://datafloq.com/read/history-predictive-analytics-infographic/438):
“Predictive analytics has its origin in the 1940s, when governments started using the first computational models. With non-linear programming and real-time analytics, data analytics and prescriptive analytics go mainstream and becomes available to all organizations.”
The foundations of Data Science thus remain the same as in the 1940s. However, the way data projects are managed and run has changed considerably.
5.Any new approaches offered by Data Science?
The following are some of the new features of Data Science in the Big Data era:
- New approaches that combine statistics and Machine Learning
- A wave of new tools and languages, many of them open source (including R and Python)
- New types and sources of data (external data)
- Big Data tools and infrastructure (Hadoop, NoSQL databases such as Cassandra and MongoDB, etc.) with huge storage capacities
- Cloud services accessible from anywhere
- Computing power that makes new applications feasible
6.What is the origin of Data Science?
Data Science is ready to meet the business. But for many of us, the origin of Data Science remains rather obscure. We can understand Data Science better by remembering where it comes from.
Data Science belongs primarily to the domain of Statistics, whose modern form was largely developed in Britain around the early 20th century by brilliant mathematicians such as:
- Karl Pearson (1857 – 1936), a British mathematician who established mathematical foundations of statistics, in particular the correlation coefficient
- William Sealy Gosset (1876 – 1937), an English statistician who published under the pen name Student and developed Student’s t-distribution
- Ronald Fisher (1890 – 1962), “A genius who almost single-handedly created the foundations for modern statistical science”
The other aspects of Data Science, such as Machine Learning and Artificial Intelligence, grew largely out of computer science, much of it in the United States. Here we find names like:
- John von Neumann (1903 – 1957), a Hungarian-American mathematician and physicist and a pioneer of the modern computer
- Alan Mathison Turing (1912 – 1954), a British mathematician who formalized the concepts of programming and algorithm
Of course, these lists are not exhaustive. Data Science is the discipline that finally brings the two streams together: statistics and machine learning.