Interest in Big Data continues to grow, especially among the general public. One can legitimately ask how the public comes to understand the term. The following three formulas may help you understand it a little better.
1. Big Data = Big Brother?
In 1949, when George Orwell published the novel “1984”, he envisioned a fictional future society built on mind control and the suppression of individuality. This society was led by a fictional leader, Big Brother, who was all-knowing and omnipresent, watching everything you did.
Is Big Data a Big Brother? That comparison would make Big Data a public enemy. But no, Big Data is not the fraudulent use of your personal data. Rather, it arises from the following phenomena:
- A cultural and technological shift is causing an exponential accumulation of data in our information systems. We share, communicate and produce ever more data, all the time and everywhere.
- We now have improved infrastructure, technology and statistical methods to analyze massive data sets.
- Given the amount of data produced, human brainpower alone cannot analyze everything. Hence it is important to use Data Science techniques such as Machine Learning and Artificial Intelligence to transform this information into knowledge.
These phenomena have no moral inclination; they are no more than tools and technologies. Data analysis has always existed; what is new with Big Data is that we can understand our world much better because more data is available for us to analyze.
Big Data is not our enemy. Instead, Big Data applications relate primarily to improving health services, optimizing and reducing energy consumption (e.g., smart metering, smart appliances and smart cities), improving the user experience, sharing human knowledge, and fighting bank fraud.
2. Big Data = Large Data Volumes?
No, not just that. This is only one characteristic of Big Data.
Big Data is actually an all-inclusive term for large amounts of information, in contrast to traditional data, which is typically stored in a relational database. Some people like to define Big Data using the four Vs: volume, velocity, variety and value. A fifth V, visualization, has even been suggested.
These definitions are interesting; however, the four Vs, the five Vs, and even the term “Big Data” itself seem to emphasize only the “big problems”. In fact, Big Data should not be an IT issue, especially as BI and data warehousing vendors are getting better at real-time or near-real-time information delivery, which lets analysts quickly spot trends and avoid business problems. We should instead emphasize Big Data’s “big potential”: it should be a tool that helps address use cases and business issues. In addition, most data science projects require other skills, such as math and statistics or business and subject-matter expertise, and these are not limited to IT.
3. Big Data = Unstructured Data?
We talk a lot about unstructured data and the ability of Big Data technologies to analyze it. So, what is unstructured data? It is usually binary data that has no identifiable internal structure. The following is a short list of typical types of unstructured data:
- Word Processing Files
- PDF Files
- Digital Images
- Social Media Posts
Unstructured data is a big deal for Big Data. However, outside companies specializing in media, the majority of the data that we need to analyze is still structured data. The minority made up of videos, images and sound files is less structured, but it still comes in more or less unified formats.
Moreover, marketing people have spent years trying to solve the problem of heterogeneous data formats. However, search-related tools can greatly simplify the use of weakly structured data, and there are methods to process and unify these formats (e.g., the Apache™ Tika toolkit).
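To make the idea of unifying weakly structured formats a little more concrete, here is a minimal Python sketch. It does not use Apache Tika; it relies only on the standard library. The function name `to_record`, the record fields and the sample inputs are all hypothetical and chosen purely for illustration, and the sketch assumes that JSON inputs are flat objects.

```python
import csv
import io
import json
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def to_record(filename, payload):
    """Return a uniform {"source", "format", "text"} record (hypothetical schema)."""
    # The format is guessed from the file extension only, which is an assumption.
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext == "json":
        obj = json.loads(payload)  # assumed to be a flat JSON object
        text = " ".join(str(value) for value in obj.values())
    elif ext == "csv":
        rows = csv.reader(io.StringIO(payload))
        text = " ".join(cell for row in rows for cell in row)
    elif ext in ("html", "htm"):
        parser = _TextOnly()
        parser.feed(payload)
        parser.close()
        text = " ".join(parser.chunks)
    else:
        # Anything else is treated as plain text.
        text = payload.strip()
    return {"source": filename, "format": ext, "text": text}


# Made-up sample inputs in four different formats.
samples = {
    "post.json": '{"user": "ana", "message": "hello world"}',
    "sales.csv": "region,amount\nnorth,42\n",
    "page.html": "<html><body><h1>Title</h1><p>Body text</p></body></html>",
    "note.txt": "  plain note \n",
}
records = [to_record(name, body) for name, body in samples.items()]
```

Once every input has the same shape, the same search or analysis step can run over `records` regardless of where each item came from. A real toolkit handles far more formats (PDF, word processing files, images) and extracts metadata as well, which this sketch does not attempt.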
Finally, “Big Data is not unstructured data only, that’s a myth. Structured data can be Big Data as well. Let’s not distinguish what’s Big Data or not based on whether the data is structured or not.” (Rick van der Lans)