The main characteristics of data from the Internet of Things (IoT) are determined by their automatic origin. Much of this data is created automatically by machines (such as sensor readings), and some of it combines human-generated and machine-generated data (such as social network updates that also carry geolocation data). What does this mean in terms of data quality?
The answer requires two tasks: a reconsideration of the dimensions of data quality, and a focus on the usability of the data by the end user.
There are many possible dimensions for measuring the quality of the data but we will focus on four key aspects: accuracy, consistency, integrity and timeliness. In a Big Data environment that must support the characteristics of IoT data, we are not only monitoring the quality of data from a single source. Rather, quality must be applied at the aggregate level. From this point of view, the above dimensions take on a slightly different meaning from the usual ones.
Four main dimensions to measure the quality of IoT data
- Accuracy: do the values accumulated through the network of IoT devices reflect exactly what occurred at each device? For example, if you have 10 devices in the same room reporting room temperature, are they all reporting the same temperature, or at least temperatures within a reasonable deviation of each other?
- Consistency: are the values recorded in the Big Data environment consistent with the context in which each device produced them? For example, if an application on a mobile device reports several events, and these are labelled with a geolocation, are those geolocations the same or close to each other?
- Integrity: have all data values been accumulated? Is there a gap in the reported event series or sensor values that should have been captured?
- Timeliness: are the values captured within a reasonable time? If much of the data is transmitted from a wide variety of devices, is there any monitoring to ensure that all the data is synchronized?
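The four questions above can be sketched as simple aggregate checks. The following is only an illustrative sketch, not a reference implementation: the function names, the thresholds, and the record shapes (a device-to-reading mapping, latitude/longitude pairs, sequence numbers, and produced/received timestamps) are all assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median


def accuracy_outliers(readings, tolerance):
    """Accuracy: device ids whose reading is more than `tolerance` from the median."""
    if not readings:
        return []
    mid = median(readings.values())
    return sorted(d for d, v in readings.items() if abs(v - mid) > tolerance)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def consistent_locations(points, max_km):
    """Consistency: True if every point lies within `max_km` of the first one."""
    return all(haversine_km(points[0], p) <= max_km for p in points[1:])


def missing_sequence_numbers(seq_numbers):
    """Integrity: sequence numbers absent between the lowest and highest seen."""
    seen = set(seq_numbers)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def late_events(events, max_delay_s):
    """Timeliness: ids of events received more than `max_delay_s` after production.

    `events` is a list of (event_id, produced_at, received_at) in epoch seconds.
    """
    return [eid for eid, produced, received in events if received - produced > max_delay_s]


# Hypothetical sample data for each dimension.
room = {"a": 21.0, "b": 21.4, "c": 20.8, "d": 35.0}
print(accuracy_outliers(room, tolerance=2.0))
print(consistent_locations([(40.4168, -3.7038), (40.4170, -3.7040)], max_km=1.0))
print(missing_sequence_numbers([1, 2, 4, 7]))
print(late_events([("e1", 100, 101), ("e2", 100, 200)], max_delay_s=30))
```

Using the median rather than the mean for the accuracy check is a design choice for this sketch: a single faulty sensor then cannot drag the reference value towards itself.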
Usability of IoT data for the user
These questions only scratch the surface. We can go deeper into each of these dimensions and add some other dimensions to create a set of expectations regarding the IoT data usability and characteristics, and this brings us to the second task of characterizing the quality of data in terms of usability for the end user.
Some IoT applications are mainly concerned with monitoring operational behaviour, but organizations are starting to pay close attention to IoT analytics, where the results of analytical modelling and pattern analysis can identify business opportunities. Examples include predictive maintenance (in the industrial case) and analysis of customer behaviour (for smart devices). In either case, the usability of the data is measured not by the quality of the individual data sources, but by how data users interpret the data for their combined use.
Advances in data preparation and integration will have a major impact on BI, visual analytics, and data discovery, and that is where data preparation tools can add value to IoT data. These tools combine functionality for data quality, profiling, standardization and transformation, all managed by the user. By allowing users to investigate data characteristics (especially important as new device flows are added to the mix) and to define their own data quality criteria, these tools let them produce reports and analyses that meet their specific objectives without one set of quality criteria being forced on every user. In turn, the overall usability of the data increases.
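To make the idea of user-defined quality criteria concrete, here is a minimal sketch in which two hypothetical users profile the same device records against their own rules. The `profile` function, the rule names, and the record fields are assumptions invented for this example and do not correspond to any particular data preparation tool.

```python
def profile(records, rules):
    """Return, for each rule name, the fraction of records that satisfy the rule."""
    if not records:
        return {}
    return {
        name: sum(1 for r in records if check(r)) / len(records)
        for name, check in rules.items()
    }


# Hypothetical readings from four temperature devices.
records = [
    {"device": "t1", "temp_c": 21.5, "battery": 0.9},
    {"device": "t2", "temp_c": None, "battery": 0.4},
    {"device": "t3", "temp_c": 120.0, "battery": 0.1},
    {"device": "t4", "temp_c": 19.0, "battery": 0.8},
]

# A maintenance team only needs the battery level to be reported.
maintenance_rules = {
    "battery_reported": lambda r: r.get("battery") is not None,
}

# An analytics team needs temperatures that are present and plausible.
analytics_rules = {
    "temp_present": lambda r: r.get("temp_c") is not None,
    "temp_plausible": lambda r: r.get("temp_c") is not None and -40 <= r["temp_c"] <= 60,
}

print(profile(records, maintenance_rules))
print(profile(records, analytics_rules))
```

The same records score differently depending on who is asking, which is the point: each user measures quality against their own objectives.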