In the automotive sector, Big Data thinking is inevitably turning towards strategies for managing the data generated by autonomous driving research.
The march towards vehicle autonomy is gaining speed. Although experts do not yet agree on a specific date or on the first fields of application, few deny that autonomous cars have a future. As a result, Big Data projects within automotive companies now inevitably focus on strategies for managing the data produced by autonomous driving research.
Autonomous driving research – what is it?
It may seem obvious, but “teaching” a vehicle to drive in the widest possible range of conditions (road conditions, environmental conditions, the behaviour of other vehicles such as cars or trucks, the behaviour of other people) is a complex challenge.
According to the American Automobile Association (AAA), 75% of consumers are not ready to adopt autonomous cars. Yet this is precisely the challenge manufacturers face: teaching vehicles to analyze and respond to any combination of operating conditions, systematically and immediately, through a set of rules (algorithms).
Machines can also learn
Interestingly, humans and machines learn in similar ways. In a given situation, both must first assimilate experience (data) before applying a set of rules (algorithms) that helps solve the problem. Whether the outcome is positive or negative, a lesson is usually drawn from the exercise.
Traditional approaches to data management are delaying the development of autonomy
It turns out that teaching cars to drive requires a particularly large amount of data. Traditional data management approaches are struggling to meet the demands of self-driving research. The challenges must be considered from two points of view:
1. Challenges regarding data storage
Vehicles used for road testing (cars fitted with an extensive array of state-of-the-art cameras) generate several terabytes of video, RADAR, LIDAR and other sensor data per vehicle per day. Across the large fleets currently under test, simple arithmetic suggests that car manufacturers receive and store several petabytes of autonomous driving test data. Why keep so much data? The reason is simple: automakers are trying to capture data describing virtually every operating condition a vehicle may encounter (for example, crossing an intersection at night on a slippery road) so that they can use it to “teach” cars to drive.
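To make this back-of-the-envelope calculation concrete, the Python sketch below multiplies a per-vehicle daily volume by fleet size and campaign length. The figures used (3 TB per car per day, 50 cars, 20 test days) are illustrative assumptions, not numbers reported by any manufacturer, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def fleet_volume_bytes(tb_per_vehicle_per_day: int, vehicles: int, days: int) -> int:
    """Raw sensor data (in bytes) produced by a test fleet over a campaign."""
    return tb_per_vehicle_per_day * TB * vehicles * days


# Illustrative assumptions only: 3 TB per car per day, 50 cars, 20 test days.
total = fleet_volume_bytes(tb_per_vehicle_per_day=3, vehicles=50, days=20)
print(f"{total / PB:.1f} PB")
```

Even this modest hypothetical campaign reaches the petabyte range, and the total grows linearly with fleet size and test duration.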
Traditionally, this test data is stored on many networked storage servers, often spread across multiple sites around the world. However, given the cost and performance limitations of server-based systems, automakers are looking for more efficient storage solutions for their autonomous vehicle research.
2. Challenges in data processing
Once all of this data is stored, how do you exploit it to teach a car to drive? Several stages of data processing are necessary. First, each video frame (together with its associated RADAR, LIDAR and sensor data) is analyzed to capture the “observed” events precisely (for example, a person crossing an intersection) and catalogue them. This produces a library of “inputs”, that is, driving scenarios from which engineers can develop rules (algorithms) that dictate how a vehicle should react. Second, these algorithms must be tested in simulation against the large body of real autonomous vehicle driving data collected beforehand.
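The two processing stages can be sketched in Python as follows. Everything here is hypothetical and deliberately simplified: the `Frame` schema, the single event type, the `brake_rule` function and the constant 6 m/s² braking deceleration (reaction time is ignored). The sketch only illustrates the flow from annotated frames to a scenario library, and then to a simulation that replays recorded scenarios against a candidate rule.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_DECEL_MPS2 = 6.0  # assumed braking capability, in m/s^2


@dataclass(frozen=True)
class Frame:
    """One video frame fused with RADAR/LIDAR readings (hypothetical schema)."""
    timestamp_s: float
    pedestrian_in_crosswalk: bool
    distance_to_crosswalk_m: float
    speed_mps: float


def annotate(frame):
    """Stage 1: turn a raw frame into a list of labelled 'observed' events."""
    return ["pedestrian_crossing"] if frame.pedestrian_in_crosswalk else []


def build_library(frames):
    """Index frames by event type to form a library of driving scenarios."""
    library = defaultdict(list)
    for frame in frames:
        for event in annotate(frame):
            library[event].append(frame)
    return library


def brake_rule(frame):
    """Candidate rule (algorithm): brake whenever a pedestrian is in the crosswalk."""
    return frame.pedestrian_in_crosswalk


def stops_in_time(frame):
    """Constant-deceleration check: stopping distance v^2 / (2a) vs. gap to crosswalk."""
    return frame.speed_mps ** 2 / (2 * MAX_DECEL_MPS2) <= frame.distance_to_crosswalk_m


def simulate(rule, scenarios):
    """Stage 2: replay scenarios; one passes if the rule brakes and the car stops in time."""
    if not scenarios:
        return 0.0
    passed = sum(1 for f in scenarios if rule(f) and stops_in_time(f))
    return passed / len(scenarios)


frames = [
    Frame(0.0, True, 40.0, 15.0),   # pedestrian far ahead
    Frame(0.1, False, 80.0, 15.0),  # no event
    Frame(0.2, True, 10.0, 15.0),   # pedestrian too close to stop at 15 m/s
]
library = build_library(frames)
rate = simulate(brake_rule, library["pedestrian_crossing"])
print(f"pass rate: {rate:.0%}")
```

A failing scenario, such as the last frame here, is exactly the kind of result that sends engineers back to refine the rule before the next simulation run.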
Traditional architectures are not optimized for the large-scale data processing required to test algorithms in simulation. With traditional methods, driving data is stored on servers and transferred to the workstations where engineers test the algorithms under development. This raises two fundamental challenges. First, huge volumes of data must be moved, which takes considerable time and bandwidth. Second, individual workstations lack the massive computing power needed to return simulation results in a timely manner. Unsurprisingly, car manufacturers are looking for more efficient solutions.
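A rough calculation shows why moving data to workstations becomes a bottleneck. The Python sketch below computes how long it takes to copy a given volume over a network link. The 1 PB volume and the 10 Gbit/s link are assumptions chosen for illustration, and the `efficiency` parameter is a hypothetical knob for protocol overhead and shared bandwidth.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(volume_bytes, link_gbps, efficiency=1.0):
    """Days needed to copy `volume_bytes` over a `link_gbps` Gbit/s link.

    `efficiency` in (0, 1] scales the usable bandwidth (1.0 = ideal link).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = volume_bytes * 8
    usable_bps = link_gbps * 1e9 * efficiency
    return bits / usable_bps / SECONDS_PER_DAY


# Assumed scenario: 1 PB of recorded drives over an ideal 10 Gbit/s link.
days = transfer_days(10**15, 10)
print(f"{days:.2f} days")
```

Under these ideal assumptions, moving a single petabyte takes roughly 9.3 days, before any computation has even started.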
An optimized solution to accelerate the development of autonomy
Among automotive players, one approach is emerging to meet the data management challenges of autonomous vehicle research. It rests on two basic principles:
1. Data storage in Hadoop, the free and open-source framework
Able to store data of virtually unlimited volume (well beyond a petabyte) and of many types (video, LIDAR, RADAR, sensor data, …), the Hadoop Distributed File System (HDFS) offers high-performance, cost-effective storage for autonomous vehicle research data.
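HDFS stores data by splitting each file into large blocks and replicating every block across several DataNodes. The Python sketch below estimates the block count and raw disk usage for one recording, using the HDFS defaults for block size (`dfs.blocksize`, 128 MiB) and replication factor (`dfs.replication`, 3). The 10 GiB file size is an assumption, and production clusters often tune both settings.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default dfs.blocksize: 128 MiB
REPLICATION = 3                 # HDFS default dfs.replication


def hdfs_footprint(file_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes on disk) for one HDFS file.

    The last block may be partial; HDFS does not pad it to full size,
    so raw usage is simply file size x replication factor.
    """
    blocks = (file_bytes + block_size - 1) // block_size  # ceiling division
    return blocks, blocks * replication, file_bytes * replication


# Assumed example: one 10 GiB recording from a test drive.
blocks, replicas, raw = hdfs_footprint(10 * 1024**3)
print(blocks, replicas, raw / 1024**3)
```

Because every block is stored three times by default, raw capacity has to be planned at roughly three times the logical data volume. Hadoop 3 also supports erasure coding, which lowers this overhead; the sketch ignores it.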
2. Data processing within Hadoop
This approach also takes advantage of Hadoop’s native ability to run highly scalable MapReduce and Spark workloads, which is particularly useful for running algorithm simulation tests. In doing so, companies are reshaping their data processing strategy. Rather than transferring the data to algorithms running on workstations, the new method does the opposite: it redeploys the algorithms to where the data is stored (Hadoop), and the heavy computation is performed there.
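The idea of moving the computation to the data can be illustrated with a single-process Python sketch of the MapReduce pattern. This is not Hadoop’s or Spark’s API: the node names and simulation results are hypothetical, and each “node” is just a dictionary entry. The point is that the map step, with a local combine, runs where each partition lives, so only small per-scenario counts travel to the reduce step instead of the raw recordings.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical data: each "node" holds its own partition of simulation results
# as (scenario type, passed?) records.
partitions = {
    "node-1": [("pedestrian_crossing", True), ("night_intersection", False)],
    "node-2": [("pedestrian_crossing", False), ("pedestrian_crossing", True)],
    "node-3": [("night_intersection", True)],
}


def map_partition(records):
    """Map + local combine: runs on the node holding the partition and emits one
    (scenario, (passed, total)) pair per scenario, so raw records stay local."""
    local = defaultdict(lambda: [0, 0])
    for scenario, passed in records:
        local[scenario][0] += int(passed)
        local[scenario][1] += 1
    return [(key, tuple(counts)) for key, counts in local.items()]


def shuffle(mapped):
    """Shuffle: group the small intermediate pairs by key across all nodes."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_group(values):
    """Reduce: merge partial counts into an overall pass rate."""
    passed = sum(p for p, _ in values)
    total = sum(t for _, t in values)
    return passed / total


mapped = chain.from_iterable(map_partition(recs) for recs in partitions.values())
pass_rates = {key: reduce_group(vals) for key, vals in shuffle(mapped).items()}
print(pass_rates)
```

In a real Hadoop cluster, the scheduler tries to run each map task on a node that holds a copy of its input block, which is what makes this data locality pay off at petabyte scale.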
This approach greatly improves data processing performance. Simulation tests whose results used to take several days are now completed in a few minutes, accelerating progress towards autonomy.
Complementary technologies further extend these benefits. For example, equipping Hadoop nodes with graphics processing units (GPUs) can dramatically accelerate simulation workloads based on deep learning models. In addition, container technologies make it possible to deploy legacy applications, previously available only on workstations, directly to high-performance Hadoop clusters without having to adapt them.