The choice of processing technique significantly affects the outcome of data analysis. The most difficult task, however, is achieving low-latency analytics on large data sets, which usually requires processing terabytes of data in seconds.
Currently, one key challenge in Big Data is performing low-latency analysis on real-time data. Work in this area investigates the state of the art in distributed and parallel computing, storage, and querying, and evaluates tools for real-time analysis. Requirements regarding response time, the state of the data to be analyzed, and the workload ultimately determine the best choice of data processing and analysis techniques.
Businesses today demand low latency, minimal response times, and maximum accuracy in decision making. According to IT Release, the pros and cons of batch processing are as follows.
Advantages of batch processing systems:
- Repetitive jobs are completed quickly, without user interaction.
- No special hardware or system support is needed to input data into batch systems.
- They suit large organizations best, but small organizations can also benefit from them.
- Batch systems can work offline, which puts less stress on the processor.
- Processor time is used well because the processor knows which job to process next. In real-time systems there is no advance estimate of how long a job will take or when it will complete, whereas in a batch system the length of each job is known once it is queued.
- A batch system can be shared among multiple users.
- Batch systems have very little idle time.
- Batch jobs can be assigned a specific time, so that they start when the computer is idle, e.g. at night or during any free period.
- Batch systems can manage large volumes of repetitive work easily.
Disadvantages of batch processing systems:
- Computer operators must be trained to use batch systems.
- It is difficult to debug batch systems.
- Batch systems are sometimes costly.
- If a job takes too long, e.g. because an error occurs in it, the other jobs must wait for an unknown amount of time.
Some good examples of batch processing are:
- Calculating the market value of assets, which need not be revised more than once a day.
- Calculating the monthly phone bills of employees.
- Tax-related reporting.
- Payroll systems.
- Bank statements.
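As a minimal sketch of the batch pattern, the following Python takes a set of call records that has accumulated over a period and computes one phone bill per employee in a single run. The record layout, the flat per-minute rate, and the function name `run_billing_batch` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical flat rate, in cents per minute, used only for illustration.
RATE_CENTS_PER_MINUTE = 5

def run_billing_batch(call_records):
    """Process the whole accumulated batch at once; return total cents per employee."""
    minutes = defaultdict(int)
    for employee, duration_minutes in call_records:
        minutes[employee] += duration_minutes
    return {employee: total * RATE_CENTS_PER_MINUTE
            for employee, total in minutes.items()}

# Records collected over the month; nothing is computed until the job runs.
records = [("alice", 30), ("bob", 12), ("alice", 10)]
bills = run_billing_batch(records)
```

The job needs no user interaction and can be started by a scheduler at any idle time, since its input is already complete when it runs.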
Stream processing is a model that computes on a small amount of recent data at a time as the data flows continuously through a network of processing entities. Technology capable of stream processing can produce near real-time results because it passes data through the system and processes it as it arrives (evariant).
Unlike real-time processing techniques, stream processing usually imposes no mandatory time limits on the processing flow. The only constraints are:
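The idea of computing over a small amount of recent data can be sketched with a Python generator that keeps only a fixed-size window in memory and emits a result for every record as it arrives. The function name `rolling_mean` and the window size are assumptions made for this example, not part of any particular streaming framework.

```python
from collections import deque

def rolling_mean(stream, window_size=3):
    """Yield the mean of the most recent `window_size` values as each record arrives."""
    window = deque(maxlen=window_size)  # the oldest value is discarded automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

# A finite list stands in for an unbounded source here.
readings = iter([10, 20, 30, 40])
results = list(rolling_mean(readings))
```

Because the generator never holds more than `window_size` values, the same code works whether the source produces four records or runs indefinitely.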
- There must be enough memory to store the entries in the queue.
- The long-term throughput of the system must be greater than or at least equal to the input data rate over the same period. Otherwise, the system's storage requirements grow without limit.
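The second constraint can be illustrated with a small simulation. This is a simplified sketch that assumes constant arrival and processing rates per time step; the function name `simulate_backlog` is hypothetical.

```python
def simulate_backlog(arrivals_per_tick, processed_per_tick, ticks):
    """Return the queue length after each tick for constant arrival and service rates."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += arrivals_per_tick                  # new records enter the queue
        backlog -= min(backlog, processed_per_tick)   # the system drains what it can
        history.append(backlog)
    return history

keeping_up = simulate_backlog(arrivals_per_tick=5, processed_per_tick=5, ticks=4)
falling_behind = simulate_backlog(arrivals_per_tick=5, processed_per_tick=4, ticks=4)
```

When throughput matches the input rate the queue stays empty, while a shortfall of even one record per tick makes the backlog, and therefore the memory needed for the queue, grow at every step.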
Examples of records within streams include (Wikipedia):
- In graphics, each record might be the vertex, normal, and color information for a triangle;
- In image processing, each record might be a single pixel from an image;
- In a video encoder, each record may be 256 pixels forming a macroblock of data; or
- In wireless signal processing, each record could be a sequence of samples received from an antenna.
When online analytical processing must run in real time, the processing time requirement is extremely strict, with a margin of seconds or less. For this reason, real-time systems usually just try to process each input as soon as possible.
The question is what happens if they miss an input. When this happens, the system ignores the loss and continues processing and analyzing without stopping. When working in real time, the system cannot halt operations to go back and fix something that happened seconds earlier. The data keeps coming, and the system must make every effort to continue processing.
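The behavior described here, skipping an input that can no longer be handled in time and moving on, can be sketched as follows. To keep the example deterministic it uses explicit timestamps in milliseconds instead of a wall clock, and it assumes a fixed handling cost per event; the function name and parameters are hypothetical.

```python
def process_real_time(events, deadline_ms, cost_ms):
    """Handle events in arrival order, skipping any that would miss the deadline.

    `events` is a list of (arrival_time_ms, payload) pairs; `cost_ms` is the
    assumed fixed time needed to handle one event.
    """
    clock = 0
    handled, dropped = [], []
    for arrival, payload in events:
        clock = max(clock, arrival)  # if idle, wait until the event arrives
        if clock + cost_ms - arrival > deadline_ms:
            dropped.append(payload)  # too late: accept the loss and move on
            continue
        clock += cost_ms
        handled.append(payload)
    return handled, dropped

events = [(0, "a"), (10, "b"), (20, "c"), (200, "d")]
handled, dropped = process_real_time(events, deadline_ms=100, cost_ms=60)
```

In this run event "b" is dropped because the system is still busy with "a" when it arrives and could not finish it within 100 ms; processing then continues with the later events rather than stopping to recover it.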
In any case, real-time processing and analysis techniques deserve serious consideration before implementation because:
- They are not as simple to implement using common software systems.
- Their cost is higher than that of the streaming options.
Depending on the purpose of the application, it may be preferable to use an intermediate option between streaming and real time, for example one that guarantees a result within one or two hundred milliseconds in 99% of cases.
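A target of this kind, a latency bound that must hold 99% of the time, can be checked against measured response times with a percentile calculation. The sketch below uses the nearest-rank definition of a percentile; the function names, the 200 ms default budget, and the sample data are assumptions made for illustration.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def meets_latency_target(latencies_ms, budget_ms=200, pct=99):
    """Return True if the pct-th percentile latency is within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms

# Made-up sample: 99 fast responses and one slow outlier.
samples = [50] * 99 + [900]
p99 = percentile(samples, 99)
ok = meets_latency_target(samples)
```

A single outlier in a hundred responses still satisfies the target, whereas a second one would push the 99th percentile past the budget.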