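Prerequisite: Types of Operating Systems
1. Batch Processing:
Batch processing refers to processing a high volume of data in batches within a specific time span. It handles a large volume of data all at once and is used when the data size is known and finite. Processing the data takes longer than in stream processing, and dedicated staff are needed to handle issues. A batch processor works through the data in multiple passes. Batch processing is the right fit when data is collected over time and similar records are grouped together before being processed.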
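As a rough illustration of the ideas above (a finite data set, grouping of similar records, and multiple passes), here is a minimal Python sketch. The function name `process_batch` and the sample records are hypothetical and chosen only for this example; real batch platforms distribute this work across many machines.

```python
from collections import defaultdict


def process_batch(records):
    """Process a finite, fully collected batch of (group, value) records."""
    # Group similar records together before any computation starts.
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)

    result = {}
    for group, values in groups.items():
        # Pass 1: the mean can only be computed once the whole group is known.
        mean = sum(values) / len(values)
        # Pass 2: the variance re-reads the same values using the mean from pass 1.
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        result[group] = (mean, variance)
    return result


# Hypothetical records collected over time, then processed in one job.
collected = [
    ("payroll", 100.0),
    ("billing", 40.0),
    ("payroll", 300.0),
    ("billing", 60.0),
]
summary = process_batch(collected)
print(summary)
```

Nothing is returned until the whole job finishes, which matches the "response after job completion" behaviour of batch systems.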
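Challenges with Batch Processing:
- Debugging these systems is difficult, because it takes dedicated professionals to fix errors.
- Software and training are expensive up front, just to get people to understand batch scheduling, triggering, notification, and so on.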
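2. Stream Processing:
Stream processing refers to processing a continuous stream of data immediately as it is produced. It analyzes streaming data in real time and is used when the data size is unknown in advance and the data is unbounded and continuous. Processing the data takes a few seconds or milliseconds. In stream processing, the data output rate is as fast as the data input rate. A stream processor handles the data in a few passes. Stream processing is the right fit when the data arrives continuously and an immediate response is required.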
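The following minimal Python sketch illustrates the same idea: each event is handled once, as it arrives, and an updated result is emitted immediately. The names `running_mean` and `sensor_readings` are hypothetical, and the short finite list only stands in for a source that would be unbounded in practice.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Emit the mean of all values seen so far, one output per input."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: no earlier values need to be stored or re-read.
        mean += (value - mean) / count
        yield mean


def sensor_readings() -> Iterator[float]:
    # Finite stand-in for a continuous, unbounded source (an assumption
    # made so that the example terminates).
    yield from [10.0, 20.0, 30.0]


outputs = list(running_mean(sensor_readings()))
print(outputs)
```

Because the state is just a count and a running mean, memory use stays constant no matter how long the stream runs.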
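Challenges with Stream Processing:
- Keeping the data output rate in step with the data input rate can be a problem.
- The system has to cope with a huge amount of data while still responding immediately.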
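To make the input/output rate challenge concrete, here is a small deterministic simulation in Python. It is only a sketch under simplifying assumptions: fixed arrival and processing rates per tick, and a bounded buffer that drops events when full. The function `simulate_backlog` and its parameters are hypothetical and do not come from any streaming framework.

```python
def simulate_backlog(arrivals_per_tick, processed_per_tick, ticks, buffer_size):
    """Return (final_backlog, dropped_events) after the given number of ticks."""
    backlog = 0
    dropped = 0
    for _ in range(ticks):
        # New events arrive at the input rate.
        backlog += arrivals_per_tick
        # A bounded buffer cannot hold more than buffer_size events.
        if backlog > buffer_size:
            dropped += backlog - buffer_size
            backlog = buffer_size
        # The processor drains events at the output rate.
        backlog -= min(backlog, processed_per_tick)
    return backlog, dropped


# Input rate matches output rate: nothing queues up and nothing is lost.
balanced = simulate_backlog(3, 3, ticks=10, buffer_size=8)
# Input rate exceeds output rate: the buffer fills and events are dropped.
overloaded = simulate_backlog(5, 3, ticks=10, buffer_size=8)
print(balanced, overloaded)
```

Real systems usually respond to this situation with techniques such as backpressure, scaling out, or sampling rather than silently dropping data; the drop policy here is just the simplest thing to simulate.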
Difference between Batch Processing and Stream Processing:
S.No. | BATCH PROCESSING | STREAM PROCESSING |
---|---|---|
01. | Batch processing refers to processing a high volume of data in batches within a specific time span. | Stream processing refers to processing a continuous stream of data immediately as it is produced. |
02. | Batch processing processes a large volume of data all at once. | Stream processing analyzes streaming data in real time. |
04. | In batch processing, the data size is known and finite. | In stream processing, the data size is unknown in advance and unbounded. |
05. | In batch processing, the data is processed in multiple passes. | In stream processing, the data is generally processed in a few passes. |
06. | A batch processor takes longer to process data. | A stream processor takes a few seconds or milliseconds to process data. |
07. | In batch processing the input graph is static. | In stream processing the input graph is dynamic. |
08. | In batch processing, the data is analyzed on a snapshot. | In stream processing, the data is analyzed continuously as it arrives. |
09. | In batch processing the response is provided after job completion. | In stream processing the response is provided immediately. |
10. | Examples are distributed programming platforms like MapReduce, Spark, GraphX, etc. | Examples are programming platforms like Spark Streaming and S4 (Simple Scalable Streaming System), etc. |
11. | Batch processing is used in payroll and billing systems, food processing systems, etc. | Stream processing is used in stock markets, e-commerce transactions, social media, etc. |