Difference Between Apache Kafka and Apache Flume

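Apache Kafka: Apache Kafka is an open-source stream-processing platform written in Java and Scala. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. Kafka aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. It uses a binary TCP-based protocol that is optimized for efficiency. It is very fast: published benchmarks have reported on the order of 2 million writes per second on a small cluster.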
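With replicated topics and producers that wait for acknowledgement from the in-sync replicas, it can also be configured so that acknowledged messages are not lost.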
Apache Kafka is commonly used for real-time analytics, ingesting data into Hadoop and Spark, error recovery, and website activity tracking.
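
The error-recovery use case relies on Kafka keeping records in a log that consumers can re-read from a saved position. The following is a minimal sketch of that idea in plain Python, not the Kafka client API: `PartitionLog`, `ReplayingConsumer`, and `demo_replay` are hypothetical names invented for illustration, and the "crash" is simulated with an exception.

```python
class PartitionLog:
    """Toy stand-in for one Kafka topic partition: an append-only list of records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store the record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, record) pairs starting at the given offset."""
        return list(enumerate(self._records))[offset:]


class ReplayingConsumer:
    """Hypothetical consumer that remembers the next offset it still has to process."""

    def __init__(self, log):
        self.log = log
        self.committed = 0
        self.processed = []

    def run(self, fail_at=None):
        for offset, record in self.log.read_from(self.committed):
            if offset == fail_at:
                raise RuntimeError("simulated crash")
            self.processed.append(record)
            self.committed = offset + 1  # commit only after successful processing


def demo_replay():
    log = PartitionLog()
    for record in ["a", "b", "c", "d"]:
        log.append(record)
    consumer = ReplayingConsumer(log)
    try:
        consumer.run(fail_at=2)  # crashes before processing offset 2
    except RuntimeError:
        pass
    consumer.run()  # restart: resumes from the committed offset
    return consumer.processed, consumer.committed
```

Because the log is not emptied when a record is read, the restarted consumer picks up at offset 2 and nothing is skipped. This toy commits after every record; a real consumer typically commits less often, so some records may be processed again after a restart.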

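Flume: Apache Flume is a reliable, distributed, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is written in Java. It has its own event-processing mechanism (interceptors) that can transform each new batch of data before it is moved to the intended sink.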
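
The flow described above (a source receives events, they are optionally transformed, buffered in a channel, and then delivered to a sink) can be sketched in a few lines of plain Python. This is only an illustrative model under simplifying assumptions, not Flume's Java API: `ToyAgent`, `drop_debug`, `normalise`, and `demo_agent` are hypothetical names, and the channel is an unbounded in-memory queue.

```python
from collections import deque


def drop_debug(event):
    """Hypothetical filter: discard debug lines by returning None."""
    return None if event.lstrip().lower().startswith("debug") else event


def normalise(event):
    """Hypothetical transform: trim whitespace and upper-case the log line."""
    return event.strip().upper()


class ToyAgent:
    """Toy model of one agent: source -> interceptors -> channel -> sink."""

    def __init__(self, interceptors):
        self.interceptors = interceptors
        self.channel = deque()  # buffer between the source and the sink
        self.sink_output = []   # stands in for the final destination

    def source_receive(self, raw_event):
        """Run the event through each interceptor, then buffer it in the channel."""
        event = raw_event
        for interceptor in self.interceptors:
            event = interceptor(event)
            if event is None:  # an interceptor dropped the event
                return
        self.channel.append(event)

    def sink_drain(self, batch_size):
        """Move up to batch_size events from the channel to the sink."""
        delivered = 0
        while self.channel and delivered < batch_size:
            self.sink_output.append(self.channel.popleft())
            delivered += 1
        return delivered


def demo_agent():
    agent = ToyAgent([drop_debug, normalise])
    for line in [" info started\n", "debug noisy", "error disk full "]:
        agent.source_receive(line)
    delivered = agent.sink_drain(batch_size=10)
    return delivered, agent.sink_output
```

In this sketch the transformation happens before an event enters the channel, so the sink only ever sees cleaned-up events.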

Kafka vs. Flume
The table below lists the differences between Apache Kafka and Apache Flume:

Apache Kafka	Apache Flume
Apache Kafka is a distributed data system.	Apache Flume is an available, reliable, and distributed system.
It is optimized for ingesting and processing streaming data in real time.	It efficiently collects, aggregates, and moves large amounts of log data from many different sources to a centralized data store.
It works on a pull model: consumers fetch messages from the brokers at their own pace.	It works on a push model: data is pushed toward the destination sinks.
It is easy to scale.	It is less scalable than Kafka.
It is a fault-tolerant, efficient, and scalable messaging system.	It is specially designed for Hadoop.
It is resilient to node failure and supports automatic recovery.	If a Flume agent fails, events held in its channel can be lost (notably with the in-memory channel).
Kafka runs as a cluster that handles incoming high-volume data streams in real time.	Flume is a tool for collecting log data from distributed web servers.
Kafka treats each topic partition as an ordered sequence of messages.	Flume can take in streaming data from multiple sources for storage and analysis in Hadoop.
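
The pull/push row of the table is easier to see in code. The sketch below is a toy contrast in plain Python and uses neither project's real API: `PullLog`, `PushChannel`, and `demo_pull_vs_push` are hypothetical names, and the bounded channel that rejects events when full is a simplifying assumption about how a push pipeline can behave under load.

```python
from collections import deque


class PullLog:
    """Toy ordered log: readers ask for data by offset whenever they are ready."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)

    def poll(self, offset, max_messages):
        """Return up to max_messages starting at offset; the reader sets the pace."""
        return self._messages[offset:offset + max_messages]


class PushChannel:
    """Toy bounded buffer: the sender pushes events whether or not a reader is ready."""

    def __init__(self, capacity):
        self._queue = deque()
        self._capacity = capacity

    def push(self, event):
        """Accept the event if there is room; return False so the sender can react."""
        if len(self._queue) >= self._capacity:
            return False
        self._queue.append(event)
        return True

    def drain(self):
        events = list(self._queue)
        self._queue.clear()
        return events


def demo_pull_vs_push():
    # Pull: a slow reader takes two messages at a time and tracks its own offset.
    log = PullLog()
    for i in range(5):
        log.append(f"m{i}")
    offset, batches = 0, []
    while True:
        batch = log.poll(offset, max_messages=2)
        if not batch:
            break
        batches.append(batch)
        offset += len(batch)

    # Push: the sender drives delivery, so a full channel has to reject events.
    channel = PushChannel(capacity=3)
    accepted = [channel.push(f"e{i}") for i in range(5)]
    return batches, accepted, channel.drain()
```

In the pull half, the reader receives every message in order even though it is slower than the writer. In the push half, the sender has to handle the rejected events itself once the buffer is full.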