Apache Storm: Introduction - Mango Docs

📅 Last modified: 2020-12-02 05:53:58 🧑 Author: Mango

What is Apache Storm?

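Apache Storm is a distributed, real-time big data processing system. It is designed to process large volumes of data in a fault-tolerant, horizontally scalable way, and it is a streaming data framework built for very high ingestion rates. Although Storm itself is stateless, it manages the distributed environment and cluster state through Apache ZooKeeper. The model is simple, and you can run many kinds of operations on real-time data in parallel.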

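Apache Storm remains a leading choice for real-time data analytics. It is easy to set up and operate, and it guarantees that every message is processed through the topology at least once.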
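
The at-least-once guarantee can be illustrated with a small simulation. This is only a sketch in plain Python, not Storm's API: `ReplayingSource`, `run`, and `flaky_process` are hypothetical names made up for the example. The idea it shows is that a message stays "in flight" until it is acknowledged, and a failed message is queued again, so it may be processed more than once but is never silently dropped.

```python
from collections import deque


class ReplayingSource:
    """Hypothetical stand-in for a message source that replays failed messages."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (message id, payload) pairs
        self.in_flight = {}                      # emitted but not yet acked

    def next_message(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.in_flight[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Fully processed: forget the message.
        self.in_flight.pop(msg_id)

    def fail(self, msg_id):
        # Processing failed: queue the message again for another attempt.
        self.queue.append((msg_id, self.in_flight.pop(msg_id)))


def run(source, process):
    """Drain the source, acking successes and replaying failures."""
    while True:
        item = source.next_message()
        if item is None:
            break
        msg_id, payload = item
        try:
            process(payload)
            source.ack(msg_id)
        except RuntimeError:
            source.fail(msg_id)


attempts = []        # every processing attempt, including failed ones
failed_once = set()  # payloads that have already failed one time


def flaky_process(payload):
    attempts.append(payload)
    # Simulate a transient failure the first time "b" is seen.
    if payload == "b" and payload not in failed_once:
        failed_once.add(payload)
        raise RuntimeError("transient failure")


source = ReplayingSource(["a", "b", "c"])
run(source, flaky_process)
print(attempts)
```

In this run the message "b" is attempted twice. That is the practical consequence of at-least-once delivery: processing logic should tolerate seeing the same message again.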

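Both Hadoop and Storm are frameworks for analyzing big data. They complement each other and differ in several respects: Apache Storm handles everything except persistence, while Hadoop is strong across the board but lags in real-time computation. The following table compares the attributes of Storm and Hadoop.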

Storm	Hadoop
Real-time stream processing	Batch processing
Stateless	Stateful
Master/slave architecture with ZooKeeper-based coordination. The master node is called Nimbus and the slaves are supervisors.	Master/slave architecture with or without ZooKeeper-based coordination. The master node is the JobTracker and the slave nodes are TaskTrackers.
A Storm streaming process can handle tens of thousands of messages per second on a cluster.	The Hadoop Distributed File System (HDFS) uses the MapReduce framework to process vast amounts of data, which takes minutes or hours.
A Storm topology runs until it is shut down by the user or hits an unexpected, unrecoverable failure.	MapReduce jobs are executed in sequential order and eventually complete.
Distributed and fault-tolerant	Distributed and fault-tolerant
If Nimbus or a supervisor dies, restarting it resumes from where it stopped, so nothing is affected.	If the JobTracker dies, all running jobs are lost.
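
The first row of the table, stream processing versus batch processing, can be made concrete with a toy word count. The sketch below is plain Python and uses neither Storm nor Hadoop; the function names are hypothetical. The batch version sees all input and returns one result at the end, while the streaming version produces an updated result after each incoming line.

```python
from collections import Counter


def batch_word_count(lines):
    """Batch style: all input is available up front; one result at the end."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(line_stream):
    """Stream style: yield an updated snapshot after every incoming line."""
    counts = Counter()
    for line in line_stream:
        counts.update(line.split())
        yield dict(counts)


lines = ["storm hadoop", "storm"]
final = batch_word_count(lines)
snapshots = list(streaming_word_count(iter(lines)))
print(final)
print(snapshots)
```

Both approaches end with the same counts. The difference is when results become available: the streaming version has a usable answer after the first line, which is what matters when the input never ends.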

Apache Storm is well known for real-time big data stream processing, and many companies use it as an integral part of their systems. Some notable examples follow:

Twitter: Twitter uses Apache Storm for its range of "Publisher Analytics" products, which process every tweet and click on the Twitter platform. Apache Storm is deeply integrated with Twitter's infrastructure.

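NaviSite: NaviSite uses Storm for its event log monitoring and auditing system. Every log generated in the system passes through Storm, which checks each message against a configured set of regular expressions; if there is a match, that message is saved to a database.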
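
The matching step described for NaviSite can be sketched as follows. This is an illustration only: the patterns, the sample log lines, and the names `PATTERNS`, `matches_any`, and `audit` are all assumptions, since the text does not give the real rule set, and a plain list stands in for the database.

```python
import re

# Hypothetical rule set; the patterns actually used are not given in the text.
PATTERNS = [
    re.compile(r"\bERROR\b"),
    re.compile(r"login failed for user \w+", re.IGNORECASE),
]


def matches_any(message, patterns=PATTERNS):
    """Return True if the message matches at least one configured pattern."""
    return any(p.search(message) for p in patterns)


def audit(log_stream, store):
    """Append every matching message to `store`, a stand-in for the database."""
    for message in log_stream:
        if matches_any(message):
            store.append(message)
    return store


logs = [
    "INFO service started",
    "ERROR disk quota exceeded",
    "WARN Login failed for user alice",
]
saved = audit(logs, [])
print(saved)
```

Compiling the patterns once, outside the loop, keeps the per-message cost low, which matters when every log line in the system goes through the check.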

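Wego: Wego is a travel metasearch engine based in Singapore. Travel-related data arrives from many sources around the world, each with different timing. Storm helps Wego search real-time data, resolve concurrency issues, and find the best match for the end user.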

Here is a list of the benefits that Apache Storm provides: