📜  Difference Between a Data Lake and a Data Warehouse

📅  Last modified: 2021-09-15 01:33:49             🧑  Author: Mango

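1. Data Lake:
In this concept, data of every type is landed in a low-cost but highly adaptable storage area and then examined for potential insights. It is a further step beyond what ETL/DWH specialists call the data landing zone, except that now all kinds of data are in scope, regardless of format, structure, or metadata. One idea behind the data lake is that current technology makes it feasible to store all the data a company generates or purchases. Previously, the company had to select the relevant data and store it in structured warehouses.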
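The landing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference design: a local temporary directory stands in for the low-cost storage area, and the helper `land_raw`, the source names, and the sample payloads are all hypothetical. The point it shows is that nothing is parsed or validated on the way in.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload byte-for-byte under <lake_root>/<source>/<name>."""
    target_dir = lake_root / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # no parsing, no validation, no schema
    return target


# Assumption: a temporary local folder plays the role of the lake storage.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Three hypothetical inputs: structured, semi-structured, and unstructured.
csv_bytes = b"order_id,amount\n1,19.99\n2,5.00\n"
json_bytes = json.dumps({"user": "a1", "clicks": [3, 7]}).encode("utf-8")
text_bytes = "Customer called about a late delivery.".encode("utf-8")

landed = [
    land_raw(lake, "sales_db", "orders.csv", csv_bytes),
    land_raw(lake, "web_events", "event.json", json_bytes),
    land_raw(lake, "support", "note.txt", text_bytes),
]

for path in landed:
    print(path.relative_to(lake), path.stat().st_size, "bytes")
```

Because every payload is written unchanged, deciding how to interpret it can be deferred until someone actually needs it.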

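2. Data Warehouse:
It is essentially a relational database hosted in the cloud or on a centralized enterprise server. It collects data from varied, heterogeneous sources, mainly to support analysis and decision-making for business management.
A data warehouse is characterized as a subject-oriented, integrated, time-variant, and non-volatile collection of data that provides business insight and supports the decision-making process.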
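A small sketch may help make those four properties concrete. It uses Python's built-in `sqlite3` module as a stand-in for a warehouse database; the table `sales_fact`, the two source systems, and their field names are hypothetical. Records from both sources are converted to one agreed schema before loading (integrated), each load carries a date (time-variant), and rows are only ever appended (non-volatile).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is fixed up front, before any data arrives.
conn.execute(
    """
    CREATE TABLE sales_fact (
        customer_id TEXT NOT NULL,
        amount_usd  REAL NOT NULL,
        source      TEXT NOT NULL,
        load_date   TEXT NOT NULL
    )
    """
)

# Hypothetical extracts from two heterogeneous source systems.
pos_rows = [{"cust": "C1", "total_cents": 1999}]
web_rows = [
    {"customerId": "C1", "amountUsd": "5.00"},
    {"customerId": "C2", "amountUsd": "12.50"},
]


def from_pos(row):
    """Convert a point-of-sale record to the warehouse schema."""
    return (row["cust"], row["total_cents"] / 100, "pos")


def from_web(row):
    """Convert a web-shop record to the warehouse schema."""
    return (row["customerId"], float(row["amountUsd"]), "web")


def load(records, load_date):
    """Append already-cleaned records; existing rows are never updated."""
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(cid, amount, source, load_date) for cid, amount, source in records],
    )


load([from_pos(r) for r in pos_rows], "2021-09-14")
load([from_web(r) for r in web_rows], "2021-09-15")

# A subject-oriented question: total sales per customer across all sources.
totals = conn.execute(
    "SELECT customer_id, ROUND(SUM(amount_usd), 2) "
    "FROM sales_fact GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)
```

The cleaning work happens in `from_pos` and `from_web`, before the data is stored, which is why reports can be built directly on the table.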

Difference between a Data Lake and a Data Warehouse:

| Data Lake | Data Warehouse |
| --- | --- |
| Data is kept in its raw form, regardless of its source. It is transformed into other forms only when required. | Data is extracted from transactional and other metrics systems. It is not in raw form; it is always transformed and cleaned. |
| The main users are data scientists, big data engineers, and machine learning engineers who need to do in-depth analysis to build business models, such as predictive models. | The main users are operational users, because the data is in a structured format and can feed ready-to-build reports. It is mostly used for business intelligence. |
| The main inputs are all types of data: structured, semi-structured, and unstructured. The data resides in the lake in its original form. | The main inputs are structured data from transactional and metrics systems, which is then organized into schemas. |
| Consists of raw data that may or may not be curated. | Consists of curated data that is centralized and ready to be used for business intelligence and analytics. |
| Data is not in normalized form. | Uses denormalized schemas. |
| The technologies used in data lakes, such as Hadoop and machine learning, are relatively new compared with those of the data warehouse. | The technology used for a data warehouse is older. |
| Can hold all types of data and can be used with the past, present, and future in mind. | Most of the time is spent analyzing the various sources of the data. |
| Data is highly accessible and can be updated quickly. | Data is more complicated and more costly to change, and access is restricted to authorized users only. |
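The first row of the comparison, raw data that is transformed only when required, is often described as schema-on-read. The sketch below illustrates that idea under stated assumptions: the raw lines, the field names `order_id`/`id`/`amount`, and the helper `read_orders` are hypothetical. The raw records stay untouched, and a schema is applied only at the moment a consumer reads them.

```python
import json

# Raw records as they might sit in a lake: inconsistent keys and types,
# plus one line that is not JSON at all.
raw_lines = [
    '{"order_id": 1, "amount": "19.99", "currency": "USD"}',
    '{"id": 2, "amount": 5, "note": "gift"}',
    "not valid json",
]


def read_orders(lines):
    """Apply a schema at read time; skip records that cannot be interpreted."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in the lake, it is just not used here
        if not isinstance(record, dict):
            continue
        order_id = record.get("order_id", record.get("id"))
        if order_id is None or "amount" not in record:
            continue
        yield {"order_id": int(order_id), "amount": float(record["amount"])}


orders = list(read_orders(raw_lines))
print(orders)
```

A warehouse would instead reject or repair the malformed records before loading, so that every stored row already matches the schema.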