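Big data comprises a high-volume, high-velocity, and extensible variety of data. It falls into three types: structured data, semi-structured data, and unstructured data.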
- Structured data:
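Structured data is data whose elements are addressable, which makes it efficient to analyze. It is organized into a formatted repository, typically a database. It covers all data that can be stored in a SQL database in tables with rows and columns. Such data has relational keys and maps easily onto pre-designed fields. Today it is the most thoroughly developed kind of data and the simplest to manage. Example: relational data.
- Semi-structured data: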
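Semi-structured data is information that does not reside in a relational database but still has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database (which can be very hard for certain kinds of semi-structured data), but the semi-structured form exists to save space. Example: XML data.
- Unstructured data: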
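Unstructured data is data that is not organized in a predefined manner or has no predefined data model, so it is a poor fit for mainstream relational databases. Alternative platforms are therefore used to store and manage it. Unstructured data is increasingly prevalent in IT systems, and organizations use it in a variety of business intelligence and analytics applications. Examples: Word documents, PDFs, text, media logs.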
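
To make the three categories concrete, here is a minimal Python sketch using only the standard library. The table name `customers`, the sample records, and the free-text note are hypothetical illustrations, not data from any real system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows and columns under a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Linus", "Helsinki")],
)
structured_cities = [
    row[0]
    for row in conn.execute("SELECT city FROM customers WHERE name = ?", ("Ada",))
]

# Semi-structured: self-describing tags, but records need not share the same fields.
xml_doc = """
<customers>
  <customer id="1"><name>Ada</name><city>London</city></customer>
  <customer id="2"><name>Linus</name><email>linus@example.org</email></customer>
</customers>
""".strip()
root = ET.fromstring(xml_doc)
# findtext returns None for the record that has no <city> element.
semi_cities = [c.findtext("city") for c in root.findall("customer")]

# Unstructured: free text with no data model, so only text search applies.
note = "Ada called from London about her order. Linus emailed later."
unstructured_hit = "London" in note
```

The second XML record omits `<city>` and adds `<email>` without any schema change, which is the kind of flexibility a fixed table does not offer.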
Differences between structured, semi-structured, and unstructured data:
Properties | Structured data | Semi-structured data | Unstructured data |
---|---|---|---|
Technology | It is based on relational database tables | It is based on XML/RDF (Resource Description Framework) | It is based on character and binary data |
Transaction management | Mature transaction management and various concurrency techniques | Transactions are adapted from DBMS and are not yet mature | No transaction management and no concurrency control |
Version management | Versioning over tuples, rows, and tables | Versioning over tuples or graphs is possible | Versioned as a whole |
Flexibility | It is schema dependent and less flexible | It is more flexible than structured data but less flexible than unstructured data | It is the most flexible, since there is no schema |
Scalability | Scaling the database schema is very difficult | Scaling is simpler than for structured data | It is the most scalable |
Robustness | Very robust | New technology, not yet widespread | - |
Query performance | Structured queries allow complex joins | Queries over anonymous nodes are possible | Only textual queries are possible |
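
The flexibility and query rows of the table can be illustrated with a rough sketch, again in standard-library Python. The `customers` and `orders` tables, the XML document, and the log text are all made-up examples chosen for this illustration.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Structured: a join plus aggregation in one declarative query.
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Structured: a new field needs an explicit schema change first.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

# Semi-structured: path queries over nodes; the second order carries an
# extra <note> element without any schema change.
root = ET.fromstring(
    "<orders>"
    "<order customer='Ada'><total>25.0</total></order>"
    "<order customer='Linus'><total>40.0</total><note>gift</note></order>"
    "</orders>"
)
ada_totals = [
    float(o.findtext("total")) for o in root.findall("./order[@customer='Ada']")
]

# Unstructured: only textual search, here a regular expression over raw text.
log = "Ada paid 25.0 on Monday; Linus paid 40.0 and asked for gift wrap."
amounts = [float(m) for m in re.findall(r"\d+\.\d+", log)]
```

The SQL query relies on the schema to relate the two tables, the XML query relies only on element and attribute names, and the text search has nothing to rely on but the characters themselves.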