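Small Data: It can be defined as a small data set that is capable of influencing current decisions. Almost any ongoing activity whose data can be accumulated in an Excel file qualifies. Small Data is also helpful in decision making, but it is not intended to have a large impact on the business; rather, its effect is felt over a short period of time.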
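In a nutshell, data that is simple enough for humans to understand, in a volume and format that make it accessible, concise, and actionable, is known as Small Data.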
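Big Data: It can be described as large volumes of structured and unstructured data. The amount of data stored is immense, so analysts must dig through all of it thoroughly to make it relevant and useful for sound business decisions.
In a nutshell, data sets that are so large and complex that traditional data processing techniques cannot manage them are known as Big Data.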
The following table lists the differences between Small Data and Big Data:
Feature | Small Data | Big Data |
---|---|---|
Technology | Traditional | Modern |
Collection | Generally obtained in an organized manner and then inserted into the database | Collected through pipelines with queues such as AWS Kinesis or Google Pub/Sub to buffer high-speed data |
Volume | Data in the range of tens or hundreds of gigabytes | Data size runs to terabytes and beyond |
Analysis Areas | Data marts (Analysts) | Clusters (Data Scientists), Data marts (Analysts) |
Quality | Contains less noise because a smaller amount of data is collected in a controlled manner | Usually, the quality of data is not guaranteed |
Processing | It requires batch-oriented processing pipelines | It has both batch and stream processing pipelines |
Database | Typically SQL (relational) | Often NoSQL |
Velocity | A regulated and constant flow of data; data aggregation is slow | Data arrives at extremely high speeds; large volumes of data are aggregated in a short time |
Structure | Structured data in tabular format with a fixed schema (relational) | A wide variety of data sets, including tabular data, text, audio, images, video, logs, JSON, etc. (non-relational) |
Scalability | Usually vertically scaled | Mostly based on horizontally scalable architectures, which give more versatility at a lower cost |
Query Language | Mostly SQL | Python, R, Java, SQL |
Hardware | A single server is sufficient | Requires more than one server |
Value | Business Intelligence, analysis and reporting | Complex data mining techniques for pattern finding, recommendation, prediction etc. |
Optimization | Data can be optimized manually (human powered) | Requires machine learning techniques for data optimization |
Storage | Storage within enterprises, local servers etc. | Usually requires distributed storage systems on cloud or in external file systems |
People | Data Analysts, Database Administrators and Data Engineers | Data Scientists, Data Analysts, Database Administrators and Data Engineers |
Security | Security practices for Small Data include user privileges, data encryption, hashing, etc. | Securing Big Data systems is much more complicated. Best security practices include data encryption, cluster network isolation, strong access control protocols, etc. |
Nomenclature | Database, Data Warehouse, Data Mart | Data Lake |
Infrastructure | Predictable resource allocation, mostly vertically scalable hardware | More agile infrastructure with horizontally scalable hardware |
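
The Processing row above contrasts batch pipelines with stream pipelines. As a rough illustration only, the following Python sketch computes the same average in both styles. The function names and the sample `readings` list are hypothetical and are not taken from any particular framework; a real Big Data pipeline would read from a queue or a distributed store rather than an in-memory list.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch style: the whole data set is available before processing starts."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream style: update the result incrementally as each record arrives."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        # Incremental update: only the running mean and count are kept in memory.
        mean += (x - mean) / count
        yield mean


# Hypothetical sample data standing in for incoming records.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_mean(readings)
stream_results = list(streaming_mean(iter(readings)))
```

The batch version needs all records up front, which suits a small, controlled data set. The streaming version keeps only a running count and mean, so it can produce an up-to-date answer after every record without storing the full history.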