Difference Between Data Science and Data Engineering

Last modified: 2021-10-19 06:23:44             Author: Mango

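Data Science: Data science is the detailed study of the flow of information drawn from the data held in an organization's repositories. It applies analytical, programming, and business skills to extract meaningful insights from raw and unstructured data.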

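The data science life cycle consists of:

  1. Data discovery: Search different data sources and capture structured and unstructured data.
  2. Data preparation: Convert the data into a common format.
  3. Mathematical modeling: Use variables and equations to establish relationships.
  4. Taking action: Gather information and derive results based on business requirements.
  5. Communication: Communicate the findings to decision makers.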
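
The five steps above can be sketched end to end in a few lines of Python. This is only an illustrative sketch, not a prescribed workflow: the two inline data sources, their field names (`ad_spend`, `sales`, `spend`, `revenue`), and the choice of a simple least-squares line as the "mathematical model" are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two different data sources.
CSV_SOURCE = "ad_spend,sales\n10,25\n20,45\n"
JSON_SOURCE = '[{"spend": 30, "revenue": 65}, {"spend": 40, "revenue": 85}]'


def discover():
    """Step 1: capture data from a CSV source and a JSON source."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def prepare(csv_rows, json_rows):
    """Step 2: convert both sources into one common format: (x, y) float pairs."""
    pairs = [(float(r["ad_spend"]), float(r["sales"])) for r in csv_rows]
    pairs += [(float(r["spend"]), float(r["revenue"])) for r in json_rows]
    return pairs


def fit_line(pairs):
    """Step 3: relate the variables with y = slope * x + intercept (least squares)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def act(slope, intercept, planned_spend):
    """Step 4: derive a result for a business question (expected sales)."""
    return slope * planned_spend + intercept


def communicate(slope, planned_spend, forecast):
    """Step 5: summarize the finding for decision makers."""
    return (
        f"Each extra unit of spend is associated with about {slope:.2f} units "
        f"of sales; a spend of {planned_spend:.0f} suggests sales near {forecast:.1f}."
    )


if __name__ == "__main__":
    csv_rows, json_rows = discover()
    pairs = prepare(csv_rows, json_rows)
    slope, intercept = fit_line(pairs)
    forecast = act(slope, intercept, planned_spend=50)
    print(communicate(slope, 50, forecast))
```

In practice each step would be far larger (real connectors, validation, richer models), but the shape of the hand-off between steps stays the same.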

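Data Engineering: Data engineering focuses on the application and harvesting of big data, that is, on the practical side of data collection and analysis. In data engineering, data is transformed into a format that is useful for analysis. It resembles software engineering in many ways: starting from a concrete goal, data engineers are tasked with putting together functional systems to achieve that goal.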
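
As a rough illustration of "transforming data into a format useful for analysis", the sketch below builds a tiny extract-transform-load pipeline with Python's standard `sqlite3` module. The raw records, the `purchases` table, and the helper names (`transform`, `load`, `run_pipeline`) are hypothetical and chosen only for the example; a production pipeline would typically rely on dedicated tooling.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a collection step:
# inconsistent casing, stray whitespace, amounts as strings, one bad row.
RAW_EVENTS = [
    {"user": " Alice ", "amount": "12.50", "country": "us"},
    {"user": "bob", "amount": "7", "country": "DE"},
    {"user": "carol", "amount": "n/a", "country": "fr"},
]


def transform(raw):
    """Convert one raw record to the analysis format, or None if unusable."""
    try:
        amount = float(raw["amount"])
    except ValueError:
        return None
    return (raw["user"].strip().lower(), amount, raw["country"].upper())


def load(conn, rows):
    """Create the target table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_name TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn, raw_events):
    """Goal: make purchase data queryable. Returns the number of rejected rows."""
    cleaned = [transform(r) for r in raw_events]
    good = [row for row in cleaned if row is not None]
    load(conn, good)
    return len(cleaned) - len(good)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rejected = run_pipeline(conn, RAW_EVENTS)
    total = conn.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
    print(f"loaded total={total}, rejected={rejected}")
```

The design mirrors the paragraph above: the goal (queryable purchase data) is fixed first, and small functional pieces are then combined to reach it. Counting rejected rows rather than silently dropping them is one simple way to keep data quality visible.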

Data Science vs. Data Engineering

The following table lists the differences between data science and data engineering:

| No. | Data Engineering | Data Science |
|-----|------------------|--------------|
| 1 | Develop, construct, test, and maintain architectures (such as databases and large-scale processing systems). | Clean and organize (big) data; perform descriptive statistics and analysis to develop insights, build models, and solve business needs. |
| 2 | SAP, Oracle, Cassandra, MySQL, Redis, Riak, PostgreSQL, MongoDB, neo4j, Hive, and Sqoop; Scala, Java, and C#. | SPSS, R, Python, SAS, Stata, and Julia to build models; Scala, Java, and C#. |
| 3 | Ensure the architecture will support the requirements of the business. | Leverage large volumes of data from internal and external sources to answer business questions. |
| 4 | Discover opportunities for data acquisition. | Employ sophisticated analytics programs, machine learning, and statistical methods to prepare data for use in predictive and prescriptive modeling. |
| 5 | Develop data set processes for data modeling, mining, and production. | Explore and examine data to find hidden patterns. |
| 6 | Employ a variety of languages and tools (e.g. scripting languages) to marry systems together. | Automate work through the use of predictive and prescriptive analytics. |
| 7 | Recommend ways to improve data reliability, efficiency, and quality. | Communicate findings to decision makers. |