Big Data Analytics: A Concise Tutorial
Big Data Analytics - Data Scientist
The role of a data scientist is normally associated with tasks such as predictive modeling, developing segmentation algorithms, building recommender systems and A/B testing frameworks, and often working with raw, unstructured data.
The nature of their work demands a deep understanding of mathematics, applied statistics, and programming. A data analyst and a data scientist share some skills, such as the ability to query databases. Both analyze data, but the decisions of a data scientist can have a greater impact on an organization.
Here is a set of skills a data scientist normally needs to have −
- Programming in a statistical language or package such as R, Python, SAS, SPSS, or Julia
- Cleaning, extracting, and exploring data from different sources
- Researching, designing, and implementing statistical models
- Deep knowledge of statistics, mathematics, and computer science
In big data analytics, people often confuse the role of a data scientist with that of a data architect. In reality, the difference is quite simple. A data architect defines the tools and the architecture in which the data is stored, whereas a data scientist uses this architecture. Of course, a data scientist should be able to set up new tools if needed for ad-hoc projects, but defining and designing the infrastructure should not be part of the data scientist's job.