A Concise Data Science Tutorial

Data Science - Data Analysis

What is Data Analysis in Data Science?

Data analysis is one of the key components of data science. It is the process of cleaning, converting, and modelling data to obtain actionable business intelligence. It uses statistical and computational methods to gain insights and extract information from large amounts of data. The objective of data analysis is to extract relevant information from data and make decisions based on that knowledge.

Although data analysis may incorporate statistical procedures, it is often an ongoing, iterative process in which data are collected and analyzed concurrently. In fact, researchers often look for trends in their observations throughout the data collection process. In qualitative research, the specific technique used (field study, ethnographic content analysis, oral history, biography, or unobtrusive research) and the nature of the data determine how the analysis is structured.

To be more precise, data analysis converts raw data into meaningful insights and valuable information, which helps people make informed decisions in fields such as healthcare, education, and business.

Why Is Data Analysis Important?

Below are the main reasons why data analysis is crucial today −

  1. Accurate Data − Data analysis helps businesses obtain relevant, accurate information that they can use to plan business strategies, make informed decisions about future plans, and realign the company’s vision and goals.

  2. Better Decision-Making − Data analysis helps in making informed decisions by identifying patterns and trends in the data and providing valuable insights. This enables businesses and organizations to make data-driven decisions, which can lead to better outcomes and increased success.

  3. Improved Efficiency − Analyzing data can help identify inefficiencies and areas for improvement in business operations, leading to better resource allocation and increased efficiency.

  4. Competitive Advantage − By analyzing data, businesses can gain a competitive advantage by identifying new opportunities, developing new products or services, and improving customer satisfaction.

  5. Risk Management − Analyzing data can help identify potential risks and threats to a business, enabling proactive measures to be taken to mitigate those risks.

  6. Customer Insights − Data analysis can provide valuable insights into customer behavior and preferences, enabling businesses to tailor their products and services to better meet customer needs.

Data Analysis Process

As the complexity and volume of data available to businesses grow, so does the need for data analysis to clean the data and extract relevant information that businesses can use to make informed decisions.

Figure: The data analysis process

The data analysis process typically consists of the following steps, which are often repeated over several iterations. Let’s examine each step in more detail.

  1. Identify − Determine the business issue you want to address. What issue is the firm attempting to address? What must be measured, and how will it be measured?

  2. Collect − Gather the raw data sets needed to answer the question you identified. Data may come from internal sources, such as customer relationship management (CRM) software, or from secondary sources, such as government records or social media application programming interfaces (APIs).

  3. Clean − Prepare the data for analysis by cleansing it. This often involves removing duplicate and anomalous data, resolving inconsistencies, standardizing data structure and format, and fixing stray white space and other syntax errors.

  4. Analyze the Data − By transforming the data with different analysis methods and tools, you can begin to identify patterns, correlations, outliers, and variations that tell a story. At this stage, you might use data mining to discover trends within databases, or data visualization tools to present the data in an easily digestible graphical format.

  5. Interpret − Interpret the findings of your analysis to determine how well they answer your original question. What recommendations can you make based on the data? What are the limitations of your conclusions?
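The Clean step can be sketched in plain Python. This is a minimal illustration, assuming a small hypothetical list of customer records with a name, a country, and a spend amount stored as text; the field names, the COUNTRY_ALIASES mapping, and the clean_records function are invented for this example. It trims white space, standardizes country values, drops rows whose amount cannot be parsed (treating them as anomalous), and removes records that become duplicates once standardized.

```python
# Hypothetical raw records, as they might arrive from a CRM export (illustrative only).
RAW_RECORDS = [
    {"customer": "  Alice ", "country": "usa", "spend": "120.50"},
    {"customer": "Bob", "country": "USA", "spend": "80"},
    {"customer": "Alice", "country": "U.S.A.", "spend": "120.50"},  # duplicate once cleaned
    {"customer": "Carol", "country": "uk", "spend": "n/a"},         # unparseable amount
]

# Hypothetical mapping used to standardize inconsistent country spellings.
COUNTRY_ALIASES = {"usa": "US", "u.s.a.": "US", "us": "US", "uk": "GB", "united kingdom": "GB"}


def clean_records(raw):
    """Return cleaned copies of the records; the input list is not modified."""
    seen = set()
    cleaned = []
    for row in raw:
        name = row["customer"].strip()  # remove stray white space
        country_key = row["country"].strip().lower()
        # Standardize format: known aliases map to a code, anything else is upper-cased.
        country = COUNTRY_ALIASES.get(country_key, country_key.upper())
        try:
            spend = float(row["spend"])
        except ValueError:
            continue  # drop rows with unparseable (anomalous) amounts
        key = (name, country, spend)
        if key in seen:
            continue  # drop duplicates revealed by standardization
        seen.add(key)
        cleaned.append({"customer": name, "country": country, "spend": spend})
    return cleaned


print(clean_records(RAW_RECORDS))
# [{'customer': 'Alice', 'country': 'US', 'spend': 120.5}, {'customer': 'Bob', 'country': 'US', 'spend': 80.0}]
```

For larger tables, a library such as pandas is commonly used for the same operations; the plain-Python version simply makes each cleaning decision explicit.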

Types of Data Analysis

Data can be used to answer questions and support decision-making in several ways. To choose the best method for analyzing your data, it helps to know the four types of data analysis widely used in the field.

We will discuss each type in detail in the sections below −

Descriptive Analysis

Descriptive analytics is the process of examining current and past data to find patterns and trends. It is sometimes called the simplest way to look at data because it shows trends and relationships without going into further detail.

Descriptive analytics is easy to use and is probably something almost every company does every day. Simple statistical software like Microsoft Excel or data visualisation tools like Google Charts and Tableau can help separate data, find trends and relationships between variables, and show information visually.

Descriptive analytics is a good way to show how things have changed over time. The trends it reveals also serve as a starting point for further analysis that supports decision-making.
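A minimal descriptive-analytics sketch in Python, assuming hypothetical monthly revenue figures; the describe and period_changes helpers are illustrative names, not part of any library. It summarizes what happened (count, mean, median, minimum, maximum) and how the value changed from one month to the next, without trying to explain why.

```python
from statistics import mean, median

# Hypothetical monthly revenue (in thousands); illustrative values only.
monthly_revenue = {"Jan": 100.0, "Feb": 110.0, "Mar": 99.0, "Apr": 121.0}


def describe(values):
    """Basic summary statistics: a description of what happened."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def period_changes(series):
    """Percentage change between consecutive periods, rounded to one decimal.

    Relies on dicts preserving insertion order (Python 3.7+).
    """
    labels = list(series)
    return {
        f"{prev}->{cur}": round((series[cur] - series[prev]) / series[prev] * 100, 1)
        for prev, cur in zip(labels, labels[1:])
    }


print(describe(list(monthly_revenue.values())))
# {'count': 4, 'mean': 107.5, 'median': 105.0, 'min': 99.0, 'max': 121.0}
print(period_changes(monthly_revenue))
# {'Jan->Feb': 10.0, 'Feb->Mar': -10.0, 'Mar->Apr': 22.2}
```

The same summaries are what a spreadsheet or charting tool would display; the month-over-month changes are the kind of trend that later diagnostic analysis would try to explain.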

This type of analysis answers the question, “What happened?”.

Some examples of descriptive analysis are financial statement analysis and survey reports.

Diagnostic Analysis

Diagnostic analytics is the process of using data to figure out why trends and correlations between variables occur. It is the next step after identifying trends with descriptive analytics. You can perform diagnostic analysis manually, with an algorithm, or with statistical software (such as Microsoft Excel).

Before getting into diagnostic analytics, you should know how to test a hypothesis, what the difference is between correlation and causation, and what diagnostic regression analysis is.
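To make the hypothesis-testing and correlation ideas concrete, here is a minimal sketch in Python, assuming hypothetical advertising-spend and sales figures. It computes the Pearson correlation coefficient and an exact permutation-test p-value for the null hypothesis of no association. The permutation test is just one simple test chosen for illustration, and the function names are hypothetical. Even a strong correlation with a small p-value does not show that advertising causes sales; establishing causation needs further evidence, such as a controlled experiment.

```python
from itertools import permutations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def permutation_p_value(xs, ys):
    """Exact two-sided permutation test of H0: no association between xs and ys.

    Under H0 every reordering of ys is equally likely, so the p-value is the
    share of reorderings whose |r| is at least the observed |r|.
    Only practical for small samples, since there are n! reorderings.
    """
    observed = abs(pearson_r(xs, ys))
    perms = list(permutations(ys))
    # Small tolerance so floating-point noise does not exclude ties with the observed value.
    extreme = sum(1 for perm in perms if abs(pearson_r(xs, perm)) >= observed - 1e-12)
    return extreme / len(perms)


ad_spend = [1, 2, 3, 4, 5]  # hypothetical, e.g. thousands spent per week
sales = [2, 4, 5, 4, 5]     # hypothetical units sold per week

r = pearson_r(ad_spend, sales)
p = permutation_p_value(ad_spend, sales)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With these five points r is roughly 0.77; with so few observations, the p-value should be read cautiously whatever its size.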

This type of analysis answers the question, “Why did this happen?”.

Some examples of diagnostic analysis are examining market demand and explaining customer behavior.

Predictive Analysis

Predictive analytics is the process of using data to try to figure out what will happen in the future. It uses data from the past to make predictions about possible future situations that can help make strategic decisions.

Forecasts may cover the near term, such as anticipating that a piece of equipment will fail later that day, or the more distant future, such as projecting your company’s cash flows for the next year.

Predictive analysis can be done manually or with the help of machine-learning algorithms. In either case, past data is used to make predictions about what will happen in the future.

Regression analysis is one predictive analytics method. It can model the relationship between two variables (simple linear regression) or among three or more variables (multiple regression). The relationship between the variables is expressed as a mathematical equation that can be used to predict the outcome when one variable changes.
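For reference, the standard textbook forms of these equations are shown below. The notation is conventional rather than taken from this tutorial: y is the outcome, x (or x_1 to x_k) are the predictor variables, the beta terms are coefficients estimated from past data, and epsilon is the error term. The last line gives the ordinary least-squares estimates for the simple case, computed from n past observations (x_i, y_i).

```latex
% Simple linear regression: one predictor x (two variables in total)
y = \beta_0 + \beta_1 x + \varepsilon

% Multiple regression: k >= 2 predictors (three or more variables in total)
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon

% Ordinary least-squares estimates for the simple case
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Once the coefficients are estimated, a one-unit change in a predictor changes the predicted y by that predictor's coefficient (holding the other predictors fixed in the multiple case).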

Regression allows us to gain insights into the structure of that relationship and provides measures of how well the data fit that relationship. Such insights can be extremely useful for assessing past patterns and formulating predictions. Forecasting can help us to build data-driven plans and make more informed decisions.
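A minimal sketch of simple linear regression in plain Python, assuming ordinary least squares (the most common way to fit the line) and hypothetical quarterly sales data. It returns the fitted intercept and slope together with R², one common measure of how well the data fit the relationship (1 means a perfect fit), and then uses the line to make a forecast. The function names and data are illustrative only.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_squared). Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    # R^2 = 1 - SS_res / SS_tot: share of the variation in y explained by the line.
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot


def predict(b0, b1, x):
    """Forecast y for a new x using the fitted line."""
    return b0 + b1 * x


# Hypothetical data: quarter number vs. sales (illustrative values only).
quarters = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1, r2 = fit_simple_linear_regression(quarters, sales)
print(f"sales = {b0:.2f} + {b1:.2f} * quarter, R^2 = {r2:.3f}")
# sales = 0.09 + 1.97 * quarter, R^2 = 0.998
print(f"forecast for quarter 6: {predict(b0, b1, 6):.2f}")
# forecast for quarter 6: 11.91
```

A high R² on past data does not guarantee an accurate forecast; extrapolating beyond the observed range assumes the same relationship continues to hold.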

This type of analysis answers the question, “What might happen in the future?”.

Some examples of predictive analysis are behavioral targeting in marketing and early detection of a disease or an allergic reaction in healthcare.

Prescriptive Analysis

Prescriptive analytics is the process of using data to figure out the best thing to do next. This type of analysis looks at all the important factors and comes up with suggestions for what to do next. This makes prescriptive analytics a useful tool for making decisions based on data.

In prescriptive analytics, machine-learning algorithms are often used to sort through large amounts of data faster and often more efficiently than a person can. Using "if" and "else" statements, algorithms sort through data and make suggestions based on a certain set of requirements. For example, if at least 50% of customers in a dataset said they were "very unsatisfied" with your customer service team, the algorithm might suggest that your team needs more training.
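The if/else rule in this example can be sketched in a few lines of Python. This is a minimal illustration, assuming survey responses arrive as free-text labels; the function name, the threshold parameter (defaulting to the 50% from the example), and the recommendation messages are hypothetical. Real prescriptive systems usually combine many such rules, or use optimization and machine-learning models, and their output is a suggestion for a person to review.

```python
from collections import Counter


def recommend_action(responses, threshold=0.5):
    """Suggest an action from customer-service survey responses.

    Hypothetical rule: if at least `threshold` of respondents answered
    "very unsatisfied", suggest more training for the team.
    """
    if not responses:
        return "No responses yet: collect more feedback before deciding."
    # Normalize labels so "Very unsatisfied" and " very unsatisfied " count as the same answer.
    counts = Counter(r.strip().lower() for r in responses)
    share = counts["very unsatisfied"] / len(responses)
    if share >= threshold:
        return "Suggest additional training for the customer service team."
    else:
        return "No training action suggested by this rule."


survey = ["Very unsatisfied", "satisfied", "very unsatisfied", "neutral"]
print(recommend_action(survey))
# Suggest additional training for the customer service team.
```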

It’s important to remember that algorithms can make suggestions based on data, but they can’t replace human judgement. Prescriptive analytics should be used as a tool to support decision-making and strategy development. Your judgement is essential for putting an algorithm’s output into context and recognizing its limits.

This type of analysis answers the question, “What should we do next?”.

Some examples of prescriptive analysis are investment decisions and lead scoring in sales.