Excel Data Analysis: A Concise Tutorial
Data Analysis - Process
Data analysis is the process of collecting, transforming, cleaning, and modeling data in order to discover the required information. The results are then communicated to suggest conclusions and support decision-making. Data visualization is sometimes used to portray the data so that useful patterns are easier to spot. The terms Data Modeling and Data Analysis are sometimes used interchangeably, although modeling is, strictly speaking, one step within analysis.
The data analysis process consists of the following phases, which are iterative in nature:
- Data Requirements Specification
- Data Collection
- Data Processing
- Data Cleaning
- Data Analysis
- Communication
Data Requirements Specification
The data required for analysis depends on a question or an experiment. Based on the requirements of those directing the analysis, the data needed as input is identified (e.g., a population of people). Specific variables describing that population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical.
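As a rough illustration of a requirements specification, the sketch below records which variables are needed and whether each is numerical or categorical, then checks a record against that list. The variable names (`age`, `income`, `region`) and the helper functions are hypothetical, chosen only to mirror the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str  # either "numerical" or "categorical"


# Hypothetical requirements for a study of a population of people.
REQUIREMENTS = [
    Variable("age", "numerical"),
    Variable("income", "numerical"),
    Variable("region", "categorical"),
]


def missing_variables(record, requirements=REQUIREMENTS):
    """Return the names of required variables that the record lacks."""
    return [v.name for v in requirements if v.name not in record]


def kind_matches(value, kind):
    """Check that a value has the type expected for its variable kind."""
    if kind == "numerical":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, str)
```

For example, `missing_variables({"age": 30})` reports that `income` and `region` still have to be collected.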
Data Collection
Data Collection is the process of gathering information on the targeted variables identified in the data requirements. The emphasis is on collecting data accurately and honestly, so that the decisions based on it are valid. Data Collection provides both a baseline to measure against and a target to improve on.
Data is collected from a variety of sources, ranging from organizational databases to web pages. Data obtained this way may be unstructured and may contain irrelevant information. Hence, the collected data must go through Data Processing and Data Cleaning.
Data Processing
The collected data must be processed or organized before it can be analyzed. This includes structuring the data as required by the relevant analysis tools. For example, the data might have to be placed into the rows and columns of a table in a spreadsheet or statistical application. A Data Model might also have to be created.
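One way to picture this structuring step is the minimal sketch below. It takes loosely structured records, keeps only the relevant fields in a fixed column order, and writes them as CSV text that a spreadsheet could open. The sample records and the column list are made up for illustration.

```python
import csv
import io

# Hypothetical raw records: keys in varying order, a missing field, an extra field.
raw_records = [
    {"name": "Ann", "age": "34", "income": "52000"},
    {"age": "41", "name": "Bob"},  # income is missing
    {"name": "Cy", "income": "61000", "age": "29", "note": "irrelevant"},
]

COLUMNS = ["name", "age", "income"]


def to_table(records, columns=COLUMNS):
    """Return one row per record, with cells in a fixed column order.

    Missing fields become empty cells; fields not listed in `columns` are dropped.
    """
    return [[record.get(column, "") for column in columns] for record in records]


def to_csv(records, columns=COLUMNS):
    """Serialize the table, with a header row, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(to_table(records, columns))
    return buffer.getvalue()
```

The empty cell left for Bob's income is deliberate: deciding what to do about incomplete rows belongs to the cleaning phase, not to this one.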
Data Cleaning
The processed and organized data may be incomplete, contain duplicates, or contain errors. Data Cleaning is the process of preventing and correcting these errors. The appropriate kind of cleaning depends on the type of data. For example, when cleaning financial data, certain totals might be compared against reliable published figures or defined thresholds. Likewise, quantitative methods can be used to detect outliers, which are then excluded from the analysis.
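The cleaning steps mentioned above can be sketched roughly as follows. The text does not name a specific outlier method, so the 1.5 × IQR rule used here is an assumption, as are the function names and the tolerance-based total check.

```python
from statistics import quantiles


def drop_duplicates(rows):
    """Remove repeated rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_incomplete(rows):
    """Keep only rows in which every cell has a value."""
    return [row for row in rows if all(cell not in ("", None) for cell in row)]


def total_within(values, published_total, tolerance):
    """Check a computed total against a trusted published figure."""
    return abs(sum(values) - published_total) <= tolerance


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    This is one common rule of thumb, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

For instance, `iqr_outliers([10, 12, 11, 13, 12, 11, 95])` flags only `95`, which could then be reviewed and excluded before analysis.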
Data Analysis
Data that has been processed, organized, and cleaned is ready for analysis. Various data analysis techniques are available to understand and interpret the data and to derive conclusions based on the requirements. Data Visualization may also be used to examine the data in graphical form and gain additional insight into the messages within the data.
Statistical Data Models such as Correlation and Regression Analysis can be used to identify relationships among the data variables. These models, which describe the data, help simplify the analysis and communicate the results.
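To make the two models named above concrete, here is a minimal sketch of the Pearson correlation coefficient and a least-squares straight-line fit, written from the standard textbook formulas. The function names are illustrative, and spreadsheet or statistical tools offer their own built-in equivalents.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of products of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept
```

With `xs = [1, 2, 3, 4]` and `ys = [3, 5, 7, 9]`, the points lie exactly on a line, so the correlation is 1 and the fit recovers a slope of 2 and an intercept of 1. Both functions assume that the inputs are not constant, since a constant sequence would cause a division by zero.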
The process might require additional Data Cleaning or additional Data Collection; hence, these activities are iterative in nature.
Communication
The results of the data analysis are reported in the format required by the users, to support their decisions and further action. Feedback from the users might lead to additional analysis.
Data analysts can choose data visualization techniques, such as tables and charts, that help communicate the message clearly and efficiently to the users. Analysis tools provide facilities for highlighting the required information with color codes and formatting in tables and charts.
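As a plain-text stand-in for the color-coded highlighting that analysis tools offer, the sketch below renders a small table and marks the values above a threshold. The function name, the marker text, and the sample figures are all hypothetical.

```python
def render_table(rows, threshold):
    """Render (label, value) pairs as aligned text, flagging values above threshold."""
    width = max(len(label) for label, _ in rows)
    lines = []
    for label, value in rows:
        flag = " <-- above threshold" if value > threshold else ""
        lines.append(f"{label.ljust(width)}  {value:>8.2f}{flag}")
    return "\n".join(lines)


# Hypothetical regional figures, with a threshold of 100.
report = render_table([("North", 120.0), ("South", 80.5)], threshold=100)
```

In a spreadsheet, the same idea would normally be expressed as a conditional formatting rule rather than a text marker.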