Excel Data Analysis Tutorial

Cleaning Data with Text Functions

The data that you obtain from different sources may not be in a form ready for analysis. In this chapter, you will learn how to prepare text data for analysis.

First, you need to clean the data. Data cleaning includes removing unwanted characters from the text. Next, you need to structure the data in the form you require for further analysis. You can do this by −

  1. Finding required text patterns with the text functions.

  2. Extracting data values from text.

  3. Formatting data with text functions.

  4. Executing data operations with the text functions.

Removing Unwanted Characters from Text

When you import data from another application, it can contain nonprintable characters and/or excess spaces. The excess spaces can be −

  1. leading or trailing spaces, and/or

  2. extra spaces between words.

If you sort or analyze such data, you will get erroneous results.

Consider the following example −

[Figure: Product data]

This is the raw product data you have obtained. Each row contains the Product ID, the product description and the price, and the character “|” separates the fields in each row.

When you import this data into an Excel worksheet, it looks as follows −

[Figure: Imported data]

As you can see, all the data is in a single column. You need to structure this data to perform data analysis. However, you first need to clean the data.

You need to remove any nonprintable characters and excess spaces that might be present in the data. You can use the CLEAN function and TRIM function for this purpose.

S.No.   Function & Description
1.      CLEAN − Removes all nonprintable characters from text
2.      TRIM − Removes all spaces from text except for single spaces between words

  1. Select the cells C3 – C11.

  2. Type =TRIM(CLEAN(B3)) and then press CTRL + Enter.

The formula is filled in the cells C3 – C11.

[Figure: Formula filled]

The result will be as shown below −

[Figure: Result of the formula]
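
To make the behavior of =TRIM(CLEAN(B3)) concrete outside Excel, here is a minimal Python sketch. It approximates CLEAN as dropping the 7-bit ASCII control characters (codes 0 to 31). It approximates TRIM as removing leading and trailing spaces and collapsing inner runs of spaces to one. The function names excel_clean and excel_trim, and the sample row, are hypothetical and not taken from the worksheet in the figures.

```python
def excel_clean(text: str) -> str:
    """Approximate CLEAN: drop the 7-bit ASCII control characters (codes 0 to 31)."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def excel_trim(text: str) -> str:
    """Approximate TRIM: drop leading and trailing spaces and collapse inner runs of
    spaces to a single space. Only the ordinary space (code 32) counts as a space."""
    return " ".join(part for part in text.split(" ") if part)


# Hypothetical raw row: a bell character (code 7), a trailing newline and irregular spacing.
raw = "\x07  P-101  |  Blue   pen |  12.50 \n"

# Same nesting as =TRIM(CLEAN(B3)): remove control characters first, then fix spacing.
cleaned = excel_trim(excel_clean(raw))
print(repr(cleaned))  # 'P-101 | Blue pen | 12.50'
```

Neither function removes the nonbreaking space (code 160) that often comes from web pages, either in Excel or in this sketch. Such characters need a separate replacement step.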

Finding required Text Patterns with the Text Functions

To structure your data, you might have to do some text pattern matching, based on which you can extract the data values. Some of the text functions that are useful for this purpose are −

S.No.   Function & Description
1.      EXACT − Checks to see if two text values are identical
2.      FIND − Finds one text value within another (case-sensitive)
3.      SEARCH − Finds one text value within another (not case-sensitive)
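
The difference between these three functions can be sketched in Python. Excel positions are 1-based, and FIND and SEARCH return #VALUE! when the text is not found. This sketch raises ValueError in that case. The helper names and the sample row are hypothetical. The SEARCH approximation ignores the ? and * wildcards that Excel's SEARCH accepts and assumes plain ASCII text for the case-insensitive comparison.

```python
def excel_exact(text1: str, text2: str) -> bool:
    """Approximate EXACT: a case-sensitive comparison of two text values."""
    return text1 == text2


def excel_find(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Approximate FIND: 1-based, case-sensitive position of find_text."""
    if start_num < 1:
        raise ValueError("#VALUE!")
    index = within_text.find(find_text, start_num - 1)
    if index == -1:
        raise ValueError("#VALUE!")  # Excel shows #VALUE! when nothing is found
    return index + 1


def excel_search(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Approximate SEARCH: like FIND but case-insensitive (wildcards not supported)."""
    if start_num < 1:
        raise ValueError("#VALUE!")
    index = within_text.lower().find(find_text.lower(), start_num - 1)
    if index == -1:
        raise ValueError("#VALUE!")
    return index + 1


row = "P-101 | Blue pen | 12.50"  # hypothetical cleaned row
print(excel_exact("Pen", "pen"))  # False
print(excel_find("|", row))  # 7
print(excel_find("|", row, 8))  # 18, i.e. the second |
print(excel_search("BLUE", row))  # 9
```

Note that excel_find("BLUE", row) would raise ValueError, because FIND is case-sensitive.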

Extracting Data Values from Text

You need to extract the required data from the text in order to structure it. In the above example, say you need to place the data in three columns: ProductID, Product_Description and Price.

You can extract data in one of the following ways −

  1. Extracting Data Values with Convert Text to Columns Wizard

  2. Extracting Data Values with Text Functions

  3. Extracting Data Values with Flash Fill

Extracting Data Values with Convert Text to Columns Wizard

You can use the Convert Text to Columns Wizard to extract Data Values into Excel columns if your fields are −

  1. Delimited by a character, or

  2. Aligned in columns with spaces between each field.

In the above example, the fields are delimited by the character “|”. Hence, you can use the Convert Text to Columns wizard.

  1. Select the data.

  2. Copy it and paste it back in the same place as values only (Paste Special > Values). Otherwise, Convert Text to Columns takes the formulas, rather than the data itself, as its input.

[Figure: Convert text to columns]

  3. Select the data.

  4. Click Text to Columns in the Data Tools group on the Data tab of the Ribbon.

Step 1 − Convert Text to Columns Wizard - Step 1 of 3 appears.

  1. Select Delimited.

  2. Click Next.

[Figure: Convert Text to Columns Wizard, step 1]

Step 2 − Convert Text to Columns Wizard - Step 2 of 3 appears.

  1. Under Delimiters, select Other.

  2. In the box next to Other, type the character |

  3. Click Next.

[Figure: Convert Text to Columns Wizard, step 2]

Step 3 − Convert Text to Columns Wizard - Step 3 of 3 appears.

In this screen, you can select each column of your data in the wizard and set the format for that column.

  1. For Destination, select cell D3.

  2. Optionally, click Advanced and set the Decimal Separator and Thousands Separator in the Advanced Text Import Settings dialog box that appears.

  3. Click Finish.

[Figure: Convert Text to Columns Wizard, step 3]

The data, now converted to columns, appears in three columns: D, E and F.

  1. Name the column headers ProductID, Product_Description and Price.

[Figure: Named column headers]
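
The delimited split that the wizard performs can be approximated in Python by splitting each cleaned row on "|". This is a sketch under stated assumptions: the rows are hypothetical, and the strip() call is an extra step added here to drop the single space on each side of every | in the example data. As in the worksheet, the prices are still text after the split.

```python
HEADERS = ["ProductID", "Product_Description", "Price"]


def text_to_columns(row: str, delimiter: str = "|") -> list[str]:
    """Split one cleaned row on the delimiter, like choosing Delimited > Other: | in the wizard.

    The strip() is an extra step in this sketch: it removes the spaces around each |.
    """
    return [field.strip() for field in row.split(delimiter)]


# Hypothetical cleaned rows (the real values are in the worksheet figures).
rows = ["P-101 | Blue pen | 12.50", "P-102 | Red marker | 3.75"]

# Pair each split field with its column header, like naming the headers in D, E and F.
table = [dict(zip(HEADERS, text_to_columns(r))) for r in rows]
for record in table:
    print(record)
```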

Extracting Data Values with Text Functions

If the fields in your data are neither delimited by a character nor aligned in columns with spaces between each field, you can use text functions to extract the data values. Even when the fields are delimited, you can still use text functions to extract the data.

Some of the text functions that are useful for this purpose are −

S.No.   Function & Description
1.      LEFT − Returns the leftmost characters from a text value
2.      RIGHT − Returns the rightmost characters from a text value
3.      MID − Returns a specific number of characters from a text string starting at the position you specify
4.      LEN − Returns the number of characters in a text string
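
The following Python sketch shows how these four functions map onto string slicing. The main point is that Excel counts character positions from 1, while Python indexes from 0. The helper names are hypothetical. Argument validation is omitted (Excel returns #VALUE! for negative counts).

```python
def excel_left(text: str, num_chars: int = 1) -> str:
    """LEFT: the first num_chars characters (the whole text if num_chars is larger)."""
    return text[:num_chars]


def excel_right(text: str, num_chars: int = 1) -> str:
    """RIGHT: the last num_chars characters (the whole text if num_chars is larger)."""
    return text[max(0, len(text) - num_chars):]


def excel_mid(text: str, start_num: int, num_chars: int) -> str:
    """MID: num_chars characters starting at the 1-based position start_num."""
    return text[start_num - 1:start_num - 1 + num_chars]


def excel_len(text: str) -> int:
    """LEN: the number of characters in the text."""
    return len(text)


row = "P-101 | Blue pen | 12.50"  # hypothetical cleaned row
print(excel_left(row, 5))  # P-101
print(excel_mid(row, 9, 8))  # Blue pen
print(excel_right(row, 5))  # 12.50
print(excel_len(row))  # 24
```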

You can also combine two or more of these text functions, depending on the data you have at hand, to extract the required data values. For example, you can use a combination of the LEFT, RIGHT and VALUE functions, or a combination of the FIND, LEFT, LEN and MID functions.

In the above example,

  1. All the characters to the left of the first | give the ProductID.

  2. All the characters to the right of the second | give the Price.

  3. All the characters between the first | and the second | give the Product_Description.

  4. Each | has a space before and after it.

Observing this information, you can extract the data values with the following steps −

  1. Find the position of the first | (First | Position). You can use the FIND function.

  2. Find the position of the second | (Second | Position). You can use the FIND function again.

  3. Characters 1 to (First | Position - 2) of the text give the ProductID. You can use the LEFT function.

  4. Characters (First | Position + 2) to (Second | Position - 2) of the text give the Product_Description. You can use the MID function.

  5. Characters (Second | Position + 2) to the end of the text give the Price. You can use the RIGHT function.

[Figure: Extracting data values]

The result will be as shown below −

[Figure: Result of extracting data values]
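
The five steps above can be sketched in Python with 0-based string methods. As the text notes, the sketch assumes exactly one space on each side of every |. The Excel formulas in the comments assume the cleaned text is in C3. They are one possible way to write the steps, not necessarily the formulas shown in the figure. The function name extract_fields is hypothetical.

```python
def extract_fields(text: str) -> tuple[str, str, str]:
    """Split 'ID | description | price' using the positions of the two | characters."""
    # Step 1: 1-based position of the first |, like =FIND("|", C3)
    first = text.find("|") + 1
    # Step 2: 1-based position of the second |, like =FIND("|", C3, first + 1)
    second = text.find("|", first) + 1
    if first == 0 or second == 0:
        raise ValueError("expected two | characters")
    # Step 3: characters 1 .. first-2, like =LEFT(C3, first - 2)
    product_id = text[: first - 2]
    # Step 4: characters first+2 .. second-2, like =MID(C3, first + 2, second - first - 3)
    description = text[first + 1 : second - 2]
    # Step 5: characters second+2 .. end, like =RIGHT(C3, LEN(C3) - second - 1)
    price = text[second + 1 :]
    return product_id, description, price


print(extract_fields("P-101 | Blue pen | 12.50"))  # ('P-101', 'Blue pen', '12.50')
```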

You can observe that the values in the Price column are text values. To perform calculations on these values, you have to convert them to numbers. You can look at the section given below to understand formatting text and converting text to numbers.

Extracting Data Values with Flash Fill

Using Excel Flash Fill is another way to extract data values from text. However, this works only when Excel is able to find a pattern in the data.

Step 1 − Create three columns for ProductID, Product_Description and Price next to the data.

[Figure: Create columns]

Step 2 − In C3, D3 and E3, type (or copy and paste) the ProductID, Product_Description and Price values of the first record in B3.

[Figure: Paste values]

Step 3 − Select cell C3 and click Flash Fill in the Data Tools group on the Data tab. All the values for ProductID get filled.

[Figure: Flash Fill]

Step 4 − Repeat the above steps for Product_Description and Price. The remaining data is filled in.

[Figure: Repeat steps]

Formatting Data with Text Functions

Excel has several built-in text functions that you can use for formatting data containing text. These include −

Functions that format text as per your need

S.No.   Function & Description
1.      LOWER − Converts text to lowercase
2.      UPPER − Converts text to uppercase
3.      PROPER − Capitalizes the first letter in each word of a text value
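
Python's built-in string methods give a close analogue of these three functions. This is a sketch: str.title() stands in for PROPER because, for plain ASCII text, it also capitalizes any letter that follows a non-letter and lowercases the rest. Results on other scripts may differ. The helper names are hypothetical.

```python
def excel_lower(text: str) -> str:
    """Approximate LOWER: convert all letters to lowercase."""
    return text.lower()


def excel_upper(text: str) -> str:
    """Approximate UPPER: convert all letters to uppercase."""
    return text.upper()


def excel_proper(text: str) -> str:
    """Approximate PROPER: capitalize the first letter of each word and lowercase the rest.

    A "word" starts after any non-letter character, so "2-way" becomes "2-Way".
    """
    return text.title()


print(excel_lower("Blue PEN"))  # blue pen
print(excel_upper("p-101"))  # P-101
print(excel_proper("this is a TITLE"))  # This Is A Title
print(excel_proper("2-way street"))  # 2-Way Street
```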

Functions that convert and/or format numbers as text

S.No.   Function & Description
1.      DOLLAR − Converts a number to text, using the $ (dollar) currency format
2.      FIXED − Formats a number as text with a fixed number of decimals
3.      TEXT − Formats a number and converts it to text
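
The following Python sketch approximates FIXED and DOLLAR under stated assumptions. It rounds half away from zero with the decimal module, as Excel does, so 0.125 becomes "0.13". It assumes a US-style locale for DOLLAR: a $ symbol, comma thousands separators and parentheses for negative numbers. Excel's actual symbol and negative format depend on your language settings. Negative decimals, which FIXED and DOLLAR accept in Excel, are not handled. TEXT is left out because its Excel format codes have no direct Python equivalent. The helper names are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal


def excel_fixed(number: float, decimals: int = 2, no_commas: bool = False) -> str:
    """Approximate FIXED: round half away from zero, then format as text."""
    if decimals < 0:
        raise ValueError("negative decimals are not supported in this sketch")
    quantum = Decimal(1).scaleb(-decimals)  # e.g. Decimal('0.01') for 2 places
    rounded = Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP)
    separator = "" if no_commas else ","
    return format(rounded, f"{separator}.{decimals}f")


def excel_dollar(number: float, decimals: int = 2) -> str:
    """Approximate DOLLAR for a US-style locale: $, thousands separators,
    and parentheses around negative amounts."""
    text = "$" + excel_fixed(abs(number), decimals)
    return f"({text})" if number < 0 else text


print(excel_fixed(1234.567, 1))  # 1,234.6
print(excel_fixed(1234.567, 1, no_commas=True))  # 1234.6
print(excel_dollar(1234.567))  # $1,234.57
print(excel_dollar(-0.123, 4))  # ($0.1230)
```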

Functions that convert text to numbers

S.No.   Function & Description
1.      VALUE − Converts a text argument to a number
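
Converting the Price column back to numbers is what makes calculations possible. This Python sketch of VALUE covers only simple numeric text. It ignores surrounding spaces, a leading $ and comma thousands separators, and raises ValueError where Excel would show #VALUE!. It does not handle the percentages, dates and times that Excel's VALUE also accepts. The helper name and the price list are hypothetical.

```python
def excel_value(text: str) -> float:
    """Approximate VALUE for simple numeric text such as '12.50' or '$1,000'."""
    cleaned = text.strip().replace(",", "")  # drop surrounding spaces and thousands separators
    if cleaned.startswith("$"):
        cleaned = cleaned[1:]  # drop a leading currency symbol
    return float(cleaned)  # raises ValueError for non-numeric text


prices = ["12.50", "3.75", "$1,000"]  # hypothetical Price column, still stored as text
numbers = [excel_value(p) for p in prices]
print(sum(numbers))  # 1016.25
```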

Executing Data Operations with the Text Functions

You might have to perform certain text operations on your data. For example, if an organization changes its employees' login IDs to a new format, you might have to replace text according to that format change.
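
For the login-ID scenario, two of the text functions listed below can do the replacement. REPLACE works by position, and SUBSTITUTE works by matching text. The Python sketch below approximates both. The ID formats "emp_00042" and "E-00042" are hypothetical, as are the helper names. Treating an empty old_text as a no-op in excel_substitute is an assumption of the sketch.

```python
from typing import Optional


def excel_replace(old_text: str, start_num: int, num_chars: int, new_text: str) -> str:
    """Approximate REPLACE: swap num_chars characters starting at 1-based start_num."""
    start = start_num - 1
    return old_text[:start] + new_text + old_text[start + num_chars:]


def excel_substitute(text: str, old_text: str, new_text: str,
                     instance_num: Optional[int] = None) -> str:
    """Approximate SUBSTITUTE: replace every occurrence of old_text, or only the
    instance_num-th occurrence when instance_num is given."""
    if not old_text:
        return text  # assumption: an empty old_text means nothing to replace
    if instance_num is None:
        return text.replace(old_text, new_text)
    if instance_num < 1:
        raise ValueError("#VALUE!")
    position = -1
    search_from = 0
    for _ in range(instance_num):
        position = text.find(old_text, search_from)
        if position == -1:
            return text  # fewer occurrences than instance_num: text is unchanged
        search_from = position + len(old_text)
    return text[:position] + new_text + text[position + len(old_text):]


# Hypothetical migration: old IDs look like "emp_00042", new ones like "E-00042".
old_ids = ["emp_00042", "emp_01337"]
print([excel_replace(i, 1, 4, "E-") for i in old_ids])  # position-based
print([excel_substitute(i, "emp_", "E-") for i in old_ids])  # text-based
```

Both calls produce ['E-00042', 'E-01337']. SUBSTITUTE is the safer choice when the prefix length might vary, while REPLACE suits fixed-width IDs.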

The following text functions help you perform text operations on data containing text −

S.No.   Function & Description
1.      REPLACE − Replaces characters within text
2.      SUBSTITUTE − Substitutes new text for old text in a text string
3.      CONCATENATE − Joins several text items into one text item
4.      CONCAT − Combines the text from multiple ranges and/or strings, but does not provide the delimiter or IgnoreEmpty arguments
5.      TEXTJOIN − Combines the text from multiple ranges and/or strings, and includes a delimiter you specify between each text value that will be combined. If the delimiter is an empty text string, this function effectively concatenates the ranges.
6.      REPT − Repeats text a given number of times
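
The joining functions differ mainly in how they treat delimiters and empty values. The Python sketch below models a "range" as a list of strings. CONCATENATE is left out because, for individual strings, it joins them just as CONCAT does. The helper names and the sample row are hypothetical.

```python
from collections.abc import Iterable, Iterator
from typing import Union

# A "text argument": either a single string or a range-like sequence of strings.
TextArg = Union[str, Iterable[str]]


def _flatten(items: Iterable[TextArg]) -> Iterator[str]:
    """Yield single strings as-is and expand range-like sequences into their strings."""
    for item in items:
        if isinstance(item, str):
            yield item
        else:
            yield from item


def excel_concat(*items: TextArg) -> str:
    """Approximate CONCAT: join all text with no delimiter."""
    return "".join(_flatten(items))


def excel_textjoin(delimiter: str, ignore_empty: bool, *items: TextArg) -> str:
    """Approximate TEXTJOIN: join all text with delimiter, optionally skipping empty strings."""
    values = [value for value in _flatten(items) if value or not ignore_empty]
    return delimiter.join(values)


def excel_rept(text: str, number_times: int) -> str:
    """Approximate REPT: repeat text number_times times."""
    return text * number_times


# Hypothetical row whose description cell is empty.
cells = ["P-103", "", "7.00"]
print(excel_concat("ID:", cells[0]))  # ID:P-103
print(excel_textjoin(" | ", True, cells))  # P-103 | 7.00
print(excel_textjoin(" | ", False, cells))  # P-103 |  | 7.00
print(excel_rept("-", 10))  # ----------
```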