R 简明教程

R - Factors

因子是用于对数据进行分类并将其存储为级别的 data 对象。它们可以存储字符串和整数。它们在具有有限数量唯一值(例如“男性”、“女性”和 True、False 等)的列中很有用。它们在用于统计建模的数据分析中很有用。

Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in the columns which have a limited number of unique values. Like "Male, "Female" and True, False etc. They are useful in data analysis for statistical modeling.

通过将向量作为输入,使用 factor () 函数创建因子。

Factors are created using the factor () function by taking a vector as input.

Example

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
print(is.factor(data))

# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

[1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West"  "East" "North"
[1] FALSE
[1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
[1] TRUE

Factors in Data Frame

在创建任何带有文本数据列的数据框架时,R 将文本列视为分类数据,并在其上创建因子。

On creating any data frame with a column of text data, R treats the text column as categorical data and creates factors on it.

# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame.
input_data <- data.frame(height,weight,gender)
print(input_data)

# Test if the gender column is a factor.
print(is.factor(input_data$gender))

# Print the gender column so see the levels.
print(input_data$gender)

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

  height weight gender
1    132     48   male
2    151     49   male
3    162     66 female
4    139     53 female
5    166     67   male
6    147     52 female
7    122     40   male
[1] TRUE
[1] male   male   female female male   female male
Levels: female male

Changing the Order of Levels

可以通过再次应用 factor 函数并指定新级别顺序来更改因子中级别的顺序。

The order of the levels in a factor can be changed by applying the factor function again with new order of the levels.

data <- c("East","West","East","North","North","East","West",
   "West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East West North

Generating Factor Levels

我们可以使用 gl() 函数生成因子级别。它将两个整数作为输入,以指示有多少个级别以及每个级别出现多少次。

We can generate factor levels by using the gl() function. It takes two integers as input which indicates how many levels and how many times each level.

Syntax

gl(n, k, labels)

以下是所用参数的描述 -

Following is the description of the parameters used −

  1. n is a integer giving the number of levels.

  2. k is a integer giving the number of replications.

  3. labels is a vector of labels for the resulting factor levels.

Example

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)

当我们执行上述代码时,会产生以下结果 -

When we execute the above code, it produces the following result −

Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston
[10] Boston  Boston  Boston
Levels: Tampa Seattle Boston