R Concise Tutorial
R - Mean, Median and Mode
Statistical analysis in R is performed with built-in functions, most of which ship with the base R distribution. These functions take an R vector as input, along with optional arguments, and return the result.
This chapter covers the mean, median and mode.
Mean
The mean is calculated by summing the values and dividing by the number of values in the data series.
The mean() function calculates this in R.
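As a quick check of the definition, the sketch below computes the mean by hand with sum() and length() and compares it with mean(). The vector and its values are made up for illustration.

```r
# Illustrative vector (values chosen only for this sketch).
x <- c(2, 4, 6, 8)

# Mean by definition: sum of the values divided by their count.
manual.mean <- sum(x) / length(x)

# The same value from the built-in function.
builtin.mean <- mean(x)

print(manual.mean)
print(builtin.mean)
```

Both calls print 5 for this vector.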
Syntax
The basic syntax for calculating the mean in R is:
mean(x, trim = 0, na.rm = FALSE, ...)
The parameters are described below:
- x is the input vector.
- trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed.
- na.rm is a logical value indicating whether missing values (NA) should be removed from the input vector before the calculation.
Applying Trim Option
When the trim parameter is supplied, the values in the vector are sorted and the required number of observations is dropped from each end before the mean is calculated.
trim is a fraction, not a count: the number of values dropped from each end is the fraction times the length of the vector, rounded down. With the 10-value vector below, trim = 0.3 drops 3 values from each end.
In this case the sorted vector is (-21, -5, 2, 3, 4.2, 7, 8, 12, 18, 54). The values (-21, -5, 2) are removed from the left and (12, 18, 54) from the right, so the mean is taken over the remaining values (3, 4.2, 7, 8).
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x,trim = 0.3)
print(result.mean)
Running the above code produces the following result:
[1] 5.55
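To make the trimming step concrete, here is a minimal sketch that reproduces the trimmed mean by hand. It assumes the number of values dropped from each end is floor(n * trim); the helper names k, kept and manual.trimmed are hypothetical and used only for this illustration.

```r
# Same vector as in the example above.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)
trim <- 0.3

# Number of observations to drop from each end (assumed: floor(n * trim)).
n <- length(x)
k <- floor(n * trim)

# Sort the vector and keep only the middle observations.
kept <- sort(x)[(k + 1):(n - k)]
print(kept)

# The mean of the kept values should agree with mean(x, trim = 0.3).
manual.trimmed <- mean(kept)
print(manual.trimmed)
print(all.equal(manual.trimmed, mean(x, trim = trim)))
```

The kept values are 3, 4.2, 7 and 8, whose mean is 5.55.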
Applying NA Option
If the vector contains missing values, the mean function returns NA.
To drop the missing values from the calculation, use na.rm = TRUE, which removes the NA values before the mean is computed.
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)
# Find mean.
result.mean <- mean(x)
print(result.mean)
# Find mean dropping NA values.
result.mean <- mean(x,na.rm = TRUE)
print(result.mean)
Running the above code produces the following result:
[1] NA
[1] 8.22
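It can be useful to check how many values are missing before dropping them. The sketch below counts the NA values with is.na() and shows that filtering them out by hand gives the same mean as na.rm = TRUE. The names n.missing and manual.mean are hypothetical, chosen for this illustration.

```r
# Same vector as in the example above, including one missing value.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5, NA)

# Count the missing values.
n.missing <- sum(is.na(x))
print(n.missing)

# Dropping the NA values by hand should match mean(x, na.rm = TRUE).
manual.mean <- mean(x[!is.na(x)])
print(manual.mean)
print(all.equal(manual.mean, mean(x, na.rm = TRUE)))
```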
Median
The median is the middle value of a sorted data series; when the series has an even number of values, it is the average of the two middle values. The median() function calculates it in R.
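A short sketch of median() in use, reusing the vector from the mean examples. median() also accepts na.rm, with the same meaning as in mean(). The variable names are illustrative.

```r
# Create a vector with an even number of values.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# With 10 values, the median is the average of the 5th and 6th
# sorted values (4.2 and 7).
result.median <- median(x)
print(result.median)

# With an odd number of values, the median is the single middle value.
odd.median <- median(c(12, 7, 3))
print(odd.median)

# Missing values are handled with na.rm, as in mean().
na.median <- median(c(x, NA), na.rm = TRUE)
print(na.median)
```

For this data the first and third calls give 5.6 and the second gives 7.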
Mode
The mode is the value that occurs most often in a set of data. Unlike the mean and median, the mode is defined for both numeric and character data.
R has no standard built-in function to calculate the statistical mode (the built-in mode() function returns an object's storage mode instead). So we write a user-defined function that takes a vector as input and returns the mode as output.
Example
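# Create the function.
getmode <- function(v) {
   # Distinct values, in order of first appearance.
   uniqv <- unique(v)
   # Count how often each distinct value occurs and return the most frequent one.
   uniqv[which.max(tabulate(match(v, uniqv)))]
}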
# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
# Calculate the mode using the user function.
result <- getmode(v)
print(result)
# Create the vector with characters.
charv <- c("o","it","the","it","it")
# Calculate the mode using the user function.
result <- getmode(charv)
print(result)
Running the above code produces the following result:
[1] 2
[1] "it"
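Note that in the numeric vector above, 2 and 3 each occur four times; getmode returns 2 because which.max picks the first maximum. If every tied value is wanted, one possible variation is sketched below. The name getmodes is hypothetical and not part of R or of the original example.

```r
# Hypothetical variation of getmode that returns every tied mode.
getmodes <- function(v) {
   uniqv <- unique(v)
   # Occurrence count for each distinct value.
   counts <- tabulate(match(v, uniqv))
   # Keep all values whose count equals the maximum count.
   uniqv[counts == max(counts)]
}

# Same numeric vector as in the example above.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
result <- getmodes(v)
print(result)

# Same character vector as above; it has a single mode.
charv <- c("o","it","the","it","it")
print(getmodes(charv))
```

For the numeric vector this returns both 2 and 3; for the character vector it returns "it".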