R Concise Tutorial

R - Logistic Regression

Logistic regression is a regression model in which the response variable (dependent variable) takes categorical values such as True/False or 0/1. The model estimates the probability of a binary response as a function of the predictor variables, using the equation given below.

The general mathematical equation for logistic regression is:

y = 1/(1+e^-(a+b1x1+b2x2+b3x3+...))

The parameters used are described below:

  1. y is the predicted probability that the response variable equals 1.

  2. x1, x2, x3, ... are the predictor variables.

  3. a is the intercept and b1, b2, b3, ... are the coefficients; all are numeric constants.
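As a minimal sketch of how the equation turns predictor values into a probability, the snippet below evaluates it by hand for two predictors. The coefficient and predictor values are hypothetical, chosen only for illustration; plogis() is base R's logistic distribution function and is used here as a cross-check.

```r
# Hypothetical coefficients and predictor values (illustration only).
a <- -1.5              # intercept
b <- c(0.8, -0.4)      # b1, b2
x <- c(2.0, 1.0)       # x1, x2

# Linear predictor: a + b1*x1 + b2*x2
z <- a + sum(b * x)

# Apply the logistic equation directly.
y_manual <- 1 / (1 + exp(-z))

# plogis() from base R computes the same quantity.
y_builtin <- plogis(z)

print(c(manual = y_manual, builtin = y_builtin))
```

Whatever values are chosen for a, b and x, the result always lies strictly between 0 and 1, which is why it can be read as a probability.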

The function used to create the regression model is glm().

Syntax

The basic syntax of the glm() function for logistic regression is:

glm(formula,data,family)

The parameters used are described below:

  1. formula is a symbolic description of the relationship between the variables.

  2. data is the data set giving the values of these variables.

  3. family is an R object that specifies the details of the model. Its value is binomial for logistic regression.
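To make the three arguments concrete, here is a small sketch on a made-up data frame (the toy data and the names toy, fit_default and fit_logit are assumptions for illustration). It also checks that family = binomial and family = binomial(link = "logit") give the same fit, since logit is the default link of the binomial family.

```r
# Small made-up data set, used only to illustrate the call.
toy <- data.frame(
  y = c(0, 0, 1, 0, 1, 1, 0, 1),
  x = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# formula: y modelled by x; data: the data frame; family: binomial.
fit_default <- glm(y ~ x, data = toy, family = binomial)

# Same model with the link function spelled out explicitly.
fit_logit <- glm(y ~ x, data = toy, family = binomial(link = "logit"))

# Show the link in use and compare the two sets of coefficients.
print(family(fit_default)$link)
print(all.equal(coef(fit_default), coef(fit_logit)))
```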

Example

The built-in data set "mtcars" describes different car models along with their engine specifications. In "mtcars", the transmission mode is given by the column am, which is a binary value (0 = automatic, 1 = manual). We can create a logistic regression model with "am" as the response and three other columns (hp, wt and cyl) as predictors.

# Select the response (am) and three predictor columns from mtcars.
input <- mtcars[, c("am", "cyl", "hp", "wt")]

# Show the first six rows.
print(head(input))

When we execute the above code, it produces the following result:

                  am   cyl  hp    wt
Mazda RX4          1   6    110   2.620
Mazda RX4 Wag      1   6    110   2.875
Datsun 710         1   4     93   2.320
Hornet 4 Drive     0   6    110   3.215
Hornet Sportabout  0   8    175   3.440
Valiant            0   6    105   3.460

Create Regression Model

We use the glm() function to create the regression model and print its summary for analysis.

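# Keep the response (am) and the three predictors.
input <- mtcars[, c("am", "cyl", "hp", "wt")]

# Fit the logistic regression; family = binomial uses the logit link by default.
am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)

# Print coefficient estimates, standard errors and p-values.
print(summary(am.data))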

When we execute the above code, it produces the following result:

Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)

Deviance Residuals:
     Min        1Q      Median        3Q       Max
-2.17272     -0.14907  -0.01464     0.14116   1.27641

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288    8.11637   2.428   0.0152 *
cyl          0.48760    1.07162   0.455   0.6491
hp           0.03259    0.01886   1.728   0.0840 .
wt          -9.14947    4.15332  -2.203   0.0276 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.2297  on 31  degrees of freedom
Residual deviance:  9.8415  on 28  degrees of freedom
AIC: 17.841

Number of Fisher Scoring iterations: 8
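Once the model is fitted, it can be used to estimate probabilities. The sketch below refits the same model and calls predict() with type = "response", which returns probabilities rather than values on the log-odds scale. The new_car data frame is a hypothetical car invented for illustration.

```r
input <- mtcars[, c("am", "cyl", "hp", "wt")]
am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)

# Fitted probability that am == 1 for each car in the data set.
fitted_prob <- predict(am.data, type = "response")
print(round(head(fitted_prob), 3))

# Hypothetical new car (values invented for illustration).
new_car <- data.frame(cyl = 4, hp = 100, wt = 2.5)
new_prob <- predict(am.data, newdata = new_car, type = "response")
print(new_prob)
```

Without type = "response", predict() returns the linear predictor a + b1x1 + b2x2 + ..., which is the log-odds rather than the probability.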

Conclusion

In the summary, the p-values in the last column for the variables "cyl" and "hp" are greater than 0.05, so we consider them not statistically significant in predicting the value of "am" at the 5% level. Only weight (wt) has a statistically significant effect on "am" in this regression model; its negative coefficient means that heavier cars are less likely to have am = 1.
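The same comparison against 0.05 can be done in code instead of by reading the printed table. This is a minimal sketch that refits the model and pulls the p-values out of the coefficient matrix returned by coef(summary(...)); the names coef_table, p_values and significant are chosen here for illustration.

```r
input <- mtcars[, c("am", "cyl", "hp", "wt")]
am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)

# coef(summary(...)) returns the coefficient table as a numeric matrix.
coef_table <- coef(summary(am.data))
p_values <- coef_table[, "Pr(>|z|)"]

# Keep the terms whose p-value is below the 0.05 cut-off used in the text.
significant <- names(p_values)[p_values < 0.05]

print(round(p_values, 4))
print(significant)
```

The intercept also falls below the cut-off, but it is not a predictor, so wt remains the only significant predictor variable.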