A Concise R Tutorial

R - Poisson Regression

Poisson regression covers regression models in which the response variable is a count (a non-negative integer) rather than a fractional number, for example the number of births or the number of wins in a series of football matches. The values of the response variable are assumed to follow a Poisson distribution.

The general mathematical equation for Poisson regression is:

$$\log(\mathbb{E}[y]) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

Following is the description of the parameters used:

  1. y is the response variable (a count), and E[y] is its expected value.

  2. a is the intercept and b1, b2, ..., bn are the numeric coefficients.

  3. x1, x2, ..., xn are the predictor variables.
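To make the role of the log link concrete, here is a minimal sketch that simulates counts from the equation above with a single predictor and checks that glm() recovers the coefficients. The values a = 0.5 and b1 = 0.8, the sample size and the seed are arbitrary assumptions chosen for illustration; they do not come from any real data.

```r
# Minimal sketch: simulate counts from log(E[y]) = a + b1 * x1
# using hypothetical values a = 0.5 and b1 = 0.8, then refit with glm().
set.seed(42)
n  <- 5000
x1 <- runif(n)                # one numeric predictor in [0, 1]
a  <- 0.5                     # assumed true intercept
b1 <- 0.8                     # assumed true coefficient
mu <- exp(a + b1 * x1)        # expected count: exp() undoes the log link
y  <- rpois(n, lambda = mu)   # Poisson-distributed response

fit <- glm(y ~ x1, family = poisson)
print(coef(fit))              # estimates should land close to 0.5 and 0.8

# Fitted values are exp() of the linear predictor (log scale)
print(head(fitted(fit)))
print(head(exp(predict(fit))))
```

Because the model is linear on the log scale, each coefficient describes a multiplicative change in the expected count: increasing x1 by one unit multiplies E[y] by exp(b1).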

The function used to create the Poisson regression model is the glm() function.

Syntax

The basic syntax of the glm() function for Poisson regression is:

glm(formula, data, family)

Following is the description of the parameters used in the above function:

  1. formula is a symbolic description of the relationship between the variables, such as breaks ~ wool + tension.

  2. data is the data frame containing the values of these variables.

  3. family is an R object specifying the error distribution and link function of the model. Its value is poisson for Poisson regression (with the log link by default).
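The family argument accepts a family function, a call to a family function, or the family name as a character string. The sketch below, assuming only base R, fits the warpbreaks model used later in this chapter in all three ways; each uses the default log link, so the coefficients should agree. Note that the name is lowercase: family = "Poisson" would fail because no function of that name exists.

```r
# Three equivalent ways to request the Poisson family in glm()
fit_fun  <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
fit_call <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))   # link stated explicitly
fit_str  <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = "poisson")               # family given by name

print(all.equal(coef(fit_fun), coef(fit_call)))
print(all.equal(coef(fit_fun), coef(fit_str)))
print(family(fit_fun))   # reports the family (poisson) and link (log)
```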

Example

We use the built-in data set warpbreaks, which describes the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. We take breaks, the count of warp breaks, as the response variable, and wool and tension as the predictor variables.
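Before fitting, it can help to confirm how R stores these variables. This short check, assuming only base R, shows that breaks is numeric while wool and tension are factors. The first level of each factor (A and L) becomes the reference level in the model, which is why the model summary reports coefficients named woolB, tensionM and tensionH.

```r
# warpbreaks ships with base R (package 'datasets'), so no loading is needed
str(warpbreaks)

# Factor levels; the first level of each is the reference level in glm()
print(levels(warpbreaks$wool))
print(levels(warpbreaks$tension))
print(nrow(warpbreaks))
```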

Input Data

input <- warpbreaks
print(head(input))

When we execute the above code, it produces the following result:

  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L
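head() shows only the first six rows, all for wool A at low tension. A quick descriptive look at every wool and tension combination, sketched below with base R only, shows how the data are spread across the groups the model will compare.

```r
# warpbreaks is balanced: the same number of looms in every wool x tension cell
print(table(warpbreaks$wool, warpbreaks$tension))

# Mean number of breaks per cell; rows are wool types, columns are tension levels
cell_means <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
print(round(cell_means, 2))
```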

Create Regression Model

output <- glm(formula = breaks ~ wool + tension, data = warpbreaks,
   family = poisson)
print(summary(output))

When we execute the above code, it produces the following result:

Call:
glm(formula = breaks ~ wool + tension, family = poisson, data = warpbreaks)

Deviance Residuals:
    Min       1Q     Median       3Q      Max
  -3.6871  -1.6503  -0.4269     1.1902   4.2616

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: 493.06

Number of Fisher Scoring iterations: 4
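A fitted Poisson model is often used to estimate expected counts. The sketch below refits the same model and calls predict() with type = "response" to get the expected number of breaks for each wool and tension combination; the data frame name new_looms is only an illustrative choice.

```r
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# All six wool x tension combinations (wool varies fastest)
new_looms <- expand.grid(wool = c("A", "B"), tension = c("L", "M", "H"))

# type = "response" applies exp() to the linear predictor, giving counts
new_looms$expected_breaks <- predict(output, newdata = new_looms,
                                     type = "response")
print(new_looms)
```

The first row (wool A, tension L) is the reference combination, so its expected count is simply exp(3.69196), about 40 breaks.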

In the summary, we look at the p-value in the last column: a value below 0.05 indicates that the predictor has a statistically significant effect on the response variable. Here all three terms are significant. Wool type B and tension levels M and H each have a negative estimate, so each is associated with fewer breaks than the reference levels (wool A and tension L). Because the model is additive, each effect is estimated with the other predictor held fixed.
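The same reading of the summary can be done programmatically. This sketch, assuming only base R, pulls the p-values from coef(summary(output)), keeps the terms below 0.05, and exponentiates the coefficients so they read as rate ratios against the reference levels. It also computes residual deviance divided by residual degrees of freedom as a rough overdispersion check.

```r
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Coefficient matrix with columns Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(output))

# Non-intercept terms whose p-value is below 0.05
p_values <- coef_table[-1, "Pr(>|z|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)

# exp() turns log-scale coefficients into rate ratios
# relative to wool A and tension L
rate_ratios <- exp(coef(output))
print(round(rate_ratios, 3))

# Rough overdispersion check: values well above 1 suggest extra variation
dispersion_ratio <- deviance(output) / df.residual(output)
print(dispersion_ratio)
```

With the estimates above, exp(-0.20599) is about 0.81, so wool B is associated with roughly 19% fewer breaks than wool A at the same tension. The deviance ratio here is 210.39 / 50, about 4.2. A value well above 1 suggests the counts vary more than a Poisson model assumes, in which case the standard errors and p-values in the summary are likely too small.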