Statistics - Residual analysis
Residual analysis assesses whether a linear regression model is appropriate for a data set by computing the residuals and examining the residual plot.
Residual
A residual ($ e $) is the difference between the observed value ($ y $) and the predicted value ($ \hat y $). Every data point has one residual.
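Written as a formula, with an index $ i $ for the data point (the index is notation added here for illustration, not part of the original text):

$$ e_i = y_i - \hat y_i $$

A positive residual means the model predicted a value below the observed one; a negative residual means it predicted a value above it.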
Residual Plot
A residual plot is a graph with the residuals on the vertical axis and the independent variable on the horizontal axis. If the points are randomly scattered around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is a better choice.
Types of Residual Plot
The following example shows a few patterns that can appear in residual plots.

In the first case, the points are randomly dispersed, so a linear regression model is preferred. In the second and third cases, the points are not randomly dispersed, which suggests that a non-linear regression method is preferred.
Example
Problem Statement:
Check whether a linear regression model is appropriate for the following data.
| $ x $ | 60 | 70 | 80 | 85 | 95 |
|---|---|---|---|---|---|
| $ y $ (Actual Value) | 70 | 65 | 70 | 95 | 85 |
| $ \hat y $ (Predicted Value) | 65.411 | 71.849 | 78.288 | 81.507 | 87.945 |
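The text does not say how the predicted values were obtained. Assuming they come from an ordinary least-squares fit of $ y $ on $ x $, the following sketch reproduces them from the five data points. The variable names are illustrative only.

```python
# Least-squares line through the five (x, y) points from the problem statement.
# Assumption: the predicted values come from ordinary least squares.
x = [60, 70, 80, 85, 95]
y = [70, 65, 70, 95, 85]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# slope = S_xy / S_xx, intercept = y_mean - slope * x_mean
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Predicted value for each x, rounded to three decimals as in the table.
predicted = [round(intercept + slope * xi, 3) for xi in x]
print(predicted)  # [65.411, 71.849, 78.288, 81.507, 87.945]
```

Under that assumption the fitted line is roughly $ \hat y = 26.781 + 0.644x $.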
Solution:
Step 1: Compute the residual for each data point.
| $ x $ | 60 | 70 | 80 | 85 | 95 |
|---|---|---|---|---|---|
| $ y $ (Actual Value) | 70 | 65 | 70 | 95 | 85 |
| $ \hat y $ (Predicted Value) | 65.411 | 71.849 | 78.288 | 81.507 | 87.945 |
| $ e $ (Residual) | 4.589 | -6.849 | -8.288 | 13.493 | -2.945 |
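The residual row can be checked with a few lines of Python. This is a minimal sketch that subtracts each predicted value from the matching actual value; the list names are illustrative.

```python
# Actual and predicted values copied from the table.
y = [70, 65, 70, 95, 85]
predicted = [65.411, 71.849, 78.288, 81.507, 87.945]

# Residual = actual value - predicted value, rounded to three decimals.
residuals = [round(yi - pi, 3) for yi, pi in zip(y, predicted)]
print(residuals)  # [4.589, -6.849, -8.288, 13.493, -2.945]

# Sanity check: for a least-squares line with an intercept,
# the residuals sum to zero (up to rounding).
assert abs(sum(residuals)) < 0.001
```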
Step 2: Draw the residual plot.
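One way to draw such a plot is sketched below. It assumes the third-party matplotlib library is installed, and `plot_residuals` is a hypothetical helper name chosen for this example. The dashed line at zero marks the horizontal axis around which the points should scatter.

```python
def plot_residuals(x, residuals):
    """Scatter the residuals against x, with a reference line at zero."""
    import matplotlib.pyplot as plt  # third-party; assumed to be installed

    fig, ax = plt.subplots()
    ax.scatter(x, residuals)
    ax.axhline(0, linestyle="--", color="gray")  # the horizontal axis e = 0
    ax.set_xlabel("x (independent variable)")
    ax.set_ylabel("e (residual)")
    ax.set_title("Residual plot")
    return fig, ax


# Data from the worked example.
x = [60, 70, 80, 85, 95]
residuals = [4.589, -6.849, -8.288, 13.493, -2.945]
```

Calling `fig, ax = plot_residuals(x, residuals)` followed by `fig.savefig("residuals.png")` writes the figure to a file.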

Step 3: Check the randomness of the residuals.
Here the residual plot exhibits a random pattern: the first residual is positive, the next two are negative, the fourth is positive, and the last is negative. Because the pattern is fairly random, a linear regression model is appropriate for the above data.
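The sign pattern described above can be listed programmatically. The sketch below records the sign of each residual and groups consecutive equal signs into runs. It is only an informal check, not a formal test of randomness, and with five points any such pattern is weak evidence.

```python
from itertools import groupby

residuals = [4.589, -6.849, -8.288, 13.493, -2.945]

# Sign of each residual, in order of increasing x.
# (A residual of exactly zero would be labelled "-" here; none occurs.)
signs = ["+" if e > 0 else "-" for e in residuals]

# Group consecutive equal signs into (sign, run length) pairs.
runs = [(sign, len(list(group))) for sign, group in groupby(signs)]

print(signs)  # ['+', '-', '-', '+', '-']
print(runs)   # [('+', 1), ('-', 2), ('+', 1), ('-', 1)]
```

A long run of one sign, or a steady drift from positive to negative, would instead hint at the curved patterns that call for a non-linear model.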