Statistics: A Concise Tutorial
Statistics - Linear regression
Once correlation analysis has established the degree of relationship between variables, it is natural to examine the nature of that relationship. Regression analysis helps describe how one variable depends on another; this is often read as a cause and effect relationship, although a fitted regression alone does not prove causation. If the value of the independent variable is known, the value of the other variable (called the dependent variable) can be predicted using either a graphical method or an algebraic method.
Graphical Method
It involves drawing a scatter diagram with the independent variable on the X-axis and the dependent variable on the Y-axis. A line is then drawn so that it passes through the bulk of the points, with the remaining points distributed almost evenly on either side of the line.
A regression line is known as the line of best fit; it summarizes the general movement of the data. It gives the best estimate of the mean value of one variable for each value of the other. The regression line is based on the criterion that it is the straight line that minimizes the sum of squared deviations between the predicted and observed values of the dependent variable.
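The least-squares criterion can be checked numerically. The following is a minimal sketch under stated assumptions: the five data points and the function names `sum_squared_deviations` and `least_squares_line` are hypothetical and chosen only for illustration. It compares the sum of squared deviations of the least-squares line with that of two nearby lines.

```python
# Illustrative data (hypothetical, not taken from this tutorial).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def sum_squared_deviations(a, b, xs, ys):
    """Sum of squared differences between observed y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(a, b, xs, ys)

# Shifting the slope or the intercept away from the fitted values
# increases the sum of squared deviations.
shifted_slope = sum_squared_deviations(a, b + 0.1, xs, ys)
shifted_intercept = sum_squared_deviations(a + 0.5, b, xs, ys)

print(best, shifted_slope, shifted_intercept)
```

Any line other than the fitted one gives a larger total, which is what "line of best fit" means in this setting.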
Algebraic Method
The algebraic method develops two regression equations: Y on X, and X on Y.
Regression equation of Y on X
${Y = a + bX}$
Where −
- ${Y}$ = Dependent variable
- ${X}$ = Independent variable
- ${a}$ = Constant showing Y-intercept
- ${b}$ = Constant showing slope of line
Values of a and b are obtained from the following normal equations:
${\sum Y = Na + b\sum X}$
${\sum XY = a\sum X + b\sum X^2}$
Where −
- ${N}$ = Number of observations
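The two normal equations for Y on X, ΣY = Na + bΣX and ΣXY = aΣX + bΣX², form a 2×2 linear system in a and b. The following is a minimal sketch of solving it with Cramer's rule; the function name `solve_normal_equations` and the test points are assumptions made for illustration, not part of the tutorial.

```python
def solve_normal_equations(xs, ys):
    """Solve  sum(y)  = N*a      + b*sum(x)
              sum(xy) = a*sum(x) + b*sum(x^2)
    for the intercept a and slope b of y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the coefficient matrix [[N, sum_x], [sum_x, sum_xx]].
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Points lying exactly on y = 1 + 2x should give back a = 1 and b = 2.
a, b = solve_normal_equations([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Calling the same function with the arguments swapped, `solve_normal_equations(ys, xs)`, gives the coefficients of the regression of X on Y.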
Regression equation of X on Y
${X = a + bY}$
Where −
- ${X}$ = Dependent variable
- ${Y}$ = Independent variable
- ${a}$ = Constant showing X-intercept (the value of X when Y = 0)
- ${b}$ = Constant showing slope of line (the change in X per unit change in Y)
Values of a and b are obtained from the following normal equations:
${\sum X = Na + b\sum Y}$
${\sum XY = a\sum Y + b\sum Y^2}$
Where −
- ${N}$ = Number of observations
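The normal equations for X on Y, ΣX = Na + bΣY and ΣXY = aΣY + bΣY², can also be solved once in general form. The derivation below is a sketch of that step: the first equation is solved for a, the result is substituted into the second, and the terms in b are collected.

```latex
\begin{aligned}
a &= \frac{\sum X - b\sum Y}{N} \\
\sum XY &= \frac{\sum Y\left(\sum X - b\sum Y\right)}{N} + b\sum Y^2 \\
N\sum XY - \sum X\sum Y &= b\left(N\sum Y^2 - \left(\sum Y\right)^2\right) \\
b &= \frac{N\sum XY - \sum X\sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}
\end{aligned}
```

The regression of Y on X has the same form with the roles of X and Y exchanged, so its slope has ${N\sum X^2 - (\sum X)^2}$ in the denominator.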
Example
Problem Statement:
A researcher has found a correlation between the weights of fathers and their sons. He now wants to develop regression equations for the two variables from the given data:
Weight of father (in Kg) | 69 | 63 | 66 | 64 | 67 | 64 | 70 | 66 | 68 | 67 | 65 | 71 |
Weight of son (in Kg) | 70 | 65 | 68 | 65 | 69 | 66 | 68 | 65 | 71 | 67 | 64 | 72 |
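Develop the regression equation of Y on X and the regression equation of X on Y, where X is the weight of the father and Y is the weight of the son.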
Solution:
${X}$ | ${X^2}$ | ${Y}$ | ${Y^2}$ | ${XY}$ |
69 | 4761 | 70 | 4900 | 4830 |
63 | 3969 | 65 | 4225 | 4095 |
66 | 4356 | 68 | 4624 | 4488 |
64 | 4096 | 65 | 4225 | 4160 |
67 | 4489 | 69 | 4761 | 4623 |
64 | 4096 | 66 | 4356 | 4224 |
70 | 4900 | 68 | 4624 | 4760 |
66 | 4356 | 65 | 4225 | 4290 |
68 | 4624 | 71 | 5041 | 4828 |
67 | 4489 | 67 | 4489 | 4489 |
65 | 4225 | 64 | 4096 | 4160 |
71 | 5041 | 72 | 5184 | 5112 |
${\sum X = 800}$ | ${\sum X^2 = 53,402}$ | ${\sum Y = 810}$ | ${\sum Y^2 = 54,750}$ | ${\sum XY = 54,059}$ |
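The column totals in the table can be reproduced directly from the raw weights. The following is a minimal sketch; the variable names are chosen here for illustration.

```python
# Weights from the problem statement (in Kg).
father = [69, 63, 66, 64, 67, 64, 70, 66, 68, 67, 65, 71]  # X
son = [70, 65, 68, 65, 69, 66, 68, 65, 71, 67, 64, 72]     # Y

n = len(father)                                   # number of observations
sum_x = sum(father)                               # sum of X
sum_y = sum(son)                                  # sum of Y
sum_xx = sum(x * x for x in father)               # sum of X^2
sum_yy = sum(y * y for y in son)                  # sum of Y^2
sum_xy = sum(x * y for x, y in zip(father, son))  # sum of XY

print(n, sum_x, sum_xx, sum_y, sum_yy, sum_xy)
# prints: 12 800 53402 810 54750 54059
```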
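Regression equation of Y on X
${Y = a + bX}$
Where a and b are obtained from the normal equations:
${\sum Y = Na + b\sum X \Rightarrow 810 = 12a + 800b}$ ... (i)
${\sum XY = a\sum X + b\sum X^2 \Rightarrow 54,059 = 800a + 53,402b}$ ... (ii)
Multiplying equation (i) by 800 and equation (ii) by 12, we get:
${648,000 = 9,600a + 640,000b}$ ... (iii)
${648,708 = 9,600a + 640,824b}$ ... (iv)
Subtracting equation (iv) from (iii):
${-708 = -824b \Rightarrow b = \frac{708}{824} \approx 0.8592}$
Substituting the value of b in equation (i):
${12a = 810 - 800 \times \frac{708}{824} \approx 810 - 687.38 = 122.62 \Rightarrow a \approx 10.22}$
Hence the regression equation of Y on X can be written as
${Y = 10.22 + 0.8592X}$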
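The elimination steps for Y on X can be replayed in code. This is a minimal sketch, assuming the sums from the table (N = 12, ΣX = 800, ΣY = 810, ΣX² = 53,402, ΣXY = 54,059). The variable names and the helper `predict_son_weight` are hypothetical, and exact fractions are used only to avoid rounding during the check.

```python
from fractions import Fraction

# Sums taken from the table in the example.
N, sum_x, sum_y, sum_xx, sum_xy = 12, 800, 810, 53402, 54059

# (i)  sum_y  = N*a     + b*sum_x
# (ii) sum_xy = sum_x*a + b*sum_xx
# Scale (i) by sum_x and (ii) by N so that both have the same
# coefficient of a, then subtract to eliminate a.
lhs_iii, b_coeff_iii = sum_y * sum_x, sum_x * sum_x   # (iii)
lhs_iv, b_coeff_iv = sum_xy * N, sum_xx * N           # (iv)

b = Fraction(lhs_iii - lhs_iv, b_coeff_iii - b_coeff_iv)  # (iii) - (iv)
a = (sum_y - b * sum_x) / N                               # back into (i)

print(round(float(a), 4), round(float(b), 4))


def predict_son_weight(father_weight):
    """Predicted son's weight (Kg) from the fitted line Y = a + b*X."""
    return float(a + b * father_weight)
```

Rounded to two and four decimal places, the coefficients are a ≈ 10.22 and b ≈ 0.8592.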
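Regression equation of X on Y
${X = a + bY}$
Where a and b are obtained from the normal equations:
${\sum X = Na + b\sum Y \Rightarrow 800 = 12a + 810b}$ ... (v)
${\sum XY = a\sum Y + b\sum Y^2 \Rightarrow 54,059 = 810a + 54,750b}$ ... (vi)
Multiplying equation (v) by 810 and equation (vi) by 12, we get:
${648,000 = 9,720a + 656,100b}$ ... (vii)
${648,708 = 9,720a + 657,000b}$ ... (viii)
Subtracting equation (viii) from (vii):
${-708 = -900b \Rightarrow b = \frac{708}{900} \approx 0.7867}$
Substituting the value of b in equation (v):
${12a = 800 - 810 \times \frac{708}{900} = 800 - 637.2 = 162.8 \Rightarrow a \approx 13.57}$
Hence the regression equation of X on Y is
${X = 13.57 + 0.7867Y}$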
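The X on Y result can be cross-checked with the closed-form solution of its normal equations, ΣX = Na + bΣY and ΣXY = aΣY + bΣY². This is a minimal sketch, assuming the sums from the table; the helper `predict_father_weight` is a hypothetical name. It also confirms that the fitted line passes through the point of means, which follows from the first normal equation.

```python
from fractions import Fraction

# Sums taken from the table in the example.
N, sum_x, sum_y, sum_yy, sum_xy = 12, 800, 810, 54750, 54059

# Closed-form solution of the two normal equations for X = a + b*Y.
b = Fraction(N * sum_xy - sum_x * sum_y, N * sum_yy - sum_y ** 2)
a = (sum_x - b * sum_y) / N

# The fitted line passes through (mean of Y, mean of X).
mean_x = Fraction(sum_x, N)
mean_y = Fraction(sum_y, N)
passes_through_means = (a + b * mean_y == mean_x)


def predict_father_weight(son_weight):
    """Predicted father's weight (Kg) from the fitted line X = a + b*Y."""
    return float(a + b * son_weight)


print(round(float(a), 4), round(float(b), 4), passes_through_means)
print(round(predict_father_weight(70), 2))
```

The two regression lines are different lines: the slope of X on Y is not the reciprocal of the slope of Y on X, so each equation should be used only to predict its own dependent variable.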