Statistics - Linear regression

Once the degree of relationship between variables has been established using correlation analysis, it is natural to examine the nature of that relationship. Regression analysis describes how a dependent variable changes with one or more independent variables; on its own it does not establish cause and effect. The value of the dependent variable can be predicted from known values of the independent variable, using either a graphical method or an algebraic method.

Graphical Method

The graphical method involves drawing a scatter diagram with the independent variable on the X-axis and the dependent variable on the Y-axis. A line is then drawn so that it passes through the bulk of the points, with the remaining points distributed almost evenly on either side of the line.

A regression line, also known as the line of best fit, summarizes the general movement of the data. It gives the best estimate of the mean value of one variable for each value of the other. The regression line is based on the least-squares criterion: it is the straight line that minimizes the sum of squared deviations between the predicted and observed values of the dependent variable.
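The least-squares criterion can be illustrated with a small sketch. The data points and the helper name `sum_squared_deviations` below are hypothetical, chosen only for illustration; they do not come from the tutorial's example.

```python
def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared differences between observed y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, used only to illustrate the criterion.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Candidate 1: y = 2x passes exactly through three of the four points.
sse_through_points = sum_squared_deviations(xs, ys, 0.0, 2.0)

# Candidate 2: y = 1.9x is the least-squares line for this data.
sse_least_squares = sum_squared_deviations(xs, ys, 0.0, 1.9)

print(sse_through_points)  # 1.0
print(sse_least_squares)   # about 0.70, smaller than 1.0
```

The line that passes through most of the points is not necessarily the one with the smallest sum of squared deviations, which is why the algebraic method is preferred over fitting a line by eye.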

Algebraic Method

The algebraic method develops two regression equations: Y on X, and X on Y.

Regression equation of Y on X

${Y = a + bX}$

Where −

  1. ${Y}$ = Dependent variable

  2. ${X}$ = Independent variable

  3. ${a}$ = Constant showing Y-intercept

  4. ${b}$ = Constant showing slope of line

The values of a and b are obtained from the following normal equations:

${\sum Y = Na + b\sum X}$

${\sum XY = a\sum X + b\sum X^2}$

Where −

  1. ${N}$ = Number of observations
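As a minimal sketch, the two normal equations for Y on X can be solved in closed form. The function name `fit_y_on_x` and the sample data are assumptions made for illustration, not part of the original tutorial.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations for Y = a + bX and return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    # Eliminating a from the two normal equations gives the slope b.
    b = (n * sum_xy - sum_x * sum_y) / denominator
    # Substituting b back into sum(Y) = N*a + b*sum(X) gives the intercept a.
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Swapping the two arguments yields the regression of X on Y instead.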

Regression equation of X on Y

${X = a + bY}$

Where −

  1. ${X}$ = Dependent variable

  2. ${Y}$ = Independent variable

  3. ${a}$ = Constant showing the intercept (the value of X when Y = 0)

  4. ${b}$ = Constant showing the slope of the line (the change in X per unit change in Y)

The values of a and b are obtained from the following normal equations:

${\sum X = Na + b\sum Y}$

${\sum XY = a\sum Y + b\sum Y^2}$

Where −

  1. ${N}$ = Number of observations
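For reference, the normal equations for X on Y and the closed-form values of a and b that follow from solving them simultaneously can be written as below. This is the standard least-squares result, written out here as a convenience rather than taken from the original page.

```latex
\begin{aligned}
\sum X  &= Na + b\sum Y \\
\sum XY &= a\sum Y + b\sum Y^2 \\[4pt]
b &= \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2},
\qquad
a = \frac{\sum X - b\sum Y}{N}
\end{aligned}
```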

Example

Problem Statement:

A researcher has found a correlation between the weights of fathers and their sons. He now wants to develop regression equations for the two variables from the given data:

| Weight of father (in kg) | Weight of son (in kg) |
|---|---|
| 69 | 70 |
| 63 | 65 |
| 66 | 68 |
| 64 | 65 |
| 67 | 69 |
| 64 | 66 |
| 70 | 68 |
| 66 | 65 |
| 68 | 71 |
| 67 | 67 |
| 65 | 64 |
| 71 | 72 |

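Develop the regression equations of Y on X and of X on Y, where X is the father's weight and Y is the son's weight.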

Solution:

| ${X}$ | ${X^2}$ | ${Y}$ | ${Y^2}$ | ${XY}$ |
|---|---|---|---|---|
| 69 | 4761 | 70 | 4900 | 4830 |
| 63 | 3969 | 65 | 4225 | 4095 |
| 66 | 4356 | 68 | 4624 | 4488 |
| 64 | 4096 | 65 | 4225 | 4160 |
| 67 | 4489 | 69 | 4761 | 4623 |
| 64 | 4096 | 66 | 4356 | 4224 |
| 70 | 4900 | 68 | 4624 | 4760 |
| 66 | 4356 | 65 | 4225 | 4290 |
| 68 | 4624 | 71 | 5041 | 4828 |
| 67 | 4489 | 67 | 4489 | 4489 |
| 65 | 4225 | 64 | 4096 | 4160 |
| 71 | 5041 | 72 | 5184 | 5112 |
| ${\sum X = 800}$ | ${\sum X^2 = 53,402}$ | ${\sum Y = 810}$ | ${\sum Y^2 = 54,750}$ | ${\sum XY = 54,059}$ |
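The column totals can be reproduced with a short script. The variable names are illustrative; the data are the father and son weights from the problem statement.

```python
# Weights from the problem statement (kg).
fathers = [69, 63, 66, 64, 67, 64, 70, 66, 68, 67, 65, 71]  # X
sons = [70, 65, 68, 65, 69, 66, 68, 65, 71, 67, 64, 72]     # Y

n = len(fathers)
sum_x = sum(fathers)
sum_x2 = sum(x * x for x in fathers)
sum_y = sum(sons)
sum_y2 = sum(y * y for y in sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))

print(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy)
# 12 800 53402 810 54750 54059
```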

Regression equation of Y on X

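${Y = a + bX}$

Where a and b are obtained from the normal equations:

${\sum Y = Na + b\sum X \Rightarrow 810 = 12a + 800b}$ ... (i)

${\sum XY = a\sum X + b\sum X^2 \Rightarrow 54,059 = 800a + 53,402b}$ ... (ii)

Multiplying equation (i) by 800 and equation (ii) by 12, we get:

${648,000 = 9,600a + 640,000b}$ ... (iii)

${648,708 = 9,600a + 640,824b}$ ... (iv)

Subtracting equation (iv) from equation (iii):

${-708 = -824b \Rightarrow b = 0.8592}$

Substituting the value of b in equation (i):

${810 = 12a + 800(0.8592) \Rightarrow 12a = 122.6 \Rightarrow a = 10.22}$

Hence the regression equation of Y on X can be written as:

${Y = 10.22 + 0.8592X}$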
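The elimination steps described above can be traced with exact arithmetic. This is a sketch using the column totals from the example; representing each equation as a `(coefficient of a, coefficient of b, right-hand side)` tuple is an illustrative choice, not something the tutorial prescribes.

```python
from fractions import Fraction

# Column totals from the worked example.
n, sum_x, sum_y, sum_x2, sum_xy = 12, 800, 810, 53402, 54059

# Each equation is stored as (coefficient of a, coefficient of b, right-hand side).
eq_i = (n, sum_x, sum_y)          # 12a + 800b = 810
eq_ii = (sum_x, sum_x2, sum_xy)   # 800a + 53402b = 54059

# Scale both equations so that the coefficients of a match.
eq_iii = tuple(sum_x * term for term in eq_i)   # (9600, 640000, 648000)
eq_iv = tuple(n * term for term in eq_ii)       # (9600, 640824, 648708)

# Subtracting (iv) from (iii) eliminates a and leaves -824b = -708.
coef_b = eq_iii[1] - eq_iv[1]
rhs = eq_iii[2] - eq_iv[2]
b = Fraction(rhs, coef_b)

# Substitute b back into equation (i) to recover a.
a = (sum_y - b * sum_x) / n

print(float(a), float(b))  # roughly 10.22 and 0.859
```

Using `Fraction` avoids rounding b before it is substituted back, so a is not affected by intermediate rounding.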

Regression equation of X on Y

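${X = a + bY}$

Where a and b are obtained from the normal equations:

${\sum X = Na + b\sum Y \Rightarrow 800 = 12a + 810b}$ ... (v)

${\sum XY = a\sum Y + b\sum Y^2 \Rightarrow 54,059 = 810a + 54,750b}$ ... (vi)

Multiplying equation (v) by 810 and equation (vi) by 12, we get:

${648,000 = 9,720a + 656,100b}$ ... (vii)

${648,708 = 9,720a + 657,000b}$ ... (viii)

Subtracting equation (viii) from equation (vii):

${-708 = -900b \Rightarrow b = 0.7867}$

Substituting the value of b in equation (v):

${800 = 12a + 810(0.7867) \Rightarrow 12a = 162.8 \Rightarrow a = 13.57}$

Hence the regression equation of X on Y is:

${X = 13.57 + 0.7867Y}$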