Machine Learning: A Concise Tutorial
Machine Learning - Linear Regression
Linear regression may be defined as a statistical model that analyzes the linear relationship between a dependent variable and a given set of independent variables. A linear relationship between variables means that when the value of one or more independent variables changes (increases or decreases), the value of the dependent variable also changes accordingly (increases or decreases).
Mathematically, the relationship can be represented by the following equation:
Y=mX+b
Here,
- Y is the dependent variable we are trying to predict.
- X is the independent variable we are using to make predictions.
- m is the slope of the regression line, which represents the effect X has on Y.
- b is a constant, known as the Y-intercept: if X = 0, Y equals b.
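The text does not say how m and b are obtained from data. The minimal sketch below assumes ordinary least squares, the most common way to fit this line. Under that method, m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. The code then uses the fitted values to predict Y. The function names `fit_line` and `predict` and the sample data are hypothetical illustrations.

```python
def fit_line(xs, ys):
    """Estimate slope m and intercept b of Y = mX + b.

    Assumption: ordinary least squares is used as the fitting method.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator and denominator of the least-squares slope (the 1/n factors cancel).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, b


def predict(m, b, x):
    """Apply Y = mX + b to a single value of X."""
    return m * x + b


# Hypothetical data lying exactly on Y = 2X + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = fit_line(xs, ys)
print(m, b)              # 2.0 1.0
print(predict(m, b, 0))  # 1.0, i.e. Y equals b when X = 0
```

Because the sample points lie exactly on a line, the fit recovers m = 2 and b = 1. With noisy real data the same formulas return the line that minimizes the sum of squared vertical errors.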
Furthermore, the linear relationship can be either positive or negative in nature, as explained below:
Positive Linear Relationship
A linear relationship is called positive if the dependent variable increases as the independent variable increases. This can be understood with the help of the following graph:
Negative Linear Relationship
A linear relationship is called negative if the dependent variable decreases as the independent variable increases. This can be understood with the help of the following graph:
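In terms of the equation Y = mX + b, a positive relationship corresponds to a slope m > 0 and a negative one to m < 0. The sketch below estimates m by ordinary least squares (an assumed fitting method) and classifies the relationship by its sign. The helper names `slope` and `relationship` and both data sets are hypothetical.

```python
def slope(xs, ys):
    """Least-squares slope m of Y = mX + b (assumes at least two distinct X values)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def relationship(xs, ys):
    """Classify the linear relationship by the sign of the fitted slope."""
    m = slope(xs, ys)
    if m > 0:
        return "positive"
    if m < 0:
        return "negative"
    return "none"  # a zero slope means no linear relationship


xs = [1, 2, 3, 4, 5]
print(relationship(xs, [52, 55, 61, 64, 70]))  # positive: Y rises as X rises
print(relationship(xs, [30, 26, 25, 20, 17]))  # negative: Y falls as X rises
```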
Linear regression is of two types, "simple linear regression" and "multiple linear regression", which we will discuss in the next two chapters of this tutorial.
Assumptions
The following are some assumptions that the linear regression model makes about the dataset:
Multi-collinearity: The linear regression model assumes that there is little or no multi-collinearity in the data. Multi-collinearity occurs when the independent variables, or features, depend on one another.
Auto-correlation: Another assumption of the linear regression model is that there is little or no auto-correlation in the data. Auto-correlation occurs when the residual errors depend on one another.
Relationship between variables: The linear regression model assumes that the relationship between the response and feature variables is linear.
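The first two assumptions can be probed numerically. The sketch below uses two common diagnostics that the text itself does not prescribe:

- The Pearson correlation between a pair of features. A value near +1 or −1 hints at multi-collinearity.
- The Durbin-Watson statistic on the residuals. A value near 2 suggests little first-order auto-correlation.

Treat both readings as rules of thumb, not strict tests. The function names and all data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length, non-constant sequences."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    cov = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    var_a = sum((x - a_mean) ** 2 for x in a)
    var_b = sum((y - b_mean) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def durbin_watson(residuals):
    """Durbin-Watson statistic of a residual sequence.

    About 2 suggests little first-order auto-correlation; values toward 0
    suggest positive, and values toward 4 negative, auto-correlation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Multi-collinearity check: hypothetical feature x2 is roughly 2 * x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson(x1, x2))  # very close to 1: the two features are nearly redundant

# Auto-correlation check on hypothetical residual sequences.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))                     # 3.0 (toward 4)
print(round(durbin_watson([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]), 2))  # 0.67 (toward 0)
```

In both residual examples the statistic lies far from 2, so the no-auto-correlation assumption would be questionable for either sequence.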