R Concise Tutorial

R - Nonlinear Least Squares

When modeling real-world data for regression analysis, the equation of the model is rarely a linear equation that gives a straight-line graph. Most of the time, the model equation involves nonlinear mathematical functions such as a cubic term or a sine function. In such a scenario, the plot of the model gives a curve rather than a line. The goal of both linear and nonlinear regression is to adjust the values of the model's parameters to find the line or curve that comes closest to the data. Once these values are found, we can estimate the response variable with good accuracy.

In least squares regression, we establish a regression model in which the sum of the squares of the vertical distances of the data points from the regression curve is minimized. We generally start with a defined model and assume some values for the coefficients. We then apply R's nls() function to refine these values, and confint() to obtain their confidence intervals.
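The quantity being minimized can be sketched directly in R. The data below are hypothetical (chosen to roughly follow y = 2*x^2 + 1), and the helper name ssr is illustrative rather than part of any R package. The sketch only shows that coefficient values closer to the pattern in the data give a smaller sum of squared vertical distances.

```r
# Hypothetical data, roughly following y = 2 * x^2 + 1 with small noise.
x <- c(1, 2, 3, 4, 5)
y <- c(2.9, 9.1, 19.2, 32.8, 51.1)

# Sum of squared vertical distances between the data
# and the curve b1 * x^2 + b2 (illustrative helper).
ssr <- function(b1, b2) sum((y - (b1 * x^2 + b2))^2)

ssr_guess <- ssr(1, 3)  # a rough initial guess
ssr_close <- ssr(2, 1)  # values close to the pattern in the data

print(c(guess = ssr_guess, close = ssr_close))
```

A least squares fit searches for the coefficient values that make this sum as small as possible.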

Syntax

The basic syntax for fitting a nonlinear least squares model in R is:

nls(formula, data, start)

Following is the description of the parameters used:

formula is a nonlinear model formula including variables and parameters.
data is a data frame used to evaluate the variables in the formula.
start is a named list or named numeric vector of starting estimates.
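As a minimal sketch of how the three arguments fit together, the call below uses a hypothetical data frame df with columns x and y, and passes start as a named numeric vector. The data and the names df and fit are assumptions made for illustration.

```r
# Hypothetical data frame; the column names x and y are used in the formula.
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.9, 9.1, 19.2, 32.8, 51.1)
)

# formula: names both the variables (x, y) and the parameters (b1, b2).
# data:    where the variables in the formula are looked up.
# start:   starting estimates for the parameters, here a named numeric vector.
fit <- nls(y ~ b1 * x^2 + b2, data = df, start = c(b1 = 1, b2 = 1))

# Fitted parameter values.
print(coef(fit))
```

Any name in the formula that is not found in data or start is looked up in the calling environment, which is why the variables can also be plain vectors.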

Example

We will consider a nonlinear model with assumed initial values for its coefficients. Next we will look at the confidence intervals of the fitted coefficients, so that we can judge how well the assumed values fit the model.

So let's consider the following equation for this purpose:

y = b1*x^2 + b2

Let's assume the initial coefficients to be 1 and 3 and pass these values to the nls() function as starting estimates.

# Sample data: predictor (x) and response (y) values.
xvalues <- c(1.6,2.1,2,2.23,3.71,3.25,3.4,3.86,1.19,2.21)
yvalues <- c(5.19,7.43,6.94,8.11,18.75,14.88,16.06,19.12,3.21,7.58)

# Give the chart file a name.
png(file = "nls.png")

# Plot these values.
plot(xvalues, yvalues)

# Fit the model, using the assumed values as starting estimates.
model <- nls(yvalues ~ b1*xvalues^2 + b2, start = list(b1 = 1, b2 = 3))

# Overlay the fitted curve, predicted at 100 evenly spaced x values.
new.data <- data.frame(xvalues = seq(min(xvalues), max(xvalues), length.out = 100))
lines(new.data$xvalues, predict(model, newdata = new.data))

# Save the file.
dev.off()

# Get the sum of the squared residuals.
print(sum(resid(model)^2))

# Get the confidence intervals for the fitted coefficients.
print(confint(model))

When we execute the above code, it produces the following result:

[1] 1.081935
Waiting for profiling to be done...
       2.5%    97.5%
b1 1.137708 1.253135
b2 1.497364 2.496484

The 95% confidence interval for b1 runs from about 1.14 to 1.25, and the interval for b2 runs from about 1.50 to 2.50. We can conclude that b1 is close to 1.2 rather than exactly 1, and that b2 is close to 2 and not the assumed value of 3.
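To see the fitted values themselves rather than only their intervals, coef() returns the point estimates. The sketch below refits the model from the example and checks that each estimate lies inside its confidence interval. The names est, ci and inside are illustrative.

```r
# Same data and model as in the example above.
xvalues <- c(1.6,2.1,2,2.23,3.71,3.25,3.4,3.86,1.19,2.21)
yvalues <- c(5.19,7.43,6.94,8.11,18.75,14.88,16.06,19.12,3.21,7.58)
model <- nls(yvalues ~ b1*xvalues^2 + b2, start = list(b1 = 1, b2 = 3))

# Point estimates of the coefficients.
est <- coef(model)

# Confidence intervals: one row per coefficient, lower and upper bound.
ci <- confint(model)

# TRUE where the estimate lies between its lower and upper bound.
inside <- est >= ci[, 1] & est <= ci[, 2]

print(est)
print(inside)
```

Comparing est with the starting values of 1 and 3 shows how far nls() moved each coefficient during the fit.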