Least squares optimization — Computational Statistics and Statistical Computing 1.0 documentation (2024)

In [11]:
%matplotlib inline
In [6]:
import matplotlib.pyplot as plt
import numpy as np

Many optimization problems involve minimization of a sum of squared residuals. We will take a look at finding the derivatives for least squares minimization.

In least squares problems, we usually have \(m\) labeled observations \((x_i, y_i)\). We have a model that will predict \(y_i\) given \(x_i\) for some parameters \(\beta\); in the linear case, \(f(x) = X\beta\). We want to minimize the sum (or average) of squared residuals \(r(x_i) = y_i - f(x_i)\), so the objective function is usually taken to be

\[\frac{1}{2} \sum_{i=1}^m r(x_i)^2\]

As a concrete example, suppose we want to fit a quadratic function to some observed data. We have

\[f(x) = \beta_0 + \beta_1 x + \beta_2 x^2\]

We want to minimize the objective function

\[L = \frac{1}{2} \sum_{i=1}^m (y_i - f(x_i))^2\]

Taking derivatives with respect to \(\beta\) and applying the chain rule (for each coefficient \(\beta_j\), \(\frac{\partial L}{\partial \beta_j} = -\sum_{i=1}^m x_i^j\, r(x_i)\), since \(\frac{\partial f(x_i)}{\partial \beta_j} = x_i^j\)), we get

\[\begin{split}\frac{dL}{d\beta} =\begin{bmatrix}-\sum_{i=1}^m r(x_i) \\-\sum_{i=1}^m x_i\, r(x_i) \\-\sum_{i=1}^m x_i^2\, r(x_i)\end{bmatrix}\end{split}\]

Working with matrices

Writing the above system as a matrix, we have \(f(x) = X\beta\), with

\[\begin{split}X = \begin{bmatrix}1 & x_1 & x_1^2 \\1 & x_2 & x_2^2 \\\vdots & \vdots & \vdots \\1 & x_m & x_m^2\end{bmatrix}\end{split}\]

and

\[\begin{split}\beta = \begin{bmatrix}\beta_0 \\\beta_1 \\\beta_2\end{bmatrix}\end{split}\]

We want to find the derivative of \(\Vert y - X\beta \Vert^2\), so

\[\begin{split}\Vert y - X\beta \Vert^2 \\= (y - X\beta)^T(y - X\beta) \\= (y^T - \beta^TX^T)(y - X\beta) \\= y^Ty - \beta^TX^Ty -y^TX\beta + \beta^TX^TX\beta\end{split}\]

Taking derivatives with respect to \(\beta^T\) (we do this because the gradient is traditionally a row vector, and we want it as a column vector here), and noting that the scalar \(y^TX\beta\) equals \(\beta^TX^Ty\), we get

\[\frac{dL}{d\beta^T} = X^TX\beta - X^Ty\]
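As a quick sanity check of this formula, here is a minimal sketch comparing the analytic matrix gradient against a central finite-difference approximation; the small random problem below is made up purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # small design matrix, 3 coefficients
y = rng.normal(size=20)
beta = rng.normal(size=3)

def loss(b):
    """0.5 * sum of squared residuals."""
    return 0.5 * np.sum((y - X @ b)**2)

analytic = X.T @ X @ beta - X.T @ y   # gradient from the formula above

# central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (loss(beta + eps*e) - loss(beta - eps*e)) / (2*eps)
    for e in np.eye(3)
])

print(np.allclose(analytic, numeric))  # should print True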

For example, if we are doing gradient descent, the update equation is

\[\beta_{k+1} = \beta_k - \alpha (X^TX\beta_k - X^Ty)\]

Note that if we set the derivative to zero and solve, we get

\[X^TX\beta - X^Ty = 0\]

which are the normal equations \(X^TX\beta = X^Ty\), with solution

\[\beta = (X^TX)^{-1}X^Ty\]

For large \(X\), solving the normal equations directly can be more expensive than simpler gradient descent. Note that the Levenberg-Marquardt algorithm is often used to optimize nonlinear least squares problems.
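In practice it is usually better to avoid forming \(X^TX\) explicitly. A minimal sketch of two common alternatives, using small illustrative data (the actual example data appear below): np.linalg.lstsq solves the linear problem via a stable factorization, and scipy.optimize.least_squares accepts a general residual function.

import numpy as np
from scipy.optimize import least_squares

x = np.arange(10)
y = x - 0.5 * x**2                       # illustrative data only
X = np.c_[np.ones(len(x)), x, x**2]

# Linear least squares without forming X.T @ X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# General least squares on the residual function r(beta) = y - X beta
sol = least_squares(lambda b: y - X @ b, x0=np.zeros(3))
print(beta_lstsq, sol.x)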

Example

You are given the following set of data to fit a quadratic polynomial to:

x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
               -2.05877605, -12.40257359, -28.64568712, -46.39822281,
              -68.15488905, -97.16032044])

Find the least squares solution using gradient descent.

In [101]:
x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
               -2.05877605, -12.40257359, -28.64568712, -46.39822281,
              -68.15488905, -97.16032044])
In [138]:
def f(x, y, b):
    """Signed residual: model prediction minus observation."""
    return (b[0] + b[1]*x + b[2]*x**2 - y)

def res(x, y, b):
    """Sum of squared residuals."""
    return sum(f(x, y, b)*f(x, y, b))

# Elementary form of gradient
def grad(x, y, b):
    return np.array([
        sum(f(x, y, b)),
        sum(x*f(x, y, b)),
        sum(x**2*f(x, y, b))
    ])

# Matrix form of gradient
def grad_m(X, y, b):
    return X.T@X@b - X.T@y
In [139]:
grad(x, y, np.zeros(3))
Out[139]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
In [140]:
X = np.c_[np.ones(len(x)), x, x**2]
grad_m(X, y, np.zeros(3))
Out[140]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
In [131]:
from scipy.linalg import solve

beta1 = solve(X.T@X, X.T@y)
beta1
Out[131]:
array([ 2.55079998, 7.31478229, -2.04118936])
In [143]:
max_iter = 10000
In [144]:
a = 0.0001  # learning rate
beta2 = np.zeros(3)
for i in range(max_iter):
    beta2 -= a * grad(x, y, beta2)
beta2
Out[144]:
array([ 2.73391723, 7.23152392, -2.03359658])
In [145]:
a = 0.0001  # learning rate
beta3 = np.zeros(3)
for i in range(max_iter):
    beta3 -= a * grad_m(X, y, beta3)
beta3
Out[145]:
array([ 2.73391723, 7.23152392, -2.03359658])
In [146]:
titles = ['normal equations', 'elementary', 'matrix']
plt.figure(figsize=(12,4))
for i, beta in enumerate([beta1, beta2, beta3], 1):
    plt.subplot(1, 3, i)
    plt.scatter(x, y, s=30)
    plt.plot(x, beta[0] + beta[1]*x + beta[2]*x**2, color='red')
    plt.title(titles[i-1])

[Figure: scatter of the data with the fitted quadratic for the normal-equations, elementary gradient descent, and matrix gradient descent solutions]

Curve fitting and least squares optimization

As shown above, least squares optimization is the technique most associated with curve fitting. For convenience, scipy.optimize provides a curve_fit function that uses Levenberg-Marquardt for minimization.

In [1]:
from scipy.optimize import curve_fit
In [4]:
def logistic4(x, a, b, c, d):
    """The four parameter logistic function is often used to fit dose-response relationships."""
    return ((a-d)/(1.0+((x/c)**b))) + d
In [7]:
nobs = 24
xdata = np.linspace(0.5, 3.5, nobs)
ptrue = [10, 3, 1.5, 12]
ydata = logistic4(xdata, *ptrue) + 0.5*np.random.random(nobs)
In [8]:
popt, pcov = curve_fit(logistic4, xdata, ydata)
In [9]:
perr = np.sqrt(np.diag(pcov))
print('Param\tTrue\tEstim (+/- 1 SD)')
for p, pt, po, pe in zip('abcd', ptrue, popt, perr):
    print('%s\t%5.2f\t%5.2f (+/-%5.2f)' % (p, pt, po, pe))
Param   True    Estim (+/- 1 SD)
a       10.00   10.40 (+/- 0.14)
b        3.00    4.20 (+/- 1.16)
c        1.50    1.38 (+/- 0.09)
d       12.00   12.03 (+/- 0.10)
In [13]:
x = np.linspace(0, 4, 100)
y = logistic4(x, *popt)
plt.plot(xdata, ydata, 'o')
plt.plot(x, y)
pass

[Figure: simulated dose-response data with the fitted four-parameter logistic curve]


FAQs

What is least squares optimization? ›

Least squares (LS) optimization problems are those in which the objective (error) function is a quadratic function of the parameter(s) being optimized.

How to minimize least squares? ›

So a least-squares solution x̂ minimizes the sum of the squares of the differences between the entries of Ax̂ and b. In other words, a least-squares solution solves the equation Ax = b as closely as possible, in the sense that the sum of the squares of the entries of b − Ax̂ is minimized.
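As a minimal illustration of this idea, the sketch below solves an overdetermined system with made-up numbers; np.linalg.lstsq returns the x that minimizes the sum of squared entries of b - Ax.

import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (illustrative values)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.2, 2.8, 4.1])

x_hat, residual_ss, rank, svals = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                        # least-squares solution
print(np.sum((b - A @ x_hat)**2))   # minimized sum of squared differences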

What is the method of least squares in Python? ›

In Python, the least squares method is used internally by a regression model (for example, NumPy's polyfit or scikit-learn's LinearRegression) to calculate the slope (coefficient) and intercept of the best-fitting line. The line itself is the result of this method.
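As a small illustration (the numbers below are made up), np.polyfit fits a degree-1 polynomial by least squares and returns the slope and intercept of the best-fitting line.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

slope, intercept = np.polyfit(x, y, deg=1)  # least squares fit of y = slope*x + intercept
print(slope, intercept)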

What is the least squares method of estimation? ›

The least squares method is a statistical procedure to find the best fit for a set of data points. The method works by minimizing the sum of the squared offsets (residuals) of the points from the fitted curve. Least squares regression is used to predict the behavior of dependent variables.

What is the purpose of least squares regression analysis? ›

Least squares is a method for applying linear regression. It helps us predict results based on an existing set of data as well as spot anomalies in our data. Anomalies are values that are too good, or bad, to be true, or that represent rare cases.

Is linear regression the same as least squares? ›

Linear regression is a type of regression model that assumes a linear relationship between the target and features, while least squares regression is a method used to find the optimal parameters for a linear regression model.

What is the alternative to the least square method? ›

Least squares alternatives

One of the simplest ways of estimating parameters in a regression model that is less sensitive to outliers than the least squares estimates is to use least absolute deviations.
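A minimal sketch of least absolute deviations for a straight line, assuming made-up data and using scipy.optimize.minimize on the sum of absolute residuals:

import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.2, 1.1, 1.9, 3.2, 3.8, 15.0])  # last point is an outlier

def lad_loss(params):
    """Sum of absolute residuals for y = b0 + b1*x."""
    b0, b1 = params
    return np.sum(np.abs(y - (b0 + b1*x)))

result = minimize(lad_loss, x0=np.zeros(2), method='Nelder-Mead')
print(result.x)  # intercept and slope, less affected by the outlier than a least squares fit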

What are alternatives to least squares? ›

Robust regression methods provide an alternative to least squares regression by requiring less restrictive assumptions. These methods attempt to dampen the influence of outlying cases in order to provide a better fit to the majority of the data.
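One way to dampen the influence of outliers, sketched here with the same assumed data as above, is to use a robust loss such as the Huber loss, which scipy.optimize.least_squares supports via its loss argument:

import numpy as np
from scipy.optimize import least_squares

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.2, 1.1, 1.9, 3.2, 3.8, 15.0])  # last point is an outlier

def residuals(params):
    b0, b1 = params
    return y - (b0 + b1*x)

# 'huber' loss behaves like least squares for small residuals
# and like absolute deviations for large ones.
robust = least_squares(residuals, x0=np.zeros(2), loss='huber', f_scale=1.0)
print(robust.x)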

What is the least square algorithm? ›

The least square method is the process of finding the best-fitting curve or line of best fit for a set of data points by reducing the sum of the squares of the offsets (residual part) of the points from the curve.

How to calculate least-squares regression? ›

The least-squares regression line equation is y = mx + b, where m is the slope, which is equal to (N*sum(xy) - sum(x)*sum(y)) / (N*sum(x^2) - (sum(x))^2), and b is the y-intercept, which is equal to (sum(y) - m*sum(x)) / N. N is the number of data points, and x and y are the coordinates of the data points.
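A direct translation of those formulas into NumPy, using assumed data for illustration:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
N = len(x)

m = (N*np.sum(x*y) - np.sum(x)*np.sum(y)) / (N*np.sum(x**2) - np.sum(x)**2)
b = (np.sum(y) - m*np.sum(x)) / N

print(m, b)
print(np.polyfit(x, y, deg=1))  # should agree with (m, b)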

What is least-squares dummy variable? ›

Least Squares Dummy Variable (LSDV): in dynamic panel data models, dummy variables may be introduced into the least squares regression to capture the unobserved effect of each individual unit of the cross section, so that the model of the relationship is correctly specified.
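A minimal sketch of the LSDV idea with made-up panel data: one dummy column per unit absorbs that unit's fixed effect, and everything is estimated by ordinary least squares.

import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 3, 10
unit = np.repeat(np.arange(n_units), n_periods)      # unit id for each observation
x = rng.normal(size=n_units * n_periods)             # a single regressor
alpha = np.array([1.0, -2.0, 0.5])                   # unobserved unit effects
y = alpha[unit] + 0.8*x + 0.1*rng.normal(size=len(x))

# Design matrix: one dummy per unit plus the regressor (no global intercept)
D = (unit[:, None] == np.arange(n_units)).astype(float)
X = np.column_stack([D, x])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # first three entries estimate the unit effects, last one estimates 0.8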

What are the limitations of the least square method? ›

The disadvantages of this method are:
  • It is not readily applicable to censored data.
  • It is generally considered to have less desirable optimality properties than maximum likelihood.
  • It can be quite sensitive to the choice of starting values.

How to derive multiple regression? ›

Matrix algebra is widely used for the derivation of multiple regression because it permits a compact, intuitive depiction of regression analysis. For example, an estimated multiple regression model in scalar notation is expressed as Y = A + B1X1 + B2X2 + B3X3 + E.
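In matrix form this is Y = Xβ + E, and the coefficients can be estimated exactly as in the normal-equations derivation earlier in this notebook. A short sketch with simulated data (all values assumed for illustration):

import numpy as np

rng = np.random.default_rng(2)
n = 50
X1, X2, X3 = rng.normal(size=(3, n))
Y = 2.0 + 1.5*X1 - 0.7*X2 + 0.3*X3 + 0.1*rng.normal(size=n)

# Design matrix with an intercept column
X = np.column_stack([np.ones(n), X1, X2, X3])

# beta = (X^T X)^{-1} X^T Y, computed with a linear solve for stability
beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta)   # approximately [2.0, 1.5, -0.7, 0.3]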

What is least mean square optimization? ›

Least mean squares (LMS) algorithms are a class of adaptive filters used to mimic a desired filter by finding the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal).
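A minimal LMS sketch, assuming a noisy observation of an unknown 4-tap FIR filter; the step size mu and filter length here are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(3)
n_taps, n_samples, mu = 4, 2000, 0.01

h_true = np.array([0.5, -0.3, 0.2, 0.1])    # unknown filter to be identified
x = rng.normal(size=n_samples)               # input signal
d = np.convolve(x, h_true, mode='full')[:n_samples] + 0.01*rng.normal(size=n_samples)

w = np.zeros(n_taps)                         # adaptive filter coefficients
for n in range(n_taps, n_samples):
    u = x[n - n_taps + 1:n + 1][::-1]        # most recent n_taps inputs, newest first
    e = d[n] - w @ u                         # error between desired and filter output
    w = w + mu * e * u                       # LMS update: step along the instantaneous gradient
print(w)                                     # should be close to h_true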

What is the least squares mean method? ›

The least-square method states that the curve that best fits a given set of observations, is said to be a curve having a minimum sum of the squared residuals (or deviations or errors) from the given data points.

Why is least squares optimal? ›

In 1822, Gauss was able to state that the least-squares approach to regression analysis is optimal in the sense that in a linear model where the errors have a mean of zero, are uncorrelated, and have equal variances, the best linear unbiased estimator of the coefficients is the least-squares estimator.

What does OLS optimize? ›

To sum up, think of OLS as an optimization strategy to obtain a straight line from your model that is as close as possible to your data points.
