Least squares optimization
In [11]:
%matplotlib inline
In [6]:
import matplotlib.pyplot as plt
import numpy as np
Many optimization problems involve minimization of a sum of squared residuals. We will take a look at finding the derivatives for least squares minimization.
In least squares problems, we usually have \(m\) labeled observations \((x_i, y_i)\). We have a model that will predict \(y_i\) given \(x_i\) for some parameters \(\beta\), for example \(f(x) = X\beta\). We want to minimize the sum (or average) of squared residuals \(r(x_i) = y_i - f(x_i)\). The objective function is usually taken to be
\[\frac{1}{2} \sum_{i=1}^m r(x_i)^2\]
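In code this is just half the sum of squared residuals; a minimal sketch with hypothetical residual values (an illustration, not from the original notebook):

residuals = np.array([1.0, -2.0, 0.5])  # hypothetical residuals r(x_i)
0.5 * np.sum(residuals**2)              # objective value: 2.625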
As a concrete example, suppose we want to fit a quadratic function to some observed data. We have
\[f(x) = \beta_0 + \beta_1 x + \beta_2 x^2\]
We want to minimize the objective function
\[L = \frac{1}{2} \sum_{i=1}^m (y_i - f(x_i))^2\]
Taking derivatives with respect to \(\beta\), we get
\[\begin{split}\frac{dL}{d\beta} =\begin{bmatrix}\sum_{i=1}^m -(y_i - f(x_i)) \\\sum_{i=1}^m -x_i (y_i - f(x_i)) \\\sum_{i=1}^m -x_i^2 (y_i - f(x_i))\end{bmatrix}\end{split}\]
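To see where each entry comes from, apply the chain rule to a single component of \(\beta\); for \(\beta_1\), for instance,

\[\frac{\partial L}{\partial \beta_1} = \sum_{i=1}^m (y_i - f(x_i)) \left(-\frac{\partial f(x_i)}{\partial \beta_1}\right) = \sum_{i=1}^m -x_i (y_i - f(x_i))\]

since \(\partial f(x_i)/\partial \beta_1 = x_i\).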
Working with matrices
Writing the above system as a matrix, we have \(f(x) = X\beta\), with
\[\begin{split}X = \begin{bmatrix}1 & x_1 & x_1^2 \\1 & x_2 & x_2^2 \\\vdots & \vdots & \vdots \\1 & x_m & x_m^2\end{bmatrix}\end{split}\]
and
\[\begin{split}\beta = \begin{bmatrix}\beta_0 \\\beta_1 \\\beta_2\end{bmatrix}\end{split}\]
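A minimal NumPy sketch of building this design matrix, assuming x is the 1-D array of observed inputs (the example below uses np.c_; np.vander gives the same columns):

X = np.c_[np.ones(len(x)), x, x**2]       # columns 1, x, x^2
X_alt = np.vander(x, 3, increasing=True)  # equivalent Vandermonde construction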
We want to find the derivative of \(L = \frac{1}{2}\Vert y - X\beta \Vert^2\). Expanding the squared norm,
\[\begin{split}\Vert y - X\beta \Vert^2 \\= (y - X\beta)^T(y - X\beta) \\= (y^T - \beta^TX^T)(y - X\beta) \\= y^Ty - \beta^TX^Ty -y^TX\beta + \beta^TX^TX\beta\end{split}\]
Taking derivatives with respect to \(\beta^T\) (we do this because the gradient is traditionally a row vector, and we want it as a column vector here), and keeping the factor of \(\frac{1}{2}\) from \(L\), we get
\[\frac{dL}{d\beta^T} = X^TX\beta - X^Ty\]
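Term by term, using the factor of \(\frac{1}{2}\) carried over from \(L\) and the fact that \(\beta^TX^Ty = y^TX\beta\) (both are scalars), the three contributions are

\[\frac{d}{d\beta^T}\left(\frac{1}{2}y^Ty\right) = 0, \quad \frac{d}{d\beta^T}\left(-\beta^TX^Ty\right) = -X^Ty, \quad \frac{d}{d\beta^T}\left(\frac{1}{2}\beta^TX^TX\beta\right) = X^TX\beta\]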
For example, if we are doing gradient descent, the update equation is
\[\beta_{k+1} = \beta_k - \alpha (X^TX\beta_k - X^Ty)\]
Note that if we set the derivative to zero and solve, we get
\[X^TX\beta - X^Ty = 0\]
and the normal equations
\[\beta = (X^TX)^{-1}X^Ty\]
For large \(X\), solving the normal equations can be more expensive than simple gradient descent. Note that the Levenberg-Marquardt algorithm is often used to optimize least squares problems.
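As an illustrative sketch (not part of the original notebook), scipy.optimize.least_squares exposes Levenberg-Marquardt through method='lm'; here it is applied to a small synthetic quadratic-fit problem (the data and residual function below are hypothetical, just to show the call):

from scipy.optimize import least_squares

# Hypothetical data from an exact quadratic, for illustration only
x_demo = np.arange(10)
y_demo = 3.0 - 2.0*x_demo + 0.5*x_demo**2

def residual(b, x, y):
    # r_i = model(x_i) - y_i for the quadratic model
    return b[0] + b[1]*x + b[2]*x**2 - y

fit = least_squares(residual, np.zeros(3), args=(x_demo, y_demo), method='lm')
fit.x  # recovers approximately [3.0, -2.0, 0.5]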
Example
You are given the following set of data to fit a quadratic polynomial to:
x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
               -2.05877605, -12.40257359, -28.64568712, -46.39822281,
               -68.15488905, -97.16032044])
Find the least squares solution using gradient descent.
In [101]:
x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
               -2.05877605, -12.40257359, -28.64568712, -46.39822281,
               -68.15488905, -97.16032044])
In [138]:
def f(x, y, b):
    return (b[0] + b[1]*x + b[2]*x**2 - y)

def res(x, y, b):
    return sum(f(x, y, b)*f(x, y, b))

# Elementary form of gradient
def grad(x, y, b):
    n = len(x)
    return np.array([
        sum(f(x, y, b)),
        sum(x*f(x, y, b)),
        sum(x**2*f(x, y, b))
    ])

# Matrix form of gradient
def grad_m(X, y, b):
    return X.T@X@b - X.T@y
In [139]:
grad(x, y, np.zeros(3))
Out[139]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
In [140]:
X = np.c_[np.ones(len(x)), x, x**2]
grad_m(X, y, np.zeros(3))
Out[140]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
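As a quick sanity check (an addition, not in the original notebook), a finite-difference gradient computed with scipy.optimize.approx_fprime should agree with both analytic forms above; this assumes x, y and res from the cells above, and uses the fact that grad is the gradient of 0.5 * res:

from scipy.optimize import approx_fprime

# Finite-difference check of the analytic gradient at beta = 0.
# grad and grad_m are gradients of 0.5 * res, so halve res here.
approx_fprime(np.zeros(3), lambda b: 0.5*res(x, y, b), 1e-6)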
In [131]:
from scipy.linalg import solve

beta1 = solve(X.T@X, X.T@y)
beta1
Out[131]:
array([ 2.55079998, 7.31478229, -2.04118936])
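As an aside (not in the original notebook), forming \(X^TX\) squares the condition number of the problem, so an SVD-based solver such as np.linalg.lstsq is often preferred for ill-conditioned \(X\); here it returns the same coefficients as beta1:

# SVD-based least squares solve; should match beta1 above
np.linalg.lstsq(X, y, rcond=None)[0]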
In [143]:
max_iter = 10000
In [144]:
a = 0.0001  # learning rate
beta2 = np.zeros(3)
for i in range(max_iter):
    beta2 -= a * grad(x, y, beta2)
beta2
Out[144]:
array([ 2.73391723, 7.23152392, -2.03359658])
In [145]:
a = 0.0001  # learning rate
beta3 = np.zeros(3)
for i in range(max_iter):
    beta3 -= a * grad_m(X, y, beta3)
beta3
Out[145]:
array([ 2.73391723, 7.23152392, -2.03359658])
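The two gradient-descent runs agree with each other but have not fully converged to the normal-equations solution beta1. A small diagnostic (an addition, not in the original notebook) is to compare the gradient norms at the two estimates:

# The gradient norm is essentially zero at the exact solution beta1,
# but still nonzero at beta3 after 10,000 descent steps.
np.linalg.norm(grad_m(X, y, beta1)), np.linalg.norm(grad_m(X, y, beta3))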
In [146]:
titles = ['normal equations', 'elementary', 'matrix']
plt.figure(figsize=(12,4))
for i, beta in enumerate([beta1, beta2, beta3], 1):
    plt.subplot(1, 3, i)
    plt.scatter(x, y, s=30)
    plt.plot(x, beta[0] + beta[1]*x + beta[2]*x**2, color='red')
    plt.title(titles[i-1])
Curve fitting and least squares optimization
As shown above, least squares optimization is the technique most associated with curve fitting. For convenience, scipy.optimize provides a curve_fit function that uses Levenberg-Marquardt for minimization.
In [1]:
from scipy.optimize import curve_fit
In [4]:
def logistic4(x, a, b, c, d):
    """The four parameter logistic function is often used to fit
    dose-response relationships."""
    return ((a-d)/(1.0+((x/c)**b))) + d
In [7]:
nobs = 24
xdata = np.linspace(0.5, 3.5, nobs)
ptrue = [10, 3, 1.5, 12]
ydata = logistic4(xdata, *ptrue) + 0.5*np.random.random(nobs)
In [8]:
popt, pcov = curve_fit(logistic4, xdata, ydata)
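curve_fit starts from all-ones parameters when no initial guess is given; for harder fits an explicit guess can be passed via p0. A minimal sketch (the guess values here are hypothetical), reusing xdata and ydata from above:

# Optional explicit starting guess near the expected parameter scale
popt_guess, pcov_guess = curve_fit(logistic4, xdata, ydata, p0=[10, 1, 1, 10])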
In [9]:
perr = np.sqrt(np.diag(pcov))
print('Param\tTrue\tEstim (+/- 1 SD)')
for p, pt, po, pe in zip('abcd', ptrue, popt, perr):
    print('%s\t%5.2f\t%5.2f (+/-%5.2f)' % (p, pt, po, pe))
Param	True	Estim (+/- 1 SD)
a	10.00	10.40 (+/- 0.14)
b	 3.00	 4.20 (+/- 1.16)
c	 1.50	 1.38 (+/- 0.09)
d	12.00	12.03 (+/- 0.10)
In [13]:
x = np.linspace(0, 4, 100)
y = logistic4(x, *popt)
plt.plot(xdata, ydata, 'o')
plt.plot(x, y)
pass