Least squares optimization
In [11]:
%matplotlib inline
In [6]:
import matplotlib.pyplot as plt
import numpy as np
Many optimization problems involve minimization of a sum of squared residuals. We will take a look at finding the derivatives for least squares minimization.
In least squares problems, we usually have \(m\) labeled observations \((x_i, y_i)\). We have a model that will predict \(y_i\) given \(x_i\) for some parameters \(\beta\), for example \(f(x) = X\beta\). We want to minimize the sum (or average) of squared residuals \(r(x_i) = y_i - f(x_i)\). The objective function is usually taken to be
\[\frac{1}{2} \sum_{i=1}^m r(x_i)^2\]
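In code this is just half the sum of squared residuals; a minimal sketch with hypothetical residual values (an illustration, not from the original notebook):

residuals = np.array([1.0, -2.0, 0.5])  # hypothetical residuals r(x_i)
0.5 * np.sum(residuals**2)              # objective value: 2.625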
As a concrete example, suppose we want to fit a quadratic function to some observed data. We have
\[f(x) = \beta_0 + \beta_1 x + \beta_2 x^2\]
We want to minimize the objective function
\[L = \frac{1}{2} \sum_{i=1}^m (y_i - f(x_i))^2\]
Taking derivatives with respect to \(\beta\), we get
\[\begin{split}\frac{dL}{d\beta} =\begin{bmatrix}\sum_{i=1}^m -(y_i - f(x_i)) \\\sum_{i=1}^m -x_i (y_i - f(x_i)) \\\sum_{i=1}^m -x_i^2 (y_i - f(x_i))\end{bmatrix}\end{split}\]
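To see where each entry comes from, apply the chain rule to a single component of \(\beta\); for \(\beta_1\), for instance,

\[\frac{\partial L}{\partial \beta_1} = \sum_{i=1}^m (y_i - f(x_i)) \left(-\frac{\partial f(x_i)}{\partial \beta_1}\right) = \sum_{i=1}^m -x_i (y_i - f(x_i))\]

since \(\partial f(x_i)/\partial \beta_1 = x_i\).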
Working with matrices
Writing the above system as a matrix, we have \(f(x) = X\beta\), with
\[\begin{split}X = \begin{bmatrix}1 & x_1 & x_1^2 \\1 & x_2 & x_2^2 \\\vdots & \vdots & \vdots \\1 & x_m & x_m^2\end{bmatrix}\end{split}\]
and
\[\begin{split}\beta = \begin{bmatrix}\beta_0 \\\beta_1 \\\beta_2\end{bmatrix}\end{split}\]
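A minimal NumPy sketch of building this design matrix, assuming x is the 1-D array of observed inputs (the example below uses np.c_; np.vander gives the same columns):

X = np.c_[np.ones(len(x)), x, x**2]       # columns 1, x, x^2
X_alt = np.vander(x, 3, increasing=True)  # equivalent Vandermonde construction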
We want to find the derivative of \(L = \frac{1}{2}\Vert y - X\beta \Vert^2\). Expanding the squared norm,
\[\begin{split}\Vert y - X\beta \Vert^2 \\= (y - X\beta)^T(y - X\beta) \\= (y^T - \beta^TX^T)(y - X\beta) \\= y^Ty - \beta^TX^Ty -y^TX\beta + \beta^TX^TX\beta\end{split}\]
Taking derivatives with respect to \(\beta^T\) (we do this because the gradient is traditionally a row vector, and we want it as a column vector here), and keeping the factor of \(\frac{1}{2}\) from \(L\), we get
\[\frac{dL}{d\beta^T} = X^TX\beta - X^Ty\]
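Term by term, using the factor of \(\frac{1}{2}\) carried over from \(L\) and the fact that \(\beta^TX^Ty = y^TX\beta\) (both are scalars), the three contributions are

\[\frac{d}{d\beta^T}\left(\frac{1}{2}y^Ty\right) = 0, \quad \frac{d}{d\beta^T}\left(-\beta^TX^Ty\right) = -X^Ty, \quad \frac{d}{d\beta^T}\left(\frac{1}{2}\beta^TX^TX\beta\right) = X^TX\beta\]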
For example, if we are doing gradient descent, the update equation is
\[\beta_{k+1} = \beta_k - \alpha (X^TX\beta_k - X^Ty)\]
Note that if we set the derivative to zero and solve, we get
\[X^TX\beta - X^Ty = 0\]
and the normal equations
\[\beta = (X^TX)^{-1}X^Ty\]
For large \(X\), solving the normal equations can be more expensive than simple gradient descent. Note that the Levenberg-Marquardt algorithm is often used to optimize least squares problems.
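As an illustrative sketch (not part of the original notebook), scipy.optimize.least_squares exposes Levenberg-Marquardt through method='lm'; here it is applied to a small synthetic quadratic-fit problem (the data and residual function below are hypothetical, just to show the call):

from scipy.optimize import least_squares

# Hypothetical data from an exact quadratic, for illustration only
x_demo = np.arange(10)
y_demo = 3.0 - 2.0*x_demo + 0.5*x_demo**2

def residual(b, x, y):
    # r_i = model(x_i) - y_i for the quadratic model
    return b[0] + b[1]*x + b[2]*x**2 - y

fit = least_squares(residual, np.zeros(3), args=(x_demo, y_demo), method='lm')
fit.x  # recovers approximately [3.0, -2.0, 0.5]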
Example
You are given the following set of data to fit a quadratic polynomial to:
x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
               -2.05877605, -12.40257359, -28.64568712, -46.39822281,
               -68.15488905, -97.16032044])
Find the least squares solution using gradient descent.
In [101]:
x = np.arange(10)
y = np.array([  1.58873597,   7.55101533,  10.71372171,   7.90123225,
               -2.05877605, -12.40257359, -28.64568712, -46.39822281,
               -68.15488905, -97.16032044])
In [138]:
def f(x, y, b):
    return (b[0] + b[1]*x + b[2]*x**2 - y)

def res(x, y, b):
    return sum(f(x, y, b)*f(x, y, b))

# Elementary form of gradient
def grad(x, y, b):
    n = len(x)
    return np.array([
        sum(f(x, y, b)),
        sum(x*f(x, y, b)),
        sum(x**2*f(x, y, b))
    ])

# Matrix form of gradient
def grad_m(X, y, b):
    return X.T@X@b - X.T@y
In [139]:
grad(x, y, np.zeros(3))
Out[139]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
In [140]:
X = np.c_[np.ones(len(x)), x, x**2]
grad_m(X, y, np.zeros(3))
Out[140]:
array([ 227.0657638 , 1933.9094954 , 15758.14427298])
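As a quick sanity check (an addition, not in the original notebook), a finite-difference gradient computed with scipy.optimize.approx_fprime should agree with both analytic forms above; this assumes x, y and res from the cells above, and uses the fact that grad is the gradient of 0.5 * res:

from scipy.optimize import approx_fprime

# Finite-difference check of the analytic gradient at beta = 0.
# grad and grad_m are gradients of 0.5 * res, so halve res here.
approx_fprime(np.zeros(3), lambda b: 0.5*res(x, y, b), 1e-6)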
In [131]:
from scipy.linalg import solve

beta1 = solve(X.T@X, X.T@y)
beta1
Out[131]:
array([ 2.55079998, 7.31478229, -2.04118936])
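As an aside (not in the original notebook), forming \(X^TX\) squares the condition number of the problem, so an SVD-based solver such as np.linalg.lstsq is often preferred for ill-conditioned \(X\); here it returns the same coefficients as beta1:

# SVD-based least squares solve; should match beta1 above
np.linalg.lstsq(X, y, rcond=None)[0]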
In [143]:
max_iter = 10000
In [144]:
a = 0.0001  # learning rate
beta2 = np.zeros(3)
for i in range(max_iter):
    beta2 -= a * grad(x, y, beta2)
beta2
Out[144]:
array([ 2.73391723, 7.23152392, -2.03359658])
In [145]:
a = 0.0001  # learning rate
beta3 = np.zeros(3)
for i in range(max_iter):
    beta3 -= a * grad_m(X, y, beta3)
beta3
Out[145]:
array([ 2.73391723, 7.23152392, -2.03359658])
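The two gradient-descent runs agree with each other but have not fully converged to the normal-equations solution beta1. A small diagnostic (an addition, not in the original notebook) is to compare the gradient norms at the two estimates:

# The gradient norm is essentially zero at the exact solution beta1,
# but still nonzero at beta3 after 10,000 descent steps.
np.linalg.norm(grad_m(X, y, beta1)), np.linalg.norm(grad_m(X, y, beta3))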
In [146]:
titles = ['normal equations', 'elementary', 'matrix']
plt.figure(figsize=(12,4))
for i, beta in enumerate([beta1, beta2, beta3], 1):
    plt.subplot(1, 3, i)
    plt.scatter(x, y, s=30)
    plt.plot(x, beta[0] + beta[1]*x + beta[2]*x**2, color='red')
    plt.title(titles[i-1])
Curve fitting and least squares optimization
As shown above, least squares optimization is the technique most associated with curve fitting. For convenience, scipy.optimize provides a curve_fit function that uses Levenberg-Marquardt for minimization.
In [1]:
from scipy.optimize import curve_fit
In [4]:
def logistic4(x, a, b, c, d):
    """The four parameter logistic function is often used to fit
    dose-response relationships."""
    return ((a-d)/(1.0+((x/c)**b))) + d
In [7]:
nobs = 24
xdata = np.linspace(0.5, 3.5, nobs)
ptrue = [10, 3, 1.5, 12]
ydata = logistic4(xdata, *ptrue) + 0.5*np.random.random(nobs)
In [8]:
popt, pcov = curve_fit(logistic4, xdata, ydata)
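curve_fit starts from all-ones parameters when no initial guess is given; for harder fits an explicit guess can be passed via p0. A minimal sketch (the guess values here are hypothetical), reusing xdata and ydata from above:

# Optional explicit starting guess near the expected parameter scale
popt_guess, pcov_guess = curve_fit(logistic4, xdata, ydata, p0=[10, 1, 1, 10])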
In [9]:
perr = np.sqrt(np.diag(pcov))
print('Param\tTrue\tEstim (+/- 1 SD)')
for p, pt, po, pe in zip('abcd', ptrue, popt, perr):
    print('%s\t%5.2f\t%5.2f (+/-%5.2f)' % (p, pt, po, pe))
Param	True	Estim (+/- 1 SD)
a	10.00	10.40 (+/- 0.14)
b	 3.00	 4.20 (+/- 1.16)
c	 1.50	 1.38 (+/- 0.09)
d	12.00	12.03 (+/- 0.10)
In [13]:
x = np.linspace(0, 4, 100)
y = logistic4(x, *popt)
plt.plot(xdata, ydata, 'o')
plt.plot(x, y)
pass