
# Straight line fit using least squares estimate

Two points suffice to draw a straight line. Often, however, we are presented with more than two data points that are presumed to lie approximately on a straight line. How can one use the available set of data points to draw the best straight line?

A reasonable approach is to draw the straight line that minimizes the squared error between the observed data points and the estimated straight line:

$err = \sum_{i=1}^N\left(y_i - \hat{y}_i\right)^2$, where $y_i$ are the observed data points and $\hat{y}_i$ are the corresponding points on the estimated straight line.

To draw the estimated straight line $\hat{y}=mx+c$, we need to estimate the slope $m$ and the constant $c$.

Formulating as a matrix,

$\left[\begin{array}{c} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{array}\right]=\left[\begin{array}{cc} x_1 & 1\\ x_2 & 1 \\ x_3 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{array}\right]\left[\begin{array}{c} m \\ c \end{array}\right]+\left[\begin{array}{c} \eta_1\\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_n\end{array}\right]$

$\mathbf{Y} = \mathbf{X}\left[\begin{array}{c} m \\ c \end{array}\right]+ \mathbf{N}$,

where,

$\mathbf{Y}$, the set of observations $y_i$, is a matrix of dimension $[N \times 1]$,

$\mathbf{X}$, the coefficient matrix built from the $x_i$, is of dimension $[N \times 2]$,

$\left[\begin{array}{c} m \\ c \end{array}\right]$ is the slope and constant estimate, of dimension $[2\times 1]$,

$\mathbf{N}$, the noise $\eta_i$, is a matrix of dimension $[N\times 1]$.

The least squares estimate of the straight line is,

$\left[\begin{array}{c} m \\ c \end{array}\right]=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.
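As a quick sanity check of this formula (a minimal NumPy sketch, not part of the original post), applying it to noiseless points on the hypothetical line $y = 3x + 5$ recovers the slope and constant exactly:

```python
import numpy as np

# Noiseless observations from the line y = 3x + 5
x = np.arange(1, 6, dtype=float)
y = 3.0 * x + 5.0

# Build the [N x 2] coefficient matrix X = [x_i 1]
X = np.column_stack([x, np.ones_like(x)])

# Least squares estimate: [m c]^T = (X^T X)^{-1} X^T Y
m, c = np.linalg.inv(X.T @ X) @ X.T @ y
print(m, c)  # m ≈ 3, c ≈ 5
```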

A simple MATLAB code for least squares straight line fit is given below:

```matlab
% Least Squares Estimate
rand('state',100); % initializing the random number generation
y = [5:3:50]; % observations, y_i
y = y + 5*rand(size(y)); % y_i with noise added
x = 1:length(y); % the x co-ordinates

% Formulating in matrix form for solving for the least squares estimate
Y = y.';
X = [x.' ones(length(x),1)];
alpha = inv(X'*X)*X'*Y; % solving for m and c

% constructing the straight line using the estimated slope and constant
yEst = alpha(1)*x + alpha(2);

close all
figure
plot(x,y,'r.')
hold on
plot(x,yEst,'b')
legend('observations', 'estimated straight line')
grid on
ylabel('observations')
xlabel('x axis')
title('least squares straight line fit')
```
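For readers without MATLAB, here is an equivalent sketch in Python/NumPy (an editorial addition; the variable names mirror the MATLAB code above, matplotlib is assumed for plotting, and NumPy's random generator means the noise realization differs from MATLAB's):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(100)      # initializing the random number generation
y = np.arange(5, 51, 3, dtype=float)  # observations, y_i
y = y + 5 * rng.random(y.shape)       # y_i with noise added
x = np.arange(1, len(y) + 1)          # the x co-ordinates

# Formulating in matrix form for the least squares estimate
X = np.column_stack([x, np.ones(len(x))])
alpha = np.linalg.inv(X.T @ X) @ X.T @ y  # solving for m and c

# constructing the straight line using the estimated slope and constant
yEst = alpha[0] * x + alpha[1]

plt.plot(x, y, 'r.', label='observations')
plt.plot(x, yEst, 'b', label='estimated straight line')
plt.grid(True)
plt.ylabel('observations')
plt.xlabel('x axis')
plt.title('least squares straight line fit')
plt.legend()
plt.show()
```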


## 7 thoughts on “Straight line fit using least squares estimate”

1. shashi kant singh says:

Fit a straight line trend & estimate the trend value for the year 2008. Year: 2000, 2001, 2002, 2003, 2004, 2005, 2006; prod.: 110, 112, 115, 119, 121, 123, 126

2. Krishna Pillai says:

@Sajith: Is this a homework assignment? In general, I prefer not to solve homework assignments, rather help you towards the solution. You can formulate the information in the least squares matrix formulation explained in the post.

Year = [1971 1976 1977 1978 1979 1980 1981 1982]
Sales = [6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1]
X = [Year' ones(8,1)]
Y = Sales.'

Once you have done that, the solution for the slope and the constant is obtained by the least squares equation.
alpha = inv(X'*X)*X'*Y

Once you have the slope and constant, you can find the y-value (sales) for any x-value (year)

Hope this helps.
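[Editor's note: carrying those steps through numerically, here is a NumPy sketch of the same computation; the prediction for any year is simply $m \cdot \text{year} + c$.]

```python
import numpy as np

Year = np.array([1971, 1976, 1977, 1978, 1979, 1980, 1981, 1982], dtype=float)
Sales = np.array([6.7, 5.3, 4.3, 6.1, 5.6, 7.9, 5.8, 6.1])

# X = [Year' ones(8,1)], Y = Sales.'
X = np.column_stack([Year, np.ones(len(Year))])
alpha = np.linalg.inv(X.T @ X) @ X.T @ Sales  # slope m and constant c

# sales estimate for the year 1984
print(alpha[0] * 1984 + alpha[1])  # ≈ 6.075
```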

3. Sajith says:

From the data given below, fit a straight line trend by the method of least squares and also estimate the sales for the year 1984.

Year 1971 1976 1977 1978 1979 1980 1981 1982
Sales 6.7 5.3 4.3 6.1 5.6 7.9 5.8 6.1

4. Krishna says:

No special reason, except that when done this way, I have a reasonably clear idea of the underlying operations. Maybe helpful if I want to implement it.

For the example above, the least squares solution can be obtained either by using X\Y or pinv(X)*Y. However, when X is rank-deficient, the code in the post may fail and the more 'intelligent' operations X\Y or pinv(X)*Y might be needed.
And a quick check showed that the \ operator runs faster than pinv() or the code in the post.
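[Editor's note: the same comparison can be illustrated in NumPy, where `np.linalg.lstsq` plays the role of MATLAB's `X\Y` and `np.linalg.pinv` that of `pinv()`; this sketch is an editorial addition.]

```python
import numpy as np

x = np.arange(1.0, 11.0)
y = 2.5 * x + 1.0
X = np.column_stack([x, np.ones_like(x)])

# Full-rank case: all three approaches agree
a_inv = np.linalg.inv(X.T @ X) @ X.T @ y
a_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]  # analog of MATLAB's X\Y
a_pinv = np.linalg.pinv(X) @ y                  # analog of pinv(X)*Y
print(np.allclose(a_inv, a_lstsq), np.allclose(a_inv, a_pinv))  # True True

# Rank-deficient case: a duplicated column makes X^T X exactly singular,
# so the explicit inverse breaks down while pinv still returns a solution
Xd = np.column_stack([x, x])
try:
    np.linalg.inv(Xd.T @ Xd)
except np.linalg.LinAlgError:
    print('inv failed on rank-deficient X')
print(np.linalg.pinv(Xd) @ y)  # minimum-norm least squares solution
```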