Least Squares in Gaussian Noise – Maximum Likelihood

From the previous posts on Linear Regression (using Batch Gradient Descent, Stochastic Gradient Descent, and the closed form solution), we discussed a couple of different ways to estimate the parameter vector in the least square error sense for a given training set. However, how does the least square error criterion work when the training set is corrupted by noise? In this post, let us discuss the case where the training set is corrupted by Gaussian noise.

Continue reading “Least Squares in Gaussian Noise – Maximum Likelihood”

Symbol Error rate for QAM (16, 64, 256,.., M-QAM)

In May 2008, we derived the theoretical symbol error rate for a general M-QAM modulation (in Embedded.com, DSPDesignLine.com and dsplog.com) under Additive White Gaussian Noise. While re-reading that post, I felt that the article is nice and warrants a re-run, using OFDM as the underlying physical layer. This post discusses the derivation of the symbol error rate for a general M-QAM modulation. The companion Matlab script compares the theoretical and simulated symbol error rates for 16QAM, 64QAM and 256QAM over OFDM in an AWGN channel.
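For quick reference, the closed-form result for square M-QAM in AWGN can be evaluated in a few lines. Below is a short Python sketch (the post's companion script is in Matlab); it implements the standard expression $P_s = 2\left(1-\frac{1}{\sqrt{M}}\right)\mathrm{erfc}(k) - \left(1-\frac{1}{\sqrt{M}}\right)^2\mathrm{erfc}^2(k)$ with $k = \sqrt{\frac{3E_s}{2(M-1)N_0}}$; the function name is mine.

```python
from math import erfc, sqrt

def ser_mqam(M, es_n0):
    """Theoretical symbol error rate for square M-QAM in AWGN.

    es_n0 is the linear (not dB) symbol-energy-to-noise ratio Es/N0.
    """
    p = (1 - 1 / sqrt(M)) * erfc(sqrt(3 * es_n0 / (2 * (M - 1))))
    return 2 * p - p * p

es_n0 = 10 ** (15 / 10)  # 15 dB in linear scale
for M in (16, 64, 256):
    print(M, ser_mqam(M, es_n0))
```

For M = 4 this reduces to the familiar QPSK result $P_s = \mathrm{erfc}\left(\sqrt{E_s/2N_0}\right) - \frac{1}{4}\mathrm{erfc}^2\left(\sqrt{E_s/2N_0}\right)$, a handy sanity check.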

Enjoy and HAPPY NEW YEAR 2012 !!!

Continue reading “Symbol Error rate for QAM (16, 64, 256,.., M-QAM)”

Newton’s method to find square root, inverse

Some of us would have used Newton’s method (also known as the Newton-Raphson method) in some form or other. The method has quite a bit of history, starting with the Babylonian way of finding the square root and, over the centuries, reaching the present recursive way of finding the solution. In this post, we will describe Newton’s method and apply it to find the square root and the inverse of a number.

Geometrical interpretation

We know that the derivative of a function $f$ at $x_n$ is the slope of the tangent (red line) at $\left(x_n, f(x_n)\right)$, i.e.,

$f'(x_n) = \dfrac{f(x_n) - 0}{x_n - x_{n+1}}$.

Rearranging, the intercept of the tangent at the x-axis is,

$x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$.

From the figure above, we can see that the tangent (red line) intercepts the x-axis at $x_{n+1}$, which is closer to the root of the function than $x_n$ is. Keep on doing this operation recursively, and it converges to the zero of the function, or in other words, the root of the function.

In general, for the $n$-th iteration, the equation is:

$x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$.
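The recursion above is straightforward to put in code. The snippets in this post are in Matlab; for variety, here is a minimal, generic Python sketch of the same iteration (the function names are mine, not from the post):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until |f(x_n)| < tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / fprime(x)
    return x

# zero of f(x) = x^2 - 2, i.e. sqrt(2), starting from x0 = 1
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```

Any function/derivative pair can be plugged in, which is exactly what the square root and inverse examples below do.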

 

Finding square root

Let us, for example, try to use this method for finding the square root of $D = 100$. The function to zero out in the Newton’s method framework is,

$f(x) = x^2 - D$, where $D = 100$.

The first derivative is

$f'(x) = 2x$.

The recursive equation is,

$x_{n+1} = x_n - \dfrac{x_n^2 - D}{2x_n} = \dfrac{1}{2}\left(x_n + \dfrac{D}{x_n}\right)$.

Matlab code snippet

clear ; close all;
D = 100; % number to find the square root
x = 1; % initial value
for ii = 1:10
	fx = x.^2 - D;
	f1x = 2*x;
	x = x - fx/f1x;
	x_v(ii) = x;
end
x_v' =
   50.5000000000000
   26.2400990099010
   15.0255301199868
   10.8404346730269
   10.0325785109606
   10.0000528956427
   10.0000000001399
   10.0000000000000
   10.0000000000000
   10.0000000000000

We can see that it converges within around 8 iterations. Further, playing around with the initial value,

a) if we start with initial value of x = -1, then we will converge to -10.

b) if we start with an initial value of x = 0, then we will not converge (the derivative $f'(0) = 0$ makes the very first update a division by zero)

and so on…
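These observations are easy to reproduce. Here is a small Python sketch of the same square-root recursion (variable names are mine), showing the dependence on the initial value:

```python
def newton_sqrt(D, x0, n_iter=20):
    """Square root of D via x_{n+1} = x_n - (x_n^2 - D)/(2 x_n)."""
    x = x0
    for _ in range(n_iter):
        x = x - (x * x - D) / (2 * x)  # raises ZeroDivisionError when x == 0
    return x

print(newton_sqrt(100, 1.0))    # converges to +10
print(newton_sqrt(100, -1.0))   # converges to -10
# newton_sqrt(100, 0.0) fails: the tangent at x = 0 is horizontal
```

Negative starting points land on the negative root, and x = 0 breaks the iteration immediately, matching observations (a) and (b) above.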

Finding inverse (division)

Newton’s method can be used to find the inverse of a variable $D$. One way to write the function to zero out is $f(x) = Dx - 1$, but we soon realize that this does not work: the update $x_{n+1} = x_n - \dfrac{Dx_n - 1}{D}$ requires dividing by $D$, i.e. we need to know $\dfrac{1}{D}$ in the first place.

Alternatively the function to zero out can be written as,

$f(x) = \dfrac{1}{x} - D$.

The first derivative is,

$f'(x) = -\dfrac{1}{x^2}$.

The equation in the recursive form is,

$x_{n+1} = x_n - \dfrac{\frac{1}{x_n} - D}{-\frac{1}{x_n^2}} = x_n\left(2 - Dx_n\right)$.

Matlab code snippet

clear ; close all;
D = .1; % number whose inverse is to be found
x = [.1:.2:1]; % initial value
for ii = 1:10
	fx = (1./x) - D;
	f1x = -1./x.^2;
	x = x - fx./f1x;
	x_v(:,ii) = x;
end
plot(x_v');
legend('0.1', '0.3', '0.5', '0.7', '0.9');
grid on; xlabel('number of iterations'); ylabel('inverse');
title('finding inverse newton''s method');

The following plot shows the convergence of the inverse computation to the correct value $1/D = 10$ for the different initial values used in this example Matlab code snippet.


Figure : convergence of inverse computation
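A noteworthy property of this recursion is that $x_{n+1} = x_n(2 - Dx_n)$ needs only multiplications and a subtraction, no division, which is why this form is attractive for implementing division in hardware. Here is a minimal Python sketch of the simplified form (function name is mine):

```python
def newton_inverse(D, x0, n_iter=30):
    """Approximate 1/D via x_{n+1} = x_n * (2 - D * x_n): no division used.

    Converges quadratically for 0 < x0 < 2/D; diverges outside that range.
    """
    x = x0
    for _ in range(n_iter):
        x = x * (2 - D * x)
    return x

print(newton_inverse(0.1, 0.1))   # approaches 1/0.1 = 10
print(newton_inverse(4.0, 0.2))   # approaches 1/4 = 0.25
```

The error $e_n = 1 - Dx_n$ obeys $e_{n+1} = e_n^2$, which makes the stated convergence region easy to see: the iteration converges exactly when $|e_0| < 1$.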

Finding the minima of a function

To find the minima of a function, we need to find where the derivative of the function becomes zero, i.e. $f'(x) = 0$.

Using Newton’s method, the recursive equation becomes :

$x_{n+1} = x_n - \dfrac{f'(x_n)}{f''(x_n)}$.
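As a quick illustration (my own example, not from the post), consider $g(x) = x^4 - 2x^2$, which has minima at $x = \pm 1$. A Python sketch of the recursion:

```python
def newton_minimize(g1, g2, x0, n_iter=30):
    """Find a stationary point of g via x_{n+1} = x_n - g'(x_n)/g''(x_n)."""
    x = x0
    for _ in range(n_iter):
        x = x - g1(x) / g2(x)
    return x

# g(x) = x^4 - 2x^2, g'(x) = 4x^3 - 4x, g''(x) = 12x^2 - 4
xmin = newton_minimize(lambda x: 4 * x ** 3 - 4 * x,
                       lambda x: 12 * x ** 2 - 4,
                       2.0)  # starting near x = 2 lands on the minimum at x = 1
```

Note that this recursion finds stationary points in general; whether it lands on a minimum, maximum or saddle depends on the starting point, which ties back to the initial-value discussion above.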

Thoughts

We have briefly gone through Newton’s method and its applications for finding the roots of a function, the inverse, the minima, etc. However, there are quite a few aspects which we did not go over, like:

a) Impact of the initial value on the convergence of the function

b) Rate of the convergence

c) Error bounds of the converged result

d) Conditions where the convergence does not happen

and so on…

Hoping to discuss those in another post… 🙂

References

Wiki on Newton’s method

Closed form solution for linear regression

In the previous posts on Batch Gradient Descent and Stochastic Gradient Descent, we looked at two iterative methods for finding the parameter vector $\theta$ which minimizes the square of the error between the predicted value $h_\theta(x)$ and the actual output $y$ over all the examples in the training set.

A closed form solution for finding the parameter vector $\theta$ is possible, and in this post let us explore that. Of course, I thank Prof. Andrew Ng for making all this material available in the public domain (Lecture Notes 1, retrieved 11th Sep 2024).

Continue reading “Closed form solution for linear regression”

Stochastic Gradient Descent

For curve fitting using linear regression, there exists a minor variant of Batch Gradient Descent algorithm, called Stochastic Gradient Descent.

In Batch Gradient Descent, the parameter vector $\theta$ is updated as,

$\theta_j := \theta_j - \alpha\dfrac{1}{m}\displaystyle\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$.

(loop over all elements of training set in one iteration)

For Stochastic Gradient Descent, at each iteration the algorithm goes over only one among the $m$ training examples, and the vector gets updated as,

$\theta_j := \theta_j - \alpha\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$.

(the companion Matlab snippet retains the $\frac{1}{m}$ factor in this update as well, which simply scales the learning rate)

When the training set is large, Stochastic Gradient Descent can be useful (as we need not go over the full data to get the first estimates of the parameter vector $\theta$).

For the same Matlab example used in the previous post, we can see that both batch and stochastic gradient descent converged to reasonably close values.

Matlab/Octave code snippet

clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';
%
m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_batch_vec = [0 0]';
theta_stoch_vec = [0 0]';
alpha = 0.002;
err = [0 0]';
theta_batch_vec_v = zeros(10000,2);
theta_stoch_vec_v = zeros(50*10000,2);
for kk = 1:10000
	% batch gradient descent - loop over all training set
	h_theta_batch = (x*theta_batch_vec);
	h_theta_batch_v = h_theta_batch*ones(1,n);
	y_v = y*ones(1,n);
	theta_batch_vec = theta_batch_vec - alpha*1/m*sum((h_theta_batch_v - y_v).*x).';
	theta_batch_vec_v(kk,:) = theta_batch_vec;
	%j_theta_batch(kk) = 1/(2*m)*sum((h_theta_batch - y).^2);

	% stochastic gradient descent - loop over one training set at a time
	for (jj = 1:50)
		h_theta_stoch = (x(jj,:)*theta_stoch_vec);
		h_theta_stoch_v = h_theta_stoch*ones(1,n);
		y_v = y(jj,:)*ones(1,n);
		theta_stoch_vec = theta_stoch_vec - alpha*1/m*((h_theta_stoch_v - y_v).*x(jj,:)).';
		%j_theta_stoch(kk,jj) = 1/(2*m)*sum((h_theta_stoch - y).^2);
		theta_stoch_vec_v(50*(kk-1)+jj,:) = theta_stoch_vec;
	end
end

figure;
plot(x(:,2),y,'bs-');
hold on
plot(x(:,2),x*theta_batch_vec,'md-');
plot(x(:,2),x*theta_stoch_vec,'rp-');
legend('measured', 'predicted-batch','predicted-stochastic');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views');

j_theta = zeros(250, 250);   % initialize j_theta
theta0_vals = linspace(-2500, 2500, 250);
theta1_vals = linspace(-50, 50, 250);
for i = 1:length(theta0_vals)
	  for j = 1:length(theta1_vals)
		theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
		h_theta = (x*theta_val_vec);
		j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
    end
end
figure;
contour(theta0_vals,theta1_vals,10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');
hold on;
plot(theta_stoch_vec_v(:,1),theta_stoch_vec_v(:,2),'rs.');
plot(theta_batch_vec_v(:,1),theta_batch_vec_v(:,2),'kx.');

The converged values of $\theta$ from batch and stochastic gradient descent are reasonably close to each other.

 

Figure : Measured and predicted page views using Batch and Stochastic Gradient Descent

From the contour plot of the cost function, we can see that Batch Gradient Descent (black line) goes straight down to the minima, whereas Stochastic Gradient Descent (red line) keeps hovering around (thick red line) before going down to the minima.

References

An Application of Supervised Learning – Autonomous Driving (Video Lecture, Class 2)

CS 229 Machine Learning Course Materials


Batch Gradient Descent

I happened to stumble on Prof. Andrew Ng’s Machine Learning classes, which are available online as part of the Stanford Center for Professional Development. The first lecture in the series discusses the topic of fitting parameters for a given data set using linear regression. For understanding this concept, I chose to take data from the top 50 articles of this blog based on the pageviews in the month of September 2011.

Continue reading “Batch Gradient Descent”

Non coherent demodulation of pi/8 D8PSK (TETRA)

In the TETRA specifications, one of the modulation techniques used is Differential 8 Phase Shift Keying (D8PSK). We will discuss the bit error rate with non-coherent demodulation of D8PSK in an Additive White Gaussian Noise (AWGN) channel.

Continue reading “Non coherent demodulation of pi/8 D8PSK (TETRA)”

IEEE 802.11ac – Very High Throughput for lower 6GHz band

IEEE 802.11ac Very High Throughput (for the below-6GHz band) is an upcoming standard under development by the IEEE standardization committee. Task Group AC is mandated to enhance the High Throughput rates achieved by 802.11n. As described in the document VHT below 6GHz PAR plus 5C’s (802.11-08/0807r4), the group has the following objectives:

Continue reading “IEEE 802.11ac – Very High Throughput for lower 6GHz band”

Non coherent demodulation of pi/4 DQPSK (TETRA)

In the TETRA specifications, one of the modulation techniques used is Differential Quaternary Phase Shift Keying (DQPSK). We will discuss the bit error rate with non-coherent demodulation of DQPSK in an Additive White Gaussian Noise (AWGN) channel.

Continue reading “Non coherent demodulation of pi/4 DQPSK (TETRA)”

TETRA Air interface specifications – Overview

TETRA (TErrestrial Trunked RAdio) is a wireless specification intended to be used by government agencies, public health services, rail transport, military etc. TETRA specification comes from ETSI (European Telecommunication Standards Institute). We plan to start a post series on the building blocks of TETRA physical layer and radio specifications and this is the first post towards that step.

Continue reading “TETRA Air interface specifications – Overview”

BER for BPSK in ISI channel with MMSE equalization

In the past, we had discussed BER for BPSK in a flat fading Rayleigh channel and BER for BPSK in a frequency selective channel using Zero Forcing equalization. In this post, let us discuss a frequency selective channel with the use of Minimum Mean Square Error (MMSE) equalization to compensate for the inter symbol interference (ISI). For simplifying the discussion, we will assume that there is no pulse shaping at the transmitter. The ISI channel is assumed to be a fixed 3 tap channel.

Continue reading “BER for BPSK in ISI channel with MMSE equalization”

Happy New Year 2010

Wishing all the readers of dsplog.com a great year 2010 !

It’s been a mixed year for dsplog. Some key milestones:

a) Crossing 1000 subscribers with 1100+ comments in March 2009
b) Crossing 100 posts with 2200 subscribers and 2600+ comments in October 2009
c) As I write this, we have 102 posts with 2603 subscribers and 3600+ comments.

Towards the last three months, the posting frequency has drastically dropped. Partly attributed to paucity of time following my change of employer and of course, laziness. 🙂

Hoping for a great year 2010.

Continue reading “Happy New Year 2010”

BER for BPSK in ISI channel with Zero Forcing equalization

In the past, we had discussed BER for BPSK in a flat fading Rayleigh channel. In this post, let us discuss a frequency selective channel with the use of Zero Forcing (ZF) equalization to compensate for the inter symbol interference (ISI). For simplifying the discussion, we will assume that there is no pulse shaping at the transmitter. The ISI channel is assumed to be a fixed 3 tap channel.

Continue reading “BER for BPSK in ISI channel with Zero Forcing equalization”