Modeling phase noise (frequency domain approach)

In typical wireless system simulations, there is a need to model the phase noise profile of the local oscillator. For example, the phase noise profile of the oscillator can be of the shape described in the post on Phase Noise Power Spectral Density to Jitter. While looking around for example Matlab code, I found two references [1, 2] which use the approach of defining the phase noise profile in the frequency domain, and then using ifft() to convert to time domain samples. This post gives a brief overview of the modeling and provides example Matlab/Octave code.
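Not the referenced code itself, but the flavor of the frequency domain approach can be sketched in Python (a rough sketch, not the post's Matlab; the function name and the profile numbers below are made up for illustration):

```python
import numpy as np

def gen_phase_noise(n_samp, fs, f_pts, psd_db):
    # Shape complex Gaussian noise in the frequency domain with a
    # PSD profile (in dB) defined at frequencies f_pts, then use an
    # ifft to obtain time-domain phase noise samples.
    f = np.abs(np.fft.fftfreq(n_samp, d=1.0/fs))       # bin frequencies
    psd = np.interp(f, f_pts, psd_db)                  # profile at each bin (dB)
    amp = np.sqrt(fs * n_samp) * 10.0 ** (psd / 20.0)  # per-bin amplitude
    w = (np.random.randn(n_samp) + 1j*np.random.randn(n_samp)) / np.sqrt(2)
    return np.real(np.fft.ifft(w * amp))               # phase samples (rad)

# hypothetical profile: -80 dB at 1 kHz falling to -120 dB at 1 MHz
phi = gen_phase_noise(2**14, 10e6, [1e3, 1e6], [-80.0, -120.0])
```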

Continue reading “Modeling phase noise (frequency domain approach)”

Weighted Least Squares and locally weighted linear regression

From the post on Closed Form Solution for Linear regression, we computed the parameter vector $\theta$ which minimizes the square of the error between the predicted value $h_\theta(x)$ and the actual output $y$ for all values in the training set. In that model, all the values in the training set are given equal importance. Let us consider the case where it is known that some observations are more important than others. This post attempts to discuss the case where some observations need to be given more weight than others (also known as weighted least squares).
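As a quick preview, the weighted closed form can be sketched in Python (a sketch using the usual normal-equations notation with a diagonal weight matrix $W$; the function name and the toy data are mine, not from the post):

```python
import numpy as np

def weighted_least_squares(X, y, w):
    # theta = (X' W X)^{-1} X' W y, with W = diag(w): observations
    # with larger w contribute more to the squared-error cost.
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# toy data lying exactly on y = 1 + 2x, with one heavily weighted point
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # bias column + feature
y = 1.0 + 2.0 * x
theta = weighted_least_squares(X, y, np.array([1.0, 1.0, 1.0, 10.0]))
# since the data is an exact fit, theta comes out close to [1, 2]
```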

Continue reading “Weighted Least Squares and locally weighted linear regression”

Least Squares in Gaussian Noise – Maximum Likelihood

From the previous posts on Linear Regression (using Batch Gradient Descent, Stochastic Gradient Descent and the Closed form solution), we discussed a couple of different ways to estimate the parameter vector $\theta$ in the least square error sense for a given training set. However, how does the least square error criterion work when the training set is corrupted by noise? In this post, let us discuss the case where the training set is corrupted by Gaussian noise.

Continue reading “Least Squares in Gaussian Noise – Maximum Likelihood”

Newton’s method to find square root, inverse

Some of us would have used Newton’s method (also known as the Newton-Raphson method) in some form or other. The method has quite a bit of history, starting with the Babylonian way of finding the square root and, over the centuries, reaching the present recursive way of finding the solution. In this post, we will describe Newton’s method and apply it to find the square root and the inverse of a number.

Geometrical interpretation

We know that the derivative of a function at $x_n$ is the slope of the tangent (red line) at $x_n$, i.e.,

$f'(x_n) = \dfrac{f(x_n) - 0}{x_n - x_{n+1}}$.

Rearranging, the intercept of the tangent with the x-axis is,

$x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$.

From the figure above, we can see that the tangent (red line) intercepts the x-axis at $x_{n+1}$, which is closer to the zero of the function than $x_n$ is. Keep on doing this operation recursively, and it converges to the zero of the function or, in other words, the root of the function.

In general, for the $n^{th}$ iteration, the equation is:

$x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$.
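The recursion above can be sketched in a few lines (Python here for brevity; the function names are mine):

```python
def newton(f, fprime, x0, n_iter=20):
    # x_{n+1} = x_n - f(x_n) / f'(x_n)
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / fprime(x)
    return x

# root of f(x) = x^2 - 2, starting from x0 = 1: converges to sqrt(2)
r = newton(lambda x: x*x - 2.0, lambda x: 2.0*x, 1.0)
```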


Finding square root

Let us, for example, try to use this method for finding the square root of $D = 100$. The function to zero out in the Newton’s method framework is,

$f(x) = x^2 - D$, where $x = \sqrt{D}$ at the root.

The first derivative is,

$f'(x) = 2x$.

The recursive equation is,

$x_{n+1} = x_n - \dfrac{x_n^2 - D}{2x_n}$.

Matlab code snippet

clear ; close all;
D = 100; % number whose square root is sought
x = 1; % initial value
for ii = 1:10
	fx = x.^2 - D;
	f1x = 2*x;
	x = x - fx/f1x;
	x_v(ii) = x;
end
x_v' =
   50.5000000000000
   26.2400990099010
   15.0255301199868
   10.8404346730269
   10.0325785109606
   10.0000528956427
   10.0000000001399
   10.0000000000000
   10.0000000000000
   10.0000000000000

We can see that it converges within around 8 iterations. Further, playing around with the initial value,

a) if we start with an initial value of x = -1, then we converge to -10.

b) if we start with an initial value of x = 0, then we do not converge (the update divides by zero on the first iteration)

and so on…

Finding inverse (division)

Newton’s method can be used to find the inverse of a variable $D$. One way to write the function to zero out is $f(x) = Dx - 1$, but we soon realize that this does not work, as the resulting update $x_{n+1} = x_n - (Dx_n - 1)/D$ requires knowing $1/D$ in the first place.

Alternatively, the function to zero out can be written as,

$f(x) = \dfrac{1}{x} - D$.

The first derivative is,

$f'(x) = -\dfrac{1}{x^2}$.

The equation in the recursive form is,

$x_{n+1} = x_n(2 - Dx_n)$.

Matlab code snippet

clear ; close all;
D = .1; % number whose inverse is sought
x = [.1:.2:1]; % initial value
for ii = 1:10
	fx = (1./x) - D;
	f1x = -1./x.^2;
	x = x - fx./f1x;
	x_v(:,ii) = x;
end
plot(x_v');
legend('0.1', '0.3', '0.5', '0.7', '0.9');
grid on; xlabel('number of iterations'); ylabel('inverse');
title('finding inverse newton''s method');

The following plot shows the convergence of the inverse computation to the right value ($1/D = 10$) for different initial values of $x$ for this example Matlab code snippet.


Figure : convergence of inverse computation

Finding the minima of a function

To find the minima of a function, we need to find where the derivative of the function becomes zero, i.e. $f'(x) = 0$.

Using Newton’s method, the recursive equation becomes:

$x_{n+1} = x_n - \dfrac{f'(x_n)}{f''(x_n)}$.
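As a small sketch (Python; the quadratic example is mine), the same recursion applied to the derivative:

```python
def newton_minimize(fprime, fsecond, x0, n_iter=20):
    # x_{n+1} = x_n - f'(x_n) / f''(x_n)
    x = x0
    for _ in range(n_iter):
        x = x - fprime(x) / fsecond(x)
    return x

# minimum of f(x) = (x - 3)^2 + 1 is at x = 3; for a quadratic,
# a single Newton step lands on it exactly
xmin = newton_minimize(lambda x: 2.0*(x - 3.0), lambda x: 2.0, 10.0)
```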

Thoughts

We have briefly gone through Newton’s method and its applications to finding the roots of a function, the inverse, minima, etc. However, there are quite a few aspects which we did not go over, like:

a) Impact of the initial value on the convergence of the function

b) Rate of the convergence

c) Error bounds of the converged result

d) Conditions where the convergence does not happen

and so on…

Hoping to discuss those in another post… 🙂

References

Wiki on Newton’s method

Closed form solution for linear regression

In the previous post on Batch Gradient Descent and Stochastic Gradient Descent, we looked at two iterative methods for finding the parameter vector $\theta$ which minimizes the square of the error between the predicted value $h_\theta(x)$ and the actual output $y$ for all values in the training set.

A closed form solution for finding the parameter vector $\theta$ is possible, and in this post let us explore that. Of course, I thank Prof. Andrew Ng for making all this material available in the public domain (Lecture Notes 1, retrieved 11th Sep 2024).

Continue reading “Closed form solution for linear regression”

Stochastic Gradient Descent

For curve fitting using linear regression, there exists a minor variant of Batch Gradient Descent algorithm, called Stochastic Gradient Descent.

In Batch Gradient Descent, the parameter vector $\theta$ is updated as,

$\theta_j := \theta_j - \alpha\,\dfrac{1}{m}\displaystyle\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

(loop over all $m$ elements of the training set in one iteration).

For Stochastic Gradient Descent, at each iteration the algorithm goes over only one element of the training set, and the vector gets updated as,

$\theta_j := \theta_j - \alpha\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$.

When the training set is large, Stochastic Gradient Descent can be useful, as we need not go over the full data to get the first update of the parameter vector $\theta$.

For the same Matlab example used in the previous post, we can see that both batch and stochastic gradient descent converged to reasonably close values.

Matlab/Octave code snippet

clear ;
close all;
x = [1:50].';
y = [4554 3014 2171 1891 1593 1532 1416 1326 1297 1266 ...
	1248 1052 951 936 918 797 743 665 662 652 ...
	629 609 596 590 582 547 486 471 462 435 ...
	424 403 400 386 386 384 384 383 370 365 ...
	360 358 354 347 320 319 318 311 307 290 ].';
%
m = length(y); % store the number of training examples
x = [ ones(m,1) x]; % Add a column of ones to x
n = size(x,2); % number of features
theta_batch_vec = [0 0]';
theta_stoch_vec = [0 0]';
alpha = 0.002;
err = [0 0]';
theta_batch_vec_v = zeros(10000,2);
theta_stoch_vec_v = zeros(50*10000,2);
for kk = 1:10000
	% batch gradient descent - loop over all training set
	h_theta_batch = (x*theta_batch_vec);
	h_theta_batch_v = h_theta_batch*ones(1,n);
	y_v = y*ones(1,n);
	theta_batch_vec = theta_batch_vec - alpha*1/m*sum((h_theta_batch_v - y_v).*x).';
	theta_batch_vec_v(kk,:) = theta_batch_vec;
	%j_theta_batch(kk) = 1/(2*m)*sum((h_theta_batch - y).^2);

	% stochastic gradient descent - loop over one training set at a time
	for (jj = 1:50)
		h_theta_stoch = (x(jj,:)*theta_stoch_vec);
		h_theta_stoch_v = h_theta_stoch*ones(1,n);
		y_v = y(jj,:)*ones(1,n);
		theta_stoch_vec = theta_stoch_vec - alpha*1/m*((h_theta_stoch_v - y_v).*x(jj,:)).';
		%j_theta_stoch(kk,jj) = 1/(2*m)*sum((h_theta_stoch - y).^2);
		theta_stoch_vec_v(50*(kk-1)+jj,:) = theta_stoch_vec;
	end
end

figure;
plot(x(:,2),y,'bs-');
hold on
plot(x(:,2),x*theta_batch_vec,'md-');
plot(x(:,2),x*theta_stoch_vec,'rp-');
legend('measured', 'predicted-batch','predicted-stochastic');
grid on;
xlabel('Page index, x');
ylabel('Page views, y');
title('Measured and predicted page views');

j_theta = zeros(250, 250);   % initialize j_theta
theta0_vals = linspace(-2500, 2500, 250);
theta1_vals = linspace(-50, 50, 250);
for i = 1:length(theta0_vals)
	  for j = 1:length(theta1_vals)
		theta_val_vec = [theta0_vals(i) theta1_vals(j)]';
		h_theta = (x*theta_val_vec);
		j_theta(i,j) = 1/(2*m)*sum((h_theta - y).^2);
    end
end
figure;
contour(theta0_vals,theta1_vals,10*log10(j_theta.'))
xlabel('theta_0'); ylabel('theta_1')
title('Cost function J(theta)');
hold on;
plot(theta_stoch_vec_v(:,1),theta_stoch_vec_v(:,2),'rs.');
plot(theta_batch_vec_v(:,1),theta_batch_vec_v(:,2),'kx.');

The converged values of $\theta$ for the two methods come out reasonably close; they can be printed from theta_batch_vec and theta_stoch_vec at the end of the run.


Figure: measured and predicted page views

Figure: batch and stochastic gradient descent trajectories on the cost function contour

From the contour plot of the cost function, we can see that Batch Gradient Descent (black line) goes straight down to the minima, whereas Stochastic Gradient Descent (red line) keeps hovering around (thick red line) before going down to the minima.

References

An Application of Supervised Learning – Autonomous Driving (Video Lecture, Class 2)

CS 229 Machine Learning Course Materials


Batch Gradient Descent

I happened to stumble on Prof. Andrew Ng’s Machine Learning classes, which are available online as part of the Stanford Center for Professional Development. The first lecture in the series discusses fitting parameters for a given data set using linear regression. For understanding this concept, I chose to take data from the top 50 articles of this blog based on the pageviews in the month of September 2011.

Continue reading “Batch Gradient Descent”

Approximate Vector Magnitude Computation

In this post, let us discuss a simple, implementation-friendly scheme for computing the absolute value of a complex number $x + jy$. The technique, called the alpha max plus beta min algorithm, is discussed in Chapter 13.2 of Understanding Digital Signal Processing by Richard Lyons and is also available online at Digital Signal Processing Tricks – High-speed vector magnitude approximation.

The magnitude of a complex number is

$|x + jy| = \sqrt{x^2 + y^2}$.

The simplified computation of the absolute value is

$|x + jy| \approx \alpha\,\mathrm{Max} + \beta\,\mathrm{Min}$,

where

$\mathrm{Max} = \max(|x|, |y|)$ and $\mathrm{Min} = \min(|x|, |y|)$.
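A quick numerical sketch (Python; the coefficient pair $\alpha = 15/16$, $\beta = 15/32$ is one of the shift-and-add friendly choices tabulated by Lyons, used here only as an example):

```python
def approx_mag(x, y, alpha=15/16, beta=15/32):
    # alpha*Max + beta*Min approximation of sqrt(x^2 + y^2)
    ax, ay = abs(x), abs(y)
    return alpha * max(ax, ay) + beta * min(ax, ay)

m = approx_mag(3.0, 4.0)   # exact magnitude is 5.0
```

For this input the approximation gives 15/16*4 + 15/32*3 = 5.15625, about 3% above the true value, with no square root or multiplier-heavy squaring needed.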

Continue reading “Approximate Vector Magnitude Computation”

Linear to log conversion

In signal processing blocks like the power estimation used in digital communication, it may be required to represent the estimate in log scale. This post explains a simple linear to log conversion scheme proposed in the DSP Guru column on DSP Trick: Quick-and-Dirty Logarithms. The scheme makes the implementation of a linear to log conversion simple and small in digital hardware such as an FPGA.
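The flavor of the trick can be sketched in floating point (a Python sketch of an idea meant for fixed-point bit patterns in hardware): the exponent gives the integer part of log2, and the mantissa, linearly mapped, approximates the fractional part.

```python
import math

def quick_log2(x):
    # x = m * 2**e with 0.5 <= m < 1; approximate log2(m) by the
    # straight line 2m - 2 (exact at m = 0.5 and at m = 1)
    m, e = math.frexp(x)
    return e + (2.0*m - 2.0)

v = quick_log2(1024.0)     # exact log2(1024) = 10
```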

Continue reading “Linear to log conversion”

Negative Frequency

Last week, I received an email from Mr. Kishore. He was wondering about the physical significance of negative frequency. Does negative frequency really exist?

Though I have seen conflicting views on the net (thread in complextoreal.com, thread in comp.dsp), my perspective is that negative frequency exists. The concept of negative frequency helps me a lot in understanding single sideband modulation (SSB), OFDM systems, I/Q modulators, etc. (to name a few).

Continue reading “Negative Frequency”

Chi Square Random Variable

While trying to derive the theoretical bit error rate (BER) for BPSK modulation in a Rayleigh fading channel, I realized that I first need to discuss the chi-square random variable.

What is chi-square random variable?

Let there be $n$ independent and identically distributed Gaussian random variables $X_1, \ldots, X_n$ with mean 0 and variance $\sigma^2$, and let us form a new random variable,

$Y = \displaystyle\sum_{i=1}^{n} X_i^2$.

Then $Y$ is a chi-square random variable with $n$ degrees of freedom.
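A quick empirical sanity check (a Python sketch; the mean of a chi-square variable with $n$ degrees of freedom built from $N(0, \sigma^2)$ components is $n\sigma^2$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 4, 1.0, 200000
# sum of squares of n iid zero-mean Gaussians, repeated many times
y = np.sum(rng.normal(0.0, sigma, size=(trials, n))**2, axis=1)
mean_est = y.mean()        # should be close to n * sigma**2 = 4
```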

Continue reading “Chi Square Random Variable”

Deriving PDF of Rayleigh random variable

In the post on the Rayleigh channel model, we stated that a circularly symmetric random variable is of the form $Z = X + jY$, where the real and imaginary parts are zero mean independent and identically distributed (iid) Gaussian random variables. The magnitude $|Z| = \sqrt{X^2 + Y^2}$, which has the probability density,

$p(z) = \dfrac{z}{\sigma^2}\,e^{-z^2/(2\sigma^2)}, \quad z \ge 0,$

is called a Rayleigh random variable. Further, the phase $\theta$ is uniformly distributed from $-\pi$ to $\pi$. In this post we will try to derive the expression for the probability density function (PDF) of $|Z|$ and $\theta$.
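A quick empirical check of the density (a Python sketch): the mean of a Rayleigh variable built from two $N(0, \sigma^2)$ components is $\sigma\sqrt{\pi/2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, trials = 1.0, 200000
z = rng.normal(0.0, sigma, trials) + 1j * rng.normal(0.0, sigma, trials)
r = np.abs(z)              # magnitude of a circularly symmetric Gaussian
mean_est = r.mean()        # should be close to sigma*sqrt(pi/2) ~ 1.2533
```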

Continue reading “Deriving PDF of Rayleigh random variable”

Update: Correction in Matlab code for raised cosine filter

Thanks to the keen observation by Mr. Phan Minh Hoang, I was notified that the Matlab/Octave scripts provided along with the topic raised cosine filtering was not behaving properly.

Reason: I was not taking care of the division by zero when creating the raised cosine filter taps. 🙁 Continue reading “Update: Correction in Matlab code for raised cosine filter”

Using CORDIC for phase and magnitude computation

In a previous post (here), we looked at using CORDIC (COordinate Rotation DIgital Computer) for understanding how a complex number can be rotated by an angle without using actual multipliers. Let us now try to understand how we can use CORDIC for finding the phase and magnitude of a complex number.

Basics

The CORDIC algorithm is built on successively multiplying the complex number $x + jy$ by $\left(1 \pm j2^{-k}\right)$. As can be noticed, since the factors $2^{-k}$ are powers of 2, the multiplication can be achieved by using the appropriate ‘bit shift’. For further details, please refer to the previous post (CORDIC for phase rotation).
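The vectoring mode this leads to can be sketched as follows (a Python sketch; the function name is mine, and it assumes the input lies in the right half plane): the vector is rotated toward the x-axis using only shifts and adds, the accumulated rotation is the phase, and the final x, after dividing out the known CORDIC gain, is the magnitude.

```python
import math

def cordic_mag_phase(x, y, n_iter=16):
    angle = 0.0
    gain = 1.0
    for k in range(n_iter):
        d = -1.0 if y > 0 else 1.0            # rotate so that y -> 0
        x, y = x - d * y * 2.0**-k, y + d * x * 2.0**-k
        angle -= d * math.atan(2.0**-k)       # accumulate the undone rotation
        gain *= math.sqrt(1.0 + 2.0**(-2*k))  # per-stage amplitude growth
    return x / gain, angle                    # magnitude, phase (rad)

mag, ph = cordic_mag_phase(1.0, 1.0)          # expect ~sqrt(2) and ~pi/4
```

In hardware the per-stage gain is not divided out each time; the total gain (about 1.647) is a known constant and is compensated once at the end.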

Continue reading “Using CORDIC for phase and magnitude computation”