Today it is all about Octave. I also used FreeMat as it was supporting Wav file read functions. Let's work with Octave for now.
It is easy to install it under Ubuntu. All I do is to type this:
sudo apt install octaveIt is installed in half a minute. When you run it, it welcomes you like this:
It looks like my Ubuntu repos are a little bit old. It doesn't matter. What we are going to do here is very basic stuff.
Octave has a nice plotting system. I would write pages of C++ code to do these things. This one-liner would have been pages in a formal language. Simply copy and paste this code into your Octave (or Matlab, or FreeMat by the way) and you'll have a nice plot:
x=1:10; y=x.^2; plot(x,y)
Here you can get the exponential graph:
Here I create an array, named a, with some arbitrary numbers.
a=[1,2,3,4,5,4,3,2,1,0,-1,-2,-3,-4,-5,-4,-3,-2,-1,0,1,2,3,4,5,4,3,2,1,0,-1,-2]
a =
Columns 1 through 21:
1 2 3 4 5 4 3 2 1 0 -1 -2 -3 -4 -5 -4 -3 -2 -1 0 1
Columns 22 through 32:
2 3 4 5 4 3 2 1 0 -1 -2
We can think of this signal as data captured with an ADC (which is an Analog to Digital Convertor). You might ask "what is a signal and why do we capture it". You can record analog signals to a tape recorder and then play them back. You can actually make a copy of the electrical signals "analogous" to the original electrical values. An analog recording medium is like recording the signal and it is very different than a digital record. What is a digital record then? If I play a piece of music and record it to an analog tape, it is something that you can play back through the speakers but you cannot run mathematical formulas on it. If you need to raise the volume you can change the gain of the amplifier circuit or by applying a higher voltage to the input of the amplifier circuit. If you need to raise the volume on a digital medium, however, it is easier. Just multiply it with a real number greater than 1. For example, to double the output just multiply the values in the array with 2. It is easy and you can do math on it. If I need to decompose the signal into frequency components I would use fft function (which is a DFT but optimized to work faster when the number of samples are in power of 2. By the way these two are my favourite FFT education videos. I usually suggest juniors to watch them first, then ask me which part they still need to understand.
Here is how we calculate the fft on an array in a math suit (whether it is Octave, Matlab or FreeMat).
>> b=fft(a)
b =
Columns 1 through 3:
22.00000 + 0.00000i 34.83869 + 14.85299i -33.56345 - 32.48106i
Columns 4 through 6:
-5.85866 - 11.24051i -1.41421 - 8.24264i -2.70218 + 2.17818i
Columns 7 through 9:
-0.09438 - 2.51875i 0.48831 - 2.32545i 4.00000 - 2.00000i
You can think of a digital signal as levels captured by the sensors. For example, if I record the level (or position) of your eardrum with a super speed camera and measure the movement in scale, say 65536 units of depth, 44100 times a second, I would encode a CD directly with that data. A CD is in fact a bunch of data files filled with this numbers. Your eardrum moves back and forth due to the sound signal's power, which is composed of different frequencies of the instruments and human voices.
FFT function decomposes the frequency components of the signal and IFFT function creates the original signal back from the frequency bins.
>> c=ifft(b)
c =
Columns 1 through 8:
1.00000 2.00000 3.00000 4.00000 5.00000 4.00000 3.00000 2.00000
Columns 9 through 16:
1.00000 0.00000 -1.00000 -2.00000 -3.00000 -4.00000 -5.00000 -4.00000
Columns 17 through 24:
-3.00000 -2.00000 -1.00000 -0.00000 1.00000 2.00000 3.00000 4.00000
Columns 25 through 32:
5.00000 4.00000 3.00000 2.00000 1.00000 0.00000 -1.00000 -2.00000
>>
Here you can see that we got our original array back. Don't stuck with the trailing zeroes. The numbers are the same with the original values. I will come to this values again. The symmetry and other beautiful properties of FFT/IFFT somehow put me under a spell which I still don't want to be freed of.
I remember the first time I've seen this phenomenon: I was in awe! I produced a 50 Hz signal and stored it into a real valued array. Then I produced a 100 Hz signal and added it to the array. I plotted the signal onto a canvas using Turbo Pascal 5.5 (yes, there was a graphical programming library
It all started with removing one of the sine waves with just writing a zero into the corresponding frequency bin. And Voila! My life changed forever.
Let me show you the magic. Here is how we produce the 50 Hz signal:
--> t = (0:0.001:1)';
--> y = sin(2*pi*50*t)
y =
0
0,3090
0,5878
0,80
and plot of it:...
No big deal. Let's create a new array filled with a 100 Hz sine wave. Just change the 50 with 100 in the function parameters.
--> z = sin(2*pi*100*t)
z =
0
0,5878
0,951
and plot of it:
Note that the frequency of it is two times the first one.
Ok, let's put them together:
--> w=y+z;
-->
There is nothing magical yet, right? Just get the fft of the signal and plot it:
--> f=fft(w);
--> plot(t(1:1000),f(1:1000))
You can see that the FFT is symmetrical and our 50 Hz and 100 Hz peaks are easy to spot.
The amazing thing about FFT is it is completely reversible. Let's reverse it, it is called an inverse FFT and you can use ifft command to achieve this.
g=ifft(f)
plot(t(1:100),g(1:100))
Lets clear the 50 Hz signal (from both ends):
--> f(40:60)=0
--> f(940:960)=0
and plot it:
Here we modified our frequency components. Yes, in frequency domain. And now get the signal back. Did you notice that this is our 100 Hz signal? Yes, it is.
Here is the magic started for me. The first time I saw this I knew that my life is going to change. It has. Forever.
I have created sound recognition algorithms for advertisement monitoring, music playlist logging, Shazam like mobile application for music identification etc. I have created High Efficiency Audio Recognition System (HEARS). I worked with defense companies, media monitoring agencies, publisher associations, broadcasters. I went to Berlin to present my Sound And Image Recognition Engine for Advertisement and News (SIRENA). It is far more useful. I used it to detect and register visual advertisements on TV. Yes, frequency analysis can be used to find the images in other images. I think I have found the next subject to blog about.
By the way, in case you need to replicate what I have done here, here is the source code for this basic (and fun) application
t = (0:0.001:1)';The only thing I did not tell you about here is t. It is the time component to plot the signals.
y = sin(2*pi*50*t)
z = sin(2*pi*100*t)
w=y+z;
f=fft(w);
g=ifft(f)
figure(1)
plot(t(1:100),y(1:100))
title("50 Hz Sine Wave")
xlabel("time")
ylabel("amplitude")
figure(2)
plot(t(1:100),z(1:100))
title("100 Hz Sine Wave")
xlabel("time")
ylabel("amplitude")
figure(3)
plot(t(1:100),w(1:100))
title("Mixed Sine Wave")
xlabel("time")
ylabel("amplitude")
figure(4)
plot(t(1:1000),f(1:1000))
title("FFT of the mixed signal")
figure(5)
plot(t(1:100),g(1:100))
title("Signal after inverse fft")
xlabel("time")
ylabel("amplitude")
f(40:60)=0
f(940:960)=0
figure(6)
plot(t(1:1000),f(1:1000))
title("Pruned FFT")
g=ifft(f)
figure(7)
plot(t(1:100),g(1:100))
title("Signal after inverse fft")
xlabel("time")
ylabel("amplitude")
Nowadays I am working in a job that lets me make good money but I dislike to write finance software. I am an expert in C#, I use TFS, Entity Framework etc technologies along with databases I have designed and implemented. I create Web sites using MVC and JavaScript. I learned MVC in a month to write a Web site. The point is that I can make money, be successful in other areas of computer science but really don't want to work with these. I really miss working with my DSP R&D projects. I used a lot of tools to create proof of concept systems. I remember that I have created an algorithm to find songs just like Shazam and first time I have shown proof that it works was with a SQL code I have written and executed against a SQLite database. Yes, SQL code. I used a SELECT command to find the sample in a music database and reporting its position.
No comments:
Post a Comment