Goals
This tutorial will show you how to:
- Create and run a GAUSS program from scratch.
- Use GAUSS functions, operators, and indexing.
By the end, you will have a simple GAUSS program which performs a nearest neighbor search. We recommend looking over Getting Started with GAUSS, before working through this tutorial.
Problem description
Suppose you have been given the following data about some automobiles:
mpg | weight | displacement |
---|---|---|
16 | 3880 | 231 |
41 | 2040 | 90 |
21 | 4290 | 350 |
28 | 1800 | 98 |
14 | 4130 | 302 |
and that your task is to create a GAUSS program which can find the most similar car from your list when given the mpg
, weight
, and displacement
for a new car.
Step 1: Create a new GAUSS program file
The first thing we need to do is to create a new file for this GAUSS program. We can do this by entering the following statement in the GAUSS Program Input/Output window:
edit find_neighbors.gss
This command will open the file find_neighbors.gss
in our current GAUSS working directory. Since the file does not yet exist, GAUSS will create a new file for us.
.gss
is recommended.Step 2: Enter the data
We will start our program by entering the data from the table at the top of this tutorial as well as data for a car that we want to compare.
// Create a 5x3 matrix containing
// our main 'auto' data
autos = { 16 3880 231,
41 2040 90,
21 4290 350,
28 1800 98,
14 4130 302 };
// A 1x3 matrix with the data
// for the new car
x = { 17 3170 163 };
Note that lines which start with //
are comments in GAUSS.
Step 3: Run the program
Now that we have some code in the file, let's run it and create the matrices autos
and x
. Select the downward pointing triangle next to the Run button on the GAUSS Toolbar and select Current File as shown in the image below.
Since the program only creates two matrices, we will not see any output. We can verify that the data was created, by using the print
keyword as shown in the above image.
Step 4: Compute the Euclidean distance
The distance metric we will use is Euclidean distance, shown in the equation below:
$$ dist = \sqrt{(m_1 - m_2)^2 + (w_1 - w_2)^2 + (d_1 - d_2)^2}\\ \ \\ m = mpg\\ w = weight\\ d = displacement $$
We can accomplish this task in one line of code, like this:
// Compute the Euclidean distance between 'x' and 'autos'
dist = sqrt(sumr((x - autos).^2));
However, to make things simpler, we will work through each part of that statement separately.
a. Find the differences
// Find the difference between
// the 3 variables in 'x' and 'autos'
diff = x - autos;
GAUSS will automatically expand the row vector, x
to have the same number of rows as the matrix autos
, so the code above will perform the computation below:
$$ \begin{pmatrix} 17 & 3170 & 163\\ 17 & 3170 & 163\\ 17 & 3170 & 163\\ 17 & 3170 & 163\\ 17 & 3170 & 163 \end{pmatrix}- \begin{pmatrix} 16 & 3880 & 231\\ 41 & 2040 & 90\\ 21 & 4290 & 350\\ 28 & 1800 & 98\\ 14 & 4130 & 302 \end{pmatrix}= \begin{pmatrix} 1 & -710 & -68\\ -24 & 1130 & 73\\ -4 & -1120 & -187\\ -11 & 1370 & 65\\ 3 & -960 & -139 \end{pmatrix} $$
b. Square the differences
Next we use the element-by-element exponentiation operator, .^
to square each element in the matrix of differences.
// Square each element in 'diff'
diff_sq = diff .^2;
$$ \begin{pmatrix} 1^2 & -710^2 & -68^2\\ -24^2 & 1130^2 & 73^2\\ -4^2 & -1120^2 & -187^2\\ -11^2 & 1370^2 & 65^2\\ 3^2 & -960^2 & -139^2 \end{pmatrix}= \begin{pmatrix} 1 & 504100 & 4624\\ 576 & 1276900 & 5329\\ 16 & 1254400 & 34969\\ 121 & 1876900 & 4225\\ 9 & 921600 & 19321 \end{pmatrix} $$
c. Sum the rows
We can sum across the rows of our squared difference matrix with the sumr
function.
// Sum across the rows
sum_diff_sq = sumr(diff_sq);
$$ \begin{pmatrix} 1 & + & 504100 & + & 4624\\ 576 & + & 1276900 & + & 5329\\ 16 & + & 1254400 & + & 34969\\ 121 & + & 1876900 & + & 4225\\ 9 & + & 921600 & + & 19321 \end{pmatrix}= \begin{pmatrix} 508725\\ 1282805\\ 1289385\\ 1881246\\ 940930 \end{pmatrix} $$
d. Compute the square roots
Now we can compute the distance by taking the square root of each element of our vector.
// Compute the square root of each element
dist = sqrt(sum_diff_sq);
$$ \begin{pmatrix} \sqrt{508725}\\ \sqrt{1282805}\\ \sqrt{1289385}\\ \sqrt{1881246}\\ \sqrt{940930} \end{pmatrix}= \begin{pmatrix} 713 \\ 1133\\ 1136 \\ 1372 \\ 970 \end{pmatrix} $$
Step 5: Find the smallest distance
It is clear from looking at the computations shown above, that in this case, the car from the first row of the autos
matrix has the shortest distance from the sample car represented by the row vector, x
. However, to compare larger datasets, we need our program to search for us.
The GAUSS function, minindc
, returns the index of the smallest element in each column of a matrix.
// Find the index of the smallest distance
idx = minindc(dist);
// Find the smallest distance
dist_min = dist[idx];
Step 6: Print the results
Now that we have found the index of the row with the smallest distance, we should add some print statements so that our program can tell us what it found.
print "The closest car is from row " idx;
print "with a distance of: " dist_min;
Run your program again and you should see the following output:
The closest car is from row 1.00 with a distance of: 713.25
Looking over the original data, this answer appears correct.
Conclusion
Congratulations! You have:
- Created and run a GAUSS program.
- Used GAUSS functions, operators and indexing to perform a nearest neighbor search.
- Printed the program results.
For convenience, the full program text is reproduced below.
// Create a 5x3 matrix containing
// our main 'auto' data
autos = { 16 3880 231,
41 2040 90,
21 4290 350,
28 1800 98,
14 4130 302 };
// A 1x3 matrix with the data
// for the new car
x = { 17 3170 163 };
// Find the difference between
// the 3 variables in 'x' and 'autos'
diff = x - autos;
// Square each element in 'diff'
diff_sq = diff .^2;
// Sum across the rows
sum_diff_sq = sumr(diff_sq);
// Compute the square root of each element
dist = sqrt(sum_diff_sq);
// Find the index of the smallest distance
idx = minindc(dist);
// Find the smallest distance
dist_min = dist[idx];
print "The closest car is from row " idx;
print "with a distance of: " dist_min;