Beginner Program: Nearest Neighbor Search

Goals

This tutorial will show you how to:

  • Create and run a GAUSS program from scratch.
  • Use GAUSS functions, operators, and indexing.

By the end, you will have a simple GAUSS program which performs a nearest neighbor search. We recommend looking over Getting Started with GAUSS, before working through this tutorial.

Problem description

GAUSS nearest neighbors plot. Suppose you have been given the following data about some automobiles:

mpg weight displacement
16 3880 231
41 2040 90
21 4290 350
28 1800 98
14 4130 302

and that your task is to create a GAUSS program which can find the most similar car from your list when given the mpg, weight, and displacement for a new car.

Step 1: Create a new GAUSS program file

The first thing we need to do is to create a new file for this GAUSS program. We can do this by entering the following statement in the GAUSS Program Input/Output window:

edit find_neighbors.gss

This command will open the file find_neighbors.gss in our current GAUSS working directory. Since the file does not yet exist, GAUSS will create a new file for us.

Step 2: Enter the data

We will start our program by entering the data from the table at the top of this tutorial as well as data for a car that we want to compare.

// Create a 5x3 matrix containing
// our main 'auto' data
autos = { 16 3880 231,
          41 2040 90,
          21 4290 350,
          28 1800 98,
          14 4130 302 };

// A 1x3 matrix with the data
// for the new car
x = { 17 3170 163 };

Note that lines which start with // are comments in GAUSS.

Step 3: Run the program

Now that we have some code in the file, let's run it and create the matrices autos and x. Select the downward pointing triangle next to the Run button on the GAUSS Toolbar and select Current File as shown in the image below.

Run a GAUSS program.

Since the program only creates two matrices, we will not see any output. We can verify that the data was created, by using the print keyword as shown in the above image.

Step 4: Compute the Euclidean distance

The distance metric we will use is Euclidean distance, shown in the equation below:

$$ dist = \sqrt{(m_1 - m_2)^2 + (w_1 - w_2)^2 + (d_1 - d_2)^2}\\ \ \\ m = mpg\\ w = weight\\ d = displacement $$

We can accomplish this task in one line of code, like this:

// Compute the Euclidean distance between 'x' and 'autos'
dist = sqrt(sumr((x - autos).^2));

However, to make things simpler, we will work through each part of that statement separately.

a. Find the differences

// Find the difference between
// the 3 variables in 'x' and 'autos'
diff = x - autos;

GAUSS will automatically expand the row vector, x to have the same number of rows as the matrix autos, so the code above will perform the computation below:

$$ \begin{pmatrix} 17 & 3170 & 163\\ 17 & 3170 & 163\\ 17 & 3170 & 163\\ 17 & 3170 & 163\\ 17 & 3170 & 163 \end{pmatrix}- \begin{pmatrix} 16 & 3880 & 231\\ 41 & 2040 & 90\\ 21 & 4290 & 350\\ 28 & 1800 & 98\\ 14 & 4130 & 302 \end{pmatrix}= \begin{pmatrix} 1 & -710 & -68\\ -24 & 1130 & 73\\ -4 & -1120 & -187\\ -11 & 1370 & 65\\ 3 & -960 & -139 \end{pmatrix} $$

b. Square the differences

Next we use the element-by-element exponentiation operator, .^ to square each element in the matrix of differences.

// Square each element in 'diff'
diff_sq = diff .^2;

$$ \begin{pmatrix} 1^2 & -710^2 & -68^2\\ -24^2 & 1130^2 & 73^2\\ -4^2 & -1120^2 & -187^2\\ -11^2 & 1370^2 & 65^2\\ 3^2 & -960^2 & -139^2 \end{pmatrix}= \begin{pmatrix} 1 & 504100 & 4624\\ 576 & 1276900 & 5329\\ 16 & 1254400 & 34969\\ 121 & 1876900 & 4225\\ 9 & 921600 & 19321 \end{pmatrix} $$

c. Sum the rows

We can sum across the rows of our squared difference matrix with the sumr function.

// Sum across the rows
sum_diff_sq = sumr(diff_sq);

$$ \begin{pmatrix} 1 & + & 504100 & + & 4624\\ 576 & + & 1276900 & + & 5329\\ 16 & + & 1254400 & + & 34969\\ 121 & + & 1876900 & + & 4225\\ 9 & + & 921600 & + & 19321 \end{pmatrix}= \begin{pmatrix} 508725\\ 1282805\\ 1289385\\ 1881246\\ 940930 \end{pmatrix} $$

d. Compute the square roots

Now we can compute the distance by taking the square root of each element of our vector.

// Compute the square root of each element
dist = sqrt(sum_diff_sq);

$$ \begin{pmatrix} \sqrt{508725}\\ \sqrt{1282805}\\ \sqrt{1289385}\\ \sqrt{1881246}\\ \sqrt{940930} \end{pmatrix}= \begin{pmatrix} 713 \\ 1133\\ 1136 \\ 1372 \\ 970 \end{pmatrix} $$

Step 5: Find the smallest distance

It is clear from looking at the computations shown above, that in this case, the car from the first row of the autos matrix has the shortest distance from the sample car represented by the row vector, x. However, to compare larger datasets, we need our program to search for us.

The GAUSS function, minindc, returns the index of the smallest element in each column of a matrix.

// Find the index of the smallest distance
idx = minindc(dist);

// Find the smallest distance
dist_min = dist[idx];

Step 6: Print the results

Now that we have found the index of the row with the smallest distance, we should add some print statements so that our program can tell us what it found.

print "The closest car is from row " idx;
print "with a distance of: " dist_min;

Run your program again and you should see the following output:

The closest car is from row 1.00
with a distance of: 713.25

Looking over the original data, this answer appears correct.

Conclusion

Congratulations! You have:

  • Created and run a GAUSS program.
  • Used GAUSS functions, operators and indexing to perform a nearest neighbor search.
  • Printed the program results.

For convenience, the full program text is reproduced below.

// Create a 5x3 matrix containing
// our main 'auto' data
autos = { 16 3880 231,
          41 2040 90,
          21 4290 350,
          28 1800 98,
          14 4130 302 };

// A 1x3 matrix with the data
// for the new car
x = { 17 3170 163 };

// Find the difference between
// the 3 variables in 'x' and 'autos'
diff = x - autos;

// Square each element in 'diff'
diff_sq = diff .^2;

// Sum across the rows
sum_diff_sq = sumr(diff_sq);

// Compute the square root of each element
dist = sqrt(sum_diff_sq);

// Find the index of the smallest distance
idx = minindc(dist);

// Find the smallest distance
dist_min = dist[idx];

print "The closest car is from row " idx;
print "with a distance of: " dist_min;

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.