Introduction
Today's blog will:
- Explain the cause of the index out-of-range error in GAUSS.
- Explain why performing index assignments past the end of your data can lead to bad outcomes.
- Show how to use some functions and operators that can assist with diagnosing and resolving this error.
- Work through an example to resolve an indexing problem.
What Causes the Index-out-of-Range Error?
The index out-of-range error occurs any time that you try to select an element of a dataframe, matrix, or string array that does not exist. Let's look at a basic example:
Index out-of-range on read
// Create a 2x1 vector
a = { 5, 9 };
// Attempt to access the third element and assign it to 'b'
b = a[3];
In the above example, the cause is clear. Since a has only two elements, trying to read a third one is an error.
Index out-of-range on write
// Create a 2x1 vector
a = { 5, 9 };
// Attempt to assign to the third element of 'a'
a[3] = 12;
Since this code attempts to write past the end of the vector a, it will also cause an index out-of-range error. This behavior is standard in most programming environments. However, it can confuse new GAUSS users who are coming from statistical software that was historically designed more for prototyping.
At first, it might seem convenient for the extra elements to be automatically added past the end of your data. However, there are two reasons why it is not such a good idea:
- It allows errors to go undetected. Writing past the end of data is very easy to do by accident and one of the most common errors to make when programming--even for trained professional programmers. Letting the error go undetected at the source can cause a much harder-to-detect error to happen later on in the program. In other cases, it can even lead to wrong answers.
- It will slow down your code. Starting out with a vector, matrix, or dataframe of the correct size and filling it in can be 10 or more times faster than adding new elements an observation at a time.
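The second point can be illustrated with a minimal sketch. The loop length n is a hypothetical value chosen for illustration, and the sketch assumes the empty-matrix literal {} is available in your GAUSS version. Both loops build the same 5x1 vector, but only the first allocates its storage once, up front.

```gauss
// Hypothetical number of elements, chosen only for illustration
n = 5;

// Option 1: pre-allocate the full vector, then fill it in by index
x = zeros(n, 1);
for i(1, n, 1);
    x[i] = i^2;
endfor;

// Option 2: start empty and grow by one row per iteration.
// Each concatenation has to produce a new, larger vector.
y = {};
for i(1, n, 1);
    y = y | i^2;
endfor;

print "rows(x) = " rows(x);
print "rows(y) = " rows(y);
```

With explicit concatenation, growing the data is a deliberate choice rather than a silent side effect of an out-of-range assignment.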
Functions and Operators to Investigate and Resolve an Index out-of-range Error
Here are a few functions and operators that should be helpful when diagnosing and resolving an index out-of-range error.
Name | Purpose |
---|---|
cols | Returns the number of columns in a dataframe, matrix, or string array. |
rows | Returns the number of rows in a dataframe, matrix, or string array. |
getorders | Returns the size of each dimension in a dataframe, matrix, multi-dimensional array, or string array. |
zeros | Returns a matrix of a specified size filled with zeros. It is often used to pre-allocate a matrix that will be filled in later. |
Symbol | Description |
---|---|
~ | Adds columns to a matrix, dataframe, or string array. |
\| | Adds rows to a matrix, dataframe, or string array. |
Basic Usage Examples
a = { 7 2,
      4 1 };
b = { 5 8,
      3 9 };
c = { 6 6 };
print "rows(a) = " rows(a);
print "cols(a) = " cols(a);
print "----";
print "getorders(a) = " getorders(a);
will return:
rows(a) = 2.0000000
cols(a) = 2.0000000
----
getorders(a) = 2.0000000 2.0000000
Continuing with the previous code:
// Use horizontal concatenation to add columns
d = a ~ b;
// Use vertical concatenation to add rows
e = c | b;
After the above code:
d = 7 2 5 8
    4 1 3 9

e = 6 6
    5 8
    3 9
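The size functions above can also guard an index before it is used. The following is a minimal sketch, not a required pattern; the variable idx is a hypothetical name introduced only for this illustration.

```gauss
// Create a 2x1 vector
a = { 5, 9 };

// Hypothetical index that we want to read
idx = 3;

// Only read the element if the index is within the rows of 'a'
if idx >= 1 and idx <= rows(a);
    b = a[idx];
    print "a[idx] = " b;
else;
    print "idx is out of range; rows(a) = " rows(a);
endif;
```

Because a has two rows and idx is 3, the else branch runs and the read that would have raised the error is skipped.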
Solve an Example Index Error
Let's work through a simplified real-world example of an out-of-range error. Our code takes some bootstrap samples from a dataset and then computes and stores the means and standard deviations of these samples. Here is the initial code:
burn_in = 5;
ndraws = 10;
// Data to sample from
data = rndn(10, 1);
// Pre-allocate vector to hold results
stats = zeros(ndraws, 1);
for i(1, ndraws + burn_in, 1);
    // Take a bootstrap sample
    s = sampleData(data, rows(data), 1);

    // Assign the mean of the sample
    // to the 1st column of the i'th row of stats
    stats[i,1] = meanc(s);

    // Assign the standard deviation of the sample
    // to the 2nd column of the i'th row of stats
    stats[i,2] = stdsc(s);
endfor;
After running the above code, GAUSS reports the index out-of-range error on this line: stats[i,2] = stdsc(s);
Let's start diagnosing this problem by printing out the size of stats before the loop and the value of i on each iteration, like this:
burn_in = 5;
ndraws = 10;
// Data to sample from
data = rndn(10, 1);
// Pre-allocate vector to hold results
stats = zeros(ndraws, 1);
print "rows(stats) = " rows(stats);
print "cols(stats) = " cols(stats);
for i(1, ndraws + burn_in, 1);
    print "i = " i;

    // Take a bootstrap sample
    s = sampleData(data, rows(data), 1);

    // Assign the mean of the sample
    // to the 1st column of the i'th row of stats
    stats[i,1] = meanc(s);

    // Assign the standard deviation of the sample
    // to the 2nd column of the i'th row of stats
    stats[i,2] = stdsc(s);
endfor;
After running this version, we see the following printed output:
rows(stats) = 10.000000
cols(stats) = 1.0000000
i = 1.0000000
This tells us that the error happened on the first iteration of the loop, since the last value of i printed is one. We also see that stats has only one column, while the line causing our error is trying to write to the second column of stats. So we need to add a second column to stats when we pre-allocate it.
After we fix that problem, our code looks like this:
burn_in = 5;
ndraws = 10;
// Data to sample from
data = rndn(10, 1);
// Pre-allocate matrix to hold results
stats = zeros(ndraws, 2);
print "rows(stats) = " rows(stats);
print "cols(stats) = " cols(stats);
for i(1, ndraws + burn_in, 1);
    print "i = " i;

    // Take a bootstrap sample
    s = sampleData(data, rows(data), 1);

    // Assign the mean of the sample
    // to the 1st column of the i'th row of stats
    stats[i,1] = meanc(s);

    // Assign the standard deviation of the sample
    // to the 2nd column of the i'th row of stats
    stats[i,2] = stdsc(s);
endfor;
Unfortunately, GAUSS still reports an index out-of-range error. This time it is on an earlier line: stats[i,1] = meanc(s); However, when we look at our printed output:
rows(stats) = 10.000000
cols(stats) = 2.0000000
i = 1.0000000
i = 2.0000000
i = 3.0000000
i = 4.0000000
i = 5.0000000
i = 6.0000000
i = 7.0000000
i = 8.0000000
i = 9.0000000
i = 10.000000
i = 11.000000
we see that the code completed ten iterations before failing. The error occurs when the code tries to write to the 11th row of stats, which has only 10 rows. We need to find out why the number of iterations in the loop and the number of rows in stats don't agree.
Looking back over the code, we can see that stats has ndraws rows, but the loop has ndraws + burn_in iterations. There are many ways we could resolve this, but since our main focus is on identifying the cause of the problem, we will just change the size of stats to have enough rows to hold all iterations.
Below is our final code:
burn_in = 5;
ndraws = 10;
// Data to sample from
data = rndn(10, 1);
// Pre-allocate matrix to hold results
stats = zeros(ndraws + burn_in, 2);
print "rows(stats) = " rows(stats);
print "cols(stats) = " cols(stats);
for i(1, ndraws + burn_in, 1);
    print "i = " i;

    // Take a bootstrap sample
    s = sampleData(data, rows(data), 1);

    // Assign the mean of the sample
    // to the 1st column of the i'th row of stats
    stats[i,1] = meanc(s);

    // Assign the standard deviation of the sample
    // to the 2nd column of the i'th row of stats
    stats[i,2] = stdsc(s);
endfor;
Fortunately, this time the code runs without error and the printout shows us that all iterations have been performed--problem solved! We can now remove the print statements and keep going.
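Enlarging stats is only one of the possible fixes. As a hedged alternative, the sketch below keeps stats at ndraws rows and stores results only after the burn-in iterations have finished. It assumes the burn-in results are meant to be discarded, which the example does not state, so treat it as one option rather than the intended design.

```gauss
burn_in = 5;
ndraws = 10;

// Data to sample from
data = rndn(10, 1);

// Pre-allocate one row per retained draw
stats = zeros(ndraws, 2);

for i(1, ndraws + burn_in, 1);
    // Take a bootstrap sample
    s = sampleData(data, rows(data), 1);

    // Assumption: burn-in iterations are not stored.
    // Shift the row index so the first retained draw goes to row 1.
    if i > burn_in;
        stats[i - burn_in, 1] = meanc(s);
        stats[i - burn_in, 2] = stdsc(s);
    endif;
endfor;

print "rows(stats) = " rows(stats);
```

The largest row written is (ndraws + burn_in) - burn_in = ndraws, so every assignment stays inside the pre-allocated matrix.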
Conclusion
Congratulations! You have learned:
- The cause of the index out-of-range error in GAUSS.
- Why performing index assignments past the end of your data can lead to bad outcomes.
- How to use some functions and operators that can assist with finding the size of your data and adding rows and columns.
and worked through a simple, but realistic example.