Introduction
Minimum GAUSS version: 16.0
The GAUSS commands csvReadM
and csvReadSA
read data into GAUSS from delimited text files. These files assume that the data is separated by a comma; but as we will see, they can also read data that is delimited by tabs, spaces, semi-colons or any character. The difference between these two functions is that csvReadM
reads data into a GAUSS matrix, while csvReadSA
, reads data into a GAUSS string or string array.
Filename input
csvReadM
and csvReadSA
both have one required input, the name of the data file. They also both have one output, the data read from the specified file. If you do not provide a path to your file, csvReadM
and csvReadSA
will look for the file in your current working directory.
For our first example, we will read in some data from an example data file that is packaged with GAUSS. Before we read the data, we will change our working directory to the GAUSSHOME/examples directory. We can perform this action with the following commands:
//Get the name of our GAUSS examples directory
examp_dir = getGAUSSHome() $+ "examples";
//Change the working directory to the GAUSSHOME/examples directory
//Note: The caret operator (^) tells GAUSS to look at the contents
//of 'examp_dir', rather than the literal token, 'examp_dir'
chdir ^examp_dir;
Now that we have set our working directory, we can load and preview some example data:
//Load all rows and all columns of 'housing.csv'
//into the GAUSS matrix 'housing'
housing = csvReadM("housing.csv");
//Print the first 5 rows to the GAUSS program input/output window
print housing[1:5,.];
The output from the code above should look like this:
. . . . . . 3104.00 4.00 2.00 0.00 279.90 2048.00 1173.00 2.00 1.00 0.00 146.50 912.00 3076.00 4.00 2.00 0.00 237.70 1654.00 1608.00 3.00 2.00 0.00 200.00 2068.00
The first thing we notice is that the first row consists entirely of dots '.'
. This is because the first row of housing.csv
contains the header. Since we specified that the data should be read in as a matrix, the text data was converted to a missing value. Let us take a look at all of the inputs to csvReadM
and csvReadSA
and then work through some more simple examples.
csvReadM and csvReadSA input options
- filename
- string, the name of the data file to read from
- row_range
- Optional input. Scalar or 2x1 matrix, the rows to read from the data file. If row_range is a scalar, then data will be read from row row_range to the end of the file.
- col_range
- Optional input. Scalar or 2x1 matrix, the columns to read from the data file. If col_range is a scalar, then data will be read from column col_range to the end of the file.
- delimiter
- Optional input. String, the character used to separate data elements. Default =
","
.
Examples
Example 1: Specify a starting row
Continuing with our example from above, if we wanted to load the numeric housing data into a matrix without the header row, we simply add the optional row_range
argument like this:
//Load rows 2 to the end and all columns of 'housing.csv' into
//into the GAUSS matrix 'housing'
housing = csvReadM("housing.csv", 2);
//Print the first 5 rows to the GAUSS program input/output window
print housing[1:5,.];
This time, our code should produce this output:
3104.00 4.00 2.00 0.00 279.90 2048.00 1173.00 2.00 1.00 0.00 146.50 912.00 3076.00 4.00 2.00 0.00 237.70 1654.00 1608.00 3.00 2.00 0.00 200.00 2068.00 1454.00 3.00 3.00 0.00 159.90 1477.00
Example 2: Read in string data and specify a single row
In our last example, we read in just the numeric data from housing.csv
. If we want to load the header names as a string array, we can do this:
//First and last row to read is the first row
row_range = { 1, 1 };
//Read header names into a string array, named 'var_names'
var_names = csvReadSA("housing.csv", row_range);
//Print the header
print var_names;
This code should return:
taxes beds baths new price size
Example 3: Read numeric data from a row range
//First and last row to read
row_range = { 3, 5 };
//Read all columns of row 3 through 5
housing = csvReadM("housing.csv", row_range);
//Print all data read in
print housing;
The code above should return the following output:
1173.00 2.00 1.00 0.00 146.50 912.00 3076.00 4.00 2.00 0.00 237.70 1654.00 1608.00 3.00 2.00 0.00 200.00 2068.00
Example 4: Read rows 2 to the end from a range of columns
//Start from the second row
row_range = 2;
//Read columns 2 through 4
col_range = { 2, 4 };
//Read all columns of row 3 through 5
housing = csvReadM("housing.csv", row_range, col_range);
//Print the first 5 rows of the data read in
print housing[1:5,.];
The code above should return the following output:
4.00 2.00 0.00 2.00 1.00 0.00 4.00 2.00 0.00 3.00 2.00 0.00 3.00 3.00 0.00
Example 5: Read from a file with a different delimiter
So far, all of our examples have assumed that the elements in our data file are separated by a comma. GAUSS does not come with any space or tab-separated files. But you can read them in by specifying the fourth argument as a string, containing the separator used in the file. For example:
//Tab separator
sep = "\t";
//Read all rows and all columns of a tab-separated file
x = csvReadM("my_tab_data.txt", 1, 1, sep);
//Space separator
sep = " ";
//Read all rows and all columns of a space-separated file
x = csvReadM("my_space_data.txt", 1, 1, sep);
Troubleshooting
The most common reason that csvReadM
or csvReadSA
will fail with an error is if the file does not exist, or is not in the directory that you specify. The GAUSS function filesa
will return a string array of files that match a file specification. This can be helpful to double-check the existence of a file. For example, the code:
file_name = "housing.csv";
print filesa(file_name);
will return:
housing.csv
if housing.csv
exists in our current working directory. If the file does not exist, the filesa
command above will return an empty string. For example, if we misspell the filename for one of the examples above, like this:
//Misspell file name
file_name = "hhousing.csv";
//Attempt to read data from file that does not exist
x = csvReadM(file_name);
GAUSS will return an error. We can check for the existence of the file with filesa
, like this:
//Print 'hhousing.csv' if that file is found
print filesa(file_name);
Since that file does not exist, we should get an empty string as the only output to the command above. Since filesa
accepts the star '*'
character for wildcards, we can use filesa
to print out all files in our current working directory that end with .csv
like this:
//Print all files in the current working directory
//that end with .csv
print filesa("*.csv");
If our current working directory is GAUSSHOME/examples, the output from this command should be:
beef_prices.csv binary.csv housing.csv
If we compare this output to the file_name
variable, we should see that we need to change hhousing.csv
to housing.csv
.