Make your code portable: Data paths

Introduction

Most GAUSS users keep their code and data from different projects in separate directories. This is a good practice since it helps keep your work organized.

However, since it is your code and your computer, it can seem like the path of least resistance is to add full path references to any data that you read or write. Below is a toy example that illustrates the issue.

// Load data
y = loadd("C:\\Users\\YourUserName\\Projects\\GAUSS\\data\\mydata.csv");
oil_vars = xlsReadM("C:\\Users\\YourUserName\\Projects\\GAUSS\\data\\oil.xlsx", "A2:A220");

// Estimate least squares parameters
parameters = invpd(oil_vars'oil_vars)*(oil_vars'y);

// Write estimated parameters to output file
call xlsWriteM(parameters, "C:\\Users\\YourUserName\\Projects\\GAUSS\\data\\oil_parms.xlsx", "A2");

Assuming the paths are correct, this will work fine for you, at least when you first write it. However, if you want to share the code with others, run it on multiple computers, or upgrade your computer, the paths will have to be changed.

For a simple program like the one above, you can probably get the change made in under a minute. But most programs are longer and more complicated than this and it is common to have path references throughout the code.

Step 1: Make variables to hold your paths

The first thing we can do to simplify this code is to make one variable at the top of the program which contains the path.

// Create variable to hold the path
path = "C:\\Users\\YourUserName\\Projects\\GAUSS\\data\\";

// Load data
y = loadd(path $+ "mydata.csv");
oil_vars = xlsReadM(path $+ "oil.xlsx", "A2:A220");

// Estimate least squares parameters
parameters = invpd(oil_vars'oil_vars)*(oil_vars'y);

// Write estimated parameters to output file
call xlsWriteM(parameters, path $+ "oil_parms.xlsx", "A2");

This is a great step because we now only have to change the path in one place. No more digging through hundreds or thousands of lines of code!

Step 2: Use __FILE_DIR to find the path for you

While only having to modify your paths in one location does not sound too bad, do you really want to have to make that change, or would you rather have it just work?

Furthermore, a hard-coded path becomes a significant problem when multiple people are working with the code, or when it is kept under version control, because each machine needs a different value. Wouldn't it be nice if we could just ask GAUSS to figure out the path for us? Fortunately, __FILE_DIR (first available in GAUSS version 18) does exactly this.

What does __FILE_DIR tell GAUSS to do?

Whenever GAUSS sees __FILE_DIR in a program, it replaces it with the full path to the location of the file which contains the __FILE_DIR statement. So if we add the following command to the top of the file C:\gauss22\examples\ols.e:

// Print the full path to the location of this file
print __FILE_DIR;

when we run it, in addition to the OLS estimates that the example prints out, we will see:

C:\gauss22\examples\

If you placed that same statement in the file /Users/david/programs/myprogram.gss, then it would print out:

/Users/david/programs/
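
Note that __FILE_DIR depends on where the program file is stored, not on the current GAUSS working directory. A minimal sketch for comparing the two is shown below. It assumes that cdir(0) returns the current working directory as a string; the printed labels are only illustrative.

```gauss
// Folder that contains this program file
print "Program folder:";
print __FILE_DIR;

// Current GAUSS working directory (assumed to be returned by cdir(0));
// this may differ from the program folder
print "Working directory:";
print cdir(0);
```

If the two folders differ, a bare file name such as "mydata.csv" would be looked up relative to the working directory, while a path built from __FILE_DIR always points next to the program file.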

Replace the first path with __FILE_DIR

Now that we have a tool to return the path of our program file, we need to decide how we want the project set up. If we just want to keep everything for the project in one folder, we can do this:

// Create variable to hold the path
path = __FILE_DIR;

// Load data
y = loadd(path $+ "mydata.csv");
oil_vars = xlsReadM(path $+ "oil.xlsx", "A2:A220");

// Estimate least squares parameters
parameters = invpd(oil_vars'oil_vars)*(oil_vars'y);

// Write estimated parameters to output file
call xlsWriteM(parameters, path $+ "oil_parms.xlsx", "A2");

For a very simple project this is probably fine: as long as the code and the data are in the same folder, the code will run on any computer without modification. But we may wish to keep the code and the data in separate folders. For example, let's say we want our project to be laid out like this:

  • ols_oil - base path for project.
  • ols_oil/data - folder to hold input data files.
  • ols_oil/main - folder to hold our main program.
  • ols_oil/results - folder to hold the output written by the main program.

Since the data folder and the results folder are at the same level (or depth) as the 'main' folder, which contains our main program, we have to use .. to go up one level from ols_oil/main to ols_oil before descending into them.

// Get full path to ols_oil/main
main_path = __FILE_DIR;

// Get full path to ols_oil/data
// (forward slashes are used so the same string works on Windows, macOS and Linux)
data_path = main_path $+ "../data/";

// Get full path to ols_oil/results
rslt_path = main_path $+ "../results/";

// Load data
y = loadd(data_path $+ "mydata.csv");
oil_vars = xlsReadM(data_path $+ "oil.xlsx", "A2:A220");

// Estimate least squares parameters
parameters = invpd(oil_vars'oil_vars)*(oil_vars'y);

// Write estimated parameters to output file
call xlsWriteM(parameters, rslt_path $+ "oil_parms.xlsx", "A2");

Now, as long as the data, results and main sub-folders all sit inside the same ols_oil folder, this code will run on any computer, regardless of where the ols_oil folder is placed or what the GAUSS working directory is set to.
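
If many files are loaded from the data folder, the path construction can also be wrapped in a small procedure. The sketch below is one possible way to do this, not a built-in feature: dataFile is a hypothetical helper name, and it assumes the procedure is defined in a file inside ols_oil/main, since __FILE_DIR refers to the folder of the file that contains it.

```gauss
// Hypothetical helper (not part of GAUSS): returns the full path
// to a file in ols_oil/data. Assumes this proc is defined in a
// file located in ols_oil/main.
proc (1) = dataFile(fname);
    retp(__FILE_DIR $+ "../data/" $+ fname);
endp;

// Load data using the helper
y = loadd(dataFile("mydata.csv"));
oil_vars = xlsReadM(dataFile("oil.xlsx"), "A2:A220");
```

With this layout, renaming or moving the data folder only requires a change inside the helper.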

Conclusion

You have just learned how to:

  • Create variables to hold paths.
  • Use __FILE_DIR for the base path.

Together, these two steps can make your code more portable and easier to share with others.
