I am trying to run simulations that require creating very large matrices, which then need to be inverted (i.e. I'm running panel fixed effects models with a very large number of fixed effects).
Say N=50 and T=50. My aim is to first create an identity matrix
neye=eye((N*(N-1)));
which I then Kronecker-multiply by a T×1 vector of ones:
bifix=ones(T,1).*.neye;
Unsurprisingly, GAUSS tells me I have insufficient memory for bifix, since this step creates a matrix of dimension 122,500 by 2,450 (over 300 million cells). Is there any way I can do this without resorting to sparse matrices? I tried increasing maxbytes to 1 GB, but I suspect that only affects reading in data, not creating data within GAUSS.
I am using GAUSS 9 at home and have access to GAUSS 11 in the office. Apologies in advance if this is a stupid question, but in Stata I can work with datasets of millions of observations and a large number of variables, so I assumed the same ought to be possible in GAUSS, too.
1 Answer
A 122,500 by 2,450 matrix of double-precision numbers would be around 2.4 GB (122,500 × 2,450 × 8 bytes). That is too large for a 32-bit version of GAUSS to allocate, and a tight fit even on machines with more memory, especially since it is probably not the only large matrix your calculations will need.
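As a quick sanity check, you can do this back-of-the-envelope footprint calculation in GAUSS itself before attempting the allocation; each matrix element is stored as an 8-byte double:

//Approximate memory footprint of a dense r x c matrix of doubles
r = 122500;
c = 2450;
print (r * c * 8) / 1e9;   //roughly 2.4 GB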
The two options for dealing with data that cannot fit into memory are either to use sparse matrices, or to read and process the data in chunks. The best format for data that you would like to process in chunks is probably a GAUSS dataset. Here is a toy example with a procedure that calculates the sum of all variables in a GAUSS dataset by reading in a few rows at a time.
new;

//Create example dataset for use below
rndseed 23543;
x = rndn(1e3, 3);
ret = saved(x, "chunk.dat", "A"$|"B"$|"C");

s = sumDatasetVars("chunk.dat");
print "the sum of the variables = " s;

proc (1) = sumDatasetVars(file_name);
    local x, chunk_size, fh, ret;

    x = 0;

    //How many rows to read in at a time
    chunk_size = 100;

    //Get handle for dataset so that we can
    //iterate over all rows in the file
    open fh = ^file_name for read;

    //Iterate over all rows in the dataset
    //and sum data
    do until eof(fh);
        x = x + sumc(readr(fh, chunk_size));
    endo;

    //Close file handle and return sum
    ret = close(fh);

    retp(x);
endp;
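The same chunked-reading pattern extends to the estimation step itself. For a regression, the cross-products X'X and X'y are only K×K and K×1, so they can be accumulated a few rows at a time and the full design matrix never has to exist in memory at once. Below is a minimal sketch along those lines; the procedure name chunkedOLS and the assumption that the dependent variable is stored in the last column of the dataset are illustrative, not part of the example above.

//Sketch: accumulate X'X and X'y in chunks, then solve the normal equations.
//Assumes (for illustration only) that the last column of the dataset is the
//dependent variable and all other columns are regressors.
proc (1) = chunkedOLS(file_name);
    local fh, chunk_size, dat, x, y, xx, xy, ret;

    chunk_size = 100;
    xx = 0;
    xy = 0;

    open fh = ^file_name for read;

    do until eof(fh);
        dat = readr(fh, chunk_size);
        y = dat[., cols(dat)];
        x = dat[., 1:cols(dat)-1];

        //The K x K and K x 1 accumulators stay small
        //no matter how many rows the file holds
        xx = xx + x'x;
        xy = xy + x'y;
    endo;

    ret = close(fh);

    //Solve the normal equations; requires X'X to be positive definite
    retp(invpd(xx) * xy);
endp;

Called as b = chunkedOLS("mydata.dat");, this returns the OLS coefficient vector while only ever holding chunk_size rows of data in memory at a time.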
Please feel encouraged to post any questions you have about applying this to your specific application.