I have a CSV file with about 80 variables, but I do not know their names or their format/storage types. Is there a way to avoid using column vectors when subsetting a data set?
Also, how can I get a quick picture of the variable names and format/storage types contained in a CSV or Excel file? GAUSS has the functions getnamef() and loadDataVars(), which seem to do that job, but it looks like they do not work with CSV, XLSX, XLS, or TXT data files.
4 Answers
You can use the GAUSS function getHeaders to get a list of the variable names of any file that loadd can read (i.e. CSV/DAT/DTA/XLS/XLSX). For example,
fname = getGAUSSHome() $+ "examples/housing.csv";
print getHeaders(fname);
will print out:
taxes beds baths new price size
If you also want to get a sense of the data, you can use the dstatmt command to get the descriptive statistics. For example,
fname = getGAUSSHome() $+ "examples/housing.csv";
call dstatmt(fname);
will print
----------------------------------------------------------------------------------------
Variable          Mean    Std Dev    Variance    Minimum    Maximum     Valid    Missing
----------------------------------------------------------------------------------------
taxes             1908       1236   1.527e+06         20       6627       100          0
beds                 3     0.6513      0.4242          2          5       100          0
baths             1.96     0.5671      0.3216          1          4       100          0
new               0.11     0.3145     0.09889          0          1       100          0
price            155.3      101.3   1.025e+04         21        587       100          0
size              1629      666.9   4.448e+05        580       4050       100          0
I am not sure I understand your question about subsetting. I think you may mean that you need to know the variable names and types before you can subset them, and that you thought one way to find them would be to load all of the columns as separate variables.
If I am correct about this, then the output above should give you what you need. If not, let us know.
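With about 80 variables, it may be easier to store the names than to read them off the screen. The following is only a sketch, using the same example file as above; the variable name headers is just an illustration.

```gauss
// Same example file that ships with GAUSS, as used above
fname = getGAUSSHome() $+ "examples/housing.csv";

// Store the variable names in a string array instead of printing them
headers = getHeaders(fname);

// Number of variable names found in the file
print rows(headers);

// Name of the first variable
print headers[1];
```

Once the names are in a string array, you can index into it to check the exact spelling of any variable you plan to use later.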
Thanks, but this is really frustrating. My CSV file is named g2.csv. I followed (I guess) your directions, and unfortunately it did not work.
fname = "g2.csv";
fnamen = loadd(fname);
print getHeaders(fnamen);
My code has only 7 lines, but I am getting an error message on line 23:
G0041 : Argument must be scalar [parse_fname.src, line 23]
How can that be possible?
In my post "Subsetting a Dataset", I thought you suggested that I open a new post titled "Is there a way to avoid using column vectors [when subsetting a data set?]" to explore this alternative to the code snippet you provided there. This post asks that question. My example is a CSV file with 80 variables, from which I want to choose x1, x2, x10, x15, x79, and x80. Thanks!
The short answer is that you need to change the code to this:
fname = "g2.csv";
print getHeaders(fname);
or this, if you prefer
print getHeaders("g2.csv");
The function getHeaders takes a filename, loads the variable names from that file, and returns them as a string array.
The code that you posted loads the data from the file into a GAUSS matrix named fnamen, and then passes this matrix to the getHeaders function.
// Create file name.
fname = "g2.csv";
// Load data from 'g2.csv' into a GAUSS
// matrix with the name 'fnamen'
fnamen = loadd(fname);
// Pass a GAUSS matrix to 'getHeaders'
// This will cause an error
print getHeaders(fnamen);
The reason that the error was on line 23 when your code only has 7 lines is that the error occurred inside the file which contains the code for the getHeaders function. The error was caused because, as we mentioned above, getHeaders expects a 1x1 string as the input, but it got a matrix instead.
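On the subsetting question itself, one possible approach is to name the variables you want when you load the file. This is only a sketch, under two assumptions: that your GAUSS version is recent enough for loadd to accept a formula string as an optional second argument, and that g2.csv has a header row with names spelled exactly x1, x2, x10, x15, x79 and x80.

```gauss
// Assumption: g2.csv has a header row containing the names
// x1, x2, x10, x15, x79 and x80
fname = "g2.csv";

// Check the exact spelling of the names first
print getHeaders(fname);

// Load only the listed variables; each one becomes a column of 'x'
// (assumes loadd accepts a formula string as its second argument)
x = loadd(fname, "x1 + x2 + x10 + x15 + x79 + x80");

// If the assumptions hold, 'x' should have 6 columns
print cols(x);
```

This keeps the selected variables together in one matrix, so you do not have to create a separate column vector for each of them.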