Is there a way to avoid using column vectors when subsetting a data set (e.g. CSV, XLSX, XLS, TXT files)?

I have a CSV file with about 80 variables, but I do not know their names or their format/storage types. Is there a way to avoid using column vectors when subsetting a data set?

Also, how can I get a quick picture of the variable names and format/storage types contained in a CSV or Excel file? GAUSS has the functions getnamef() and loadDataVars(), which seem to do that job, but it looks like these functions are not for CSV, XLSX, XLS, or TXT data files.

4 Answers



You can use the GAUSS function getHeaders to get a list of the variable names of any file that loadd can read (i.e. CSV/DAT/DTA/XLS/XLSX). For example,

fname = getGAUSSHome() $+ "examples/housing.csv";
print getHeaders(fname);

will print out:

           taxes
            beds
           baths
             new
           price
            size

If you also want to get a sense of the data, you can use the dstatmt command to get the descriptive statistics. For example,

fname = getGAUSSHome() $+ "examples/housing.csv";

call dstatmt(fname);

will print

----------------------------------------------------------------------------------------
Variable        Mean     Std Dev      Variance     Minimum     Maximum     Valid Missing
----------------------------------------------------------------------------------------

taxes           1908        1236     1.527e+06          20        6627       100    0
beds               3      0.6513        0.4242           2           5       100    0
baths           1.96      0.5671        0.3216           1           4       100    0
new             0.11      0.3145       0.09889           0           1       100    0
price          155.3       101.3     1.025e+04          21         587       100    0
size            1629       666.9     4.448e+05         580        4050       100    0 

I am not sure I understand your question about subsetting. Perhaps you mean that you need to know the variable names and types before you can subset them, and you thought that one way to find them out would be to load every column as a separate variable.

If that is correct, then the functions above should give you the information you need. If not, let us know.

aptech

Thanks, but this is really frustrating. My CSV file name is g2.csv. I followed (I think) your directions, but unfortunately it did not work.

fname = "g2.csv";
fnamen = loadd(fname);
print getHeaders(fnamen);

My code only has 7 lines, but I am getting an error message on line 23:

G0041 : Argument must be scalar [parse_fname.src, line 23]

How can that be possible?



In my post "Subsetting a Dataset", I understood you to suggest opening a new post titled "Is there a way to avoid using column vectors [when subsetting a data set?]" to explore an alternative to the code snippet you provided there. This post asks that question. My example is a CSV file with 80 variables, from which I want to select x1, x2, x10, x15, x79, and x80. Thanks!



The short answer is that you need to change the code to this:

fname = "g2.csv";
print getHeaders(fname);

or this, if you prefer:

print getHeaders("g2.csv");

The function getHeaders takes a filename, then loads the variable names from this file and returns them as a string array.
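
As a hedged illustration of working with that return value, the sketch below stores the names in a variable instead of printing them directly. It assumes that g2.csv is in the current working directory and has at least 5 columns; the variable name headers is only illustrative.

```gauss
// Store the variable names instead of printing them directly.
// Assumes 'g2.csv' is in the current working directory.
headers = getHeaders("g2.csv");

// Number of variables found in the file
print "Number of variables:" rows(headers);

// First five names only (assumes the file has at least 5 columns)
print headers[1:5];
```

With about 80 variables, printing the names in smaller ranges like this may be easier to read than one long list.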

The code that you posted is loading the data from the file into a GAUSS matrix named fnamen. Then it is passing this matrix to the getHeaders function.

// Create file name. 
fname = "g2.csv";

// Load data from 'g2.csv' into a GAUSS
// matrix with the name 'fnamen'
fnamen = loadd(fname);

// Pass a GAUSS matrix to 'getHeaders'
// This will cause an error
print getHeaders(fnamen);
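
One way to see the difference between the two variables is the type function. The sketch below assumes the type codes I recall from the GAUSS documentation (13 for a string, 6 for a matrix); the value reported for loaded data may differ in your GAUSS version, so treat the comments as expectations rather than guaranteed output.

```gauss
fname = "g2.csv";
fnamen = loadd(fname);

// 'fname' holds the file name, so it is a string
// (type code 13 is expected here)
print type(fname);

// 'fnamen' holds the loaded data, not a file name
// (type code 6, a matrix, is expected here)
print type(fnamen);
```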

The error was reported on line 23, even though your code only has 7 lines, because it occurred inside the source file named in the message (parse_fname.src), which contains code that getHeaders runs, not inside your program. The error was caused because, as mentioned above, getHeaders expects a 1x1 string as its input, but it received a matrix instead.
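
Returning to the subsetting question: once the names are known, one option that avoids creating a separate column vector for each variable is to pass a formula string as the second argument to loadd, so that only the listed columns are loaded. The sketch below assumes a GAUSS version that supports formula strings and assumes that the file really contains headers named x1, x2, x10, x15, x79, and x80, as in the question. The names x, all_data, and x_sub are illustrative.

```gauss
// Load only the named columns from the file.
// The column names come from the question and are assumed to match
// the headers in 'g2.csv'.
x = loadd("g2.csv", "x1 + x2 + x10 + x15 + x79 + x80");

// Alternative: load everything, then select columns by position.
// This assumes x1..x80 are stored in columns 1..80, in that order.
all_data = loadd("g2.csv");
x_sub = all_data[., 1 2 10 15 79 80];
```

In both cases the result is a single matrix with six columns, so no individual column vectors need to be created.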

aptech

