Using "indices" function with GAUSS data

Hi there,

I was able to load my dataset using the following command:

dataset = loadd("my_data.dat");

However, when I try to use the indices function like this:

{ q1, g1 } = indices(dataset,"var_1");

it crashes on me. This is the crash message I get:

Currently active call:

File datafile.src, line 227, in dataopen
    if ((strindx(filename, fsep, 1)));

Traceback:

File indices.src, line 55, in indices
    f1 = dataopen(dataset, "read");

However, when I run this command:

{ q1, g1 } = indices("my_data.dat","var_1");

it works out fine.

Is there any reason why I have to point back to the file on disk to get the indices function to work? Is it possible to get this function to work with the "dataset" variable already loaded into memory?

Thanks!

7 Answers



0



accepted

You could load the variable names at the start of the program and then look up the indices with the GAUSS function indsav.

// Create a 3x1 string array, using the
// string concatenation operator '$|'
hdrs = "AG_25_34" $| "V1" $| "AG_35_44";

// Find the index of 'V1' in
// the headers string array
idx = indsav("V1", hdrs);

// The number 2 should be printed
print idx;

Character arrays are deprecated, so this next example is not recommended but may be helpful in case the rest of the program is set up to use them.

// Create a 3x1 character array, using the
// MATRIX concatenation operator '|'
hdrs = "AG_25_34" | "V1" | "AG_35_44";

// Find the index of 'V1' in
// the headers CHARACTER array
idx = indcv("V1", hdrs);

// The number 2 should be printed
print idx;
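The same lookup should also work for several names at once, which is closer to the group selection asked about in this thread. This is only a sketch: the header list, the small `dataset` matrix, and the `selection` names are made up for illustration, and it relies on `indsav` taking a string array of names as its first argument.

```gauss
// Hypothetical 3x1 string array of headers, in column order
hdrs = "AG_25_34" $| "V1" $| "AG_35_44";

// Hypothetical 2x3 data matrix whose columns match 'hdrs'
dataset = { 1 2 3,
            4 5 6 };

// Names of the columns to pull out, in the order wanted
selection = "AG_35_44" $| "AG_25_34";

// One index per name in 'selection'
sec_ind = indsav(selection, hdrs);

// Select those columns from the in-memory matrix
subset = dataset[., sec_ind];

// 'subset' should hold columns 3 and 1 of 'dataset'
print subset;
```

Because the indices come back in the order of `selection`, the columns of `subset` follow that order rather than the order in the file.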

aptech



0



After you have run the command

dataset = loadd("my_data.dat");

dataset is a GAUSS matrix containing the data from the my_data.dat file. A matrix contains only the numeric data and does not hold the variable names from the dataset file.

If you give a little more context about what you are trying to accomplish with the indices function, we might be able to give a better recommendation. For example, are you trying to locate the dependent variable in the matrix?

aptech



0



Thanks for the response!

What I'm trying to do is figure out which column contains the variable with the header Var1.

At later points in the code, I have other similar calls to this indices function where I try to get not one, but a group of indices, like this:

dataset_filename = "my_data.dat";

dataset = loadd(dataset_filename);

let selection = { AG_25_34 AG_35_44 AG_45_54 AG_55_64 AG_65_74 AG_75_84 AG_85P EDUC_2 EDUC_3 EDUC_4 EDUC_5 };

{ x1,sec_ind } = indices(dataset_filename,selection');

subset = dataset[., sec_ind];

This will store a bunch of column numbers (i.e. indices) in the variable sec_ind, which I can then use to select certain columns of my dataset variable and store them in yet a new variable called subset.

Is there a way to find the column indices (i.e. the number of the column that has a particular name) without having to point to the original file on disk?

Thanks again!



0



Perfect, I'll try to modify my code to fit the first example.

Thanks again!



0



Also, just to be clear: you're saying that if I want to get the indices of the headers of a dataset, I will HAVE to hard-code the headers into a string in the GAUSS code, and then use the indsav function, right?



0



Question
Also, just to be clear: you're saying that if I want to get the indices of the headers of a dataset, I will HAVE to hard-code the headers into a string in the GAUSS code, and then use the indsav function, right?

Answer
NO. You do NOT have to hard-code the headers as a string array. You can load the headers as a string array using getHeaders (GAUSS 18+), getnamef, or getname with the string combine operator, as mentioned in this post.
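A minimal sketch of how that might fit the code from the question, assuming GAUSS 18 or later and that my_data.dat is in the working directory. The three names in `selection` are just a shortened version of the list posted above, and the `ismiss` check is an added precaution based on the assumption that `indsav` reports names it cannot find as missing values.

```gauss
// File name taken from the question
dataset_filename = "my_data.dat";

// Load the numeric data as a matrix (no variable names)
dataset = loadd(dataset_filename);

// Read the variable names into a string array (GAUSS 18+)
hdrs = getHeaders(dataset_filename);

// On older versions, something along these lines may work instead,
// using the string combine operator '$+' on the result of getname:
// hdrs = "" $+ getname(dataset_filename);

// Names of the columns to select, as a string array
selection = "AG_25_34" $| "AG_35_44" $| "EDUC_2";

// Look up the column number of each name
sec_ind = indsav(selection, hdrs);

// Assumption: a name that is not found shows up as a missing value
if ismiss(sec_ind);
    print "At least one name in 'selection' was not found";
else;
    // Pull the selected columns out of the in-memory matrix
    subset = dataset[., sec_ind];
endif;
```

The file is still read once to get the names, but after that every lookup runs against the `hdrs` string array in memory, so later selections do not need to go back to disk.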

aptech



0



Perfect, thank you!!!
