Appending Data Sets

Hi,

How do you append two datasets and what if the names of the variables in two datasets are the same, but their data types differ? For example, in one dataset the data type for a variable is numeric, but in the other dataset it is string?

Also, how do you append multiple say 10 csv or Excel files in Gauss?

Thanks!

 

 

1 Answer



0



Combine dataset variables of different types

It is fine to have variables of different types in a dataset. For example, it is OK to have 3 numeric variables and 2 string variables.

However, you really want each variable to maintain the same data type for all observations. To make things simple, let's say that you have datasets voters.csv and voters.xlsx and that it only has one variable, did_vote. In voters.csv, did_vote can be either a 0 if they did not vote, or a 1 if they did vote. In voters.xlsx, did_vote can be either a "No" if they did not vote, or a "Yes" if they did vote. Then you could do this:

// Load binary vector
vote_num = loadd("voters.csv", "did_vote");

// Load string vector
vote_str = xlsReadSA("voters.xlsx", "A2");

// Convert string variable to binary variable
from = "No" $| "Yes";
to = { 0, 1 };
vote_str = reclassify(vote_str, from, to);

// Combine both vectors (which are now numeric) into a single column vector
vote_num = vote_num | vote_str;

// Create a new Excel dataset with all observations
call saved(vote_num, "new_voters.xlsx", "did_vote");

If you want an example files to make the above example work, you can run this code:

// Create a vector of random zeros and ones
x1 = rndBernoulli(9, 1, 0.5);

saved(x1, "voters.csv", "did_vote");

// Create a vector of random 1's and 2's
idx = rndi(6, 1, 1|2);

// Create a vector "Yes" and "No"
txt = "Yes" $| "No";
x2 = txt[idx];

xlsWrite("did_vote", "voters.xlsx", "A1");
xlsWrite(x2, "voters.xlsx", "A2");

aptech

1,773

Your Answer

1 Answer

0

Combine dataset variables of different types

It is fine to have variables of different types in a dataset. For example, it is OK to have 3 numeric variables and 2 string variables.

However, you really want each variable to maintain the same data type for all observations. To make things simple, let's say that you have datasets voters.csv and voters.xlsx and that it only has one variable, did_vote. In voters.csv, did_vote can be either a 0 if they did not vote, or a 1 if they did vote. In voters.xlsx, did_vote can be either a "No" if they did not vote, or a "Yes" if they did vote. Then you could do this:

// Load binary vector
vote_num = loadd("voters.csv", "did_vote");

// Load string vector
vote_str = xlsReadSA("voters.xlsx", "A2");

// Convert string variable to binary variable
from = "No" $| "Yes";
to = { 0, 1 };
vote_str = reclassify(vote_str, from, to);

// Combine both vectors (which are now numeric) into a single column vector
vote_num = vote_num | vote_str;

// Create a new Excel dataset with all observations
call saved(vote_num, "new_voters.xlsx", "did_vote");

If you want an example files to make the above example work, you can run this code:

// Create a vector of random zeros and ones
x1 = rndBernoulli(9, 1, 0.5);

saved(x1, "voters.csv", "did_vote");

// Create a vector of random 1's and 2's
idx = rndi(6, 1, 1|2);

// Create a vector "Yes" and "No"
txt = "Yes" $| "No";
x2 = txt[idx];

xlsWrite("did_vote", "voters.xlsx", "A1");
xlsWrite(x2, "voters.xlsx", "A2");


You must login to post answers.

Have a Specific Question?

Get a real answer from a real person

Need Support?

Get help from our friendly experts.