Hi,
How do you append two datasets and what if the names of the variables in two datasets are the same, but their data types differ? For example, in one dataset the data type for a variable is numeric, but in the other dataset it is string?
Also, how do you append multiple say 10 csv or Excel files in Gauss?
Thanks!
1 Answer
0
Combine dataset variables of different types
It is fine to have variables of different types in a dataset. For example, it is OK to have 3 numeric variables and 2 string variables.
However, you really want each variable to maintain the same data type for all observations. To make things simple, let's say that you have datasets voters.csv
and voters.xlsx
and that it only has one variable, did_vote
. In voters.csv
, did_vote
can be either a 0 if they did not vote, or a 1 if they did vote. In voters.xlsx
, did_vote
can be either a "No"
if they did not vote, or a "Yes"
if they did vote. Then you could do this:
// Load binary vector
vote_num = loadd("voters.csv", "did_vote");
// Load string vector
vote_str = xlsReadSA("voters.xlsx", "A2");
// Convert string variable to binary variable
from = "No" $| "Yes";
to = { 0, 1 };
vote_str = reclassify(vote_str, from, to);
// Combine both vectors (which are now numeric) into a single column vector
vote_num = vote_num | vote_str;
// Create a new Excel dataset with all observations
call saved(vote_num, "new_voters.xlsx", "did_vote");
If you want an example files to make the above example work, you can run this code:
// Create a vector of random zeros and ones
x1 = rndBernoulli(9, 1, 0.5);
saved(x1, "voters.csv", "did_vote");
// Create a vector of random 1's and 2's
idx = rndi(6, 1, 1|2);
// Create a vector "Yes" and "No"
txt = "Yes" $| "No";
x2 = txt[idx];
xlsWrite("did_vote", "voters.xlsx", "A1");
xlsWrite(x2, "voters.xlsx", "A2");
Your Answer
1 Answer
Combine dataset variables of different types
It is fine to have variables of different types in a dataset. For example, it is OK to have 3 numeric variables and 2 string variables.
However, you really want each variable to maintain the same data type for all observations. To make things simple, let's say that you have datasets voters.csv
and voters.xlsx
and that it only has one variable, did_vote
. In voters.csv
, did_vote
can be either a 0 if they did not vote, or a 1 if they did vote. In voters.xlsx
, did_vote
can be either a "No"
if they did not vote, or a "Yes"
if they did vote. Then you could do this:
// Load binary vector
vote_num = loadd("voters.csv", "did_vote");
// Load string vector
vote_str = xlsReadSA("voters.xlsx", "A2");
// Convert string variable to binary variable
from = "No" $| "Yes";
to = { 0, 1 };
vote_str = reclassify(vote_str, from, to);
// Combine both vectors (which are now numeric) into a single column vector
vote_num = vote_num | vote_str;
// Create a new Excel dataset with all observations
call saved(vote_num, "new_voters.xlsx", "did_vote");
If you want an example files to make the above example work, you can run this code:
// Create a vector of random zeros and ones
x1 = rndBernoulli(9, 1, 0.5);
saved(x1, "voters.csv", "did_vote");
// Create a vector of random 1's and 2's
idx = rndi(6, 1, 1|2);
// Create a vector "Yes" and "No"
txt = "Yes" $| "No";
x2 = txt[idx];
xlsWrite("did_vote", "voters.xlsx", "A1");
xlsWrite(x2, "voters.xlsx", "A2");