Hi,
I am working on the panel data now, therefore, sometimes I need to merge two datasets together.
I have no problem with imporing data from excel to GAUSS by using the code:
Household_a120c=xlsReadM("Household_a120c.xls","A2:G624",1,"")
As we know that, by doing so I will lost table look of this data, it has been transformed into a matrix. It is a little bit annoying with respect to panel data. Therefore, I tried to assign variable names for the matrix like this
let xnames = ahhrhid ahifdip ahifdin ahh0_4 ahh5_9 ahh10_14 ahhadult;
let vnames = ahifdip ahifdin ahh0_4 ahh5_9 ahh10_14 ahhadult;
makevars(household_a120c, vnames, xnames);
let xnames_eperson_a120 = xwaveid ahhrhid;
let vnames_eperson_a120 = xwaveid ahhrhid;
makevars(eperson_a120, vnames_eperson_a120, xnames_eperson_a120);
However, the command above just generate vectors for me, but after that, I think I may use the mergeby function as I have already created variable names, then I did this
mergeby(household_a120c,eperson_a120,merge,ahhrhid); // ahhrhid is the ID I want to use to combine these two data sets.
Honestly, I do not think I have applied mergeby function correctly, so could you please tell how I should use it properly. Except from Mergeby function, is there any other method I can use to merge two data sets.
Thank you very much!
6 Answers
0
Please forgive me, I am having trouble understanding exactly what you want to do.
Do you want the merged data to be a file on disk? If so, do you want a GAUSS dataset (.dat) file, or do you want it to be an Excel (.xls, or .xlsx) file?
0
Hi,
I Want the merged data to be an excel file.
Thank you!
0
Hi, Maybe I need to clarify my questiuon again.
For instance, if I have
A=[ 3 3 2; 2 4 5; 8 9 11]// this is a 3*3 matrix, suppose the first column is the ID
B=[2 5 7; 8 10 12; 8 9 11;3 4 6;5 7 9]// this is a 5*3 matrix, the first column is the ID AS WELL.
Basically , what I want to do is to combine matrix A and matrix B according to their IDs in
the first column, I tried to use MERGEBY function, obviously it did not work.
Can you help me out with this?
Thank you very much!
0
Hi Aptech;
I am sorry that I post so many useless words here, I have tried serveral methods to merge two dataset, unfortunately, none of them work me at all.
Here's my question:
suppose my original datasets in the form of excel files(. xls)
My first data set(called countries) is:
Country GDP inflation
A 300 120
B 280 110
C 675 134
My second data set( called individuals) is:
id country Age sex
1 A 33 1
2 B 25 0
3 B 36 0
4 D 66 1
I'd like to achieve the combined data set that look like this
id country Age sex GDP Inflation
1 A 33 1 300 120
2 B 25 0 280 110
3 B 36 0 280 110
Therefore the unmatched observations will be eliminated.
I tried to use mergeby function as follows:
As I am using the .xls file, I first write them into .gda files like
For data set 1,
ret = gdaCreate("countries.gda",1);
ret = gdaWrite("countries.gda",countries,"country GDP inflation");
For data ste 2:
ret = gdaCreate("individuals.gda",1);
ret = gdaWrite("individuals.gda",individuals,"id age sex");
Then:
ret = gdaCreate("merge_1.gda",1);// create a null file
mergeby(individuals,countries,"","");
But, I have got error message
G0085 : Invalid file type C:\gauss14\eperson_a120.gda 'C:\gauss14\eperson_a120.gda' [sortd.src, line 431]
I msut make some mistakes here, but I did figure out what i did wrong,
Thank you!
0
Ok, I understand now. The problems you are running into are because you are using functions made for different file types. I think you will be best off to load the entire dataset in as a GAUSS matrix to do the merging and then write it to the updated data to the Excel file. Here is a procedure that will merge the matrices:
countries = { A 300 120,
B 280 110,
C 675 134 };
individuals = { 1 A 33 1,
2 B 25 0,
3 B 36 0,
4 D 66 1 };
data_full = mergeMatrices(individuals, 2, countries);
print_mask = { 1 0 1 1 1 1 };
ret = printfmt(data_full, print_mask);
//This procedure assumes:
// 1) individuals has as many rows (or more) than countries
// 2) You want to sort based upon the first column
// of 'countries' and the
// 'sort_col' column of 'individuals'
proc (1) = mergeMatrices(individuals, sort_col, countries);
local idx, individuals_new;
//Mask, permutation matrix
//that maps 'individuals' to 'countries'
idx = (individuals[.,sort_col] .== countries[.,1]');
//strip unmatched entries
individuals = selif(individuals, sumr(idx));
idx = selif(idx, sumr(idx));
//use idx as permutation matrix
//to create new columns
individuals_new = idx*countries[.,2:cols(countries)];
//Horizontaly concatenate
//new variables onto 'individuals'
individuals = individuals~individuals_new;
retp(individuals);
endp;
First make sure you can use this to arrange the data in the manner that you expect. Post any questions that come up. If you have questions about how to save this new data to an Excel file (since it is a mix of character and numeric data), please post that as a new question and we will happily assist you.
0
Hi Aptech
Thank you very much! the codes provided by you is just fatanstic!
I have a new question related to this topic now, what if I want to do something like this
countries = { A 300 120, B 280 110, C 675 134 }; individuals = { 1 A 33 1, 2 B 25 0, 3 B 36 0, 4 D 66 1 };
now I only want to create the matrix including different observations, like
id country Age sex GDP Inflation 4 D 66 1 na na 3 C na na 675 134
what should I do to get the above?
Thank you very much !
Your Answer
6 Answers
Please forgive me, I am having trouble understanding exactly what you want to do.
Do you want the merged data to be a file on disk? If so, do you want a GAUSS dataset (.dat) file, or do you want it to be an Excel (.xls, or .xlsx) file?
Hi,
I Want the merged data to be an excel file.
Thank you!
Hi, Maybe I need to clarify my questiuon again.
For instance, if I have
A=[ 3 3 2; 2 4 5; 8 9 11]// this is a 3*3 matrix, suppose the first column is the ID
B=[2 5 7; 8 10 12; 8 9 11;3 4 6;5 7 9]// this is a 5*3 matrix, the first column is the ID AS WELL.
Basically , what I want to do is to combine matrix A and matrix B according to their IDs in
the first column, I tried to use MERGEBY function, obviously it did not work.
Can you help me out with this?
Thank you very much!
Hi Aptech;
I am sorry that I post so many useless words here, I have tried serveral methods to merge two dataset, unfortunately, none of them work me at all.
Here's my question:
suppose my original datasets in the form of excel files(. xls)
My first data set(called countries) is:
Country GDP inflation
A 300 120
B 280 110
C 675 134
My second data set( called individuals) is:
id country Age sex
1 A 33 1
2 B 25 0
3 B 36 0
4 D 66 1
I'd like to achieve the combined data set that look like this
id country Age sex GDP Inflation
1 A 33 1 300 120
2 B 25 0 280 110
3 B 36 0 280 110
Therefore the unmatched observations will be eliminated.
I tried to use mergeby function as follows:
As I am using the .xls file, I first write them into .gda files like
For data set 1,
ret = gdaCreate("countries.gda",1);
ret = gdaWrite("countries.gda",countries,"country GDP inflation");
For data ste 2:
ret = gdaCreate("individuals.gda",1);
ret = gdaWrite("individuals.gda",individuals,"id age sex");
Then:
ret = gdaCreate("merge_1.gda",1);// create a null file
mergeby(individuals,countries,"","");
But, I have got error message
G0085 : Invalid file type C:\gauss14\eperson_a120.gda 'C:\gauss14\eperson_a120.gda' [sortd.src, line 431]
I msut make some mistakes here, but I did figure out what i did wrong,
Thank you!
Ok, I understand now. The problems you are running into are because you are using functions made for different file types. I think you will be best off to load the entire dataset in as a GAUSS matrix to do the merging and then write it to the updated data to the Excel file. Here is a procedure that will merge the matrices:
countries = { A 300 120,
B 280 110,
C 675 134 };
individuals = { 1 A 33 1,
2 B 25 0,
3 B 36 0,
4 D 66 1 };
data_full = mergeMatrices(individuals, 2, countries);
print_mask = { 1 0 1 1 1 1 };
ret = printfmt(data_full, print_mask);
//This procedure assumes:
// 1) individuals has as many rows (or more) than countries
// 2) You want to sort based upon the first column
// of 'countries' and the
// 'sort_col' column of 'individuals'
proc (1) = mergeMatrices(individuals, sort_col, countries);
local idx, individuals_new;
//Mask, permutation matrix
//that maps 'individuals' to 'countries'
idx = (individuals[.,sort_col] .== countries[.,1]');
//strip unmatched entries
individuals = selif(individuals, sumr(idx));
idx = selif(idx, sumr(idx));
//use idx as permutation matrix
//to create new columns
individuals_new = idx*countries[.,2:cols(countries)];
//Horizontaly concatenate
//new variables onto 'individuals'
individuals = individuals~individuals_new;
retp(individuals);
endp;
First make sure you can use this to arrange the data in the manner that you expect. Post any questions that come up. If you have questions about how to save this new data to an Excel file (since it is a mix of character and numeric data), please post that as a new question and we will happily assist you.
Hi Aptech
Thank you very much! the codes provided by you is just fatanstic!
I have a new question related to this topic now, what if I want to do something like this
countries = { A 300 120, B 280 110, C 675 134 }; individuals = { 1 A 33 1, 2 B 25 0, 3 B 36 0, 4 D 66 1 };
now I only want to create the matrix including different observations, like
id country Age sex GDP Inflation 4 D 66 1 na na 3 C na na 675 134
what should I do to get the above?
Thank you very much !