Hi, Aptech
Refers to my last post /questions/application-of-mergeby-and-generate-varibales-functions/ , now I have a new question in terms of it.
I am sorry that I open a new question as it seems that noboby noticed that I had added something new to my old question.
Now I want to know, if I have datasets like:
countries = { A 300 120, B 280 110, C 675 134 }; individuals = { 1 A 33 1, 2 B 25 0, 3 B 36 0, 4 D 66 1 };
How can I get ?
id country Age sex GDP Inflation 4 D 66 1 na na 3 C na na 675 134
This means I want to generate a new matrix only including unmatched observations.
Thank you very much!!!
3 Answers
0
Hello,
I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.
If you straighten me out on this point, we should be able to help out!
0
Hi, Aptech
I am sorry that there was a mistake.
The matrix should be
id country Age sex GDP Inflation 4 D 66 1 na na na C na na 675 134
As there is no ID assigned to country C.
Thank you very much!
0
This should do what you want.
new;
countries = { A 300 120,
B 280 110,
C 675 134 };
individuals = { 1 A 33 1,
2 B 25 0,
3 B 36 0,
4 D 66 1 };
num_vars = 6;
country_column = 1;
individual_column = 2;
merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
//** country_column --- column to compare with 'individuals'
//** individual_column --- column to compare with 'countries'
//** num_vars --- total number of variables between both matrices
//** WARNING: This procedure assumes that the order of the variables:
//******* in 'countries' is: country, GDP, inflation
//******* in 'individuals' is: id, country, age, sex
proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
local mask, country_unique, individuals_unique, country_out, individuals_out, final_out;
mask = indcv(countries[.,country_column], individuals[.,individual_column]);
country_unique = selif(countries, (mask .== error(0)));
mask = indcv(individuals[.,individual_column], countries[.,country_column]);
individuals_unique = selif(individuals, (mask .== error(0)));
country_out = reshape(error(0), rows(country_unique), num_vars);
country_out[.,2] = country_unique[.,1];
country_out[.,5:6] = country_unique[.,2:3];
individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique));
final_out = individuals_out | country_out;
retp(final_out);
endp;
Your Answer
3 Answers
Hello,
I am confused on one point. In your desired output, why does the second observation get an id of 3? From the two input matrices/datasets, I don't see how we get an id number for that second observation.
If you straighten me out on this point, we should be able to help out!
Hi, Aptech
I am sorry that there was a mistake.
The matrix should be
id country Age sex GDP Inflation 4 D 66 1 na na na C na na 675 134
As there is no ID assigned to country C.
Thank you very much!
This should do what you want.
new;
countries = { A 300 120,
B 280 110,
C 675 134 };
individuals = { 1 A 33 1,
2 B 25 0,
3 B 36 0,
4 D 66 1 };
num_vars = 6;
country_column = 1;
individual_column = 2;
merged_data = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
//** country_column --- column to compare with 'individuals'
//** individual_column --- column to compare with 'countries'
//** num_vars --- total number of variables between both matrices
//** WARNING: This procedure assumes that the order of the variables:
//******* in 'countries' is: country, GDP, inflation
//******* in 'individuals' is: id, country, age, sex
proc (1) = mergeUnique(countries, country_column, individuals, individual_column, num_vars);
local mask, country_unique, individuals_unique, country_out, individuals_out, final_out;
mask = indcv(countries[.,country_column], individuals[.,individual_column]);
country_unique = selif(countries, (mask .== error(0)));
mask = indcv(individuals[.,individual_column], countries[.,country_column]);
individuals_unique = selif(individuals, (mask .== error(0)));
country_out = reshape(error(0), rows(country_unique), num_vars);
country_out[.,2] = country_unique[.,1];
country_out[.,5:6] = country_unique[.,2:3];
individuals_out = individuals_unique~reshape(error(0), rows(individuals_unique), num_vars-cols(individuals_unique));
final_out = individuals_out | country_out;
retp(final_out);
endp;