Hi,
I have a matrix X in Gauss that is 10000 x 3 with numbers, one digit (number) per column but there is no variable name in each column. I want to tabulate them and get shares per category. More precise, I first need to merge them to then tabulate them. For example,
WANT
1 2 3 123
2 1 3 213
1 2 3 123
3 1 2 312
Then, I need to tabulate them,
NUMBER %
123 2 50%
213 1 25%
312 1 25%
How can I do that in Gauss?
3 Answers
1
I think the dstat
add-on module (not the built-in dstat
function) might have this capability. However, if I am understanding what you are trying to do. You can make some code to do it yourself in GAUSS.
I am assuming that you are starting with a matrix like this:
1 2 3 2 1 3 1 2 3 3 1 2
and you want to end up with a little print-out like this:
NUMBER % 123 2 50% 213 1 25% 312 1 25%
The first thing we will need to do is to convert each row of integers into a 3 digit integer:
// Starting data
X = { 1 2 3,
2 1 3,
1 2 3,
3 1 2 };
// Multiplier for each column
mlt = { 100 10 1 };
// Multiply each column
// so we can add them and get
// a 3 digit number
X_mlt = X .* mlt;
// Sum the rows
X_id = sumr(X_mlt);
print X_id;
The above code will print the following results:
123 213 123 312
Now that we have the integer id's for the rows, we need to:
- Figure out what our unique id's are.
- Compute what percentage corresponds to each unique id.
// Find the unique id's
unique_ids = unique(X_id);
// Create a matrix with the same
// number of rows as X_id and the
// same number of columns as unique_ids
// This will have a 1 for a match or a 0
// for no match
matches = X_id .== unique_ids';
// Sum down the columns to count the
// total found for each id
id_tot = sumc(matches);
// Percentage for each id
id_pct = id_tot ./ rows(X_id);
/*
** Make an output table
*/
// $~ does horizontal string concatenation
head = "ID" $~ "Count" $~ "%";
// the ~ operator performs horizontal concatenation
// ntos turns numbers into strings
body = ntos(unique_ids ~ id_tot ~ id_pct);
// $| performs vertical string concatenation
tbl = head $| body;
print tbl;
This code will return
ID Count % 123 2 0.5 213 1 0.25 312 1 0.25
We could make this code much more compact, but it is probably useful for you to see each step individually. I would encourage you to run the code and look at each intermediate variable to make sure you understand what is happening.
0
Thank you so much for your prompt response. I think the proposed solutions works great, but I wondered if you could please add to that the case of having zeros in an X matrix:
0 1 2
2 0 1
2 0 1
1 2 0
Your solution above without zeros in X does not work (and it is my fault) since the resulting X_id will not contain the zero in the beginning of each row, but a 2-digit number.
Thanks again!
0
One way you could do it is to add 1000 to the front of each id and then after it is converted to a string array, remove the leading 1.
// Add 1e(cols(X)) to each id before converting to a string
body = ntos((unique_ids + 10^cols(X))~ id_tot ~ id_pct);
// Remove the first character of the 'id'
body[.,1] = strtruncl(body[.,1], 1);
Your Answer
3 Answers
I think the dstat
add-on module (not the built-in dstat
function) might have this capability. However, if I am understanding what you are trying to do. You can make some code to do it yourself in GAUSS.
I am assuming that you are starting with a matrix like this:
1 2 3 2 1 3 1 2 3 3 1 2
and you want to end up with a little print-out like this:
NUMBER % 123 2 50% 213 1 25% 312 1 25%
The first thing we will need to do is to convert each row of integers into a 3 digit integer:
// Starting data
X = { 1 2 3,
2 1 3,
1 2 3,
3 1 2 };
// Multiplier for each column
mlt = { 100 10 1 };
// Multiply each column
// so we can add them and get
// a 3 digit number
X_mlt = X .* mlt;
// Sum the rows
X_id = sumr(X_mlt);
print X_id;
The above code will print the following results:
123 213 123 312
Now that we have the integer id's for the rows, we need to:
- Figure out what our unique id's are.
- Compute what percentage corresponds to each unique id.
// Find the unique id's
unique_ids = unique(X_id);
// Create a matrix with the same
// number of rows as X_id and the
// same number of columns as unique_ids
// This will have a 1 for a match or a 0
// for no match
matches = X_id .== unique_ids';
// Sum down the columns to count the
// total found for each id
id_tot = sumc(matches);
// Percentage for each id
id_pct = id_tot ./ rows(X_id);
/*
** Make an output table
*/
// $~ does horizontal string concatenation
head = "ID" $~ "Count" $~ "%";
// the ~ operator performs horizontal concatenation
// ntos turns numbers into strings
body = ntos(unique_ids ~ id_tot ~ id_pct);
// $| performs vertical string concatenation
tbl = head $| body;
print tbl;
This code will return
ID Count % 123 2 0.5 213 1 0.25 312 1 0.25
We could make this code much more compact, but it is probably useful for you to see each step individually. I would encourage you to run the code and look at each intermediate variable to make sure you understand what is happening.
Thank you so much for your prompt response. I think the proposed solutions works great, but I wondered if you could please add to that the case of having zeros in an X matrix:
0 1 2
2 0 1
2 0 1
1 2 0
Your solution above without zeros in X does not work (and it is my fault) since the resulting X_id will not contain the zero in the beginning of each row, but a 2-digit number.
Thanks again!
One way you could do it is to add 1000 to the front of each id and then after it is converted to a string array, remove the leading 1.
// Add 1e(cols(X)) to each id before converting to a string
body = ntos((unique_ids + 10^cols(X))~ id_tot ~ id_pct);
// Remove the first character of the 'id'
body[.,1] = strtruncl(body[.,1], 1);