I have an unbalanced panel setup. I have a variable of interest that I would like to sum up across groups for all individuals. Basically, something similar to Matlab's grpstats command. I would rather not create a huge list of dummy variables given the size of my panel.
data example:
person, group, x
1,1,1
1,2,3
2,1,1
Desired output:
person, group, x, group sum
1,1,1, 4
1,2,3,4
2,1,1,1
1 Answer
0
accepted
Below is a procedure that, I think, will do what you want. It would be simple to extend it to also allow computation of statistics other than just the sum. Also, with GAUSS 17's new ability to load data from CSV and Excel files by variable name, you could also make it operate on a file and specify the 'id' and 'sum' columns by name.
//Starting panel
data = { 1 1 1,
1 2 3,
2 1 1 };
//Which column should be used to define the selected group
id_col = 1;
//Which column should be summed?
col_sum = 3;
//Create new panel matrix with additional column
//containing, sum of column 3 by group
data_new = grpSum(data, id_col, col_sum);
proc (1) = grpSum(data, id_col, col_sum);
local out, groups, group_idx, group_var, group_sum;
//Add an extra column to hold the sums
out = data ~ zeros(rows(data), 1);
//List of all groups
groups = unique(data[.,id_col]);
//Loop over each group
for i(1, rows(groups), 1);
//Find indices of the i'th group
group_idx = indexcat(data[.,id_col] .== groups[i], 1);
//Extract data to sum for i'th group
group_var = data[group_idx,col_sum];
//Reshape to correct size for write
group_sum = reshape(sumc(group_var), rows(group_idx), 1);
//Write sums in appropriate location
out[group_idx,cols(out)] = group_sum;
endfor;
retp(out);
endp;
Your Answer
1 Answer
Below is a procedure that, I think, will do what you want. It would be simple to extend it to also allow computation of statistics other than just the sum. Also, with GAUSS 17's new ability to load data from CSV and Excel files by variable name, you could also make it operate on a file and specify the 'id' and 'sum' columns by name.
//Starting panel
data = { 1 1 1,
1 2 3,
2 1 1 };
//Which column should be used to define the selected group
id_col = 1;
//Which column should be summed?
col_sum = 3;
//Create new panel matrix with additional column
//containing, sum of column 3 by group
data_new = grpSum(data, id_col, col_sum);
proc (1) = grpSum(data, id_col, col_sum);
local out, groups, group_idx, group_var, group_sum;
//Add an extra column to hold the sums
out = data ~ zeros(rows(data), 1);
//List of all groups
groups = unique(data[.,id_col]);
//Loop over each group
for i(1, rows(groups), 1);
//Find indices of the i'th group
group_idx = indexcat(data[.,id_col] .== groups[i], 1);
//Extract data to sum for i'th group
group_var = data[group_idx,col_sum];
//Reshape to correct size for write
group_sum = reshape(sumc(group_var), rows(group_idx), 1);
//Write sums in appropriate location
out[group_idx,cols(out)] = group_sum;
endfor;
retp(out);
endp;