HOW TO DROP TOP 5% OBSERVATIONS IN YOUR DATA

Hello,

I am a new GAUSS user. Could you please tell me how I can drop the top 5% of observations in my data set? I did not find any code related to this, but I think I could sort all the observations and then delete rows from the end, where the number of rows deleted is 5% of the number of observations.

Can anyone help me with this?

Thank you very much.

1 Answer



I am not certain that I am understanding your question correctly. Below is something that may help you. If it does not answer your question, let us know and we will be happy to provide more help.

Deleting the first 5% of your observations
Let us suppose that we have one variable with 100 observations. For our example, we will create a random normal vector to represent this variable. We can remove the first 5% of observations by using an indexing operation to select the final 95% like this:

//create example variable
x_1 = rndn(100, 1);

//assign 'x_1' to equal the last 95% of observations
x_1 = x_1[6:100];

Since we will not always have 100 observations and would like code that works even if we get more data, we should make the code more abstract. In this next code snippet, we will use the rows function to calculate the length of our vector and the ceil function to round up in case 5% of our total number of rows is not a whole number.

//create example variable
x_1 = rndn(100, 1);

//calculate index of first row we want to keep:
//skip the first 5% of rows (rounded up), then start at the next row
start_idx = ceil(rows(x_1) * 0.05) + 1;

//assign 'x_1' to equal the last 95% of observations
x_1 = x_1[start_idx:rows(x_1)];
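
If the goal is instead to drop the 5% of observations with the largest values, as the question seems to suggest, one possible approach is to sort the data first and then drop rows from the end. The following is only a minimal sketch, assuming a single-column vector; the names x_sorted, n_drop and last_idx are illustrative and not from the original post.

```gauss
//create example variable
x_1 = rndn(100, 1);

//sort in ascending order on column 1, so the largest values come last
x_sorted = sortc(x_1, 1);

//number of observations to drop: 5% of the rows, rounded up
n_drop = ceil(rows(x_sorted) * 0.05);

//index of the last row we want to keep
last_idx = rows(x_sorted) - n_drop;

//keep everything except the largest 5% of observations
x_trimmed = x_sorted[1:last_idx];
```

Note that the result is in sorted order, not the original order. For a data set with several columns, the same idea should carry over by sorting on the column of interest and keeping all columns when indexing, for example x_sorted[1:last_idx, .].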

aptech

