I am trying to load the data and try to see descriptive statistics to see if the data is imported correctly but I am getting some very funny results. I don't know why? I am very new to this software and the tutorials are very unhelpful. I wish they had video tutorial. Here is what I am trying, please help
file_name = "aggregatedatasolarPV.xls";
cell_range = "A1:P851" ;
sheet_num = 1;
aggregatedatasolarPV = spreadSheetReadM(file_name, cell_range, sheet_num);
call dstatmt (file_name);
--------------------------------------------------------------------------------------------- Variable Mean Std Dev Variance Minimum Maximum Valid Missing --------------------------------------------------------------------------------------------- year 2002.0000 7.2153 52.0612 1990.0000 2014.0000 850 0 country 1073254573666765297747946816697568916641665775266292004766827021037095358687220923558683385540725295037702417197841334894769908662704784141345305197249818354111865306357970851674308149453898766191693439518537416704.0000 ----- ----- 0.000024772496087516789965860831789438539772200266282491155842297326103555453912180731313459040958378647996844792989120359904725547622506791512821695542880668093348562245609954356251039500273532449672748474908805068488704.0000 850 0 patents 57.8247 216.5785 46906.2478 0.0000 2132.1700 850 0 patentstotal 12661.1066 28916.0979836140718.1148 4.0800 184283.2200 850 0 oilprice 46.7416 30.5692 934.4773 14.4200 99.6700 850 0 price_si 1620.0000 629.4777 396242.1673 1040.0000 3330.0000 850 0 EPS ----- ----- ----- 0.2083 4.1333 850 0 marketEPS ----- ----- ----- 0.0000 3.9833 850 0 nonmarketEPS ----- ----- ----- 0.0000 5.5000 850 0 RERD ----- ----- ----- 0.0000 6.0000 850 0 standards ----- ----- ----- 0.0000 6.0000 850 0 sknowledge 131.9337 500.0775 250077.5343 0.0000 4460.5834 850 0 rGDPpercap ----- ----- ----- 530.8947 91617.2792 850 0 GDP ----- ----- -----29973079669.440216208861247400.0020 850 0 GDPgrowth ----- ----- ----- -14.5311 14.2314 850 0 energyprice 63.8914 35.1641 1236.5122 23.7814 125.5648 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0 ----- ----- ----- ----- ----- 850 0
1 Answer
0
If you are loading all the data from the Excel file, it is simpler to use the loadd
command. Like this:
file_name = "aggregatedatasolarPV.xls";
// Load all variables
aggregatedatasolarPV = loadd(file_name);
Now that you have loaded them, you can print the results to the screen to inspect them with the print command, like this:
print aggregatedatasolarPV;
The reason you are seeing so many of the dstatmt
results as ---
is that your columns have missing values. You can check to see if the data has missing values with the ismiss
function. The simplest way to manage them is to remove the rows with missing values, using the function packr
. Then you can pass this matrix to dstatmt
like this:
// Print 1 if there are missing values
print ismiss(aggregatedatasolarPV);
// Remove all rows with missing values
aggregatedatasolarPV_nm = packr(aggregatedatasolarPV);
// Get descriptive statistics on the matrix
call dstatmt("", aggregatedatasolarPV_nm);
Also, FYI, I am not sure if you are aware, but your code is loading the data into a matrix, but then your call to dstatmt
reloads the data from the file before computing descriptive statistics on the data.
// Load data into GAUSS matrix
aggregatedatasolarPV = spreadSheetReadM(file_name, cell_range, sheet_num);
// Ignore data just loaded into matrix, the reload
// the data and compute descriptive stats
call dstatmt (file_name);
``
Your Answer
1 Answer
If you are loading all the data from the Excel file, it is simpler to use the loadd
command. Like this:
file_name = "aggregatedatasolarPV.xls";
// Load all variables
aggregatedatasolarPV = loadd(file_name);
Now that you have loaded them, you can print the results to the screen to inspect them with the print command, like this:
print aggregatedatasolarPV;
The reason you are seeing so many of the dstatmt
results as ---
is that your columns have missing values. You can check to see if the data has missing values with the ismiss
function. The simplest way to manage them is to remove the rows with missing values, using the function packr
. Then you can pass this matrix to dstatmt
like this:
// Print 1 if there are missing values
print ismiss(aggregatedatasolarPV);
// Remove all rows with missing values
aggregatedatasolarPV_nm = packr(aggregatedatasolarPV);
// Get descriptive statistics on the matrix
call dstatmt("", aggregatedatasolarPV_nm);
Also, FYI, I am not sure if you are aware, but your code is loading the data into a matrix, but then your call to dstatmt
reloads the data from the file before computing descriptive statistics on the data.
// Load data into GAUSS matrix
aggregatedatasolarPV = spreadSheetReadM(file_name, cell_range, sheet_num);
// Ignore data just loaded into matrix, the reload
// the data and compute descriptive stats
call dstatmt (file_name);
``