I have a CSV file. How do I create a DAT (GAUSS) file (not in the GAUSS memory) based on the CSV file?
y = saved() seems to work, but it seems to save the data in GAUSS memory, and I do not know how to open it. Are there any other ways of creating a DAT file from a CSV file?
There are three steps:
- Loading the .CSV file
- Saving the data in the desired format (i.e. .DAT)
- Loading the data again.
Loading the CSV file
Assuming that we have 10 rows and 2 columns of data in a file named sample.csv, you can load it like this:
//load the data
load x[] = sample.csv;

//reshape the data into a 10x2 matrix
x = reshape(x, 10, 2);
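If your GAUSS version includes csvReadM, you can skip the reshape step, because it reads the CSV file straight into a matrix with one column per CSV column. This is a minimal sketch that assumes sample.csv has no header row and only numeric data; check the csvReadM entry in your version's command reference before relying on the argument order.

```gauss
// Sketch: read sample.csv directly into a matrix.
// Assumes a GAUSS version that provides csvReadM
// and a CSV file with no header row.
fname = "sample.csv";

// Start reading at row 1 and read to the end of the file;
// the column count comes from the file itself, so no reshape is needed
x = csvReadM(fname, 1);

print "rows: " rows(x) "  cols: " cols(x);
```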
Saving the data
Now that the data is loaded into memory in a variable x, you can save the data into a dataset or matrix file for use later with GAUSS.
Example 1: Create a dataset file mydata.dat
string vnames = { "AGE", "HEIGHT" };
y = saved(x, "mydata", vnames);
After executing the lines above, you will still have the matrix x in GAUSS's memory, and you will also have a GAUSS dataset on disk named mydata.dat. You can load this dataset into another variable with the loadd command (or pass it to one of the GAUSS functions that take a dataset as an input).
newx = loadd("mydata");
After this command, newx and x should contain the same values. GAUSS also has lower-level functions for reading specific rows or iterating over a dataset a few rows at a time; for that, take a look at readr.
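As a rough sketch of the readr approach, the loop below opens mydata.dat, reads it four rows at a time until the end of the file, and then closes it. It assumes mydata.dat was already created by the saved call above and sits in the current working directory; the chunk size of 4 is arbitrary.

```gauss
// Sketch: read mydata.dat a few rows at a time with readr.
// Assumes mydata.dat exists in the current working directory.
open fh = mydata;
if fh == -1;
    print "could not open mydata.dat";
    end;
endif;

total_rows = 0;
do until eof(fh);
    // Read up to 4 rows; the last chunk may be shorter
    chunk = readr(fh, 4);
    total_rows = total_rows + rows(chunk);
endo;

// close returns 0 on success
fh = close(fh);

print "rows read: " total_rows;
```

Reading in chunks like this is mainly useful when a dataset is too large to load all at once with loadd.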
Example 2: Create a GAUSS matrix file x.fmt
save x;
This will save the data in x into a file on disk named x.fmt. It is very fast and simple to use, but does not allow you to associate variable names, nor does it allow you to later read it a few rows at a time.
To later load this matrix file, you can do this:
load x;
This will create a new variable x (or overwrite an existing one) with the contents of the file x.fmt.
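The load statement also accepts the form `load name = file;`, which reads a .fmt file into a variable with a different name. A small round-trip sketch, with random placeholder data standing in for your CSV values:

```gauss
// Sketch: round trip through a .fmt matrix file.
x = rndu(10, 2);      // placeholder data for illustration only

save x;               // writes x.fmt in the current working directory

// Read x.fmt into a differently named variable,
// leaving the existing x untouched
load x_copy = x;

// == on two matrices returns 1 only if every element matches
if x_copy == x;
    print "x.fmt round trip OK";
endif;
```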
This is great feedback, but how do I get to the file named mydata.dat, which you say is on my disk? What I need is to create a file with the .dat extension in my C: directory that I can take with me. Also, the file named newx only has the numerical values and not the variable names. How do I get the file newx with the .dat extension and with the variable names? Thank you very much.
Following the example above:
string vnames = { "AGE", "HEIGHT" };
y = saved(x, "mydata", vnames);
You will end up with a file mydata.dat in your current GAUSS working directory. You know it will be called mydata.dat, because the second input to saved specifies the name of the file to create. saved will always add a .DAT file extension.
If you do not know your GAUSS current working directory, you can find it in the main toolbar at the top of the application, or enter: cdir(0); at the command line.
If you need a .DAT file with the variable names, ignore the second example above. That is another method of saving and loading just numerical data that can be more convenient in some circumstances. But, just for the sake of clarity, newx is a GAUSS variable in memory in that example. It is not a file.
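If you want the .dat file in a specific folder rather than the working directory, you can pass a path as the second input to saved. The sketch below assumes a hypothetical folder C:\gauss_data that already exists, and uses random placeholder data in place of your CSV values. It then reads the variable names back from the file with getname, which shows that the names travel with the .dat file itself.

```gauss
// Sketch: save the dataset to an explicit folder.
// C:\gauss_data is a hypothetical folder that must already exist.
// "\\" is how a single backslash is written inside a GAUSS string.
string vnames = { "AGE", "HEIGHT" };
x = rndu(10, 2);      // placeholder data for illustration only

fname = "C:\\gauss_data\\mydata";
y = saved(x, fname, vnames);   // creates C:\gauss_data\mydata.dat

// The variable names are stored inside the .dat file;
// getname returns them as a character vector, printed with $
vn = getname(fname);
print $vn;
```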
Let me know if something is still unclear.
Could you please provide some directions to create a .DAT GAUSS file when having a .CSV file with the variable names and a .CSV file without the variable names? That would complete the whole picture about the question of going from .CSV to .DAT. Thanks.
Here are two examples that load a CSV file and create a .DAT file on your disk. I have attached the CSV files needed to run them.
Example 1: CSV file WITHOUT variable names
Download population_a.csv
//load data from CSV file
load tmp[] = population_a.csv;

//reshape data into 10x2 matrix
tmp = reshape(tmp, 10, 2);

//These are the variable names
//to add to the dataset file
//'AGE' is first since it corresponds
//to the first column of 'tmp'
string var_names = { "AGE", "HEIGHT" };

dataset_name = "age_and_height";

//Create dataset named age_and_height.dat
//with 2 variables 'AGE' and 'HEIGHT'
y = saved(tmp, dataset_name, var_names);

print "Dataset: " dataset_name$+".dat" " created in: " cdir(0);

//Check current directory for all files with a .dat extension
dat_files = filesa("*.dat");
print "";
print ".dat file(s) in the current directory:";
print dat_files;
Example 2: CSV file WITH variable names
Download population_a_vars.csv
//load data from CSV file
load tmp[] = population_a_vars.csv;

//reshape data into 11x2 matrix
//first row is variable names
tmp = reshape(tmp, 11, 2);

//Extract variable names
//from first row of 'tmp' and
//transpose them into a 2x1 column
var_names_2 = tmp[1,.]';

dataset_name = "age_and_height_varnames";

//strip off variable names from data
tmp = tmp[2:rows(tmp), .];

//Create dataset named age_and_height_varnames.dat
//with variable names from CSV file
y = saved(tmp, dataset_name, var_names_2);
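An alternative for the header case, assuming your GAUSS version includes csvReadSA and csvReadM, is to read the header row as strings and the numbers separately, so you do not need to know the row count in advance for reshape. This is a sketch under the assumption that population_a_vars.csv has exactly one header row followed by numeric rows; the saved call is the same one used above, here given a string array of names.

```gauss
// Sketch: read the header and the data separately.
// Assumes one header row followed by numeric rows.
fname = "population_a_vars.csv";

// Row 1 only (first|last row), returned as a 1xK string array
header = csvReadSA(fname, 1|1);

// Transpose into a Kx1 column of variable names
var_names = header';

// Numeric data from row 2 to the end of the file
tmp = csvReadM(fname, 2);

y = saved(tmp, "age_and_height_varnames", var_names);
```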