Introduction
Panel data offers a unique opportunity to examine both individual-specific and time-specific effects. However, as anyone who has worked with panel data knows, these same features that make panel data so useful can also make exploration and cleaning particularly challenging.
GAUSS 25 was designed with these challenges in mind. It introduces a comprehensive new suite of tools, tailored to make working with panel data in GAUSS easier, faster, and more intuitive.
In today's blog, we’ll demonstrate how these tools can simplify everyday panel data tasks, including:
- Loading your data.
- Preparing your panel dataset.
- Exploring panel data characteristics.
- Visualizing panel data.
- Transforming your data for modeling.
Data
Today we will work use a subset of the publicly available Penn World Table version 10.01, available for download here.
|
|
---|---|
Variable Name | Description |
currency_unit | The currency unit used for GDP measurements. |
countrycode | The three-letter ISO country code. |
country | The name of the country. |
year | The year of observation. |
rgdpe | Real GDP at constant prices (expenditure-side). |
rgdpo | Real GDP at constant prices (output-side). |
pop | Population of the country. |
emp | Number of employed persons. |
irr | Investment rate of return. |
Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), "The Next Generation of the Penn World Table" American Economic Review, 105(10), 3150-3182, available for download at www.ggdc.net.
Loading Our Panel Data
We'll start by using the loadd procedure to load our data.
// Load data from 'pwt_10.gdat
// Using __FILE_DIR to specify data path
pwt_10 = loadd(__FILE_DIR $+ "pwt_10.gdat");
// Preview data
head(pwt_10);
The head procedure prints the first five observations of the our dataset, helping us check that our data has loaded properly:
currency_unit countrycode country year rgdpe rgdpo pop emp irr Aruban Guilder ABW Aruba 1991-01-01 2804.5005 3177.4575 0.064622000 0.029200001 0.11486563 Aruban Guilder ABW Aruba 1992-01-01 2944.5161 3370.5376 0.068235000 0.030903272 0.11182721 Aruban Guilder ABW Aruba 1993-01-01 3131.3708 3698.5325 0.072504000 0.032911807 0.11131135 Aruban Guilder ABW Aruba 1994-01-01 3537.9534 4172.8242 0.076700000 0.034895979 0.10574290 Aruban Guilder ABW Aruba 1995-01-01 3412.8745 4184.1562 0.080324000 0.036628015 0.10471709
It's important to note that to identify our panel, GAUSS requires a dataframe to have at least one date variable and one categorical or string variable.
We will look more closely at how GAUSS identifies panels in the next section. For now, let's check that our data meets this requirement using the getcoltypes procedure.
// Check column types
getcoltypes(pwt_10);
type category category category date number number number number number
Our data meets the GAUSS requirement for panel data, with three categorical variables and one date variable.
Ready to get started using GAUSS for panel data? Contact us for a GAUSS 25 demo!
Preparing Panel Data
Besides the data type requirements, the GAUSS panel data procedures assume a few important things about the form of your panel data.
In particular, your panel data should:
- Be in stacked long form.
- Have the date and group identification columns occurring before other date and categorical/string variables. (This is not required but it is the most convenient way to work the GAUSS panel data procedures.)
- Be sorted by group then time.
Let’s look more closely at how to use GAUSS to ensure that our data meets these requirements.
Transforming panel data to long form
If your panel data is in wide form, it's easy to convert to long form using the dflonger procedure. This procedure is a very versatile procedure -- it's designed to be intuitive enough to cover basic transformation with little effort but flexible enough to tackle complex cases.
Since, the pwt_10
data is already in long form, so we don't need to transform our data. However, for an in-depth look at dflonger
, including examples, see our previous blog, Transforming Panel Data to Long Form in GAUSS.
Ordering variables
One of the most convenient features of the new panel data procedures is their ability to intelligently detect group and time variables. To ensure this works properly, simply make sure that the date variable and group variable identifying your panel are the first occurring date and categorical/string variables in your dataset.
Let's take a look at our pwt_10
dataframe:
Ctrl+E
hot key opens the variable under cursor in a floating symbol editor window, allowing you to quickly view workplace symbols. Identifying panel data groups
Our dataset contains three categorical variables: currency_unit, countrycode, and country. By default, GAUSS will use the first occurring categorical variable, currency_unit, to identify the groups in the panel, unless we specify otherwise.
Identifying time dimension
Our dataset also includes a date variable, year, which GAUSS will automatically use to identify the time dimension of the panel.
As the dataframe is now, GAUSS will use currency_unit and year to identify our panel. In this dataset, however, the panel should be identified by country and year. To address this, we could use optional arguments to specify that our group variable is country. However, we would need to do this every time we use one of the panel data procedures.
Instead, we can use the order procedure to move the country and year variables to the front of our dataframe.
// Move country
pwt_10 = order(pwt_10, "country"$|"year");
Now, in our reordered pwt_10
dataframe, we see that country and year appear as the first two columns. GAUSS will automatically use these to identify the group and time dimensions, respectively.
A few things to note:
- It is not necessary to move the year variable. Since it is only date variable in the dataframe, GAUSS will use year to identify our time dimension regardless of its position.
- The country variable does not need to be the first column in the dataframe. It only needs to appear before the other categorical variables for GAUSS to automatically recognize it as the group dimension.
Sorting panel data
Beyond the fact that the GAUSS panel data functions expect sorted data, there are many advantages to working with sorted data:
- Sorted data is easier to browse and explore.
- Econometric techniques, such as calculating lags and differences, rely on the data being ordered consistently.
- Proper sorting helps avoid errors, ensures reproducibility, and lays a solid foundation for reliable results.
The new pdsort procedure allows you to quickly sort panel data by the group then date dimension.
// Sort data using
// automatic group and date variables
pwt_10 = pdSort(pwt_10);
Assessing Panel Data Structure
When working with panel data, understanding your data's structure is important. It can play a role in the methods and assumptions applied in your models. For example, many techniques are only valid for balanced data and will produce unreliable results if your panel is unbalanced.
Some important considerations include:
- Whether the data is balanced.
- The presence of gaps or missing data.
- The ratio of groups to the number of time observations for each group.
By examining our panel’s structure upfront, we can:
- Identify potential challenges.
- Select the most appropriate analytical techniques.
- Prevent errors that might result in biased or misleading conclusions.
GAUSS includes a suite of panel data tools, introduced in GAUSS 25, that are designed for exploring the structure of panel data.
|
||
---|---|---|
Function Name | Description | Example |
pdIsBalanced | Determines whether each group in a panel dataset covers the maximum time span. | groupisBalanced = pdIsBalanced(pwt_10) |
pdAllBalanced | Checks if a panel dataset is strongly balanced and returns 1 if balanced, 0 otherwise. | isBalanced = pdAllBalanced(pwt_10) |
pdIsConsecutive | Checks if each group in a panel dataset covers consecutive time periods without gaps. | groupisConsecutive = pdIsConsecutive(pwt_10) |
pdAllConsecutive | Verifies whether all groups in a panel dataset have consecutive time periods without gaps. | isConsecutive = pdAllConsecutive(pwt_10) |
pdSize | Provides size description of a panel dataset including the number of groups, number of time observations for each group. | { num_grps, T, balanced } = pdSize(pwt_10) |
pdTimeSpans | Returns the time span (start and end dates) by group of variables in panel data. | df_tspans = pdTimeSpans(pwt_10) |
Exploring the structure of the Penn World Table
Now let's take a look at the structure of our Penn World Table data. First, we'll quickly check if our panel is balanced strongly balanced and consecutive.
print "Panel is balanced:";
pdAllBalanced(pwt_10);
// Check for consecutiveness
print "Panel is consecutive:";
pdAllConsecutive(pwt_10);
Panel is balanced: 0.0000000 Panel is consecutive: 1.0000000
This tells us that our panel is not strongly balanced but it is consecutive.
Now that we know our panel is unbalanced, we should take a closer look our data structure using pdSize
.
// Get summary of panel dimensions
{ num_grps, T, balanced } = pdSize(pwt_10);
================================================================================ Group ID: country Balanced: No Valid cases: 7540 Missings: 0 N. Groups: 137 T. Average: 55.036 ================================================================================ country T[i] Start Date End Date -------------------------------------------------------------------------------- Angola 50 1970-01-01 2019-01-01 Argentina 70 1950-01-01 2019-01-01 Armenia 30 1990-01-01 2019-01-01 Aruba 29 1991-01-01 2019-01-01 Australia 70 1950-01-01 2019-01-01 Austria 70 1950-01-01 2019-01-01 Azerbaijan 30 1990-01-01 2019-01-01 Bahamas 47 1973-01-01 2019-01-01 Bahrain 50 1970-01-01 2019-01-01 Barbados 60 1960-01-01 2019-01-01 Belarus 30 1990-01-01 2019-01-01 Belgium 70 1950-01-01 2019-01-01 Benin 40 1980-01-01 2019-01-01 Bermuda 34 1986-01-01 2019-01-01 Bolivia (Plurinational State of) 70 1950-01-01 2019-01-01 Bosnia and Herzegovina 30 1990-01-01 2019-01-01 Botswana 60 1960-01-01 2019-01-01 Brazil 70 1950-01-01 2019-01-01 British Virgin Islands 29 1991-01-01 2019-01-01 Bulgaria 50 1970-01-01 2019-01-01 Burkina Faso 61 1959-01-01 2019-01-01 Burundi 40 1980-01-01 2019-01-01 Cabo Verde 40 1980-01-01 2019-01-01 Cameroon 60 1960-01-01 2019-01-01 Canada 70 1950-01-01 2019-01-01 Cayman Islands 29 1991-01-01 2019-01-01 Central African Republic 40 1980-01-01 2019-01-01 Chad 60 1960-01-01 2019-01-01 Chile 69 1951-01-01 2019-01-01 China 68 1952-01-01 2019-01-01 China, Hong Kong SAR 60 1960-01-01 2019-01-01 China, Macao SAR 40 1980-01-01 2019-01-01 Colombia 70 1950-01-01 2019-01-01 Costa Rica 70 1950-01-01 2019-01-01 Croatia 30 1990-01-01 2019-01-01 Cyprus 70 1950-01-01 2019-01-01 Czech Republic 30 1990-01-01 2019-01-01 Côte d'Ivoire 60 1960-01-01 2019-01-01 Denmark 70 1950-01-01 2019-01-01 Djibouti 40 1980-01-01 2019-01-01 Dominican Republic 69 1951-01-01 2019-01-01 Ecuador 70 1950-01-01 2019-01-01 Egypt 70 1950-01-01 2019-01-01 Estonia 30 1990-01-01 2019-01-01 Eswatini 40 1980-01-01 2019-01-01 Fiji 40 1980-01-01 2019-01-01 Finland 70 1950-01-01 2019-01-01 France 70 1950-01-01 2019-01-01 Gabon 60 1960-01-01 2019-01-01 Georgia 30 1990-01-01 2019-01-01 Germany 70 1950-01-01 2019-01-01 Greece 69 1951-01-01 2019-01-01 Guatemala 70 1950-01-01 2019-01-01 Guinea 40 1980-01-01 2019-01-01 Honduras 50 1970-01-01 2019-01-01 Hungary 50 1970-01-01 2019-01-01 Iceland 70 1950-01-01 2019-01-01 India 70 1950-01-01 2019-01-01 Indonesia 60 1960-01-01 2019-01-01 Iran (Islamic Republic of) 65 1955-01-01 2019-01-01 Iraq 50 1970-01-01 2019-01-01 Ireland 70 1950-01-01 2019-01-01 Israel 70 1950-01-01 2019-01-01 Italy 70 1950-01-01 2019-01-01 Jamaica 67 1953-01-01 2019-01-01 Japan 70 1950-01-01 2019-01-01 Jordan 66 1954-01-01 2019-01-01 Kazakhstan 30 1990-01-01 2019-01-01 Kenya 70 1950-01-01 2019-01-01 Kuwait 50 1970-01-01 2019-01-01 Kyrgyzstan 30 1990-01-01 2019-01-01 Lao People's DR 40 1980-01-01 2019-01-01 Latvia 30 1990-01-01 2019-01-01 Lebanon 50 1970-01-01 2019-01-01 Lesotho 40 1980-01-01 2019-01-01 Lithuania 30 1990-01-01 2019-01-01 Luxembourg 70 1950-01-01 2019-01-01 Malaysia 65 1955-01-01 2019-01-01 Malta 66 1954-01-01 2019-01-01 Mauritania 43 1977-01-01 2019-01-01 Mauritius 70 1950-01-01 2019-01-01 Mexico 70 1950-01-01 2019-01-01 Mongolia 40 1980-01-01 2019-01-01 Morocco 70 1950-01-01 2019-01-01 Mozambique 60 1960-01-01 2019-01-01 Namibia 60 1960-01-01 2019-01-01 Netherlands 70 1950-01-01 2019-01-01 New Zealand 70 1950-01-01 2019-01-01 Nicaragua 40 1980-01-01 2019-01-01 Niger 60 1960-01-01 2019-01-01 Nigeria 70 1950-01-01 2019-01-01 North Macedonia 30 1990-01-01 2019-01-01 Norway 70 1950-01-01 2019-01-01 Oman 50 1970-01-01 2019-01-01 Panama 51 1969-01-01 2019-01-01 Paraguay 69 1951-01-01 2019-01-01 Peru 70 1950-01-01 2019-01-01 Philippines 70 1950-01-01 2019-01-01 Poland 50 1970-01-01 2019-01-01 Portugal 70 1950-01-01 2019-01-01 Qatar 50 1970-01-01 2019-01-01 Republic of Korea 67 1953-01-01 2019-01-01 Republic of Moldova 30 1990-01-01 2019-01-01 Romania 60 1960-01-01 2019-01-01 Russian Federation 30 1990-01-01 2019-01-01 Rwanda 60 1960-01-01 2019-01-01 Sao Tome and Principe 40 1980-01-01 2019-01-01 Saudi Arabia 50 1970-01-01 2019-01-01 Senegal 60 1960-01-01 2019-01-01 Serbia 30 1990-01-01 2019-01-01 Sierra Leone 40 1980-01-01 2019-01-01 Singapore 60 1960-01-01 2019-01-01 Slovakia 30 1990-01-01 2019-01-01 Slovenia 30 1990-01-01 2019-01-01 South Africa 70 1950-01-01 2019-01-01 Spain 70 1950-01-01 2019-01-01 Sri Lanka 70 1950-01-01 2019-01-01 Sudan 50 1970-01-01 2019-01-01 Suriname 47 1973-01-01 2019-01-01 Sweden 70 1950-01-01 2019-01-01 Switzerland 70 1950-01-01 2019-01-01 Taiwan 69 1951-01-01 2019-01-01 Tajikistan 30 1990-01-01 2019-01-01 Thailand 70 1950-01-01 2019-01-01 Togo 40 1980-01-01 2019-01-01 Trinidad and Tobago 70 1950-01-01 2019-01-01 Tunisia 60 1960-01-01 2019-01-01 Turkey 70 1950-01-01 2019-01-01 U.R. of Tanzania: Mainland 60 1960-01-01 2019-01-01 Ukraine 30 1990-01-01 2019-01-01 United Kingdom 70 1950-01-01 2019-01-01 United States 70 1950-01-01 2019-01-01 Uruguay 70 1950-01-01 2019-01-01 Uzbekistan 30 1990-01-01 2019-01-01 Venezuela (Bolivarian Republic of) 70 1950-01-01 2019-01-01 Zambia 65 1955-01-01 2019-01-01 Zimbabwe 66 1954-01-01 2019-01-01 ================================================================================
The pdSize
procedure provides a nice summary of our panel data structure including the:
- Total number of groups and a full list of the groups.
- Number of observations per a group.
- Number of missing values.
- The start and end date of each group in our panel.
While there are no missing values in this data, this isn't always the case. In fact, it is quite common that variables cover only part of the full timespan. For example, a country may have a longer history of providing real GDP data than IRR data.
The pdTimeSpans
procedure reports the full timespan for each group, along with the timespans for a specified variable list. If no variable list is provided, it returns the timespan for all variables in the dataframe.
For example, suppose we want to use the emp and rgdpo variables in a model and want to know the maximum timespan our model can cover. We can use pdTimeSpans
to see the timespan of each variable:
pwt_model_timespans = pdTimeSpans(pwt_10, "emp"$|"rgdpo");
pwt_model_timespans;
country Start year End year emp Start emp End rgdpo Start rgdpo End Angola 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Argentina 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Armenia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Aruba 1991-01-01 2019-01-01 1991-01-01 2019-01-01 1991-01-01 2019-01-01 Australia 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Austria 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Azerbaijan 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Bahamas 1973-01-01 2019-01-01 1973-01-01 2019-01-01 1973-01-01 2019-01-01 Bahrain 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Barbados 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Belarus 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Belgium 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Benin 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Bermuda 1986-01-01 2019-01-01 1986-01-01 2019-01-01 1986-01-01 2019-01-01 Bolivia (Plurina 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Bosnia and Herze 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Botswana 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Brazil 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 British Virgin I 1991-01-01 2019-01-01 1991-01-01 2019-01-01 1991-01-01 2019-01-01 Bulgaria 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Burkina Faso 1959-01-01 2019-01-01 1959-01-01 2019-01-01 1959-01-01 2019-01-01 Burundi 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Cabo Verde 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Cameroon 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Canada 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Cayman Islands 1991-01-01 2019-01-01 1991-01-01 2019-01-01 1991-01-01 2019-01-01 Central African 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Chad 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Chile 1951-01-01 2019-01-01 1951-01-01 2019-01-01 1951-01-01 2019-01-01 China 1952-01-01 2019-01-01 1952-01-01 2019-01-01 1952-01-01 2019-01-01 China, Hong Kong 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 China, Macao SAR 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Colombia 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Costa Rica 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Croatia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Cyprus 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Czech Republic 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Côte d'Ivoire 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Denmark 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Djibouti 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Dominican Republ 1951-01-01 2019-01-01 1951-01-01 2019-01-01 1951-01-01 2019-01-01 Ecuador 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Egypt 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Estonia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Eswatini 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Fiji 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Finland 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 France 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Gabon 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Georgia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Germany 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Greece 1951-01-01 2019-01-01 1951-01-01 2019-01-01 1951-01-01 2019-01-01 Guatemala 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Guinea 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Honduras 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Hungary 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Iceland 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 India 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Indonesia 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Iran (Islamic Re 1955-01-01 2019-01-01 1955-01-01 2019-01-01 1955-01-01 2019-01-01 Iraq 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Ireland 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Israel 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Italy 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Jamaica 1953-01-01 2019-01-01 1953-01-01 2019-01-01 1953-01-01 2019-01-01 Japan 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Jordan 1954-01-01 2019-01-01 1954-01-01 2019-01-01 1954-01-01 2019-01-01 Kazakhstan 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Kenya 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Kuwait 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Kyrgyzstan 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Lao People's DR 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Latvia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Lebanon 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Lesotho 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Lithuania 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Luxembourg 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Malaysia 1955-01-01 2019-01-01 1955-01-01 2019-01-01 1955-01-01 2019-01-01 Malta 1954-01-01 2019-01-01 1954-01-01 2019-01-01 1954-01-01 2019-01-01 Mauritania 1977-01-01 2019-01-01 1977-01-01 2019-01-01 1977-01-01 2019-01-01 Mauritius 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Mexico 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Mongolia 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Morocco 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Mozambique 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Namibia 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Netherlands 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 New Zealand 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Nicaragua 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Niger 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Nigeria 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 North Macedonia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Norway 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Oman 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Panama 1969-01-01 2019-01-01 1969-01-01 2019-01-01 1969-01-01 2019-01-01 Paraguay 1951-01-01 2019-01-01 1951-01-01 2019-01-01 1951-01-01 2019-01-01 Peru 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Philippines 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Poland 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Portugal 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Qatar 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Republic of Kore 1953-01-01 2019-01-01 1953-01-01 2019-01-01 1953-01-01 2019-01-01 Republic of Mold 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Romania 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Russian Federati 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Rwanda 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Sao Tome and Pri 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Saudi Arabia 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Senegal 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Serbia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Sierra Leone 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Singapore 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Slovakia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Slovenia 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 South Africa 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Spain 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Sri Lanka 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Sudan 1970-01-01 2019-01-01 1970-01-01 2019-01-01 1970-01-01 2019-01-01 Suriname 1973-01-01 2019-01-01 1973-01-01 2019-01-01 1973-01-01 2019-01-01 Sweden 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Switzerland 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Taiwan 1951-01-01 2019-01-01 1951-01-01 2019-01-01 1951-01-01 2019-01-01 Tajikistan 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Thailand 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Togo 1980-01-01 2019-01-01 1980-01-01 2019-01-01 1980-01-01 2019-01-01 Trinidad and Tob 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Tunisia 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Turkey 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 U.R. of Tanzania 1960-01-01 2019-01-01 1960-01-01 2019-01-01 1960-01-01 2019-01-01 Ukraine 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 United Kingdom 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 United States 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Uruguay 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Uzbekistan 1990-01-01 2019-01-01 1990-01-01 2019-01-01 1990-01-01 2019-01-01 Venezuela (Boliv 1950-01-01 2019-01-01 1950-01-01 2019-01-01 1950-01-01 2019-01-01 Zambia 1955-01-01 2019-01-01 1955-01-01 2019-01-01 1955-01-01 2019-01-01 Zimbabwe 1954-01-01 2019-01-01 1954-01-01 2019-01-01 1954-01-01 2019-01-01
Again, because we aren't missing any data, both emp and rgdpo cover the full timespan for each group as reported by pdSize
.
Panel Data Summary Statistics
When analyzing panel data, it's important to understand how variability is distributed across different dimensions of the data. Specifically:
- Overall statistics which summarize the variability across all observations in the dataset, providing a high-level view of the data.
- Within-group statistics which measure variability within each individual group, reflecting how a variable changes over time for a specific group.
- Between-group statistics, which capture variability across groups, showing how groups differ from each other on average.
Understanding these patterns ensures that we select the right modeling approach and properly account for both group-specific and overall trends in our analysis.
We'll use the pdSummary
procedure to compute these statistics. However, to simplify our examples and output moving forward, let's limit our panel to include only countries that use the Euro.
// Filter to include only Euro using countries
pwt_10 = selif(pwt_10, pwt_10[., "currency_unit"] .$== "Euro");
// Get summary statistics
pdSummary(pwt_10);
========================================================================================== Group ID: country Balanced: No Valid cases: 1125 Missings: 0 N. Groups: 19 T. Average: 59.211 ========================================================================================== Variable Measure Mean Std. Dev. Minimum Maximum ------------------------------------------------------------------------------------------ emp Overall 7.933 10.754 0.088 44.795 Between ----- 10.382 0.128 38.430 Within ----- 1.339 0.359 14.298 irr Overall 0.097 0.049 0.010 0.316 Between ----- 0.042 0.049 0.214 Within ----- 0.026 0.025 0.259 pop Overall 18.322 23.778 0.296 83.517 Between ----- 23.039 0.364 78.163 Within ----- 2.763 4.844 29.645 rgdpe Overall 457547.712 746910.097 568.248 4308861.500 Between ----- 594454.666 6365.400 2072470.938 Within ----- 431603.648 -1262654.069 2693938.274 rgdpo Overall 454655.015 750364.973 69.909 4275312.000 Between ----- 596209.636 5725.827 2097340.112 Within ----- 434813.497 -1283383.285 2632626.903 ========================================================================================== Non-numeric variables dropped from summary.
One very clear observation from our summary table is that our GDP variables, rgdpo and rgpde, are a much different scale than our other variables. We'll look at how to transform these next.
Transforming Data for Modeling
Because panel data usually contains a time dimension, it is very common to need to take lags or differences of our data. While this is very straightforward with time series data, doing this with panel data can be much more difficult.
Fortunately, the pdLag
and pdDiff
procedures, introduced in GAUSS 25, will efficiently compute panel data lags and differences for you.
$$\text{rdgpo growth rate} = \ln rgdpo_{t} - \ln rgdpo_{t-1} $$
Let's use the pdDiff
procedure to create a new real GDP growth variable.
// Take natural log of rgdpo
ln_rgdpo = ln(pwt_10[., "rgdpo"]);
// Add to pwt_10 dataframe
// we need to do this so GAUSS
// can identify or panel
// using the 'country' and 'year' variables
pwt_10 = pwt_10 ~ asDF(ln_rgdpo, "ln_rgdpo");
// Take first difference of ln_rgdpo
// GAUSS will use 'country' and 'year' to
// automatically detect panel
gr_rgdpo = pdDiff(pwt_10[., "country" "year" "ln_rgdpo"]);
// Summarize 'gr_rgdpo'
// GAUSS will use 'country' and 'year' to
// automatically detect panel
call pdSummary(gr_rgdpo);
========================================================================================== Group ID: country Balanced: No Valid cases: 1106 Missings: 19 N. Groups: 19 T. Average: 58.211 ========================================================================================== Variable Measure Mean Std. Dev. Minimum Maximum ------------------------------------------------------------------------------------------ ln_rgdpo Overall 0.036 0.092 -1.741 1.476 Between ----- 0.013 0.008 0.062 Within ----- 0.091 -1.768 1.450 ==========================================================================================
Data Visualization
As a final step, let's create a quick visualization of this new variable using plotXY
and the by
keyword. We'll use a subset of countries to keep our plot from getting to crowded.
// Create subset of countries
country_list = "Austria"$|"France"$|"Germany"$|"Spain"$|"Italy";
// Select data for plot
plot_data = selif(gr_rgdpo, sumr(gr_rgdpo[., "country"] .$== country_list'));
// Plot rgdpo growth variable by country
plotXY(plot_data, "ln_rgdpo~year + by(country)");
Conclusion
Today we've seen how the new panel data tools in GAUSS 25 can simplify your everyday panel data tasks, using a hands-on example. We've covered fundamental tasks, including:
- Loading your data.
- Preparing your panel dataset.
- Exploring panel data characteristics.
- Visualizing panel data.
- Transforming your data for modeling.
Further Reading
- Introduction to the Fundamentals of Panel Data
- Panel Data Basics: One-way Individual Effects
- Get Started with Panel Data in GAUSS (Video)
- How to Aggregate Panel Data in GAUSS
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.