Exploring and Cleaning Panel Data with GAUSS 25

Introduction

Panel data offers a unique opportunity to examine both individual-specific and time-specific effects. However, as anyone who has worked with panel data knows, these same features that make panel data so useful can also make exploration and cleaning particularly challenging.

GAUSS 25 was designed with these challenges in mind. It introduces a comprehensive new suite of tools, tailored to make working with panel data in GAUSS easier, faster, and more intuitive.

In today's blog, we’ll demonstrate how these tools can simplify everyday panel data tasks, including:

  • Loading your data.
  • Preparing your panel dataset.
  • Exploring panel data characteristics.
  • Visualizing panel data.
  • Transforming your data for modeling.

Data

Today we will work use a subset of the publicly available Penn World Table version 10.01, available for download here.


Penn World Table Variables

Variable Name Description
currency_unit The currency unit used for GDP measurements.
countrycode The three-letter ISO country code.
country The name of the country.
year The year of observation.
rgdpe Real GDP at constant prices (expenditure-side).
rgdpo Real GDP at constant prices (output-side).
pop Population of the country.
emp Number of employed persons.
irr Investment rate of return.

Loading Our Panel Data

We'll start by using the loadd procedure to load our data.

// Load data from 'pwt_10.gdat
// Using __FILE_DIR to specify data path
pwt_10 = loadd(__FILE_DIR $+ "pwt_10.gdat");

// Preview data 
head(pwt_10);

The head procedure prints the first five observations of the our dataset, helping us check that our data has loaded properly:

   currency_unit      countrycode          country             year            rgdpe            rgdpo              pop              emp              irr
  Aruban Guilder              ABW            Aruba       1991-01-01        2804.5005        3177.4575      0.064622000      0.029200001       0.11486563
  Aruban Guilder              ABW            Aruba       1992-01-01        2944.5161        3370.5376      0.068235000      0.030903272       0.11182721
  Aruban Guilder              ABW            Aruba       1993-01-01        3131.3708        3698.5325      0.072504000      0.032911807       0.11131135
  Aruban Guilder              ABW            Aruba       1994-01-01        3537.9534        4172.8242      0.076700000      0.034895979       0.10574290
  Aruban Guilder              ABW            Aruba       1995-01-01        3412.8745        4184.1562      0.080324000      0.036628015       0.10471709 

It's important to note that to identify our panel, GAUSS requires a dataframe to have at least one date variable and one categorical or string variable.

We will look more closely at how GAUSS identifies panels in the next section. For now, let's check that our data meets this requirement using the getcoltypes procedure.

// Check column types
getcoltypes(pwt_10);
            type
        category
        category
        category
            date
          number
          number
          number
          number
          number

Our data meets the GAUSS requirement for panel data, with three categorical variables and one date variable.


Ready to get started using GAUSS for panel data? Contact us for a GAUSS 25 demo!

Preparing Panel Data

Besides the data type requirements, the GAUSS panel data procedures assume a few important things about the form of your panel data.

In particular, your panel data should:

  • Be in stacked long form.
  • Have the date and group identification columns occurring before other date and categorical/string variables. (This is not required but it is the most convenient way to work the GAUSS panel data procedures.)
  • Be sorted by group then time.

Let’s look more closely at how to use GAUSS to ensure that our data meets these requirements.

Transforming panel data to long form

If your panel data is in wide form, it's easy to convert to long form using the dflonger procedure. This procedure is a very versatile procedure -- it's designed to be intuitive enough to cover basic transformation with little effort but flexible enough to tackle complex cases.

Since, the pwt_10 data is already in long form, so we don't need to transform our data. However, for an in-depth look at dflonger, including examples, see our previous blog, Transforming Panel Data to Long Form in GAUSS.

Ordering variables

One of the most convenient features of the new panel data procedures is their ability to intelligently detect group and time variables. To ensure this works properly, simply make sure that the date variable and group variable identifying your panel are the first occurring date and categorical/string variables in your dataset.

Let's take a look at our pwt_10 dataframe:

Identifying panel data groups
Our dataset contains three categorical variables: currency_unit, countrycode, and country. By default, GAUSS will use the first occurring categorical variable, currency_unit, to identify the groups in the panel, unless we specify otherwise.

Identifying time dimension
Our dataset also includes a date variable, year, which GAUSS will automatically use to identify the time dimension of the panel.

As the dataframe is now, GAUSS will use currency_unit and year to identify our panel. In this dataset, however, the panel should be identified by country and year. To address this, we could use optional arguments to specify that our group variable is country. However, we would need to do this every time we use one of the panel data procedures.

Instead, we can use the order procedure to move the country and year variables to the front of our dataframe.

// Move country 
pwt_10 = order(pwt_10, "country"$|"year");

Now, in our reordered pwt_10 dataframe, we see that country and year appear as the first two columns. GAUSS will automatically use these to identify the group and time dimensions, respectively.

A few things to note:

  • It is not necessary to move the year variable. Since it is only date variable in the dataframe, GAUSS will use year to identify our time dimension regardless of its position.
  • The country variable does not need to be the first column in the dataframe. It only needs to appear before the other categorical variables for GAUSS to automatically recognize it as the group dimension.

Sorting panel data

Beyond the fact that the GAUSS panel data functions expect sorted data, there are many advantages to working with sorted data:

  • Sorted data is easier to browse and explore.
  • Econometric techniques, such as calculating lags and differences, rely on the data being ordered consistently.
  • Proper sorting helps avoid errors, ensures reproducibility, and lays a solid foundation for reliable results.

The new pdsort procedure allows you to quickly sort panel data by the group then date dimension.

// Sort data using
// automatic group and date variables 
pwt_10 = pdSort(pwt_10);

Assessing Panel Data Structure

When working with panel data, understanding your data's structure is important. It can play a role in the methods and assumptions applied in your models. For example, many techniques are only valid for balanced data and will produce unreliable results if your panel is unbalanced.

Some important considerations include:

  • Whether the data is balanced.
  • The presence of gaps or missing data.
  • The ratio of groups to the number of time observations for each group.

By examining our panel’s structure upfront, we can:

  • Identify potential challenges.
  • Select the most appropriate analytical techniques.
  • Prevent errors that might result in biased or misleading conclusions.

GAUSS includes a suite of panel data tools, introduced in GAUSS 25, that are designed for exploring the structure of panel data.


GAUSS Functions for Panel Data Structure

Function Name Description Example
pdIsBalanced Determines whether each group in a panel dataset covers the maximum time span. groupisBalanced = pdIsBalanced(pwt_10)
pdAllBalanced Checks if a panel dataset is strongly balanced and returns 1 if balanced, 0 otherwise. isBalanced = pdAllBalanced(pwt_10)
pdIsConsecutive Checks if each group in a panel dataset covers consecutive time periods without gaps. groupisConsecutive = pdIsConsecutive(pwt_10)
pdAllConsecutive Verifies whether all groups in a panel dataset have consecutive time periods without gaps. isConsecutive = pdAllConsecutive(pwt_10)
pdSize Provides size description of a panel dataset including the number of groups, number of time observations for each group. { num_grps, T, balanced } = pdSize(pwt_10)
pdTimeSpans Returns the time span (start and end dates) by group of variables in panel data. df_tspans = pdTimeSpans(pwt_10)

Exploring the structure of the Penn World Table

Now let's take a look at the structure of our Penn World Table data. First, we'll quickly check if our panel is balanced strongly balanced and consecutive.

print "Panel is balanced:";
pdAllBalanced(pwt_10);

// Check for consecutiveness
print "Panel is consecutive:";
pdAllConsecutive(pwt_10);
Panel is balanced:
       0.0000000
Panel is consecutive:
       1.0000000 

This tells us that our panel is not strongly balanced but it is consecutive.

Now that we know our panel is unbalanced, we should take a closer look our data structure using pdSize.

// Get summary of panel dimensions
{ num_grps, T, balanced } = pdSize(pwt_10);
================================================================================
Group ID:                   country          Balanced:                        No
Valid cases:                   7540          Missings:                         0
N. Groups:                      137          T. Average:                  55.036
================================================================================
country                                       T[i]     Start Date       End Date
--------------------------------------------------------------------------------
Angola                                          50     1970-01-01     2019-01-01 
Argentina                                       70     1950-01-01     2019-01-01 
Armenia                                         30     1990-01-01     2019-01-01 
Aruba                                           29     1991-01-01     2019-01-01 
Australia                                       70     1950-01-01     2019-01-01 
Austria                                         70     1950-01-01     2019-01-01 
Azerbaijan                                      30     1990-01-01     2019-01-01 
Bahamas                                         47     1973-01-01     2019-01-01 
Bahrain                                         50     1970-01-01     2019-01-01 
Barbados                                        60     1960-01-01     2019-01-01 
Belarus                                         30     1990-01-01     2019-01-01 
Belgium                                         70     1950-01-01     2019-01-01 
Benin                                           40     1980-01-01     2019-01-01 
Bermuda                                         34     1986-01-01     2019-01-01 
Bolivia (Plurinational State of)                70     1950-01-01     2019-01-01 
Bosnia and Herzegovina                          30     1990-01-01     2019-01-01 
Botswana                                        60     1960-01-01     2019-01-01 
Brazil                                          70     1950-01-01     2019-01-01 
British Virgin Islands                          29     1991-01-01     2019-01-01 
Bulgaria                                        50     1970-01-01     2019-01-01 
Burkina Faso                                    61     1959-01-01     2019-01-01 
Burundi                                         40     1980-01-01     2019-01-01 
Cabo Verde                                      40     1980-01-01     2019-01-01 
Cameroon                                        60     1960-01-01     2019-01-01 
Canada                                          70     1950-01-01     2019-01-01 
Cayman Islands                                  29     1991-01-01     2019-01-01 
Central African Republic                        40     1980-01-01     2019-01-01 
Chad                                            60     1960-01-01     2019-01-01 
Chile                                           69     1951-01-01     2019-01-01 
China                                           68     1952-01-01     2019-01-01 
China, Hong Kong SAR                            60     1960-01-01     2019-01-01 
China, Macao SAR                                40     1980-01-01     2019-01-01 
Colombia                                        70     1950-01-01     2019-01-01 
Costa Rica                                      70     1950-01-01     2019-01-01 
Croatia                                         30     1990-01-01     2019-01-01 
Cyprus                                          70     1950-01-01     2019-01-01 
Czech Republic                                  30     1990-01-01     2019-01-01 
Côte d'Ivoire                                   60     1960-01-01     2019-01-01 
Denmark                                         70     1950-01-01     2019-01-01 
Djibouti                                        40     1980-01-01     2019-01-01 
Dominican Republic                              69     1951-01-01     2019-01-01 
Ecuador                                         70     1950-01-01     2019-01-01 
Egypt                                           70     1950-01-01     2019-01-01 
Estonia                                         30     1990-01-01     2019-01-01 
Eswatini                                        40     1980-01-01     2019-01-01 
Fiji                                            40     1980-01-01     2019-01-01 
Finland                                         70     1950-01-01     2019-01-01 
France                                          70     1950-01-01     2019-01-01 
Gabon                                           60     1960-01-01     2019-01-01 
Georgia                                         30     1990-01-01     2019-01-01 
Germany                                         70     1950-01-01     2019-01-01 
Greece                                          69     1951-01-01     2019-01-01 
Guatemala                                       70     1950-01-01     2019-01-01 
Guinea                                          40     1980-01-01     2019-01-01 
Honduras                                        50     1970-01-01     2019-01-01 
Hungary                                         50     1970-01-01     2019-01-01 
Iceland                                         70     1950-01-01     2019-01-01 
India                                           70     1950-01-01     2019-01-01 
Indonesia                                       60     1960-01-01     2019-01-01 
Iran (Islamic Republic of)                      65     1955-01-01     2019-01-01 
Iraq                                            50     1970-01-01     2019-01-01 
Ireland                                         70     1950-01-01     2019-01-01 
Israel                                          70     1950-01-01     2019-01-01 
Italy                                           70     1950-01-01     2019-01-01 
Jamaica                                         67     1953-01-01     2019-01-01 
Japan                                           70     1950-01-01     2019-01-01 
Jordan                                          66     1954-01-01     2019-01-01 
Kazakhstan                                      30     1990-01-01     2019-01-01 
Kenya                                           70     1950-01-01     2019-01-01 
Kuwait                                          50     1970-01-01     2019-01-01 
Kyrgyzstan                                      30     1990-01-01     2019-01-01 
Lao People's DR                                 40     1980-01-01     2019-01-01 
Latvia                                          30     1990-01-01     2019-01-01 
Lebanon                                         50     1970-01-01     2019-01-01 
Lesotho                                         40     1980-01-01     2019-01-01 
Lithuania                                       30     1990-01-01     2019-01-01 
Luxembourg                                      70     1950-01-01     2019-01-01 
Malaysia                                        65     1955-01-01     2019-01-01 
Malta                                           66     1954-01-01     2019-01-01 
Mauritania                                      43     1977-01-01     2019-01-01 
Mauritius                                       70     1950-01-01     2019-01-01 
Mexico                                          70     1950-01-01     2019-01-01 
Mongolia                                        40     1980-01-01     2019-01-01 
Morocco                                         70     1950-01-01     2019-01-01 
Mozambique                                      60     1960-01-01     2019-01-01 
Namibia                                         60     1960-01-01     2019-01-01 
Netherlands                                     70     1950-01-01     2019-01-01 
New Zealand                                     70     1950-01-01     2019-01-01 
Nicaragua                                       40     1980-01-01     2019-01-01 
Niger                                           60     1960-01-01     2019-01-01 
Nigeria                                         70     1950-01-01     2019-01-01 
North Macedonia                                 30     1990-01-01     2019-01-01 
Norway                                          70     1950-01-01     2019-01-01 
Oman                                            50     1970-01-01     2019-01-01 
Panama                                          51     1969-01-01     2019-01-01 
Paraguay                                        69     1951-01-01     2019-01-01 
Peru                                            70     1950-01-01     2019-01-01 
Philippines                                     70     1950-01-01     2019-01-01 
Poland                                          50     1970-01-01     2019-01-01 
Portugal                                        70     1950-01-01     2019-01-01 
Qatar                                           50     1970-01-01     2019-01-01 
Republic of Korea                               67     1953-01-01     2019-01-01 
Republic of Moldova                             30     1990-01-01     2019-01-01 
Romania                                         60     1960-01-01     2019-01-01 
Russian Federation                              30     1990-01-01     2019-01-01 
Rwanda                                          60     1960-01-01     2019-01-01 
Sao Tome and Principe                           40     1980-01-01     2019-01-01 
Saudi Arabia                                    50     1970-01-01     2019-01-01 
Senegal                                         60     1960-01-01     2019-01-01 
Serbia                                          30     1990-01-01     2019-01-01 
Sierra Leone                                    40     1980-01-01     2019-01-01 
Singapore                                       60     1960-01-01     2019-01-01 
Slovakia                                        30     1990-01-01     2019-01-01 
Slovenia                                        30     1990-01-01     2019-01-01 
South Africa                                    70     1950-01-01     2019-01-01 
Spain                                           70     1950-01-01     2019-01-01 
Sri Lanka                                       70     1950-01-01     2019-01-01 
Sudan                                           50     1970-01-01     2019-01-01 
Suriname                                        47     1973-01-01     2019-01-01 
Sweden                                          70     1950-01-01     2019-01-01 
Switzerland                                     70     1950-01-01     2019-01-01 
Taiwan                                          69     1951-01-01     2019-01-01 
Tajikistan                                      30     1990-01-01     2019-01-01 
Thailand                                        70     1950-01-01     2019-01-01 
Togo                                            40     1980-01-01     2019-01-01 
Trinidad and Tobago                             70     1950-01-01     2019-01-01 
Tunisia                                         60     1960-01-01     2019-01-01 
Turkey                                          70     1950-01-01     2019-01-01 
U.R. of Tanzania: Mainland                      60     1960-01-01     2019-01-01 
Ukraine                                         30     1990-01-01     2019-01-01 
United Kingdom                                  70     1950-01-01     2019-01-01 
United States                                   70     1950-01-01     2019-01-01 
Uruguay                                         70     1950-01-01     2019-01-01 
Uzbekistan                                      30     1990-01-01     2019-01-01 
Venezuela (Bolivarian Republic of)              70     1950-01-01     2019-01-01 
Zambia                                          65     1955-01-01     2019-01-01 
Zimbabwe                                        66     1954-01-01     2019-01-01 
================================================================================


The pdSize procedure provides a nice summary of our panel data structure including the:

  • Total number of groups and a full list of the groups.
  • Number of observations per a group.
  • Number of missing values.
  • The start and end date of each group in our panel.

While there are no missing values in this data, this isn't always the case. In fact, it is quite common that variables cover only part of the full timespan. For example, a country may have a longer history of providing real GDP data than IRR data.

The pdTimeSpans procedure reports the full timespan for each group, along with the timespans for a specified variable list. If no variable list is provided, it returns the timespan for all variables in the dataframe.

For example, suppose we want to use the emp and rgdpo variables in a model and want to know the maximum timespan our model can cover. We can use pdTimeSpans to see the timespan of each variable:

pwt_model_timespans = pdTimeSpans(pwt_10, "emp"$|"rgdpo");
pwt_model_timespans;
country       Start year         End year        emp Start          emp End      rgdpo Start        rgdpo End 
Angola       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Argentina       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Armenia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Aruba       1991-01-01       2019-01-01       1991-01-01       2019-01-01       1991-01-01       2019-01-01 
Australia       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Austria       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Azerbaijan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Bahamas       1973-01-01       2019-01-01       1973-01-01       2019-01-01       1973-01-01       2019-01-01 
Bahrain       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Barbados       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Belarus       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Belgium       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Benin       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Bermuda       1986-01-01       2019-01-01       1986-01-01       2019-01-01       1986-01-01       2019-01-01 
Bolivia (Plurina       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Bosnia and Herze       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Botswana       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Brazil       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
British Virgin I       1991-01-01       2019-01-01       1991-01-01       2019-01-01       1991-01-01       2019-01-01 
Bulgaria       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Burkina Faso       1959-01-01       2019-01-01       1959-01-01       2019-01-01       1959-01-01       2019-01-01 
Burundi       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Cabo Verde       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Cameroon       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Canada       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Cayman Islands       1991-01-01       2019-01-01       1991-01-01       2019-01-01       1991-01-01       2019-01-01 
Central African        1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Chad       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Chile       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
China       1952-01-01       2019-01-01       1952-01-01       2019-01-01       1952-01-01       2019-01-01 
China, Hong Kong       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
China, Macao SAR       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Colombia       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Costa Rica       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Croatia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Cyprus       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Czech Republic       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Côte d'Ivoire       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Denmark       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Djibouti       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Dominican Republ       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
Ecuador       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Egypt       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Estonia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Eswatini       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Fiji       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Finland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
France       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Gabon       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Georgia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Germany       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Greece       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
Guatemala       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Guinea       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Honduras       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Hungary       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Iceland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
India       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Indonesia       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Iran (Islamic Re       1955-01-01       2019-01-01       1955-01-01       2019-01-01       1955-01-01       2019-01-01 
Iraq       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Ireland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Israel       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Italy       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Jamaica       1953-01-01       2019-01-01       1953-01-01       2019-01-01       1953-01-01       2019-01-01 
Japan       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Jordan       1954-01-01       2019-01-01       1954-01-01       2019-01-01       1954-01-01       2019-01-01 
Kazakhstan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Kenya       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Kuwait       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Kyrgyzstan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Lao People's DR       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Latvia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Lebanon       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Lesotho       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Lithuania       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Luxembourg       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Malaysia       1955-01-01       2019-01-01       1955-01-01       2019-01-01       1955-01-01       2019-01-01 
Malta       1954-01-01       2019-01-01       1954-01-01       2019-01-01       1954-01-01       2019-01-01 
Mauritania       1977-01-01       2019-01-01       1977-01-01       2019-01-01       1977-01-01       2019-01-01 
Mauritius       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Mexico       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Mongolia       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Morocco       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Mozambique       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Namibia       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Netherlands       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
New Zealand       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Nicaragua       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Niger       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Nigeria       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
North Macedonia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Norway       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Oman       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Panama       1969-01-01       2019-01-01       1969-01-01       2019-01-01       1969-01-01       2019-01-01 
Paraguay       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
Peru       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Philippines       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Poland       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Portugal       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Qatar       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Republic of Kore       1953-01-01       2019-01-01       1953-01-01       2019-01-01       1953-01-01       2019-01-01 
Republic of Mold       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Romania       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Russian Federati       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Rwanda       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Sao Tome and Pri       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Saudi Arabia       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Senegal       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Serbia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Sierra Leone       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Singapore       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Slovakia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Slovenia       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
South Africa       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Spain       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Sri Lanka       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Sudan       1970-01-01       2019-01-01       1970-01-01       2019-01-01       1970-01-01       2019-01-01 
Suriname       1973-01-01       2019-01-01       1973-01-01       2019-01-01       1973-01-01       2019-01-01 
Sweden       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Switzerland       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Taiwan       1951-01-01       2019-01-01       1951-01-01       2019-01-01       1951-01-01       2019-01-01 
Tajikistan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Thailand       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Togo       1980-01-01       2019-01-01       1980-01-01       2019-01-01       1980-01-01       2019-01-01 
Trinidad and Tob       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Tunisia       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Turkey       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
U.R. of Tanzania       1960-01-01       2019-01-01       1960-01-01       2019-01-01       1960-01-01       2019-01-01 
Ukraine       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
United Kingdom       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
United States       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Uruguay       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Uzbekistan       1990-01-01       2019-01-01       1990-01-01       2019-01-01       1990-01-01       2019-01-01 
Venezuela (Boliv       1950-01-01       2019-01-01       1950-01-01       2019-01-01       1950-01-01       2019-01-01 
Zambia       1955-01-01       2019-01-01       1955-01-01       2019-01-01       1955-01-01       2019-01-01 
Zimbabwe       1954-01-01       2019-01-01       1954-01-01       2019-01-01       1954-01-01       2019-01-01 


Again, because we aren't missing any data, both emp and rgdpo cover the full timespan for each group as reported by pdSize.


Ready to elevate your research? Try GAUSS 25 today.

Panel Data Summary Statistics

When analyzing panel data, it's important to understand how variability is distributed across different dimensions of the data. Specifically:

  • Overall statistics which summarize the variability across all observations in the dataset, providing a high-level view of the data.
  • Within-group statistics which measure variability within each individual group, reflecting how a variable changes over time for a specific group.
  • Between-group statistics, which capture variability across groups, showing how groups differ from each other on average.

Understanding these patterns ensures that we select the right modeling approach and properly account for both group-specific and overall trends in our analysis.

We'll use the pdSummary procedure to compute these statistics. However, to simplify our examples and output moving forward, let's limit our panel to include only countries that use the Euro.

// Filter to include only Euro using countries
pwt_10 = selif(pwt_10, pwt_10[., "currency_unit"] .$== "Euro");
// Get summary statistics
pdSummary(pwt_10);
==========================================================================================
Group ID:                        country          Balanced:                             No
Valid cases:                        1125          Missings:                              0
N. Groups:                            19          T. Average:                       59.211
==========================================================================================
Variable               Measure           Mean      Std. Dev.        Minimum        Maximum
------------------------------------------------------------------------------------------
emp                    Overall          7.933         10.754          0.088         44.795
Between          -----         10.382          0.128         38.430
Within          -----          1.339          0.359         14.298
irr                    Overall          0.097          0.049          0.010          0.316
Between          -----          0.042          0.049          0.214
Within          -----          0.026          0.025          0.259
pop                    Overall         18.322         23.778          0.296         83.517
Between          -----         23.039          0.364         78.163
Within          -----          2.763          4.844         29.645
rgdpe                  Overall     457547.712     746910.097        568.248    4308861.500
Between          -----     594454.666       6365.400    2072470.938
Within          -----     431603.648   -1262654.069    2693938.274
rgdpo                  Overall     454655.015     750364.973         69.909    4275312.000
Between          -----     596209.636       5725.827    2097340.112
Within          -----     434813.497   -1283383.285    2632626.903
==========================================================================================
Non-numeric variables dropped from summary.

One very clear observation from our summary table is that our GDP variables, rgdpo and rgpde, are a much different scale than our other variables. We'll look at how to transform these next.

Transforming Data for Modeling

Because panel data usually contains a time dimension, it is very common to need to take lags or differences of our data. While this is very straightforward with time series data, doing this with panel data can be much more difficult.

Fortunately, the pdLag and pdDiff procedures, introduced in GAUSS 25, will efficiently compute panel data lags and differences for you.

$$\text{rdgpo growth rate} = \ln rgdpo_{t} - \ln rgdpo_{t-1} $$

Let's use the pdDiff procedure to create a new real GDP growth variable.

// Take natural log of rgdpo
ln_rgdpo = ln(pwt_10[., "rgdpo"]);
// Add to pwt_10 dataframe
// we need to do this so GAUSS
// can identify or panel 
// using the 'country' and 'year' variables
pwt_10 = pwt_10 ~ asDF(ln_rgdpo, "ln_rgdpo");
// Take first difference of ln_rgdpo
// GAUSS will use 'country' and 'year' to 
// automatically detect panel
gr_rgdpo = pdDiff(pwt_10[., "country" "year" "ln_rgdpo"]);
// Summarize 'gr_rgdpo' 
// GAUSS will use 'country' and 'year' to 
// automatically detect panel
call pdSummary(gr_rgdpo);
==========================================================================================
Group ID:                        country          Balanced:                             No
Valid cases:                        1106          Missings:                             19
N. Groups:                            19          T. Average:                       58.211
==========================================================================================
Variable               Measure           Mean      Std. Dev.        Minimum        Maximum
------------------------------------------------------------------------------------------
ln_rgdpo               Overall          0.036          0.092         -1.741          1.476
Between          -----          0.013          0.008          0.062
Within          -----          0.091         -1.768          1.450
==========================================================================================

Data Visualization

As a final step, let's create a quick visualization of this new variable using plotXY and the by keyword. We'll use a subset of countries to keep our plot from getting to crowded.

// Create subset of countries 
country_list = "Austria"$|"France"$|"Germany"$|"Spain"$|"Italy";
// Select data for plot
plot_data = selif(gr_rgdpo, sumr(gr_rgdpo[., "country"] .$== country_list'));
// Plot rgdpo growth variable by country
plotXY(plot_data, "ln_rgdpo~year + by(country)");

Graph of the log of GDP growth for 5 countries in GAUSS.

Conclusion

Today we've seen how the new panel data tools in GAUSS 25 can simplify your everyday panel data tasks, using a hands-on example. We've covered fundamental tasks, including:

  • Loading your data.
  • Preparing your panel dataset.
  • Exploring panel data characteristics.
  • Visualizing panel data.
  • Transforming your data for modeling.

Further Reading

  1. Introduction to the Fundamentals of Panel Data
  2. Panel Data Basics: One-way Individual Effects
  3. Get Started with Panel Data in GAUSS (Video)
  4. How to Aggregate Panel Data in GAUSS
Leave a Reply