All you have to know to work proficiently with Panel Data in Stata
Good afternoon guys!
Today I want to spend a few more words on Panel Data Analysis and extend what we already know to Dynamic Panel Data. When do we face such data? Well, many economic relationships are dynamic by nature, employment models being a typical example.
Usually, a simple dynamic model regresses the dependent variable on a polynomial in time or on lags of itself, so it is characterized by the presence of a lagged dependent variable among the regressors. To see what I am talking about, let’s write down an AR(1) model with individual-specific effects:
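One standard way to write such a model is sketched below. The notation is assumed here for illustration: gamma is the coefficient on the lag, beta the coefficients on the covariates x, alpha the individual-specific effect and epsilon the idiosyncratic error.

```latex
y_{it} = \gamma\, y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it},
\qquad i = 1,\dots,N, \quad t = 2,\dots,T
```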
In this case, the dependent variable can be serially correlated over time through its own lag (true state dependence), through covariates that are themselves serially correlated (observed heterogeneity), or through alpha (unobserved heterogeneity). If the alphas are treated as fixed effects, the FE (within) estimator is inconsistent when T is small, because the demeaned lagged dependent variable is correlated with the demeaned error. If we take first differences to get rid of the alphas, OLS remains inconsistent, because the differenced lag is correlated with the differenced error. Arellano and Bond suggested taking first differences to remove the alphas and then applying an IV (GMM) method. This leads to consistent, though not necessarily efficient, estimates. It is a variation on the first-difference model that uses an unbalanced set of instruments: the later the period, the more lags are available as instruments. The moment conditions are formed by assuming that suitably lagged levels of the dependent variable are orthogonal to the differenced disturbances.
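The idea can be sketched in symbols. The notation is assumed here for illustration: y is the dependent variable, x the covariates, gamma and beta their coefficients, and epsilon the idiosyncratic error, taken to be serially uncorrelated. First-differencing removes alpha, and levels of y lagged two or more periods can then serve as instruments:

```latex
\begin{align*}
\Delta y_{it} &= \gamma\,\Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\varepsilon_{it},
  \qquad t = 3,\dots,T \\
E\!\left[\, y_{i,t-s}\,\Delta\varepsilon_{it} \,\right] &= 0
  \qquad \text{for } s \ge 2
\end{align*}
```

OLS fails on the first line because the differenced lag contains y at t-1, which depends on epsilon at t-1, and that term is also part of the differenced error. The number of usable lags grows with t (one at t = 3, two at t = 4, and so on), which is why the instrument set is unbalanced.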
Tip: This estimator works well on datasets with many panels and few periods (N>T) and it requires that there is no autocorrelation in the idiosyncratic errors.
The Stata command is xtabond, whose basic syntax is:
xtabond y x
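A slightly fuller sketch of a typical session might look like this. The names id, year, y and x are placeholders, and the choice of options is only one reasonable possibility, not a recommendation for every dataset:

```stata
* Declare the panel structure first (id and year are placeholder names)
xtset id year

* One-step Arellano-Bond estimator with one lag of y (lags(1) is the default)
xtabond y x, lags(1)

* Sargan test of overidentifying restrictions
* (available after a fit with the default, non-robust VCE)
estat sargan

* Refit with robust standard errors, then test for autocorrelation
* in the first-differenced errors
xtabond y x, lags(1) vce(robust)
estat abond
```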
Arellano, in later work with Bover, pointed out a potential weakness of this estimator: lagged levels are often rather poor instruments for first-differenced variables, especially for variables that are close to a random walk. Therefore, a new estimator, commonly termed system GMM, was developed as an alternative to the basic one (known as difference GMM). Both have one-step and two-step variants, and both are available in the user-written command xtabond2. This command offers you two options. You can fit a difference GMM model, which treats the model as a system of equations, one for each time period, that differ only in their instrument/moment condition sets. Alternatively, you can fit a system GMM model, the augmented version: the original equations in levels are added to the system, and the additional moment conditions can increase efficiency. In these level equations, predetermined and endogenous variables are instrumented with suitable lags of their own first differences.
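In symbols, the extra level-equation moment conditions for the lagged dependent variable can be sketched as follows. The notation is assumed here for illustration: an AR(1) model with individual effect alpha and idiosyncratic error epsilon.

```latex
E\!\left[\, \Delta y_{i,t-1}\,(\alpha_i + \varepsilon_{it}) \,\right] = 0,
\qquad t = 3,\dots,T
```

These conditions require that changes in y are uncorrelated with the individual effect, an additional assumption that difference GMM does not need.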
To work with a suitable dataset, type the following command in an open Stata session:
net from http://www.stata-press.com/data/imeus/
A new window will open where you can install all the listed datasets. Once you have installed them, load the one we need (traffic) by typing: use traffic.
This dataset covers traffic accidents, and we want to model fatal accidents as depending on the prior year’s value, the state’s spirits consumption and a time trend. We use a set of instruments to control for endogeneity through the gmmstyle() option and include one variable as a standard IV instrument with the ivstyle() option. We also ask for robust standard errors with the robust option and restrict the estimator to first differences with the noleveleq option. To review: this command can estimate system GMM, levels GMM and difference GMM, each in one- and two-step variants, with the two-step estimates being asymptotically more efficient.
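Before estimating anything, it can help to check that Stata sees the data as a panel. The lines below are a minimal sketch: they assume traffic.dta sits in the current working directory and that the variables used later in this post exist under the names shown. The panel identifier name state is an assumption, not something confirmed here.

```stata
* Load the dataset (assumes traffic.dta is in the current working directory)
use traffic, clear

* Show the current panel settings; if Stata reports that no panel
* variable is set, declare it yourself, e.g. xtset state year
* (state is an assumed name for the panel identifier)
xtset

* Summarize the panel pattern and the variables used in the models
xtdescribe
summarize fatal spircons beertax unrate perinc year
```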
ssc install xtabond2
xtabond2 fatal L.fatal spircons year, gmmstyle(beertax spircons unrate perinc) ivstyle(year) twostep robust noleveleq
As we can see from the results, the Hansen test of overidentifying restrictions is satisfactory, as is the test for AR(2) errors.
Tip: In an Arellano-Bond model we usually expect the AR(1) test to reject its null of no first-order autocorrelation in the differenced residuals, so a significant AR(1) statistic is not a problem in itself.
To compare the difference GMM estimator with the system GMM approach, we rerun the same command without the noleveleq option:
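A common robustness check, not part of the specification above, is to shrink the instrument set, since a large instrument count can weaken the Hansen test. The sketch below uses the laglimits() and collapse suboptions of gmmstyle(). The lag range 1 to 3 is an arbitrary choice for illustration, and the /// line continuations only work inside a do-file.

```stata
* Same difference GMM model, but with a smaller instrument set:
* only lags 1-3 of each gmmstyle variable, collapsed so that each
* lag contributes one instrument column instead of one per period
xtabond2 fatal L.fatal spircons year, ///
    gmmstyle(beertax spircons unrate perinc, laglimits(1 3) collapse) ///
    ivstyle(year) twostep robust noleveleq
```

If the coefficients and the Hansen p-value change a lot between the two runs, the original results may be sensitive to the number of instruments.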
xtabond2 fatal L.fatal spircons year, gmmstyle(beertax spircons unrate perinc) ivstyle(year) twostep robust
As you can see, this model performs worse than the previous one. Moreover, the marginally significant negative coefficient on spircons casts doubt on this specification. Because dynamic panel models are instrumental variables methods, it is particularly important to evaluate the Sargan-Hansen test results and the AR tests for autocorrelation of the residuals. By construction, the residuals of the differenced equation should exhibit first-order serial correlation, but if the assumption of serial independence in the original errors is warranted, the differenced residuals should not exhibit significant AR(2) behavior. If a significant AR(2) statistic arises, the second lags of the endogenous variables are not appropriate instruments for their current values.
Finally, if you feel comfortable with these models, there is also an official Stata command, xtdpd, that additionally allows for low-order moving-average correlation in the idiosyncratic errors.
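To put the two fits side by side, one option is to store each set of estimates and tabulate them. This is only a sketch: the names diffgmm and sysgmm are arbitrary labels, and it assumes xtabond2 results can be stored like those of any other estimation command.

```stata
* Difference GMM (level equations excluded)
quietly xtabond2 fatal L.fatal spircons year, gmmstyle(beertax spircons unrate perinc) ivstyle(year) twostep robust noleveleq
estimates store diffgmm

* System GMM (level equations included)
quietly xtabond2 fatal L.fatal spircons year, gmmstyle(beertax spircons unrate perinc) ivstyle(year) twostep robust
estimates store sysgmm

* Coefficients and standard errors side by side
estimates table diffgmm sysgmm, b(%9.4f) se(%9.4f) stats(N)

* List everything the last fit stored, including the test statistics
ereturn list
```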
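As a rough sketch of what an xtdpd call can look like, the lines below fit a stripped-down version of the traffic model. This is not a replication of the xtabond2 specification above: the time trend is left out, and treating spircons as strictly exogenous is an assumption made only to keep the example short.

```stata
* Difference GMM: lagged fatal plus spircons; the lagged dependent
* variable must be listed explicitly with L(0/1).
xtdpd L(0/1).fatal spircons, dgmmiv(fatal) div(spircons) vce(robust)
estat abond

* If the idiosyncratic errors are suspected to be MA(1), start the
* instruments for fatal at lag 3 instead of lag 2
xtdpd L(0/1).fatal spircons, dgmmiv(fatal, lagrange(3 .)) div(spircons) vce(robust)
```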