
Loop and Macro: How to repeat yourself without going mad on Stata

Macros and loops are tools that simplify your life and your code, if you use them the right way.

Good morning guys!

I just realized I never explained which symbols you need to know to simplify your life when working with Stata. Time to make up for lost time!

Mathematical and Logical Expressions

  • < and > mean less than and greater than a given value; add = to get <= (less than or equal to) and >= (greater than or equal to)
  • == means equal to a given expression; its opposite is != (not equal to)
  • To combine conditions, use & (and), | (or) and ! (not)
  • ln() = natural log, exp() = exponential, sqrt() = square root, abs() = absolute value
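A minimal sketch of these operators and functions in action. It uses Stata’s bundled auto dataset (loaded with sysuse) instead of the income variable discussed below, and the new variable names are made up for illustration:

```stata
sysuse auto, clear
gen ln_price = ln(price)                            // natural log of price
gen heavy_cheap = (weight >= 3000 & price < 5000)   // 1 if both conditions hold, 0 otherwise
count if foreign != 1 | mpg > 30                    // domestic cars, or any car above 30 mpg
display sqrt(16), abs(-3), exp(0)                   // square root, absolute value, exponential
```

A logical expression in parentheses evaluates to 1 (true) or 0 (false), which is why heavy_cheap can be created in a single line.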

To see these symbols in action, let’s work through a few examples. Take the variable income and generate its square with the ^ (power) operator:

gen inc_sq = income^2

We can also create variables that depend on conditions.

gen rich = 0

replace rich = 1 if male == 1 & income > 20000

or

replace rich = 1 if male == 1 & income > 20000 & (profession == "Lawyer" | profession == "Banker")

In the second version we have added a constraint on profession. The parentheses make both professions subject to the income threshold of 20,000: we are telling Stata to set rich to 1 for men who earn more than 20,000 euros and work as lawyers or bankers. If we drop the parentheses, Stata evaluates & before |, so the condition selects male lawyers who earn more than 20,000 plus all bankers, regardless of their income and sex.
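To check the effect of the parentheses yourself, here is a small sketch on a made-up four-person dataset (the values are invented purely to show the difference between the two conditions):

```stata
clear
input male income str10 profession
1 25000 "Lawyer"
1 15000 "Banker"
0 30000 "Banker"
1 30000 "Teacher"
end

* with parentheses: the income and sex conditions apply to both professions
gen rich_paren = 0
replace rich_paren = 1 if male == 1 & income > 20000 & (profession == "Lawyer" | profession == "Banker")

* without parentheses: read as (male lawyers above 20,000) | (any banker)
gen rich_noparen = 0
replace rich_noparen = 1 if male == 1 & income > 20000 & profession == "Lawyer" | profession == "Banker"

list
```

Only the male lawyer is rich with parentheses; without them, both bankers are flagged too, including the low-income one and the woman.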

Useful Tip: I already talked about the pitfalls of missing values in a previous post. I just want to add that they also matter in conditions. Stata treats a missing numeric value (.) as larger than any number, so a condition such as income > 20000 is also true for observations where income is missing. If you do not want them to satisfy your condition, you have to exclude them explicitly:

replace rich = 1 if income > 20000 & income != .   // income above the threshold and not missing
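A tiny sketch with three made-up income values shows the problem. Note that income != . only excludes the system missing value, while the missing() function also catches the extended missing values .a, .b and so on:

```stata
clear
input income
15000
25000
.
end

count if income > 20000                      // 2: the missing value counts as "above"
count if income > 20000 & income != .        // 1
count if income > 20000 & !missing(income)   // 1, and also safe with .a, .b, ...
```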

Useful Tip 2: If you are not sure you are constructing your variables correctly from the available ones, tabulate the new variable against its components to check that its values make sense!
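A minimal sketch of this check on the auto dataset; cheap_foreign is a hypothetical variable built from two components, price and foreign:

```stata
sysuse auto, clear
gen cheap_foreign = (price < 5000 & foreign == 1)

* every car flagged as cheap_foreign should sit in the foreign == 1 column
tabulate cheap_foreign foreign, missing

* the same checks, made explicit: assert stops the do-file if they fail
assert foreign == 1 if cheap_foreign == 1
summarize price if cheap_foreign == 1        // the maximum should be below 5000
```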


Why should we care about macros and loops? Because our time is valuable, and we don’t want to waste a single minute writing the same code 50 times when we can use an automation tool and write a few lines that do the work on their own. Can’t you see the advantage? Let’s look at an example.

Suppose you want to build a regression model but are not sure how many covariates to use or which error specification fits best. You can write the regress command twenty times, adding different regressors each time, or write a loop that performs all these estimations in three lines. Another useful application is building tables of results. Skip them only when you have few calculations to perform and your analysis is not time-consuming.
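As a preview of what macros and loops (explained below) make possible, here is a sketch on the auto dataset. The outcome (price), the candidate regressors and the two error specifications are illustrative choices, not a recommended model:

```stata
sysuse auto, clear
local controls ""
foreach v in weight length foreign {
    local controls `controls' `v'            // add one more regressor on each pass
    foreach se in ols robust {               // try two error specifications
        quietly regress price mpg `controls', vce(`se')
        display "controls: `controls'   vce: `se'   R2 = " %5.3f e(r2)
    }
}
```

Six regressions come out of nine lines of code, and adding a regressor or an error specification means editing one list rather than copying another block.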

Ready to start?

Macros can be global or local. A local macro exists only within the program or do-file in which it was defined, whereas a global macro can be referred to by all commands and programs for the rest of the session. I suggest using global macros to set up your directories and files. Using them is a two-step process. First, you declare the type of macro, give it a name and tell Stata what it contains. Second, you recall the macro by prefixing its name with the $ symbol. For example:

global data "C:\Users\Michela\Data\Stata"

cd "$data"

Here I told Stata to create a global macro holding the directory where I store my files, and then to change to that directory without typing “C:\…” every time. Stata has reserved a place in memory called data and put the directory used for the analysis in it.
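A global is also handy for building full file paths. In this sketch the global points at Stata’s temporary folder, c(tmpdir), only so that it runs on any machine; in your own do-file you would put your real project folder there. The file name auto_copy.dta is hypothetical:

```stata
global data "`c(tmpdir)'"               // stand-in for your own project folder
sysuse auto, clear
save "$data/auto_copy.dta", replace     // $data expands to the folder path
use "$data/auto_copy.dta", clear
display "$data"
```

Stata accepts forward slashes in paths on Windows too, which keeps the same do-file portable across operating systems.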

To recall a local macro later in the analysis, instead, we wrap its name in a backquote (`) and a straight single quote ('), as in `name'. Let’s explore the differences and applications:


In this do-file, I first replicated the global macro to set the directory. Then I used the famous auto.dta to explore in more depth how local macros work. From auto I created three different datasets using the commands preserve and restore: preserve takes a snapshot of the data in memory so that I can modify them freely, and restore brings the original data back (Stata also restores preserved data automatically when a program ends), so nothing gets overwritten. Next, I created two local macros, one to recall the datasets and the other to store only the variables I need later. As you can see from the first line of the foreach loop, a local macro must be recalled by wrapping its name in a backquote and a single quote; otherwise Stata does not recognize it.
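A minimal sketch of a do-file along these lines. The dataset names model1 to model3, the rules used to split auto, the variable price_per_lb and the use of c(tmpdir) as the working folder are all hypothetical choices for illustration:

```stata
global data "`c(tmpdir)'"             // stand-in for your own project folder
cd "$data"

sysuse auto, clear
preserve                              // snapshot of the full auto data
keep if foreign == 0
save model1, replace
restore                               // full data back in memory
preserve
keep if foreign == 1
save model2, replace
restore
preserve
keep if rep78 >= 4 & !missing(rep78)
save model3, replace
restore

local datasets model1 model2 model3   // names of the datasets to loop over
local vars price mpg weight           // only the variables needed later

foreach d in `datasets' {             // the local is recalled with ` and '
    use `d', clear
    keep `vars'
    gen price_per_lb = price / weight
    save wide_`d', replace
}
```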

I am sure your attention is now all on that loop. Before we move on to the next topic, let me anticipate one more thing.

What you need to know: Stata has some extremely useful built-in values. They are not macros in the sense we have just presented, but you can think of them in the same way. You can find them by typing help _variables in the Command window. For example, _N contains the number of observations in the dataset. So, with auto loaded, just type:

display _N

You will immediately see how many observations it contains (74). Several other system variables, such as _b and _se, refer to regressions and will be extremely useful once we deal with econometrics.
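A short sketch of a few of these system variables on the auto dataset; obs_id is a hypothetical variable name, and the regression is only there to populate _b and _se:

```stata
sysuse auto, clear
display _N                  // total number of observations
gen obs_id = _n             // _n is the number of the current observation
regress price mpg
display _b[mpg]             // estimated coefficient on mpg
display _se[mpg]            // its standard error
```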

N.B.: It may be obvious, but then again it may not. A do-file saves only the list of commands, and a log file records the commands and their output; neither saves your data, and neither automatically saves your macros. When you close Stata, your macros simply disappear (a local macro defined in a do-file disappears even sooner, as soon as the do-file finishes running). If you want to use them in another session, you have to run the do-file again.


Several automation tools and loops simplify your life when you are dealing with a large amount of data analysis.


The while loop is the easiest one. Its syntax resembles that of the foreach loop shown above:

while <exp> {
    <commands>
}



<exp> is a condition that must remain true for the loop to keep running. For example, I can write:

local i = 1
while `i' <= 4 {
    di "count `i'"
    di "bravo."
    local i = `i' + 1
}


As you can see, this simple loop iterates over the local macro i. At each pass it tells Stata to display the current value of i and say bravo (just to show off!). Then it increases i by 1, and the loop stops once i exceeds 4. You can get the same result with the forvalues command.
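while really pays off when you do not know in advance how many passes you need. In this sketch (the threshold of 1000 is arbitrary) the loop keeps doubling n until it reaches 1000 or more:

```stata
local n = 1
local steps = 0
while `n' < 1000 {
    local n = `n' * 2            // double n
    local steps = `steps' + 1    // count the passes
}
display "reached `n' after `steps' doublings"    // reached 1024 after 10 doublings
```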


forvalues i = 1/4 {
    di "count `i'"
    di "bravo."
}

This is a cleaner version of the previous loop: we do not need to define and increment the macro ourselves, because that is built into the syntax of the command. i = 1/4 creates a loop that starts at i = 1 and increases i by 1 until it reaches 4 (included). Unlike while, and like foreach, forvalues sets a local macro to each element of a list in turn and executes the commands inside the loop once for each element. The difference is that forvalues only accepts numeric ranges, such as 1/4 or 0(10)100, which starts at 0 and increases by 10 each time until it reaches 100. For other options, type help forvalues.
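A sketch of both range forms: the first part sums the elements of 0(10)100, and the second uses 2/4 to create powers of mpg in the auto dataset (the mpg_pow names are hypothetical):

```stata
local total = 0
forvalues t = 0(10)100 {
    local total = `total' + `t'
}
display `total'                   // 0 + 10 + ... + 100 = 550

sysuse auto, clear
forvalues k = 2/4 {
    gen mpg_pow`k' = mpg^`k'      // creates mpg_pow2, mpg_pow3, mpg_pow4
}
```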


foreach is the most versatile loop in Stata. It has two main syntaxes:

foreach <loop_macro> in <list> {
    <commands>
}

foreach <loop_macro> of varlist <varlist> {
    <commands>
}
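A sketch of the two syntaxes on the auto dataset, plus the closely related of local form, which loops over the contents of a local macro without the backquote and quote. The words alpha, beta, gamma and the local to_log are hypothetical:

```stata
sysuse auto, clear

* in <list>: any list of words, Stata does not check them
foreach w in alpha beta gamma {
    display "`w'"
}

* of varlist: every name must be an existing variable
foreach v of varlist price mpg weight {
    summarize `v', meanonly
    display "`v' mean = " r(mean)
}

* of local: loop over the words stored in a local macro
local to_log price weight
foreach v of local to_log {
    gen ln_`v' = ln(`v')
}
```

The of varlist form is the safer choice when looping over variables, because a typo in a variable name stops the loop immediately instead of failing later.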


In the previous example, we used the in <list> syntax because we told Stata to go through all the datasets stored in the local macro and generate the variable inside each one of them. Once finished, the new datasets are stored as wide_model1 and so on. These powerful tools can also be nested, one loop inside another. In this case, the inner loop runs in full for each iteration of the outer loop. This is useful for presenting regressions and their outcomes, but I am going to show you an example that also lets me introduce another useful command: reshape.
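A minimal sketch of the order in which nested loops run; the lists a b and 1 2 3 are arbitrary:

```stata
local passes = 0
foreach outer in a b {
    foreach inner in 1 2 3 {
        display "`outer'`inner'"         // a1 a2 a3, then b1 b2 b3
        local passes = `passes' + 1
    }
}
display "inner body ran `passes' times"  // 2 outer x 3 inner = 6
```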


As we can see, I used a nested foreach loop that combines both versions, in <list> and of varlist. I used a dataset called reshape1 that you can download from the web; it was built to show how to reshape data between the long and the wide format. To load it, type:

webuse reshape1   // download and open the chosen dataset directly

The first part of the do-file repeats the commands outlined above, so I skip it. The first loop tells Stata to open each dataset stored in the local macro and to reshape the selected outcomes from wide to long and then back to wide. Then, after sorting the variables alphabetically with aorder, Stata creates two new variables for each outcome in the local macro, holding the changes in that outcome over time. Remember that each loop opens and closes with braces {}. You can check that you have closed them correctly by clicking the minus (-) button to the left of the lines in the Do-file Editor.
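Assuming reshape1 contains id, sex and the yearly outcomes inc80 to inc82 and ue80 to ue82, a minimal sketch of such a loop could look like this. The local name outcomes, the j() variable name year and the d_ prefix are hypothetical, and here the inner loop runs over a numlist of years rather than a varlist. webuse needs an internet connection:

```stata
webuse reshape1, clear
local outcomes inc ue

reshape long `outcomes', i(id) j(year)    // wide -> long: creates year
reshape wide `outcomes', i(id) j(year)    // long -> wide again
aorder                                     // sort variables alphabetically

foreach o in `outcomes' {
    foreach y of numlist 81 82 {
        local prev = `y' - 1
        gen d_`o'`y' = `o'`y' - `o'`prev'  // change from the previous year
    }
}
```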


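The dataset reshape1 is already in wide format: it has one row per id, and each outcome is spread over several columns, one per year (inc80, inc81, inc82 and ue80, ue81, ue82).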
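To inspect the wide layout yourself, a short sketch (webuse needs an internet connection):

```stata
webuse reshape1, clear
describe          // variable names and storage types
list, noobs       // one row per id
```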


I played a bit with the loop to show how easily you can switch from one format to the other. When you go from wide to long, the variable named in j() is created in the dataset, whereas when you go from long to wide, the j() variable must already exist. When your data are already in long format, each id instead occupies several rows, one per value of j, and each outcome has a single column.
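A sketch of the round trip on reshape1; year is a hypothetical name for the j() variable. After a reshape, typing reshape wide (or reshape long) with no arguments reverses the last conversion, because Stata remembers the i() and j() settings:

```stata
webuse reshape1, clear
reshape long inc ue, i(id) j(year)    // wide -> long: year is created here
list id year inc ue, noobs sepby(id)  // one row per id-year pair
reshape wide                          // long -> wide: year is consumed
describe, short
```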


Useful Tip: Before using reshape, you need to determine which format your data are in. You must identify the logical observation (i) and the subobservation (j) by which your data are organized.
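One quick way to test your choice of i and j is the isid command, which stops with an error if the listed variables do not uniquely identify the observations. A sketch on reshape1, with year as a hypothetical j() name:

```stata
webuse reshape1, clear
isid id                               // wide: i alone identifies each row
reshape long inc ue, i(id) j(year)
isid id year                          // long: i and j together identify each row
```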