Using ‘comparison of means’ to calculate proportions at IPUMS-USA

(I wrote this for the students in my undergraduate lecture course Introduction to Social Demography, who are working with IPUMS-USA for a final project.  I thought it might be of more general interest to others who are using IPUMS-USA.)

We often want to calculate the proportion of people with some characteristic according to the values of two other variables.  The characteristic of interest might be represented by a single value of a categorical variable, by several values of a categorical variable, or even by a range of values of a continuous variable.  We can do this with the ‘comparison of means’ tab that we use to compute the mean of income, socioeconomic index, or other continuous variables.  We just have to recode the variable that we are interested in into a dichotomous variable that is 1 if the person has the characteristic and 0 otherwise.  The mean of a variable coded this way is the number of 1s divided by the number of cases, which is exactly the proportion of people with the characteristic.

For example, we might want to calculate the proportion of people who have ever been married, according to year and age group.  By ‘ever been married’, we mean anyone who is currently married, or was married in the past, but is now widowed, separated, or divorced.  In the MARST variable for marital status, that would be anyone who had values 1-5.  The remaining value, 6, corresponds to people who have never been married.

Of course, we could do a cross-tabulation in which our column variable was marital status, our row variable was age, and our control variable was year.  We would then have to add up the percentages of people in statuses 1-5 in each of the resulting tables.  We could recode 1-5 into one category and have the computer do the addition for us, but we would still end up with a lot of output to go through: one table for every year.

Alternatively, we could recode marital status into a dichotomous variable that takes on the value of 0 or 1 according to whether someone has ever been married, and then compute the mean of that new variable for different combinations of year and age group.  In the following example, I have set up a ‘comparison of means’ calculation in which the dependent variable is MARST recoded so that all values corresponding to categories where a person is currently married or was married in the past (MARST 1 through 5) are 1, and the never married are 0.  The mean of this variable will be the proportion of people who are married, or were married in the past but are now widowed, separated, or divorced.

In the following, pay particular attention to the use of recode in the specification of the dependent variable to turn MARST into a dichotomous variable:

[Screenshot 1: proportion_ever_married_by_age_and_year]
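
For anyone who wants to see the logic outside the IPUMS interface, here is a minimal Python sketch of the same idea: recode MARST to 0/1, then take the mean within each combination of year and age group.  It is not the IPUMS recode syntax.  The records, the age group labels, and the function names ever_married and mean_by_group are all made up for illustration, and the sketch ignores person weights, which a real analysis would use.

```python
from collections import defaultdict

# Toy records, invented for illustration: (YEAR, age group, MARST code).
records = [
    (1980, "25-34", 1),
    (1980, "25-34", 6),
    (1980, "25-34", 4),
    (1980, "35-44", 1),
    (2000, "25-34", 6),
    (2000, "25-34", 6),
    (2000, "25-34", 1),
    (2000, "35-44", 5),
]


def ever_married(marst):
    """Recode MARST: 1-5 (ever married) -> 1, 6 (never married) -> 0."""
    if 1 <= marst <= 5:
        return 1
    if marst == 6:
        return 0
    return None  # any other code is treated as missing


def mean_by_group(rows):
    """Mean of a 0/1 value within each (year, age group) cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of indicator, count]
    for year, age_group, value in rows:
        if value is None:
            continue  # missing values stay out of the denominator
        totals[(year, age_group)][0] += value
        totals[(year, age_group)][1] += 1
    return {cell: s / n for cell, (s, n) in totals.items()}


recoded = [(y, a, ever_married(m)) for y, a, m in records]
proportions = mean_by_group(recoded)

for cell in sorted(proportions):
    print(cell, round(proportions[cell], 3))
```

Each value in proportions is the mean of the 0/1 variable in one cell, which is the proportion ever married in that year and age group.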


Below is an example that sets up a calculation of the proportion enrolled in school.  School enrollment is originally coded so that 1 indicates that someone is not enrolled, and 2 indicates that they are enrolled.  We recode to change 1 to a 0, and 2 to a 1, so that the mean ends up being the proportion currently enrolled.  Note that for the school enrollment variable, it only makes sense to consider people who are at the right age to be enrolled in school, so the calculation should be restricted to those ages.

[Screenshot 2: enrollment_example]
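
The same recode plus an age restriction can be sketched in Python as follows.  Again, this is only an illustration of the logic, not what the IPUMS interface runs.  The records are invented, the variable name SCHOOL and the age window of 5 to 18 are my assumptions, and the function names are hypothetical.

```python
# Toy records, invented for illustration: (AGE, SCHOOL code).
people = [(4, 1), (6, 2), (10, 2), (16, 2), (17, 1), (18, 1), (40, 1), (70, 1)]

MIN_AGE, MAX_AGE = 5, 18  # assumed school-age window, chosen for illustration


def enrolled(school):
    """Recode SCHOOL: 1 (not enrolled) -> 0, 2 (enrolled) -> 1."""
    return {1: 0, 2: 1}.get(school)  # any other code becomes None (missing)


def proportion_enrolled(rows, min_age, max_age):
    """Mean of the 0/1 enrollment variable among people in the age window."""
    values = [enrolled(s) for age, s in rows if min_age <= age <= max_age]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


school_age = proportion_enrolled(people, MIN_AGE, MAX_AGE)
everyone = proportion_enrolled(people, 0, 200)  # no real age restriction

print(school_age, everyone)
```

In this toy data the proportion for everyone is much lower than the proportion for the school-age group, because adults who would not be expected to be enrolled are pulling the mean down.  That is why the age restriction matters.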

Of course, you could do this with any number of other variables, including variables that were originally numeric or continuous.  In the example below, I have transformed POVERTY so that it is 0 or 1 according to whether the household in which an individual lives is at or below the poverty line.  POVERTY is originally coded as a three digit number that represents the household’s income as a percentage of the poverty line.  100 means that a household is at the poverty line, 001-099 means that a household is below the poverty line, and 101 up to 500 means that a household is above the poverty line.  There are no values above 500 because POVERTY is top-coded: if a household’s income is more than 500% of the poverty line, the value is just set to 500.  In the specification of the dependent variable, I have used the recode facility to change all values of POVERTY that are 101 or higher to 0, and all values of 001 to 100 to 1.  The mean of the variable is therefore the proportion of people living in poverty.  Note that the recode excludes 0 because 0 indicates that the value is not available, so those cases are left out of the mean rather than counted as not being in poverty.

[Screenshot 3: poverty_recode_example]

The value in each cell represents the proportion of individuals of the specified race in each year who are in poverty.
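
To see why it matters that the recode leaves out 0, here is a small Python sketch of the POVERTY recode.  The values are invented, the function name in_poverty is hypothetical, and this is an illustration of the logic rather than the IPUMS recode syntax.

```python
def in_poverty(poverty):
    """Recode POVERTY: 1-100 -> 1, 101 and up -> 0, 0 (not available) -> None."""
    if poverty == 0:
        return None  # not available: excluded from the mean
    return 1 if poverty <= 100 else 0


# Toy POVERTY values, invented for illustration.
values = [0, 45, 100, 101, 250, 500]

recoded = [in_poverty(v) for v in values]
valid = [v for v in recoded if v is not None]
proportion = sum(valid) / len(valid)

# What would happen if 0 were wrongly recoded to "not in poverty".
wrong = sum(valid) / len(values)

print(proportion, wrong)
```

With the not-available case excluded, two of the five valid cases are in poverty.  If the 0 had been lumped in with the values above the poverty line, the denominator would be six and the proportion in poverty would be understated.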
