Student evaluations for SOSC 1860 and SSMA 5010, Fall 2013


I received student evaluations for the two courses that I taught last fall, SOSC 1860 (Population and Society) and SSMA 5010 (Research Methods).

The former is a general education (Common Core in HKUST parlance) course aimed at freshmen and sophomores, while the latter is a required course in our self-taught Social Science MA program. I enjoyed teaching both courses. The students were bright and highly motivated.

Here are the evaluations for SOSC 1860.

I was initially surprised to read that the students in SOSC 1860 thought I required too much work, but eventually concluded that this probably reflects less prior exposure to open-ended written assignments and projects than the students I have taught elsewhere had. In fact, the course was a simplified version of an upper division course I taught regularly at UCLA that was only ten weeks long (versus thirteen here) yet had even more written assignments and reading. The assignments mostly required them to visit some websites to collect demographic data, and then write about trends and patterns. The final project required them to carry out an analysis at IPUMS. Talking to students here, it seems that they found the relatively open-ended assignments intimidating. The students here are just as smart and motivated as the ones I taught at UCLA, and they actually did a good job on the assignments and their final projects, so I suspect their reaction has more to do with lack of familiarity or confidence with this kind of assignment than with any actual lack of ability. Several students I talked to said this was the first class they had ever taken that made such heavy use of written assignments. I will probably need to adjust the number of assignments next fall.

The evaluations for SSMA 5010 are unremarkable, and about what I expected. Some of the comments reflect that this was a new prep, and I will have to continue revising my course plan and the lecture slides. This is the first time I have taught a research methods course, and it was fun. The students were highly motivated and engaged, making it a relatively pleasant task.


SOSC 1860 F13 Final Project

Due Friday 11/29 at 11:59pm via TurnItIn.

You are to write an original research paper that uses sites such as IPUMS USA, IPUMS International, IPUMS CPS, and the Hong Kong Census and Statistics website to carry out a comparative study of trends and patterns in demographic characteristics or behavior such as marriage, fertility, or migration by other variables such as education, income, ethnicity, race, region, or sex.

Please read the following directions carefully.  Since you have nearly two months to complete the project and ask questions, there is no excuse for not complying with the instructions.

Your research paper should be roughly 2000 words of text (roughly 4 single-spaced pages or 8 double-spaced pages) and 5 tables based on computations at the IPUMS site or at other sites.   The paper should be organized as the text, followed by the references, followed by the tables, with each table on a separate page.  All tables should be publication quality according to the specifications below, not simply copied and pasted from the website.  Do not insert tables into the main text.  Please number all pages, and make sure that your name is on the first page.

The text should consist of four sections: Introduction, Background, Results, and Conclusion.  Below I suggest guidelines for the lengths of each of these sections.  These guidelines are not rigid, and depending on your topic and your findings the actual word count may differ.

The Introduction should explain the overall focus of the paper and explain why you think your topic is interesting.   250 words should be sufficient.

The Background section should provide whatever information from other published sources you think may be necessary to help a reader understand the object of your study.  For example, if your tables focus on comparison of different ethnic groups, you might provide a brief history of each group in the United States that focuses on features relevant to the analysis.  If you are comparing several major cities, you might want to mention key features of each relevant to your analyses.  500 words should be sufficient.

The Results section should discuss the tables one by one and interpret their contents.  The tables should be numbered consecutively, and referred to in the text as Table 1, Table 2, etc.

The Conclusion reviews the most interesting results in the paper and suggests further work.  250 words should be sufficient.


Each of your tables should examine relationships among a distinct set of variables.  In other words, the tables should not be repetitions of the same basic tabulation but with different filters. 

All of your tables should be ones that you generated yourself at one of the sites I have referred you to. The point of this exercise is to introduce you to data collection and analysis. Tables copied from yearbooks, statistical digests, government publications, or other sources, will not count toward your requirement.

The tables should not be repetitions of ones you have already constructed for a class assignment.

You may also use the Current Population Survey (CPS) data at the IPUMS site.  It tends to have much richer detail on labor force and employment characteristics.

For some of your tables, you may also use General Social Survey (GSS) data, which is available at a different website.  It can be analyzed via a web interface like the one that you are already familiar with at IPUMS.  The GSS includes questions on topics like religion, political views, and so forth that are not covered in the Census.  Keep in mind that if you want to use the GSS, the tables you create should have something to do with demographic behavior, broadly defined.

If you would like to do some comparison with Hong Kong, you may produce up to two of your tables by analyzing data at the Hong Kong Census and Statistics website. The remaining tables must be from IPUMS, IPUMS International, IPUMS CPS, or GSS.

Each table should also have a self-explanatory title, and the row and column headings should be sufficient to allow a reader to interpret the table without referring to your text.  Each table should include a totals column and/or totals row as appropriate.  Please format the tables so that there are no vertical lines, and only four horizontal lines: one between the title and the column headings, one between the column headings and the table contents, one between the table contents and the totals row, and one at the bottom.  Basically the table should be formatted like the ones you see in the papers in the assigned reading.  You will notice that in publications, tables almost never have vertical lines, and generally have a limited number of horizontal lines.

Either the title of the table or a note at the bottom of the table should specify any restrictions that were applied in selecting observations to be included in the calculation.  Typically this means specifying the ages and the years that were included in the calculation.

The tables should not be copied and pasted directly from the websites, but rather should be prepared to look like they were publication quality, following the guidelines above.

The tables may be frequencies or cross-tabulations like the ones you are already used to.  You are also encouraged to take advantage of some of the other tools available at the site.  You are most likely to find the comparison of means tool the most useful.  This allows you to calculate the mean of one variable for different combinations of other variables.  For example, you could calculate mean income (INCTOT) for different combinations of RACE and YEAR.  If you are more adventurous, you may try using the correlation or regression tools, but these can take a long time.

In constructing your tables, select or filter observations correctly so that the ones you include are relevant.  You can also restrict the valid range of a variable used in the analysis to achieve the same effect as a filter.

Depending on the analysis that you are doing, you may want to use a filter to restrict to people of particular ages, or people with particular characteristics.  For example, when looking at completed education, EDUC, you will almost always want to restrict to people aged 25 or over, so you will only be looking at people who have completed their education.  Similarly, most of the income and occupation variables are only relevant for people of working ages, 18-55.  For details on using the selection filter at IPUMS, please see

When constructing tables that are tabulations, you will also want to use recode for any variable that is continuous (a quantity), not discrete (a category).  Examples include age, year of birth, and almost any of the income variables.  If you are working with age, instead of having a separate row or column for each single year of age (1, 2, 3, etc.) you will want to have a limited number of age groups: 1-9, 10-19 and so on.  Similarly, if you want to use total income (INCTOT), income from wages (INCWAGE), or other variables that record an amount in dollars, not a category, you will definitely need to recode the original values into categories.  If you attempt to carry out a tabulation in which one of the income variables is a row, column, or control variable, and don’t recode, the tabulation will almost certainly fail, with an error message indicating that there are too many rows or columns.  The definition of your income categories will depend on the year that you are looking at.  Because of inflation, typical incomes change dramatically over time.  See on how to carry out a recode.
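For readers who think in code, the grouping that a recode performs can be sketched in a few lines of Python. This is only an illustration of the idea, not how IPUMS implements it: the function name recode_age, the ten-year width, and the sample ages are all made up for the example.

```python
from collections import Counter

def recode_age(age, width=10):
    """Map a single year of age to a label for its age group, e.g. 34 -> '30-39'.

    Hypothetical helper: IPUMS does this for you when you enter a recode.
    """
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Made-up ages, not real Census records.
ages = [3, 9, 10, 19, 34, 35, 71]

# Seven distinct ages collapse into four age groups.
counts = Counter(recode_age(a) for a in ages)
for group, n in sorted(counts.items(), key=lambda item: int(item[0].split("-")[0])):
    print(group, n)
```

The point is simply that a tabulation on the recoded variable has one row per group rather than one row per distinct value, which is why it succeeds where a tabulation on the raw variable fails.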

You will also need to exclude missing or not available (N/A) values, especially if you are computing a mean.  In the IPUMS data, when information is missing for a variable in a particular observation, that is typically represented with a numeric value that will be included in any mean that you compute, unless you exclude it.  This is especially important for income variables.  In total income (INCTOT), missing data is represented by 9999999.  For wage income (INCWAGE), missing is represented as 999999.  For the socioeconomic index (SEI), N/A is represented as 0.  And so on.  If you fail to exclude the numeric codes for missing values from the calculation of a mean, you may get peculiarly high values (if N/A was being represented as 999999) or peculiarly low values (if N/A was being represented as 0).  If you are using other variables, you will need to check the documentation for them to see how missing or N/A was coded, and then exclude those values.
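A small Python sketch shows how much damage a single missing-value code can do to a mean. The incomes below are invented for illustration, and the helper mean_excluding is a hypothetical name; only the code 9999999 for INCTOT comes from the text above.

```python
MISSING_INCTOT = 9999999  # missing/N/A code for INCTOT, as described above

def mean_excluding(values, missing_codes):
    """Mean of values after dropping any that match a missing-value code."""
    valid = [v for v in values if v not in missing_codes]
    if not valid:
        raise ValueError("no valid observations left after exclusion")
    return sum(valid) / len(valid)

# Made-up incomes: three real values and one missing-value code.
incomes = [20000, 30000, 40000, MISSING_INCTOT]

naive_mean = sum(incomes) / len(incomes)                 # 2522499.75
clean_mean = mean_excluding(incomes, {MISSING_INCTOT})   # 30000.0

print(naive_mean, clean_mean)
```

One unexcluded code out of four observations turns a mean of 30,000 into a mean of more than 2.5 million, which is the kind of peculiarly high value to watch for in your output.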

Demographic and Socioeconomic Characteristics to Treat as Outcomes/Dependent Variables

Basic demographic and socioeconomic variables available in most of the decennial Censuses that you might want to consider as outcomes (dependent variables) include but are not limited to:

  • Current marital status (MARST)
  • Number of children born (CHBORN)
  • Age at first marriage (AGEMARR)
  • Total individual income (INCTOT)
  • Poverty status (POVERTY)
  • Educational attainment (EDUC)
  • Socioeconomic index (SEI) – this is a commonly used measure of the standing of an individual’s occupation.
  • Of course if you have found another variable that you are interested in, you are welcome to use that.  Some of you have mentioned school enrollment, home ownership, type of school, health insurance, and so forth.

The ACS also includes a rich set of demographic variables that could be used as outcomes.  The ACS samples are the annual data that show up for the years since 2000: 2001, 2002, 2003, etc.  The most interesting ones for this class are variables for very recent years that indicate whether certain events have occurred in the last year, which could be the basis for calculating rates, as opposed to percentages.

These lists are only meant as suggestions, and if you have other interests that can be addressed with other variables you have found, you may pursue them.

Demographic and Socioeconomic Characteristics to Treat as Explanatory/Independent Variables

Generally your explanatory variables should precede your outcome variables in time.  That doesn’t always  mean they have a causal effect on the outcome, but a causal interpretation is at least more plausible.  So, for example, you might examine number of children born (CHBORN) for women aged 45 according to their level of education (EDUC), but you probably won’t think about studying the education of women aged 45 according to their number of children.  The variables are of course the same in both cases, but the interpretation of which is an outcome and which is explanatory differs.

  • Race (RACE) – Note that since 2000, Race includes codes identifying people who have said they were two or more races.   There are also codes since 2000 for single races, for example, RACASIAN
  • Hispanic (HISPAN) – Note that Hispanic status is separate from race.
  • A variety of other nativity and ancestry variables are available.  The availability of these variables tends to change over time, so there isn’t really one nativity or ancestry variable that is available on a continuous basis since 1850.  I will post a separate guide to using some of the key variables.
  • Geographic identifiers.  Note that the IPUMS doesn’t offer any more detail than City, so with IPUMS you can’t compare different neighborhoods in the same city.
  • Of course you can use EDUC, INCTOT and other variables as explanatory variables.  Just make sure that your dependent variable comes after them in time.

Examples of tables you could construct

  • Use the comparison of means to look at mean number of children born for people of different races in different years.  In this case, you would select number of children as your dependent variable, and RACE and YEAR as row and column variables.  You would probably want to filter to restrict to (for example) women who were old enough to have completed their childbearing, say 50 years old.  You might want to restrict to decennial census years.
  • Use the comparison of means to look at mean income for people of different ages with different levels of education.  In this case you would select income as your dependent variable, and age and education as your rows and columns.  You would probably want to set a filter to restrict to ages when people might actually have incomes, for example, 25-55.  You would want to recode age so that instead of having about thirty rows, one for each age, you have three rows, one for each ten year age group.


  • Make sure to review my posts with IPUMS tips and tricks to see if there is anything that helps you.
  • If you are trying to use an income variable such as INCTOT as a row or column variable, you will need to recode it into a limited number of categories in order for a table to work.  If you simply specify INCTOT or another income variable as a row or column variable, the table won’t run, because there are too many distinct values, requiring thousands of columns or rows.  You will need to use the recode to regroup incomes into a manageable number of categories, and of course exclude 9999999 and 9999998.
  • Most if not all of the income variables, including INCTOT, FINCTOT, and HINCTOT, code missing values or not available as 9999999, 9999998, 999999, 999998, or some variant thereof.  INCTOT codes missing values as 9999999.  If you are carrying out a comparison of means, you need to exclude those observations because the average shouldn’t include these values.  You could do this by putting inctot(*-9999997) in the filter.
  • Similarly, if you are categorizing income, make sure that the highest category of income doesn’t include 9999998 and 9999999.  For example, inctot(r:0-9999;10000-19999;20000-29999;30000-39999;40000-49999;50000-9999997)
  • Many of the fertility variables use 0 to indicate missing or no response, and 1 to indicate no births or no children.  For example, the ACS variable FERTYR is 0 for Not Available, 1 for no births in the last year, and 2 for one or more births in the last year.  Similarly, CHBORN is 0 for not available, 1 for no children, 2 for one child, 3 for two children, and so forth.  Be attentive to this when you interpret your results.  If you are computing mean number of children, or mean numbers of births, you will often want to subtract one from the numbers you present.
  • If you are computing averages of any variables via Comparison of Means, make sure to inspect the detailed documentation for those variables to find out how missing values are coded, and use a selection filter to exclude them.
  • Again, use selection filters to make sure that the observations you include are relevant to the question you are interested in.  For example, if you want to use school to look at whether or not someone is currently enrolled in school, you would want to restrict to people who have a chance of being currently enrolled by applying a selection filter based on age.  Restricting to age(14-18), for example, would let you look at people who were eligible to be in high school.  If you are looking at completed education, normally you would want to restrict to ages 25 and above.
  • Remember that not every variable is available in every year.  For the variables you are interested in, check to see which years they are available in.  Some very interesting variables are only available in one or two years.  The variables related to ethnicity, nativity, and origin are especially prone to change.
  • Remember that 2001-2009 are based on the ACS.  If you just want to present data from the decennial Census, you would restrict to years 1850-2000, and if you just wanted ACS data, you would restrict to 2001-2009.
  • Keep in mind that the ACS has some nice variables that allow for direct computation of certain demographic rates, like whether or not someone has married in the last year, whether or not someone has had a birth in the last year, and so forth.
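The tip above about subtracting one from CHBORN can be made concrete with a short Python sketch. It assumes the coding described in the tips (0 for not available, 1 for no children, 2 for one child, and so on); the function name mean_children and the sample codes are made up for illustration, and any special top codes a given sample may use are ignored here.

```python
def mean_children(chborn_codes):
    """Mean number of children from CHBORN-style codes.

    Assumes 0 = not available, 1 = no children, 2 = one child, etc.
    Code 0 is dropped; every remaining code is shifted down by one.
    """
    valid = [c for c in chborn_codes if c != 0]
    if not valid:
        raise ValueError("no valid observations")
    return sum(c - 1 for c in valid) / len(valid)

# Made-up codes: one N/A, one woman with no children,
# one with one child, and two with two children.
codes = [0, 1, 2, 3, 3]

print(mean_children(codes))  # 1.25
```

A comparison of means on the raw codes, with 0 filtered out, would report 2.25 for these five records. Subtracting one gives 1.25, the actual mean number of children.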

SOSC 1860 F13 Assignment 2 Introduction to UN Data

Due via TurnItIn on Friday, October 4 at midnight.

This assignment will introduce you to a very useful web resource for international demographic data, UN Data, and will hopefully prepare you for our discussions of mortality and fertility decline around the world. You will examine trends in infant mortality, life expectancy, and total fertility rates in three countries, using data that you gather from the site. You will not need to do any calculations for this, just look up numbers.

Pick three countries: one developed country in Europe or North America, one developing country somewhere else in the world, and one country in East or Southeast Asia. We are going to use UN Data to examine trends in infant mortality, life expectancy, and fertility from the fifties to the present for the countries you select.

You can gather data on infant mortality, life expectancy, the total fertility rate, and other social, demographic, and economic indicators at UN Data by typing in the name of your country and the indicator you are interested in, almost like doing a search on Google. For example, to find data on infant mortality in Mauritius, just type ‘infant mortality mauritius’ in the search box. A page will come up with search results from different UN databases and publications like “Key Global Indicators”, “Millennium Development Goals Database”, “World Health Organization”, and so forth.

Note that for the searches below, you may need to check several results before you find one with a relatively complete series. Searching for infant mortality, for example, may turn up several different sets of numbers from different sources. You will want to pick the one that covers the longest span of years.

The initial searches may yield projections for the future. Ignore those numbers for the time being, and only present data for years that have already elapsed. Once you have done your initial search and brought up lists of results, you can use the year filter on the left to restrict to years that have already passed.

Part 1

Examine trends in infant mortality since the 1950s in your countries. Present the basic information you recover from the website as a simple table in which you have one column for each of the countries you chose, and one row for each year. You don’t need data for every single year. Every five or ten years is fine, perhaps 1950, 1955, 1960, etc. Depending on the country, data may not be available for some of the early years. In which country did infant mortality fall the most? Based on your examination of the data, in what era did infant mortality fall the fastest in developing countries? How did infant mortality change in developed countries? What has been happening recently?

Part 2

Do the same for life expectancy at birth, separately for males and females. In what period did life expectancy increase the fastest? In which country did life expectancy increase the most? What has been happening recently? Which of your countries has the widest gap between males and females?

Part 3

Look at trends in the total fertility rate (TFR) for your countries. Note that you should search for ‘total fertility’ rather than ‘total fertility rate’ since that is what the series are titled. Have rates declined over time? If so, when did they decline the fastest? What has happened in the last few decades?

SOSC 1860 F13 Assignment 3 Introduction to IPUMS

Due via TurnItIn on Monday 10/14 at midnight.

This assignment introduces you to the Integrated Public Use Microdata Series (IPUMS), the site at the University of Minnesota that many of you will use for your research paper due at the end of the semester. For this assignment, you will visit the site, collect some basic data for a state of your choice by using the online data analysis facility, and interpret it. Please read the instructions carefully and follow them step by step. If you follow the instructions carefully, you should be able to complete the assignment very quickly.

I am having you work with IPUMS not because it covers the United States, but rather because right now it is the largest, most detailed, and easiest to use website for analyzing Census data. There is currently no other online data source that covers such a large population over such a long period of time (1850 to the present) with so many variables. Accordingly, it is ideal for introducing basic analysis of demographic data.

Before you start the assignment, please read the brief instructions for using the online data analysis facility. Since we will be making heavy use of restrictions and selection filters to ensure that only cases that meet specified criteria are included in the analysis, please also read the descriptions of restrictions and of selection filters. Restrictions are applied when variables are specified in ROW, COLUMN, or CONTROL, whereas selection filters are specified in Selection Filter. Since we will also be recoding/transforming variables to simplify the output, for example, by grouping observations by age, please also read the description of transformation.

Note that this assignment will ask you to construct some nice, presentation-quality tables and include them in the assignment you upload. You will almost certainly find it easiest to prepare tables first in Excel and then copy the resulting tables into Word. I have a blog entry explaining how to get results from IPUMS into Excel and then into Word that will make it easy for you to create fabulous looking tables.

I have posted a variety of other videos and tips for working with IPUMS. You may want to get started with

Part 1

Since we will talk about population aging, we will start by looking at changes in the age composition of the country as a whole over the long term. Specifically, let’s look at the age distribution of the population in 1850, 1900, 1950, and 2000, focusing on the percentages of the population who were children (0-17 years), working age (18-64), and older (65+).

Since we will be making use of data from multiple years, click on ‘United States, 1850-2009’ under ‘Use data from multiple years’.

This brings up a screen where you can specify the parameters of your analysis, which by default is a table with cells that represent counts of observations with different combinations of characteristics. Other, fancier options are available but for the time being we will stick with tabulations that produce results you can put into a table.

We would like to generate a table with three rows and four columns, in which the columns correspond to the years 1850, 1900, 1950, and 2000, the rows correspond to people aged 0-17, 18-64, and 65+, and the cells present the numbers of people of that age in that year, as well as that number as a ‘column percentage’, that is, as a percentage of all the observations in that year.

Since we only want data from 1850, 1900, 1950, and 2000 for our columns, for Column enter year(1850,1900,1950,2000). Since we want to group ages into three categories, 0-17,18-64, and 65+, specify the Row as age(r:0-17;18-64;65-*). The r: indicates that the values are to be recoded into the specified groups.

Please transcribe the column percentages and the column totals into a nicely formatted table based on the following template:

Age Distributions of the Population of the United States in 1850, 1900, 1950, and 2000

Age group        1850        1900        1950        2000
0-17 (%)
18-64 (%)
65+ (%)
Total (N)

Note that the entries in each column should sum to 100, since together they should account for the total population in that year. Column percentage is turned on by default on the screen so unless you change something by checking boxes for other percentages or unchecking column percentage, the percentages you see should be column percentages.

Please make sure you understand what is going on with the percentaging to make sure the numbers are being calculated in a way that makes sense. One recurring problem over the years has been that many students percentage their tables incorrectly, producing nonsensical results.
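To make the percentaging concrete, here is a minimal Python sketch of what a column percentage is. The counts are invented for illustration (they are not Census figures), the columns are deliberately labeled "Year A" and "Year B" rather than real years, and the function name column_percentages is hypothetical.

```python
def column_percentages(table):
    """Convert counts to column percentages.

    table maps each column label to a dict of {row label: count}.
    Each cell is divided by the total for its own column, so every
    column sums to 100.
    """
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * n / total for row, n in cells.items()}
    return result

# Made-up counts for two hypothetical years.
counts = {
    "Year A": {"0-17": 200, "18-64": 180, "65+": 20},
    "Year B": {"0-17": 250, "18-64": 600, "65+": 150},
}

percentages = column_percentages(counts)
for column, cells in percentages.items():
    print(column, cells, "sum =", sum(cells.values()))
```

If you divided each cell by its row total instead, you would be answering a different question (what share of all children in the table lived in Year A?), which is the usual source of nonsensical results.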

Please do prepare the table as described above. Don’t copy and paste the output directly, or print out the output and turn it in.

Write 2-3 sentences describing the trend that you see.

Part 2

Please redo Part 1, but limit the analysis to a state of your choice. Preferably it would be a state that was in the Union by 1850, or at least 1900, so that you can look at changes over time. You can restrict the calculation to the state of your choice by using the variable statefip, which is the FIPS code for each state. Its values are provided here:

To restrict to California, you would enter statefip(6) into the field for Selection Filter.

Make sure to prepare a nice table like the one in part 1 and include it in your submission.

Write two to three sentences comparing the state that you have chosen to the country as a whole, as represented in the results for Part 1.

Here is what the output for California looked like:

Part 3

We will now look at changes in the marital status of the population over time, since we will be discussing family change later in the semester.

The variable describing marital status is marst. Its values are described here:

We will look at changes over time in the percentage of the adult population in different marital statuses.

This time, we want rows to correspond to years. We would like a little more detail on trends in the twentieth century, and are less interested in the period before 1900, so enter year(1900,1930, 1940,1950,1960,1970,1980,1990,2000) as your row variable.

Restrict your analysis to working-age adults by entering age(18-59) as your selection filter. Including the elderly affects observed trends because of the increases in the proportion of the population who are likely to be widowed. Note that this definition of working age is different from the one used in Parts 1 and 2.

The column should correspond to different marital statuses. To make life easier, let’s combine marital statuses, putting all the married into one category, all the separated and divorced into another:


Make sure that row percentage is checked, and column percentage is unchecked.

From the output, prepare a nicely formatted table in which the rows are years, the columns are the different marital statuses, and the entry in each cell represents the percentage of people aged 18-59 in that year who have that marital status.

Write a few sentences commenting on the trends that you observe. Does anything in particular catch your attention?

Part 4

Now we will look at fertility as a function of age and education, using the variable fertyr. Fertyr is included in the recent IPUMS data, which are based on the annual American Community Survey. Women aged 15-50 were asked if they had a child in the last year. It is 0 if the variable is missing or not valid (for male respondents, people aged less than 15 or more than 50), 1 if a woman said she didn’t have a child in the last year, and 2 if she had a child in the last year.

Since we want to use ACS data, click on ‘ACS 2001-2011’ under ‘Use data from multiple samples’.

We are going to approach this calculation a bit differently, and introduce another capability of IPUMS. We want to compute the proportion of women who had a birth. We can do this by recoding fertyr so that 1 becomes 0, and 2 becomes 1. The average of the resulting 0’s and 1’s will be the proportion of women who had a birth.

When you reach the screen where you set up your analysis, mouse over ‘Analysis’ in the upper left, and then click on ‘Comparison of Means’ when it appears.

Once you reach the Comparison of Means screen, enter fertyr(r:0=1;1=2) as the Dependent Variable. Enter educ(r:0-5;6;7-9;10;11) as the column variable: 0-5 groups people with less than a high-school education, 6 is people with a high school education, 7-9 is people with some college, 10 is college graduates, and 11 is people with some graduate school. Enter age(r:15-19;20-24;25-29;30-34;35-39;40-44;45-49) as the row variable. Run the calculation, and use the results to produce a nice table that for each combination of education and age group identifies the proportion of women who have had a birth in the last year.
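If it is not obvious why the mean of the recoded variable is a proportion, the following Python sketch may help. It mimics the recode described above (1 becomes 0, 2 becomes 1, and the not-available code 0 is dropped) on a handful of invented fertyr codes. The function name recode_fertyr is made up for the example, and this is only an illustration of the logic, not of how IPUMS computes it.

```python
def recode_fertyr(code):
    """Recode fertyr: 1 (no birth) -> 0, 2 (birth) -> 1.

    Any other code, such as 0 for not available, returns None
    so that it can be excluded from the mean.
    """
    return {1: 0, 2: 1}.get(code)

# Made-up codes: two N/A, three women without a birth, one with a birth.
codes = [0, 1, 1, 2, 1, 0]

recoded = [r for r in (recode_fertyr(c) for c in codes) if r is not None]
proportion = sum(recoded) / len(recoded)

print(recoded)      # [0, 0, 1, 0]
print(proportion)   # 0.25
```

Because each woman contributes either 0 or 1, the sum counts the women who had a birth and the mean is that count divided by the number of valid respondents: here one in four, or 0.25.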

Write a few sentences about the patterns you observe.

Part 5

Please write about 250 words with some ideas about the project you would like to do for the class, hopefully using IPUMS data, but possibly using data from other Census sources. Explore the list of IPUMS variables and find out which relevant variables are available. Make sure to name some of the variables you are interested in using. If you are especially interested in detailed economic variables, you might also want to explore the Current Population Survey. The GSS is another possibility. If you are ambitious, you can also do something using IPUMS International. You may also propose to do a comparison between HK and the United States or some other country available at IPUMS International.

Your response must make clear that you have spent some time at the IPUMS website exploring the variables. In other words, it isn’t sufficient to simply say “I want to study marriage and education.” You would need to provide additional details that show me you spent time at the website, like names of the variables, the populations you might restrict to, and so forth.

Of course, you are welcome to talk to me about your ideas.