Evaluations from the Summer 2014 CMGPD Workshop at SJTU

We conducted the 4th CMGPD summer workshop at Shanghai Jiaotong University this summer. As usual, we conducted a survey at the end to get student feedback. I’m a fan of making student evaluations public, so I have uploaded the scanned forms via the link below:

Anonymous questionnaires

Overall, I was pleased with the results of the workshop. We have always had good participants. This year, however, we were fortunate to have an especially large share of participants who were both interested in historical topics and had some facility with quantitative methods. In previous offerings, participants often had one or the other.

The students made presentations on the last day with preliminary results, and I think that with some more work, many of the projects can be turned into papers.

Summer 2014 China Multigenerational Panel Dataset Workshop at SJTU (English announcement)

The 4th China Multigenerational Panel Dataset Workshop
Shanghai Jiaotong University, Minhang Campus
Shanghai, China

July 14-25, 2014

Chinese version

The Center for the History and Society of Northeast China at the Shanghai Jiaotong University School of Humanities will hold its 4th summer China Multigenerational Panel Data workshop from July 14 to July 25.

The workshop will focus on introducing the China Multigenerational Panel Datasets (CMGPD) as sources for the study of demography, stratification, and social and family history. These include the China Multigenerational Panel Dataset – Liaoning (CMGPD-LN) and the China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC).  The CMGPD have been released via the Inter-university Consortium for Political and Social Research (ICPSR).  The latest versions of the CMGPD documentation are available for download.

The CMGPD datasets have many unique features that make them useful not only for the study of Chinese population, social, and family history, but for the study of demographic, social and economic processes more generally.  Their features also make them useful as testbeds for researchers developing novel quantitative techniques.  The datasets are longitudinal, multi-generational, and structured at multiple levels, including the individual, the household, the kin group, the community, the administrative unit, and the region.

UCLA Professor of Sociology Cameron Campbell will be the primary lecturer. Guest lecturers will include Distinguished Professor and Dean of Humanities and Social Sciences at the Hong Kong University of Science and Technology James Lee; Yuxue Ren, Professor of History at Shanghai Jiaotong University; and Dong Hao, PhD student at the Hong Kong University of Science and Technology.

This class is intended to 1) introduce researchers to the CMGPD datasets and help them decide whether they may be useful in their own studies, 2) give current users an opportunity to learn more about the origin and context of the data, and 3) give participants basic instruction in the use of STATA to describe, organize and analyze the data.   Researchers who have already started using the CMGPD-SC or CMGPD-LN are welcome to attend and take advantage of the opportunity to discuss any questions they may have with Lee, Campbell, and others who were involved in the creation of the dataset.

Lectures and discussion will focus on 1) the historical, social, economic and institutional context of the populations covered by the data, 2) key features of the data, and 3) potential applications.  There will be optional sessions to introduce the Training Guide and demonstrate basic procedures for downloading the data from the website and loading it into STATA.

Please note that while there will be basic instruction in the use of STATA to organize and analyze the data, this is not intended as a class in STATA, or introductory statistics. Students looking specifically for instruction in STATA, statistics, or data management are encouraged to look elsewhere. Again, the class is intended for participants who want to assess whether CMGPD is suitable for their research interests, or are already considering the use of the CMGPD and seek basic instruction in the use of STATA to manipulate and analyze it.

The workshop will include daily exercises to introduce key features of the data, and STATA techniques for taking advantage of these features. Participants will also complete a small project of their own design using the data and present it on the last day of the workshop.

If any non-Chinese speakers enroll, the lectures will be in English.  If the participants all speak Chinese, lectures may be in Chinese, or a mixture of English and Chinese.  Discussion will be in English and Chinese.

The Shanghai Jiaotong University Center for the History and Society of Northeast China was established as a research unit by a collaboration of the Shanghai Jiaotong University (SJTU) School of the Humanities and the Hong Kong University of Science and Technology (HKUST) School of the Humanities and Social Sciences.

Datasets

China Multigenerational Panel Dataset – Liaoning (CMGPD-LN)

The CMGPD-LN is an important dataset for the study of China’s family, social and demographic history, and for the study of demography and stratification more generally. The dataset is suitable for application of a wide variety of statistical techniques that are commonly used in social demography for the analysis of longitudinal, individual-level data, and available in the most popular statistical software packages. The dataset is distinguished by its size, temporal depth, and richness of detail on family, household and kinship context.

The materials from which the dataset was constructed are Shengjing Imperial Household Agency household registers held in the Liaoning Provincial Archives. The registers are triennial. Altogether there are 3,600 of them. We transcribed a subset of them to produce the CMGPD-LN, which spans 160 years from 1749 to 1909. At present, the dataset comprises 29 register series, and consists of 1,500,000 records that describe 260,000 individuals over seven generations. The CMGPD-LN is accordingly an important resource for the study of historical demography, sociology, economics, and other fields.

The CMGPD-LN and associated English-language documentation are already available for download at ICPSR.

China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC)

The CMGPD-SC covers communities of recent settlers in Shuangcheng, Heilongjiang in the last half of the nineteenth century and beginning of the twentieth. It contains 1.35 million records that describe 100,000 people. The registers cover descendants of urban migrants from Beijing and rural migrants from neighboring areas in northeast China who came to the area in the first half of the nineteenth century as part of a government organized effort to settle this largely vacant frontier region. One of the distinguishing features of this dataset is the availability of linked, individual-level landholding records for several points in time. The data also include a rich array of other indicators of household and family context and socioeconomic status.

Pending release of the CMGPD-SC through ICPSR, the data are available for download here.

Information

Dates
Monday, July 14, 2014 to Friday, July 25, 2014

Location
Shanghai Jiaotong University School of Humanities (SJTU Minhang Campus, Shanghai)

Application deadline
May 1, 2014

See link below to download application

Application procedure

Please send your personal statement, curriculum vitae, and application form (English or Chinese) as attachments to chinanortheast@gmail.com.

Applications from faculty, postdoctoral researchers and graduate students are welcome. Applications from graduating college seniors will also be considered if they have already been accepted into a graduate program beginning fall 2014.  In that case, the application should include a copy of their graduate school acceptance. Any other interested parties should contact our staff at chinanortheast@gmail.com before applying to see if they will be considered.

Participants should be able to speak or read Chinese or English.  No prior experience in statistics, demography, or Chinese history is required.  Applicants must explain the reasons for their interest in the data in their application, and should demonstrate that they have background, experience or interests that in some way are relevant.

Participants who are Chinese nationals will be provided with accommodations. Participants who are not Chinese nationals will receive assistance with arranging accommodations, and will receive a housing subsidy to help offset their costs. Participants who want other accommodations will have to arrange them on their own and will be responsible for all associated costs.

Participants should bring their own computer.

Students are responsible for all travel and local expenses, health care expenses, and other incidentals. Participants coming from abroad are strongly encouraged to confirm that their health insurance offers international coverage, or purchase travel health insurance.

Participants who are not Chinese nationals will need to obtain visas. We will issue invitation letters to facilitate the visa application. We strongly urge that accepted participants who need visas begin the application process as soon as possible after they are notified of their acceptance.

At present we expect to be able to accommodate 25-30 participants.

Links

Required Reading

Read the following before the workshop begins.  The highest priority is the specified pages in the CMGPD-LN and CMGPD-SC User Guides.

Documentation

The documentation below is available here.

  • CMGPD-LN User Guide.  English pages 1-54, 90-96 or Chinese pages 13-64, 96-101.  Skim the descriptions of variables to look for ones that may be relevant to your research.
  • CMGPD-SC User Guide.  English pages 1-47. Again, skim the descriptions of variables to look for ones that may be relevant to your research.
  • CMGPD Training Guide. Pay particular attention to the sections at the beginning that introduce the data and highlight its distinctive characteristics.

Research Articles

  • Campbell, Cameron and James Lee. 2002 (publ. 2006). “State views and local views of population: Linking and comparing genealogies and household registers in Liaoning, 1749-1909.” History and Computing. 14(1+2):9-29.  http://papers.ccpr.ucla.edu/papers/PWP-CCPR-2004-025/PWP-CCPR-2004-025.pdf
  • Bengtsson, Tommy, Cameron Campbell, James Lee, et al. 2004.  Life Under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900. MIT Press.  Published in Chinese as 托米·本特森,康文林,李中清等. 2008. 压力下的生活:1700~1900年欧洲与亚洲的死亡率和生活水平. 北京: 社会科学文献出版社. Translated by 李霞 and 李恭忠.  Appendix A.
  • Campbell, Cameron and James Z. Lee. 2011. “Kinship and the Long-Term Persistence of Inequality in Liaoning, China, 1749-2005.” Chinese Sociological Review. 44(1):71-104.  http://www.ncbi.nlm.nih.gov/pubmed/23596557

Review Articles

  • 康文林 (Cameron Campbell).  2012.  “历史人口学 (Historical Demography).”  Chapter 8 in 梁在编 (Zai Liang ed.) 人口学 (Demography).   北京:人民大学出版社 (Beijing: Renmin University Press), 233-265.

Select one or two of the following research articles based on your own interests (or another published article that uses the CMGPD), and read them before the workshop starts.

  • CHEN Shuang, James Lee, and Cameron Campbell. 2010. “Wealth stratification and reproduction in Northeast China, 1866-1907.” History of the Family. 15:386-412.  http://www.ncbi.nlm.nih.gov/pubmed/21127716
  • Bengtsson, Tommy, Cameron Campbell, James Lee, et al. 2004.  Life Under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900. MIT Press.  Published in Chinese as 托米·本特森,康文林,李中清等. 2008. 压力下的生活:1700~1900年欧洲与亚洲的死亡率和生活水平. 北京: 社会科学文献出版社. Translated by 李霞 and 李恭忠.  Chapter 10.
  • Wang Feng, Cameron Campbell, and James Z. Lee. 2010. “Agency, Hierarchies, and Reproduction in Northeastern China, 1789 to 1840.” Chapter 11 in Noriko Tsuya, Wang Feng, George Alter, James Z. Lee et al. Prudence and Pressure: Reproduction and Human Agency in Europe and Asia, 1700-1900. MIT Press, 287-316.
  • Chen Shuang, Cameron Campbell, and James Z. Lee.  Forthcoming.  “Categorical Inequality and Gender Difference: Marriage and Remarriage in Northeast China, 1749-1912.”  Chapter 11 in Lundh, Christer, Satomi Kurosu, et al. Similarity in Difference.

Software

If you are not familiar with STATA, prepare for the workshop by reviewing as many of the materials for learning and using STATA at UCLA IDRE as possible. You are also strongly encouraged to watch video tutorials at the STATA website. Ideally, by the time you arrive at the workshop, you should already be able to  carry out very basic operations in STATA such as loading and saving files, creating tabulations and so forth. Do try to download the CMGPD-SC or CMGPD-LN and make sure you know how to load them and carry out very simple operations.

Recommended Reading

  • As much of the User Guides and Training Guide as you can.
  • 定宜庄, 郭松义, 李中清, 康文林. 2004. 辽东移民中的旗人社会.  上海:上海社会科学出版社.
  • Lee, James and Cameron Campbell. 1997. Fate and Fortune in Rural China: Social Organization and Population Behavior in Liaoning, 1774-1873. Cambridge University Press.
  • 李中清,王丰.  2000.  人类的四分之一: 马尔萨斯的神话与中国的现实:1700-2000。  三联·哈佛燕京学术丛书。(English: Lee, James and Wang Feng.  1999.  One Quarter of Humanity: Malthusian Mythology and Chinese Reality, 1700-2000.)
  • Bengtsson, Tommy, Cameron Campbell, James Lee, et al. 2004.  Life Under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900. MIT Press.  Published in Chinese as 托米·本特森,康文林,李中清等. 2008. 压力下的生活:1700~1900年欧洲与亚洲的死亡率和生活水平. 北京: 社会科学文献出版社. Translated by 李霞 and 李恭忠.

Tentative Schedule (at Onedrive)

Acknowledgements

Preparation of the CMGPD-LN and accompanying documentation for public release via ICPSR DSDR was supported by NICHD R01 HD057175-01A1 “Multi-Generation Family and Life History Panel Dataset” with funds from the American Recovery and Reinvestment Act.

Preparation of the CMGPD-SC and accompanying documentation for public release via ICPSR DSDR was supported by NICHD R01 HD070985-01 “Multi-generational Demographic and Landholding Data: CMGPD-SC Public Release.”

The CMGPD summer workshops in Shanghai have been supported by Shanghai Jiaotong University, the School of Humanities, the Department of History, and the Center for the History and Society of Northeast China.  We are also grateful to staff at a variety of campus units at SJTU for their logistical support.

SJTU 2013 Social Demography Final Project

Social Demography
SJTU Summer Short Semester
2013

Due 7/25 at the beginning of class

You are to write an original research paper that uses the IPUMS website to carry out a comparative study of time trends and age patterns of the demographic and socioeconomic characteristics by education, income, ethnicity, race, region, sex, or some other variables.  The emphasis is on comparison.  If you are interested in a particular ethnicity, for example, you still need to compare it to other ethnicities or the population as a whole to establish what is distinct about it.

Please read the following directions carefully.  Since you have nearly two months to complete the project, there is no excuse for not complying with the instructions.

Your research paper should be 2000 words of text (roughly 4 single-spaced pages or 8 double-spaced pages) and 6 tables based on computations at the IPUMS site.   The paper should be organized as the text, followed by the references, followed by the tables, with each table on a separate page.  All tables should be publication quality according to the specifications below, not simply copied and pasted from the website.  Do not insert tables into the main text.  Please number all pages, and make sure that your name is on the first page.

The text should consist of four sections: Introduction, Background, Results, and Conclusion.  Below I suggest guidelines for the lengths of each of these sections.  These guidelines are not rigid, and depending on your topic and your findings the actual word count may differ.  You may end up with more or fewer words in each section than suggested.

The Introduction should explain the overall focus of the paper and specify the questions that you are interested in.   250 words should be adequate.

The Background section provides whatever information from other published sources you think may be necessary to help a reader understand the object of your study.  For example, if your tables focus on comparison of different ethnic groups, you might provide a brief account of each group’s history in the United States that focuses on features relevant to the analysis.  If you are comparing several major cities, you might want to mention key features of each relevant to your analyses.  500 words should be sufficient.

The Results section discusses the tables one by one, and interprets their contents in light of hypotheses or theories in the introduction.  The tables should be numbered consecutively, and referred to in the text as Table 1, Table 2 etc.

The Conclusion reviews the most interesting results in the paper and suggests further work.  250 words should be sufficient.

Tables

Each of the tables should examine relationships among a distinct set of variables.  In other words, the tables should not be repetitions of the same basic tabulation but with different filters.  At least two tables should make use of demographic or other variables unique to the American Community Survey (ACS) data, which are annual starting in 2001.  At least two tables should make use of variables from the Decennial Census data.

You may also use the Current Population Survey (CPS) data at the IPUMS site.  It tends to have much richer detail on labor force and employment characteristics.  It may also be harder to use.

For some of your tables, you may also use General Social Survey (GSS) data, which is available at a different website (http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10).  It can be analyzed via a web interface like the one that you are already familiar with at IPUMS.  The GSS includes questions on topics like religion, political views, and so forth that are not covered in the Census.  Keep in mind that if you want to use the GSS, the tables you create should have something to do with demographic behavior, broadly defined.

Each table should also have a self-explanatory title, and the row and column headings should be sufficient to allow a reader to interpret the table without referring to your text.  Each table should include a totals column and/or totals row as appropriate.  Please format the tables so that there are no vertical lines, and only four horizontal lines: one between the title and the column headings, one between the column headings and the table contents, one between the table contents and the totals row, and one at the bottom.  Basically the table should be formatted like the ones you see in the papers in the assigned reading.  You will notice that in publications, tables almost never have vertical lines, and generally have a limited number of horizontal lines.

Either the title of the table or a note at the bottom of the table should specify any restrictions that were applied in selecting observations to be included in the calculation.  Typically this means specifying the ages and the years that were included in the calculation.

The tables should not be copied and pasted directly from the site, but rather should be prepared to look like they were publication quality, following the guidelines above.

The tables may be frequencies or cross-tabulations like the ones you are already used to.  You are also encouraged to take advantage of some of the other tools available at the site.  You are most likely to find the comparison of means tool (https://sda.usa.ipums.org/helpfiles/helpan.htm#means) the most useful.   This allows you to calculate the mean of one variable for different combinations of other variables.  For example, you could calculate mean income (INCTOT) for different combinations of RACE and YEAR.  If you are more adventurous, you may try using the correlation or regression tools, but these can take a long time.

Filter variables to restrict the observations included in the analysis

In constructing your tables, make sure to select or filter observations correctly to make sure the ones you include are relevant.  You can restrict the valid range of a variable used in the analysis to achieve the same effect as a filter: https://sda.usa.ipums.org/helpfiles/helpan.htm#range

Depending on the analysis that you are doing, you may want to use a filter to restrict to people of particular ages, or people with particular characteristics.  For example, when looking at completed education, EDUC, you will almost always want to restrict to people aged 25 or over, so you will only be looking at people who have completed their education.  Similarly, most of the income and occupation variables are only relevant for people of working ages, 18-55.  For details on using the selection filter at IPUMS, please see https://sda.usa.ipums.org/helpfiles/helpan.htm#filter
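The logic of a selection filter can also be sketched offline. The following is only an illustration in Python, not the IPUMS interface: the records are invented and the helper name `apply_filter` is hypothetical. It shows the two age restrictions mentioned above.

```python
# Hypothetical person records. AGE and EDUC mimic IPUMS variable names,
# but the values are invented for illustration.
people = [
    {"AGE": 17, "EDUC": 5},
    {"AGE": 25, "EDUC": 10},
    {"AGE": 40, "EDUC": 6},
    {"AGE": 62, "EDUC": 7},
]

def apply_filter(records, variable, low=None, high=None):
    """Keep records whose value of `variable` lies between low and high.

    Both bounds are inclusive. Passing None leaves that side open.
    """
    kept = []
    for rec in records:
        value = rec[variable]
        if low is not None and value < low:
            continue
        if high is not None and value > high:
            continue
        kept.append(rec)
    return kept

# Completed education: restrict to people aged 25 or over.
completed_educ = apply_filter(people, "AGE", low=25)

# Income and occupation variables: restrict to working ages 18-55.
working_age = apply_filter(people, "AGE", low=18, high=55)

print(len(completed_educ), len(working_age))
```

At the IPUMS site the same restriction is typed into the selection filter box rather than written as code, so treat this only as a way to check your reasoning about who ends up in the table.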

Recode continuous variables like income, age etc. into a manageable number of categories

When constructing tables that are tabulations, you will also want to use recode for any variable that is continuous (a quantity), not discrete (a category).  Examples include age, year of birth, and almost any of the income variables.  If you are working with age, instead of having a separate row or column for each single year of age (1,2,3, etc.) you will want to have a limited number of age groups: 1-9, 10-19 and so on.  Similarly, if you want to use total income (INCTOT), income from wages (INCWAGE), or other variables that record an amount in dollars, not a category, you will definitely need to recode the original values into categories.

If you attempt to carry out a tabulation in which one of the income variables is a row, column, or control variable, and don’t recode, the tabulation will almost certainly fail, with an error message indicating that there are too many rows or columns.  The definition of your income categories will depend on the year that you are looking at.  Because of inflation, typical incomes change dramatically over time.  See  https://sda.usa.ipums.org/helpfiles/helpan.htm#recode on how to carry out a recode.
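To see why a recode keeps a table manageable, here is a minimal sketch in Python. It is not how the IPUMS site works internally. The income categories are the ones used in the recode example in the Reminders section, while the income values and the helper name `recode` are made up for illustration.

```python
from collections import Counter

# Income categories taken from the recode example in the Reminders:
# 0-9999; 10000-19999; ...; 50000-9999997. The top category stops at
# 9999997 so that the codes 9999998 and 9999999 fall outside every bin.
INCTOT_BINS = [
    (0, 9999),
    (10000, 19999),
    (20000, 29999),
    (30000, 39999),
    (40000, 49999),
    (50000, 9999997),
]

def recode(value, bins):
    """Return a label such as '10000-19999', or None if no bin matches."""
    for low, high in bins:
        if low <= value <= high:
            return f"{low}-{high}"
    return None

# Invented incomes, including one missing-value code.
incomes = [5000, 12000, 18500, 75000, 9999999]

# A frequency table with at most six rows instead of one row per dollar amount.
table = Counter(recode(v, INCTOT_BINS) for v in incomes)
for label, count in table.items():
    print(label, count)
```

Values that match no category come back as None here, which is one way of noticing that something was left out of the recode.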

Exclude observations with missing or not available (N/A) values

You will also need to exclude missing or not available (N/A) values, especially if you are computing a mean.  In the IPUMS data, when information is missing for a variable in a particular observation, that is typically represented with a numeric value that will be included in any mean that you compute, unless you exclude it.  This is especially important for income variables.  In total income (INCTOT), missing data is represented by 9999999: https://usa.ipums.org/usa-action/variables/INCTOT/#codes_section.  For wage income (INCWAGE), missing is represented as 999999: https://usa.ipums.org/usa-action/variables/INCWAGE/#codes_section.  For the socioeconomic index (SEI), N/A is represented as 0: https://usa.ipums.org/usa-action/variables/SEI/#codes_section.  And so on.  If you fail to exclude the numeric codes for missing values from the calculation of a mean, you may get peculiarly high values (if N/A was being represented as 999999) or peculiarly low values (if N/A was being represented as 0).  If you are using other variables, you will need to check the documentation for them to see how missing or N/A was coded, and then exclude those values.
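A small numeric sketch shows how much a single missing-value code can distort a mean. The incomes below are invented, and `mean_excluding` is a hypothetical helper written for this illustration. Only the code 9999999 for INCTOT comes from the documentation cited above.

```python
INCTOT_MISSING = {9999999}  # missing-value code for INCTOT, as noted above

def mean_excluding(values, missing_codes):
    """Mean of values after dropping any that are missing-value codes."""
    valid = [v for v in values if v not in missing_codes]
    if not valid:
        return None  # nothing left to average
    return sum(valid) / len(valid)

# Three real incomes and one observation coded as missing (invented data).
incomes = [20000, 40000, 9999999, 30000]

naive_mean = sum(incomes) / len(incomes)
clean_mean = mean_excluding(incomes, INCTOT_MISSING)

print(naive_mean)  # inflated by the missing-value code
print(clean_mean)  # mean of the three valid incomes
```

With the code left in, the mean is over two million dollars. With it excluded, the mean is 30000.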

Demographic and Socioeconomic Characteristics to Treat as Outcomes/Dependent Variables

Basic demographic and socioeconomic variables available in most of the decennial Censuses that you might want to consider as outcomes (dependent variables) include but are not limited to:

  • Current marital status (MARST)
  • Number of children born (CHBORN)
  • Age at first marriage (AGEMARR)
  • Total individual income (INCTOT)
  • Poverty status (POVERTY)
  • Educational attainment (EDUC)
  • Socioeconomic index (SEI) – this is a commonly used measure of the standing of an individual’s occupation.
  • Of course if you have found another variable that you are interested in, you are welcome to use that.  Some of you have mentioned school enrollment, home ownership, type of school, health insurance, and so forth.

The ACS also includes a rich set of demographic variables that could be used as outcomes.  The ACS data are the annual samples that appear for 2001, 2002, 2003 and so on.  The most interesting for this class are variables, available only for very recent years, that indicate whether certain events such as a marriage or a birth occurred in the last year.  These can be the basis for calculating rates, as opposed to percentages.

These lists are only meant as suggestions, and if you have other interests that can be addressed with other variables you have found, you may pursue them.

Demographic  and Socioeconomic Characteristics to Treat as Explanatory/Independent Variables

Generally your explanatory variables should precede your outcome variables in time.  That doesn’t always mean they have a causal effect on the outcome, but a causal interpretation is at least more plausible.  So, for example, you might examine number of children born (CHBORN) for women aged 45 according to their level of education (EDUC), but you probably wouldn’t study the education of women aged 45 according to their number of children.  The variables are of course the same in both cases, but the interpretation of which is an outcome and which is explanatory differs.

  • Race (RACE) – Note that since 2000, Race includes codes identifying people who have said they were two or more races.   There are also codes since 2000 for single races, for example, RACASIAN
  • Hispanic (HISPAN) – Note that Hispanic status is separate from race.
  • A variety of other nativity and ancestry variables are available at http://usa.ipums.org/usa-action/variables/group/race_eth.  The availability of these variables tends to change over time, so there isn’t really one nativity or ancestry variable that is available on a continuous basis since 1850.  I will post a separate guide to using some of the key variables.
  • Geographic identifiers in http://usa.ipums.org/usa-action/variables/CITY#codes_section.  Note that the IPUMS doesn’t offer any more detail than City, so with IPUMS you can’t compare different neighborhoods in the same city.
  • Of course you can use EDUC, INCTOT and other variables as explanatory variables, just make sure that your dependent variable comes after them in time.

Examples of tables you could construct

  • Use the comparison of means to look at mean number of children born for people of different races in different years.  In this case, you would select number of children as your dependent variable, and RACE and YEAR as row and column variables.  You would probably want to filter to restrict to (for example) women who were old enough to have completed their childbearing, say 50 years old.  You might want to restrict to decennial census years.
  • Use the comparison of means to look at mean income for people of different ages with different levels of education.  In this case you would select income as your dependent variable, and age and education as your rows and columns.  You would probably want to set a filter to restrict to ages when people might actually have incomes, for example, 25-55.  You would want to recode age so that instead of having about thirty rows, one for each age, you have three rows, one for each ten year age group.
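What the comparison of means tool computes for the second example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool itself: the records are invented, the variable names AGEGRP and EDUCGRP are hypothetical recoded versions of age and education, and the function name `comparison_of_means` is my own. It assumes the age filter and the exclusion of missing income codes have already been applied.

```python
from collections import defaultdict

def comparison_of_means(records, dependent, row, column):
    """Mean of `dependent` for each combination of `row` and `column`."""
    cells = defaultdict(lambda: [0.0, 0])  # (row, column) -> [sum, count]
    for rec in records:
        cell = cells[(rec[row], rec[column])]
        cell[0] += rec[dependent]
        cell[1] += 1
    return {key: total / n for key, (total, n) in cells.items()}

# Invented records, already filtered and recoded.
records = [
    {"AGEGRP": "25-34", "EDUCGRP": "High school", "INCTOT": 20000},
    {"AGEGRP": "25-34", "EDUCGRP": "High school", "INCTOT": 30000},
    {"AGEGRP": "25-34", "EDUCGRP": "College", "INCTOT": 50000},
    {"AGEGRP": "35-44", "EDUCGRP": "College", "INCTOT": 70000},
    {"AGEGRP": "35-44", "EDUCGRP": "College", "INCTOT": 90000},
]

means = comparison_of_means(records, "INCTOT", "AGEGRP", "EDUCGRP")
for (age_group, educ_group), mean_income in sorted(means.items()):
    print(age_group, educ_group, mean_income)
```

Each printed line corresponds to one cell of the table: a row category, a column category, and the mean of the dependent variable for the people in that cell.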

Reminders

  • My posts with IPUMS tips and tricks are accessible via http://camerondcampbell.me/category/ipums/ Make sure to review them to see if there is anything that helps you.
  • If you are trying to use an income variable such as INCTOT as a row or column variable, you will need to recode it into a limited number of categories in order for a table to work.  If you simply specify INCTOT or another income variable as a row or column variable, the table won’t run, because there are too many distinct values, requiring thousands of columns or rows.  You will need to use the recode to regroup incomes into a manageable number of categories, and of course exclude 9999999 and 9999998.
  • Most if not all of the income variables, including INCTOT, FINCTOT, and HINCTOT, code missing values or not available as 9999999,  9999998, 999999, 999998, or some variant thereof.  INCTOT codes missing values as 9999999: https://usa.ipums.org/usa-action/variables/INCTOT/#codes_section.  If you are carrying out a comparison of means, you need to exclude those observations because the average shouldn’t include these values.  You could do this by putting inctot(*-9999997) in the filter.
  • Similarly, if you are categorizing income, make sure that the highest category of income doesn’t include 9999998 and 9999999.  For example, inctot(r:0-9999;10000-19999;20000-29999;30000-39999;40000-49999;50000-9999997)
  • Many of the fertility variables use 0 to indicate missing or no response, and 1 to indicate no births or no children.  For example, the ACS variable FERTYR is 0 for Not Available, 1 for no births in the last year, and 2 for one or more births in the last year: https://usa.ipums.org/usa-action/variables/FERTYR#codes_tab .  Similarly, CHBORN is 0 for not available, 1 for no children, 2 for one child, 3 for two children, and so forth: https://usa.ipums.org/usa-action/variables/CHBORN#codes_tab .  Be attentive to this when you interpret your results.  If you are computing mean number of children, or mean numbers of births, you will often want to subtract one from the numbers you present.
  • If you are computing averages of any variables via Comparison of Means, make sure to inspect the detailed documentation for those variables to find out how missing values are coded, and use a selection filter to exclude them.
  • Again, use selection filters to make sure that the observations you include are relevant to the question you are interested in.  For example, if you want to use school to look at whether or not someone is currently enrolled in school, you would want to restrict to people who have a chance of being currently enrolled by applying a selection filter based on age.  Restricting to age(14-18), for example, would let you look at people who were of an age to be in high school.  If you are looking at completed education, normally you would want to restrict to ages 25 and above.
  • Remember that not every variable is available in every year.  For the variables you are interested in, check to see which years they are available in.  Some very interesting variables are only available in one or two years.  The variables related to ethnicity, nativity, and origin are especially prone to change.
  • Remember that 2001-2009 are based on the ACS.  If you just want to present data from the decennial Census, you would restrict to years 1850-2000, and if you just wanted ACS data, you would restrict to 2001-2009.
  • Keep in mind that the ACS has some nice variables that allow for direct computation of certain demographic rates, like whether or not someone has married in the last year, whether or not someone has had a birth in the last year, and so forth.
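The reminder about fertility codes can be checked with a short calculation. This is a sketch in Python with invented codes. It uses only the coding described above for CHBORN (0 for not available, 1 for no children, 2 for one child, and so on) and ignores any special handling the codebook may require for the highest codes. The function name `chborn_to_children` is hypothetical.

```python
def chborn_to_children(code):
    """Convert a CHBORN code to a number of children, or None if N/A."""
    if code == 0:
        return None  # 0 means not available, not zero children
    return code - 1  # 1 means no children, 2 means one child, ...

# Invented CHBORN codes for five women.
codes = [0, 1, 2, 4, 3]

children = [chborn_to_children(c) for c in codes]
valid = [n for n in children if n is not None]
mean_children = sum(valid) / len(valid)

# Equivalent shortcut: mean of the valid codes, minus one.
valid_codes = [c for c in codes if c != 0]
mean_of_codes_minus_one = sum(valid_codes) / len(valid_codes) - 1

print(mean_children, mean_of_codes_minus_one)
```

Both routes give the same answer, which is why subtracting one from a mean computed at the site works, provided the N/A code 0 was excluded by a filter first.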

 

2013 SJTU Summer Short Course: Social Demography

Social Demography

Shanghai Jiaotong University
Summer Short Semester 2013
7/1/2013-7/26/2013

Course description at Shanghai Jiaotong University website: http://summer.jwc.sjtu.edu.cn/web/sjtu/XJXQ/198690.htm

INTRODUCTION

This is an overview class intended to familiarize students with key concepts, major debates, and recent research in population and social demography. The focus will be on contemporary trends in marriage, childbearing, divorce, migration, and health and mortality. Issues discussed will be a balanced mixture of topics of academic interest, contemporary relevance, and policy concern. Along the way, methods and data sources used in the study of population and social demography will be introduced. Readings will include academic publications that are examples of classic or recent work in key issues of population or social demography. Students should come away from the class with an awareness of the range of issues considered in population studies and social demography, a basic understanding of relevant data and methods, and an ability to read articles related to population in an informed and critical fashion.

The emphasis will be on trends and patterns in demographic behavior in the contemporary United States, in historical and comparative perspective.

INSTRUCTOR

Cameron Campbell, camcam@ucla.edu

FORMAT

The class will meet twice a week for four weeks. Each class meeting will last for three hours. The first half of each class meeting will be devoted to lecture relevant to the topic and assigned readings. After a break, the second half will be devoted to class discussion and student presentations of optional readings.

REQUIREMENTS

  • Attendance – 10% Attendance will be taken at each lecture.
  • Discussion – 10% Part of each class meeting will be reserved for discussions of the lecture and the assigned readings. Students are also welcome to initiate discussion or ask questions during lecture, without waiting for the time dedicated to discussion.  Students will be expected to participate in discussion.
  • Research project (written) – 35% Students will complete a research paper describing and interpreting patterns and trends in demographic and socioeconomic characteristics of an ethnic group, state or other geographic region (city etc.), or other well-defined subpopulation, using data from IPUMS USA (http://usa.ipums.org/usa/). Characteristics of interest may include age and sex distribution, marital status, childbearing, and educational attainment. For the paper, students will carry out tabulations at the IPUMS website, produce tables or graphs, and write accompanying text that refers to relevant literature to interpret observed trends. The text should be about 5-7 double-spaced pages.
    • Tables, graphs, and references follow at the end and do not count toward the page requirement.
    • All papers must have a reference section
    • Please begin familiarizing yourself with the IPUMS website as soon as possible. In addition to visiting the main IPUMS USA page (http://usa.ipums.org/usa/), please make sure to visit the main page for the Online Data Analysis system (ODA) that you will be using to do the calculations for your research paper: http://usa.ipums.org/usa/sda/. There is also a short set of instructions for using the ODA at: http://usa.ipums.org/usa/resources/sda/sdainstructions.pdf
    • If you are especially interested in economic characteristics of your population of interest, you may also want to consider using Current Population Survey (CPS) data: http://cps.ipums.org/cps/. The Online Data Analysis system for the CPS is available at: http://cps.ipums.org/cps/sda
    • The detailed prompt for the research project is available separately.
    • You may work together on your projects in teams of 2 people.  For team projects, the length requirement is multiplied by the number of team members.  Thus, a paper from a team of two should be 10-14 pages.
  • Presentation on research project  – 15% Students will make short presentations on their research papers at the last two class meetings.
  • Assignments – 30% Assignments will introduce students to various web resources for population and demography.  Assignments should be handed in to the TA at the beginning of the class on the day that they are due.  See the class schedule later in the syllabus for descriptions of the assignments.

READINGS AND RESOURCES

Haupt, Arthur.  2004.  Population Handbook.  Fifth Edition.  Washington: Population Reference Bureau.  http://www.prb.org/pdf/PopHandbook_Eng.pdf

TOPICS AND READINGS ARE PRELIMINARY, AND MAY CHANGE.  CHECK BACK BEFORE CLASS STARTS.

SCHEDULE

Lecture 1 – 7/2/2013

Introduction
Sources for the study of social demography
Population growth over the long term
Population studies and the social sciences

Reading

  • McFalls, Joseph.  2007.  “Population: A lively introduction.  Fifth Edition.”  Population Bulletin.  62(1).  Link
  • Haupt, Chapters 1 and 2

Optional, not required

  • Preston, Samuel H.  1993.  “The Contours of Demography: Estimates and Projections.”  Demography.  30(4):593-606.  JSTOR
  • Keyfitz, Nathan. 1975. “How do we know the facts of demography?” Population and Development Review 1(Dec):267-288. J.

Discussion

Self-introductions

Lecture 2 – 7/4/2013

Demographic behavior in the past
Marriage and childbearing before the 20th century: East-West comparisons
Household and family before the 20th century
Mortality and fertility decline, and demographic transition

Reading

Optional, not required

  • Campbell, Cameron and James Lee. 2010. “Fertility control in historical China revisited: New methods for an old debate.” History of the Family. 15:370-385. doi:10.1016/j.hisfam.2010.09.003.

Discussion

Introduction to IPUMS

Assignment 1

Please review the topics in the syllabus.  Which topic do you find most interesting?  Why?  What related to that topic would you most like to learn about?  One single-spaced page.

Lecture 3 – 7/9/2013

Marriage and Cohabitation
Trends in age at marriage and non-marriage in Asia, North America, and Europe
Non-marriage
Socioeconomic, racial and ethnic differences in marriage
Interracial marriage, educational homogamy, and other aspects of partner choice
Emerging trends: living together apart

Reading

Optional, not required

Discussion

Ideas for topics for the final paper.

Assignment 2

Review the variables available for analysis at the IPUMS website.  Make sure to look at variables available for the Decennial Censuses (1850-2010) and in the American Community Survey (annually since 2000).  After you have examined the site to see what is available, write a page identifying a topic you would like to work on for your final paper and listing the variables that you plan to make use of.

Lecture 4 – 7/11/2013

Racial and socioeconomic differences in childbearing in the U.S.
Non-marital childbearing and childrearing
Changing age patterns of childbearing
Ultra-low fertility in Europe and Asia

Reading

Optional, not required

The West

China

The Rest of the World

Assignment 3

Prepare two tables at the IPUMS website using variables that you are interested in. For this exercise, I strongly encourage you to learn how to recode variables, and use filters to limit the observations included in the calculation.  Recoding variables allows you to regroup values so that for example instead of having a separate row for every year of age, you can have age groups 20-24, 25-29 etc.  If you can do all of this for this exercise, completing the project should be straightforward.  Make sure to pay attention to handling of missing values.

Make sure to read the description of the final project carefully for detailed instructions on handling variables.  Pay special attention to the discussion of recoding variables, handling missing values, and restricting observations by use of filters.

For the first table, carry out a cross-tabulation of one variable against another, with appropriate restrictions on cases and so forth.  By cross-tabulation, I mean that you should select one variable of interest as a row variable, and another variable of interest as a column variable, and use the IPUMS website to prepare a table that summarizes the distribution of one of the variables as a function of the other variable.  For example, you might choose RACE as a column variable, and YEAR as a row variable, and prepare a table that presents the percentage of the population in each race category by year.  Such a table might present the % white, % black etc. in 1850, 1860, and so forth.  Hopefully you can pick a different combination of variables based on your interests.  Most likely you will choose AGE or YEAR as a row variable, and something like education, race, or some other substantive variable as a column variable, and then calculate row percentages so in each year, you can present the % of the population in each of the categories of interest.  Of course you might choose some other combination, like race and education.
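
The logic of a cross-tabulation with row percentages can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the IPUMS site works internally. The records are made up for the example, the function name row_percentages is hypothetical, and the sketch ignores sampling weights, which a real tabulation may apply.

```python
from collections import Counter, defaultdict

# Made-up toy records of (YEAR, RACE). These are NOT real IPUMS data.
records = [
    (1850, "White"), (1850, "White"), (1850, "White"), (1850, "Black"),
    (1860, "White"), (1860, "White"), (1860, "Black"), (1860, "Black"),
]


def row_percentages(rows):
    """Cross-tabulate (row, column) pairs and convert counts to row percentages."""
    counts = defaultdict(Counter)
    for row_value, col_value in rows:
        counts[row_value][col_value] += 1

    table = {}
    for row_value, col_counts in counts.items():
        row_total = sum(col_counts.values())
        # Each row sums to 100, so rows (years) can be compared directly.
        table[row_value] = {
            col: 100.0 * n / row_total for col, n in col_counts.items()
        }
    return table


table = row_percentages(records)
for year in sorted(table):
    print(year, table[year])
```

Because each row is divided by its own total, the percentages describe the composition of the population within each year, which is what makes comparison across years meaningful.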

Make sure to apply appropriate restrictions (see the prompt for the final project for details of using filters) so that your calculation makes sense.   If you are looking at education, you will almost always want to restrict to people old enough to have finished their education, that is people 25 and above.  If you are looking at something related to marriage, you will want to restrict to people old enough to marry, that is 16 and above.  And so forth.

For the second table, use the comparison of means, to calculate the mean of one variable according to the values of two other variables chosen as row and column variables.  Here is an explanation that I prepared for using comparison of means to calculate percentages/proportions.  For example, you can use comparison of means to calculate the percentage of people who have ever been married, according to their age and level of education.  You would choose age as a row variable, education as a column variable, and then compute the mean of a recoded marital status variable to get the proportion married.  Of course you could also compute the mean of some other variable, like number of children, or income.  You may need to recode so that the mean actually makes sense.
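
To see why the mean of a recoded variable gives a proportion, consider the following minimal Python sketch. The records, category labels, and function names are all made up for illustration and do not correspond to actual IPUMS codes. Marital status is recoded to 1 for ever married and 0 for never married, so the mean within each combination of age group and education is the share ever married.

```python
from collections import defaultdict

# Made-up toy records of (age group, education, marital status).
# These are NOT real data and the labels are not actual IPUMS codes.
people = [
    ("25-29", "College", "Never married"),
    ("25-29", "College", "Married"),
    ("25-29", "High school", "Married"),
    ("25-29", "High school", "Divorced"),
    ("30-34", "College", "Married"),
    ("30-34", "College", "Married"),
    ("30-34", "High school", "Never married"),
    ("30-34", "High school", "Widowed"),
]


def ever_married(status):
    """Recode marital status to 1 (ever married) or 0 (never married)."""
    return 0 if status == "Never married" else 1


def mean_by_cell(rows):
    """Mean of the 0/1 recode for each (row variable, column variable) cell."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for age, education, status in rows:
        sums[(age, education)] += ever_married(status)
        counts[(age, education)] += 1
    return {cell: sums[cell] / counts[cell] for cell in counts}


proportions = mean_by_cell(people)
for cell in sorted(proportions):
    print(cell, proportions[cell])
```

The same structure works for any numeric variable, such as number of children or income, as long as the values being averaged are meaningful numbers rather than arbitrary category codes.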

Lecture 5 – 7/16/2013

Divorce and Union Dissolution
Trends in divorce rates: the leveling of divorce in North America, rising divorce rates in East Asia
Racial and socioeconomic differences in divorce
Implications of divorce for couples and for children

Reading

Optional, not required

Assignment 4

Select two or three of the optional readings in the syllabus that are all on a related theme, and write a review and comparison.  What hypotheses do the authors seek to test?  What data and methods do they use?  What are their conclusions?  Which of the readings do you find most convincing?  If you were to carry out a similar analysis in China, what would you focus on?

Lecture 6 – 7/18/2013

Migration
International migration
Domestic migration, residential segregation, and neighborhood formation

Reading

Lecture 7 – 7/23/2013

Health and mortality

Reading

Lecture 8 – 7/25/2013

Research project presentations

Final project due

WEB LINKS

Information for non-SJTU students about registering for the class

Class-related resources

Summer 2013 China Multigenerational Panel Dataset Workshop at SJTU (English announcement)

Summer 2013 China Multigenerational Panel Dataset Workshop
Shanghai Jiaotong University
Minhang Campus
Shanghai, China

July 15-19, 2013

中文版

The Center for the History and Society of Northeast China at the Shanghai Jiaotong University School of Humanities will hold its third summer China Multigenerational Panel Data workshop from July 15 to July 19.

The workshop will focus on introducing the China Multigenerational Panel Datasets (CMGPD) as sources for the study of demography, stratification, and social and family history. These include the China Multigenerational Panel Dataset – Liaoning (CMGPD-LN) and the China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC).  The CMGPD-LN has already been released via the Inter-university Consortium for Political and Social Research.  Data and documentation are already available for download: http://www.icpsr.umich.edu/icpsrweb/CMGPD/. Chinese language documentation for the CMGPD-LN is available for download here.  Draft documentation for the CMGPD-SC is available for download here.

The CMGPD datasets have many unique features that make them useful not only for the study of Chinese population, social, and family history, but for the study of demographic, social and economic processes more generally.  Their features also make them useful as testbeds for researchers developing novel quantitative techniques.  The datasets are longitudinal, multi-generational, and structured at multiple levels, including the individual, the household, the kin group, the community, the administrative unit, and the region.

UCLA Professor of Sociology Cameron Campbell and Distinguished Professor and Dean of Humanities and Social Sciences at the Hong Kong University of Science and Technology James Lee will be primary lecturers.  Guest lecturers will include Yuxue Ren, Professor of History at Shanghai Jiaotong University; and Dong Hao, PhD student at the Hong Kong University of Science and Technology.

This class is intended to 1) introduce researchers to the CMGPD datasets and help them decide whether they may be useful in their own studies, and 2) give current users an opportunity to learn more about the origin and context of the data.   Researchers who have already started using the CMGPD-SC or CMGPD-LN are welcome to attend and take advantage of the opportunity to discuss any questions they may have with Lee, Campbell, and others who were involved in the creation of the dataset.

Lectures and discussion will focus on 1) the historical, social, economic and institutional context of the populations covered by the data, 2) key features of the data, and 3) potential applications.  Because we have already released a Training Guide that provides instruction on carrying out basic and advanced analysis with the data, this year’s workshop will not provide instruction in STATA, or have computer exercises.  There will be optional sessions to introduce the Training Guide and demonstrate basic procedures for downloading the data from the website and loading it into STATA.

At the end of the week, participants will be asked to make a brief presentation on their ideas for making use of the data.  If participants are already working with the CMGPD, they will be welcome to make brief presentations on their work with it.  There will not be any computer exercises.

If any non-Chinese speakers enroll, the lectures will be in English.  If the participants all speak Chinese, lectures may be in Chinese.  Discussion will be in English and Chinese.

The Shanghai Jiaotong University Center for the History and Society of Northeast China was established as a research unit by a collaboration of the Shanghai Jiaotong University (SJTU) School of the Humanities and the Hong Kong University of Science and Technology (HKUST) School of the Humanities and Social Sciences.

Datasets

China Multigenerational Panel Dataset – Liaoning (CMGPD-LN)

The CMGPD-LN is an important dataset for the study of China’s family, social and demographic history, and for the study of demography and stratification more generally. The dataset is suitable for application of a wide variety of statistical techniques that are commonly used in social demography for the analysis of longitudinal, individual-level data, and available in the most popular statistical software packages. The dataset is distinguished by its size, temporal depth, and richness of detail on family, household and kinship context.

The materials from which the dataset was constructed are Shengjing Imperial Household Agency household registers held in the Liaoning Provincial Archives. The registers are triennial. Altogether there are 3,600 of them. We transcribed a subset of them to produce the CMGPD-LN, which spans 160 years from 1749 to 1909. At present, the dataset comprises 29 register series, and consists of 1,500,000 records that describe 260,000 individuals over seven generations. The CMGPD-LN is accordingly an important resource for the study of historical demography, sociology, economics, and other fields.

The CMGPD-LN and associated English-language documentation are already available for download at ICPSR, following a free registration. Please visit the website: http://www.icpsr.umich.edu/cmgpd

China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC)

The CMGPD-SC covers communities of recent settlers in Shuangcheng, Heilongjiang in the last half of the nineteenth century and beginning of the twentieth. It contains 1.35 million records that describe 100,000 people. The registers cover descendants of urban migrants from Beijing and rural migrants from neighboring areas in northeast China who came to the area in the first half of the nineteenth century as part of a government organized effort to settle this largely vacant frontier region. One of the distinguishing features of this dataset is the availability of linked, individual-level landholding records for several points in time. The data also include a rich array of other indicators of household and family context and socioeconomic status. We anticipate formal public release of the dataset via ICPSR in 2013 or 2014. We will provide participants in the summer class with access to drafts of the release and documentation.

Information

Dates

Monday, July 15, 2013 to Friday, July 19, 2013

Location

Shanghai Jiaotong University School of Humanities (SJTU Minhang Campus, Shanghai)

Application deadline

May 25, 2013

See link below to download application

Application procedure

Please send your personal statement, curriculum vitae, and application form as attachments to chinanortheast@gmail.com.  We will have an English language application form available soon.

Applications from faculty, postdoctoral researchers and graduate students are welcome. Applications from graduating college seniors will also be considered if they have already been accepted into a graduate program beginning fall 2013.  In that case, the application should include a copy of their graduate school acceptance. Any other interested parties should contact our staff at chinanortheast@gmail.com before applying to see if they will be considered.

Participants should be able to speak or read Chinese or English.  No prior experience in statistics, demography, or Chinese history is required.  Applicants must explain the reasons for their interest in the data in their application, and should demonstrate that they have background, experience or interests that in some way are relevant.

Participants will be offered free housing in graduate student dormitories at SJTU.  Participants who want other accommodations will have to arrange them on their own and will be responsible for all associated costs.  Participants should bring their own computer.  Students are responsible for travel and local expenses.  At present we expect to be able to accommodate 25-30 participants.

Links

Required Reading

Please complete as much of the required reading as possible before the workshop begins.  The highest priority are the assigned readings in the CMGPD-LN and CMGPD-SC User Guides.  Once these are complete

Documentation

  • CMGPD-LN User Guide.  English pages 1-54, 90-96 or Chinese pages 13-64, 96-101.  Skim the descriptions of variables to look for ones that may be relevant to your research.
  • CMGPD-SC User Guide.  English pages 1-47.
  • CMGPD Training Guide.  Please review slides 1-40.  Users who have experience or training in statistics should skim the remainder of the training guide and review the examples of the use of the guide.

Research Articles

  • Campbell, Cameron and James Lee. 2002 (publ. 2006). “State views and local views of population: Linking and comparing genealogies and household registers in Liaoning, 1749-1909.” History and Computing. 14(1+2):9-29.  http://papers.ccpr.ucla.edu/papers/PWP-CCPR-2004-025/PWP-CCPR-2004-025.pdf
  • Bengtsson, Tommy, Cameron Campbell, James Lee, et al. 2004.  Life Under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900. MIT Press.  Published in Chinese as 托米·本特森,康文林,李中清等. 2008. 压力下的生活:1700~1900年欧洲与亚洲的死亡率和生活水平. 北京: 社会科学文献出版社. Translated by 李霞 and 李恭忠.  Appendix A.
  • Campbell, Cameron and James Z. Lee. 2011. “Kinship and the Long-Term Persistence of Inequality in Liaoning, China, 1749-2005.” Chinese Sociological Review. 44(1):71-104.  http://www.ncbi.nlm.nih.gov/pubmed/23596557

Review Articles

  • 康文林 (Cameron Campbell).  2012.  “历史人口学 (Historical Demography).”  Chapter 8 in 梁在编 (Zai Liang ed.) 人口学 (Demography).   北京:人民大学出版社 (Beijing: Renmin University Press), 233-265.

Select one or two of the following research articles based on your own interests (or another published article that uses the CMGPD), and read before the workshop starts

  • CHEN Shuang, James Lee, and Cameron Campbell. 2010. “Wealth stratification and reproduction in Northeast China, 1866-1907.” History of the Family. 15:386-412.  http://www.ncbi.nlm.nih.gov/pubmed/21127716
  • Bengtsson, Tommy, Cameron Campbell, James Lee, et al. 2004.  Life Under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900. MIT Press.  Published in Chinese as 托米·本特森,康文林,李中清等. 2008. 压力下的生活:1700~1900年欧洲与亚洲的死亡率和生活水平. 北京: 社会科学文献出版社. Translated by 李霞 and 李恭忠.  Chapter 10.
  • Wang Feng, Cameron Campbell, and James Z. Lee. 2010. “Agency, Hierarchies, and Reproduction in Northeastern China, 1789 to 1840.” Chapter 11 in Noriko Tsuya, Wang Feng, George Alter, James Z. Lee et al. Prudence and Pressure: Reproduction and Human Agency in Europe and Asia, 1700-1900. MIT Press, 287-316.
  • Chen Shuang, Cameron Campbell, and James Z. Lee.  Forthcoming.  “Categorical Inequality and Gender Difference: Marriage and Remarriage in Northeast China, 1749-1912.”  Chapter 11 in Lundh, Christer, Satomi Kurosu, et al. Similarity in Difference.

Recommended Reading

  • As much of the User Guides and Training Guide as you can.
  • 定宜庄, 郭松义, 李中清, 康文林. 2004. 辽东移民中的旗人社会.  上海:上海社会科学出版社.
  • Lee, James and Cameron Campbell. 1997. Fate and Fortune in Rural China: Social Organization and Population Behavior in Liaoning, 1774-1873. Cambridge University Press.
  • 李中清,王丰.  2000.  人类的四分之一: 马尔萨斯的神话与中国的现实:1700-2000。  三联·哈佛燕京学术丛书。(English: Lee, James and Wang Feng.  1999.  One Quarter of Humanity: Malthusian Mythology and Chinese Reality, 1700-2000.)
  • Bengtsson, Tommy, Cameron Campbell, James Lee, et al. 2004.  Life Under Pressure: Mortality and Living Standards in Europe and Asia, 1700-1900. MIT Press.  Published in Chinese as 托米·本特森,康文林,李中清等. 2008. 压力下的生活:1700~1900年欧洲与亚洲的死亡率和生活水平. 北京: 社会科学文献出版社. Translated by 李霞 and 李恭忠.

Tentative schedule

Acknowledgements

Preparation of the CMGPD-LN and accompanying documentation for public release via ICPSR DSDR was supported by NICHD R01 HD057175-01A1 “Multi-Generation Family and Life History Panel Dataset” with funds from the American Recovery and Reinvestment Act.

Preparation of the CMGPD-SC and accompanying documentation for public release via ICPSR DSDR was supported by NICHD R01 HD070985-01 “Multi-generational Demographic and Landholding Data: CMGPD-SC Public Release.”

The CMGPD summer workshops in Shanghai have been supported by Shanghai Jiaotong University, the School of Humanities, the Department of History, and the Center for the History and Society of Northeast China.  We are also grateful to staff at a variety of campus units at SJTU for their logistical support.

 

Evaluations from my summer short course on Social Demography at Shanghai Jiaotong University

I taught an undergraduate course in Social Demography this summer at Shanghai Jiaotong University.  The university introduced a short summer semester this year.  As I understand it, at least part of the reason was to give undergraduates more opportunity to take courses with faculty like myself who have visiting appointments, and are only there during the summer.  The short semester was one month long, and immediately followed the end of the spring semester.

I just received the summary of student evaluations from the short course I taught at Shanghai Jiaotong University this summer.  I embedded the spreadsheet below. I don’t know how they compare with other courses there.  At first glance, they don’t seem disastrous, which is always a relief.  I was pleased that even though the students seemed to think the homework load rather heavy, they didn’t seem to hate on me as an instructor.

There were some crossed wires so some students enrolled without knowing that I would be teaching in English. I had provided a course description and syllabus that included specification of English as the language of instruction, but as far as I can understand, that was not widely disseminated to students when they were making their choices for the short semester.  I did try to summarize main points in Chinese when people’s expressions suggested an unusual level of confusion.  I do admire the students with limited English who stuck with it and plugged away and ended up doing reasonably well.  I suppose I could have lectured in Chinese, but it probably would have been painful for the students to listen to my Chinese.  More importantly, many of them plan to go abroad for graduate school, so I thought that they might prefer a relatively short and painless taste of what an English language course would be like.

I was really impressed with the students.  Shanghai Jiaotong University is one of the best science and engineering schools in China.  It attracts very smart and ambitious students.

In terms of engagement, the students were much like a typical undergraduate class at UCLA.  One-quarter to one-third of the class routinely sat at the front of the room, were very engaged, listened attentively, raised questions and expressed opinions.  Perhaps another one-third tended to sit in the middle of the room, and paid attention and took notes, but didn’t participate much in discussion.  And of course, just like UCLA or probably any other university, there were the students who sat in the back of the class, had their laptops open and connected to the campus internet, and I suppose were on social networks or playing Minecraft or World of Warcraft.  I can’t be too harsh on such students since when I was an undergraduate at Caltech, I was one of them.  We didn’t have laptops to bring to class, or internet connectivity, so when I attended lecture, I usually sat in the back and doodled.

For the final project, the students had to use the IPUMS site to do a basic analysis of some aspect of American population.  Since there were only four weeks the projects couldn’t be as ambitious as some of the projects that my undergraduates at UCLA attempt.  That said, most of the students produced reasonably competent analyses on a subject of their choice, mostly consisting of tabulations.  Some ambitious students attempted regression analysis, and one team of economics students downloaded data and estimated quantile regressions to model wages.   Another student who was a physics undergraduate compiled marriage statistics from IPUMS, and then wrote code in Matlab to estimate a non-linear regression to fit Coale and McNeil’s marriage model to contemporary American marriage patterns.  I couldn’t really follow the explanation after the student introduced tensors, but it looked pretty good.

The most popular topics for the student projects were marriage and divorce patterns, especially by education, and educational and occupational attainment of immigrants, especially Chinese-Americans.  In discussion and in written work, the students generally displayed a relatively mature and sophisticated understanding of contemporary American society.  Probably they were most surprised by the regional divides in socioeconomic and demographic outcomes, and the probably not unrelated regional divides in religious, political, and social orientations.  To the extent the students had anything wrong, it was that they assumed that the entire country was very liberal and open-minded in terms of social attitudes, and weren’t really aware of how socially conservative very large swathes of the country actually are.

I was pleased by how many of the students told me they were not only from outside Shanghai, but from small towns or rural areas in the interior provinces.  I ran into some of them after the last class and many of them were about to embark on two or three day hard-sleeper train rides to return home.  As is the case at UCLA, many students were first-generation college students, or from otherwise humble origins.  It was a nice reminder of one of the distinguishing features of Caltech, and indeed the UC schools, which among top research universities are all distinguished by their relative (emphasis here on relative!) accessibility to students from modest origins.

SJTU Summer Course in Social Demography: Final Project

Final Project
Detailed Description of Requirements
Due on paper at the last class meeting. 
You are to write an original research paper that uses the IPUMS site to carry out a basic comparative study of trends and patterns in demographic behaviors such as marriage and reproduction by education, income, ethnicity, race, region, sex, or some other variables.  The emphasis is on comparison.  If you are interested in a particular ethnicity, for example, you still need to compare it to other ethnicities or the population as a whole to establish what is distinct about it. 
Please read the following directions carefully.  Since you have approximately 3 weeks to complete the project, there is no excuse for not complying with the instructions.
Your research paper should be based on computations at the IPUMS site.   The paper should be organized as the text, followed by the references, followed by the tables, with each table on a separate page.  You should construct four to six tables, each of which represents a different combination of variables.  Do not insert tables into the main text.  Please number all pages, and make sure that your name is on the first page.
The text should consist of four sections: Introduction, Background, Results, and Conclusion
If you are working together, the required number of tables scales up according to the number of people in your team.
The Introduction should explain the overall focus of the paper and specify the questions that you are interested in.  
The Background section provides whatever information from other published sources you think may be necessary to help a reader understand the object of your study.  For example, if your tables focus on comparison of different ethnic groups, you might provide a brief account of each group’s history in the United States that focuses on features relevant to the analysis.  If you are comparing several major cities, you might want to mention key features of each relevant to your analyses. 
The Results section discusses the tables one by one, and interprets their contents in light of hypotheses or theories in the introduction.  The tables should be numbered consecutively, and referred to in the text as Table 1, Table 2 etc. 
The Conclusion reviews the most interesting results in the paper and suggests further work.  

Tables
Each of the tables should examine relationships among a distinct set of variables.  In other words, the tables should not be repetitions of the same basic tabulation but with different filters.  At least two tables should make use of ACS data, which are annual starting in 2001.  At least two tables should make use of Decennial Census data.  You may also use the Current Population Survey (CPS) data at the IPUMS site.  It tends to have much richer detail on employment and so forth.   
For your tables, you may also use General Social Survey (GSS) data, which is available at a different website (http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10) but can be analyzed via a web interface like the one that you are already familiar with at IPUMS.  The GSS includes questions on topics like religion, political views, and so forth that are not covered in the Census.  Keep in mind that if you want to use the GSS, the tables you create should have something to do with demographic behavior, broadly defined.
Each table should also have a self-explanatory title, and the row and column headings should be sufficient to allow a reader to interpret the table without referring to the text.  Please format the tables so that there are no vertical lines, and only four horizontal lines: one between the title and the column headings, one between the column headings and the table contents, one between the table contents and the totals row, and one at the bottom.  Basically the table should be formatted like the ones you see in the papers in the assigned reading.  You will notice that in publications, tables almost never have vertical lines, and generally have a limited number of horizontal lines.  The tables should not be copied and pasted directly from the site, but rather should be prepared to look like they were publication quality, following the guidelines above.
The tables may be frequencies or cross-tabulations, or you may take advantage of some of the other analytic tools available at the site.  You are most likely to find the comparison of means tool (http://sda.usa.ipums.org/cgi-bin/sdaweb/hsda?harcsda+1850-2009) the most useful.   This allows you to calculate the mean of one variable for different combinations of other variables.  For example, you could calculate mean income (INCTOT) for different combinations of RACE and YEAR.  If you are more adventurous, you may try using the correlation or regression tools, but these can take a long time.
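To make clear what the comparison of means tool computes, here is a minimal sketch in Python. It is only an illustration, not how the IPUMS site works internally. The records are made up, the function name compare_means is hypothetical, and the means are unweighted, whereas the online tool may apply sample weights.

```python
from collections import defaultdict

# Hypothetical person records; the values are invented for illustration only.
records = [
    {"YEAR": 2000, "RACE": "White", "INCTOT": 30000},
    {"YEAR": 2000, "RACE": "White", "INCTOT": 50000},
    {"YEAR": 2000, "RACE": "Black", "INCTOT": 28000},
    {"YEAR": 2010, "RACE": "White", "INCTOT": 45000},
    {"YEAR": 2010, "RACE": "Black", "INCTOT": 30000},
    {"YEAR": 2010, "RACE": "Black", "INCTOT": 40000},
]


def compare_means(rows, dependent, row_var, col_var):
    """Return {(row value, column value): unweighted mean of the dependent variable}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r[row_var], r[col_var])
        totals[key] += r[dependent]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Mean INCTOT for each combination of RACE (rows) and YEAR (columns).
means = compare_means(records, "INCTOT", "RACE", "YEAR")
for (race, year), value in sorted(means.items()):
    print(race, year, round(value))
```

Each cell of the resulting table is simply the average of the dependent variable among the people who fall into that combination of row and column categories.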
In constructing your tables, filter observations carefully so that the ones you include are relevant.  Depending on the analysis you are doing, you may want to use a filter to restrict to particular ages, or to people with particular characteristics.  You may also want to recode variables like age that take on many values.  Instead of having a separate row or column for each age (1, 2, 3, etc.), you then have a limited number of age groups (1-9, 10-19, etc.).  You will also need to pay attention to codes for a variable that indicate that the information was missing.  Normally you will want to exclude these from your analysis.
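The three steps just described (filter, recode, exclude missing codes) can be sketched in Python as follows. This is only an illustration under stated assumptions. The records are made up, the age groups are one possible choice, and MISSING_INCOME is a placeholder. Always check the IPUMS documentation for the codes that actually mark missing or not-applicable values for your variable.

```python
# Placeholder missing-value code, assumed for illustration; check the codebook.
MISSING_INCOME = 9999999

# Age groups chosen for this example only: (lowest age, highest age).
AGE_GROUPS = [(25, 34), (35, 44), (45, 54)]

# Hypothetical person records.
people = [
    {"AGE": 23, "INCTOT": 12000},
    {"AGE": 27, "INCTOT": 30000},
    {"AGE": 38, "INCTOT": MISSING_INCOME},
    {"AGE": 41, "INCTOT": 52000},
    {"AGE": 50, "INCTOT": 61000},
    {"AGE": 70, "INCTOT": 20000},
]


def age_group(age):
    """Recode a single year of age into a group label, or None if out of range."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def prepare(rows):
    """Keep in-range ages, drop missing incomes, and attach the recoded age group."""
    kept = []
    for r in rows:
        group = age_group(r["AGE"])
        if group is None:
            continue  # filter: outside the ages of interest
        if r["INCTOT"] == MISSING_INCOME:
            continue  # exclude records where the information is missing
        kept.append({**r, "AGEGROUP": group})
    return kept


selected = prepare(people)
for r in selected:
    print(r["AGE"], r["AGEGROUP"], r["INCTOT"])
```

Of the six hypothetical people, two are dropped by the age filter and one by the missing-value check, and the remaining three each fall into a different age group.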
Demographic and Socioeconomic Characteristics to Treat as Outcomes/Dependent Variables

Basic demographic and socioeconomic variables available in most of the decennial Censuses that you might want to consider as outcomes (dependent variables) include but are not limited to:
  • Current marital status (MARST)
  • Number of children born (CHBORN)
  • Age at first marriage (AGEMARR)

Of course, if you have found another variable that you are interested in, you are welcome to use that.  Some of you have mentioned school enrollment, home ownership, type of school, health insurance, and so forth.
The ACS also includes a rich set of demographic variables that could be used as outcomes.  The ACS data are the annual samples that show up from 2000 onward (2001, 2002, 2003, etc.).  The most interesting ones for this class are variables, available only for very recent years, that indicate whether certain events occurred in the last year.  These could be the basis for calculating rates, as opposed to percentages:
  • Children born within the last year (FERTYR)
  • Married, divorced or widowed within the last year (MARRINYR, DIVINYR, WIDINYR).

 These lists are only meant as suggestions, and if you have other interests that can be addressed with other variables you have found, you may pursue them.
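As an illustration of how a "within the last year" variable such as FERTYR can yield a rate, here is a minimal Python sketch. The records are made up, the function name is hypothetical, and the codes for "gave birth" and "did not give birth" are assumptions for the example. Confirm the actual coding in the IPUMS documentation before relying on it.

```python
# Assumed FERTYR codes for illustration only; confirm them in the IPUMS codebook.
NO_BIRTH = 1
GAVE_BIRTH = 2

# Hypothetical women; any other code stands for not applicable or missing.
women = [
    {"AGE": 22, "FERTYR": GAVE_BIRTH},
    {"AGE": 25, "FERTYR": NO_BIRTH},
    {"AGE": 28, "FERTYR": NO_BIRTH},
    {"AGE": 31, "FERTYR": GAVE_BIRTH},
    {"AGE": 33, "FERTYR": NO_BIRTH},
    {"AGE": 36, "FERTYR": NO_BIRTH},
    {"AGE": 39, "FERTYR": NO_BIRTH},
    {"AGE": 44, "FERTYR": NO_BIRTH},
    {"AGE": 60, "FERTYR": 0},
    {"AGE": 12, "FERTYR": 0},
]


def births_per_thousand(rows):
    """Births in the last year per 1,000 women with a valid answer."""
    at_risk = [r for r in rows if r["FERTYR"] in (NO_BIRTH, GAVE_BIRTH)]
    if not at_risk:
        return None  # no one with a valid answer, so no rate can be computed
    births = sum(1 for r in at_risk if r["FERTYR"] == GAVE_BIRTH)
    return 1000 * births / len(at_risk)


rate = births_per_thousand(women)
print(rate)
```

The denominator is the set of women who could have experienced the event and gave a valid answer, which is what makes the result a rate for a one-year period.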
Socioeconomic Characteristics to Treat as Explanatory/Independent Variables
There are a vast number of variables that you could use as explanatory/independent variables in your analysis.  I have provided a few examples below.  There are many more that you can see at the IPUMS site.
  • Race (RACE) – Note that since 2000, Race includes codes identifying people who reported two or more races.  Since 2000 there are also separate variables for single races, for example, RACASIAN.
  • Hispanic (HISPAN) – Note that Hispanic status is separate from race.
  • A variety of other nativity and ancestry variables are available at http://usa.ipums.org/usa-action/variables/group/race_eth.  The availability of these variables tends to change over time, so there isn’t really one nativity or ancestry variable that is available on a continuous basis since 1850.  I will post a separate guide to using some of the key variables.
  •  Geographic identifiers in http://usa.ipums.org/usa-action/variables/CITY#codes_section.  Note that the IPUMS doesn’t offer any more detail than City, so with IPUMS you can’t compare different neighborhoods in the same city.
  • Total individual income (INCTOT)
  • Poverty status (POVERTY)
  • Educational attainment (EDUC)
  •  Socioeconomic index (SEI) – this is a commonly used measure of the standing of an individual’s occupation. 
Examples of tables you could construct (please come up with your own tables, don’t just produce these)

Use the comparison of means to look at mean number of children born for people of different races in different years. In this case, you would select number of children as your dependent variable, and RACE and YEAR as row and column variables. You would probably want to filter to restrict to (for example) women who were old enough to have completed their childbearing, say 50 years old or older. You might want to restrict to decennial census years.

Use the comparison of means to look at mean income for people of different ages with different levels of education. In this case you would select income as your dependent variable, and age and education as your rows and columns. You would probably want to set a filter to restrict to ages when people might actually have incomes, for example, 25-55. You would want to recode age so that instead of having roughly thirty rows, one for each age, you have three rows, one for each ten-year age group.

2012 SJTU Summer Short Course: Social Demography

Social Demography
SJTU Summer Short Semester 2012

INTRODUCTION

This is an overview class intended to familiarize students with key concepts, major debates, and recent research in population and social demography. The focus will be on contemporary trends in marriage, childbearing, divorce, migration, and health and mortality. Issues discussed will be a balanced mixture of topics of academic interest, contemporary relevance, and policy concern. Along the way, methods and data sources used in the study of population and social demography will be introduced. Readings will include academic publications that are examples of classic or recent work on key issues in population or social demography. Students should come away from the class with an awareness of the range of issues considered in population studies and social demography, a basic understanding of relevant data and methods, and an ability to read articles related to population in an informed and critical fashion.

The emphasis will be on trends and patterns in demographic behavior in the contemporary United States, in historical and comparative perspective.

INSTRUCTOR

Cameron Campbell, camcam@ucla.edu


FORMAT

The class will meet twice a week for four weeks. Each class meeting will last for three hours. The first half of each class meeting will be devoted to lecture relevant to the topic and assigned readings. After a break, the second half will be devoted to class discussion and student presentations of optional readings.

REQUIREMENTS

  • Attendance – 10%. Attendance will be taken at each lecture.
  • Discussion – 5%. Half of each class meeting will be reserved for discussion of the lecture and the assigned readings. Students will be expected to participate in discussion. Each student will be expected to introduce and discuss at least one paper of their own choice that is relevant to the topics in class but not on the list of required readings.
  • Research project (written) – 35%. Students will complete a research paper describing and interpreting patterns and trends in demographic and socioeconomic characteristics of an ethnic group, state or other geographic region (city etc.), or other well-defined subpopulation, using data from IPUMS USA (http://usa.ipums.org/usa/). Characteristics of interest may include age and sex distribution, marital status, childbearing, and educational attainment. For the paper, students will carry out tabulations at the IPUMS website, produce tables or graphs, and write accompanying text that refers to relevant literature to interpret observed trends. The text should be about 5-7 double-spaced pages.
    • Tables, graphs, and references follow at the end and do not count toward the page requirement.
    • All papers must have a reference section.
    • Please begin familiarizing yourself with the IPUMS website as soon as possible. In addition to visiting the main IPUMS USA page (http://usa.ipums.org/usa/), please make sure to visit the main page for the Online Data Analysis system (ODA) that you will be using to do the calculations for your research paper: http://usa.ipums.org/usa/sda/. There is also a short set of instructions for using the ODA at: http://usa.ipums.org/usa/resources/sda/sdainstructions.pdf
    • If you are especially interested in economic characteristics of your population of interest, you may also want to consider using Current Population Survey (CPS) data: http://cps.ipums.org/cps/. The Online Data Analysis system for the CPS is available at: http://cps.ipums.org/cps/sda
    • The prompt for the research project will be posted separately.
    • You may work together on your projects in teams of up to 4 people.  For team projects, the length requirement is multiplied by the number of team members.  Thus, a team of two should produce a paper of 10-14 pages, a team of three a paper of 15-21 pages, and a team of four a paper of 20-28 pages.
  • Research project (presentation) – 15%. Students will make short presentations on their research papers at the last two class meetings.
  • Assignments – 35%. Assignments will introduce students to various web resources for population and demography.  There will be 4 to 6 assignments, and they will have equal weight.
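
The length rule for team research papers stated above is simple multiplication, sketched here in Python for anyone who wants to check their target. The function name required_pages is hypothetical. The base range of 5-7 pages and the maximum team size of 4 are taken from the requirements above.

```python
def required_pages(team_size, base_min=5, base_max=7, max_team=4):
    """Return the (minimum, maximum) page count for a team of the given size."""
    if not 1 <= team_size <= max_team:
        raise ValueError(f"team size must be between 1 and {max_team}")
    return (base_min * team_size, base_max * team_size)


for size in range(1, 5):
    low, high = required_pages(size)
    print(f"team of {size}: {low}-{high} pages")
```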

READINGS AND RESOURCES

Haupt, Arthur.  2004.  Population Handbook.  Fifth Edition.  Washington: Population Reference Bureau.  http://www.prb.org/pdf/PopHandbook_Eng.pdf

SCHEDULE

2012 China Multigenerational Panel Data Summer Class

China Multigenerational Panel Data (CMGPD) 2012 Summer Training Workshop

Institute on the History and Society of Northeast China
School of the Humanities
Shanghai Jiaotong University
Shanghai, China

July 6, 2012 – July 20, 2012

Subject to revision.  Please check back on a regular basis for changes.

POLICIES

  • Attendance at all lectures and recitation sections is required.  Unexcused absences may be grounds for immediate dismissal.
  • Completion of all assignments is required.
  • Participants must bring their own laptop, and have STATA installed and the CMGPD-LN downloaded at the beginning of class.
  • If you already have experience working with a statistical package other than STATA, you may use it instead of STATA.  However, we may not be able to provide much assistance if you have difficulties.
  • Lectures will be in English.  The teaching assistants and I all speak Chinese, however.


REQUIRED READING

Please read the following BEFORE class begins

RECOMMENDED READING

These may be useful for participants who have less prior experience in demography, STATA, and other elements of the class.


DETAILED SCHEDULE 
(Links to shared spreadsheet with topics, assignments and readings by day)

Lectures will be in the morning. The substantive lectures will be 9:00am-10:30am. The data and methods lectures will be at 11:45am-12:15pm. Recitation will start at 1:30pm.