How much do we learn about public opinion in China from Weibo posts?

It seems like every piece of reporting on China these days cites, as evidence of the import of some event, a particular Weibo post, usually one that included a photo or video of an incident, along with a count of how many times it was forwarded or commented on.  And as evidence of the public reaction to said event, it now seems de rigueur to translate observations and comments made by Weibo users.

As much as I like finding out what is on Weibo, I can’t help wondering whether we really learn much about public opinion from counts of the number of times a post was forwarded, or translations of comments made by occasional users.

I’ve been thinking about this for the last couple of years as I have spent more time in China and had more opportunity to talk to people who aren’t academics.  People certainly have lots of concerns, and strong general opinions about issues like pollution, food safety, corruption, and so forth, but what I find striking is the disconnect between the intensity of reactions to specific events suggested by evidence from Weibo and other social media sources, and what I see in day-to-day conversation.  Over the last few years we have had one incident after another presented to us as transfixing the Chinese public and having tremendous import and significance, always with Weibo or social media traffic as evidence.  In my own experience, people are aware of these incidents, and may even be somewhat interested, but don’t seem to obsess about any one of them the way that social media traffic would suggest.

One issue is whether Weibo users who post on current events are representative of China’s population, or even Weibo users overall.  From Weibo traffic, I suppose we learn something about the opinions of Weibo users who are active and who like to post about current events, but I don’t know if they are any more representative of the population at large in China than the people who comment anonymously on news articles at the New York Times are representative of the U.S. population.

Weibo users may be better off, or at least better educated, than China’s population as a whole.  I actually wonder if that gives the appearance of more bifurcation in the population than there actually is.  In my experience in China, the better off, or at least the better educated, articulate views on most subjects that are more extreme in one direction or another than those of the people I run into who are not doing as well.  Perhaps the fact that they are doing well, and in some extreme cases are completely disconnected from the realities of day-to-day life, allows them more opportunity to think abstractly and see the world in black and white.  Such abstraction isn’t unique to China, of course.  Here in the United States, my own observation is that the people who spout the nuttiest and most unrealistic political views, whether on the left or right, tend to be people whose situation insulates them from contact with people who think differently from themselves, and presents the fewest challenges to a neat and tidy view of the world as a Manichean struggle between the forces of dark and light.

Weibo users who post on current events may not be representative of Weibo users overall.  They may be braver, more engaged, or simply more rash and foolhardy, than most Weibo users.  Of the Weibo posts I see, the overwhelming majority seem to cover the same territory as Facebook status updates: complaints about how busy or tiring their day was, reposts of quotes, links to odd bits of news, commentaries on celebrities, cars and gadgets, and of course, pictures of cats, flowers, sunsets, people at tourist sites smiling and flashing V signs, and so forth.  The people who routinely post on serious subjects seem to be a distinct minority.

A specific concern I have about counts of Weibo reposts as evidence of the attention paid to an event is the lack of a basis for comparison.  When I see a statement that a post about some misbehaving official was reposted 500 times, I don’t know if 500 is a lot, or a few.  Recitations of the number of times a post was reposted are almost never accompanied by any background on how many posts each day are forwarded even more times.  Nothing I have posted on Weibo has ever been reposted more than a few times, so at first glance 500 seems like a lot to me, but then again I don’t have many followers, and most of what I post is mind-numbingly boring.  If pictures of unusually fat, fluffy cats sprawled on their backs are routinely forwarded 50,000 times, then 500 seems like a very small number for something that is being presented as being of social significance.  One of these days, I’d actually like to see a distribution of counts of reposts that would tell me if 500, 5,000 or even 50,000 is really an unusually large number of reposts.  Maybe such a tabulation exists somewhere, but I haven’t seen it yet.
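To make the kind of comparison I have in mind concrete, here is a minimal sketch in Python.  The repost counts are made up purely for illustration and are not real Weibo data, and the function name is my own.  Given a sample of repost counts, it reports what share of posts were reposted at least as many times as some threshold.

```python
from bisect import bisect_left


def share_reposted_at_least(counts, threshold):
    """Return the fraction of posts in `counts` reposted at least `threshold` times."""
    ordered = sorted(counts)
    # bisect_left gives the number of posts with fewer than `threshold` reposts
    below = bisect_left(ordered, threshold)
    return (len(ordered) - below) / len(ordered)


# Made-up repost counts for ten hypothetical posts, for illustration only
sample_counts = [0, 1, 2, 3, 5, 8, 40, 500, 5000, 50000]

for threshold in (500, 5000, 50000):
    share = share_reposted_at_least(sample_counts, threshold)
    print(f"{threshold:>6} or more reposts: {share:.0%} of posts")
```

With a real, representative sample of posts in place of the invented list, a tabulation like this would show whether 500 reposts is unusual or routine.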

I find the presentation of translations of posts by specific users as evidence even more questionable.  I don’t know what the views of a single user tell us, even if whatever they say is presented as being ‘typical’ of Weibo users.  I certainly wouldn’t rely on comments on articles at the New York Times or Washington Post as evidence about public opinion in the United States, unless I thought the United States was made up of ungrammatical, tinfoil-hat-wearing nuts who have their CAPS LOCK key glued down.

Where does this leave me?  I actually do enjoy following Weibo, and I like hearing about what happens to be trending there.  The counts of reposts are interesting, and I like to see examples of what people are posting.  But I am wary of inferring much about Chinese society in general from Weibo or other social media.

I guess I wish we applied the same level of skepticism to interpreting trends on Weibo that we apply to trends on Twitter, Google+, or Facebook.  It certainly is fun to see what is trending in social media, and always entertaining to see clever posts that individuals have come up with, but I don’t think we learn much that is deep or profound about the United States from whatever happens to be a popular topic of discussion on social media.  Media here generally don’t bother summarizing trends in Twitter or Facebook traffic when they’re reporting on public reaction to major events.  If they do, they present the results as more of a curiosity than anything else.

I’m not suggesting that Weibo and social media be ignored.  They’re fun and interesting.  And given the difficulties of reporting in China, and the probable impossibility of carrying out surveys on reactions to sensitive subjects, it is certainly true that there aren’t many alternatives for gauging public opinion.  But I’d like to see presentations of evidence from Weibo or other social media accompanied by some caveats about possible problems with representativeness.

Slides introducing use of STATA to organize and analyze CMGPD-LN data


UPDATE: This post is out of date. The most recent CMGPD-LN Documentation is available at the ICPSR study site: The slides referred to here have been added to the Training Guide available there (2016 October 18).

I have posted the slides from my methodological lectures at the CMGPD short course that I taught in July at Shanghai Jiaotong University.  These slides introduce many of the STATA operations necessary to carry out advanced operations with the data, most importantly using bysort, merge and certain other commands to construct complex household, life course and kinship variables.  The slides also introduce the basic ‘pre-packaged’ outcome variables and the social status variables.  They also provide examples of using STATA to produce descriptive tables and figures using the data.
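For readers who want a feel for what these by-group and merge operations do before opening the slides, here is a rough sketch in Python rather than STATA.  The records and field names (`person_id`, `household_id`, `year`, `village`) are invented for illustration and are not the actual CMGPD-LN variable names.  The first step is roughly analogous to `bysort household_id year: generate hh_size = _N` in STATA, and the second to a many-to-one `merge` on a household identifier.

```python
from collections import Counter

# Hypothetical person-register records; the field names are invented for this sketch
records = [
    {"person_id": 1, "household_id": "A", "year": 1789},
    {"person_id": 2, "household_id": "A", "year": 1789},
    {"person_id": 3, "household_id": "B", "year": 1789},
    {"person_id": 1, "household_id": "A", "year": 1792},
]

# By-group step: count the records in each household-year group,
# then attach that count to every member of the group.
group_sizes = Counter((r["household_id"], r["year"]) for r in records)
for r in records:
    r["hh_size"] = group_sizes[(r["household_id"], r["year"])]

# Merge step: look up household-level attributes (also hypothetical)
# and copy them onto each individual record.
households = {"A": {"village": "X"}, "B": {"village": "Y"}}
for r in records:
    r.update(households[r["household_id"]])

for r in records:
    print(r)
```

The same pattern, a group-level summary attached back to individual records, underlies most of the household and kinship variables, although the real constructions are considerably more involved.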

Please let me know if you have any comments or suggestions.  The slides are in essence a draft of the Training Guide that we will release soon.

2012 China Multigenerational Panel Data Summer Class

China Multigenerational Panel Data (CMGPD) 2012 Summer Training Workshop

Institute on the History and Society of Northeast China
School of the Humanities
Shanghai Jiaotong University
Shanghai, China

July 6, 2012 – July 20, 2012

Subject to revision.  Please check back on a regular basis for changes.


  • Attendance at all lectures and recitation sections is required.  Unexcused absences may be grounds for immediate dismissal.
  • Completion of all assignments is required.
  • Participants must bring their own laptop, and have STATA installed and the CMGPD-LN downloaded at the beginning of class.
  • If you already have experience working with a statistical package other than STATA, you may use it instead of STATA.  However, we may not be able to provide much assistance if you have difficulties.
  • Lectures will be in English.  The teaching assistants and I all speak Chinese, however.


Please read the following BEFORE class begins


These may be useful for participants who have less prior experience in demography, STATA, and other elements of the class.

(Links to shared spreadsheet with topics, assignments and readings by day)

Lectures will be in the morning. The substantive lectures will be 9:00am-10:30am. The data and methods lectures will be at 11:45am-12:15pm. Recitation will start at 1:30pm.

Apparently I’m a member of the California School (加州学派)

In a fit of narcissism, I was searching for my name in Chinese. I was pleased to find a few recent scholarly pieces in China that list me as a member of the ‘California school’ (加州学派) of economic and social historians who work on China.  I guess if I am to be listed as a member of a faction or school, better to be listed as a member of the California School than a member of the Saskatchewan, Rhode Island, or Wyoming School.  If you’re part of a named school or faction, hopefully it is named after a place that is exotic and evocative.  If you hear ‘California school’, you imagine a band of open-minded, edgy and perhaps hip professors dressed in khaki pants and white linen shirts hashing out their differences down by the beach.

That said, I’m not sure those of us who are so listed would all agree that we have enough in common to be considered a ‘school’ or academic faction.

I guess the idea on the part of those who have lumped us all together into the ‘California school’ is that we are distinguished by pursuing new approaches to the study of Chinese social and economic history, including use of new methods and data, and a perspective that is less beholden to the influence of traditional thinking associated with European or North American scholars.  The origin of the label appears to be that almost everyone involved either teaches at a university in California, or used to.

Oddly, almost everyone who disagrees with the various views espoused by members of the ‘California school’ also has some kind of California connection: they either teach somewhere in California, used to teach in California, or earned their degrees there.  I guess this speaks to the dominance of California universities in the English-language scholarly literature on the social and economic history of China.  Even if you violently disagree with the ‘California school’, you’re probably still connected to California.  Unfortunately, within California, affiliations don’t line up neatly, so we can’t really speak of opposing ‘Northern California’ and ‘Southern California’ schools.

Anyway, here are a few of the academic essays that discuss the ‘California school’, and list me as one of its members…

Summer 2012 China Multigenerational Panel Dataset class at SJTU (English announcement)

The Shanghai Jiaotong University Center for the History and Society of Northeast China was established as a research unit through a collaboration between the Shanghai Jiaotong University (SJTU) School of the Humanities and the Hong Kong University of Science and Technology (HKUST) School of the Humanities and Social Sciences. The Center’s second summer school will be held from July 6 to July 20. The class will focus on the use of the China Multigenerational Panel Dataset – Liaoning (CMGPD-LN) in the study of demography, stratification, and social and family history. It will also preview a new dataset, the China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC), that we plan to release in 2013. HKUST Distinguished Professor and Dean of Humanities and Social Sciences advises on the organization and content of the course. UCLA Professor of Sociology Cameron Campbell will lecture.  If any non-Chinese speakers enroll, the lectures will be in English; otherwise lectures may be in Chinese.

These datasets are complex in many ways: longitudinal, multi-generational, and structured at multiple levels, including the individual, the household, the kin group, the community, the administrative unit, and the region.  Fully exploiting the potential offered by these data requires application of sophisticated techniques in STATA or other statistical packages to manage the data, create variables, and carry out analysis.

This class is intended to introduce students to advanced techniques required to manage and analyse the CMGPD datasets, thereby equipping them to make use of the CMGPD-LN and CMGPD-SC in their own research.


China Multigenerational Panel Dataset – Liaoning (CMGPD-LN)

The CMGPD-LN is an important dataset for the study of China’s family, social and demographic history, and for the study of demography and stratification more generally. The dataset is suitable for application of a wide variety of statistical techniques that are commonly used in social demography for the analysis of longitudinal, individual-level data, and available in the most popular statistical software packages. The dataset is distinguished by its size, temporal depth, and richness of detail on family, household and kinship context.

The materials from which the dataset was constructed are Shengjing Imperial Household Agency household registers held in the Liaoning Provincial Archives. The registers are triennial. Altogether there are 3,600 of them. We transcribed a subset of them to produce the CMGPD-LN, which spans 160 years from 1749 to 1909. At present, the dataset comprises 29 register series, and consists of 1,500,000 records that describe 260,000 individuals over seven generations. The CMGPD-LN is accordingly an important resource for the study of historical demography, sociology, economics, and other fields.

The CMGPD-LN and associated English-language documentation are already available for download at ICPSR, following a free registration. Please visit the website:

China Multigenerational Panel Dataset – Shuangcheng (CMGPD-SC)

The CMGPD-SC covers communities of recent settlers in Shuangcheng, Heilongjiang in the last half of the nineteenth century and beginning of the twentieth. It contains 1.35 million records that describe 100,000 people. The registers cover descendants of urban migrants from Beijing and rural migrants from neighboring areas in northeast China who came to the area in the first half of the nineteenth century as part of a government organized effort to settle this largely vacant frontier region. One of the distinguishing features of this dataset is the availability of linked, individual-level landholding records for several points in time. The data also include a rich array of other indicators of household and family context and socioeconomic status. We anticipate formal public release of the dataset via ICPSR in 2013 or 2014. We will provide participants in the summer class with access to drafts of the release and documentation.

Topics to be Covered in Class

1. Review of relevant research in related topics in social demography
2. Results on topics in social and family demography from CMGPD-LN
3. Advanced techniques in STATA for the management and analysis of the CMGPD-LN data
4. Preview of the CMGPD-SC

July 6, 2012 to July 20, 2012
Shanghai Jiaotong University School of Humanities (SJTU Minhang Campus, Shanghai)
Application deadline
April 25, 2012 (see link below for application)
Application procedure
Please send your personal statement and application form as attachments to  We will have an English language application form available soon.
Applications from faculty and graduate students are welcome.  Applications from undergraduates may be considered if they have already been accepted into a graduate program beginning fall 2012.  Students should already be able to conduct basic operations in STATA, and should also have completed a basic course in linear regression.

We anticipate being able to accommodate 25 students. 
Students will be offered free housing in dormitories at SJTU.  Students who want other accommodations will have to arrange them on their own and pay for them.  Students should bring their own computer, with STATA or another statistical package already installed.  Students already familiar with other statistical packages may use them, but we will only be able to provide support to students using STATA.  Students are responsible for travel and local expenses.

Announcement of 2012 CMGPD-LN Summer Course at SJTU

We’ve begun making our detailed plans for the 2012 CMGPD-LN Summer Course at Shanghai Jiaotong University.  The Chinese-language announcement is available at our SJTU Center website, via this link:  The course will run from July 6 to July 20.  Since there may be non-Chinese speaking participants this year, I will probably lecture in English.  The goal of the course is to introduce participants to management and analysis of the CMGPD-LN data, with special attention to using STATA to transform the data and create new variables as needed for different analyses.

Our paper on trends in the social origins of students at elite Chinese universities

Our paper on the long-term social origins of students at Peking University and Suzhou University has appeared in China Social Science (中国社会科学). The paper’s title is “无声的革命:北京大学与苏州大学学生社会来源研究 1952-2002 (Silent Revolution: Research on the Social Origins of Peking University and Suzhou University Students, 1952-2002).” The lead authors were James Lee/李中清 (HKUST) and LIANG Chen/梁晨 (Nanjing University) and there were six additional co-authors, including myself.

My own role was fairly small, and limited largely to advising on the statistical analysis and participating in discussions of the implications of the results. But it is an important paper, and I would rather make a minor contribution to an important paper than make a major contribution to an unimportant one. I already do a lot of the latter.

Here is the announcement of the issue that includes the paper at the China Social Science website:

Here is a place at the China Social Science website where you can view a complete abstract and download the article:

The paper presents many novel empirical findings on trends in the social origins of the students at these two universities. In my mind, the most important is the demonstration that during the period covered by the analysis, the percentage of students from farming and working class origins was much higher than at national and regional elite universities in the US.

Perhaps the only elite schools in the US in which students from modest socioeconomic origins are so well represented are the University of California campuses, including UCLA. I was just at a meeting yesterday where some basic tabulations were presented on the socioeconomic characteristics of entering freshmen at UCLA and I was pleased to see that we continue to admit and enroll large numbers of students who are first-generation college students, or from families of relatively low socioeconomic status. Based on what I have seen in tabulations from the annual Freshman Survey carried out by the Higher Education Research Institute here at UCLA, in the United States the most selective privates admit a large share of their students from high income families. Only a small portion come from modest origins.

If you can lay your hands on a copy of 中国社会科学, the full reference is

梁晨 (LIANG Chen), 张浩 (ZHANG Hao), 李兰 (LI Lan), 阮丹青 (RUAN Danching), 康文林 (Cameron Campbell), 杨善华 (YANG Shanhua), 李中清 (James Lee). 2012. “无声的革命:北京大学与苏州大学学生社会来源研究 (1952-2002) (Silent Revolution: Research on the Social Origins of Students at Peking University and Suzhou University, 1952-2002).” 中国社会科学 (Chinese Social Science). 2012(1):98-118.
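
Here is the abstract, translated from the Chinese:

Since 1949, a revolution has taken place in Chinese higher education. The sources of students in elite higher education began to diversify; the former monopoly of the children of the upper social strata was broken, and the children of workers, peasants, and other lower social strata gradually came to account for a considerable share, and successfully maintained that share through the end of the twentieth century. Institutional arrangements including the expansion of basic education, the establishment of a unified national college entrance examination and admissions system, and the creation of key secondary schools together drove the emergence of this silent revolution. Although this revolution was less conspicuous than social and political revolution, it was equally far-reaching. Using detailed materials from the student registration cards of Peking University and Suzhou University for the years 1952 to 2002, this study seeks to present this revolution and its achievements, and to offer a reference for the reform and development of Chinese higher education.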

Because much of the online discussion of our article has focused on what appears to be an increase in the share of students whose father and/or mother are cadres, James Lee and Liang Chen have provided some additional details on this trend to help clarify some key underlying features.  Below, I have added this material to this blog entry, on 3/26/2012.  We are preparing additional materials to help ‘unpack’ the findings in the article and clarify some of the key trends.

Additional points re the increase in the proportion of students whose father and/or mother was a cadre (from James Lee and LIANG Chen)

Recently there has been considerable interest in our research finding that the proportion of cadre children at PKU increased during the last quarter of the twentieth century from 11 percent in 1976 to 38 percent in 1999.

This finding, which was published in “无声的革命:北京大学与苏州大学学生社会来源研究 (1952-2002)” (Silent Revolution: Research on the Social Origins of Students at Peking University and Suzhou University, 1952-2002), 中国社会科学 2012(1), is based on an analysis of the social origins of some 150,000 undergraduate students who entered Peking University and Suzhou University in the last half of the twentieth century.

The article also reports several other important findings.

1. Based on the analysis of Suzhou University undergraduates, while the overall proportion of cadre children similarly increased, the proportion of cadre children who are from explicitly political cadre families in fact declines from 85 percent in 1965 to less than 45 percent in 1999.

2. The proportion of Suzhou University cadre children who are from commercial enterprise cadre families, however, increases from 3.4 percent in 1976 to over 43 percent in 2001.

3. At the same time, the proportion of children of factory workers also increases from 13 percent in 1992 to 22.4 percent in 1999 at Peking University and from 11.4 percent in 1989 to 24.4 percent in 2001 at Suzhou.

In fact, overall the proportion of children from blue collar families remains roughly stable at Peking University during the last quarter of the twentieth century and increases during this period at Suzhou University.

Overall, by international standards, Chinese elite university admissions, as demonstrated by these two universities, were and continue to be remarkably open to children from non-elite families.


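The Chinese-language version of this statement, translated:

Recently, one of our research findings, that the proportion of cadre children among Peking University students increased from nearly 11% in 1976 to nearly 38% in 1999, has attracted widespread attention and sustained discussion from all sectors of society. In fact, this is one of the findings of our study of the social origins of roughly 150,000 undergraduates admitted to Peking University and Suzhou University in the second half of the last century. The study, titled “无声的革命:北京大学与苏州大学学生社会来源研究 (1952-2002)” (Silent Revolution: Research on the Social Origins of Students at Peking University and Suzhou University, 1952-2002), was published in 中国社会科学 2012, issue 1. Our research in fact contains at least three other important findings that deserve attention:

1. As at Peking University, the proportion of cadre children among Suzhou University students also grew steadily after the reform and opening. Within the cadre group, however, the share of Party and government cadres fell from 85% in 1965 to 40% in 2001.
2. By contrast, the share of children of enterprise cadres among Suzhou University cadre children rose from a low of 3.4% in 1976 to 43% in 2001, overtaking Party and government cadres as the largest source of cadre children.
3. At the same time, the proportion of workers’ children rose markedly at both universities: at Peking University from 13% in 1987 to 22.4% in 1998, and at Suzhou University from 11.4% in 1989 to 24.4% in 2001.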