Selected Publications
“Disease frames and their consequences for stigma and medical research funds.” (with Rachel Best). In press in Social Science and Medicine.
Illnesses are often understood as criminal acts, as medically treatable conditions, or through metaphors of battles and journeys. Theorists suggest that frames vary across diseases and over time in systematic ways, and that frames have concrete consequences for the distribution of resources. But data limitations have prevented scholars from testing these hypotheses. We combine word embeddings and regression analysis to examine four frames for 104 conditions in news media. Our corpus includes over four million news documents published between 1980 and 2018. First, we study the determinants of disease framing by examining which diseases tend to be medicalized, criminalized, and linked to battle and journey metaphors. We find evidence for systematic links between the demographic characteristics of affected individuals and the extent to which diseases are medicalized or criminalized. Next, we examine disease frames’ consequences for stigma and federal medical research funding. While medical and criminal frames are associated with higher levels of stigma, battle and journey frames are associated with less stigma. And while medical, criminal, and battle frames are associated with more research funding, journey frames are associated with less. Together, our results identify the ways in which the social construction of disease reflects and reinforces social inequality.
Word embeddings are language models that represent words or concepts as positions in an abstract many-dimensional meaning space. Despite a growing range of applications demonstrating their utility for sociology, there is little conceptual clarity regarding what exactly embeddings measure and whether this matches what we need them to measure. Here, we fill this theoretical gap by arguing that embeddings operationalize context spaces, where words’ positions can reflect any regularity in usage. Most sociological scholarship, however, is interested in concept spaces, where positions strictly indicate meaningful conceptual features (e.g., femininity or status). Because meaningful features yield regularities in usage, context spaces can proxy for concept spaces. However, context spaces also reflect regularities in the surface form of language—e.g., syntax, morphology, and dialect—which are irrelevant to most sociological investigations and can bias cultural measurement. We draw on our framework to propose best practices for successfully measuring meaning with embeddings.
Predicted probability of clinical diagnosis of obesity for individuals with a BMI indicating obesity (30.0 kg/m²), by gender and race/ethnicity.
Note: These predicted probabilities are for an individual with the sample's median age (53.8 years) who lives in an area with the sample's median poverty rate (15.3%), is a college graduate or holds an advanced degree, has health insurance, has no recent housing instability, reports “good” general health, does not have sleep apnea, diabetes, heart disease, or osteoarthritis, and does have hypertension and hyperlipidemia. NH = Non-Hispanic; BMI = Body Mass Index.
“Leveraging Diagnosis and Biometric Data from the NIH All of Us Project to Uncover Disparities in Obesity Diagnosis.” (with Ming Tai-Seale, Eduardo Grunvald, Crystal Wiley Cene, and Amy Sitapati). 2025. Obesity Pillars.
Despite extensive efforts to standardize definitions of obesity, clinical practices of diagnosing obesity vary widely. This study examined (1) discrepancies between biometric body mass index (BMI) measures of obesity and documented diagnoses of obesity in patient electronic health records (EHRs) and (2) how these discrepancies vary by patient gender and race and ethnicity from an intersectional lens. Our study included 383,380 participants in the National Institutes of Health All of Us Research Program dataset. Over half (60%) of participants with a BMI indicating obesity had no clinical diagnosis of obesity in their EHRs. Adjusting for BMI, comorbidities, and other covariates, women’s adjusted odds of diagnosis were far higher than men’s. However, the gender gap between women’s and men’s likelihood of diagnosis varied widely across racial groups. Overall, Non-Hispanic (NH) Black women and Hispanic women were the most likely to be diagnosed and NH-Asian men were the least likely to be diagnosed. Leveraging diagnosis and biometric data from this unique public domain dataset from the All of Us project, this study revealed pervasive disparities in diagnostic attribution by gender, race, and ethnicity.
“Talk of Family: How Institutional Overlap Drives Social Class Differences in Family-related Discourse.” (with Jessica Halliday Hardie, Judith Seltzer, and Jacob Foster). 2024. Russell Sage Foundation Journal of the Social Sciences.
We develop a novel application of machine learning, discourse atom topic modeling, and apply it to interview transcripts from the American Voices Project (N = 1,396) to explore social class variation in the centrality of family in adults’ lives. We take a two-phase approach, first analyzing transcripts at the person level and then at the line level. Our findings suggest that family, as represented by talk, is more central in the lives of those without a college degree than among the college educated. However, the degree of institutional overlap between family and other key institutions (health, work, religion, and criminal justice) does not vary by education. We interpret these findings in the context of debates about the deinstitutionalization of family in the contemporary United States. These analyses demonstrate the value of a new method for analyzing qualitative interview data at scale. We discuss ways to expand the use of this method to shed light on educational disparities.
“Gendered patterns in manifest and latent mental health indicators among suicide decedents, 2003-2020 National Violent Death Reporting System (NVDRS).” (with Vickie Mays, Kai-Wei Chang, Jacob Foster, and Susan Cochran). 2024. American Journal of Public Health.
We investigate differences in the documentation of mental health symptomology between nearly 300,000 male and female suicide decedents in the US National Violent Death Reporting System (NVDRS).
Judgment Scores for Behavioral, Infectious, and Chronic Diseases over Time
Note: More positive scores indicate stronger connotations of immorality and bad personality traits. More negative scores indicate stronger connotations of morality and good personality traits.
“Stigma's uneven decline: The social construction of disease in news media.” (with Rachel Best). 2023. American Sociological Review. Code and data.
Why are some diseases more stigmatized than others? And, has disease stigma declined over time? Answers to these questions have been hampered by a lack of comparable, longitudinal data. We analyze 4.7 million news articles to create new measures of stigma for 106 health conditions from 1980 to 2018. We find that behavioral health conditions and preventable diseases attract the strongest connotations of immorality and negative personality traits, and infectious diseases are most marked by disgust. These results lend new empirical support to theories that norm enforcement and contagion avoidance drive disease stigma. Challenging existing theories, we find no evidence for a link between medicalization and stigma, and inconclusive evidence on the relationship between advocacy and stigma. Finally, we find that stigma has declined dramatically over time, but only for chronic physical illnesses. In the past four decades, disease stigma has transformed from a sea of negative connotations surrounding most diseases into two primary conduits of meaning: infectious diseases spark disgust, and behavioral health conditions cue negative stereotypes. These results show that cultural meanings are especially durable when they are anchored by interests, and that cultural changes intertwine in ways that only become visible through large-scale research.
Awarded the 2024 Best Publication Award from the American Sociological Association, Section on Mental Health.
“Theoretical foundations and limits of word embeddings: What types of cultural meaning can they capture?” 2023. Sociological Methods & Research.
Measuring meaning is a central problem in cultural sociology, and word embeddings may offer powerful new tools for doing so. But like any tool, they build on and exert theoretical assumptions. In this paper I theorize the ways in which word embeddings model three core premises of a structural linguistic theory of meaning. Formalizing the study of meaning with word embeddings offers theoretical opportunities to clarify core concepts and debates in cultural sociology.
“School, Studying, and Smarts: Gender Stereotypes and Education Across 80 Years of American Print Media, 1930-2009.” (with Andrei Boutyline and Devin Cornell). 2023. Social Forces.
Gender stereotypes about education have important consequences for boys’ and girls’ academic outcomes. In this article, we apply computational word embeddings to a 200-million-word corpus of American print media (1930-2009) to examine how these stereotypes changed as women’s educational attainment caught up with and eventually surpassed men’s.
“Integrating topic modeling and word embedding to characterize violent deaths.” (with Susan Cochran, Vickie Mays, Kai-Wei Chang, and Jacob Foster). 2022. Proceedings of the National Academy of Sciences (PNAS). Code.
We introduce a method to identify topics in a corpus and represent documents as topic sequences. Discourse atom topic modeling (DATM) draws on advances in theoretical machine learning to integrate topic modeling and word embedding. We illustrate our method with a prominent example of underutilized text: the US National Violent Death Reporting System (NVDRS).
Awarded the 2023 Matilda White Riley Early-Stage Investigator Paper Award from the NIH Office of Behavioral and Social Sciences Research.
Morality of terms related to obesity and body weight
“Machine learning as a model for cultural learning: teaching an algorithm what it means to be fat.” (with Jacob Foster). 2022. Sociological Methods & Research. Code and replication tutorial.
People learn biases, like those towards body weight, from media and other objects in our cultural environment. This paper provides a theoretical account of such cultural learning. We propose that neural word embeddings provide a parsimonious and cognitively plausible model of the representations learned from natural language. Using neural word embeddings, we extract cultural schemas about body weight from news articles. Such schemas may be subtly but pervasively activated in public culture; thus, language can chronically reproduce biases.
Awarded the 2020 Outstanding Graduate Student Paper Award from the American Sociological Association, Section on Mathematical Sociology.
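A minimal sketch of one common way to score terms on a cultural dimension with word embeddings, such as a morality dimension like the one in the figure above. The dimension is defined by averaging the difference vectors of antonym pairs, and each term is scored by its cosine similarity with that direction. The three-dimensional vectors, word list, and helper names below are hypothetical toy values. They are not the paper's data, model, or replication code, and the sketch illustrates the general technique rather than the authors' exact procedure.

```python
import math

# Hypothetical toy vectors for illustration only; a real analysis would use
# embeddings trained on a large news corpus.
TOY_VECTORS = {
    "moral":   [0.9, 0.1, 0.0],
    "immoral": [-0.8, 0.2, 0.1],
    "good":    [0.7, 0.3, 0.1],
    "bad":     [-0.7, 0.2, 0.0],
    "healthy": [0.5, 0.6, 0.2],
    "lazy":    [-0.6, 0.4, 0.3],
}

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def mean(vectors):
    # Element-wise average of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_direction(vectors, pairs):
    """Average the (positive pole - negative pole) difference over antonym pairs."""
    return mean([subtract(vectors[pos], vectors[neg]) for pos, neg in pairs])

def project(vectors, word, direction):
    """Cosine with the direction: > 0 leans toward the positive (here, moral) pole."""
    return cosine(vectors[word], direction)

morality = semantic_direction(TOY_VECTORS, [("moral", "immoral"), ("good", "bad")])
for term in ("healthy", "lazy"):
    print(term, round(project(TOY_VECTORS, term, morality), 3))
```

In this toy setup, positive scores point toward the moral pole. Flipping the order of each pair would reverse the sign convention, so the direction of a scale like this is a choice the analyst has to state.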
“Prevalence of prescription and illicit substances in the environment among suicides of non-poisoning means in the National Violent Death Reporting System 2003-2017.” (with Marissa Seamans, Vickie Mays, and Susan Cochran). 2022. The American Journal of Drug and Alcohol Abuse.
We investigate the presence of prescription and illicit drugs, either through mention in the death record or toxicology reports, among suicides attributed to nonpoisonous causes to identify patterns of risk.
“Aggression, escalation, and other latent themes in legal intervention deaths of non-Hispanic Black and White men: Results from the 2003-2017 NVDRS.” (with Jacob Foster, Vickie Mays, Kai-Wei Chang and Susan Cochran). 2021. American Journal of Public Health.
We characterize racial/ethnic differences in legal intervention-related deaths using state-of-the-art topic modeling of law enforcement and coroner text summaries drawn from the US National Violent Death Reporting System (NVDRS).
“All Roads Lead to Polenta: Cultural Attractors at the Junction of Public and Personal Culture.” (with Andrei Boutyline and Devin Cornell). 2021. Special issue on culture and cognition in Sociological Forum.
We use a word embedding model to simulate a “telephone game” where each actor partially hears an utterance, uses their cultural schemas to guess the missing word, and tells the result to the next actor. We find that these transmission chains are often pulled toward powerful “cultural attractors”—points of least resistance where communications end up through transmission error alone.
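A toy sketch of the transmission-chain idea. It assumes a hypothetical five-word vocabulary with hand-made 2-D vectors. In place of a trained word embedding model's predictions, it uses a simple guessing rule that fills the misheard word with the vocabulary item closest to the centroid of the words that were heard. The vocabulary, vectors, and function names are illustrative assumptions, not the paper's model or code.

```python
import math
import random

# Hypothetical 2-D "embedding" vectors standing in for a trained model.
VOCAB = {
    "polenta": [1.0, 0.0],
    "corn":    [0.9, 0.2],
    "cheese":  [0.6, 0.6],
    "pasta":   [0.2, 0.9],
    "bread":   [0.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def guess_missing(context):
    """Toy schema: pick the vocabulary word closest to the mean of the heard words."""
    centroid = [sum(VOCAB[w][i] for w in context) / len(context) for i in range(2)]
    return max(VOCAB, key=lambda w: cosine(VOCAB[w], centroid))

def transmit(utterance, rng):
    """One actor: mishear one randomly chosen word, then replace it with a guess."""
    i = rng.randrange(len(utterance))
    heard = utterance[:i] + utterance[i + 1:]
    return utterance[:i] + [guess_missing(heard)] + utterance[i + 1:]

def telephone_chain(utterance, n_actors, seed=0):
    """Pass an utterance (at least two words) down a chain of n_actors."""
    if len(utterance) < 2:
        raise ValueError("utterance needs at least two words so some context is heard")
    rng = random.Random(seed)  # seeded for reproducible chains
    history = [list(utterance)]
    for _ in range(n_actors):
        history.append(transmit(history[-1], rng))
    return history

if __name__ == "__main__":
    for step in telephone_chain(["bread", "pasta", "corn"], n_actors=5, seed=1):
        print(" ".join(step))
```

Some utterances are reconstructed unchanged by every actor, such as three repetitions of a single word here. Once a chain reaches such a fixed point, transmission error alone no longer moves it. This gives a toy analogue of the "points of least resistance" described above.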
“Displayed Depression Symptoms on Facebook at Two Time Points: Content Analysis.” (with Megan A. Moreno, Molly Adrian, Megan A. Wilt, Adrienne Ton, Elizabeth McCauley, and Ann Vander Stoep). 2021. JMIR Formative Research.
This study investigates displayed depression symptoms on Facebook at two developmental time points by symptom type and gender.
“#Pro-Ana: Pro-Eating Disorder Socialization on Twitter.” (with Hedwig Lee, Tyler McCormick, and Megan Moreno). 2016. Journal of Adolescent Health.
Pro-eating disorder online movements support engagement with eating disorder (ED) lifestyles and are associated with negative health consequences for adolescents with EDs. Twitter is a popular social media site among adolescents that provides a unique setting for pro-eating disorder content to be publicly exchanged. This study investigates pro-eating disorder Twitter profiles' references to eating disorders and how their social connections (followers) reference eating disorders.
“Evaluating college students' displayed alcohol references on Facebook and Twitter.” (with Megan A. Moreno, Dana Litt, and Dimitri Christakis). 2016. Journal of Adolescent Health.
Current trends suggest adolescents and young adults typically maintain a social media “portfolio” of several sites including Facebook and Twitter, but little is known regarding how an individual chooses to display risk behaviors across these different sites. This study investigates college students’ displayed alcohol references on both Facebook and Twitter.
“Development and testing of a 3-item screening tool for Problematic Internet Use.” (with Megan A. Moreno and Ellen Selkie). 2016. Journal of Pediatrics.
This study develops and validates the PRIUSS-3, a three-item short form of the Problematic and Risky Internet Use Screening Scale (PRIUSS) for screening for Problematic Internet Use.
“Using a Facebook Group as an Adjunct to a Pilot mHealth Physical Activity Intervention: A Mixed Methods Approach.” (with Megan Pumper, Jay Mendoza, Matt Holm, Allan Waite, and Megan Moreno). 2015. Studies in Health Technology and Informatics.
This study evaluates the use of a Facebook group as part of an mHealth physical activity intervention trial.
Book Chapters
“Sociolinguistic Properties of Word Embeddings.” (with Jacob Foster). 2021. The Handbook of Language Analysis in Psychology, edited by Morteza Dehghani and Ryan L. Boyd. Guilford Press.
“Online Pro-Eating Disorder (Pro-ED) Activity.” (with Laura Hooper and Hedwig Lee). 2014. AM:STARs: Adolescent Medicine: State of the Art Reviews. Vol 25, Social Networking and New Technologies, edited by Victor Strasburger and Megan Moreno. American Academy of Pediatrics.
Peer-Reviewed Conference Proceedings
“Adapting Coreference Resolution for Processing Violent Death Narratives.” (with Ankith Uppunda, Susan Cochran, Jacob Foster, Vickie Mays, and Kai-Wei Chang). 2021. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
“Reconsidering Annotator Disagreement about Racist Language: Noise or Signal?” (with Savannah Larimore, Ian Kennedy, and Breon Haskett). 2021. Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media.
“What type of happiness are you looking for? A closer look at detecting depression.” (with Sharon Mozgai and Stefan Scherer). 2018. Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology.