Are Women’s Innovative Ideas in Academia Cited Less Than Men’s?
- Greg Thorson
- May 21

This study investigates whether innovative academic work by women receives the same citation recognition as similar work by men. Using bibliometric data from economics, mathematics, and sociology, and applying machine learning to identify comparable articles, the author constructs counterfactual citation sets. Findings reveal that all-female papers receive 10% fewer citations than all-male papers in economics, a gap reduced by 40% when accounting for team size and eliminated when controlling for publication history. Early-career women face a 9–14% citation penalty. Similar patterns appear in mathematics and sociology, with mixed-gender and female-led teams often cited less frequently than male-only teams.
Full Citation and Link to Article
Here is the full citation for the article:
Koffi, Marlène. “Innovative Ideas and Gender (In)Equality.” American Economic Review (Forthcoming)
Extended Summary
Central Research Question
This paper investigates a critical but underexplored question in academic publishing: Are women’s innovative academic contributions recognized to the same extent as those of men, as measured through citation practices? Specifically, it examines whether comparable research papers authored by women are cited less frequently than similar papers authored by men. The author terms this phenomenon the “gender omission gap” and uses sophisticated bibliometric and machine learning methods to construct counterfactual citation sets to assess disparities in citation behavior across gendered teams. This question is situated within broader concerns about gender inequality in recognition, career advancement, and representation in academia.
Previous Literature
Prior studies have documented women’s underrepresentation in academia, particularly in math-intensive disciplines like economics and mathematics. Much of the literature has explored issues such as gender disparities in hiring, tenure, publishing standards, and collaboration dynamics. For example, Sarsons et al. (2021) demonstrated that women receive less credit for coauthored work, and Card et al. (2020) investigated potential gender bias in the peer review process. Other works have documented gendered language in evaluations, disparities in conference participation, and differences in productivity metrics.
However, most previous studies have focused on formal processes such as hiring or refereeing. This paper extends the literature by analyzing how intellectual contributions are recognized post-publication through citations. A small body of work has touched on citation bias (e.g., Dion et al., 2018; Dworkin et al., 2020), but few have done so with the methodological rigor and counterfactual framework used here. The author also connects with literature on text-based analysis in economics (e.g., Gentzkow et al., 2019) and belief-based models of discrimination (Bohren et al., 2019).
Data
The study uses bibliometric data from three academic disciplines—economics, mathematics, and sociology—drawing from major databases such as Web of Science, EconLit, zbMATH, and Microsoft Academic Graph. The main dataset for economics includes 30,741 papers from 16 top journals published between 1985 and 2019. Similar datasets are constructed for mathematics and sociology, comprising 32,848 and 20,591 papers respectively.
Gender identification is conducted using a combination of automated methods (via Genderize.io) and manual verification by research assistants. Authors are grouped into “all-male,” “all-female,” “mixed-gender,” or “undetermined” teams. The author also constructs measures of publication history, institutional affiliation, methodological orientation, and field classification. These variables are used as controls in the empirical models.
The sample is restricted to articles for which citation behavior can be evaluated consistently: papers without abstracts or references, editorial content, and very short pieces are excluded. Citation counts are supplemented with information on co-authorship patterns and authors’ career stages.
Methods
The key methodological innovation is the construction of a counterfactual citation set using machine learning and natural language processing (NLP). The author uses textual similarity measures—including TF-IDF, Word2Vec, GloVe, and BERT-based models—to compute a cosine similarity score between pairs of articles based on abstracts, titles, and keywords. From this, a “most similar set” is defined for each article, consisting of the ten most textually similar articles published earlier.
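To make the idea of a “most similar set” concrete, here is a minimal sketch of the TF-IDF variant only. It is not the author’s pipeline: the tokenizer, the smoothed IDF formula, the toy abstracts, and the function names (`tfidf_vectors`, `cosine`, `most_similar_set`) are all assumptions made for illustration, and the paper’s embedding-based models (Word2Vec, GloVe, BERT) are not reproduced.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real pipelines clean text more carefully).
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    # Build sparse TF-IDF vectors as dicts, using a smoothed IDF (an illustrative choice).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar_set(papers, target_index, k=10):
    # Indices of the k most similar papers published strictly earlier than the target.
    vectors = tfidf_vectors([p["abstract"] for p in papers])
    target_year = papers[target_index]["year"]
    scored = [
        (cosine(vectors[target_index], vectors[i]), i)
        for i, p in enumerate(papers)
        if p["year"] < target_year
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Hypothetical toy corpus, invented for this example.
papers = [
    {"id": "A", "year": 2001, "abstract": "gender gaps in citation of economics research"},
    {"id": "B", "year": 2003, "abstract": "monetary policy and inflation expectations"},
    {"id": "C", "year": 2005, "abstract": "citation gaps and gender in economics journals"},
    {"id": "D", "year": 2010, "abstract": "gender and citation gaps in economics"},
]

similar_to_d = most_similar_set(papers, target_index=3, k=2)
print([papers[i]["id"] for i in similar_to_d])
```

With this toy corpus, paper D’s two nearest earlier neighbours are C and then A, while the unrelated paper B ranks last. The study’s set size of ten corresponds to the default `k=10`.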
An “omission indicator” is then created: a binary variable that equals one if a paper fails to cite a highly similar prior paper, and zero otherwise. This allows the author to measure citation gaps conditional on relevance and similarity. The key dependent variable in the empirical analysis is this omission indicator.
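The omission indicator can be illustrated with a short sketch. The identifiers and the reference list below are hypothetical, and the function names are assumptions; the sketch only shows the binary coding described above, one value per similar prior paper.

```python
def omission_indicators(similar_ids, cited_ids):
    # 1 if the similar prior paper is missing from the reference list, 0 if it is cited.
    cited = set(cited_ids)
    return {pid: int(pid not in cited) for pid in similar_ids}


def omission_rate(indicators):
    # Share of the most similar set that was omitted.
    return sum(indicators.values()) / len(indicators) if indicators else 0.0


# Hypothetical inputs: a most similar set and the citing paper's references.
similar_ids = ["p1", "p2", "p3", "p4"]
cited_ids = ["p2", "x9"]

indicators = omission_indicators(similar_ids, cited_ids)
print(indicators, omission_rate(indicators))
```

Here three of the four similar papers are omitted, so the omission rate for this citing paper is 0.75.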
Empirical models then regress the omission indicator on gender composition and a set of controls, including journal, team size, institutional affiliation, field, publication year, and authors’ publication history. The study also employs robustness checks using human audits, alternative similarity algorithms, cross-referencing approaches, and “case-control” setups to validate the counterfactual framework.
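As a rough illustration of this kind of regression, the sketch below fits a linear probability model with a single binary regressor and no controls. This is a deliberate simplification, not the paper’s specification: the data are invented, and the journal, team size, field, year, and publication-history controls are omitted. With one dummy regressor and an intercept, the OLS slope equals the difference in mean omission rates between the two groups.

```python
def ols_simple(x, y):
    # Closed-form OLS for y = intercept + slope * x with a single regressor.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy data: 1 = all-female team, 0 = all-male team.
all_female = [1, 1, 1, 1, 0, 0, 0, 0]
# Invented toy outcomes: 1 = paper omitted, 0 = paper cited.
omitted = [1, 1, 1, 0, 1, 0, 0, 0]

slope, intercept = ols_simple(all_female, omitted)
print(f"baseline omission rate: {intercept:.2f}, gender gap: {slope:.2f}")
```

In this toy data the intercept is the all-male omission rate (0.25) and the slope is the gap in omission probability (0.50); the magnitudes are illustrative only and bear no relation to the paper’s estimates.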
Findings/Size Effects
The findings are consistent across multiple disciplines and specifications. In economics, papers authored entirely by women are 2.2 percentage points more likely to be omitted from citations than those authored by men, a gap of roughly 10% relative to the baseline citation probability of 22.4% within the “most similar set.” This gap shrinks by 40% when controlling for team size and disappears entirely when prior publication records are included. However, early-career women still face a persistent citation penalty of 9–14%.
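The conversion from a percentage-point gap to a relative gap is simple arithmetic, sketched here with the economics figures quoted above. The function name is an assumption, and the “remaining gap” line reads the 40% reduction as applying to the 2.2-point gap, which is an interpretation of the summary rather than a figure reported by the paper.

```python
def relative_gap(gap_pp, baseline_pct):
    # Express a percentage-point gap as a percentage of the baseline rate.
    return 100.0 * gap_pp / baseline_pct


econ_gap_pp = 2.2         # percentage-point gap reported for economics
econ_baseline_pct = 22.4  # baseline citation probability within the most similar set

econ_relative = relative_gap(econ_gap_pp, econ_baseline_pct)

# Assumed reading: a 40% reduction leaves 60% of the original gap.
remaining_gap_pp = econ_gap_pp * (1 - 0.40)

print(f"relative gap: {econ_relative:.1f}%")
print(f"gap after team-size control: {remaining_gap_pp:.2f} percentage points")
```

The relative gap comes out at about 9.8%, which is the source of the rounded 10% figure, and the remaining gap under the assumed reading is about 1.32 percentage points.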
In mathematics, a similar omission gap exists: all-female papers are 1 percentage point more likely to be omitted than all-male papers, representing 6% of the base rate. This gap also vanishes when accounting for publication records. However, mixed-gender teams in mathematics are consistently cited 12.6–13.8% less often, even with all controls.
In sociology, the gender gap persists despite controls. All-female teams are 1.2–1.7 percentage points more likely to be omitted, representing a 6.5–9.2% penalty relative to the base citation rate. Mixed-gender teams in sociology, by contrast, do not experience a citation penalty.
The author also documents strong in-group preferences. All-male citing teams are more likely to omit papers by female or mixed-gender teams, whereas teams that include women are more likely to cite such papers. These patterns hold even after accounting for prior publication histories, suggesting behavioral and possibly belief-driven biases.
The paper conducts multiple validation checks. An audit by doctoral students confirms that the algorithm-generated similarity scores match human evaluations and do not introduce gender-related measurement error. Other robustness checks include alternative similarity thresholds, field-specific rankings, and extended journal samples. Across all specifications, the core findings remain consistent.
Finally, the paper presents an “innovativeness index,” showing that many female-authored papers that are omitted rank in the top percentiles of originality and relevance. Over 50% of omitted female-authored papers have innovativeness scores above the median of cited articles, suggesting that the omission gap affects not just quantity but also the visibility of high-quality work.
Conclusion
This study provides compelling evidence of a gender omission gap in academic citations across economics, mathematics, and sociology. While controlling for observable characteristics like team size and publication history can explain much of the gap in economics and mathematics, the penalty persists in sociology and among early-career female scholars. The paper also uncovers strong in-group citation preferences, where all-male teams are less likely to cite female-authored work.
The implications are significant. Citations play a central role in academic recognition, hiring, and promotions. Gender-based disparities in citations may thus contribute to systemic underrepresentation and slower career advancement for women, particularly at early career stages. Moreover, citation gaps risk hindering the dissemination of innovative ideas, reducing the overall quality and diversity of academic knowledge production.
The study calls for more awareness in citation practices and consideration of omission metrics in evaluations of scholarly impact. It also underscores the value of machine learning tools in uncovering subtle but consequential patterns of bias in academic life. Through its methodological rigor and comprehensive approach, the paper makes a major contribution to the literature on gender inequality in academia and provides a framework for future studies to investigate recognition gaps in other domains.