Nupoor Gandhi (she/her)

I am a PhD student in Machine Learning and Public Policy at Carnegie Mellon University, advised by Emma Strubell. I am interested in translational NLP, with an emphasis on low-cost structured information extraction to meet the needs of researchers and practitioners in policy-relevant domains. Previously, I received my bachelor's degree in Computer Science with a minor in Math from the University of Illinois at Urbana-Champaign. At Illinois, I worked on modeling public health outcomes using social media data as part of the Text Information Management and Analysis Group and the Health and Social Media Group, working with ChengXiang Zhai and Dolores Albarracin. I was also a Data Science for Social Good Fellow at Imperial College London, where I worked on a QA system for the nonprofit Barefoot Law to accelerate access to legal advice.

I work on improving access to rich, structured meaning representations of text in policy-relevant domains, with a particular focus on climate policy. Much of what governments commit to, let alone implement, is buried in long, dense, heterogeneous documents that are difficult to analyze systematically. My goal is to make climate policy legible and measurable at scale. To do this, I develop methods for efficiently structuring text under realistic resource constraints, in settings that combine human expertise and language models.

More broadly, my research spans:

  1. Structured representation as a more efficient way of doing document research.
  2. Addressing failures to meet real-world complex information extraction needs.
  3. The capacity of decomposition to reduce the cost of high quality annotation projects.

Contact: nmgandhi(at)cs.cmu.edu | Google Scholar | CV

Peer-Reviewed Publications

Task Decomposition for Efficient Annotation. Nupoor Gandhi, Emma Strubell. Computational Linguistics 2026 (Under Review).

Decomposing Unitization and Typing for Efficient and Consistent Span-Bound Concept Annotation. Nupoor Gandhi, Michael Bada, Emma Strubell. Findings of ACL 2026.

SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains. Krithika Ramesh, Daniel Smolyak, Zihao Zhao, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, Anjalie Field. Proceedings of EMNLP 2025 - Demo.

Beyond Text: Characterizing Domain Expert Needs in Document Research. Sireesh Gururaja, Nupoor Gandhi, Jeremiah Milbauer, Emma Strubell. Findings of ACL 2025.

Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains. Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan, Lisa Bauer, Charith Peris, Anjalie Field. Findings of EMNLP 2024.

Challenges in End-to-End Policy Extraction from Climate Action Plans. Nupoor Gandhi, Tom Corringham, Emma Strubell. Proceedings of ClimateNLP Workshop at ACL 2024.

Annotating Mentions Alone Enables Efficient Domain Adaptation for Coreference Resolution. Nupoor Gandhi, Anjalie Field, Emma Strubell. Proceedings of ACL 2023. (Selected for Oral Presentation)

Examining risks of racial biases in NLP tools for child protective services. Anjalie Field, Amanda Coston, Nupoor Gandhi, Alexandra Chouldechova, Emily Putnam-Hornstein, David Steier, Yulia Tsvetkov. Proceedings of ACM FAccT 2023.

Improving Span Representation for Domain-adapted Coreference Resolution. Nupoor Gandhi, Anjalie Field, Yulia Tsvetkov. Proceedings of EMNLP 2021 Workshop on Computational Models of Reference, Anaphora and Coreference.

Predicting Opioid Overdose Crude Rates with Text-Based Twitter Features (Student Abstract). Nupoor Gandhi, Alex Morales, Sally Man-Pui Chan, Dolores Albarracin, ChengXiang Zhai. Proceedings of AAAI 2020.

Multi-Attribute Topic Feature Construction for Social Media-based Prediction. Alex Morales, Nupoor Gandhi, Man-pui Sally Chan, Sophie Lohmann, Travis Sanchez, Kathleen A. Brady, Lyle Ungar, Dolores Albarracín, and ChengXiang Zhai. Proceedings of IEEE Big Data 2018.