Research Scientist
Google, New York
E-Mail: vaishnavh at google.com
He/Him/His

Resume | Google Scholar

I am a research scientist working on the foundations of AI. I graduated with a PhD from the Computer Science Department of Carnegie Mellon University (CMU), where I was advised by Zico Kolter. I completed my undergraduate studies in the Department of Computer Science and Engineering at IIT Madras, where I was advised by Balaraman Ravindran.

Research Interests

I like thinking about when and why complex AI systems work, in a way that brings theory and practice closer together (see footnote below). I am currently interested in understanding the limits of (and going beyond) the next-token prediction paradigm that underlies many current AI models (see this recent work). My prior works similarly identify examples of failure (and occasionally, success!) across a broad range of settings in AI, typically in what were the prevailing paradigms at the time. These include out-of-distribution generalization, uniform-convergence-based generalization bounds, and GAN optimization.

I publish roughly 1.2 papers a year, a rate that I have come to value (ideally, even fewer would be great!), and I am strongly against pushing too many papers into the void. I also enjoy deep collaborations where we meet often, so feel free to reach out to brainstorm!

My style of “bringing theory and practice closer”: This is a fairly subjective balancing act. My personal emphasis is on devising minimal (counter)examples that capture the core phenomena we empirically observe, and then formally (or informally) reasoning about them. Such pared-down examples lead to simpler proofs and are therefore a powerful tool for gaining intuition about increasingly complex AI systems. Part of the insight we derive from this style lies in the simple proof; but importantly, part of it also lies in identifying the right assumptions and in the thoughtful design of a minimal system.


Updates
  • Sep 2024: Simons Institute talk on multi-token prediction at this YouTube link! Also presenting this talk at a CMU guest lecture, the CMU AI Lunch, and NYU. Co-presenting with Gregor at MSR, Amazon Research, and Princeton.
  • Jan 2024: ICML paper with Gregor Bachmann solidifying the debate on next-token prediction.
  • Oct 2023: Paper with Sachin Goyal and collaborators at Google, exploring a simple change that introduces delays into how Transformers predict the next token.

Students I have worked with

I have had the fortune of closely working with and/or mentoring the following students:


Select Papers (Google Scholar)


CONFERENCE PUBLICATIONS
  • The pitfalls of next-token prediction,
    International Conference on Machine Learning (ICML) 2024,
    (Double first author) Gregor Bachmann* and Vaishnavh Nagarajan*
    [arxiv] [Poster] [Slides] [Simons Talk]
    • Also accepted for oral presentation at ICLR ‘24 Workshop “How Far Are We From AGI?”

  • Think before you speak: Training language models with pause tokens,
    International Conference on Learning Representations (ICLR) 2024,
    Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
    [arxiv] [Poster]
    • Also accepted at NeurIPS ‘23 Workshop R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models

  • Assessing Generalization via Disagreement,
    International Conference on Learning Representations (ICLR) 2022,
    (Double first author) Yiding Jiang*, Vaishnavh Nagarajan*, Christina Baek, J. Zico Kolter
    Accepted for Spotlight presentation
    [arxiv] [Poster]
    • Also accepted at ICML ‘21 Workshop on Overparameterization: Pitfalls & Opportunities

  • Understanding the failure modes of out-of-distribution generalization,
    International Conference on Learning Representations (ICLR) 2021,
    Vaishnavh Nagarajan, Anders Andreassen and Behnam Neyshabur
    [arxiv] [Poster]
    • Invited poster presentation at Conceptual Understanding of Deep Learning Workshop, Google Algorithms Workshop Series, 2021.


WORKSHOP PAPERS
  • Theoretical Insights into Memorization in GANs,
    Neural Information Processing Systems (NeurIPS) 2017 - Integration of Deep Learning Theories Workshop
    Vaishnavh Nagarajan, Colin Raffel, Ian Goodfellow.
    [PDF]

  • Generalization in Deep Networks: The Role of Distance from Initialization,
    Neural Information Processing Systems (NeurIPS) 2017 - Deep Learning: Bridging Theory and Practice
    Vaishnavh Nagarajan and J. Zico Kolter.
    Accepted for Spotlight talk
    [arxiv] [Poster]


THESIS
  • Explaining generalization in deep learning: progress and fundamental limits,
    Vaishnavh Nagarajan, 2021
    [arxiv]



Peer Review
  • ICLR 2023, 2021 (outstanding reviewer award)
  • NeurIPS 2023, 2021, 2020 (top 10%), 2019 (top 50%), 2018 (top 30%)
  • ICML 2024 & 2023 (Expert reviewer), 2022, 2021 (Expert reviewer, top 10%), 2020 (top 33%), 2019 (top 5%)
  • COLT 2019
  • ALT 2021
  • UAI 2022
  • AISTATS 2023 (top 10%), 2019
  • JMLR, Nature
  • Workshops: PODS ICML 2022, OPPO ICML 2021, ME-FoMo ICLR 2023, DistShift NeurIPS 2023, R0-FoMo NeurIPS 2023 (area chair)

Last Updated: Apr 23rd, 2024