Research Scientist
Google, New York
E-Mail: vaishnavh at google.com
He/Him/His
Resume
I like thinking about when and why complex AI systems work, and how to make them better. I do this by pursuing minimal abstractions, insightful (counter)examples and simple results. I am currently interested in understanding the limits of the next-token prediction paradigm that underlies many current AI models, and in going beyond it (see this or this recent work). My prior work similarly identifies examples of failure and success across a broad range of settings in AI, including out-of-distribution generalization, generalization bounds and GAN optimization, among other topics.
My work has been recognized with the Outstanding Paper Award at ICML 2025, the Outstanding New Directions Paper Award at NeurIPS 2019, an Oral presentation at NeurIPS 2017 and a spotlight at ICLR 2022.
I seek simple, insightful results unburdened by mathematical obfuscation. This means pursuing minimal abstractions, examples and counterexamples, vivid intuition, clarity and depth of thought, and nuanced argumentation.
I believe that researchers should be publishing significantly less (I have published about one paper a year, and I wish it were less frequent).
I value clear and accessible communication and spend an unhealthy amount of time preparing my talks, writing my papers, and choosing my diagrams and sentences. I enjoy conversations with students, particularly those with technical depth.
I believe that science must be done and truth must be sought collaboratively, with compassion and good faith, building on each other's work and strengths and embracing the subjective aspects of the process. I don't enjoy combative science.
If there are missing citations in my work, please reach out to me so I can rectify them!
My values are shaped by what I've read.

I have had the fortune of closely working with and/or mentoring the following students:
- Shahriar Noroozizadeh (PhD student at CMU, interning at Google, hosted by me)
- Chen Henry Wu (PhD student at CMU as of 2025)
- Gregor Bachmann (PhD student at ETH Zürich as of 2024)
- Sachin Goyal (PhD student at CMU as of 2024, interned at Google, hosted by me)
- Jacob Springer (PhD student at CMU as of 2024)
- Yuri Galindo (mentee through Fatima Fellowship)
- Marcus Blake (Software Engineer at Google as of 2024, mentee through Learning Theory Alliance)
- Kimia Hamidieh (PhD student at MIT as of 2024)
- Nuredin Ali (PhD student at University of Minnesota as of 2024)
- Thao Nguyen (PhD student at UW as of 2024)
- Yiding Jiang (PhD student at CMU as of 2024)
- Melrose Roderick (postdoc at Mila as of 2024)
- Jeffrey Li (PhD student at UW as of 2024)
- Roll the dice and look before you leap: Going beyond the creative limits of next-token prediction,
International Conference on Machine Learning (ICML) 2025,
(Double first author) Vaishnavh Nagarajan*, Chen Henry Wu*, Charles Ding and Aditi Raghunathan
Winner of Outstanding Paper Award
Oral presentation (1% acceptance)
[arxiv] [Poster] [Oral Slides] [1h Talk Slides] [Code]
- The pitfalls of next-token prediction,
International Conference on Machine Learning (ICML) 2024,
(Double first author) Gregor Bachmann* and Vaishnavh Nagarajan*
[arxiv] [Poster] [Slides] [Simons Institute Talk] [Code]
Also oral presentation at ICLR ‘24 Workshop “How Far Are We From AGI?”
- Think before you speak: Training language models with pause tokens,
International Conference on Learning Representations (ICLR) 2024,
Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
[arxiv] [Poster]
- Assessing Generalization via Disagreement,
International Conference on Learning Representations (ICLR) 2022,
(Double first author) Yiding Jiang*, Vaishnavh Nagarajan*, Christina Baek, J. Zico Kolter
Accepted for Spotlight presentation, 5.2% acceptance
[arxiv] [Poster]
- Understanding the failure modes of out-of-distribution generalization,
International Conference on Learning Representations (ICLR) 2021,
Vaishnavh Nagarajan, Anders Andreassen and Behnam Neyshabur
[arxiv] [Poster] [1h talk] [Slides]
- Uniform convergence may be unable to explain generalization in deep learning,
Neural Information Processing Systems (NeurIPS) 2019
Vaishnavh Nagarajan and J. Zico Kolter
Winner of the Outstanding New Directions Paper Award
Oral presentation, 0.54% acceptance
[arxiv] [NeurIPS Oral slides] [Poster] [Blogpost] [Code] [Errata]
Also accepted for spotlight talk at:
- ICML ‘19 Workshop on Understanding and Improving Generalization in Deep Learning
- IAS/Princeton Workshop on Theory of Deep Learning. [Video]
- Gradient descent GAN optimization is locally stable,
Neural Information Processing Systems (NeurIPS) 2017
Vaishnavh Nagarajan and J. Zico Kolter
Oral presentation, 1.2% acceptance
[arxiv] [1hr talk - slides] [NeurIPS oral slides] [Poster] [3 min video] [Code]
- Theoretical Insights into Memorization in GANs,
Neural Information Processing Systems (NeurIPS) 2017 - Integration of Deep Learning Theories Workshop
Vaishnavh Nagarajan, Colin Raffel, Ian Goodfellow
[PDF]
- Generalization in Deep Networks: The Role of Distance from Initialization,
Neural Information Processing Systems (NeurIPS) 2017 - Deep Learning: Bridging Theory and Practice
Vaishnavh Nagarajan and J. Zico Kolter
Spotlight talk
[arxiv] [Poster]
- Explaining generalization in deep learning: progress and fundamental limits,
Vaishnavh Nagarajan, 2021
[arxiv]
Reviewer:
- ICLR 2023, 2021 (outstanding reviewer award, top 10%)
- NeurIPS 2024 (top 7%), 2023 (top 10%), 2021, 2020 (top 10%), 2019 (top 50%), 2018 (top 30%)
- ICML 2024 & 2023 (Expert reviewer), 2022, 2021 (Expert reviewer, top 10%), 2020 (top 33%), 2019 (top 5%)
- COLT 2019
- ALT 2021
- UAI 2022
- AISTATS 2023 (top 10%), 2019
- JMLR, Nature
- Workshops: ICML 22 PODS, ICML 21 OPPO, ICLR-Me-FOMO 2023, DistShift NeurIPS 2023, R0-FoMo NeurIPS 2023 (area chair)
Area chair:
- ICML 2025
- NeurIPS 2025
- COLM 2025
Last Updated: Jul 15 2025