
Research Scientist
Google, New York
E-Mail: vaishnavh at google.com
He/Him/His
I am a research scientist working on the foundations of AI. I graduated with a PhD from the Computer Science Department at Carnegie Mellon University (CMU), advised by Zico Kolter. I completed my undergraduate studies in the Department of Computer Science and Engineering at IIT Madras, where I was advised by Balaraman Ravindran.
I like thinking about when and why complex AI systems work through simple (counter)examples. I am currently interested in understanding the limits of (and going beyond) the next-token prediction paradigm that underlies many current AI models (see this recent work). My prior works similarly identify examples of failure (and occasionally, success!) across a broad range of settings in AI, typically targeting what were the prevailing paradigms at the time. These include out-of-distribution generalization, uniform-convergence-based generalization bounds, and GAN optimization.
I am strongly against pushing too many papers into the void. I also enjoy deep collaborations where we meet often, so feel free to reach out to brainstorm!
I have had the fortune of closely working with and/or mentoring the following students:
- Chen Henry Wu (PhD student at CMU as of 2025)
- Gregor Bachmann (PhD student at ETH Zürich as of 2024)
- Sachin Goyal (PhD student at CMU as of 2024, interned at Google, hosted by me)
- Jacob Springer (PhD student at CMU as of 2024)
- Yuri Galindo (mentee through Fatima Fellowship)
- Marcus Blake (Software Engineer at Google as of 2024, mentee through Learning Theory Alliance)
- Kimia Hamidieh (PhD student at MIT as of 2024)
- Nuredin Ali (PhD student at University of Minnesota as of 2024)
- Thao Nguyen (PhD student at UW as of 2024)
- Yiding Jiang (PhD student at CMU as of 2024)
- Melrose Roderick (postdoc at Mila as of 2024)
- Jeffrey Li (PhD student at UW as of 2024)
- Going beyond the creative limits of next-token prediction,
International Conference on Machine Learning (ICML) 2025,
(Double first author) Vaishnavh Nagarajan*, Chen Henry Wu*, Charles Ding and Aditi Raghunathan
Accepted for spotlight poster (2.6% acceptance)
[arxiv] [Poster]
- The pitfalls of next-token prediction,
International Conference on Machine Learning (ICML) 2024,
(Double first author) Gregor Bachmann* and Vaishnavh Nagarajan*
[arxiv] [Poster] [Slides] [Simons Talk]
Also accepted for oral presentation at ICLR ‘24 Workshop “How Far Are We From AGI?”
- Think before you speak: Training language models with pause tokens,
International Conference on Learning Representations (ICLR) 2024,
Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
[arxiv] [Poster]
- Assessing Generalization via Disagreement,
International Conference on Learning Representations (ICLR) 2022,
(Double first author) Yiding Jiang*, Vaishnavh Nagarajan*, Christina Baek, J. Zico Kolter
Accepted for Spotlight presentation, 5.2% acceptance
[arxiv] [Poster]
- Understanding the failure modes of out-of-distribution generalization,
International Conference on Learning Representations (ICLR) 2021,
Vaishnavh Nagarajan, Anders Andreassen and Behnam Neyshabur
[arxiv] [Poster]
Invited poster presentation at Conceptual Understanding of Deep Learning Workshop, Google Algorithms Workshop Series, 2021.
- Uniform convergence may be unable to explain generalization in deep learning,
Neural Information Processing Systems (NeurIPS) 2019
Vaishnavh Nagarajan and J. Zico Kolter
Winner of The Outstanding New Directions Paper Award
Accepted for Oral presentation, 0.54% acceptance
[arxiv] [NeurIPS oral slides] [Poster] [Blogpost] [Code] [Errata]
Also accepted for spotlight talk at:
- ICML ‘19 Workshop on Understanding and Improving Generalization in Deep Learning
- IAS/Princeton Workshop on Theory of Deep Learning. [Video]
- Gradient descent GAN optimization is locally stable,
Neural Information Processing Systems (NeurIPS) 2017
Vaishnavh Nagarajan and J. Zico Kolter
Accepted for Oral presentation, 1.2% acceptance
[arxiv] [1hr talk - slides] [NeurIPS oral slides] [Poster] [3 min video] [Code]
- Theoretical Insights into Memorization in GANs,
Neural Information Processing Systems (NeurIPS) 2017 - Integration of Deep Learning Theories Workshop
Vaishnavh Nagarajan, Colin Raffel, Ian Goodfellow.
[PDF]
- Generalization in Deep Networks: The Role of Distance from Initialization,
Neural Information Processing Systems (NeurIPS) 2017 - Deep Learning: Bridging Theory and Practice
Vaishnavh Nagarajan and J. Zico Kolter.
Accepted for Spotlight talk
[arxiv] [Poster]
- Explaining generalization in deep learning: progress and fundamental limits,
PhD thesis, Vaishnavh Nagarajan, 2021
[arxiv]
Reviewer:
- ICLR 2023, 2021 (outstanding reviewer award, top 10%)
- NeurIPS 2024 (top 7%), 2023 (top 10%), 2021, 2020 (top 10%), 2019 (top 50%), 2018 (top 30%)
- ICML 2024 & 2023 (Expert reviewer), 2022, 2021 (Expert reviewer, top 10%), 2020 (top 33%), 2019 (top 5%)
- COLT 2019
- ALT 2021
- UAI 2022
- AISTATS 2023 (top 10%), 2019
- JMLR, Nature
- Workshops: ICML 22 PODS, ICML 21 OPPO, ICLR-Me-FOMO 2023, DistShift NeurIPS 2023, R0-FoMo NeurIPS 2023 (area chair)
Area chair:
- ICML 2025
- NeurIPS 2025
- COLM 2025
Last Updated: Apr 29th, 2025