Research Scientist
Google, New York
E-Mail: vaishnavh at google.com
He/Him/His
Resume
I am a research scientist working in the foundations of AI. I graduated with a PhD from the Computer Science Department of Carnegie Mellon University (CMU) advised by Zico Kolter. I completed my undergraduate studies in the Department of Computer Science and Engineering at IIT Madras, where I was advised by Balaraman Ravindran.
I like thinking about when and why complex AI systems work, through simple (counter)examples. I am currently interested in understanding the limits of (and going beyond) the next-token prediction paradigm that underlies many current AI models (see this recent work). My prior work similarly identifies examples of failure (and occasionally, success!) across a broad range of settings in AI, typically the prevailing paradigms of their time: out-of-distribution generalization, uniform-convergence-based generalization bounds, and GAN optimization.
I am strongly against pushing too many papers into the void. I also enjoy deep collaborations where we meet often, so feel free to reach out to brainstorm!
- Fall 2024: My Simons Institute talk on multi-token prediction is now on YouTube! I am also presenting this talk as a CMU guest lecture, at the CMU AI Lunch, and at NYU, and co-presenting with Gregor at MSR, Amazon Research, and Princeton.
- Fall 2024: Sachin Goyal is presenting our work on pause tokens at MPI and Google DeepMind, India.
- Jan 2024: ICML paper with Gregor Bachmann sharpening the debate on next-token prediction.
- Oct 2023: Paper with Sachin Goyal and collaborators at Google, exploring a simple change that introduces delays to how Transformers predict the next token.
I have had the good fortune of working closely with and/or mentoring the following students:
- Gregor Bachmann (PhD student at ETH Zürich as of 2024)
- Sachin Goyal (PhD student at CMU as of 2024, interned at Google, hosted by me)
- Jacob Springer (PhD student at CMU as of 2024)
- Yuri Galindo (mentee through Fatima Fellowship)
- Marcus Blake (Software Engineer at Google as of 2024, mentee through Learning Theory Alliance)
- Kimia Hamidieh (PhD student at MIT as of 2024)
- Nuredin Ali (PhD student at University of Minnesota as of 2024)
- Thao Nguyen (PhD student at UW as of 2024)
- Yiding Jiang (PhD student at CMU as of 2024)
- Melrose Roderick (postdoc at Mila as of 2024)
- Jeffrey Li (PhD student at UW as of 2024)
- The pitfalls of next-token prediction,
International Conference on Machine Learning (ICML) 2024,
(Double first author) Gregor Bachmann* and Vaishnavh Nagarajan*
[arxiv] [Poster] [Slides] [Simons Talk]
Also accepted for oral presentation at the ICLR ‘24 Workshop “How Far Are We From AGI?”
- Think before you speak: Training language models with pause tokens,
International Conference on Learning Representations (ICLR) 2024,
Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
[arxiv] [Poster]
Also accepted at the NeurIPS ‘23 Workshop R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models
- Assessing Generalization via Disagreement,
International Conference on Learning Representations (ICLR) 2022,
(Double first author) Yiding Jiang*, Vaishnavh Nagarajan*, Christina Baek, J. Zico Kolter
Accepted for Spotlight presentation
[arxiv] [Poster]
Also accepted at the ICML ‘21 Workshop on Overparameterization: Pitfalls & Opportunities
- Understanding the failure modes of out-of-distribution generalization,
International Conference on Learning Representations (ICLR) 2021,
Vaishnavh Nagarajan, Anders Andreassen and Behnam Neyshabur
[arxiv] [Poster]
Invited poster presentation at the Conceptual Understanding of Deep Learning Workshop, Google Algorithms Workshop Series, 2021.
- Uniform convergence may be unable to explain generalization in deep learning,
Neural Information Processing Systems (NeurIPS) 2019
Vaishnavh Nagarajan and J. Zico Kolter
Winner of The Outstanding New Directions Paper Award
Accepted for Oral presentation (0.54% acceptance rate)
[arxiv] [NeurIPS 19 oral slides] [Poster] [Blogpost] [Code] [Errata]
Also accepted for spotlight talks at:
- ICML ‘19 Workshop on Understanding and Improving Generalization in Deep Learning
- IAS/Princeton Workshop on Theory of Deep Learning [Video]
- Gradient descent GAN optimization is locally stable,
Neural Information Processing Systems (NeurIPS) 2017
Vaishnavh Nagarajan and J. Zico Kolter
Accepted for Oral presentation (1.2% acceptance rate)
[arxiv] [1hr talk - slides] [NeurIPS Oral - Slides] [Poster] [3 min video] [Code]
- Theoretical Insights into Memorization in GANs,
Neural Information Processing Systems (NeurIPS) 2017 - Integration of Deep Learning Theories Workshop
Vaishnavh Nagarajan, Colin Raffel, Ian Goodfellow.
[PDF]
- Generalization in Deep Networks: The Role of Distance from Initialization,
Neural Information Processing Systems (NeurIPS) 2017 - Deep Learning: Bridging Theory and Practice
Vaishnavh Nagarajan and J. Zico Kolter.
Accepted for Spotlight talk
[arxiv] [Poster]
- Explaining generalization in deep learning: progress and fundamental limits,
PhD Thesis, Vaishnavh Nagarajan, 2021
[arxiv]
- ICLR 2023, 2021 (outstanding reviewer award)
- NeurIPS 2023, 2021, 2020 (top 10%), 2019 (top 50%), 2018 (top 30%)
- ICML 2024 & 2023 (Expert reviewer), 2022, 2021 (Expert reviewer, top 10%), 2020 (top 33%), 2019 (top 5%)
- COLT 2019
- ALT 2021
- UAI 2022
- AISTATS 2023 (top 10%), 2019
- JMLR, Nature
- Workshops: PODS (ICML 2022), OPPO (ICML 2021), ME-FoMo (ICLR 2023), DistShift (NeurIPS 2023), R0-FoMo (NeurIPS 2023, area chair)
Last Updated: Dec 5th, 2024