Research Scientist
Google, New York
E-Mail: vaishnavh at google.com
He/Him/His

Resume | Google Scholar

I am a research scientist working on the foundations of AI. I graduated with a PhD from the Computer Science Department of Carnegie Mellon University (CMU), where I was advised by Zico Kolter. I completed my undergraduate studies in the Department of Computer Science and Engineering at IIT Madras, where I was advised by Balaraman Ravindran.

Research Interests

I like thinking about when and why complex AI systems work, in a way that brings theory and practice closer together (see footnote below). I am currently interested in understanding the limits of (and going beyond) the next-token prediction paradigm that underlies many current AI models (see this recent work). My prior works similarly identify examples of failure (and occasionally, success!) across a broad range of settings in AI, typically in what were the prevailing paradigms at the time. These include out-of-distribution generalization, uniform-convergence-based generalization bounds, and GAN optimization.

I publish roughly 1.2 papers a year, a rate that I have come to value (ideally, even fewer would be great!), and I am strongly against pushing too many papers into the void. I also enjoy deep collaborations where we meet often, so feel free to reach out to brainstorm!

My style of “bringing theory and practice closer”: This is a fairly subjective balancing act. My personal emphasis is on devising minimal (counter)examples that capture the core phenomena we empirically observe, and then formally (or informally) reasoning about them. Such pared-down examples lead to simpler proofs and are therefore a powerful tool for gaining intuition about increasingly complex AI systems. Part of the insight we derive from this style lies in the simple proof; but importantly, part of it also lies in identifying the right assumptions and in the thoughtful design of a minimal system.


Updates
  • Sep 2024: Simons Institute talk on multi-token prediction at this YouTube link! Also presenting this talk at a CMU guest lecture, the CMU AI Lunch, and NYU. Co-presenting with Gregor at MSR, Amazon Research, and Princeton.
  • Jan 2024: ICML paper with Gregor Bachmann solidifying the debate on next-token prediction.
  • Oct 2023: Paper with Sachin Goyal and collaborators at Google, exploring a simple change that introduces delays into how Transformers predict the next token.

Students I have worked with

I have had the fortune of closely working with and/or mentoring the following students:


Select Papers (Google Scholar)


CONFERENCE PUBLICATIONS
  • The pitfalls of next-token prediction,
    International Conference on Machine Learning (ICML) 2024,
    (Double first author) Gregor Bachmann* and Vaishnavh Nagarajan*
    [arxiv] [Poster] [Slides] [Simons Talk]
    • Also accepted for oral presentation at ICLR ‘24 Workshop “How Far Are We From AGI?”

  • Think before you speak: Training language models with pause tokens,
    International Conference on Learning Representations (ICLR) 2024,
    Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan
    [arxiv] [Poster]
    • Also accepted at NeurIPS ‘23 Workshop R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models

  • Assessing Generalization via Disagreement,
    International Conference on Learning Representations (ICLR) 2022,
    (Double first author) Yiding Jiang*, Vaishnavh Nagarajan*, Christina Baek, J. Zico Kolter
    Accepted for Spotlight presentation
    [arxiv] [Poster]
    • Also accepted at ICML ‘21 Workshop on Overparameterization: Pitfalls & Opportunities

  • Understanding the failure modes of out-of-distribution generalization,
    International Conference on Learning Representations (ICLR) 2021,
    Vaishnavh Nagarajan, Anders Andreassen and Behnam Neyshabur
    [arxiv] [Poster]
    • Invited poster presentation at Conceptual Understanding of Deep Learning Workshop, Google Algorithms Workshop Series, 2021.


WORKSHOP PAPERS
  • Theoretical Insights into Memorization in GANs,
    Neural Information Processing Systems (NeurIPS) 2017 - Integration of Deep Learning Theories Workshop
    Vaishnavh Nagarajan, Colin Raffel, Ian Goodfellow.
    [PDF]

  • Generalization in Deep Networks: The Role of Distance from Initialization,
    Neural Information Processing Systems (NeurIPS) 2017 - Deep Learning: Bridging Theory and Practice
    Vaishnavh Nagarajan and J. Zico Kolter.
    Accepted for Spotlight talk
    [arxiv] [Poster]


THESIS
  • Explaining generalization in deep learning: progress and fundamental limits,
    Vaishnavh Nagarajan, 2021
    [arxiv]



Peer Review
  • ICLR 2023, 2021 (outstanding reviewer award)
  • NeurIPS 2023, 2021, 2020 (top 10%), 2019 (top 50%), 2018 (top 30%)
  • ICML 2024 & 2023 (Expert reviewer), 2022, 2021 (Expert reviewer, top 10%), 2020 (top 33%), 2019 (top 5%)
  • COLT 2019
  • ALT 2021
  • UAI 2022
  • AISTATS 2023 (top 10%), 2019
  • JMLR, Nature
  • Workshops: PODS ICML 2022, OPPO ICML 2021, ME-FoMo ICLR 2023, DistShift NeurIPS 2023, R0-FoMo NeurIPS 2023 (area chair)

Last Updated: Apr 23rd, 2024