Favorite Papers
Attention Is All You Need
#01
Vaswani et al.
NeurIPS 2017
The death of RNNs and the birth of the parallelizable sequence model.
Language Models are Few-Shot Learners
#02
Brown et al.
NeurIPS 2020
GPT-3 and the realization that scale is a quality all its own.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
#03
Wei et al.
NeurIPS 2022
A simple prompting trick that unlocked emergent logical capabilities.
Training language models to follow instructions with human feedback
#04
Ouyang et al.
NeurIPS 2022
The core mechanics of RLHF; aligning predicted text with human intent.
Constitutional AI: Harmlessness from AI Feedback
#05
Bai et al.
Anthropic
Scalable oversight: using a model to supervise another model's safety.
Computing Machinery and Intelligence
#06
Alan Turing
Mind (1950)
The original question: 'Can machines think?' and the imitation game.
A Mathematical Theory of Communication
#07
Claude Shannon
The Bell System Technical Journal (1948)
Information entropy defined. The bedrock of every bit we transmit.
Mastering the game of Go with deep neural networks and tree search
#08
Silver et al.
Nature 2016
AlphaGo and the triumph of MCTS paired with deep reinforcement learning.
Scaling Laws for Neural Language Models
#09
Kaplan et al.
OpenAI
The empirical predictability of error as compute and data grow.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
#10
Rafailov et al.
NeurIPS 2023
Removing the complex reward model from RLHF for simpler alignment.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
#11
Devlin et al.
NAACL 2019
The masked language modeling revolution for context-aware embeddings.
Deep Residual Learning for Image Recognition
#12
He et al.
CVPR 2016
ResNets and the identity mapping that enabled training 1000+ layers.
Generative Adversarial Nets
#13
Goodfellow et al.
NeurIPS 2014
The zero-sum game that redefined synthetic data generation.
LoRA: Low-Rank Adaptation of Large Language Models
#14
Hu et al.
ICLR 2022
Fine-tuning billions of parameters by updating only a tiny fraction.
Chinchilla: Training Compute-Optimal Large Language Models
#15
Hoffmann et al.
DeepMind
Challenging the assumption that bigger is always better; data matters.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
#16
Yao et al.
NeurIPS 2023
Enabling models to branch, look ahead, and backtrack during reasoning.
Voyager: An Open-Ended Embodied Agent with Large Language Models
#17
Wang et al.
ICLR 2024
Agents that learn continuously in Minecraft through a code-based skill library.
Self-Instruct: Aligning Language Models with Self-Generated Instructions
#18
Wang et al.
ACL 2023
Bootstrapping an instruction-tuned model from a raw base model.
LLaMA: Open and Efficient Foundation Language Models
#19
Touvron et al.
Meta AI
Democratizing state-of-the-art performance for the open-source community.
DALL·E: Zero-Shot Text-to-Image Generation
#20
Ramesh et al.
ICML 2021
Bridging the gap between conceptual text and high-fidelity visuals.
Deep Reinforcement Learning from Human Preferences
#21
Christiano et al.
NeurIPS 2017
Teaching a model to backflip using only human preferences.
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
#22
Silver et al.
Science 2018
AlphaZero and the endgame of superhuman general game-playing intelligence.
The Second Law of Thermodynamics
#23
Clausius / Kelvin
Historical
The inevitable heat death of every system. My favorite physics constraint.
The Bitter Lesson
#24
Rich Sutton
Essay (2019)
The hard truth: compute-heavy methods eventually crush human-engineered cleverness.
ImageNet Classification with Deep Convolutional Neural Networks
#25
Krizhevsky et al.
NeurIPS 2012
AlexNet: the Big Bang of the modern deep learning era.