Research

Machine learning has recently made great progress by following a “straightforward recipe”: collect more training data, build bigger models, and train for more iterations.

This recipe has been wildly successful: it’s delivered software that generates realistic images, chatbots that solve challenging math problems roughly on par with a (mediocre) grad student, and algorithms that produce high-quality 3D assets from a single 2D image.

But this recipe doesn’t always work! In my research, I’ve identified several real-world machine learning problems where naively scaling up the training data, the model size, or the number of training iterations either is impossible, fails to yield the promised improvements, or even leads to unexpected – and unwanted – results.

For example:

For each of these problems, I’ve developed alternative approaches that avoid these fundamental issues and are therefore amenable to scale.

My work complements scale. I aim to build a fundamental understanding of machine learning problems to answer questions like…

…before spending valuable time and compute training – and inevitably debugging – a neural pipeline.