Dwarkesh Podcast | Podcast kostenlos online hören

128 Episoden

Alex Imas and Phil Trammell – What remains scarce after AGI?
04.06.2026 | 1 Std. 16 Min.
Economics of AGI episode w Alex Imas and Phil Trammell.
There’s a bunch of important questions about how we deal with AI that only economics can answer.
What is the optimal way to tax and redistribute the wealth that will be generated? How should countries not in the AI supply chain index into the gains? Is there any world where inequality doesn’t explode?
It might seem like these questions have obvious answers, but the first thing economics teaches you is that your intuitions can often be entirely wrong.
It was very helpful to chat through these things with Alex and Phil.
Watch on YouTube; read the transcript.
Sponsors
Jane Street invests heavily in turning smart people into exceptional researchers and engineers. In addition to their apprenticeship model, Jane Street runs lectures and bootcamps in their in-office classrooms -- managers clear their teams’ schedules to encourage attendance. If you’d like to work at a place that takes learning this seriously, Jane Street is hiring. Check out their open roles at janestreet.com/dwarkesh
Google’s Gemini Omni has incredible video editing capabilities -- you can upload a video and have Omni change the background, adjust lighting, or add specific elements. But Omni is also a preview of how future frontier models will be trained -- fully multimodal on both input and output. You can try it yourself in the Gemini app at gemini.google or in Flow at flow.google
Cursor used targeted RL with textual feedback to help train their Composer 2.5 model. One of their researchers, Sasha Rush, gave me an impromptu blackboard lecture to explain how this form of on-policy self-distillation works -- I posted the full thing on X. If you want to try Composer 2.5, go to cursor.com/dwarkesh
Timestamps
(00:00:00) – Will capital share increase?
(00:19:36) – Messy Middle scenario
(00:25:57) – How to tax and redistribute AI wealth
(00:30:02) – Why demand collapse is unlikely
(00:39:26) – Human employees would be hard to integrate into the machine economy
(00:43:08) – What if some humans (or AIs) value wealth accumulation intrinsically?
(01:01:28) – What should developing countries do?

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Reiner Pope – Chip design from the bottom up
22.05.2026 | 1 Std. 20 Min.
New blackboard lecture with Reiner Pope: how do chips actually work - starting with basic logic gates, and working up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do.
Reiner is CEO of MatX, a new chip startup (full disclosure - I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.
Watch this one on YouTube so you can see the chalkboard. Read the transcript.
Sponsors
* Crusoe was one of only five GPU clouds that made the gold tier in SemiAnalysis' most recent ClusterMAX report. Gold-tier providers like Crusoe delivered 5-15% lower TCO than silver-tier clouds, even with identical GPU pricing. This is because optimizations like early fault detection and rapid node replacement don't necessarily show up in the sticker price, but still matter a ton in the real world. Learn more at crusoe.ai/dwarkesh
* Cursor is where I do most of my work—from reading research papers to visualizing technical concepts to coding up internal tools for the podcast. Most recently, I used it to build two different review interfaces for my essay contest, one that anonymizes submissions for scoring and another that lets me see applicants' essays next to their resumes and websites. Whatever you're working on, you should try doing it in Cursor. Get started at cursor.com/dwarkesh
* Jane Street let me ask Ron Minsky and Dan Pontecorvo, two senior Jane Streeters, a bunch of questions about how they use AI. We discussed everything from the types of models they're training to how they think about the future of trading to why they're more bullish than ever on hiring technical talent. You can watch the full conversation and learn more about their open positions at janestreet.com/dwarkesh
Timestamps
00:00:00 – Building a multiply-accumulate from logic gates
00:16:31 – Muxes and the cost of data movement
00:26:10 – How systolic arrays work
00:39:11 – Clock cycles and pipeline registers
00:51:51 – FPGAs vs ASICs
01:03:25 – Cache vs scratchpad
01:07:27 – Why CPU cores are much bigger than GPU cores
01:12:00 – Brains vs chips
01:15:33 – A GPU is just a bunch of tiny TPUs

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Eric Jang – Building AlphaGo from scratch
15.05.2026 | 2 Std. 37 Min.
Eric Jang walks through how to build AlphaGo from scratch, but with modern AI tools.
Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.
Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.
Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.
Watch on YouTube. Read the transcript.
And check out the flashcards I wrote to retain the insights.
Sponsors
* Cursor‘s agent SDK let me build a pipeline to generate flashcards for this episode. For each card, I had an agent read the transcript, ingest blackboard screenshots, generate an SVG visual, and run everything through a critic. A durable agent is much better at this kind of work than a chain of LLM calls, and Cursor’s SDK made it easy. Check out the cards at flashcards.dwarkesh.com and get started with the SDK at cursor.com/dwarkesh
* Jane Street gave me a real deep-dive tour of one of their datacenters. I got to ask a bunch of questions to Ron Minsky, who co-leads Jane Street’s tech group, and Dan Pontecorvo, who runs Jane Street’s physical engineering team. They were willing to literally pull up the floorboards and take out racks to explain how everything works. Check out the full tour at janestreet.com/dwarkesh
Timestamps
(00:00:00) – Basics of Go
(00:08:17) – Monte Carlo Tree Search
(00:32:04) – What the neural network does
(01:00:33) – Self-play
(01:25:38) – Alternative RL approaches
(01:45:47) – Why doesn't MCTS work for LLMs
(02:01:09) – Off-policy training
(02:12:02) – RL is even more information inefficient than you thought
(02:22:16) – Automated AI researchers

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
David Reich – Why the Bronze Age was an inflection point in human evolution
08.05.2026 | 2 Std. 13 Min.
David Reich is back.
He and collaborator Ali Akbari just published a paper that overturns a long-standing consensus about human evolution — that natural selection has been dormant in our species since the agricultural revolution.
By scaling ancient DNA sequencing and developing a new statistical method, they found that selection has actually sped up.
Selection went especially bonkers during the Bronze Age (around 3,000 years ago).
That’s when gene frequencies for everything from immune function to body fat to intelligence were most in flux.
Over the last 10,000 years, selection pushed the genetic predictor of cognitive performance up by roughly a full standard deviation — most of it between 4,000 and 2,000 years ago.
After we finished recording, David sketched out on a whiteboard his new heretical model about who the Neanderthals really were. Luckily, I took out my iPhone and managed to record it.
He thinks the standard story (that Neanderthals are some separate archaic lineage we interbred with a little) just doesn’t fit the evidence. Instead, he proposes that Neanderthals are essentially genetically-swamped modern humans.
A small population somewhere around the Caucasus invented Middle Stone Age technology roughly 300,000 years ago and expanded outward. The ones that moved into Europe interbred with local archaic humans, got genetically swamped, and became Neanderthals. The same expansion went into Africa, met much more diverged archaic Africans, and that mixture became us.
This means Neanderthals and modern humans share the same cultural ancestry — the only difference is which archaic humans they mixed with afterward.
David is a brilliant and rigorous scholar. It was a real delight to learn from him again.
Watch on YouTube; read the transcript.
Sponsors
* Cursor was super useful as I prepped for this episode. Whenever I had a question, I’d have Cursor kick off a few different models simultaneously and then compare their responses. I found that this led to better results than I could get out of any individual LLM. If you’ve only used Cursor for coding, you should try using it for research. Check it out at cursor.com/dwarkesh
* Jane Street uses an internal currency called “hive bucks” to allocate compute through a real-time auction – and anyone can change anyone else’s bids or even kill their jobs! Everyone just trusts each other to act in the firm’s best interest, which is what lets the system work in the first place. If this weird and high-trust culture sounds like your kind of thing, Jane Street’s hiring at janestreet.com/dwarkesh
* Crusoe’s ML infra team built fastokens, an open-source tokenizer that delivers a ~9x speedup over Hugging Face and up to 40% faster time-to-first token – on real production workloads! Crusoe achieved these results by parallelizing things and using some clever engineering to handle duplicates without cross-thread coordination. Learn more at crusoe.ai/dwarkesh
Timestamps
(00:00:00) – Ancient DNA suggests strong selection over last 10,000 years
(00:15:45) – Natural selection intensified during the Bronze Age
(00:35:02) – Why didn’t evolution max out intelligence?
(00:57:21) – Evolution is limited by time, not population size
(01:09:02) – Why no farming before the Ice Age?
(01:17:13) – The Neanderthal puzzle David can’t stop thinking about
(01:54:10) – The methodology behind this breakthrough

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Reiner Pope – The math behind how LLMs are trained and served
29.04.2026 | 2 Std. 13 Min.
Did a very different format with Reiner Pope - a blackboard lecture where he walks through how frontier LLMs are trained and served.
It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.
It’s a bit technical, but I encourage you to hang in there – it’s really worth it.
There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him.
Recommend watching this one on YouTube so you can see the chalkboard.
Reiner is CEO of MatX, a new chip startup (full disclosure - I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.
Download markdown of transcript here to chat with an LLM.
Wrote up some flashcards and practice problems to help myself retain what Reiner taught. Hope it's helpful to you too!
Sponsors
* Jane Street needs constant access to incredibly low-latency compute. I recently asked one of their engineers, Clark, to talk me through how they meet these demands. Our conversation—which touched on everything from FPGAs to liquid cooling—was extremely helpful as I prepped to interview Reiner. You can watch the full discussion and explore Jane Street’s open roles at janestreet.com/dwarkesh
* Google’s Gemma 4 is the first open model that’s let me shut off the internet and create a fully disconnected “focus machine”. This is because Gemma is small enough to run on my laptop, but powerful enough to actually be useful. So, to prep for this interview, I downloaded Reiner’s scaling book, disconnected from wifi, and used Gemma to help me break down the material. Check it out at goo.gle/Gemma4
* Cursor helped me turn some notes I took on how gradients flow during large-scale pretraining into a great animation. At first, I wasn’t sure the best way to visualize the concept, but Cursor’s Composer 2 Fast model let me iterate on different ideas almost instantaneously. You can check out the animation in my recent blog post. And if you have something to visualize yourself, go to cursor.com/dwarkesh
Timestamps
(00:00:00) – How batch size affects token cost and speed
(00:32:09) – How MoE models are laid out across GPU racks
(00:47:12) – How pipeline parallelism spreads model layers across racks
(01:03:37) – Why Ilya said, “As we now know, pipelining is not wise.”
(01:18:59) – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
(01:33:02) – Deducing long context memory costs from API pricing
(02:04:02) – Convergent evolution between neural nets and cryptography

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe