This week, I'm excited to welcome Sandra Rivera from VSORA! We dig into why AI inference is essential for deployment at scale, focusing on how VSORA's patented architecture addresses the "memory wall" by collapsing memory layers. We explore their recent tape-out, which promises approximately 3X the performance at half the power of leading GPUs. We also chat about deployment use cases, the need for low latency and high determinism, and future plans for OEM modules and MLPerf benchmarking, and even get a brief glimpse of Sandra's family llama farm.