felesh.ai
Engineering · Series

LLM Infrastructure

A 4-part series

How do large language models actually run? This series walks through the inference engine itself: the difference between the prefill and decode phases, the role of the KV cache, the numerical compression of weights, and the batched scheduling of requests. The goal is a clear, vendor-neutral picture of where cost and speed are really decided, foundational knowledge that serves you whatever your stack.