Build
Want to build something today?
Hands-on and copy-paste-ready — pick a model, fine-tune on free Colab, ship it.
A practical checklist for picking an LLM for your feature
Choosing a model is less about leaderboards than about knowing your own need precisely. Six simple steps you can follow today.
Build 2 min read

How LLM inference actually works: prefill vs decode
Text generation has two phases with very different behaviour: one compute-bound, one memory-bound. Understanding the difference explains why the KV cache exists and why decode is slow.
Build 3 min read

Fine-tune your first model on free Colab: QLoRA in about 40 lines
Fine-tuning a model doesn't require an expensive cluster. With QLoRA you can tune a small model on a free GPU in just a few dozen lines of code.
Build 3 min read

A 70B model on one GPU: a practical guide to quantization
A seventy-billion-parameter model needs about 140 GB of memory for its weights at 16-bit precision (2 bytes per parameter). With quantization you can compress that same model until it fits on a single GPU, with little loss in quality.
Build 3 min read

Working with LLM APIs: first calls, tokens, and structured output
Your first call to a language-model API is simpler than it looks. Once you know a few basics (roles, tokens, temperature, and structured output), the rest falls into place.
Build 3 min read

Zero-shot, few-shot, or chain-of-thought: picking the right technique
There are three basic prompting techniques, and each has its place. Knowing when to reach for which matters more than the techniques themselves.
Build 2 min read