Build

Want to build something today?

Hands-on and copy-paste-ready — pick a model, fine-tune on free Colab, ship it.

LoRA hyperparameters demystified: rank, alpha, and what to set

Fine-tuning with LoRA has a handful of key numbers, and once you know what they mean, choosing them is simple. This guide clears up rank, alpha, learning rate, and the rest.

3 min read

A practical checklist for picking an LLM for your feature

Choosing a model is less about leaderboards than about knowing your own needs precisely. Six simple steps you can follow today.

2 min read

How LLM inference actually works: prefill vs decode

Text generation has two phases with very different behaviour: prefill is compute-bound, while decode is memory-bandwidth-bound. Understanding the difference explains why the KV cache exists and why decoding is slow per token.

3 min read

Fine-tune your first model on free Colab: QLoRA in about 40 lines

Fine-tuning a model doesn't have to require an expensive cluster. With QLoRA you can tune a small model on a free GPU in just a few dozen lines of code.

3 min read

A 70B model on one GPU: a practical guide to quantization

A seventy-billion-parameter model needs about 140 GB of memory for its weights at 16-bit precision (2 bytes per parameter). Quantization can compress that same model until it fits on a single GPU while keeping quality almost intact.

3 min read

Working with LLM APIs: first calls, tokens, and structured output

Your first call to a language-model API is simpler than it looks. Once you know a few basics (roles, tokens, temperature, and structured output), the rest falls into place.

3 min read

Zero-shot, few-shot, or chain-of-thought: picking the right technique

There are three basic prompting techniques, and each has its place. Knowing when to reach for which matters more than the techniques themselves.

2 min read