Build — Felesh

Build
https://blog.felesh.ai/

LoRA hyperparameters demystified: rank, alpha, and what to set
https://blog.felesh.ai/en/tutorials/lora-hyperparameters-demystified/
Fine-tuning with LoRA comes down to a handful of key numbers, and once you know what they mean, choosing them is simple. This guide clears up rank, alpha, learning rate, and the rest.
Sun, 21 Jun 2026 00:00:00 GMT

A practical checklist for picking an LLM for your feature
https://blog.felesh.ai/en/tutorials/pick-an-llm-checklist/
Choosing a model is less about leaderboards than about knowing your own requirements precisely. Six simple steps you can follow today.
Sun, 21 Jun 2026 00:00:00 GMT

How LLM inference actually works: prefill vs decode
https://blog.felesh.ai/en/tutorials/prefill-vs-decode/
Text generation has two phases with very different behaviour: prefill, which is compute-bound, and decode, which is memory-bound. Understanding the difference explains why the KV cache exists and why decode is slow.
Sun, 21 Jun 2026 00:00:00 GMT

Fine-tune your first model on free Colab: QLoRA in about 40 lines
https://blog.felesh.ai/en/tutorials/qlora-on-free-colab/
Fine-tuning a model doesn't have to require an expensive cluster. With QLoRA you can tune a small model on a free GPU in just a few dozen lines of code.
Sun, 21 Jun 2026 00:00:00 GMT

A 70B model on one GPU: a practical guide to quantization
https://blog.felesh.ai/en/tutorials/run-a-70b-on-one-gpu/
A seventy-billion-parameter model needs about 140 GB of memory for its weights at 16-bit precision (70 billion parameters × 2 bytes each). With quantization you can compress that same model until it fits on a single GPU, with almost no loss in quality.
Sun, 21 Jun 2026 00:00:00 GMT

Working with LLM APIs: first calls, tokens, and structured output
https://blog.felesh.ai/en/tutorials/working-with-llm-apis/
Your first call to a language-model API is simpler than it looks. Once you know a few basics (roles, tokens, temperature, and structured output), the rest falls into place.
Sun, 21 Jun 2026 00:00:00 GMT

Zero-shot, few-shot, or chain-of-thought: picking the right technique
https://blog.felesh.ai/en/tutorials/zero-shot-few-shot-cot/
There are three basic prompting techniques, and each has its place. Knowing when to reach for each one matters more than the techniques themselves.
Sun, 21 Jun 2026 00:00:00 GMT