<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Build — Felesh</title><description>Build</description><link>https://blog.felesh.ai/</link><item><title>LoRA hyperparameters demystified: rank, alpha, and what to set</title><link>https://blog.felesh.ai/en/tutorials/lora-hyperparameters-demystified/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/lora-hyperparameters-demystified/</guid><description>Fine-tuning with LoRA has a handful of key numbers, and once you know what they mean, choosing them is simple. This guide clears up rank, alpha, learning rate, and the rest.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>A practical checklist for picking an LLM for your feature</title><link>https://blog.felesh.ai/en/tutorials/pick-an-llm-checklist/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/pick-an-llm-checklist/</guid><description>Choosing a model is less about leaderboards than about knowing your own need precisely. Six simple steps you can follow today.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>How LLM inference actually works: prefill vs decode</title><link>https://blog.felesh.ai/en/tutorials/prefill-vs-decode/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/prefill-vs-decode/</guid><description>Text generation has two phases with very different behaviour: one compute-bound, one memory-bound. Understanding the difference explains why the KV cache exists and why decode is slow.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Fine-tune your first model on free Colab: QLoRA in about 40 lines</title><link>https://blog.felesh.ai/en/tutorials/qlora-on-free-colab/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/qlora-on-free-colab/</guid><description>Fine-tuning a model doesn&apos;t require an expensive cluster.
With QLoRA you can tune a small model on a free GPU in just a few dozen lines of code.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>A 70B model on one GPU: a practical guide to quantization</title><link>https://blog.felesh.ai/en/tutorials/run-a-70b-on-one-gpu/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/run-a-70b-on-one-gpu/</guid><description>A seventy-billion-parameter model needs about 140 GB of memory for its weights at 16-bit precision. With quantization you can compress that same model until it fits on a single GPU and keep quality almost untouched.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Working with LLM APIs: first calls, tokens, and structured output</title><link>https://blog.felesh.ai/en/tutorials/working-with-llm-apis/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/working-with-llm-apis/</guid><description>Your first call to a language-model API is simpler than it looks. Once you know a few basics (roles, tokens, temperature, and structured output), the rest falls into place.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Zero-shot, few-shot, or chain-of-thought: picking the right technique</title><link>https://blog.felesh.ai/en/tutorials/zero-shot-few-shot-cot/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/zero-shot-few-shot-cot/</guid><description>There are three basic prompting techniques, and each has its place. Knowing when to reach for which matters more than the techniques themselves.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item></channel></rss>