<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Felesh</title><description>Latest posts from Felesh</description><link>https://blog.felesh.ai/</link><item><title>Why a person approves before the AI acts</title><link>https://blog.felesh.ai/en/method/a-person-approves-before-the-ai-acts/</link><guid isPermaLink="true">https://blog.felesh.ai/en/method/a-person-approves-before-the-ai-acts/</guid><description>Autonomy isn&apos;t the goal — trustworthy action is. Our agents propose; a person approves or sends a revision note; only then does the work commit. Authority is earned, not granted.</description><pubDate>Tue, 23 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Every action carries its reason</title><link>https://blog.felesh.ai/en/method/every-action-carries-its-reason/</link><guid isPermaLink="true">https://blog.felesh.ai/en/method/every-action-carries-its-reason/</guid><description>Most systems record what changed. The thing you can never reconstruct later is why — so we make the reason a mandatory part of every change, captured the moment it happens.</description><pubDate>Tue, 23 Jun 2026 00:00:00 GMT</pubDate></item><item><title>One relationship, many specialists</title><link>https://blog.felesh.ai/en/method/one-relationship-many-specialists/</link><guid isPermaLink="true">https://blog.felesh.ai/en/method/one-relationship-many-specialists/</guid><description>A customer should feel one relationship, not be handed from bot to bot. 
Behind a single surface, specialists each own a domain — unified by default, separable by a setting, never by a rebuild.</description><pubDate>Tue, 23 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Agents that act, not just answer — safely</title><link>https://blog.felesh.ai/en/insights/agents-that-act-safely/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/agents-that-act-safely/</guid><description>The real leap is when a system doesn&apos;t just answer but gets something done in a real system. And acting makes mistakes costly — so &apos;safely&apos; has to be structural.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>AI that survives an audit</title><link>https://blog.felesh.ai/en/insights/ai-that-survives-an-audit/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/ai-that-survives-an-audit/</guid><description>When an intelligent system makes a decision, sooner or later someone asks: &apos;why?&apos; A system that can&apos;t answer fails an audit. We believe auditability has to be in the design from day one, not a later add-on.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Attentive AI: why focused beats big</title><link>https://blog.felesh.ai/en/insights/attentive-ai/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/attentive-ai/</guid><description>The biggest model is not the best answer. When every job has a defined boundary, focused intelligence beats jack-of-all-trades intelligence every time.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Authority is a trajectory, not a key</title><link>https://blog.felesh.ai/en/insights/authority-as-a-trajectory/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/authority-as-a-trajectory/</guid><description>You don&apos;t hand an intern the keys to everything on day one. 
AI agents should earn their authority along a path too — just like a career.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Three lessons the human mind offers agent architecture</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/brain-lessons-for-agent-architecture/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/brain-lessons-for-agent-architecture/</guid><description>Several hard problems in agent design rhyme with how the human mind works: focus over clutter, knowing when a job is done, and separating the layers of memory. The resemblances are a good guide for design.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Cognitive Least Privilege: your agent should know only what it needs</title><link>https://blog.felesh.ai/en/engineering/prompting/cognitive-least-privilege/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/prompting/cognitive-least-privilege/</guid><description>Any information that doesn&apos;t serve the agent&apos;s job both lowers accuracy and widens the attack surface. Borrow least privilege from security and extend it to what an agent knows.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>CRM is a special case of a larger pattern</title><link>https://blog.felesh.ai/en/insights/crm-is-a-special-case/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/crm-is-a-special-case/</guid><description>What we&apos;ve built to manage customer relationships looks, at first glance, like a CRM. 
But underneath it lies a more general pattern that reaches beyond relationships.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>You&apos;re not buying software — you&apos;re buying an ecosystem that compounds</title><link>https://blog.felesh.ai/en/insights/ecosystem-that-compounds/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/ecosystem-that-compounds/</guid><description>A tool solves one problem once, and stays there. What we build is an ecosystem whose intelligence compounds with every addition.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Evaluate models on your own set, not the public leaderboard</title><link>https://blog.felesh.ai/en/engineering/model-selection/evaluate-on-your-own-set/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/evaluate-on-your-own-set/</guid><description>Public leaderboards tell you less about your work than you think. The reliable method: build a small eval set that represents your real task, and score the candidates on that.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Event-driven by design: agent teams that don&apos;t lose messages</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/event-driven-agent-teams/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/event-driven-agent-teams/</guid><description>When several agents work together, the biggest risk is lost messages and a collapsing chain. 
An event-driven architecture removes that risk with a few simple rules.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Fine-tune, RAG, or prompt: which one, and what each costs</title><link>https://blog.felesh.ai/en/insights/fine-tune-rag-or-prompt/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/fine-tune-rag-or-prompt/</guid><description>There are three ways to adapt a model to your need, and the wrong choice can get expensive. The difference is in what problem each one actually solves.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Common fine-tuning pitfalls and how to debug them</title><link>https://blog.felesh.ai/en/engineering/fine-tuning/fine-tuning-pitfalls/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/fine-tuning/fine-tuning-pitfalls/</guid><description>Most failed fine-tunes trace back to a few recurring patterns. If you know the signs, debugging becomes a simple checklist instead of guesswork.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>From LLMs to agents: the complete journey</title><link>https://blog.felesh.ai/en/learn/llms-to-agents/from-llms-to-agents/</link><guid isPermaLink="true">https://blog.felesh.ai/en/learn/llms-to-agents/from-llms-to-agents/</guid><description>A language model, at heart, just guesses the next word. Here&apos;s how that simple guess becomes an agent once you add tools, memory, and planning — and where you actually need one.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>From prompt engineering to context engineering</title><link>https://blog.felesh.ai/en/insights/from-prompt-to-context-engineering/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/from-prompt-to-context-engineering/</guid><description>There was a time when the art of working with a model came down to writing a good prompt. 
But we believe the centre of gravity is shifting: from crafting one instruction to designing the whole context the model works in.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>From a vague message to the right structured action</title><link>https://blog.felesh.ai/en/insights/from-vague-message-to-action/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/from-vague-message-to-action/</guid><description>Humans speak vaguely, incompletely, messily. The hard job of an intelligent system is to pull the right action out of that mess — without asking the user to speak like a form.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>How agents remember: memory and knowledge representation</title><link>https://blog.felesh.ai/en/learn/llms-to-agents/how-agents-remember/</link><guid isPermaLink="true">https://blog.felesh.ai/en/learn/llms-to-agents/how-agents-remember/</guid><description>A language model has no memory of its own and begins each conversation from scratch. What gives an agent memory is the layers built around the model.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>How an AI agent thinks</title><link>https://blog.felesh.ai/en/learn/llms-to-agents/how-agents-think/</link><guid isPermaLink="true">https://blog.felesh.ai/en/learn/llms-to-agents/how-agents-think/</guid><description>An agent takes a goal, reasons, acts, and observes the result — repeating that loop until it&apos;s done. 
A simple mental model for how agents reason and decide.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Layered intelligence: when conversational AI meets background AI</title><link>https://blog.felesh.ai/en/insights/layered-intelligence/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/layered-intelligence/</guid><description>A good agent has to do two things well at once: fluent conversation with a human, and deep cognitive work behind the scenes. These are two different layers of intelligence — and we believe separating them is the key.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The LoRA family: QLoRA, DoRA, and LoRA+ — which, and when?</title><link>https://blog.felesh.ai/en/engineering/fine-tuning/lora-family-variants/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/fine-tuning/lora-family-variants/</guid><description>Since LoRA was introduced, several improved variants have appeared, each targeting one particular problem. Knowing them helps you pick the right one for each job.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>LoRA hyperparameters demystified: rank, alpha, and what to set</title><link>https://blog.felesh.ai/en/tutorials/lora-hyperparameters-demystified/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/lora-hyperparameters-demystified/</guid><description>Fine-tuning with LoRA has a handful of key numbers, and once you know what they mean, choosing them is simple. 
This guide clears up rank, alpha, learning rate, and the rest.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>MLP is the model&apos;s memory: where knowledge lives</title><link>https://blog.felesh.ai/en/engineering/llm-infra/mlp-is-the-models-memory/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/mlp-is-the-models-memory/</guid><description>In a language model, attention layers route information, but the actual knowledge is stored elsewhere — in the MLP layers that make up the bulk of the model.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Why we run our own models on home infrastructure</title><link>https://blog.felesh.ai/en/insights/our-own-models/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/our-own-models/</guid><description>Running models on our own infrastructure isn&apos;t just a technical choice; it&apos;s a conviction about independence, keeping data at home, and durability. This is our decision and its reasoning.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Hire, coach, and learn to trust your AI employees</title><link>https://blog.felesh.ai/en/insights/owner-ai-employees/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/owner-ai-employees/</guid><description>You install software; you hire an employee. 
That small difference changes everything about how you build a digital organization.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>PagedAttention and continuous batching: how one server answers more users</title><link>https://blog.felesh.ai/en/engineering/llm-infra/paged-attention-continuous-batching/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/paged-attention-continuous-batching/</guid><description>Two infrastructure tricks multiply the capacity of a language-model server: continuous batching and smart KV-cache management. Both come from one simple idea — don&apos;t waste resources.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>A practical checklist for picking an LLM for your feature</title><link>https://blog.felesh.ai/en/tutorials/pick-an-llm-checklist/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/pick-an-llm-checklist/</guid><description>Choosing a model is less about leaderboards than about knowing your own need precisely. Six simple steps you can follow today.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>How LLM inference actually works: prefill vs decode</title><link>https://blog.felesh.ai/en/tutorials/prefill-vs-decode/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/prefill-vs-decode/</guid><description>Text generation has two phases with very different behaviour: one compute-bound, one memory-bound. 
Understanding the difference explains why the KV cache exists and why decode is slow.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>When no single model is enough: the primary and verifier pattern</title><link>https://blog.felesh.ai/en/engineering/model-selection/primary-and-verifier/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/primary-and-verifier/</guid><description>Sometimes a task has two critical demands that no single model satisfies at once. The answer isn&apos;t to accept a weak model; it&apos;s to combine two.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Common failure modes in LLM systems — and how to catch them</title><link>https://blog.felesh.ai/en/engineering/model-selection/production-failure-modes/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/production-failure-modes/</guid><description>A language model fails in specific ways, not random ones. If you know these modes, you can catch them before your users do.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Stop ranking LLMs, start profiling them</title><link>https://blog.felesh.ai/en/engineering/model-selection/profile-dont-rank-llms/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/profile-dont-rank-llms/</guid><description>A single number on a leaderboard won&apos;t tell you which model fits your job. A multi-dimensional profile of capabilities will.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Prompt engineering from zero: write a formal letter, not a text message</title><link>https://blog.felesh.ai/en/learn/llms-to-agents/prompt-engineering-from-zero/</link><guid isPermaLink="true">https://blog.felesh.ai/en/learn/llms-to-agents/prompt-engineering-from-zero/</guid><description>A language model can&apos;t read your mind. 
Everything you&apos;d take for granted in a casual chat has to be stated outright in the prompt — the six components that make a good one, and the five mistakes beginners make.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Defending against prompt injection and jailbreaks — and reducing hallucination</title><link>https://blog.felesh.ai/en/engineering/prompting/prompt-injection-and-defense/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/prompting/prompt-injection-and-defense/</guid><description>When user input can change an agent&apos;s behaviour, security becomes a design problem. A few clear principles neutralise most of these attacks.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Fine-tune your first model on free Colab: QLoRA in about 40 lines</title><link>https://blog.felesh.ai/en/tutorials/qlora-on-free-colab/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/qlora-on-free-colab/</guid><description>Fine-tuning a model doesn&apos;t have to need an expensive cluster. With QLoRA you can tune a small model on a free GPU in just a few dozen lines of code.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Making AI reliable on real, messy documents</title><link>https://blog.felesh.ai/en/insights/reliable-on-messy-documents/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/reliable-on-messy-documents/</guid><description>On a clean document, any model looks good. 
The difference is in the messy tail — where real documents live.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>A 70B model on one GPU: a practical guide to quantization</title><link>https://blog.felesh.ai/en/tutorials/run-a-70b-on-one-gpu/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/run-a-70b-on-one-gpu/</guid><description>A seventy-billion-parameter model needs about 140 GB of memory for its weights at 16-bit precision. With quantization you can compress that same model until it fits on a single GPU — and keep quality almost untouched.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Save first, then publish: a simple rule for not losing work</title><link>https://blog.felesh.ai/en/engineering/llm-infra/save-before-publish/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/save-before-publish/</guid><description>One of the most common hidden bugs in event-driven systems is publishing the news before the fact is recorded. The right order — save first, then publish — removes that bug at the root.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>One agent or many? When each one fits</title><link>https://blog.felesh.ai/en/learn/llms-to-agents/single-agent-vs-multi-agent/</link><guid isPermaLink="true">https://blog.felesh.ai/en/learn/llms-to-agents/single-agent-vs-multi-agent/</guid><description>Sometimes a single agent is the best answer, and sometimes the work should be split across several. 
The deciding factor is the complexity of the task, not how advanced the architecture looks.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Smaller, faster, cheaper: the case against one big model for everything</title><link>https://blog.felesh.ai/en/insights/smaller-faster-cheaper/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/smaller-faster-cheaper/</guid><description>It&apos;s tempting to hand every job to the most capable model. But most jobs need only one focused capability — and the right smaller model does it faster and cheaper.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The rule layer: deterministic guardrails around a probabilistic model</title><link>https://blog.felesh.ai/en/engineering/model-selection/the-rule-layer/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/the-rule-layer/</guid><description>A language model is probabilistic and sometimes errs. The way to make it reliable isn&apos;t to perfect the model; it&apos;s to build a deterministic layer that catches what slips through.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The self-learning colleague: a roadmap, not a release</title><link>https://blog.felesh.ai/en/insights/the-self-learning-colleague/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/the-self-learning-colleague/</guid><description>Today, an agent begins each conversation anew and learns nothing from yesterday&apos;s work. 
What follows is a vision, not a current capability: an agent that learns from its own work.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Tracing a request through a multi-agent system</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/trace-a-request-through-agents/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/trace-a-request-through-agents/</guid><description>The best way to understand a multi-agent architecture is to follow a real request from start to finish. Let&apos;s trace a vague message, step by step, into a structured action.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>What quantization actually does: precision loss and vector-space collapse</title><link>https://blog.felesh.ai/en/engineering/llm-infra/what-quantization-actually-does/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/what-quantization-actually-does/</guid><description>Quantization means holding a model&apos;s weights with fewer bits. But what exactly does this loss of precision do to the model, and why are models so robust to it?</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Where LLM serving costs actually go</title><link>https://blog.felesh.ai/en/insights/where-llm-serving-costs-go/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/where-llm-serving-costs-go/</guid><description>If you open up the bill for serving a model, most of the cost is concentrated in one place. 
Understanding that concentration also clarifies the eternal &apos;build or buy&apos; question.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Why LoRA works: the intrinsic-dimensionality story</title><link>https://blog.felesh.ai/en/engineering/fine-tuning/why-lora-works/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/fine-tuning/why-lora-works/</guid><description>If a large model has billions of parameters, how can you tune it by training only a few small matrices? The answer is a subtle idea: the change you need has a small intrinsic dimension.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Why we don&apos;t build chatbots: the two multi-agent paradigms we rejected</title><link>https://blog.felesh.ai/en/insights/why-not-chatbots/</link><guid isPermaLink="true">https://blog.felesh.ai/en/insights/why-not-chatbots/</guid><description>Most AI systems are built in one of two shapes — and both break at scale. This is the story of what we chose instead.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>From worker to specialist: an agent that owns a domain</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/worker-to-specialist/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/worker-to-specialist/</guid><description>The difference between an executing worker and a specialist is that the first does one job and steps aside, while the second owns a domain and carries its state over time.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Working with LLM APIs: first calls, tokens, and structured output</title><link>https://blog.felesh.ai/en/tutorials/working-with-llm-apis/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/working-with-llm-apis/</guid><description>Your first call to a language-model API is simpler than it looks. 
Once you know a few basics — roles, tokens, temperature, and structured output — the rest falls into place.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Zero-shot, few-shot, or chain-of-thought: picking the right technique</title><link>https://blog.felesh.ai/en/tutorials/zero-shot-few-shot-cot/</link><guid isPermaLink="true">https://blog.felesh.ai/en/tutorials/zero-shot-few-shot-cot/</guid><description>There are three basic prompting techniques, and each has its place. Knowing when to reach for which matters more than the techniques themselves.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item></channel></rss>