Felesh

Latest posts from Felesh
https://blog.felesh.ai/

Why a person approves before the AI acts
https://blog.felesh.ai/en/method/a-person-approves-before-the-ai-acts/
Autonomy isn't the goal — trustworthy action is. Our agents propose; a person approves or sends a revision note; only then does the work commit. Authority is earned, not granted.
Tue, 23 Jun 2026 00:00:00 GMT

Every action carries its reason
https://blog.felesh.ai/en/method/every-action-carries-its-reason/
Most systems record what changed. The thing you can never reconstruct later is why — so we make the reason a mandatory part of every change, captured the moment it happens.
Tue, 23 Jun 2026 00:00:00 GMT

One relationship, many specialists
https://blog.felesh.ai/en/method/one-relationship-many-specialists/
A customer should feel one relationship, not be handed from bot to bot. Behind a single surface, specialists each own a domain — unified by default, separable by a setting, never by a rebuild.
Tue, 23 Jun 2026 00:00:00 GMT

Agents that act, not just answer — safely
https://blog.felesh.ai/en/insights/agents-that-act-safely/
The real leap is when a system doesn't just answer but gets something done in a real system. And acting makes mistakes costly — so 'safely' has to be structural.
Sun, 21 Jun 2026 00:00:00 GMT

AI that survives an audit
https://blog.felesh.ai/en/insights/ai-that-survives-an-audit/
When an intelligent system makes a decision, sooner or later someone asks: 'why?' A system that can't answer fails an audit.
We believe auditability has to be in the design from day one, not a later add-on.
Sun, 21 Jun 2026 00:00:00 GMT

Attentive AI: why focused beats big
https://blog.felesh.ai/en/insights/attentive-ai/
The biggest model is not the best answer. When every job has a defined boundary, focused intelligence beats jack-of-all-trades intelligence every time.
Sun, 21 Jun 2026 00:00:00 GMT

Authority is a trajectory, not a key
https://blog.felesh.ai/en/insights/authority-as-a-trajectory/
You don't hand an intern the keys to everything on day one. AI agents should earn their authority along a path too — just like a career.
Sun, 21 Jun 2026 00:00:00 GMT

Three lessons the human mind offers agent architecture
https://blog.felesh.ai/en/engineering/agent-architecture/brain-lessons-for-agent-architecture/
Several hard problems in agent design rhyme with how the human mind works: focus over clutter, knowing when a job is done, and separating the layers of memory. The resemblances are a good guide for design.
Sun, 21 Jun 2026 00:00:00 GMT

Cognitive Least Privilege: your agent should know only what it needs
https://blog.felesh.ai/en/engineering/prompting/cognitive-least-privilege/
Any information that doesn't serve the agent's job both lowers accuracy and widens the attack surface. Borrow least privilege from security and extend it to what an agent knows.
Sun, 21 Jun 2026 00:00:00 GMT

CRM is a special case of a larger pattern
https://blog.felesh.ai/en/insights/crm-is-a-special-case/
What we've built to manage customer relationships looks, at first glance, like a CRM.
But underneath it lies a more general pattern that reaches beyond relationships.
Sun, 21 Jun 2026 00:00:00 GMT

You're not buying software — you're buying an ecosystem that compounds
https://blog.felesh.ai/en/insights/ecosystem-that-compounds/
A tool solves one problem once, and stays there. What we build is an ecosystem whose intelligence compounds with every addition.
Sun, 21 Jun 2026 00:00:00 GMT

Evaluate models on your own set, not the public leaderboard
https://blog.felesh.ai/en/engineering/model-selection/evaluate-on-your-own-set/
Public leaderboards tell you less about your work than you think. The reliable method: build a small eval set that represents your real task, and score the candidates on that.
Sun, 21 Jun 2026 00:00:00 GMT

Event-driven by design: agent teams that don't lose messages
https://blog.felesh.ai/en/engineering/agent-architecture/event-driven-agent-teams/
When several agents work together, the biggest risk is lost messages and a collapsing chain. An event-driven architecture removes that risk with a few simple rules.
Sun, 21 Jun 2026 00:00:00 GMT

Fine-tune, RAG, or prompt: which one, and what each costs
https://blog.felesh.ai/en/insights/fine-tune-rag-or-prompt/
There are three ways to adapt a model to your need, and the wrong choice can get expensive. The difference is in what problem each one actually solves.
Sun, 21 Jun 2026 00:00:00 GMT

Common fine-tuning pitfalls and how to debug them
https://blog.felesh.ai/en/engineering/fine-tuning/fine-tuning-pitfalls/
Most failed fine-tunes trace back to a few recurring patterns.
If you know the signs, debugging becomes a simple checklist instead of guesswork.
Sun, 21 Jun 2026 00:00:00 GMT

From LLMs to agents: the complete journey
https://blog.felesh.ai/en/learn/llms-to-agents/from-llms-to-agents/
A language model, at heart, just guesses the next word. Here's how that simple guess becomes an agent once you add tools, memory, and planning — and where you actually need one.
Sun, 21 Jun 2026 00:00:00 GMT

From prompt engineering to context engineering
https://blog.felesh.ai/en/insights/from-prompt-to-context-engineering/
There was a time when the art of working with a model came down to writing a good prompt. But we believe the centre of gravity is shifting: from crafting one instruction to designing the whole context the model works in.
Sun, 21 Jun 2026 00:00:00 GMT

From a vague message to the right structured action
https://blog.felesh.ai/en/insights/from-vague-message-to-action/
Humans speak vaguely, incompletely, messily. The hard job of an intelligent system is to pull the right action out of that mess — without asking the user to speak like a form.
Sun, 21 Jun 2026 00:00:00 GMT

How agents remember: memory and knowledge representation
https://blog.felesh.ai/en/learn/llms-to-agents/how-agents-remember/
A language model has no memory of its own and begins each conversation from scratch. What gives an agent memory is the layers built around the model.
Sun, 21 Jun 2026 00:00:00 GMT

How an AI agent thinks
https://blog.felesh.ai/en/learn/llms-to-agents/how-agents-think/
An agent takes a goal, reasons, acts, and observes the result — repeating that loop until it's done.
A simple mental model for how agents reason and decide.
Sun, 21 Jun 2026 00:00:00 GMT

Layered intelligence: when conversational AI meets background AI
https://blog.felesh.ai/en/insights/layered-intelligence/
A good agent has to do two things well at once: fluent conversation with a human, and deep cognitive work behind the scenes. These are two different layers of intelligence — and we believe separating them is the key.
Sun, 21 Jun 2026 00:00:00 GMT

The LoRA family: QLoRA, DoRA, and LoRA+ — which, and when?
https://blog.felesh.ai/en/engineering/fine-tuning/lora-family-variants/
Since LoRA was introduced, several improved variants have appeared, each targeting one particular problem. Knowing them helps you pick the right one for each job.
Sun, 21 Jun 2026 00:00:00 GMT

LoRA hyperparameters demystified: rank, alpha, and what to set
https://blog.felesh.ai/en/tutorials/lora-hyperparameters-demystified/
Fine-tuning with LoRA has a handful of key numbers, and once you know what they mean, choosing them is simple. This guide clears up rank, alpha, learning rate, and the rest.
Sun, 21 Jun 2026 00:00:00 GMT

MLP is the model's memory: where knowledge lives
https://blog.felesh.ai/en/engineering/llm-infra/mlp-is-the-models-memory/
In a language model, attention layers route information, but the actual knowledge is stored elsewhere — in the MLP layers that make up the bulk of the model.
Sun, 21 Jun 2026 00:00:00 GMT

Why we run our own models on home infrastructure
https://blog.felesh.ai/en/insights/our-own-models/
Running models on our own infrastructure isn't just a technical choice; it's a conviction about independence, keeping data at home, and durability.
This is our decision and its reasoning.
Sun, 21 Jun 2026 00:00:00 GMT

Hire, coach, and learn to trust your AI employees
https://blog.felesh.ai/en/insights/owner-ai-employees/
You install software; you hire an employee. That small difference changes everything about how you build a digital organization.
Sun, 21 Jun 2026 00:00:00 GMT

PagedAttention and continuous batching: how one server answers more users
https://blog.felesh.ai/en/engineering/llm-infra/paged-attention-continuous-batching/
Two infrastructure tricks multiply the capacity of a language-model server: continuous batching and smart KV-cache management. Both come from one simple idea — don't waste resources.
Sun, 21 Jun 2026 00:00:00 GMT

A practical checklist for picking an LLM for your feature
https://blog.felesh.ai/en/tutorials/pick-an-llm-checklist/
Choosing a model is less about leaderboards than about knowing your own need precisely. Six simple steps you can follow today.
Sun, 21 Jun 2026 00:00:00 GMT

How LLM inference actually works: prefill vs decode
https://blog.felesh.ai/en/tutorials/prefill-vs-decode/
Text generation has two phases with very different behaviour: one compute-bound, one memory-bound. Understanding the difference explains why the KV cache exists and why decode is slow.
Sun, 21 Jun 2026 00:00:00 GMT

When no single model is enough: the primary and verifier pattern
https://blog.felesh.ai/en/engineering/model-selection/primary-and-verifier/
Sometimes a task has two critical demands that no single model satisfies at once.
The answer isn't to accept a weak model; it's to combine two.
Sun, 21 Jun 2026 00:00:00 GMT

Common failure modes in LLM systems — and how to catch them
https://blog.felesh.ai/en/engineering/model-selection/production-failure-modes/
A language model fails in specific ways, not random ones. If you know these modes, you can catch them before your users do.
Sun, 21 Jun 2026 00:00:00 GMT

Stop ranking LLMs, start profiling them
https://blog.felesh.ai/en/engineering/model-selection/profile-dont-rank-llms/
A single number on a leaderboard won't tell you which model fits your job. A multi-dimensional profile of capabilities will.
Sun, 21 Jun 2026 00:00:00 GMT

Prompt engineering from zero: write a formal letter, not a text message
https://blog.felesh.ai/en/learn/llms-to-agents/prompt-engineering-from-zero/
A language model can't read your mind. Everything you'd take for granted in a casual chat has to be stated outright in the prompt — the six components that make a good one, and the five mistakes beginners make.
Sun, 21 Jun 2026 00:00:00 GMT

Defending against prompt injection and jailbreaks — and reducing hallucination
https://blog.felesh.ai/en/engineering/prompting/prompt-injection-and-defense/
When user input can change an agent's behaviour, security becomes a design problem. A few clear principles neutralise most of these attacks.
Sun, 21 Jun 2026 00:00:00 GMT

Fine-tune your first model on free Colab: QLoRA in about 40 lines
https://blog.felesh.ai/en/tutorials/qlora-on-free-colab/
Fine-tuning a model doesn't have to need an expensive cluster.
With QLoRA you can tune a small model on a free GPU in just a few dozen lines of code.
Sun, 21 Jun 2026 00:00:00 GMT

Making AI reliable on real, messy documents
https://blog.felesh.ai/en/insights/reliable-on-messy-documents/
On a clean document, any model looks good. The difference is in the messy tail — where real documents live.
Sun, 21 Jun 2026 00:00:00 GMT

A 70B model on one GPU: a practical guide to quantization
https://blog.felesh.ai/en/tutorials/run-a-70b-on-one-gpu/
A seventy-billion-parameter model needs about 140 GB of memory at full precision. With quantization you can compress that same model until it fits on a single GPU — and keep quality almost untouched.
Sun, 21 Jun 2026 00:00:00 GMT

Save first, then publish: a simple rule for not losing work
https://blog.felesh.ai/en/engineering/llm-infra/save-before-publish/
One of the most common hidden bugs in event-driven systems is publishing the news before the fact is recorded. The right order — save first, then publish — removes that bug at the root.
Sun, 21 Jun 2026 00:00:00 GMT

One agent or many? When each one fits
https://blog.felesh.ai/en/learn/llms-to-agents/single-agent-vs-multi-agent/
Sometimes a single agent is the best answer, and sometimes the work should be split across several. The deciding factor is the complexity of the task, not how advanced the architecture looks.
Sun, 21 Jun 2026 00:00:00 GMT

Smaller, faster, cheaper: the case against one big model for everything
https://blog.felesh.ai/en/insights/smaller-faster-cheaper/
It's tempting to hand every job to the most capable model.
But most jobs need only one focused capability — and the right smaller model does it faster and cheaper.
Sun, 21 Jun 2026 00:00:00 GMT

The rule layer: deterministic guardrails around a probabilistic model
https://blog.felesh.ai/en/engineering/model-selection/the-rule-layer/
A language model is probabilistic and sometimes errs. The way to make it reliable isn't to perfect the model; it's to build a deterministic layer that catches what slips through.
Sun, 21 Jun 2026 00:00:00 GMT

The self-learning colleague: a roadmap, not a release
https://blog.felesh.ai/en/insights/the-self-learning-colleague/
Today, an agent begins each conversation anew and learns nothing from yesterday's work. What follows is a vision, not a current capability: an agent that learns from its own work.
Sun, 21 Jun 2026 00:00:00 GMT

Tracing a request through a multi-agent system
https://blog.felesh.ai/en/engineering/agent-architecture/trace-a-request-through-agents/
The best way to understand a multi-agent architecture is to follow a real request from start to finish. Let's trace a vague message, step by step, into a structured action.
Sun, 21 Jun 2026 00:00:00 GMT

What quantization actually does: precision loss and vector-space collapse
https://blog.felesh.ai/en/engineering/llm-infra/what-quantization-actually-does/
Quantization means holding a model's weights with fewer bits.
But what exactly does this loss of precision do to the model, and why are models so robust to it?
Sun, 21 Jun 2026 00:00:00 GMT

Where LLM serving costs actually go
https://blog.felesh.ai/en/insights/where-llm-serving-costs-go/
If you open up the bill for serving a model, most of the cost is concentrated in one place. Understanding that concentration also clarifies the eternal 'build or buy' question.
Sun, 21 Jun 2026 00:00:00 GMT

Why LoRA works: the intrinsic-dimensionality story
https://blog.felesh.ai/en/engineering/fine-tuning/why-lora-works/
If a large model has billions of parameters, how can you tune it by training only a few small matrices? The answer is a subtle idea: the change you need has a small intrinsic dimension.
Sun, 21 Jun 2026 00:00:00 GMT

Why we don't build chatbots: the two multi-agent paradigms we rejected
https://blog.felesh.ai/en/insights/why-not-chatbots/
Most AI systems are built in one of two shapes — and both break at scale. This is the story of what we chose instead.
Sun, 21 Jun 2026 00:00:00 GMT

From worker to specialist: an agent that owns a domain
https://blog.felesh.ai/en/engineering/agent-architecture/worker-to-specialist/
The difference between an executing worker and a specialist is that the first does one job and steps aside, while the second owns a domain and carries its state over time.
Sun, 21 Jun 2026 00:00:00 GMT

Working with LLM APIs: first calls, tokens, and structured output
https://blog.felesh.ai/en/tutorials/working-with-llm-apis/
Your first call to a language-model API is simpler than it looks.
Once you know a few basics — roles, tokens, temperature, and structured output — the rest falls into place.
Sun, 21 Jun 2026 00:00:00 GMT

Zero-shot, few-shot, or chain-of-thought: picking the right technique
https://blog.felesh.ai/en/tutorials/zero-shot-few-shot-cot/
There are three basic prompting techniques, and each has its place. Knowing when to reach for which matters more than the techniques themselves.
Sun, 21 Jun 2026 00:00:00 GMT