Engineering — Felesh

https://blog.felesh.ai/

Three lessons the human mind offers agent architecture
https://blog.felesh.ai/en/engineering/agent-architecture/brain-lessons-for-agent-architecture/
Several hard problems in agent design rhyme with how the human mind works: focus over clutter, knowing when a job is done, and separating the layers of memory. The resemblances are a good guide for design.
Sun, 21 Jun 2026 00:00:00 GMT

Cognitive Least Privilege: your agent should know only what it needs
https://blog.felesh.ai/en/engineering/prompting/cognitive-least-privilege/
Any information that doesn't serve the agent's job both lowers accuracy and widens the attack surface. Borrow least privilege from security and extend it to what an agent knows.
Sun, 21 Jun 2026 00:00:00 GMT

Evaluate models on your own set, not the public leaderboard
https://blog.felesh.ai/en/engineering/model-selection/evaluate-on-your-own-set/
Public leaderboards tell you less about your work than you think. The reliable method: build a small eval set that represents your real task, and score the candidates on that.
Sun, 21 Jun 2026 00:00:00 GMT

Event-driven by design: agent teams that don't lose messages
https://blog.felesh.ai/en/engineering/agent-architecture/event-driven-agent-teams/
When several agents work together, the biggest risk is lost messages and a collapsing chain. An event-driven architecture removes that risk with a few simple rules.
Sun, 21 Jun 2026 00:00:00 GMT

Common fine-tuning pitfalls and how to debug them
https://blog.felesh.ai/en/engineering/fine-tuning/fine-tuning-pitfalls/
Most failed fine-tunes trace back to a few recurring patterns. If you know the signs, debugging becomes a simple checklist instead of guesswork.
Sun, 21 Jun 2026 00:00:00 GMT

The LoRA family: QLoRA, DoRA, and LoRA+ — which, and when?
https://blog.felesh.ai/en/engineering/fine-tuning/lora-family-variants/
Since LoRA was introduced, several improved variants have appeared, each targeting one particular problem. Knowing them helps you pick the right one for each job.
Sun, 21 Jun 2026 00:00:00 GMT

MLP is the model's memory: where knowledge lives
https://blog.felesh.ai/en/engineering/llm-infra/mlp-is-the-models-memory/
In a language model, attention layers route information, but the actual knowledge is stored elsewhere: in the MLP layers that make up the bulk of the model.
Sun, 21 Jun 2026 00:00:00 GMT

PagedAttention and continuous batching: how one server answers more users
https://blog.felesh.ai/en/engineering/llm-infra/paged-attention-continuous-batching/
Two infrastructure tricks multiply the capacity of a language-model server: continuous batching and smart KV-cache management. Both come from one simple idea: don't waste resources.
Sun, 21 Jun 2026 00:00:00 GMT

When no single model is enough: the primary and verifier pattern
https://blog.felesh.ai/en/engineering/model-selection/primary-and-verifier/
Sometimes a task has two critical demands that no single model satisfies at once. The answer isn't to accept a weak model; it's to combine two.
Sun, 21 Jun 2026 00:00:00 GMT

Common failure modes in LLM systems — and how to catch them
https://blog.felesh.ai/en/engineering/model-selection/production-failure-modes/
A language model fails in specific ways, not random ones. If you know these modes, you can catch them before your users do.
Sun, 21 Jun 2026 00:00:00 GMT

Stop ranking LLMs, start profiling them
https://blog.felesh.ai/en/engineering/model-selection/profile-dont-rank-llms/
A single number on a leaderboard won't tell you which model fits your job. A multi-dimensional profile of capabilities will.
Sun, 21 Jun 2026 00:00:00 GMT

Defending against prompt injection and jailbreaks — and reducing hallucination
https://blog.felesh.ai/en/engineering/prompting/prompt-injection-and-defense/
When user input can change an agent's behaviour, security becomes a design problem. A few clear principles neutralise most of these attacks.
Sun, 21 Jun 2026 00:00:00 GMT

Save first, then publish: a simple rule for not losing work
https://blog.felesh.ai/en/engineering/llm-infra/save-before-publish/
One of the most common hidden bugs in event-driven systems is publishing the news before the fact is recorded. The right order (save first, then publish) removes that bug at the root.
Sun, 21 Jun 2026 00:00:00 GMT

The rule layer: deterministic guardrails around a probabilistic model
https://blog.felesh.ai/en/engineering/model-selection/the-rule-layer/
A language model is probabilistic and sometimes errs. The way to make it reliable isn't to perfect the model; it's to build a deterministic layer that catches what slips through.
Sun, 21 Jun 2026 00:00:00 GMT

Tracing a request through a multi-agent system
https://blog.felesh.ai/en/engineering/agent-architecture/trace-a-request-through-agents/
The best way to understand a multi-agent architecture is to follow a real request from start to finish. Let's trace a vague message, step by step, into a structured action.
Sun, 21 Jun 2026 00:00:00 GMT

What quantization actually does: precision loss and vector-space collapse
https://blog.felesh.ai/en/engineering/llm-infra/what-quantization-actually-does/
Quantization means holding a model's weights with fewer bits. But what exactly does this loss of precision do to the model, and why are models so robust to it?
Sun, 21 Jun 2026 00:00:00 GMT

Why LoRA works: the intrinsic-dimensionality story
https://blog.felesh.ai/en/engineering/fine-tuning/why-lora-works/
If a large model has billions of parameters, how can you tune it by training only a few small matrices? The answer is a subtle idea: the change you need has a small intrinsic dimension.
Sun, 21 Jun 2026 00:00:00 GMT

From worker to specialist: an agent that owns a domain
https://blog.felesh.ai/en/engineering/agent-architecture/worker-to-specialist/
The difference between an executing worker and a specialist is that the first does one job and steps aside, while the second owns a domain and carries its state over time.
Sun, 21 Jun 2026 00:00:00 GMT