<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Engineering — Felesh</title><description>Engineering</description><link>https://blog.felesh.ai/</link><item><title>Three lessons the human mind offers agent architecture</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/brain-lessons-for-agent-architecture/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/brain-lessons-for-agent-architecture/</guid><description>Several hard problems in agent design rhyme with how the human mind works: focus over clutter, knowing when a job is done, and separating the layers of memory. The resemblances are a good guide for design.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Cognitive Least Privilege: your agent should know only what it needs</title><link>https://blog.felesh.ai/en/engineering/prompting/cognitive-least-privilege/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/prompting/cognitive-least-privilege/</guid><description>Any information that doesn&apos;t serve the agent&apos;s job both lowers accuracy and widens the attack surface. Borrow least privilege from security and extend it to what an agent knows.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Evaluate models on your own set, not the public leaderboard</title><link>https://blog.felesh.ai/en/engineering/model-selection/evaluate-on-your-own-set/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/evaluate-on-your-own-set/</guid><description>Public leaderboards tell you less about your work than you think. 
The reliable method: build a small eval set that represents your real task, and score the candidates on that.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Event-driven by design: agent teams that don&apos;t lose messages</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/event-driven-agent-teams/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/event-driven-agent-teams/</guid><description>When several agents work together, the biggest risk is lost messages and a collapsing chain. An event-driven architecture removes that risk with a few simple rules.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Common fine-tuning pitfalls and how to debug them</title><link>https://blog.felesh.ai/en/engineering/fine-tuning/fine-tuning-pitfalls/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/fine-tuning/fine-tuning-pitfalls/</guid><description>Most failed fine-tunes trace back to a few recurring patterns. If you know the signs, debugging becomes a simple checklist instead of guesswork.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The LoRA family: QLoRA, DoRA, and LoRA+ — which, and when?</title><link>https://blog.felesh.ai/en/engineering/fine-tuning/lora-family-variants/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/fine-tuning/lora-family-variants/</guid><description>Since LoRA was introduced, several improved variants have appeared, each targeting one particular problem. 
Knowing them helps you pick the right one for each job.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>MLP is the model&apos;s memory: where knowledge lives</title><link>https://blog.felesh.ai/en/engineering/llm-infra/mlp-is-the-models-memory/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/mlp-is-the-models-memory/</guid><description>In a language model, attention layers route information, but the actual knowledge is stored elsewhere — in the MLP layers that make up the bulk of the model.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>PagedAttention and continuous batching: how one server answers more users</title><link>https://blog.felesh.ai/en/engineering/llm-infra/paged-attention-continuous-batching/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/paged-attention-continuous-batching/</guid><description>Two infrastructure tricks multiply the capacity of a language-model server: continuous batching and smart KV-cache management. Both come from one simple idea — don&apos;t waste resources.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>When no single model is enough: the primary and verifier pattern</title><link>https://blog.felesh.ai/en/engineering/model-selection/primary-and-verifier/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/primary-and-verifier/</guid><description>Sometimes a task has two critical demands that no single model satisfies at once. 
The answer isn&apos;t to accept a weak model; it&apos;s to combine two.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Common failure modes in LLM systems — and how to catch them</title><link>https://blog.felesh.ai/en/engineering/model-selection/production-failure-modes/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/production-failure-modes/</guid><description>A language model fails in specific ways, not random ones. If you know these modes, you can catch them before your users do.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Stop ranking LLMs, start profiling them</title><link>https://blog.felesh.ai/en/engineering/model-selection/profile-dont-rank-llms/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/profile-dont-rank-llms/</guid><description>A single number on a leaderboard won&apos;t tell you which model fits your job. A multi-dimensional profile of capabilities will.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Defending against prompt injection and jailbreaks — and reducing hallucination</title><link>https://blog.felesh.ai/en/engineering/prompting/prompt-injection-and-defense/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/prompting/prompt-injection-and-defense/</guid><description>When user input can change an agent&apos;s behaviour, security becomes a design problem. 
A few clear principles neutralise most of these attacks.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Save first, then publish: a simple rule for not losing work</title><link>https://blog.felesh.ai/en/engineering/llm-infra/save-before-publish/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/save-before-publish/</guid><description>One of the most common hidden bugs in event-driven systems is publishing the news before the fact is recorded. The right order — save first, then publish — removes that bug at the root.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>The rule layer: deterministic guardrails around a probabilistic model</title><link>https://blog.felesh.ai/en/engineering/model-selection/the-rule-layer/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/model-selection/the-rule-layer/</guid><description>A language model is probabilistic and sometimes errs. The way to make it reliable isn&apos;t to perfect the model; it&apos;s to build a deterministic layer that catches what slips through.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Tracing a request through a multi-agent system</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/trace-a-request-through-agents/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/trace-a-request-through-agents/</guid><description>The best way to understand a multi-agent architecture is to follow a real request from start to finish. 
Let&apos;s trace a vague message, step by step, into a structured action.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>What quantization actually does: precision loss and vector-space collapse</title><link>https://blog.felesh.ai/en/engineering/llm-infra/what-quantization-actually-does/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/llm-infra/what-quantization-actually-does/</guid><description>Quantization means storing a model&apos;s weights using fewer bits. But what exactly does this loss of precision do to the model, and why are models so robust to it?</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>Why LoRA works: the intrinsic-dimensionality story</title><link>https://blog.felesh.ai/en/engineering/fine-tuning/why-lora-works/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/fine-tuning/why-lora-works/</guid><description>If a large model has billions of parameters, how can you tune it by training only a few small matrices? The answer is a subtle idea: the change you need has a small intrinsic dimension.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item><item><title>From worker to specialist: an agent that owns a domain</title><link>https://blog.felesh.ai/en/engineering/agent-architecture/worker-to-specialist/</link><guid isPermaLink="true">https://blog.felesh.ai/en/engineering/agent-architecture/worker-to-specialist/</guid><description>The difference between an executing worker and a specialist is that the first does one job and steps aside, while the second owns a domain and carries its state over time.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate></item></channel></rss>