May 26, 2026

Three stories cleared the bar: the arXiv paper "Constraint Decay: The Fragility of LLM Agents in Back End Code Generation"; the llama.cpp server fix for checkpoint creation (PR #22929); and DeepSeek Reasonix, a native coding agent with high caching and low cost.

Worth mentioning

An arXiv paper documents "constraint decay": LLM agents progressively fail to maintain stated constraints (security requirements, API contracts, error-handling rules) across multi-step backend code generation tasks. The longer and more complex the session, the more constraints are silently dropped. Directly relevant to anyone running agentic coding loops (nightly-librarian, second-brain). Practical mitigations: shorter sessions, explicit re-injection of constraints at each step, and structured output validation. No vendor-provided fix exists; this is a fundamental model behavior pattern.
llama.cpp PR #22929 fixes KV cache checkpoint creation in the server, enabling save and restore of conversation state without reprocessing the full context. The Reddit discussion highlights the workflow value: discuss a problem for 50k tokens, then kick off a long implementation task and save your place. Particularly useful for solo devs running long agentic coding sessions on local models via llama.cpp or Ollama. Watch for this to ship in a stable llama.cpp release.
DeepSeek Reasonix is a coding agent built on DeepSeek V4 that uses aggressive KV caching to reduce cost per agent loop. More importantly, the related Hacker News thread reports that DeepSeek has made the V4 Pro pricing discount permanent. If you're evaluating API providers for agent workloads, DeepSeek V4 Pro is now a stable pricing option rather than a promotional one. Check current pricing against Anthropic/OpenAI for batch/cached workloads.
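
Two of the mitigations named in the constraint-decay item above (re-injecting constraints at each step, and structured output validation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's method: the constraint list, the JSON reply shape (`file`, `code`, `constraints_checked`), and both function names are hypothetical, and the call to the model itself is left out.

```python
import json

# Hypothetical constraints; in a real loop these come from the task spec.
CONSTRAINTS = [
    "Validate all user input before it reaches the database.",
    "Return errors as JSON objects with 'code' and 'message' keys.",
    "Never log secrets or access tokens.",
]

# Assumed reply shape for this sketch; not a standard of any tool.
REQUIRED_KEYS = {"file", "code", "constraints_checked"}


def build_step_prompt(step_description: str, constraints: list[str]) -> str:
    """Prepend the full constraint list to every step prompt (re-injection)."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(constraints, start=1))
    return (
        "Constraints (apply to this step and every other step):\n"
        f"{numbered}\n\n"
        f"Step: {step_description}\n"
        "Reply with a JSON object with keys: file, code, constraints_checked."
    )


def validate_step_output(raw: str, constraints: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the reply passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(data))]

    checked = data.get("constraints_checked", [])
    if not isinstance(checked, list):
        problems.append("constraints_checked is not a list")
        checked = []

    # Keep only plain integers so malformed entries cannot raise.
    acknowledged = {
        n for n in checked if isinstance(n, int) and not isinstance(n, bool)
    }
    for n in range(1, len(constraints) + 1):
        if n not in acknowledged:
            problems.append(f"constraint {n} not acknowledged")
    return problems
```

In an agent loop, each step would call `build_step_prompt`, send the result to the model, and retry or stop when `validate_step_output` returns a non-empty list. The acknowledgement check only catches constraints the model stops mentioning; it does not prove the generated code honours them, so tests or static checks on the `code` field are still needed.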