The Prompt Software Crisis: Engineering Challenges of Agentic AI Systems
The rise of large language models has created a "prompt software crisis" for agentic AI: fragile natural‑language prompts undermine robustness, observability, and adaptability, and existing software‑engineering methods do not address these problems, which motivates a new systematic framework.
The rapid emergence of large language models such as ChatGPT and DeepSeek has shifted the paradigm from traditional search engines to generative AI agents that can autonomously plan, reason, and execute complex tasks. In practice, many teams start building agents with frameworks like LangChain or AutoGen and quickly produce an "80‑point" demo, which then proves extremely difficult to evolve into a production‑grade "99‑point" system.
This difficulty is compounded by the prompt software crisis: agents rely on brittle natural‑language prompts, and any change in the underlying LLM can cause widespread prompt failures. Researchers have coined the term "Prompt Migration" to describe how carefully crafted prompts become ineffective after a model upgrade, exposing a methodological dead end where engineering discipline is missing.
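One way to make prompt migration failures visible is to rerun a fixed set of prompt cases against each model version and compare pass rates. The sketch below is illustrative only: `model_v1` and `model_v2` are hypothetical stubs standing in for real LLM calls, and the template, cases, and function names are assumptions rather than anything taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

PROMPT_TEMPLATE = "Answer with YES or NO only. Is this a refund request? Message: {message}"

# Hypothetical regression cases: (user message, expected label).
CASES: List[Tuple[str, str]] = [
    ("I want a refund for my order", "YES"),
    ("What are your opening hours?", "NO"),
]


def _message_of(prompt: str) -> str:
    """Extract the user message that follows the 'Message:' marker."""
    return prompt.split("Message:", 1)[1].lower()


def model_v1(prompt: str) -> str:
    # Stub: the "old" model follows the YES/NO output format exactly.
    return "YES" if "refund" in _message_of(prompt) else "NO"


def model_v2(prompt: str) -> str:
    # Stub: the "upgraded" model is right in substance but ignores the format,
    # which is enough to break a parser that expects a bare YES or NO.
    if "refund" in _message_of(prompt):
        return "Yes, that looks like a refund request."
    return "No."


def pass_rate(model: Callable[[str], str]) -> float:
    """Fraction of cases whose raw output exactly matches the expected label."""
    passed = sum(
        1
        for message, expected in CASES
        if model(PROMPT_TEMPLATE.format(message=message)).strip() == expected
    )
    return passed / len(CASES)


def migration_report(models: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Pass rate of the same prompt suite for each model version."""
    return {name: pass_rate(model) for name, model in models.items()}


report = migration_report({"v1": model_v1, "v2": model_v2})
print(report)
```

With these stubs the same prompt passes every case on the first model and none on the second, even though the second model's answers are semantically correct. That is the kind of silent breakage the term describes.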
The crisis manifests in three core deficiencies:
Robustness loss: Without strict engineering constraints, the nondeterministic nature of LLMs leads to divergent macro‑planning and micro‑intent speculation, causing agents to deviate from intended goals.
Observability deficiency: Both single‑agent and multi‑agent systems hide internal reasoning behind input‑output black boxes, which makes debugging difficult and reliability hard to assure.
Adaptivity gap: Existing agents lack mechanisms to capture, consolidate, and reuse runtime experience, which prevents self‑evolution and leads to repeated errors.
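To make the observability point concrete, the following sketch records every agent step (its name, inputs, output, and duration) in a structured trace instead of exposing only the final input and output. It is a minimal illustration under stated assumptions: `Trace`, `StepRecord`, `plan`, and `execute` are hypothetical names, and the planner and executor are stubs rather than LLM calls.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StepRecord:
    step: str
    inputs: Dict[str, Any]
    output: Any
    duration_s: float


@dataclass
class Trace:
    records: List[StepRecord] = field(default_factory=list)

    def run(self, step: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        """Call fn(**inputs), record what went in and what came out, return the output."""
        start = time.perf_counter()
        output = fn(**inputs)
        elapsed = time.perf_counter() - start
        self.records.append(StepRecord(step, inputs, output, elapsed))
        return output

    def to_json(self) -> str:
        """Serialize the whole trace so it can be stored or inspected later."""
        return json.dumps([asdict(r) for r in self.records], indent=2)


# Hypothetical stand-ins for an LLM planner and an action executor.
def plan(goal: str) -> List[str]:
    return [f"open app for: {goal}", "tap confirm"]


def execute(action: str) -> str:
    return f"done: {action}"


trace = Trace()
actions = trace.run("plan", plan, goal="book a taxi")
results = [trace.run("execute", execute, action=a) for a in actions]
print(trace.to_json())
```

A stored trace of this kind is also a natural starting point for the adaptivity gap, since runtime experience has to be captured before it can be consolidated and reused.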
To address these challenges, researchers have tried importing traditional software‑engineering techniques:
Goal‑oriented requirements engineering (GORE) and hierarchical task networks (HTN) with runtime extensions like Tropos4AS, which enforce pre‑defined goal models but clash with the dynamic goal generation of LLM‑driven agents.
Multi‑agent system (MAS) architectures, BDI models, and MS‑HTN frameworks, which assume predictable symbolic agents and therefore cannot manage the emergent, black‑box behavior of modern LLM agents.
Control‑loop models such as MAPE‑K and Models@Runtime, which provide static or semi‑dynamic knowledge bases but lack genuine learning pipelines to absorb transient LLM outputs.
These attempts illustrate why traditional paradigms fail: they rely on design‑time specifications that cannot constrain runtime emergent behavior.
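The limitation attributed to control‑loop models can be illustrated with a deliberately simplified MAPE‑K cycle (Monitor, Analyze, Plan, Execute over a shared Knowledge base). This is a sketch, not the paper's formulation: the metric names, symptoms, and actions are hypothetical, and the knowledge base is a fixed table written at design time.

```python
from typing import Dict, List, Optional

# Knowledge: a symptom -> action table fixed at design time (hypothetical entries).
KNOWLEDGE: Dict[str, str] = {
    "high_latency": "scale_out",
    "low_disk": "purge_cache",
}


def monitor(metrics: Dict[str, float]) -> Dict[str, float]:
    """Monitor: collect raw observations (passed through unchanged here)."""
    return metrics


def analyze(metrics: Dict[str, float]) -> Optional[str]:
    """Analyze: map observations to a named symptom, if any."""
    if metrics.get("latency_ms", 0) > 500:
        return "high_latency"
    if metrics.get("disk_free_gb", 100) < 5:
        return "low_disk"
    if metrics.get("off_goal_steps", 0) > 0:
        return "agent_drift"  # an emergent symptom the designers did not plan for
    return None


def plan(symptom: Optional[str]) -> Optional[str]:
    """Plan: look the symptom up in the knowledge base."""
    return KNOWLEDGE.get(symptom) if symptom else None


def execute(action: Optional[str], log: List[str]) -> None:
    """Execute: apply the action, or record that nothing could be done."""
    log.append(action or "no_action")


def mape_k_cycle(metrics: Dict[str, float], log: List[str]) -> None:
    execute(plan(analyze(monitor(metrics))), log)


log: List[str] = []
mape_k_cycle({"latency_ms": 800}, log)    # known symptom: handled
mape_k_cycle({"off_goal_steps": 3}, log)  # unknown symptom: not handled
mape_k_cycle({"off_goal_steps": 3}, log)  # same symptom again: still not handled
print(log)
```

The loop detects the drift both times but has no step that writes anything back into `KNOWLEDGE`, so the second occurrence is handled no better than the first.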
Consequently, the authors of the paper propose a dedicated Agentic AI software‑engineering framework composed of three complementary methodologies, each targeting one of the three deficiencies identified above. The framework is demonstrated on a mobile GUI agent named Fairy, showing how the approach can be engineered into a real‑world system.
For a detailed exposition of the three methodologies and their implementation, see the accompanying paper (https://arxiv.org/abs/2509.20729).
