The Constraint Infrastructure Growing on Alibaba Cloud Agent Infra

This article introduces the constraint infrastructure on Alibaba Cloud Agent Infra, which ensures AI agents' runtime behaviors are controllable, observable, and continuously evolving.

By Wangchen

Agent = Model + Harness. This formula has gained market consensus because it very concisely summarizes the relationship among Agent, Model, and Harness, and provides a clear direction of investment for improving Agent quality.

However, Harness only answers directional questions. In practice, the engineering team faces a set of concrete platform requirements: How are constraint rules declared and version-managed? How do rule changes take effect within seconds without restarting services? How do constraint execution points cover every stage of the Agent lifecycle, from model invocation to task orchestration to O&M observation? How are abnormal behaviors detected in real time to trigger interception or degradation? How do the constraint rules themselves keep iterating as Agent capabilities evolve, rather than becoming obsolete once written? These requirements exceed what the methodology itself can cover; they need concrete platform capabilities and infrastructure to support them. This is far more complex than the simple formula Agent = Model + Harness.

The article "What Does Alibaba Cloud Agent Infra Look Like" described the six infrastructure capabilities of Alibaba Cloud Agent Infra: Agent Runtime, Agent Orchestration, Agent Governance, Agent Memory, Agent Data Plane, and Agent Security. Together they address unpredictable burst loads of Agents, large-scale dynamic orchestration of Agents, short Agent life cycles, complex Agent data modalities and storage forms, dynamic environment dependencies, and task-level safety and control. Building on that foundation, we further decompose Harness:

Harness = Define Constraints + Validate Outputs + Establish Feedback Loop

Among them, the constraint infrastructure refers to the platform capabilities and infrastructure within the Agent Infra that support Harness in constraining Agent behavior.

1. What is Constraint Infrastructure

Constraint infrastructure is the infrastructure layer that systematically guarantees the runtime behavior boundaries of Agents. It is responsible for turning the constraint principles of the Harness methodology into programmable, deployable, and operable engineering entities.

Specifically, constraint infrastructure needs to provide the following capabilities: declarative definition and version control of constraint rules, dynamic distribution and hot activation of rules during runtime, unified embedding of constraint execution points at all stages of the Agent lifecycle, real-time detection, interception, and automatic restoration of non-compliant behaviors, as well as observability and auditability of constraint effectiveness.

Constraint infrastructure does not replace existing Agent development frameworks or inference engines; it is a governance layer running on top of these components. Development frameworks focus on how Agents complete tasks, while constraint infrastructure focuses on which boundaries Agents must not cross while completing those tasks.

2. The Technical Stack of Constraint Infrastructure

Following the lifecycle of an Agent request, the constraint infrastructure can be broken down into four layers. Each layer corresponds to a clear set of constraint responsibilities.

2.1 Constraining Model Invocation Behavior: Who Can Call Which Models and How Much

Agent systems typically need to connect to multiple model providers, mixing models of different capability levels and costs. The constraint responsibility of the model access layer is to centrally manage and control the access policies of all model calls at the gateway level, rather than having each Agent application implement its own rate limiting and authentication.

As Alibaba Cloud's open-source AI gateway, Higress plays a core role in this layer. It provides model-level traffic control capabilities: unified routing of multiple model providers (dispatching requests to appropriate models based on task type), token-based usage limiting (unlike traditional QPS rate limiting, AI scenarios require quota management based on actual token consumption), and access policies for model calls.
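The difference between QPS limiting and token-based limiting is easier to see in code. The following is a minimal sketch in Python, not Higress's implementation or configuration syntax: the class name `TokenQuota`, the fixed-window strategy, and the limits are all assumptions chosen for illustration. The budget is spent in model tokens rather than in request counts.

```python
import time
from collections import defaultdict


class TokenQuota:
    """Fixed-window quota per consumer, counted in model tokens (hypothetical sketch)."""

    def __init__(self, limit_tokens, window_seconds, clock=time.monotonic):
        self.limit_tokens = limit_tokens
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        # consumer -> [window_start, tokens_used_in_window]
        self._usage = defaultdict(lambda: [None, 0])

    def try_consume(self, consumer, tokens):
        """Record `tokens` for `consumer`; return False if the window budget would be exceeded."""
        now = self.clock()
        state = self._usage[consumer]
        if state[0] is None or now - state[0] >= self.window_seconds:
            state[0], state[1] = now, 0  # start a new window
        if state[1] + tokens > self.limit_tokens:
            return False
        state[1] += tokens
        return True
```

With a limit of 1,000 tokens per minute, one 600-token call followed by a 500-token call from the same consumer would see the second call rejected, even though only two requests were made.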

Alibaba Cloud API Gateway's AI gateway capability further provides enterprise-level features on top of the open-source version of Higress: fully managed identity authentication and authorization, multi-dimensional control of calling quotas (by user / by application / by model), and mandatory validation of output format contracts. For the constraint infrastructure at the traffic ingress, model access policies are infrastructure-level concerns that should sink to the gateway for unified processing, rather than being scattered in the code of each Agent application.

2.2 Constraining Agent Runtime Behavior: Single-Agent Boundary Control and Multi-Agent Collaborative Governance

The traffic ingress layer controls model invocation behaviors, while the Agent runtime layer controls the execution process of the Agent itself. This layer needs to answer three questions: How is the Agent's Prompt behavior centrally managed and dynamically adjusted? How is the execution process of a single Agent observed and constrained? How is multi-agent collaboration made access-controlled and behavior-auditable?

Prompt as the core carrier of behavioral constraints. Prompt is the most direct means to define an Agent's behavior boundaries, but in a production environment, Prompts cannot be hardcoded strings in code. MSE Nacos AI manages Prompts as first-class configuration assets, providing a centralized asset library (unified storage of Prompts for all Agents), semantic version control (saving 30-day historical versions by default, supporting one-click rollback), sub-second hot updates (taking effect immediately after modification without restarting the application), and canary release strategies (canary by IP or by tag to reduce change risks). When the behavior boundary of a certain type of Agent needs to be tightened, the governance team modifies the Prompt and gradually verifies it through a canary release, without needing to redeploy business code throughout the process.
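The combination of versioning, canary release by tag, and rollback described above can be sketched with an in-memory store. This is an illustration only and does not use the Nacos client API: `PromptStore` and its methods are hypothetical names, and a real registry would also push changes to subscribers.

```python
class PromptStore:
    """In-memory stand-in for a centralized prompt registry (hypothetical, not the Nacos API)."""

    def __init__(self):
        self._versions = {}  # prompt name -> list of prompt texts, oldest first
        self._stable = {}    # prompt name -> index of the version served by default
        self._canary = {}    # prompt name -> (version index, set of instance tags)

    def publish(self, name, text, canary_tags=None):
        """Store a new version; serve it to everyone, or only to instances with a canary tag."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        index = len(versions) - 1
        if canary_tags and name in self._stable:
            self._canary[name] = (index, set(canary_tags))
        else:
            # The first version, or a full release, becomes the default immediately.
            self._stable[name] = index
            self._canary.pop(name, None)
        return index

    def promote(self, name):
        """Make the canary version the default for all instances."""
        index, _ = self._canary.pop(name)
        self._stable[name] = index

    def rollback(self, name):
        """Drop the canary so every instance falls back to the stable version."""
        self._canary.pop(name, None)

    def resolve(self, name, instance_tags=()):
        """Return the prompt text an instance with the given tags should use."""
        canary = self._canary.get(name)
        if canary and canary[1] & set(instance_tags):
            return self._versions[name][canary[0]]
        return self._versions[name][self._stable[name]]
```

Because Agents call `resolve` at request time rather than embedding the prompt text, tightening a boundary becomes a data change rather than a code deployment.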

Observation-driven dynamic constraints. Static rules can only cover known patterns, whereas abnormal Agent behaviors in production environments (infinite loops, output drift, tool abuse) often need to be intercepted based on runtime data. AgentLoop provides full-link tracing capabilities for LLM applications, automatically collecting golden indicators such as Token consumption, TTFT (Time to First Token), and TPOT (Time per Output Token), and connecting the complete path from user request to model inference based on the OpenTelemetry GenAI extension specification. On this basis, AgentLoop's evaluation system supports automated verification in three scenarios: basic LLM dialogue evaluation, RAG process evaluation, and Agent tool call evaluation, covering toxic content detection, safety review, content relevance, and tool selection accuracy. These evaluation capabilities constitute the engineering implementation of the "Validate Outputs" phase in Harness.
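For readers unfamiliar with the two latency indicators, the sketch below computes them from raw timestamps. It follows one common definition (TPOT as the average gap between output tokens after the first one) and is not taken from AgentLoop; the function name and input shape are assumptions.

```python
def ttft_and_tpot(request_sent_at, token_times):
    """Return (TTFT, TPOT) in seconds from a request timestamp and per-token arrival times."""
    if not token_times:
        raise ValueError("no output tokens were received")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent_at
    if len(token_times) == 1:
        return ttft, 0.0
    # TPOT: average time per token for the tokens that follow the first one.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot
```

A sudden rise in token count with stable TPOT is one signal that could indicate a looping Agent, which is why these indicators serve as inputs for runtime constraints and not only for performance tuning.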

Collaborative governance of multi-agents. When multiple Agents collaborate, the complexity of constraints rises significantly. AgentTeams, as an enterprise-grade multi-agent governance and collaboration platform, implements collaborative orchestration through a Leader-Worker architecture: the Leader is responsible for task decomposition and dispatch, and the Worker is responsible for execution. This architecture itself is a constraint design; Workers can only execute tasks dispatched by the Leader and cannot expand their action scope on their own. At the security level, AgentTeams achieves fine-grained permission control based on a zero-trust security model, where all communications between Agents are decoupled at the protocol level based on the Matrix protocol, supporting unified take-over of multi-source heterogeneous Agents. Instance-level resource isolation ensures that Agents from different business scenarios do not interfere with each other. The monitoring dashboard provides real-time visualization across three dimensions: Worker statistics, tasks and teams, and model calls, allowing the effectiveness of constraint execution in multi-agent environments to be continuously observed.
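The idea that the Leader-Worker architecture is itself a constraint can be shown in a few lines. This is a hypothetical sketch, not AgentTeams code: identities are plain strings here, whereas a zero-trust system would verify them cryptographically, and the names `Task` and `Worker` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    issued_by: str  # identity of the dispatcher (assumed to be already verified)
    action: str


class Worker:
    """A Worker that refuses anything not dispatched by its own Leader (hypothetical sketch)."""

    def __init__(self, name, leader, allowed_actions):
        self.name = name
        self.leader = leader
        self.allowed_actions = frozenset(allowed_actions)

    def execute(self, task):
        # Constraint 1: only the assigned Leader may dispatch work.
        if task.issued_by != self.leader:
            raise PermissionError(f"{task.task_id}: not dispatched by {self.leader}")
        # Constraint 2: the Worker cannot widen its own action scope.
        if task.action not in self.allowed_actions:
            raise PermissionError(f"{task.task_id}: action {task.action!r} is out of scope")
        return f"{self.name} executed {task.action}"
```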

2.3 Dynamic Management of Constraint Rules and Task Orchestration

2.1 and 2.2 defined "what to constrain," while 2.3 solves "how to dynamically manage constraint rules" and "how to run Agent tasks within resource boundaries."

Four Registries build a unified governance plane for AI assets. Starting from version 3.0, MSE Nacos AI extends the concept of registry from microservices to AI scenarios, constructing four major registries: Prompt Registry, MCP Registry, Agent Registry (A2A), and Skill Registry. The MCP Registry supports upgrading legacy HTTP interfaces to the MCP protocol with zero code modification and provides hot-update capabilities for Tool metadata. When the description or parameter definition of a tool needs to be modified, the configuration takes effect automatically, and all Agents using that tool immediately obtain the updated metadata. The Agent Registry achieves registration and discovery among Agents based on the A2A protocol, supporting namespace isolation and multi-version management. The Skill Registry provides a review mechanism for skills before they go online and sub-second rollback capabilities, ensuring that new skills will not be called by Agents without validation.
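The hot-update behavior of tool metadata amounts to a publish-subscribe pattern, sketched below. This is not the MSE Nacos AI API: `ToolRegistry`, `subscribe`, and `update` are hypothetical names, and a real registry would deliver updates over the network.

```python
class ToolRegistry:
    """Minimal publish-subscribe registry for tool metadata (hypothetical sketch)."""

    def __init__(self):
        self._metadata = {}   # tool name -> latest metadata dict
        self._listeners = {}  # tool name -> callbacks of Agents using that tool

    def subscribe(self, tool, callback):
        """Register an Agent's callback and deliver the current metadata, if any."""
        self._listeners.setdefault(tool, []).append(callback)
        if tool in self._metadata:
            callback(self._metadata[tool])

    def update(self, tool, metadata):
        """Store new metadata and push it to every subscribed Agent."""
        self._metadata[tool] = metadata
        for callback in self._listeners.get(tool, []):
            callback(metadata)
```

An Agent that subscribes once keeps receiving the latest description and parameter schema, so changing a tool definition does not require restarting the Agents that use it.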

Task orchestration and resource boundary control. MSE AI Task Scheduling moves scheduled-task management out of individual Agents and into the platform, where it is managed centrally. It provides a four-level priority queue (low / medium / high / very high), allowing high-priority tasks to preempt resources from low-priority tasks; automatic retry on failure supports a configurable retry count and interval, both of which can be adjusted dynamically in the console; timeout alerts and failure alerts provide timely anomaly notifications. Visual DAG orchestration supports defining task dependencies across applications, avoiding deadlocks and cyclic waiting.
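Two of the mechanisms above, priority ordering and cycle-free dependency graphs, can be sketched with the Python standard library. This is an illustration under stated assumptions, not the scheduler's implementation: the names are hypothetical, and the queue only models ordering, not preemption of running tasks.

```python
import heapq
import itertools
from graphlib import CycleError, TopologicalSorter

# Lower number = served first; the four levels mirror those named in the text.
PRIORITY = {"very high": 0, "high": 1, "medium": 2, "low": 3}


def execution_order(dependencies):
    """Return a valid run order for {task: set_of_prerequisites}, or raise on a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # Rejecting cyclic graphs at definition time is what rules out cyclic waiting.
        raise ValueError(f"task graph has a cycle: {exc.args[1]}") from exc


class PriorityTaskQueue:
    """Four-level priority queue with FIFO order inside each level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def submit(self, name, priority):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._counter), name))

    def next_task(self):
        return heapq.heappop(self._heap)[2]
```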

Real-time routing and automatic response of constraint events. When the execution points at various layers of constraints (Higress gateway, AgentLoop evaluation, AgentTeams permission validation) detect non-compliance, a unified channel is needed to deliver the events to the correct handling process. EventBridge, as a fully managed Serverless event bus, takes on this role. It determines the severity level and type of events through declarative filtering rules (value matching, prefix matching, range matching), and then routes them to corresponding handling targets: low-risk events are delivered to AgentLoop to record audit logs, medium-risk events trigger Nacos configuration canary rollback or MSE task suspension, and high-risk events are routed to manual approval processes. The entire link from detection to response is entirely event-driven, without requiring each constraint execution point to hardcode handling logic separately. A new response strategy only needs an additional filtering rule and delivery target on the EventBridge side to take effect.
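The routing logic described above can be approximated as follows. This is a sketch of the idea only and does not reproduce EventBridge's rule syntax: the rule encoding, the `risk_score` field, the thresholds, and the target names are all assumptions.

```python
def matches(rule, event):
    """Return True if every field condition in `rule` holds for `event`."""
    for field, (kind, expected) in rule.items():
        if field not in event:
            return False
        value = event[field]
        if kind == "value":      # value matching: field must be one of the listed values
            ok = value in expected
        elif kind == "prefix":   # prefix matching: string field must start with the prefix
            ok = str(value).startswith(expected)
        elif kind == "range":    # range matching: numeric field within [low, high]
            ok = expected[0] <= value <= expected[1]
        else:
            raise ValueError(f"unknown condition kind: {kind}")
        if not ok:
            return False
    return True


# Ordered list of (rule, target); the first matching rule wins.
ROUTES = [
    ({"risk_score": ("range", (80, 100))}, "manual-approval"),
    ({"risk_score": ("range", (40, 79))}, "canary-rollback"),
    ({"risk_score": ("range", (0, 39))}, "audit-log"),
]


def route(event):
    """Return the handling target for a constraint event, or None if no rule matches."""
    for rule, target in ROUTES:
        if matches(rule, event):
            return target
    return None
```

Adding a new response strategy in this model means appending one entry to `ROUTES`; the components that emit events stay unchanged.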

2.4 Observability of Effectiveness: Closed-loop Guarantee of Constraints

The first three layers solve the questions of "what to constrain" and "how to constrain," while the observability of effectiveness solves "whether constraints are truly effective."

UModel: Ontology-based modeling of the O&M world. UModel is Alibaba Cloud's observability data modeling framework based on graph models, and its design philosophy originates from the ontology of information science. It builds a topology map of the IT system through EntitySet (entity sets, such as apm.service, k8s.pod, ecs.instance) and EntitySetLink (entity relationships, such as service_runs_on_pod, service_depends_on_rds), and then binds observability data such as metrics, logs, traces, and events to entities through TelemetryDataSet and DataLink.

For constraint infrastructure, the value of UModel lies in providing topology-aware constraint observation. To take a concrete diagnostic path as an example: when a service's QPS drops abnormally, the system detects abnormal metrics from apm.service, traces back to the owning k8s.pod through EntitySetLink, associates with Pod logs through DataLink to discover an OOM error, and then traces to k8s.node → ecs.instance to locate the root cause of insufficient memory. If this anomaly was caused by a misconfiguration of a constraint rule (for example, the token quota was set too low, triggering a circuit breaker), UModel's topological relationship can help quickly locate the impact scope, which upstream and downstream services are affected, and which other Agents share the same resource.
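The diagnostic path in this example is a graph traversal, which the sketch below reproduces with plain dictionaries. It is not the UModel API: the entity identifiers are invented, and only `service_runs_on_pod` is a link name taken from the text; the other link names are assumed.

```python
from collections import deque

# entity -> list of (link type, target entity); all instance names are hypothetical
LINKS = {
    "apm.service:checkout": [("service_runs_on_pod", "k8s.pod:checkout-7f9c")],
    "k8s.pod:checkout-7f9c": [("pod_runs_on_node", "k8s.node:node-3")],
    "k8s.node:node-3": [("node_runs_on_instance", "ecs.instance:i-example")],
}


def trace_path(start, links):
    """Breadth-first walk over entity links, returning entities in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        entity = queue.popleft()
        order.append(entity)
        for _link_type, target in links.get(entity, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def reverse(links):
    """Flip every link so the same walk answers 'what is affected by this entity?'."""
    reversed_links = {}
    for source, edges in links.items():
        for link_type, target in edges:
            reversed_links.setdefault(target, []).append((link_type, source))
    return reversed_links
```

Walking `LINKS` from the service follows the root-cause direction, while walking `reverse(LINKS)` from a node lists everything that runs on it, which corresponds to the impact-scope question.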

StarOps: A complete Agent practice built on constraint infrastructure. The StarOps all-domain intelligent O&M platform itself is an Agent system running on Alibaba Cloud's constraint infrastructure. Its three key product capabilities (intelligent assistant, long-term task Mission, digital employee) demonstrate how constraint infrastructure plays its role in a real production environment.

Among them, the digital employee is a typical constrained Agent. Its permission boundaries are configured through RAM roles (the principle of least privilege), its behavioral constraints are defined through Markdown rules (constraints as code), the impact scope of its operations is assessed through the UModel topology map (observability), and it pauses to wait for manual confirmation during high-risk changes through the Human-in-the-Loop mechanism. These mechanisms correspond directly to the design principles and layer capabilities described earlier in this article. StarOps proves one thing: when the layers of the constraint infrastructure work together, Agents can undertake high-risk O&M tasks in production environments while keeping their behaviors controllable, actions auditable, and risks mitigated.

3. The Data Flywheel of Constraint Infrastructure

Constraints are not static. Business is changing, models are upgrading, and the behavior patterns of Agents are also continuously evolving. A set of constraint rules that is never iterated once written will soon become obsolete shackles or filters with holes. Constraint infrastructure needs a self-evolving mechanism.

AgentLoop's Pipeline data processing engine provides the engineering implementation of this mechanism. The Pipeline contains 6 categories and 13 processing nodes (field selection, regular expression handling, filtering, three-level deduplication, diversity sampling, AI evaluation, clustering, output configuration), automatically converting massive runtime logs generated by Agents into high-quality datasets, reducing processing costs by 97% compared to manual labor.

These datasets are not only used for model training but also directly drive the iteration of constraint rules. AgentLoop promotes the concept of Evaluation-Driven Development (EDD): observability data is continuously compiled into evaluation datasets, and evaluation results expose the blind spots (abnormal behaviors that existing rules do not cover) and the false interceptions (normal behaviors that are mistakenly blocked) of constraint rules. The governance team adjusts the rules accordingly and verifies the effects through Nacos's canary release mechanism. The whole process forms a closed loop of "observability → evaluation → optimization → deployment → observability," and the constraint infrastructure continuously evolves in this loop.
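The two quantities that the evaluation step exposes can be computed from a labeled evaluation set as sketched below. The record format, a pair of booleans per Agent action, is an assumption made for illustration and is not AgentLoop's data model.

```python
def rule_quality(records):
    """Summarize rule quality from (is_abnormal, was_intercepted) pairs."""
    abnormal = normal = leaked = false_intercepted = 0
    for is_abnormal, was_intercepted in records:
        if is_abnormal:
            abnormal += 1
            leaked += not was_intercepted       # abnormal behavior the rules missed
        else:
            normal += 1
            false_intercepted += was_intercepted  # normal behavior blocked by mistake
    return {
        "leakage_rate": leaked / abnormal if abnormal else 0.0,
        "false_interception_rate": false_intercepted / normal if normal else 0.0,
    }
```

A high leakage rate argues for tightening rules, while a high false interception rate argues for relaxing them; tracking both prevents optimizing one at the expense of the other.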

4. Adoption Path and Engineering Challenges

Where to Start

Implementing constraint infrastructure is a step-by-step process. Based on Alibaba Cloud's cloud-native Agent Infra product portfolio, the recommended adoption sequence is:

First, connect to AgentLoop observability. See the problem before applying any constraints. Collect full-link data of the Agent through OpenTelemetry probes, establish a baseline for golden indicators, and understand the actual behavior patterns of the current Agent. This step requires the least investment (zero-code probe access) but provides the data that subsequent constraint designs depend on.
Second, manage Prompts and MCPs through MSE Nacos AI. Centralize the Prompts and tool configurations scattered in code into Nacos, establishing version management and canary release capabilities. This step makes constraint rules manageable.
Third, integrate AgentTeams to achieve multi-agent governance. As the number of Agents grows and collaboration scenarios increase, introduce unified permission management and collaborative orchestration. The Leader-Worker architecture establishes a structured constraint framework for multi-agent scenarios.
Fourth, drive the co-evolution of constraints and capabilities using the AgentLoop self-evolving platform. Constraints and capabilities are not in opposition. AgentLoop's evaluation system exposes two types of information simultaneously: blind spots of constraint rules (abnormal behaviors that leaked through) and false interceptions (normal behaviors that were mistakenly blocked). Relaxing false interceptions releases Agent capability, while tightening leaked behaviors reinforces behavioral boundaries. Meanwhile, AgentLoop automatically extracts successful patterns from high-quality Trajectories, compiles them into an experience library, and dynamically injects them into the Agent context so that the Agent can complete more tasks within the same constraint boundaries. The entire process forms a double helix of "observability → evaluation → optimize constraint rules / optimize Agent capabilities → re-observe," and the Agent becomes smarter with use while staying within controlled boundaries.

Balancing Constraints and Latency

Every constraint checkpoint introduces additional latency. In Agent scenarios, distinguishing between two types of constraints is crucial: synchronous constraints (which must be completed in the request path, such as identity authentication and tool whitelist validation) and asynchronous constraints (which can be executed after the fact, such as output auditing and compliance logging). The approach of the Higress AI gateway is to quickly complete coarse-grained authentication and rate limiting at the request entry, while processing fine-grained token usage statistics asynchronously. The timeout circuit breaker of MSE AI Task Scheduling uses another approach: instead of performing fine-grained checks at every step, it sets a time boundary for the entire task and terminates it upon timeout.
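The split between synchronous and asynchronous constraints, together with a task-level time boundary, can be sketched with `asyncio`. This is a generic illustration rather than Higress or MSE code: the whitelist contents, function names, and in-memory audit log are assumptions.

```python
import asyncio

ALLOWED_TOOLS = {"search", "calculator"}  # hypothetical tool whitelist
audit_log = []
_background = set()  # holds references so pending audit tasks are not garbage collected


async def audit(tool, result):
    """Asynchronous constraint: recorded after the result is already available."""
    audit_log.append((tool, result))


async def call_tool(tool, work, timeout_seconds):
    """Run `work()` for `tool` under a whitelist check and an overall time limit."""
    # Synchronous constraint: must pass before any work starts.
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not whitelisted")
    # One time boundary for the whole call instead of fine-grained per-step checks.
    result = await asyncio.wait_for(work(), timeout=timeout_seconds)
    # Schedule the audit without awaiting it, keeping it off the request path.
    task = asyncio.create_task(audit(tool, result))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return result
```

Only the whitelist check and the timeout add latency to the caller; the audit record is written once the event loop gets around to it.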

Testing of Constraint Rules

Errors in the constraint rules themselves can be more dangerous than having no constraints; mistakenly intercepting normal behaviors leads to degradation of Agent capability, while leaking abnormal behaviors makes constraints practically non-existent. MSE Nacos AI's canary release mechanism provides the engineering foundation for testing constraint rules: new rules are first rolled out to a small number of Agent instances, and the false interception rate and leakage rate are observed in combination with AgentLoop's evaluation data. Full release is performed once verified. For high-risk constraint changes, shadow mode can be adopted, where rules are executed but do not actually intercept, only recording judgment results for accuracy verification before going online.
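Shadow mode can be sketched as a wrapper that evaluates a candidate rule alongside the active one while enforcing only the active verdict. The class name and the representation of rules as functions returning a boolean are assumptions for illustration.

```python
class ShadowRule:
    """Evaluate a candidate rule for comparison while enforcing only the active rule."""

    def __init__(self, active_rule, candidate_rule):
        self.active_rule = active_rule
        self.candidate_rule = candidate_rule
        self.disagreements = []  # (request, enforced verdict, shadow verdict)

    def should_block(self, request):
        enforced = self.active_rule(request)
        shadow = self.candidate_rule(request)  # computed and recorded, never enforced
        if shadow != enforced:
            self.disagreements.append((request, enforced, shadow))
        return enforced
```

Reviewing `disagreements` before going live shows exactly which requests the new rule would have treated differently, and whether those differences are improvements or false interceptions.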

5. Summary

Return to the starting formula: Harness = Define Constraints + Validate Outputs + Establish Feedback Loop. The constraint infrastructure maps each of these three steps to an engineering-deliverable platform capability.

Defining constraints is carried out by MSE Nacos AI's Prompt management, MCP Registry, and Skill Registry, making constraint rules declarable, version-controlled, and canary-releasable. Validating outputs is carried out by AgentLoop's evaluation system and Higress AI gateway's output contract validation, making validation automatable and quantifiable. Establishing feedback loops is carried out by AgentLoop's self-evolving flywheel (observability → evaluation → optimization → re-observation), allowing constraints and Agent capabilities to co-evolve within the same loop. StarOps, as an SRE Agent, serves as the complete validation of this combination in a production environment.

Constraint infrastructure cuts across Agent Governance and Agent Security, two of the six core capabilities of Agent Infra. It is not an independent new product, but an organic combination of existing infrastructure capabilities viewed from the perspective of "runtime constraints."

The constraint infrastructure provides a reproducible path: start with observation to establish cognition, express constraints with declarative rules, enforce management through layered execution points, and then drive the co-evolution of constraints and capabilities with a data flywheel.
