By Liu Jun
I have previously conducted an in-depth analysis of OpenClaw and the underlying Harness Engineering practices, and conceived a "Harness Framework" to explain how to apply this methodology to enterprise-grade agent development.
The good news is that AgentScope Java 1.1.0 has officially been released. In this milestone version, we have fully implemented the “Harness Framework” plan. Developers can use version 1.1 to put Harness Engineering into practice quickly, building local applications such as XxxClaw and Coding Agent for personal productivity, or enterprise-grade applications such as DataAgent and SRE Agent for distributed scenarios.
In this release, AgentScope Java 1.1.0 delivers four core capabilities:
Over the past year, agent products such as OpenClaw, Hermes, and Claude Code have sparked a wave of enthusiasm, and the Harness Engineering ideas behind them have also become popular—using structured workspaces, context management, and tool conventions to replace the primitive “every conversation fights on its own” usage pattern. More and more teams have started bringing this approach into their own Agent development.
However, people who actually try to implement it often find that things start to break once they reach the “enterprise” level. We summarized the five obstacles most commonly mentioned by frontline developers:
The root cause of these five problems is the same: personal-assistant-style agents and enterprise-grade agents are two different engineering forms, and using the same assumptions to handle both scenarios will inevitably run into walls.
- From the perspective of deployment form: a personal assistant is single-user, single-process, and all state can live on one machine; enterprise agents need horizontal scaling, multi-tenancy, and uninterrupted service, so state must be distributable and recoverable.
- From the perspective of security boundaries: local tool execution carries little risk, but arbitrary Shell execution in production is a serious attack surface; sandboxes and permission boundaries are not “optional optimizations” but “prerequisites for going live.”
- From the perspective of operability and observability: when a personal tool has a problem, you just check the logs yourself; enterprise services require persistent memory, auditable sessions, and traceable state changes.
- From the perspective of token economics: personal users are not sensitive to latency and cost, but in enterprise scenarios every wasted context re-prompt is a real expense.
So, is there a framework that lets you “write one set of logic and switch forms as needed”? The Harness module in AgentScope Java 1.1.0, with its entry class HarnessAgent, is designed around exactly that goal: it does not replace the reasoning loop of ReActAgent; instead, it inserts hooks at key moments in the loop, fills in a set of tool and workspace conventions, packages the engineering answers to the five problems above, and lets you focus on the Agent’s business logic rather than the infrastructure.
The design philosophy of AgentScope Harness can be summarized in one sentence: package the engineering answers to “what happens next round, what happens tomorrow, what happens when the context explodes, and what happens when state is lost,” instead of having every Agent project invent them from scratch.
At the implementation level, two core pillars support the entire framework.
Harness introduces the concept of a workspace for each Agent—a structured directory that holds everything the Agent needs to run persistently: persona definitions (AGENTS.md), long-term memory (MEMORY.md), domain knowledge (knowledge/), reusable skills (skills/), sub-agent specifications (subagents/), and session history (agents/<agentId>/).
This is not a new idea—OpenClaw and Hermes both found in practice that giving an Agent a stable “workbench” is much more effective than reinitializing it every time. Harness systematizes that intuition: the workspace is the Agent’s single source of truth, and all state reads and writes revolve around the workspace rather than being scattered across code, databases, and memory.
In actual operation, before each inference begins, WorkspaceContextHook automatically injects key files such as AGENTS.md, MEMORY.md, and knowledge/ into the system prompt, ensuring the Agent’s persona and knowledge are fully present in every round. After the Agent finishes running, MemoryFlushHook extracts the new facts from the conversation and writes them into the memory file, while the background MemoryConsolidator periodically merges the running log into refined long-term memory. The workspace keeps evolving throughout the conversation, and each run leaves the Agent a little “more aware” of the user and the task than the last one.
The workspace idea is beautiful, but there is one real-world constraint: local disk directories do not work in distributed scenarios. Multiple Pods each have their own local disk, so where should MEMORY.md be written? Which replica’s version is the “real” one?
AgentScope Harness solves this with the AbstractFilesystem abstraction layer. From the upper layer’s perspective, the Agent only needs to call unified interfaces such as read/write/ls/grep and does not care where the “file” actually lives; from the lower layer’s perspective, it can adapt to any medium such as local disks, remote object storage (OSS), KV databases (Redis), or sandbox file systems, and it can even route different paths to different backends through CompositeFilesystem.
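To make the routing idea concrete, here is a minimal sketch in plain Java. The SimpleFilesystem, InMemoryFilesystem, and PrefixRoutingFilesystem types are hypothetical stand-ins invented for this illustration; they are not the AgentScope AbstractFilesystem or CompositeFilesystem APIs, whose real signatures may differ. The sketch only shows how one read/write interface can hide which backend actually stores a path.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified filesystem abstraction; not the AgentScope API. */
interface SimpleFilesystem {
    String read(String path);
    void write(String path, String content);
}

/** In-memory backend standing in for a local disk, Redis, OSS, or a sandbox. */
class InMemoryFilesystem implements SimpleFilesystem {
    private final Map<String, String> files = new ConcurrentHashMap<>();

    public String read(String path) { return files.get(path); }
    public void write(String path, String content) { files.put(path, content); }
}

/** Routes each path to the backend registered for its longest matching prefix. */
class PrefixRoutingFilesystem implements SimpleFilesystem {
    private final Map<String, SimpleFilesystem> routes = new LinkedHashMap<>();
    private final SimpleFilesystem fallback;

    PrefixRoutingFilesystem(SimpleFilesystem fallback) { this.fallback = fallback; }

    PrefixRoutingFilesystem route(String prefix, SimpleFilesystem backend) {
        routes.put(prefix, backend);
        return this;
    }

    private SimpleFilesystem resolve(String path) {
        SimpleFilesystem best = fallback;
        int bestLength = -1;
        for (Map.Entry<String, SimpleFilesystem> entry : routes.entrySet()) {
            String prefix = entry.getKey();
            if (path.startsWith(prefix) && prefix.length() > bestLength) {
                best = entry.getValue();
                bestLength = prefix.length();
            }
        }
        return best;
    }

    public String read(String path) { return resolve(path).read(path); }
    public void write(String path, String content) { resolve(path).write(path, content); }
}

class FilesystemRoutingDemo {
    public static void main(String[] args) {
        InMemoryFilesystem local = new InMemoryFilesystem();
        InMemoryFilesystem shared = new InMemoryFilesystem();
        // Memory and session data go to the shared backend; everything else stays local.
        SimpleFilesystem fs = new PrefixRoutingFilesystem(local)
                .route("MEMORY.md", shared)
                .route("agents/", shared);

        fs.write("MEMORY.md", "- prefers concise answers");
        fs.write("skills/report/SKILL.md", "# Report skill");

        System.out.println("MEMORY.md in shared store: " + (shared.read("MEMORY.md") != null));
        System.out.println("skill in local store: " + (local.read("skills/report/SKILL.md") != null));
    }
}
```

The caller only ever sees the SimpleFilesystem interface, so swapping the shared backend for a different medium would not change the calling code.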

As shown above, based on the AbstractFilesystem interface, AgentScope provides three built-in extension implementations corresponding to three usage modes.
The three implementations and modes will be explained in detail later.
In AgentScope 1.1, the workspace is the core abstraction of the agent. We use AbstractFilesystem as the physical implementation carrier of the workspace, and all file operations, command execution, and memory management tools use AbstractFilesystem as the standard entry point.

Based on this file-system abstraction layer, the AgentScope framework directly brings three major engineering capabilities to agent development:
Security and isolation
- The execute tool only appears when the backend implements the sandbox interface

Distributed deployment
- MEMORY.md and session logs are routed through a Remote backend to shared storage, naturally achieving cross-node synchronization
- By combining IsolationScope (SESSION / USER / AGENT / GLOBAL) with RuntimeContext, you can implement session-level isolation, user-level sharing, and other tenancy strategies without changing the code

Sub-agent and asynchronous task support
The three scenarios below cover the typical development patterns from personal to enterprise. They are not either/or options; rather, they represent three different complexity paths—you can start with the simplest one and migrate gradually as requirements evolve.
Characteristics of this scenario: single user, local execution, needs to manipulate local files or run scripts; typical products include personal assistants, note bots, and local Coding Agents.
The core need in this scenario is to “let the Agent truly understand me and remember me,” not just be a stateless Q&A machine. Harness’s value here is that the AGENTS.md in the workspace defines the Agent’s persona and behavioral preferences, new facts are automatically distilled into memory after the conversation ends, and when you open it again the Agent still knows you and remembers the last progress. Skills and domain knowledge also live in the workspace, so they can be edited and adjusted at any time without touching the code.
Under local deployment you can also enable Shell execution, allowing the Agent to run scripts and operate the file system directly. This is one of the most attractive aspects of OpenClaw-style products. On top of that, Harness adds a layer of “continuous evolution”: the workspace acts like the Agent’s brain, becoming more experienced with every conversation.
Core capabilities that AgentScope Harness provides in this scenario:
- Edit AGENTS.md to adjust the persona, or add skills in the skills/ directory; changing one file is equivalent to upgrading the Agent once, with no rebuild or redeploy required.

Characteristics of this scenario: serves multiple users, needs to execute SQL / Python / Shell, tasks are long-running, inputs come from untrusted external users, and it also requires recoverable multi-turn state and consistent user experience across replicated deployments.
The biggest risk in this scenario is execution safety—user-driven code must not run unrestricted on the server. Harness’s sandbox mechanism confines both file operations and command execution to an isolated environment, leaving the server process unaffected. More importantly, the sandbox is not “use it once and throw it away”; after each conversation round ends, the sandbox state is persisted and brought back for the next round, so users do not lose their work just because the service restarts or switches nodes.
When deployed with multiple replicas, a user’s long-term memory (the Agent’s accumulated understanding of that user) can be stored in shared storage, so no matter which node receives the request, the Agent sees the same memory. Long-running analysis tasks can be split into multiple sub-agents and executed in parallel, with the main Agent only coordinating and summarizing rather than blocking the entire time.
Core capabilities that AgentScope Harness provides in this scenario:
Characteristics of this scenario: tasks are mainly completed by calling business APIs (placing orders, querying, approving, etc.), no Shell execution is needed on the server, but multi-instance operation, persistent session state, and cross-user knowledge sharing are required.
The core need in this scenario is stability and safety—an online service cannot afford to fail because an Agent called a Shell command it should never have called. Harness’s value here is that if sandbox execution is not configured, the framework simply does not expose the Shell tool by default. The Agent can only interact with the outside world through explicitly defined business tools, and the security boundary is determined by configuration rather than developer self-discipline.
Session state and memory can be stored in remote storage, and multiple service instances can share the same set of user memories. No matter which entry point the user uses to start a new conversation, the Agent can continue from the previous context. When multiple subtasks need to be processed in parallel (for example, checking inventory, calculating discounts, and generating summaries at the same time), the sub-agent mechanism still applies, and it can integrate with an external task queue to enable cross-process task management.
Core capabilities that AgentScope Harness provides in this scenario:
This section explains the core capabilities of AgentScope Harness from the user’s perspective: what it is, how it works, and how you should think about configuration.
Getting started with Harness only takes three steps: add the dependency, prepare the workspace, then build and call the Agent.
1. Add the dependency

    <dependency>
        <groupId>io.agentscope</groupId>
        <artifactId>agentscope-harness</artifactId>
        <version>${agentscope.version}</version>
    </dependency>

2. Prepare the workspace
Choose a directory on disk as the workspace and create AGENTS.md inside it. This is not an “optional initialization step,” but the core entry point of Harness—the Agent’s persona, memory, skills, and sub-agent specifications all revolve around this directory. A few lines of conventions in AGENTS.md are enough, and it can continue evolving as it is used.
3. Build HarnessAgent and call it

    HarnessAgent agent = HarnessAgent.builder()
            .name("my-agent")
            .model(model)
            .workspace(Paths.get(".agentscope/workspace"))
            // Recommended to configure from the start to avoid context overflow in production
            .compaction(CompactionConfig.builder()
                    .triggerMessages(50)
                    .keepMessages(20)
                    .build())
            .build();

    RuntimeContext ctx = RuntimeContext.builder()
            // Multiple calls with the same sessionId automatically continue the context
            .sessionId("user-session-001")
            // Required in multi-user scenarios for namespace isolation
            .userId("alice")
            .build();

    Msg reply = agent.call(userMessage, ctx).block();

After running, check the workspace directory: the three paths AGENTS.md, memory/, and agents/<agentId>/ should all exist, which means the Agent is successfully writing memory and persisting session state.
See QuickstartExample in agentscope-examples/harness-example for a fully runnable example.
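Step 2 above (preparing the workspace) can also be scripted. The following is a minimal sketch using only the JDK; the WorkspaceBootstrap class and the starter AGENTS.md text are hypothetical and exist purely for illustration. Harness itself only needs the directory and the file to be present.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that creates a workspace directory with a starter AGENTS.md. */
class WorkspaceBootstrap {

    /** Creates the directory if needed and writes AGENTS.md only when it does not exist yet. */
    static Path prepare(Path workspace) {
        Path agentsMd = workspace.resolve("AGENTS.md");
        try {
            Files.createDirectories(workspace);
            if (Files.notExists(agentsMd)) {
                // Example conventions only; replace with your own persona and rules.
                String starter = "# Persona\n"
                        + "You are a concise assistant for the data team.\n\n"
                        + "# Conventions\n"
                        + "- Ask before running destructive commands.\n";
                Files.write(agentsMd, starter.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return agentsMd;
    }

    public static void main(String[] args) {
        Path agentsMd = prepare(Path.of(".agentscope", "workspace"));
        System.out.println("Workspace ready: " + agentsMd.toAbsolutePath());
    }
}
```

Because an existing AGENTS.md is never overwritten, the helper is safe to call on every startup while the file keeps evolving by hand.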
If you understand the six concepts below, you basically understand how Harness works.
| Concept | Definition | Problem It Solves | Usage Suggestion |
|---|---|---|---|
| HarnessAgent | An engineering-oriented wrapper entry built on ReActAgent; build() assembles hooks, built-in tools, skills, and session persistence | “I don’t want to assemble compaction, memory, sessions, subtasks, and the file system from scratch” | Business code only interacts with HarnessAgent.builder() and agent.call(msg, ctx) |
| workspace | The Agent’s working directory, containing all persistent content such as AGENTS.md, MEMORY.md, skills/, subagents/, and session history | “Where do persona, knowledge, memory, and state live, and how do they keep evolving?” | Plan the workspace structure first, then write prompts; treat the workspace as a versionable asset |
| filesystem | A unified interface for file read/write, serving as the abstraction layer between Agent tools and physical storage, supporting local disks, remote storage, sandboxes, and more | “How can the same Agent logic switch between local, shared storage, and sandboxes?” | Prefer one of the three declarative modes first (Local / Remote / Sandbox) |
| RuntimeContext | The identity context for a single call(), including sessionId, userId, etc.; passed in again on each call and not persisted | “Who is this round for, where should state be read and written, and how do we isolate multiple tenants?” | Always pass a stable sessionId; multi-tenant scenarios must pass userId |
| sandbox | An isolated execution environment where file operations and commands run on the sandbox side, with state persisted after each round and restored on the next | “How can tools and scripts be executed safely with untrusted input while keeping multi-turn state continuous?” | Enable it first when code execution is needed; choose the isolation granularity according to the business |
| memory | A two-layer memory system: after each conversation round, new facts are automatically distilled into a running log, then the backend periodically merges it into injectable long-term memory, combined with full-text search | “Don’t lose facts in long conversations, don’t blow up the context, and make history searchable” | Enable conversation compaction and watch the memory files change; retrieve old facts with search tools |
Bottom line: HarnessAgent handles orchestration, workspace handles persistence, filesystem handles placement, RuntimeContext handles identity, sandbox handles boundaries, and memory handles long-term evolution.
The workspace is the most important design difference between Harness and ordinary Agent frameworks. It is not a temporary storage directory, but the Agent’s “externalized brain”—everything that needs to persist across sessions lives here.
The standard workspace directory structure is as follows:

    workspace/
    ├── AGENTS.md      ← Agent persona and behavior guidelines, automatically injected into the system prompt before each inference
    ├── MEMORY.md      ← Refined long-term memory, automatically maintained in the background and accumulated over time
    ├── knowledge/     ← Domain knowledge, injected together with AGENTS.md
    ├── skills/        ← Reusable skills, automatically assembled into the Agent’s toolset
    ├── subagents/     ← Sub-agent specification declarations, automatically discovered and loaded
    └── agents/<agentId>/
        ├── context/   ← Session state snapshot (used for recovery after a process restart)
        ├── sessions/  ← Conversation JSONL and compressed context, for auditing and retrieval
        └── memory/    ← Daily memory ledger

How the workspace works in each inference: Before inference begins, Harness merges key files such as AGENTS.md, MEMORY.md, and knowledge/ into the system prompt; after inference ends, it extracts the new facts from the conversation and appends them to the day’s memory ledger. The workspace keeps evolving with each conversation, and the Agent gradually becomes “more aware” of the business and the user over time.
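The injection step described above can be pictured with a small sketch. PromptAssembler is a hypothetical class, and the section-header layout it produces is an assumption made for readability; the actual format that Harness writes into the system prompt is not specified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: merge selected workspace files into a single system prompt. */
class PromptAssembler {

    /** Concatenates the listed files in order, skipping files that are missing or blank. */
    static String assemble(Map<String, String> workspaceFiles, String... injectedPaths) {
        StringBuilder prompt = new StringBuilder();
        for (String path : injectedPaths) {
            String content = workspaceFiles.get(path);
            if (content == null || content.trim().isEmpty()) {
                continue; // e.g. MEMORY.md does not exist before the first consolidation
            }
            if (prompt.length() > 0) {
                prompt.append("\n\n");
            }
            // Assumed layout: one header per file so the model can tell the sources apart.
            prompt.append("## ").append(path).append('\n').append(content.trim());
        }
        return prompt.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("AGENTS.md", "You are a helpful SRE assistant.");
        files.put("MEMORY.md", "- The on-call engineer this week is alice.");
        System.out.println(assemble(files, "AGENTS.md", "MEMORY.md", "knowledge/runbook.md"));
    }
}
```

Since the prompt is rebuilt from files before every inference, editing a file changes the Agent's behavior on the next call without touching code.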
Why the workspace is better than hard-coding prompts in code: Persona, knowledge, skills, and sub-agent specifications all live in workspace files. Adjusting behavior only requires editing files—no recompilation or deployment needed. This is especially important for Agents with complex business knowledge, where business rules change frequently and updates should be lightweight.
Harness persists session state along two parallel paths, each solving a different problem:
- State snapshot (context/): After each call() ends, the Agent’s runtime state (current conversation memory, tool execution context, etc.) is serialized into a JSON file and stored under agents/<agentId>/context/<sessionId>/ in the workspace. The next time a call is made with the same sessionId, the framework automatically loads this snapshot before inference begins and restores it to where it left off. This is the technical guarantee that “closing it and opening it again still remembers the last session.”
- Conversation log (sessions/): The full conversation history is appended in JSONL format to <sessionId>.log.jsonl; this file is never compacted and is used for auditing and by the session_search tool. A separate <sessionId>.jsonl stores the compacted LLM context, which is the version the model actually “sees.”

Both paths are maintained automatically by the framework; the only thing the developer needs to do is pass the same sessionId consistently on every call.
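The difference between the two paths (one overwritten snapshot for recovery, one append-only log for auditing) can be sketched with the JDK file API. SessionStore is a hypothetical class; the snapshot file name state.json and the JSON line shape are assumptions, and only the directory layout follows the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch of the two persistence paths; not the AgentScope implementation. */
class SessionStore {
    private final Path agentDir;

    SessionStore(Path workspace, String agentId) {
        this.agentDir = workspace.resolve("agents").resolve(agentId);
    }

    /** Audit path: append one JSON line per message; earlier lines are never rewritten. */
    void appendToLog(String sessionId, String role, String text) {
        String line = "{\"role\":\"" + escape(role) + "\",\"text\":\"" + escape(text) + "\"}\n";
        write(logFile(sessionId), line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Recovery path: overwrite the snapshot that the next call with this sessionId loads. */
    void saveSnapshot(String sessionId, String stateJson) {
        write(snapshotFile(sessionId), stateJson, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    /** Returns the latest snapshot, or null when the session has never been saved. */
    String loadSnapshot(String sessionId) {
        Path file = snapshotFile(sessionId);
        if (Files.notExists(file)) {
            return null;
        }
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<String> readLog(String sessionId) {
        try {
            return Files.readAllLines(logFile(sessionId), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path logFile(String sessionId) {
        return agentDir.resolve("sessions").resolve(sessionId + ".log.jsonl");
    }

    private Path snapshotFile(String sessionId) {
        // "state.json" is an assumed file name for this sketch.
        return agentDir.resolve("context").resolve(sessionId).resolve("state.json");
    }

    private static void write(Path file, String content, StandardOpenOption... options) {
        try {
            Files.createDirectories(file.getParent());
            Files.write(file, content.getBytes(StandardCharsets.UTF_8), options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

Saving twice leaves one snapshot but two or more log lines, which is exactly the asymmetry between "resume where I left off" and "show me everything that happened."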
This is one of Harness’s most valuable engineering capabilities. In many Agent frameworks, “memory” essentially means stuffing historical messages into the context until it eventually explodes; AgentScope’s current approach is a two-layer separation:
Layer one — daily running log: After each conversation ends, the framework uses an LLM to extract the “new facts” from that conversation and appends them as bullet points to the day’s memory file (memory/YYYY-MM-DD.md). This layer only appends and never modifies, ensuring that no new fact is lost.
Layer two — long-term memory: In the background, a scheduler periodically reads recent daily ledger files and uses an LLM to merge, deduplicate, and refine them together with the existing MEMORY.md, producing an “injectable version” that fits within the token budget and writing it back to MEMORY.md. This layer is the “fact summary” injected into the system prompt for each inference; it is high quality and size-controlled.
The relationship between the two layers is: the first layer guarantees that nothing is lost, and the second layer guarantees that it is usable. New facts are first written into the running log, then moved into long-term memory by the background process once enough has accumulated. During inference, the model first looks at long-term memory, and if it still cannot find what it needs, it uses the memory_search tool for full-text search (based on SQLite FTS5).
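The division of labor between the two layers can be shown with an in-memory sketch. TwoLayerMemory is a hypothetical class, and the consolidation step here is a plain deduplicate-and-cap operation standing in for the LLM-based merge and the token budget that the article describes; the real MemoryConsolidator is not reproduced.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of an append-only ledger plus a size-capped long-term memory. */
class TwoLayerMemory {
    // Layer one: facts per day, kept in date order and never modified after being appended.
    private final Map<LocalDate, List<String>> dailyLedger = new TreeMap<>();
    // Layer two: the refined list that would be injected into the system prompt.
    private List<String> longTerm = new ArrayList<>();

    void recordFact(LocalDate day, String fact) {
        dailyLedger.computeIfAbsent(day, d -> new ArrayList<>()).add(fact);
    }

    /**
     * Merges the ledger into long-term memory. Duplicates collapse to their most recent
     * position, and only the newest maxFacts entries are kept (a stand-in for a token budget).
     */
    void consolidate(int maxFacts) {
        Set<String> merged = new LinkedHashSet<>(longTerm);
        for (List<String> facts : dailyLedger.values()) {
            for (String fact : facts) {
                merged.remove(fact); // re-inserting moves a repeated fact to the end
                merged.add(fact);
            }
        }
        List<String> ordered = new ArrayList<>(merged);
        int from = Math.max(0, ordered.size() - maxFacts);
        longTerm = new ArrayList<>(ordered.subList(from, ordered.size()));
    }

    List<String> longTermMemory() {
        return new ArrayList<>(longTerm);
    }

    int ledgerEntries() {
        int total = 0;
        for (List<String> facts : dailyLedger.values()) {
            total += facts.size();
        }
        return total;
    }

    public static void main(String[] args) {
        TwoLayerMemory memory = new TwoLayerMemory();
        memory.recordFact(LocalDate.of(2026, 1, 1), "prefers dark mode");
        memory.recordFact(LocalDate.of(2026, 1, 1), "uses PostgreSQL");
        memory.recordFact(LocalDate.of(2026, 1, 2), "prefers dark mode");
        memory.consolidate(2);
        System.out.println(memory.longTermMemory());
    }
}
```

The ledger keeps every entry, including the repeated one, while the long-term list stays bounded; facts that fall out of the long-term list remain recoverable from the ledger, which is the role memory_search plays in the article.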
Conversation compaction is the other side of memory management: when the number of messages or token count exceeds a threshold, Harness uses an LLM to compress the earlier conversation into a summary, keeps the most recent messages, and unloads the rest into the JSONL file. Compaction happens after long-term memory has been extracted, ensuring valuable information is persisted before it is compressed. If the model returns a context overflow error, the framework also catches the exception, forces compaction, and retries automatically; the whole process is transparent to the caller.
Configuration suggestion:

    .compaction(CompactionConfig.builder()
            .triggerMessages(50)      // Trigger compaction when the message count exceeds 50
            .keepMessages(20)         // Keep the latest 20 messages
            .flushBeforeCompact(true) // Extract memory before compaction (enabled by default)
            .build())

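To show what the two thresholds mean in practice, here is a small sketch of a message-count-based compaction step. ConversationCompactor is a hypothetical class, and the summarizer function stands in for the LLM call; token-based triggering and the overflow retry described above are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of threshold-based compaction; not the Harness implementation. */
class ConversationCompactor {
    private final int triggerMessages;
    private final int keepMessages;
    private final Function<List<String>, String> summarizer;

    ConversationCompactor(int triggerMessages, int keepMessages,
                          Function<List<String>, String> summarizer) {
        if (keepMessages < 0 || keepMessages >= triggerMessages) {
            throw new IllegalArgumentException("keepMessages must be in [0, triggerMessages)");
        }
        this.triggerMessages = triggerMessages;
        this.keepMessages = keepMessages;
        this.summarizer = summarizer;
    }

    /**
     * Returns the context to send to the model: the input unchanged while it is within the
     * threshold, otherwise one summary entry followed by the latest keepMessages messages.
     */
    List<String> compact(List<String> messages) {
        if (messages.size() <= triggerMessages) {
            return messages;
        }
        int cut = messages.size() - keepMessages;
        List<String> compacted = new ArrayList<>();
        compacted.add("[summary] " + summarizer.apply(messages.subList(0, cut)));
        compacted.addAll(messages.subList(cut, messages.size()));
        return compacted;
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for an LLM-generated summary.
        ConversationCompactor compactor =
                new ConversationCompactor(50, 20, older -> older.size() + " earlier messages");
        List<String> messages = new ArrayList<>();
        for (int i = 1; i <= 51; i++) {
            messages.add("m" + i);
        }
        List<String> context = compactor.compact(messages);
        System.out.println(context.size() + " entries, first: " + context.get(0));
    }
}
```

With triggerMessages(50) and keepMessages(20), the 51st message causes the oldest 31 messages to collapse into one summary entry, so the model sees 21 entries instead of 51.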
When the main Agent encounters a subtask that is long-running, context-heavy, or parallelizable, it can delegate it to a sub-agent. A sub-agent is an independent Agent instance with its own system prompt and Memory, does not share the main Agent’s conversation history, and returns the execution result to the main Agent as a tool result.
There are four ways to declare sub-agents, ranging from least to most flexible:
- **general-purpose** Agent: Mirrors the main Agent’s configuration, suitable for temporarily delegating arbitrary subtasks;
- Specification files under workspace/subagents/ (YAML front matter defines name, description, and tools; the body is the system prompt), which the framework automatically discovers and loads;
- Programmatic registration through builder.subagent(spec);

The workspace-driven approach is the most recommended: the sub-agent definitions are versioned with the workspace, so delegation strategies can be adjusted without changing code.
Invocation modes come in two types: synchronous and asynchronous:
- Asynchronous: the result is retrieved later with the task_output tool. For tasks that take more than a few seconds, asynchronous execution is strongly recommended to avoid wasting time and tokens while the main Agent is idle.

Preventing infinite recursion: Sub-agents are leaf nodes by default and cannot spawn sub-agents themselves, and the framework also provides a maximum-depth limit as a safety net.
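The asynchronous pattern and the depth limit can be sketched with CompletableFuture from the JDK. SubtaskManager and its spawn/output methods are hypothetical names chosen to echo the agent_spawn and task_output tools; they are not the Harness API, and a real sub-agent would run a full reasoning loop rather than a simple supplier.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of asynchronous delegation with a maximum nesting depth. */
class SubtaskManager {
    private final int maxDepth;
    private final Map<String, CompletableFuture<String>> tasks = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    SubtaskManager(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Starts the work in the background and returns a task id immediately. */
    String spawn(int callerDepth, Supplier<String> work) {
        if (callerDepth >= maxDepth) {
            throw new IllegalStateException("sub-agent depth limit reached: " + maxDepth);
        }
        String taskId = "task-" + nextId.incrementAndGet();
        tasks.put(taskId, CompletableFuture.supplyAsync(work));
        return taskId;
    }

    /** Waits for the task to finish and returns its result. */
    String output(String taskId) {
        CompletableFuture<String> future = tasks.get(taskId);
        if (future == null) {
            throw new IllegalArgumentException("unknown task: " + taskId);
        }
        return future.join();
    }

    public static void main(String[] args) {
        SubtaskManager manager = new SubtaskManager(1); // depth 0 may delegate, depth 1 may not

        // Both subtasks run concurrently while the caller is free to continue.
        String inventory = manager.spawn(0, () -> "inventory: 42 units");
        String discount = manager.spawn(0, () -> "discount: 10%");
        System.out.println(manager.output(inventory));
        System.out.println(manager.output(discount));

        try {
            manager.spawn(1, () -> "nested delegation");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a maximum depth of 1, sub-agents behave as leaf nodes: the main Agent at depth 0 can delegate, but anything running at depth 1 is refused.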
HarnessAgent automatically registers a set of tools that cover the “closed loop” needed for operation, so no manual configuration is required:
| Tool Category | Tool List | Description |
|---|---|---|
| File operations | read_file, write_file, edit_file, grep_files, glob_files, list_files | Operate on workspace files, with paths constrained to the file-system backend scope |
| Memory retrieval | memory_search, memory_get | memory_search uses SQLite full-text search; memory_get reads the memory file by line number |
| Session queries | session_search, session_list, session_history | Search historical conversation content for the Agent’s proactive review |
| Subtask management | agent_spawn, agent_send, agent_list, task_output, task_list, task_cancel | Delegate, query, and manage sub-agent tasks |
| Shell execution | execute | Conditionally registered: only appears when the file-system backend supports isolated execution (local Shell mode or sandbox mode) |
It is worth noting that in the “remote shared storage” mode, the framework does not register the Shell tool by default—this is an intentional security design, not an omission. If your business Agent does not need to execute commands, this mode can eliminate an entire class of execution-safety risks.
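The idea of letting the backend's capabilities decide which tools exist can be sketched as follows. FileBackend, ExecutableBackend, and ConditionalToolRegistry are hypothetical types for this illustration; how Harness actually detects execution support is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker for any storage backend. */
interface FileBackend { }

/** Hypothetical capability interface: only backends that can run commands implement it. */
interface ExecutableBackend extends FileBackend {
    String run(String command);
}

/** Hypothetical sketch: the tool list is derived from what the backend can do. */
class ConditionalToolRegistry {

    static List<String> toolsFor(FileBackend backend) {
        List<String> tools = new ArrayList<>(List.of(
                "read_file", "write_file", "edit_file",
                "grep_files", "glob_files", "list_files"));
        // The Shell tool is only offered when the backend declares execution support.
        if (backend instanceof ExecutableBackend) {
            tools.add("execute");
        }
        return tools;
    }

    public static void main(String[] args) {
        FileBackend remoteStore = new FileBackend() { };
        ExecutableBackend sandbox = command -> "ran in sandbox: " + command;

        System.out.println("remote:  " + toolsFor(remoteStore));
        System.out.println("sandbox: " + toolsFor(sandbox));
    }
}
```

Because the model never receives a tool definition for execute in the remote case, there is nothing for a prompt injection to call, which is a stronger guarantee than asking the model not to use it.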
The file system is the key layer that connects “Agent logic” and “infrastructure” in Harness. The framework provides three declarative modes, and selection should start from business constraints:
Mode 1: Local + Shell (default)
If you do not configure filesystem, or if you explicitly write filesystem(new LocalFilesystemSpec()), the workspace is simply a directory on the local machine and Shell commands can be executed. This is the simplest option, with no extra dependencies, and it suits personal local applications and development/testing environments.
Mode 2: Remote shared storage
Configure filesystem(new RemoteFilesystemSpec(store)), and key data such as memory and session logs are routed to a remote KV store (such as Redis), while the local file system stores only content that does not need to be shared. Shell tools are not registered by default, which is suitable for multi-replica online services and scenarios that require cross-node sharing of user memory but do not need code execution.
Mode 3: Sandbox execution
Configure filesystem(sandboxSpec), and file read/write plus command execution all happen inside the isolated sandbox environment, leaving the host process unaffected. This is suitable for scenarios that need to execute untrusted code, such as DataAgent and Coding Agent.
The core difference among the three modes is: who executes the commands, where the data lives, and how much isolation is provided. The same Agent code logic can migrate among the three modes by switching the filesystem configuration.
Sandbox mode solves not only “isolated execution,” but also “continuity of the isolated environment across multi-turn conversations”—and these two together are what make it truly valuable.
Execution boundary: In sandbox mode, the Shell commands and file operations invoked by the Agent happen on the sandbox side, and the host process only coordinates. Arbitrary user input commands do not directly affect the server.
Recoverable state: At the end of each call(), the current sandbox file-system state is persisted (snapshot mechanism). When the next call begins, the framework finds the corresponding snapshot by sessionId or userId and restores the sandbox to where it left off. Users do not lose work progress because the service restarts or the request drifts to another node.
Workspace projection: Host workspace contents such as AGENTS.md, skills/, subagents/, and knowledge/ are synchronized into the sandbox at the start of each call(), ensuring that the Agent inside the sandbox can see the full configuration and skill definitions.
Isolation granularity (choose as needed):
Running a sandbox in a real production environment involves more considerations than are covered here; refer to the official documentation for details.
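The recoverable-state behavior and the role of isolation granularity can be illustrated with an in-memory sketch. SandboxSnapshotStore, its Scope enum, and the key layout are hypothetical; the Scope values reuse the IsolationScope names mentioned earlier, but the real snapshot format and lookup rules are assumptions here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-scope sandbox snapshots; not the AgentScope sandbox API. */
class SandboxSnapshotStore {
    enum Scope { SESSION, USER, AGENT, GLOBAL }

    // Stands in for shared storage that every service node can reach.
    private final Map<String, Map<String, String>> snapshots = new ConcurrentHashMap<>();

    /** Derives the snapshot key: the narrower the scope, the more identifiers it includes. */
    static String snapshotKey(Scope scope, String agentId, String userId, String sessionId) {
        switch (scope) {
            case SESSION: return agentId + "/" + userId + "/" + sessionId;
            case USER:    return agentId + "/" + userId;
            case AGENT:   return agentId;
            default:      return "global";
        }
    }

    /** Start of a call: returns a mutable copy of the last saved files (empty if none). */
    Map<String, String> restore(String key) {
        return new HashMap<>(snapshots.getOrDefault(key, Map.of()));
    }

    /** End of a call: stores an immutable copy so later edits cannot alter the snapshot. */
    void persist(String key, Map<String, String> sandboxFiles) {
        snapshots.put(key, Map.copyOf(sandboxFiles));
    }

    public static void main(String[] args) {
        SandboxSnapshotStore store = new SandboxSnapshotStore();

        // First call: the Agent writes a script inside the sandbox, then state is persisted.
        String firstKey = snapshotKey(Scope.USER, "data-agent", "alice", "s-001");
        Map<String, String> sandboxFiles = store.restore(firstKey);
        sandboxFiles.put("analysis.py", "print('step 1')");
        store.persist(firstKey, sandboxFiles);

        // A later call in a different session of the same user, possibly on another node.
        String laterKey = snapshotKey(Scope.USER, "data-agent", "alice", "s-002");
        System.out.println("file restored: " + store.restore(laterKey).containsKey("analysis.py"));
    }
}
```

With USER scope, both sessions map to the same key and share one sandbox state; switching to SESSION scope would give each session its own isolated snapshot.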
Skills are the structured way to represent “reusable operating procedures.” Put a SKILL.md file under skills/<skill-name>/ in the workspace, and the framework will automatically discover it and assemble it into the Agent’s capability library. During inference, the Agent can call these skills, and the skills themselves describe the “steps and rules for doing this task.”
The engineering value of this design is that skills are files—they can be versioned together with code in Git, reviewed through code review, and updated without redeploying. When a team has a large number of SOPs and operating guidelines that need to be injected into the Agent, this is much clearer than stuffing everything into the system prompt.
In sandbox mode, skill files are synchronized into the sandbox along with the workspace projection, and the commands involved in the skills run in the isolated environment, without affecting the host.
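The discovery convention (one directory per skill, each containing a SKILL.md) is easy to picture with the JDK file API. SkillScanner is a hypothetical class that only lists skill names; how Harness parses SKILL.md and exposes a skill to the model is not covered by this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: find skills by scanning skills/<skill-name>/SKILL.md. */
class SkillScanner {

    /** Returns the sorted names of skill directories that contain a SKILL.md file. */
    static List<String> discover(Path workspace) {
        Path skillsDir = workspace.resolve("skills");
        if (!Files.isDirectory(skillsDir)) {
            return List.of();
        }
        try (Stream<Path> children = Files.list(skillsDir)) {
            return children
                    .filter(dir -> Files.isRegularFile(dir.resolve("SKILL.md")))
                    .map(dir -> dir.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: writes skills/<name>/SKILL.md with the given body. */
    static void writeSkill(Path workspace, String name, String body) {
        Path skillFile = workspace.resolve("skills").resolve(name).resolve("SKILL.md");
        try {
            Files.createDirectories(skillFile.getParent());
            Files.write(skillFile, body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path workspace = Path.of(System.getProperty("java.io.tmpdir"), "skills-demo-" + System.nanoTime());
        writeSkill(workspace, "weekly-report", "# Weekly report\n1. Collect metrics\n2. Summarize\n");
        writeSkill(workspace, "db-backup", "# Database backup\n1. Dump\n2. Verify\n");
        System.out.println(discover(workspace));
    }
}
```

Directories without a SKILL.md are ignored, so drafts and helper folders can live next to published skills without being picked up.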
AgentScope Java 1.1 condenses the set of capabilities that people want most from Harness Engineering, but are hardest to assemble themselves, into HarnessAgent + workspace conventions + a pluggable file system + Hook pipeline: in personal scenarios, it is an enhanced ReAct Agent with memory, compaction, and sub-tasks; in enterprise scenarios, it is infrastructure that turns isolation, multi-tenancy, distributed memory, and sub-agent orchestration into configuration items.
If you are evaluating the evolution from a personal assistant prototype to a production-ready enterprise agent, it is recommended to start by running through the quick start in Harness Overview, then choose one declarative mode under Filesystem, and then enable compaction, sandboxing, and sub-agents as needed—every step has corresponding documentation and sample modules to compare against, so you do not have to invent a “workspace as truth” runtime from scratch.