The First Java Harness Framework Is Here | AgentScope Brings OpenClaw to Enterprise Distributed Scenarios

AgentScope Java 1.1 launches with workspace-driven persistence, pluggable filesystems, auto-context management, and secure sandbox orchestration for scalable enterprise Agents.

By Liu Jun

I previously published an in-depth analysis of OpenClaw and the Harness Engineering practices behind it, and proposed a "Harness Framework" to explain how to apply this methodology to enterprise-grade agent development.

The good news is that AgentScope Java 1.1.0 has officially been released, and this milestone version fully implements the “Harness Framework” plan. With version 1.1, developers can put Harness into practice quickly, building local applications such as XxxClaw and Coding Agent for personal productivity, or enterprise-grade applications such as DataAgent and SRE Agent for distributed scenarios.

In this release, AgentScope Java 1.1.0 delivers four core capabilities:

  • Workspace-driven Agent runtime: The Agent’s persona, knowledge, skills, memory, and sub-agent specifications are all consolidated into a structured workspace. Context is automatically loaded from the workspace before each run, and memory is automatically written back afterward, allowing the Agent’s capabilities to evolve continuously over time.
  • Pluggable abstract file system: The physical storage of the workspace can be switched freely—local disk, remote shared storage, and isolated sandboxes are all operated through the same interface, allowing the same Agent logic to adapt to both personal development environments and enterprise distributed deployments without modification.
  • Ready-to-use context management: Built-in conversation compaction, two-tier memory consolidation, and full-text search solve two stubborn problems: long-conversation context bloat and memory loss across sessions. A background maintenance mechanism ensures the memory store does not grow out of control over time.
  • Sub-agent orchestration and isolated execution: Supports declarative definition of sub-agents and synchronous or asynchronous task delegation. Tool execution can be configured to run inside an isolated sandbox, and sandbox state remains recoverable across multi-turn conversations, balancing session-level and user-level isolation in multi-tenant scenarios.

OpenClaw/Hermes Are Great, but Why Don’t They Work in Enterprise Agent Scenarios?

Over the past year, agent products such as OpenClaw, Hermes, and Claude Code have sparked a wave of enthusiasm, and the Harness Engineering ideas behind them have also become popular—using structured workspaces, context management, and tool conventions to replace the primitive “every conversation fights on its own” usage pattern. More and more teams have started bringing this approach into their own Agent development.

However, people who actually try to implement it often find that things start to break once they reach the “enterprise” level. We summarized the five obstacles most commonly mentioned by frontline developers:

  1. Multiple users, multiple replicas—what about the workspace? OpenClaw uses a local directory as the workspace, which is perfectly fine for a single machine and a single user. But once the service is exposed externally, multiple users need isolated workspaces; after the Agent scales horizontally to multiple machines, the same user’s workspace also needs to be shared across replicas—the local-directory assumption collapses immediately.
  2. Tools and Skill Scripts can’t run on the host machine—how do you isolate execution? It’s fine to let an Agent call Shell or run user-provided code on a trusted local development machine. But once you put it into a service, directly executing arbitrary user commands on the host machine becomes a security vulnerability. A sandbox is mandatory, but “having a sandbox” is only the first step: tools inside the sandbox still need to see full context, and the same sandbox instance must be recoverable across multi-turn conversations instead of starting from scratch every time.
  3. How do you move the “workspace + file system” combination into a distributed environment? A file-system-driven workspace is the most intuitive—and also one of the most effective—patterns in Harness Engineering, but that pattern depends on a “file system” as its premise. In distributed scenarios there is no unified local disk; remote storage, KV services, and object storage all have different interfaces. Rewriting everything would tightly couple the Agent logic to the infrastructure.
  4. What is the right way to build Multi-Agent systems? Subtask distribution, context isolation, asynchronous execution, result collection, timeout cancellation—none of these is hard on its own, but combining them into a manageable orchestration layer quickly increases code complexity. And most frameworks only provide primitives; the engineering questions—“how do you declare sub-agents, when do you spawn them, how do you manage state?”—are left entirely to you.
  5. Is there an out-of-the-box implementation for context compaction and hierarchical memory? Harness Engineering explains these two ideas very clearly, but a real implementation involves many details: when to compact, how to compact, fact extraction before compaction, historical retrievability, recovery after cross-process restarts… Most frameworks only provide abstract interfaces for short-term and long-term memory, and you still have to build the actual implementation yourself.

The root cause of these five problems is the same: personal-assistant-style agents and enterprise-grade agents are two different engineering forms, and using the same assumptions to handle both scenarios will inevitably run into walls.

The two forms differ along four dimensions:

  • Deployment form: a personal assistant is single-user and single-process, and all state can live on one machine; enterprise agents need horizontal scaling, multi-tenancy, and uninterrupted service, so state must be distributable and recoverable.
  • Security boundaries: local tool execution carries little risk, but arbitrary Shell execution in production is a serious attack surface; sandboxes and permission boundaries are not “optional optimizations” but “prerequisites for going live.”
  • Operability and observability: when a personal tool has a problem, you just check the logs yourself; enterprise services require persistent memory, auditable sessions, and traceable state changes.
  • Token economics: personal users are not sensitive to latency and cost, but in enterprise scenarios every wasted context re-prompt is a real expense.

So, is there a framework that lets you “write one set of logic and switch forms as needed”? The Harness module in AgentScope Java 1.1.0, with its entry class HarnessAgent, is designed around exactly that goal: it does not replace the reasoning loop of ReActAgent; instead, it inserts hooks at key moments in the loop, fills in a set of tool and workspace conventions, packages the engineering answers to the five problems above, and lets you focus on the Agent’s business logic rather than the infrastructure.

AgentScope Harness Design Philosophy: Why Can It Solve the Problems Above?

The design philosophy of AgentScope Harness can be summarized in one sentence: package the engineering answers to “what happens next round, what happens tomorrow, what happens when the context explodes, and what happens when state is lost,” instead of having every Agent project invent them from scratch.

At the implementation level, two core pillars support the entire framework.

Core Pillar One: Workspace as the Single Source of Truth

Harness introduces the concept of a workspace for each Agent—a structured directory that holds everything the Agent needs to run persistently: persona definitions (AGENTS.md), long-term memory (MEMORY.md), domain knowledge (knowledge/), reusable skills (skills/), sub-agent specifications (subagents/), and session history (agents/<agentId>/).

This is not a new idea—OpenClaw and Hermes both found in practice that giving an Agent a stable “workbench” is much more effective than reinitializing it every time. Harness systematizes that intuition: the workspace is the Agent’s single source of truth, and all state reads and writes revolve around the workspace rather than being scattered across code, databases, and memory.

In actual operation, before each inference begins, WorkspaceContextHook automatically injects key files such as AGENTS.md, MEMORY.md, and knowledge/ into the system prompt, ensuring the Agent’s persona and knowledge are fully present in every round. After the Agent finishes running, MemoryFlushHook extracts the new facts from the conversation and writes them into the memory file, while the background MemoryConsolidator periodically merges the running log into refined long-term memory. The workspace keeps evolving throughout the conversation, and each run leaves the Agent a little “more aware” of the user and the task than the last one.

Core Pillar Two: AbstractFilesystem Makes the Workspace Work in Any Environment

The workspace idea is beautiful, but there is one real-world constraint: local disk directories do not work in distributed scenarios. Multiple Pods each have their own local disk, so where should MEMORY.md be written? Which replica’s version is the “real” one?

AgentScope Harness solves this with the AbstractFilesystem abstraction layer. From the upper layer’s perspective, the Agent only needs to call unified interfaces such as read/write/ls/grep and does not care where the “file” actually lives; from the lower layer’s perspective, it can adapt to any medium such as local disks, remote object storage (OSS), KV databases (Redis), or sandbox file systems, and it can even route different paths to different backends through CompositeFilesystem.
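
The routing idea can be illustrated with a minimal sketch. It is not the framework's implementation: `SketchFilesystem`, `InMemoryFilesystem`, and `PrefixRoutingFilesystem` are hypothetical names invented for this example, the interface is reduced to read/write, and the longest-prefix routing rule is an assumption about how a composite backend might choose a target.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, reduced interface; the text says the real one also offers ls and grep.
interface SketchFilesystem {
    Optional<String> read(String path);

    void write(String path, String content);
}

// In-memory stand-in for any backend (local disk, object storage, KV store, sandbox).
final class InMemoryFilesystem implements SketchFilesystem {
    private final Map<String, String> files = new ConcurrentHashMap<>();

    @Override
    public Optional<String> read(String path) {
        return Optional.ofNullable(files.get(path));
    }

    @Override
    public void write(String path, String content) {
        files.put(path, content);
    }
}

// Routes each path to the backend mounted at the longest matching prefix (assumed rule).
final class PrefixRoutingFilesystem implements SketchFilesystem {
    private final Map<String, SketchFilesystem> mounts = new LinkedHashMap<>();
    private final SketchFilesystem fallback;

    PrefixRoutingFilesystem(SketchFilesystem fallback) {
        this.fallback = fallback;
    }

    PrefixRoutingFilesystem mount(String prefix, SketchFilesystem backend) {
        mounts.put(prefix, backend);
        return this;
    }

    private SketchFilesystem resolve(String path) {
        String best = null;
        for (String prefix : mounts.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? fallback : mounts.get(best);
    }

    @Override
    public Optional<String> read(String path) {
        return resolve(path).read(path);
    }

    @Override
    public void write(String path, String content) {
        resolve(path).write(path, content);
    }
}

final class CompositeFilesystemDemo {
    public static void main(String[] args) {
        InMemoryFilesystem local = new InMemoryFilesystem();
        InMemoryFilesystem shared = new InMemoryFilesystem();
        SketchFilesystem fs = new PrefixRoutingFilesystem(local)
                .mount("MEMORY.md", shared)
                .mount("agents/", shared);

        fs.write("MEMORY.md", "- user prefers weekly reports");
        fs.write("skills/report.md", "how to build a report");

        System.out.println(shared.read("MEMORY.md").isPresent());       // true
        System.out.println(local.read("MEMORY.md").isPresent());        // false
        System.out.println(local.read("skills/report.md").isPresent()); // true
    }
}
```

The calling code only ever sees `SketchFilesystem`, so moving MEMORY.md from local disk to shared storage is a change in how the composite is assembled, not in the Agent logic.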

As shown above, based on the AbstractFilesystem interface, AgentScope provides three built-in extension implementations corresponding to three usage modes.

The three implementations and modes will be explained in detail later.

In AgentScope 1.1, the workspace is the core abstraction of the agent. AbstractFilesystem is the physical storage layer behind the workspace, and all file operations, command execution, and memory management tools go through AbstractFilesystem as their standard entry point.

Based on this file-system abstraction layer, the AgentScope framework directly brings three major engineering capabilities to agent development:

Security and isolation

  • Shell/Code/Skill execution is isolated through a sandbox backend, so commands driven by user input no longer run directly on the host machine
  • The workspace itself can also run inside a sandbox, achieving isolation at the file read/write layer
  • Tool registration and exposure are uniformly managed by the framework, and the execute tool only appears when the backend implements the sandbox interface

Distributed deployment

  • Agents can be deployed with multiple equivalent replicas, and key files such as MEMORY.md and session logs are routed through a Remote backend to shared storage, naturally achieving cross-node synchronization
  • By combining IsolationScope (SESSION / USER / AGENT / GLOBAL) with RuntimeContext, you can implement session-level isolation, user-level sharing, and other tenancy strategies without changing the code
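
One way to picture how an isolation scope and the call identity combine is to derive a storage prefix from them. The sketch below is only an illustration under stated assumptions: `ScopeSketch`, `IdentitySketch`, `NamespaceSketch`, and the prefix layout are all hypothetical and are not taken from the framework.

```java
import java.util.Objects;

// Hypothetical mirror of the four scopes named in the text.
enum ScopeSketch { SESSION, USER, AGENT, GLOBAL }

// Hypothetical identity holder; the text says RuntimeContext carries sessionId and userId.
final class IdentitySketch {
    final String agentId;
    final String userId;
    final String sessionId;

    IdentitySketch(String agentId, String userId, String sessionId) {
        this.agentId = Objects.requireNonNull(agentId, "agentId");
        this.userId = userId;
        this.sessionId = sessionId;
    }
}

final class NamespaceSketch {
    private NamespaceSketch() {}

    // Returns the storage prefix that state for this call is read from and written to.
    // The layout is an assumption made for this example.
    static String prefixFor(ScopeSketch scope, IdentitySketch id) {
        switch (scope) {
            case GLOBAL:
                return "global/";
            case AGENT:
                return "agents/" + id.agentId + "/";
            case USER:
                return "agents/" + id.agentId + "/users/"
                        + Objects.requireNonNull(id.userId, "userId") + "/";
            case SESSION:
                return "agents/" + id.agentId + "/sessions/"
                        + Objects.requireNonNull(id.sessionId, "sessionId") + "/";
            default:
                throw new IllegalArgumentException("unknown scope: " + scope);
        }
    }

    public static void main(String[] args) {
        IdentitySketch first = new IdentitySketch("data-agent", "alice", "s-001");
        IdentitySketch second = new IdentitySketch("data-agent", "alice", "s-002");
        // Same user, two sessions: USER scope shares one prefix, SESSION scope does not.
        System.out.println(prefixFor(ScopeSketch.USER, first));
        System.out.println(prefixFor(ScopeSketch.SESSION, first));
        System.out.println(prefixFor(ScopeSketch.SESSION, second));
    }
}
```

Because the scope is a parameter rather than something baked into the tools, switching from session-level isolation to user-level sharing changes only which prefix is computed.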

Sub-agent and asynchronous task support

  • The sub-agent’s workspace, file system, and session state can be inherited from the parent Agent or configured independently, and orchestration strategies are declared in the specification so they do not need to be assembled manually
  • The asynchronous task state machine (PENDING/RUNNING/COMPLETED/FAILED/CANCELLED) and result collection mechanism are ready to use out of the box, and can be replaced with a cross-process implementation
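
The five task states listed above can be modeled as a small state machine. The following is a sketch only: the state names come from the text, but the allowed transitions and the `TaskStateSketch` and `TaskSketch` types are assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// State names are from the text; the transition table is an assumption.
enum TaskStateSketch {
    PENDING, RUNNING, COMPLETED, FAILED, CANCELLED;

    // States that may legally follow this one.
    Set<TaskStateSketch> next() {
        switch (this) {
            case PENDING:
                return EnumSet.of(RUNNING, CANCELLED);
            case RUNNING:
                return EnumSet.of(COMPLETED, FAILED, CANCELLED);
            default:
                return EnumSet.noneOf(TaskStateSketch.class);
        }
    }

    boolean isTerminal() {
        return next().isEmpty();
    }
}

// Hypothetical task handle that rejects illegal transitions instead of corrupting state.
final class TaskSketch {
    private TaskStateSketch state = TaskStateSketch.PENDING;

    synchronized TaskStateSketch state() {
        return state;
    }

    // Returns true if the transition was applied, false if it is not allowed.
    synchronized boolean transitionTo(TaskStateSketch target) {
        if (!state.next().contains(target)) {
            return false;
        }
        state = target;
        return true;
    }

    public static void main(String[] args) {
        TaskSketch task = new TaskSketch();
        System.out.println(task.transitionTo(TaskStateSketch.COMPLETED)); // false
        System.out.println(task.transitionTo(TaskStateSketch.RUNNING));   // true
        System.out.println(task.transitionTo(TaskStateSketch.COMPLETED)); // true
        System.out.println(task.state().isTerminal());                    // true
    }
}
```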

Typical AgentScope Harness Use Cases: Quickly Map It to Your Application Scenario

The three scenarios below cover the typical development patterns from personal to enterprise. They are not either/or options; rather, they represent three different complexity paths—you can start with the simplest one and migrate gradually as requirements evolve.

Personal assistant Agent — typical of OpenClaw-style applications

Characteristics of this scenario: single user, local execution, needs to manipulate local files or run scripts; typical products include personal assistants, note bots, and local Coding Agents.

The core need in this scenario is to “let the Agent truly understand me and remember me,” not just be a stateless Q&A machine. Harness’s value here is that the AGENTS.md in the workspace defines the Agent’s persona and behavioral preferences, new facts are automatically distilled into memory after the conversation ends, and when you open it again the Agent still knows you and remembers the last progress. Skills and domain knowledge also live in the workspace, so they can be edited and adjusted at any time without touching the code.

Under local deployment you can also enable Shell execution, allowing the Agent to run scripts and operate the file system directly. This is one of the most attractive aspects of OpenClaw-style products. On top of that, Harness adds a layer of “continuous evolution”: the workspace acts like the Agent’s brain, becoming more experienced with every conversation.

Core capabilities that AgentScope Harness provides in this scenario:

  • Persistent memory: After a conversation ends, new facts are automatically distilled and written into the workspace, so the next start does not require re-explaining the background to the Agent, and long-term memory accumulates with use.
  • Local Shell execution: In a trusted local environment, the Agent can directly run scripts and manipulate files, reproducing the core experience of OpenClaw-style products.
  • Workspace as configuration: Modify AGENTS.md to adjust persona, add skills in the skills/ directory, and changing one file is equivalent to upgrading the Agent once—no rebuild or redeploy required.
  • Cross-process session recovery: Close it and open it again; as long as the sessionId stays the same, the last conversation state is fully restored instead of starting from scratch.

Enterprise data services — typical of DataAgent

Characteristics of this scenario: serves multiple users, needs to execute SQL / Python / Shell, tasks are long-running, inputs come from untrusted external users, and it also requires recoverable multi-turn state and consistent user experience across replicated deployments.

The biggest risk in this scenario is execution safety—user-driven code must not run unrestricted on the server. Harness’s sandbox mechanism confines both file operations and command execution to an isolated environment, leaving the server process unaffected. More importantly, the sandbox is not “use it once and throw it away”; after each conversation round ends, the sandbox state is persisted and brought back for the next round, so users do not lose their work just because the service restarts or switches nodes.

When deployed with multiple replicas, a user’s long-term memory (the Agent’s accumulated understanding of that user) can be stored in shared storage, so no matter which node receives the request, the Agent sees the same memory. Long-running analysis tasks can be split into multiple sub-agents and executed in parallel, with the main Agent only coordinating and summarizing rather than blocking the entire time.

Core capabilities that AgentScope Harness provides in this scenario:

  • Isolated sandbox execution: All code and commands run in an isolated environment, so the host service process is not affected by user input and the security boundary is clear.
  • Multi-turn sandbox state recovery: Sandbox state is automatically saved after each conversation round, and restored in place on the next round or the next service start, so the user’s work area is not lost.
  • Distributed memory sharing: A user’s long-term memory is stored in shared storage, so all replicas in a multi-node deployment read the same “understanding of this user,” ensuring a consistent experience.
  • Parallel sub-agent orchestration: Long tasks can be decomposed into multiple sub-agents running concurrently, with the main Agent only coordinating. This improves overall efficiency and makes timeouts and failures easier to manage.
  • Multi-tenant isolation: Workspaces and execution environments are isolated by session or user, so multiple users online at the same time do not interfere with one another.

Enterprise online services — typical of Taotian transaction agents

Characteristics of this scenario: tasks are mainly completed by calling business APIs (placing orders, querying, approving, etc.), no Shell execution is needed on the server, but multi-instance operation, persistent session state, and cross-user knowledge sharing are required.

The core need in this scenario is stability and safety—an online service cannot afford to fail because an Agent called a Shell command it should never have called. Harness’s value here is that if sandbox execution is not configured, the framework simply does not expose the Shell tool by default. The Agent can only interact with the outside world through explicitly defined business tools, and the security boundary is determined by configuration rather than developer self-discipline.

Session state and memory can be stored in remote storage, and multiple service instances can share the same set of user memories. No matter which entry point the user uses to start a new conversation, the Agent can continue from the previous context. When multiple subtasks need to be processed in parallel (for example, checking inventory, calculating discounts, and generating summaries at the same time), the sub-agent mechanism still applies, and it can integrate with an external task queue to enable cross-process task management.

Core capabilities that AgentScope Harness provides in this scenario:

  • Default security boundary: When sandbox execution is not enabled, the framework does not expose the Shell tool, and the Agent can only interact with the outside world through the business tools you explicitly register; the security policy is determined by configuration.
  • Shared memory across instances: Session state and user memory are stored in remote storage, so any service instance can read the same context and users can switch between instances without noticing.
  • Continuous sessions across requests: Each request carries the same user identifier, and the Agent automatically restores the previous conversation state, achieving a truly continuous multi-turn conversation experience.
  • Parallel subtask support: When multiple business steps need to be handled at the same time, subtasks can be delegated to sub-agents for parallel execution, with the results merged and returned together without slowing the main flow.

A Detailed Look at AgentScope Harness: Learning the Framework in Depth

This section explains the core capabilities of AgentScope Harness from the user’s perspective: what it is, how it works, and how you should think about configuration.

Quick Start

Getting started with Harness only takes three steps: add the dependency, prepare the workspace, then build and call the Agent.

1. Add the dependency

<dependency>
    <groupId>io.agentscope</groupId>
    <artifactId>agentscope-harness</artifactId>
    <version>${agentscope.version}</version>
</dependency>

2. Prepare the workspace

Choose a directory on disk as the workspace and create AGENTS.md inside it. This is not an “optional initialization step,” but the core entry point of Harness—the Agent’s persona, memory, skills, and sub-agent specifications all revolve around this directory. A few lines of conventions in AGENTS.md are enough, and it can continue evolving as it is used.

3. Build HarnessAgent and call it

// Assumes `model` (a configured chat model) and `userMessage` (a Msg) are created elsewhere.
// Paths comes from java.nio.file.Paths.
HarnessAgent agent = HarnessAgent.builder()
    .name("my-agent")
    .model(model)
    .workspace(Paths.get(".agentscope/workspace"))
    // Recommended to configure from the start to avoid context overflow in production
    .compaction(CompactionConfig.builder()
        .triggerMessages(50)
        .keepMessages(20)
        .build())
    .build();

RuntimeContext ctx = RuntimeContext.builder()
    .sessionId("user-session-001")   // Multiple calls with the same sessionId automatically continue the context
    .userId("alice")                 // Required in multi-user scenarios for namespace isolation
    .build();

// block() waits for the reply synchronously
Msg reply = agent.call(userMessage, ctx).block();

After running, check the workspace directory: the three paths AGENTS.md, memory/, and agents/<agentId>/ should all exist, which means the Agent is successfully writing memory and persisting session state.

See QuickstartExample in agentscope-examples/harness-example for a fully runnable example.

Core Concepts

If you understand the six concepts below, you basically understand how Harness works.

| Concept | Definition | Problem It Solves | Usage Suggestion |
|---|---|---|---|
| HarnessAgent | An engineering-oriented wrapper entry built on ReActAgent; build() assembles hooks, built-in tools, skills, and session persistence | “I don’t want to assemble compaction, memory, sessions, subtasks, and the file system from scratch” | Business code only interacts with HarnessAgent.builder() and agent.call(msg, ctx) |
| workspace | The Agent’s working directory, containing all persistent content such as AGENTS.md, MEMORY.md, skills/, subagents/, and session history | “Where do persona, knowledge, memory, and state live, and how do they keep evolving?” | Plan the workspace structure first, then write prompts; treat the workspace as a versionable asset |
| filesystem | A unified interface for file read/write, serving as the abstraction layer between Agent tools and physical storage, supporting local disks, remote storage, sandboxes, and more | “How can the same Agent logic switch between local, shared storage, and sandboxes?” | Prefer one of the three declarative modes first (Local / Remote / Sandbox) |
| RuntimeContext | The identity context for a single call(), including sessionId, userId, etc.; passed in again on each call and not persisted | “Who is this round for, where should state be read and written, and how do we isolate multiple tenants?” | Always pass a stable sessionId; multi-tenant scenarios must pass userId |
| sandbox | An isolated execution environment where file operations and commands run on the sandbox side, with state persisted after each round and restored on the next | “How can tools and scripts be executed safely with untrusted input while keeping multi-turn state continuous?” | Enable it first when code execution is needed; choose the isolation granularity according to the business |
| memory | A two-layer memory system: after each conversation round, new facts are automatically distilled into a running log, then a background process periodically merges it into injectable long-term memory, combined with full-text search | “Don’t lose facts in long conversations, don’t blow up the context, and make history searchable” | Enable conversation compaction and watch the memory files change; retrieve old facts with search tools |

Bottom line: HarnessAgent handles orchestration, workspace handles persistence, filesystem handles placement, RuntimeContext handles identity, sandbox handles boundaries, and memory handles long-term evolution.

Feature Details

Workspace: The Agent’s Single Source of Truth

The workspace is the most important design difference between Harness and ordinary Agent frameworks. It is not a temporary storage directory, but the Agent’s “externalized brain”—everything that needs to persist across sessions lives here.

The standard workspace directory structure is as follows:

workspace/
├── AGENTS.md              ← Agent persona and behavior guidelines, automatically injected into the system prompt before each inference
├── MEMORY.md              ← Refined long-term memory, automatically maintained in the background and accumulated over time
├── knowledge/             ← Domain knowledge, injected together with AGENTS.md
├── skills/                ← Reusable skills, automatically assembled into the Agent’s toolset
├── subagents/             ← Sub-agent specification declarations, automatically discovered and loaded
└── agents/<agentId>/
    ├── context/           ← Session state snapshot (used for recovery after a process restart)
    ├── sessions/          ← Conversation JSONL and compressed context, for auditing and retrieval
    └── memory/            ← Daily memory ledger

How the workspace works in each inference: Before inference begins, Harness merges key files such as AGENTS.md, MEMORY.md, and knowledge/ into the system prompt; after inference ends, it extracts the new facts from the conversation and appends them to the day’s memory ledger. The workspace keeps evolving with each conversation, and the Agent gradually becomes “more aware” of the business and the user over time.
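
The injection step described above can be sketched with plain file I/O. This is not the framework's WorkspaceContextHook: `WorkspacePromptSketch`, the `## <file>` section headers, and the alphabetical ordering of knowledge files are assumptions chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper that merges workspace files into one system prompt string.
final class WorkspacePromptSketch {
    private WorkspacePromptSketch() {}

    static String buildSystemPrompt(Path workspace) {
        try {
            List<String> sections = new ArrayList<>();
            addIfPresent(sections, "AGENTS.md", workspace.resolve("AGENTS.md"));
            addIfPresent(sections, "MEMORY.md", workspace.resolve("MEMORY.md"));

            Path knowledge = workspace.resolve("knowledge");
            if (Files.isDirectory(knowledge)) {
                List<Path> files;
                try (Stream<Path> listing = Files.list(knowledge)) {
                    // Sort so the prompt is stable from one run to the next.
                    files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
                }
                for (Path file : files) {
                    addIfPresent(sections, "knowledge/" + file.getFileName(), file);
                }
            }
            return String.join("\n\n", sections);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Missing files are skipped, so a new workspace with only AGENTS.md still works.
    private static void addIfPresent(List<String> sections, String title, Path file) throws IOException {
        if (Files.isRegularFile(file)) {
            String body = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            sections.add("## " + title + "\n" + body);
        }
    }

    public static void main(String[] args) throws IOException {
        Path ws = Files.createTempDirectory("workspace-sketch");
        Files.write(ws.resolve("AGENTS.md"), "You are a data assistant.".getBytes(StandardCharsets.UTF_8));
        Files.write(ws.resolve("MEMORY.md"), "- alice prefers weekly reports".getBytes(StandardCharsets.UTF_8));
        System.out.println(buildSystemPrompt(ws));
    }
}
```

Because the prompt is rebuilt from files on every call, editing AGENTS.md between two calls changes the Agent's behavior without touching code.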

Why the workspace is better than hard-coding prompts in code: Persona, knowledge, skills, and sub-agent specifications all live in workspace files. Adjusting behavior only requires editing files—no recompilation or deployment needed. This is especially important for Agents with complex business knowledge, where business rules change frequently and updates should be lightweight.

Session Persistence: Continuous State Across Requests and Processes

Harness persists session state along two parallel paths, each solving a different problem:

  • State snapshot (context/): After each call() ends, the Agent’s runtime state (current conversation memory, tool execution context, etc.) is serialized into a JSON file and stored under agents/<agentId>/context/<sessionId>/ in the workspace. The next time a call is made with the same sessionId, the framework automatically loads this snapshot before inference begins and restores it to where it left off. This is the technical guarantee that “closing it and opening it again still remembers the last session.”
  • Conversation logs (sessions/): The full conversation history is appended in JSONL format to <sessionId>.log.jsonl; this file is never compacted and is used for auditing and by the session_search tool. A separate <sessionId>.jsonl stores the compacted LLM context—the version the model actually “sees.”

Both paths are maintained automatically by the framework; the only thing the developer needs to do is pass the same sessionId consistently on every call.
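
The difference between the two paths is essentially "overwrite the latest snapshot" versus "append to a log". The sketch below illustrates that contrast with plain files. `SessionStoreSketch` is hypothetical, the snapshot file name `state.json` is made up for the example, and the directory layout simply follows the paths named in the text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical store rooted at workspace/agents/<agentId>.
final class SessionStoreSketch {
    private final Path agentDir;

    SessionStoreSketch(Path agentDir) {
        this.agentDir = agentDir;
    }

    // Path 1: the snapshot is overwritten, only the latest state matters.
    void saveSnapshot(String sessionId, String json) {
        write(snapshotFile(sessionId), json,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    Optional<String> loadSnapshot(String sessionId) {
        Path file = snapshotFile(sessionId);
        if (!Files.isRegularFile(file)) {
            return Optional.empty();
        }
        try {
            return Optional.of(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Path 2: the log is append-only, one JSON document per line.
    void appendLog(String sessionId, String jsonLine) {
        write(logFile(sessionId), jsonLine + "\n", StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    List<String> readLog(String sessionId) {
        Path file = logFile(sessionId);
        if (!Files.isRegularFile(file)) {
            return Collections.emptyList();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private Path snapshotFile(String sessionId) {
        return agentDir.resolve("context").resolve(sessionId).resolve("state.json"); // file name is made up
    }

    private Path logFile(String sessionId) {
        return agentDir.resolve("sessions").resolve(sessionId + ".log.jsonl");
    }

    private static void write(Path file, String content, StandardOpenOption... options) {
        try {
            Files.createDirectories(file.getParent());
            Files.write(file, content.getBytes(StandardCharsets.UTF_8), options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A restarted process that calls `loadSnapshot` with the same sessionId gets the last saved state back, while `readLog` still returns every message ever appended.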

Memory Management: Automatic Accumulation from Conversation to Long-Term Knowledge

This is one of Harness’s most valuable engineering capabilities. In many Agent frameworks, “memory” essentially means stuffing historical messages into the context until it eventually explodes; AgentScope’s current approach is a two-layer separation:

Layer one — daily running log: After each conversation ends, the framework uses an LLM to extract the “new facts” from that conversation and appends them as bullet points to the day’s memory file (memory/YYYY-MM-DD.md). This layer only appends and never modifies, ensuring that no new fact is lost.

Layer two — long-term memory: In the background, a scheduler periodically reads recent daily ledger files and uses an LLM to merge, deduplicate, and refine them together with the existing MEMORY.md, producing an “injectable version” that fits within the token budget and writing it back to MEMORY.md. This layer is the “fact summary” injected into the system prompt for each inference; it is high quality and size-controlled.

The relationship between the two layers is: the first layer guarantees that nothing is lost, and the second layer guarantees that it is usable. New facts are first written into the running log, then moved into long-term memory by the background process once enough has accumulated. During inference, the model first looks at long-term memory, and if it still cannot find what it needs, it uses the memory_search tool for full-text search (based on SQLite FTS5).
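
The division of labor between the two layers can be shown with a deliberately simplified sketch. In the framework both extraction and merging are done by an LLM; here, as a loudly labeled stand-in, consolidation is plain deduplication plus a size cap that drops the oldest facts. `MemoryConsolidationSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class MemoryConsolidationSketch {
    private MemoryConsolidationSketch() {}

    // Layer one: the daily ledger only ever grows, so no new fact is lost.
    static void appendFacts(List<String> dailyLedger, List<String> newFacts) {
        dailyLedger.addAll(newFacts);
    }

    // Layer two: merge the ledger into long-term memory and keep it within a budget.
    // Stand-in for the LLM merge: exact-match dedup, then keep the newest maxFacts entries.
    static List<String> consolidate(List<String> longTerm, List<String> dailyLedger, int maxFacts) {
        Set<String> merged = new LinkedHashSet<>();
        for (String fact : longTerm) {
            merged.add(normalize(fact));
        }
        for (String fact : dailyLedger) {
            merged.add(normalize(fact));
        }
        List<String> all = new ArrayList<>(merged);
        int from = Math.max(0, all.size() - maxFacts);
        return new ArrayList<>(all.subList(from, all.size()));
    }

    private static String normalize(String fact) {
        return fact.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        List<String> longTerm = new ArrayList<>(Arrays.asList("prefers weekly reports"));
        List<String> ledger = new ArrayList<>();
        appendFacts(ledger, Arrays.asList("prefers  weekly reports", "uses PostgreSQL"));
        appendFacts(ledger, Arrays.asList("timezone is UTC+8"));
        System.out.println(consolidate(longTerm, ledger, 10));
    }
}
```

The ledger is never rewritten by `consolidate`, which mirrors the guarantee in the text: the first layer keeps everything, and only the second layer is trimmed to stay injectable.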

Conversation compaction is the other side of memory management: when the number of messages or the token count exceeds a threshold, Harness uses an LLM to compress the earlier conversation into a summary, keeps the most recent messages, and offloads the rest into the JSONL file. Compaction happens after long-term memory has been extracted, so valuable information is persisted before it is compressed. If the model returns a context overflow error, the framework also catches the exception, forces compaction, and retries automatically; the whole process is transparent to the caller.

Configuration suggestion:

.compaction(CompactionConfig.builder()
    .triggerMessages(50)    // Trigger compaction when the message count exceeds 50
    .keepMessages(20)       // Keep the latest 20 messages
    .flushBeforeCompact(true) // Extract memory before compaction (enabled by default)
    .build())
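
To make the meaning of the two thresholds concrete, here is a minimal sketch of message-count compaction. It is an illustration rather than the framework's code: `CompactionSketch` is hypothetical, messages are plain strings, and the summarizer is passed in as a function where the framework would call an LLM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class CompactionSketch {
    private CompactionSketch() {}

    // If the history exceeds triggerMessages, replace everything except the newest
    // keepMessages entries with a single summary produced by the summarizer.
    static List<String> compact(List<String> messages, int triggerMessages, int keepMessages,
                                Function<List<String>, String> summarizer) {
        if (keepMessages < 0 || keepMessages > triggerMessages) {
            throw new IllegalArgumentException("keepMessages must be between 0 and triggerMessages");
        }
        if (messages.size() <= triggerMessages) {
            return new ArrayList<>(messages); // below the threshold: nothing to do
        }
        int split = messages.size() - keepMessages;
        List<String> compacted = new ArrayList<>();
        compacted.add(summarizer.apply(messages.subList(0, split)));
        compacted.addAll(messages.subList(split, messages.size()));
        return compacted;
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        for (int i = 0; i < 51; i++) {
            history.add("m" + i);
        }
        // Placeholder summarizer; a real one would ask a model for a summary.
        List<String> result = compact(history, 50, 20,
                older -> "[summary of " + older.size() + " messages]");
        System.out.println(result.size()); // 21
        System.out.println(result.get(0)); // [summary of 31 messages]
    }
}
```

With triggerMessages at 50 and keepMessages at 20, the 51st message causes the oldest 31 to collapse into one summary, leaving 21 entries in the context.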

Sub-agent Orchestration: Decomposition and Delegation of Complex Tasks

When the main Agent encounters a subtask that is long-running, context-heavy, or parallelizable, it can delegate it to a sub-agent. A sub-agent is an independent Agent instance with its own system prompt and Memory, does not share the main Agent’s conversation history, and returns the execution result to the main Agent as a tool result.

There are four ways to declare sub-agents, ranging from least to most flexible:

  1. Built-in general-purpose Agent: Mirrors the main Agent’s configuration, suitable for temporarily delegating arbitrary subtasks;
  2. Workspace file-driven: Place Markdown files under workspace/subagents/ (YAML front matter defines name, description, and tools; the body is the system prompt), and the framework automatically discovers and loads them;
  3. Code declaration: Specify it programmatically with builder.subagent(spec);
  4. Custom factory: Full control over the sub-agent construction logic.

The workspace-driven approach is the most recommended—the sub-agent definitions are versioned with the workspace, so delegation strategies can be adjusted without changing code.
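
To illustrate the workspace file-driven format, the sketch below splits a Markdown file into front matter and body. It is not the framework's loader: `SubagentSpecSketch` is hypothetical, it only understands flat `key: value` lines rather than full YAML, and the sample agent name and comma-separated `tools` value are invented for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one sub-agent definition file.
final class SubagentSpecSketch {
    final Map<String, String> meta;
    final String systemPrompt;

    private SubagentSpecSketch(Map<String, String> meta, String systemPrompt) {
        this.meta = Collections.unmodifiableMap(meta);
        this.systemPrompt = systemPrompt;
    }

    // Parses "---" delimited front matter (flat key: value pairs only) and the body.
    static SubagentSpecSketch parse(String markdown) {
        String[] lines = markdown.split("\r?\n", -1);
        Map<String, String> meta = new LinkedHashMap<>();
        int bodyStart = 0;
        if (lines.length > 0 && lines[0].trim().equals("---")) {
            int i = 1;
            while (i < lines.length && !lines[i].trim().equals("---")) {
                int colon = lines[i].indexOf(':');
                if (colon > 0) {
                    meta.put(lines[i].substring(0, colon).trim(), lines[i].substring(colon + 1).trim());
                }
                i++;
            }
            if (i == lines.length) {
                throw new IllegalArgumentException("front matter is not closed with ---");
            }
            bodyStart = i + 1;
        }
        String body = String.join("\n", Arrays.copyOfRange(lines, bodyStart, lines.length)).trim();
        return new SubagentSpecSketch(meta, body);
    }

    public static void main(String[] args) {
        String file = "---\n"
                + "name: sql-analyst\n"
                + "description: Runs read-only SQL analysis\n"
                + "tools: read_file, grep_files\n"
                + "---\n"
                + "You are a careful SQL analyst.\n";
        SubagentSpecSketch spec = parse(file);
        System.out.println(spec.meta.get("name"));
        System.out.println(spec.systemPrompt);
    }
}
```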

There are two invocation modes, synchronous and asynchronous:

  • Synchronous call: The main Agent waits for the sub-agent to finish before continuing, suitable for scenarios where you must have the result before moving on;
  • Asynchronous call: After submitting the task, the main Agent immediately receives a task ID and can continue doing other things, later polling the result with the task_output tool. For tasks that take more than a few seconds, asynchronous execution is strongly recommended to avoid wasting time and tokens while the main Agent is idle.
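
The "submit, receive a task ID, poll later" pattern behind the asynchronous mode can be sketched with the standard library. This is an illustration only: `AsyncDelegationSketch` and its methods are hypothetical, sub-agent work is modeled as a `Supplier<String>`, and `poll` plays the role the text assigns to the task_output tool.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class AsyncDelegationSketch implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<String, CompletableFuture<String>> tasks = new ConcurrentHashMap<>();

    // Asynchronous call: returns a task ID immediately while the work runs on the pool.
    String spawn(Supplier<String> subAgentWork) {
        String taskId = UUID.randomUUID().toString();
        tasks.put(taskId, CompletableFuture.supplyAsync(subAgentWork, pool));
        return taskId;
    }

    // Non-blocking check: empty while the task is still running.
    // A failed or cancelled task makes join() throw; the sketch lets that propagate.
    Optional<String> poll(String taskId) {
        CompletableFuture<String> future = find(taskId);
        return future.isDone() ? Optional.of(future.join()) : Optional.empty();
    }

    // Synchronous call: blocks the caller until the result is available.
    String await(String taskId) {
        return find(taskId).join();
    }

    private CompletableFuture<String> find(String taskId) {
        CompletableFuture<String> future = tasks.get(taskId);
        if (future == null) {
            throw new IllegalArgumentException("unknown task: " + taskId);
        }
        return future;
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        try (AsyncDelegationSketch tasks = new AsyncDelegationSketch()) {
            String inventory = tasks.spawn(() -> "inventory: 12 units");
            String discount = tasks.spawn(() -> "discount: 15%");
            // The main flow is free to do other work here before collecting results.
            System.out.println(tasks.await(inventory));
            System.out.println(tasks.await(discount));
        }
    }
}
```

Both subtasks run concurrently, so the total wait is roughly the duration of the slower one rather than the sum of the two.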

Preventing infinite recursion: Sub-agents are leaf nodes by default and cannot spawn sub-agents themselves, and the framework also provides a maximum-depth limit as a safety net.

Built-in Tools

HarnessAgent automatically registers a set of tools that cover the “closed loop” needed for operation, so no manual configuration is required:

| Tool Category | Tool List | Description |
|---|---|---|
| File operations | read_file, write_file, edit_file, grep_files, glob_files, list_files | Operate on workspace files, with paths constrained to the file-system backend scope |
| Memory retrieval | memory_search, memory_get | memory_search uses SQLite full-text search; memory_get reads the memory file by line number |
| Session queries | session_search, session_list, session_history | Search historical conversation content for the Agent’s proactive review |
| Subtask management | agent_spawn, agent_send, agent_list, task_output, task_list, task_cancel | Delegate, query, and manage sub-agent tasks |
| Shell execution | execute | Conditionally registered: only appears when the file-system backend supports isolated execution (local Shell mode or sandbox mode) |

It is worth noting that in the “remote shared storage” mode, the framework does not register the Shell tool by default—this is an intentional security design, not an omission. If your business Agent does not need to execute commands, this mode can eliminate an entire class of execution-safety risks.
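The conditional registration described here can be pictured as a capability check on the backend. The following is a minimal sketch under that assumption: the Backend enum, its supportsExecution flag, and toolsFor are hypothetical, and only the tool names are taken from the article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only the tool names come from the article.
class ToolRegistrationSketch {
    enum Backend {
        LOCAL_SHELL(true),
        REMOTE_SHARED(false),
        SANDBOX(true);

        final boolean supportsExecution;

        Backend(boolean supportsExecution) {
            this.supportsExecution = supportsExecution;
        }
    }

    // File tools are always present; the shell tool depends on the backend.
    static List<String> toolsFor(Backend backend) {
        List<String> tools = new ArrayList<>(List.of(
            "read_file", "write_file", "edit_file",
            "grep_files", "glob_files", "list_files"));
        if (backend.supportsExecution) {
            tools.add("execute");
        }
        return tools;
    }

    public static void main(String[] args) {
        for (Backend backend : Backend.values()) {
            System.out.println(backend + " registers execute: "
                + toolsFor(backend).contains("execute"));
        }
    }
}
```

Because the model only sees the tools that were registered, an Agent running on the remote shared backend in this sketch has no way to request command execution at all.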

File System: Three Modes, Choose as Needed

The file system is the key layer that connects “Agent logic” and “infrastructure” in Harness. The framework provides three declarative modes, and selection should start from business constraints:

Mode 1: Local + Shell (default)

If you leave filesystem unconfigured, or explicitly set filesystem(new LocalFilesystemSpec()), the workspace is just a directory on the local machine, and Shell commands can be executed. This is suitable for personal local applications and development/testing environments. It is the simplest option, with no extra dependencies.

Mode 2: Remote shared storage

Configure filesystem(new RemoteFilesystemSpec(store)), and key data such as memory and session logs are routed to a remote KV store (such as Redis), while the local file system stores only content that does not need to be shared. Shell tools are not registered by default, which is suitable for multi-replica online services and scenarios that require cross-node sharing of user memory but do not need code execution.

Mode 3: Sandbox execution

Configure filesystem(sandboxSpec), and file read/write plus command execution all happen inside the isolated sandbox environment, leaving the host process unaffected. This is suitable for scenarios that need to execute untrusted code, such as DataAgent and Coding Agent.

The core difference among the three modes is: who executes the commands, where the data lives, and how much isolation is provided. The same Agent code logic can migrate among the three modes by switching the filesystem configuration.
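To make the "same logic, different backend" idea concrete, here is a simplified sketch of agent logic written against one interface with two interchangeable backends. The names WorkspaceFs, LocalFs, KvFs, and rememberAndRecall are hypothetical, and the in-memory map only stands in for a remote KV store such as Redis; the real framework abstractions may look different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; these types are not the framework's own abstractions.
class PluggableFsSketch {
    interface WorkspaceFs {
        void write(String path, String content);
        String read(String path);
    }

    // Backend 1: a directory on the local disk.
    static final class LocalFs implements WorkspaceFs {
        private final Path root;

        LocalFs(Path root) {
            this.root = root;
        }

        static LocalFs inTempDir() {
            try {
                return new LocalFs(Files.createTempDirectory("workspace"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public void write(String path, String content) {
            try {
                Path target = root.resolve(path);
                Files.createDirectories(target.getParent());
                Files.writeString(target, content);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public String read(String path) {
            try {
                return Files.readString(root.resolve(path));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Backend 2: an in-memory map standing in for a remote KV store.
    static final class KvFs implements WorkspaceFs {
        private final Map<String, String> store = new ConcurrentHashMap<>();

        public void write(String path, String content) {
            store.put(path, content);
        }

        public String read(String path) {
            return store.get(path);
        }
    }

    // "Agent logic" that depends only on the interface, not on the backend.
    static String rememberAndRecall(WorkspaceFs fs, String note) {
        fs.write("memory/notes.md", note);
        return fs.read("memory/notes.md");
    }

    public static void main(String[] args) {
        System.out.println(rememberAndRecall(LocalFs.inTempDir(), "prefers concise answers"));
        System.out.println(rememberAndRecall(new KvFs(), "prefers concise answers"));
    }
}
```

Swapping the backend is then a one-line change at construction time, which is the property the three declarative modes rely on.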

Sandbox: Isolated Execution + Recoverable State

Sandbox mode solves not only “isolated execution,” but also “continuity of the isolated environment across multi-turn conversations”—and these two together are what make it truly valuable.

Execution boundary: In sandbox mode, the Shell commands and file operations invoked by the Agent happen on the sandbox side, and the host process only coordinates. Commands derived from arbitrary user input therefore do not run directly on the server.

Recoverable state: At the end of each call(), the current sandbox file-system state is persisted (snapshot mechanism). When the next call begins, the framework finds the corresponding snapshot by sessionId or userId and restores the sandbox to where it left off. Users do not lose work progress because the service restarts or the request drifts to another node.
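The restore, run, persist cycle can be sketched as follows. This is a toy model under stated assumptions: the sandbox file system is reduced to a map of path to content, the snapshot store is an in-memory map rather than durable shared storage, and SandboxSnapshotSketch and its call method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch; a real snapshot store would be durable and shared across nodes.
class SandboxSnapshotSketch {
    private final Map<String, Map<String, String>> snapshotStore = new ConcurrentHashMap<>();

    // One call(): restore the snapshot for this session, run the work, persist the result.
    Map<String, String> call(String sessionId, Consumer<Map<String, String>> work) {
        // Restore: start from the last snapshot, or from an empty sandbox.
        Map<String, String> sandboxFiles =
            new HashMap<>(snapshotStore.getOrDefault(sessionId, Map.of()));
        // Run: the agent's commands mutate the sandbox file system.
        work.accept(sandboxFiles);
        // Persist: store an immutable copy as the new snapshot.
        snapshotStore.put(sessionId, Map.copyOf(sandboxFiles));
        return sandboxFiles;
    }

    public static void main(String[] args) {
        SandboxSnapshotSketch runtime = new SandboxSnapshotSketch();
        runtime.call("session-1", files -> files.put("analysis.py", "print('step 1')"));
        Map<String, String> second =
            runtime.call("session-1", files -> files.put("result.csv", "a,b"));
        System.out.println(second.containsKey("analysis.py")); // true
        System.out.println(runtime.call("session-2", files -> { }).isEmpty()); // true
    }
}
```

Because the second call starts from the persisted snapshot rather than from process memory, the same flow works when that call lands on a different node, provided the snapshot store is shared.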

Workspace projection: Host workspace contents such as AGENTS.md, skills/, subagents/, and knowledge/ are synchronized into the sandbox at the start of each call(), ensuring that the Agent inside the sandbox can see the full configuration and skill definitions.

Isolation granularity (choose as needed):

  • Session-level: each session has its own sandbox state and does not interfere with others, suitable for multi-user SaaS;
  • User-level: multiple sessions for the same user share one sandbox state, suitable for “long-term user workbench” scenarios;
  • Global shared: the entire Agent shares one sandbox, suitable for tool-like, read-only Agents.
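One way to think about the three granularities is as different rules for deriving the key under which sandbox state is stored. The sketch below assumes exactly that; the Scope enum, snapshotKey, and the key format are invented for illustration.

```java
// Hypothetical sketch; the key format is an assumption, not the framework's scheme.
class IsolationScopeSketch {
    enum Scope { SESSION, USER, GLOBAL }

    // Two calls share sandbox state exactly when they map to the same key.
    static String snapshotKey(Scope scope, String userId, String sessionId) {
        switch (scope) {
            case SESSION:
                return "session:" + sessionId;
            case USER:
                return "user:" + userId;
            default:
                return "global";
        }
    }

    public static void main(String[] args) {
        System.out.println(snapshotKey(Scope.SESSION, "alice", "s1")); // session:s1
        System.out.println(snapshotKey(Scope.USER, "alice", "s1"));    // user:alice
        System.out.println(snapshotKey(Scope.GLOBAL, "alice", "s1"));  // global
    }
}
```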

Running Sandbox in production raises further questions; refer to the official documentation for details:

  • How to manage sandbox lifecycle: Agent-managed or user-managed
  • Which processes need to run in the sandbox: Tool In Sandbox, Subagent in Sandbox
  • How to manage internal sandbox state: state and snapshot recovery

Skills: Reusable Workspace-Driven Skills

Skills are the structured way to represent “reusable operating procedures.” Put a SKILL.md file under skills/<skill-name>/ in the workspace, and the framework will automatically discover it and assemble it into the Agent’s capability library. During inference, the Agent can call these skills, and the skills themselves describe the “steps and rules for doing this task.”
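The discovery convention (a SKILL.md file under skills/<skill-name>/) is simple enough to sketch with java.nio. This is only an illustration of the directory scan: SkillDiscoverySketch, discoverSkills, and the demo skill name incident-triage are hypothetical, and the framework's own loader may parse and register skills differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the skills/<skill-name>/SKILL.md convention.
class SkillDiscoverySketch {
    // Returns the names of all directories under skills/ that contain a SKILL.md file.
    static List<String> discoverSkills(Path workspace) {
        Path skillsDir = workspace.resolve("skills");
        if (!Files.isDirectory(skillsDir)) {
            return List.of();
        }
        try (Stream<Path> dirs = Files.list(skillsDir)) {
            return dirs.filter(Files::isDirectory)
                .filter(dir -> Files.isRegularFile(dir.resolve("SKILL.md")))
                .map(dir -> dir.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: builds a throwaway workspace with one valid skill and one empty directory.
    static Path demoWorkspace() {
        try {
            Path workspace = Files.createTempDirectory("workspace");
            Path skill = Files.createDirectories(
                workspace.resolve("skills").resolve("incident-triage"));
            Files.writeString(skill.resolve("SKILL.md"), "# Incident triage\n1. Check alerts\n");
            // A directory without SKILL.md is not treated as a skill.
            Files.createDirectories(workspace.resolve("skills").resolve("drafts"));
            return workspace;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverSkills(demoWorkspace())); // [incident-triage]
    }
}
```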

The engineering value of this design is that skills are files—they can be versioned together with code in Git, reviewed through code review, and updated without redeploying. When a team has a large number of SOPs and operating guidelines that need to be injected into the Agent, this is much clearer than stuffing everything into the system prompt.

In sandbox mode, skill files are synchronized into the sandbox along with the workspace projection, and the commands involved in the skills run in the isolated environment, without affecting the host.

Summary

AgentScope Java 1.1 condenses the set of capabilities that people want most from Harness Engineering, but are hardest to assemble themselves, into HarnessAgent + workspace conventions + a pluggable file system + Hook pipeline: in personal scenarios, it is an enhanced ReAct Agent with memory, compaction, and sub-tasks; in enterprise scenarios, it is infrastructure that turns isolation, multi-tenancy, distributed memory, and sub-agent orchestration into configuration items.

If you are evaluating the evolution from a personal assistant prototype to a production-ready enterprise agent, it is recommended to start by running through the quick start in Harness Overview, then choose one declarative mode under Filesystem, and then enable compaction, sandboxing, and sub-agents as needed—every step has corresponding documentation and sample modules to compare against, so you do not have to invent a “workspace as truth” runtime from scratch.
