Deploy an agent application as a Knative service in an ACK cluster - Container Service for Kubernetes

How it works

The agentscope deploy knative command wraps the complex process of containerizing an agent application and deploying it to a cluster. The core workflow is as follows:

Packages the application: It packages the specified Python agent application code, its dependencies (requirements.txt), and environment variables.
Builds the image: It builds a container image that includes the agent application in your local Docker environment, based on a specified base image.
Pushes the image: It pushes the built image to a specified container image registry, such as Container Registry (ACR).
Deploys the service: It generates and applies a Knative Service (ksvc) manifest in the target cluster. Knative then uses this manifest to create a Deployment and Pods, and automatically configures network routing, load balancing, and auto scaling policies.
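To make the last step more concrete, the following is a minimal sketch of the kind of Knative Service manifest such a deployment produces. It is not the manifest that the command actually generates: the helper names, the service name `my-agent`, and the assumed image layout (`registry/namespace/name:tag`) are illustrative only.

```python
import json


def build_image_ref(registry_url, registry_namespace, image_name, image_tag):
    """Compose an image reference (assumed layout: registry/namespace/name:tag)."""
    return f"{registry_url}/{registry_namespace}/{image_name}:{image_tag}"


def build_ksvc_manifest(name, namespace, image, port=8080, env=None):
    """Build a minimal Knative Service manifest as a plain dict."""
    env = env or {}
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "image": image,
                            "ports": [{"containerPort": port}],
                            "env": [
                                {"name": key, "value": value}
                                for key, value in env.items()
                            ],
                        }
                    ]
                }
            }
        },
    }


# Placeholder values; "my-agent" is a hypothetical service name.
image = build_image_ref(
    "registry.cn-hangzhou.aliyuncs.com", "knative-sample", "agent_app", "linux-amd64-18"
)
manifest = build_ksvc_manifest(
    "my-agent", "default", image, env={"DASHSCOPE_API_KEY": "sk-xxx"}
)
print(json.dumps(manifest, indent=2))
```

Knative derives the Deployment, the Pods, the route, and the autoscaler configuration from the `spec.template` section of a manifest like this one.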

Before you begin

You have deployed Knative in your ACK or ACS cluster.
Docker is installed and running on your local machine for building container images.

You have installed AgentScope Runtime by using the pip command.

# Basic installation
pip install "agentscope-runtime>=1.1.0"
# Dependencies for Kubernetes deployment
pip install "agentscope-runtime[ext]>=1.1.0"
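To confirm that the installed version satisfies the 1.1.0 minimum, a short check such as the following sketch can help. The helper names `parse_version` and `meets_minimum` are illustrative, and the comparison deliberately ignores pre-release suffixes, so treat it as a rough check rather than a full version parser.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '1.1.0' into (1, 1, 0); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="1.1.0"):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.1' compares equal to '1.1.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = metadata.version("agentscope-runtime")
    status = "OK" if meets_minimum(installed) else "older than 1.1.0"
    print(f"agentscope-runtime {installed}: {status}")
except metadata.PackageNotFoundError:
    print("agentscope-runtime is not installed")
```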

Step 1: Create an agent project

If you do not have an agent application, use the following example file structure and code.

Create the project directory structure:

my-agent-project/
├── app_agent.py          # Main file for the agent application
├── requirements.txt      # Python dependencies (optional)
└── .env                  # Environment variables (optional)

Write the agent code in the app_agent.py file.

The following code creates an agent that uses a Qwen model and supports code execution and multi-turn conversations.

# -*- coding: utf-8 -*-
import os
from contextlib import asynccontextmanager
from fastapi import FastAPI
from agentscope.agent import ReActAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.model import DashScopeChatModel
from agentscope.pipeline import stream_printing_messages
from agentscope.tool import Toolkit, execute_python_code
from agentscope.memory import InMemoryMemory
from agentscope.session import JSONSession
from agentscope_runtime.engine.app import AgentApp
from agentscope_runtime.engine.schemas.agent_schemas import AgentRequest


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize the service."""
    app.state.session = JSONSession(
        save_dir="./",  # Directory to save all session files
    )
    try:
        yield
    finally:
        print("AgentApp is shutting down...")


# Create an AgentApp
agent_app = AgentApp(
    app_name="MyAssistant",
    app_description="A helpful assistant agent",
    lifespan=lifespan,
)


@agent_app.query(framework="agentscope")
async def query_func(
    self,
    msgs,
    request: AgentRequest = None,
    **kwargs,
):
    """Process user queries."""
    session_id = request.session_id
    user_id = request.user_id
    # Create toolkit with Python execution
    toolkit = Toolkit()
    toolkit.register_tool_function(execute_python_code)
    # Create agent
    agent = ReActAgent(
        name="MyAssistant",
        model=DashScopeChatModel(
            "qwen-turbo",
            api_key=os.getenv("DASHSCOPE_API_KEY"),
            enable_thinking=True,
            stream=True,
        ),
        sys_prompt="You're a helpful assistant.",
        toolkit=toolkit,
        memory=InMemoryMemory(),
        formatter=DashScopeChatFormatter(),
    )
    agent.set_console_output_enabled(False)
    await agent_app.state.session.load_session_state(
        session_id=session_id,
        user_id=user_id,
        agent=agent,
    )
    async for msg, last in stream_printing_messages(
        agents=[agent],
        coroutine_task=agent(msgs),
    ):
        yield msg, last
    await agent_app.state.session.save_session_state(
        session_id=session_id,
        user_id=user_id,
        agent=agent,
    )


if __name__ == "__main__":
    agent_app.run()

Step 2: Deploy the agent application

Use the agentscope deploy knative command to deploy your local agent application as a Knative service.

Run the deployment command.

Navigate to the my-agent-project directory and run the following command to deploy the service.

Replace sk-xxx, the value of DASHSCOPE_API_KEY, with your actual API key.

agentscope deploy knative app_agent.py \
  --image-name agent_app \
  --env DASHSCOPE_API_KEY=sk-xxx \
  --image-tag linux-amd64-18 \
  --registry-url registry.cn-hangzhou.aliyuncs.com \
  --base-image registry.cn-hangzhou.aliyuncs.com/knative-sample/python:3.10-slim-bookworm \
  --registry-namespace knative-sample \
  --namespace default \
  --push

Command syntax

agentscope deploy knative SOURCE [OPTIONS]

SOURCE: The path to the Python file, such as app_agent.py.

[OPTIONS]: Common parameters are described below.

For more parameters, run agentscope deploy knative --help.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `--namespace` | string | `agentscope-runtime` | The namespace where you deploy the agent. |
| `--kube-config-path`, `-c` | path | `None` | The path to the KubeConfig file used to connect to the cluster. If you do not specify this parameter, the command uses the default path. |
| `--port` | integer | `8080` | The port that the agent application listens on inside the container. |
| `--image-name` | string | `agent_app` | The name of the container image for the agent application. |
| `--image-tag` | string | `linux-amd64` | The tag of the container image for the agent application. |
| `--registry-url` | string | `localhost` | The URL of the target container registry to push the image to, for example, `registry.cn-hangzhou.aliyuncs.com`. |
| `--registry-namespace` | string | `agentscope-runtime` | The namespace (or project) used in the container registry. |
| `--push` | flag | `False` | Add this flag to push the built image to a remote registry. This is typically required for deployments to a remote cluster. |
| `--base-image` | string | `python:3.10-slim-bookworm` | The base image used to build the application. It must include a Python environment. |
| `--requirements` | string | `None` | Specifies the Python dependencies for the application. This can be a path to a `requirements.txt` file or a comma-separated list of packages. |
| `--cpu-request` | string | `200m` | Sets the CPU resource request for the Pod. The unit is `m` (millicores) or an integer core value, such as `200m` or `1`. |
| `--cpu-limit` | string | `1000m` | Sets the CPU resource limit for the Pod, for example, `1000m` or `2`. |
| `--memory-request` | string | `512Mi` | Sets the memory resource request for the Pod. Common units are `Mi` or `Gi`, for example, `512Mi` or `1Gi`. |
| `--memory-limit` | string | `2Gi` | Sets the memory resource limit for the Pod, for example, `2Gi` or `4Gi`. |
| `--image-pull-policy` | choice | `IfNotPresent` | Sets the image pull policy for the Pod. Valid values are `Always`, `IfNotPresent`, and `Never`. |
| `--deploy-timeout` | integer | `300` | The timeout period in seconds for waiting for the Knative service to become ready. |
| `--health-check` | flag | `None` | Add this flag to enable a health check for the Knative service. |
| `--platform` | string | `linux/amd64` | The target hardware platform for building the image, for example, `linux/amd64` or `linux/arm64`. |
| `--pypi-mirror` | string | `None` | The PyPI mirror to use when installing Python packages, for example, `https://pypi.tuna.tsinghua.edu.cn/simple`. |
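If you script the deployment, for example in a CI job, you can assemble the command from the parameters above. The following sketch uses a hypothetical helper, `build_deploy_command`, that maps keyword arguments to the documented flags and reads the API key from a local DASHSCOPE_API_KEY environment variable, falling back to the sk-xxx placeholder. It only builds and prints the argument list; it does not run the deployment.

```python
import os
import shlex


def build_deploy_command(source, api_key, **options):
    """Build the argument list for `agentscope deploy knative`.

    Keyword names map to flags by replacing '_' with '-',
    for example image_tag -> --image-tag. A value of True emits a bare flag.
    """
    cmd = [
        "agentscope", "deploy", "knative", source,
        "--env", f"DASHSCOPE_API_KEY={api_key}",
    ]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            cmd.append(flag)
        elif value is not None and value is not False:
            cmd.extend([flag, str(value)])
    return cmd


cmd = build_deploy_command(
    "app_agent.py",
    api_key=os.environ.get("DASHSCOPE_API_KEY", "sk-xxx"),
    image_name="agent_app",
    image_tag="linux-amd64-18",
    registry_url="registry.cn-hangzhou.aliyuncs.com",
    registry_namespace="knative-sample",
    namespace="default",
    push=True,
)

# Mask the key before printing so that it does not end up in logs.
masked = [
    "DASHSCOPE_API_KEY=***" if arg.startswith("DASHSCOPE_API_KEY=") else arg
    for arg in cmd
]
print(shlex.join(masked))
```

To run the command, pass the unmasked list to `subprocess.run(cmd, check=True)` from the project directory.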

View the deployment result.

After the deployment succeeds, the terminal outputs information such as the service URL. Record this URL to use in the next step.

Deployment successful!
Deployment ID: d4b4a54d-9976-443c-a1da-a77643******
Resource Name: agent-d4b4*****
URL: http://agent-03e*****.default.example.com
Namespace: default
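If you capture this output in a script, the fields can be extracted with a few lines of Python. The sketch below assumes the output keeps the `Key: value` layout shown above. The sample values and the function name `parse_deploy_output` are made up for illustration. It also derives the bare host name, which is the value needed for the Host header in Step 3.

```python
import re
from urllib.parse import urlparse

# Made-up sample in the same layout as the output above.
SAMPLE_OUTPUT = """\
Deployment successful!
Deployment ID: 00000000-0000-0000-0000-000000000000
Resource Name: agent-example
URL: http://agent-example.default.example.com
Namespace: default
"""


def parse_deploy_output(text):
    """Collect 'Key: value' lines into a dict and derive the Host header value."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z ]+):\s+(.+)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    if "URL" in fields:
        # The Host header takes the domain name only, without the scheme.
        fields["Host"] = urlparse(fields["URL"]).hostname
    return fields


info = parse_deploy_output(SAMPLE_OUTPUT)
print(info["Resource Name"], info["Host"])
```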

Step 3: Access the deployed agent

Obtain the access gateway.
1. Open the Knative page of your cluster.
   ACK cluster: On the ACK Clusters page, click the name of your cluster. In the left navigation pane, click Applications > Knative.
   ACS cluster: Log on to the ACS console. In the left navigation pane, click Clusters. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Applications > Knative.
2. On the Services or Component management page, obtain the Gateway.
  
  The IP address of the access gateway (for example, 120.xxx.159) is at the bottom of the page. To access the service by its domain name, you must first resolve the service domain to this IP address. Alternatively, send requests directly to the IP address and pass the service domain in the Host header.

Send a request to the agent service using the curl command.

Replace 115.29.xxx.xxx with the access gateway IP address, and replace the Host value with the domain name from the service URL you recorded in Step 2, without the http:// prefix.

curl -i -X POST "http://115.29.xxx.xxx:80/process" \
  -H "Content-Type: application/json" \
  -H "Host: agent-03e*****.default.example.com" \
  -d '{
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Hello, how are you?"
          }
        ]
      }
    ],
    "session_id": "123"
  }'

View the result. The response streams the agent's thinking process and the final output.
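The same request can be sent from Python with only the standard library. This is a sketch that mirrors the curl call: the function names, the gateway IP address 203.0.113.10, and the host name are placeholders, and the format of the streamed lines is not specified here, so the code prints them as received.

```python
import json
import urllib.request


def build_agent_request(gateway_ip, host, text, session_id):
    """Build a POST request to /process that mirrors the curl example."""
    payload = {
        "input": [{"role": "user", "content": [{"type": "text", "text": text}]}],
        "session_id": session_id,
    }
    return urllib.request.Request(
        url=f"http://{gateway_ip}:80/process",
        data=json.dumps(payload).encode("utf-8"),
        # The Host header routes the request to the Knative service.
        headers={"Content-Type": "application/json", "Host": host},
        method="POST",
    )


def stream_response(request, timeout=60):
    """Yield response lines as they arrive from the service."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8", errors="replace").rstrip("\n")


# Placeholder gateway IP address and host name.
req = build_agent_request(
    "203.0.113.10", "agent-example.default.example.com", "Hello, how are you?", "123"
)
print(req.get_method(), req.full_url, req.get_header("Host"))
```

To send the request and print the streamed reply, iterate over the generator: `for line in stream_response(req): print(line)`.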

Step 4: Observe auto scaling

Scale-down: After a period of inactivity, Knative automatically scales the service's Pods down to zero to save resources. You can observe the changes in the Pods by running the following command:
```
# Continuously watch the Pods in the specified namespace
kubectl get pods -n default -w
```
Alternatively, on the Knative page, click Services, and then click the service name. You can view the running status of the Pods for the current revision at the bottom of the page.

On the Revision information tab, the Ready/requested Pods column for the current revision shows 0/0, which confirms that the Pods have scaled to zero.
Scale-up: When a new request arrives, Knative automatically starts new Pods within seconds to handle the request.
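To track scaling from a script instead of watching the terminal, you can count the Pods by phase. The sketch below parses the JSON that `kubectl get pods -o json` returns. The helper names are illustrative, and the label selector `serving.knative.dev/service=<name>` assumes that your Knative version labels the Pods of a service this way. An empty result means the service has scaled to zero.

```python
import json
import subprocess


def count_pods_by_phase(pod_list):
    """Count Pods per phase in a parsed `kubectl get pods -o json` result."""
    counts = {}
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        counts[phase] = counts.get(phase, 0) + 1
    return counts


def get_service_pods(service_name, namespace="default"):
    """Run kubectl and return the parsed Pod list for one Knative service."""
    result = subprocess.run(
        [
            "kubectl", "get", "pods", "-n", namespace,
            "-l", f"serving.knative.dev/service={service_name}",
            "-o", "json",
        ],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Offline demonstration with hand-written sample data.
sample = {"items": [{"status": {"phase": "Running"}}, {"status": {"phase": "Pending"}}]}
print(count_pods_by_phase(sample))         # {'Running': 1, 'Pending': 1}
print(count_pods_by_phase({"items": []}))  # {} (scaled to zero)
```

Against a live cluster, call `count_pods_by_phase(get_service_pods("<resource-name>"))` in a loop with a short sleep between iterations.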

Production recommendations

Persist state: The example keeps conversation memory in InMemoryMemory and writes session files to the container's local file system through JSONSession (save_dir="./"). Both are lost when a Pod restarts or the service scales to zero, which makes them unsuitable for production. For production deployments, switch to an implementation backed by Redis or another persistent storage solution.
Manage secrets: Do not pass sensitive information such as DASHSCOPE_API_KEY directly on the command line or hard-code it in your application. Store it in a Kubernetes Secret and inject it into the container as an environment variable through the Knative Service configuration.
Plan resources: Based on your expected workload, stress-test and configure parameters such as --cpu-request, --cpu-limit, --memory-request, and --memory-limit to ensure service performance and stability.
Improve observability: Configure log collection (such as Log Service) and monitoring with alerts (such as Application Real-Time Monitoring Service (ARMS)) to troubleshoot issues and monitor service health.
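As an illustration of the secrets recommendation, the following sketch builds a Kubernetes Secret manifest and the matching `env` entry for the container in the Knative Service. The secret name `agent-secrets` and both helper names are hypothetical. How you apply the fragments is up to you, for example by saving the output to a file for `kubectl apply -f` and editing the ksvc manifest.

```python
import json


def build_secret_manifest(name, namespace, data):
    """Build an Opaque Secret; stringData accepts plain, unencoded values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "stringData": dict(data),
    }


def secret_env_entry(env_name, secret_name, key):
    """Build a container env entry that reads its value from a Secret key."""
    return {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }


# "agent-secrets" is a hypothetical name; replace sk-xxx with the real key
# only at apply time and never commit it to source control.
secret = build_secret_manifest(
    "agent-secrets", "default", {"DASHSCOPE_API_KEY": "sk-xxx"}
)
env_entry = secret_env_entry("DASHSCOPE_API_KEY", "agent-secrets", "DASHSCOPE_API_KEY")

print(json.dumps(secret, indent=2))
print(json.dumps(env_entry, indent=2))
```

The `env_entry` fragment belongs in `spec.template.spec.containers[0].env` of the Knative Service and replaces a literal `value` for the same variable.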

Related operations

Update the agent: After modifying the code, rerun the agentscope deploy knative command with a new image tag (for example, --image-tag v1.1) to perform a rolling update.
Uninstall the agent:
```
# Replace <resource-name> and <namespace> with your actual values
kubectl delete ksvc <resource-name> -n <namespace>
```
This operation only deletes the service in Kubernetes. It does not delete the container image that was pushed to the container registry.