Significant Upgrade to the MCP Protocol: Spring AI Alibaba and Higress Release the Industry's First Streamable HTTP Implementation Solution

Article Summary

MCP officially introduces a new Streamable HTTP transport layer, which represents a significant improvement over the original HTTP+SSE transport mechanism. This article will:

  1. Detail the design philosophy, technical details, and practical applications of this protocol.
  2. Explain in depth the Streamable HTTP Java implementation provided by the Spring AI Alibaba open-source framework, closing with a worked Streamable HTTP example that uses Spring AI Alibaba and Higress.

Relevant project links are as follows:

● Complete runnable example: https://github.com/springaialibaba/spring-ai-alibaba-examples

● Spring AI Alibaba official blog article: https://java2ai.com/

● Spring AI Alibaba open-source project address: https://github.com/alibaba/spring-ai-alibaba

● Higress official website: https://higress.ai/

HTTP+SSE Principles and Deficiencies

In the original MCP implementation, communication between the client and server occurs through two primary channels:

● HTTP request/response: The client sends messages to the server via standard HTTP requests.

● Server-Sent Events (SSE): The server pushes messages to the client through a dedicated /sse endpoint.

Major Issues

While this design is simple and intuitive, there are several key issues:

1. No support for reconnecting/recovery:

When the SSE connection drops, all session state is lost, and the client must re-establish the connection and initialize the entire session again. For example, a large document analysis task in progress could be interrupted completely by unstable WiFi, forcing the user to restart the whole process.

2. The server must maintain long connections:

The server must maintain a long-lived SSE connection for each client, leading to a significant increase in resource consumption with a large number of concurrent users. When the server needs to restart or scale, all connections are interrupted, negatively affecting user experience and system reliability.

3. Server messages can only be transmitted via SSE:

Even for simple request-response interactions, the server must return information through the SSE channel, creating unnecessary complexity and overhead. This approach is unsuitable for certain environments (such as cloud functions) due to the need to maintain long-lived SSE connections.

4. Infrastructure compatibility limitations:

Many existing web infrastructures such as CDNs, load balancers, and API gateways may not correctly handle long-lived SSE connections. Corporate firewalls might force close timed-out connections, leading to unreliable services.

Streamable HTTP Principles and Improvements

Key Improvements of Streamable:

Compared to the original HTTP+SSE mechanism, Streamable HTTP introduces several key improvements:

  1. Unified Endpoint: Removes the dedicated /sse endpoint, allowing all communication through a single endpoint (currently implemented as /mcp in the official SDK).
  2. On-Demand Streaming: The server can flexibly choose to return a standard HTTP response or upgrade to an SSE stream.
  3. Session Identification: Introduces a session ID mechanism to support state management and recovery.
  4. Flexible Initialization: The client can actively initialize the SSE stream through an empty GET request.

How Streamable Works

The workflow of Streamable HTTP is as follows:

1. Session Initialization (Optional, Suitable for Stateful Implementation Scenarios):

  • The client sends an initialization request to the /mcp endpoint.
  • The server may choose to generate a session ID and return it to the client.
  • The session ID is used to identify the session in subsequent requests.

2. Client Communication with the Server:

  • All messages are sent to the /mcp endpoint via HTTP POST requests.
  • If a session ID exists, it is included in the request.

3. Server Response Methods:

  • Normal Response: Directly returns a standard HTTP response, suitable for simple interactions.
  • Streaming Response: Upgrades the connection to SSE and sends a series of events before closing.
  • Long Connection: Maintains the SSE connection to continuously send events.

4. Active Establishment of SSE Stream:

  • The client can send a GET request to the /mcp endpoint to actively establish the SSE stream.
  • The server can push notifications or requests through this stream.

5. Connection Recovery:

  • When the connection is interrupted, the client can reconnect using the previous session ID.
  • The server can recover the session state to continue the previous interactions.
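The client side of steps 2 and 4 can be sketched with the JDK's built-in java.net.http API. This is a minimal illustration, not the SDK's code: the class and method names are hypothetical, the localhost URL is a placeholder, and the Mcp-Session-Id header name is taken from the MCP specification (the article refers to it as mcp-session-id).

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for illustration; not part of the MCP Java SDK.
final class StreamableRequests {

    // Assumption: header name as defined by the MCP specification.
    static final String SESSION_HEADER = "Mcp-Session-Id";

    // Step 2: every client message is an HTTP POST to the single endpoint.
    static HttpRequest post(URI endpoint, String jsonRpcBody, String sessionId) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json")
            // Accept both forms so the server may answer with plain JSON or an SSE stream.
            .header("Accept", "application/json, text/event-stream")
            .POST(HttpRequest.BodyPublishers.ofString(jsonRpcBody));
        if (sessionId != null) {
            builder.header(SESSION_HEADER, sessionId);
        }
        return builder.build();
    }

    // Step 4: a GET without a body asks the server to open an SSE stream.
    static HttpRequest openStream(URI endpoint, String sessionId) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(endpoint)
            .header("Accept", "text/event-stream")
            .GET();
        if (sessionId != null) {
            builder.header(SESSION_HEADER, sessionId);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        URI endpoint = URI.create("http://localhost:3000/mcp"); // placeholder address
        HttpRequest request = post(endpoint,
            "{\"jsonrpc\":\"2.0\",\"method\":\"tools/list\",\"id\":1}", "abc123");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Passing null as the session ID yields the stateless form of the same request, which is why a stateless server needs no initialization step at all.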

Streamable Request Examples

Stateless Server Mode

Scenario: Simple tool API services, such as mathematical calculations, text processing, etc.

Implementation:

Client                                          Server
   |                                              |
   |-- POST /message (Calculation Request) ------>|
   |                                              |-- Perform Calculation
   |<-- HTTP 200 (Calculation Result) ------------|
   |                                              |

Advantages: Extremely simple deployment, no state management required, suitable for serverless architecture and microservices.

Streaming Progress Feedback Mode

Scenario: Long-running tasks, such as large file processing, complex AI generation, etc.

Implementation:

Client                                          Server
   |                                              |
   |-- POST /message (Processing Request) ------->|
   |                                              |-- Start Processing Task
   |<-- HTTP 200 (SSE Starts) --------------------|
   |                                              |
   |<-- SSE: Progress 10% ------------------------|
   |<-- SSE: Progress 30% ------------------------|
   |<-- SSE: Progress 70% ------------------------|
   |<-- SSE: Completion + Result -----------------|
   |                                              |

Advantages: Provides real-time feedback without needing to maintain a permanent connection state.
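To make the progress stream concrete, here is a rough sketch of how a client might split a text/event-stream body into events. It is a simplified reading of the SSE wire format (id, event, and data fields, with a blank line ending each event), not the parser used by the MCP Java SDK; the class name and the sample payloads are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified SSE parser for illustration; a production client should use a complete implementation.
final class SseParserSketch {

    static final class SseEvent {
        final String id;    // last seen event ID, or null if none was sent
        final String event; // event type, "message" when not specified
        final String data;  // data lines joined with '\n'

        SseEvent(String id, String event, String data) {
            this.id = id;
            this.event = event;
            this.data = data;
        }
    }

    static List<SseEvent> parse(String body) {
        List<SseEvent> events = new ArrayList<>();
        String id = null;
        String event = "message";
        StringBuilder data = new StringBuilder();
        for (String line : body.split("\\r?\\n", -1)) {
            if (line.isEmpty()) {
                // A blank line dispatches the buffered event, if it carries any data.
                if (data.length() > 0) {
                    events.add(new SseEvent(id, event, data.toString()));
                }
                event = "message";
                data.setLength(0);
                continue;
            }
            if (line.startsWith(":")) {
                continue; // comment line, often used as a keep-alive
            }
            int colon = line.indexOf(':');
            String field = colon < 0 ? line : line.substring(0, colon);
            String value = colon < 0 ? "" : line.substring(colon + 1);
            if (value.startsWith(" ")) {
                value = value.substring(1); // one optional space after the colon
            }
            if (field.equals("id")) {
                id = value;
            } else if (field.equals("event")) {
                event = value;
            } else if (field.equals("data")) {
                if (data.length() > 0) {
                    data.append('\n');
                }
                data.append(value);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String body = "id: 1\nevent: progress\ndata: 10%\n\n"
                    + "id: 2\nevent: progress\ndata: 70%\n\n"
                    + "id: 3\ndata: {\"result\":\"done\"}\n\n";
        for (SseEvent e : parse(body)) {
            System.out.println(e.id + " [" + e.event + "] " + e.data);
        }
    }
}
```

Because the stream ends once the final result has been sent, the client holds the connection only for the duration of the task.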

Complex AI Conversation Mode

Scenario: Multi-turn dialogue AI assistants that require context maintenance.

Implementation:

Client                                          Server
   |                                              |
   |-- POST /message (Initialization) ----------->|
   |<-- HTTP 200 (Session ID: abc123) ------------|
   |                                              |
   |-- GET /message (Session ID: abc123) -------->|
   |<-- SSE Stream Established -------------------|
   |                                              |
   |-- POST /message (Question 1, abc123) ------->|
   |<-- SSE: Thinking... -------------------------|
   |<-- SSE: Answer 1 ----------------------------|
   |                                              |
   |-- POST /message (Question 2, abc123) ------->|
   |<-- SSE: Thinking... -------------------------|
   |<-- SSE: Answer 2 ----------------------------|

Advantages: Maintains session context, supports complex interactions, while allowing for horizontal scaling.

Disconnection Recovery Mode

Scenario: AI applications used in unstable network environments.

Implementation:

Client                                          Server
   |                                              |
   |-- POST /message (Initialization) ----------->|
   |<-- HTTP 200 (Session ID: xyz789) ------------|
   |                                              |
   |-- GET /message (Session ID: xyz789) -------->|
   |<-- SSE Stream Established -------------------|
   |                                              |
   |-- POST /message (Long Task, xyz789) -------->|
   |<-- SSE: Progress 30% ------------------------|
   |                                              |
   |             [Network Disruption]             |
   |                                              |
   |-- GET /message (Session ID: xyz789) -------->|
   |<-- SSE Stream Re-established ----------------|
   |<-- SSE: Progress 60% ------------------------|
   |<-- SSE: Completion --------------------------|

Advantages: Increases reliability in weak network environments, improving user experience.
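One way a server could support this kind of recovery is to keep the events of a session in a buffer and replay whatever the client missed when it reconnects with a Last-Event-ID header. The sketch below is a hypothetical in-memory version (class and method names are invented, event IDs are assumed to be increasing numbers, and the buffer is unbounded); it is not how Higress or the official SDKs implement recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-session event buffer; a real server would bound it and expire old sessions.
final class ReplayBufferSketch {

    static final class Event {
        final long id;
        final String data;

        Event(long id, String data) {
            this.id = id;
            this.data = data;
        }
    }

    private final List<Event> events = new ArrayList<>();
    private long nextId = 1;

    // Record an event before it is written to the SSE stream.
    synchronized Event append(String data) {
        Event event = new Event(nextId++, data);
        events.add(event);
        return event;
    }

    // Return the events sent after the ID the client reports; null means "replay everything".
    synchronized List<Event> replayAfter(String lastEventId) {
        long last = lastEventId == null ? 0 : Long.parseLong(lastEventId);
        List<Event> missed = new ArrayList<>();
        for (Event event : events) {
            if (event.id > last) {
                missed.add(event);
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        ReplayBufferSketch buffer = new ReplayBufferSketch();
        buffer.append("Progress 30%");
        buffer.append("Progress 60%");
        buffer.append("Completion");
        // The client saw only event 1 before the network dropped.
        for (Event event : buffer.replayAfter("1")) {
            System.out.println(event.id + ": " + event.data);
        }
    }
}
```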

Streamable HTTP Implementation in the Spring AI Alibaba Community

From the perspective of enterprise business implementation, why is Streamable HTTP necessary?

The previous sections compared the HTTP+SSE and Streamable modes in theory. In practice, splitting requests and responses across two channels, as HTTP+SSE does, creates a thorny problem for architecture and scalability: it forces sticky sessions between client and server. Even for stateless communication, a session ID must be maintained, and every request carrying that session ID must be routed to the same server machine. This places a heavy burden on both client and server implementations.

For the Streamable mode, if the goal is merely to maintain stateless communication, there is no need to manage sticky sessions at all. Considering that over 90% of MCP services may be stateless, this presents a significant improvement in the overall architecture's scalability.

Of course, if stateful communication needs to be implemented, the Streamable HTTP mode still requires maintaining a session ID.
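The routing difference can be illustrated with a toy load balancer: requests without a session ID may go to any backend, while requests that carry one must keep landing on the same machine. Everything here is hypothetical (the class, the backend names, and the hash-modulo scheme, which a real gateway would likely replace with consistent hashing or a shared session store).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy router for illustration only; not taken from Higress or any gateway.
final class McpRouterSketch {

    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    McpRouterSketch(List<String> backends) {
        this.backends = List.copyOf(backends);
    }

    String route(String sessionId) {
        if (sessionId == null) {
            // Stateless request: plain round-robin, any instance can serve it.
            return backends.get(Math.floorMod(next.getAndIncrement(), backends.size()));
        }
        // Stateful request: the same session ID always maps to the same instance.
        return backends.get(Math.floorMod(sessionId.hashCode(), backends.size()));
    }

    public static void main(String[] args) {
        McpRouterSketch router = new McpRouterSketch(List.of("mcp-a", "mcp-b"));
        System.out.println(router.route(null));     // round-robin pick
        System.out.println(router.route(null));     // next round-robin pick
        System.out.println(router.route("abc123")); // pinned by session ID
        System.out.println(router.route("abc123")); // same backend as the line above
    }
}
```

The stateless branch is the reason scaling in or out is cheap: adding or removing a backend never strands an in-flight session.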

Streamable HTTP Java Implementation Scheme

At the time of writing, neither the official MCP Java SDK nor Spring AI provides the Streamable mode. We have so far implemented only the client side: a Streamable HTTP client transport that supports stateless mode only and can connect to the official TypeScript server implementation and to the Higress community server implementation.

A complete runnable example can be found at: https://github.com/springaialibaba/spring-ai-alibaba-examples

Due to the ongoing development of the MCP Java SDK implementation for the Streamable HTTP solution, this example repository contains customized source code from the following two repositories:

  1. MCP Java SDK, located in the io.modelcontextprotocol package.
  2. Spring AI, located in the org.springframework.ai.mcp.client.autoconfigure package.

The examples integrate a Higress gateway that supports the MCP Streamable HTTP protocol implementation. This implementation still has many limitations, such as not supporting GET requests, not supporting session-id management, etc.

New StreamableHttpClientTransport

GET request (empty request body), actively establish SSE connection:

The client can actively establish an SSE connection by sending a GET request to the /mcp endpoint, which will serve as the subsequent request-response channel.

return Mono.defer(() -> Mono.fromFuture(() -> {
        final HttpRequest.Builder builder = requestBuilder.copy().GET().uri(uri);
        final String lastId = lastEventId.get();
        if (lastId != null) {
            builder.header("Last-Event-ID", lastId);
        }
        return httpClient.sendAsync(builder.build(), HttpResponse.BodyHandlers.ofInputStream());
    }).flatMap(response -> {
        if (response.statusCode() == 405 || response.statusCode() == 404) {
            // .....
        }
        return handleStreamingResponse(response, handler);
    })
    .retryWhen(Retry.backoff(3, Duration.ofSeconds(3)).filter(err -> err instanceof IllegalStateException))
    .doOnSuccess(v -> state.set(TransportState.CONNECTED))
    .doOnTerminate(() -> state.set(TransportState.CLOSED))
    .onErrorResume(e -> {
        // Ignore GET connection errors: log them and leave the transport marked as connected.
        LOGGER.error("Streamable transport connection error", e);
        state.set(TransportState.CONNECTED);
        return Mono.empty();
    }));

POST request, server can respond with a normal response or upgrade to an SSE response:

Example of an equivalent HTTP request, where listTool and callTool are similar requests.

curl -X POST -H "Content-Type: application/json" -H "Accept: application/json" -H "Accept: text/event-stream"     -d '{
  "jsonrpc" : "2.0",   
  "method" : "initialize",   
  "id" : "9afdedcc-0",
  "params" : {             
    "protocolVersion" : "2024-11-05",   
    "capabilities" : { 
      "roots" : {        
        "listChanged" : true
      }  
    },  
    "clientInfo" : {
      "name" : "Java SDK MCP Client",                                                 
      "version" : "1.0.0"
    }
  }
}' -i http://localhost:3000/mcp 

You can start and test with the Streamable Server provided by the official TypeScript SDK in conjunction with the current client implementation.

Java Code Implementation

// Send a POST request to the /mcp endpoint
public Mono<Void> sendMessage(final McpSchema.JSONRPCMessage message,
            final Function<Mono<McpSchema.JSONRPCMessage>, Mono<McpSchema.JSONRPCMessage>> handler) {
    // ... 
    return sentPost(message, handler).onErrorResume(e -> {
        LOGGER.error("Streamable transport sendMessage error", e);
        return Mono.error(e);
    });
}

// Actually send the POST request and process the response
private Mono<Void> sentPost(final Object msg,
        final Function<Mono<McpSchema.JSONRPCMessage>, Mono<McpSchema.JSONRPCMessage>> handler) {
    return serializeJson(msg).flatMap(json -> {
        final HttpRequest request = requestBuilder.copy()
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .uri(uri)
            .build();
        return Mono.fromFuture(httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofInputStream()))
            .flatMap(response -> {
                // If the response is 202 Accepted, there's no body to process
                if (response.statusCode() == 202) {
                    return Mono.empty();
                }

                if (response.statusCode() == 405 || response.statusCode() == 404) {
                 // ...
                }

                if (response.statusCode() >= 400) {
                 // ...
                }

                return handleStreamingResponse(response, handler);
            });
    });
}

// Handle different types of responses that the server might return
private Mono<Void> handleStreamingResponse(final HttpResponse<InputStream> response,
            final Function<Mono<McpSchema.JSONRPCMessage>, Mono<McpSchema.JSONRPCMessage>> handler) {
    final String contentType = response.headers().firstValue("Content-Type").orElse("");
    if (contentType.contains("application/json-seq")) {
        return handleJsonStream(response, handler);
    }
    else if (contentType.contains("text/event-stream")) {
        return handleSseStream(response, handler);
    }
    else if (contentType.contains("application/json")) {
        return handleSingleJson(response, handler);
    }
    else {
        return Mono.error(new UnsupportedOperationException("Unsupported Content-Type: " + contentType));
    }
}

Integrated into Spring AI Framework

@AutoConfiguration
@ConditionalOnClass({ McpSchema.class, McpSyncClient.class })
@EnableConfigurationProperties({ McpStreamableClientProperties.class, McpClientCommonProperties.class })
@ConditionalOnProperty(prefix = McpClientCommonProperties.CONFIG_PREFIX, name = "enabled", havingValue = "true",
        matchIfMissing = true)
public class StreamableHttpClientTransportAutoConfiguration {
    @Bean
    public List<NamedClientMcpTransport> mcpHttpClientTransports(McpStreamableClientProperties streamableProperties,
            ObjectProvider<ObjectMapper> objectMapperProvider) {

        ObjectMapper objectMapper = objectMapperProvider.getIfAvailable(ObjectMapper::new);

        List<NamedClientMcpTransport> sseTransports = new ArrayList<>();

        for (Map.Entry<String, McpStreamableClientProperties.StreamableParameters> serverParameters : streamableProperties.getConnections().entrySet()) {

            var transport = StreamableHttpClientTransport.builder(serverParameters.getValue().url()).withObjectMapper(objectMapper).build();
            sseTransports.add(new NamedClientMcpTransport(serverParameters.getKey(), transport));
        }

        return sseTransports;
    }

}

Complete Spring AI Alibaba + Higress Streamable HTTP Example

Configure the following to enable the Streamable HTTP transport. The configuration points to an MCP Server address provided by Higress, which offers a limited Streamable HTTP server implementation.

spring:
  ai:
    mcp:
      client:
        toolcallback:
          enabled: true
        streamable:
          connections:
            server1:
              url: http://env-cvpjbjem1hkjat42sk4g-ap-southeast-1.alicloudapi.com/mcp-quark

@SpringBootApplication(exclude = {
        org.springframework.ai.mcp.client.autoconfigure.SseHttpClientTransportAutoConfiguration.class,
})
@ComponentScan(basePackages = "org.springframework.ai.mcp.client")
public class Application {
    @Bean
    public CommandLineRunner predefinedQuestions(ChatClient.Builder chatClientBuilder, ToolCallbackProvider tools,
            ConfigurableApplicationContext context) {
        return args -> {
            var chatClient = chatClientBuilder
                    .defaultTools(tools)
                    .build();

            System.out.println("\n>>> QUESTION: " + "Alibaba Xixi Park");
            System.out.println("\n>>> ASSISTANT: " + chatClient.prompt("Alibaba Xixi Park").call().content());

            System.out.println("\n>>> QUESTION: " + "Gold price trend");
            System.out.println("\n>>> ASSISTANT: " + chatClient.prompt("Gold price trend").call().content());

        };
    }
}

After running the example, you should see the client connect to the MCP Server and list the available tools. The Higress example has two built-in tools; the response below shows the web_search tool.

{
    "jsonrpc": "2.0",
    "id": "32124bd9-1",
    "result": {
        "nextCursor": "",
        "tools": [{
            "description": "Performs a web search using the Quark Search API, ideal for general queries, news, articles, and online content.\nUse this for broad information gathering, recent events, or when you need diverse web sources.\nBecause Quark search performs poorly for English searches, please use Chinese for the query parameters.",
            "inputSchema": {
                "additionalProperties": false,
                "properties": {
                    "contentMode": {
                        "default": "summary",
                        "description": "Return the level of content detail, choose to use summary or full text",
                        "enum": ["full", "summary"],
                        "type": "string"
                    },
                    "number": {
                        "default": 5,
                        "description": "Number of results",
                        "type": "integer"
                    },
                    "query": {
                        "description": "Search query, please use Chinese",
                        "examples": ["Gold price trend"],
                        "type": "string"
                    }
                },
                "required": ["query"],
                "type": "object"
            },
            "name": "web_search"
        }]
    }
}

The example initiates a chat session, and the model will guide the agent to call the web_search tool and return results.

Current Implementation Limitations and Improvement Plans

The current implementation builds on the official Java SDK and adds a Streamable HTTP mode as a McpClientTransport. It does not yet fully support Streamable HTTP, however: the Streamable HTTP workflow differs from HTTP+SSE in many respects, and many code paths in the original Java SDK are tightly coupled to the HTTP+SSE design, so full support requires structural changes to the SDK.

For example, here are a few points where the current implementation is limited:

  1. Initialization is not mandatory in Streamable HTTP; it is only required when implementing state management. Moreover, once initialized, all subsequent requests must include the mcp-session-id. The current Java SDK design enforces an initialization state check and does not manage the mcp-session-id after initialization.
  2. The /mcp GET request is constrained in the protocol to be used when the client actively initiates an SSE request. However, the current implementation initiates a GET request and establishes an SSE session each time it connects, with subsequent POST requests relying on the response returned here, as seen with the pendingResponses property operations.
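As an illustration of the first point, a client-side session tracker might look like the sketch below: it sends requests without any session header until a response supplies one, then attaches that ID to every later request. This is not the code from the pull requests listed in this article; the class and method names are hypothetical, and the Mcp-Session-Id header name follows the MCP specification.

```java
import java.net.URI;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of optional session management; not the MCP Java SDK implementation.
final class SessionTrackerSketch {

    private static final String SESSION_HEADER = "Mcp-Session-Id";

    private final URI endpoint;
    private final AtomicReference<String> sessionId = new AtomicReference<>();

    SessionTrackerSketch(URI endpoint) {
        this.endpoint = endpoint;
    }

    // Call with the headers of each response; remembers the session ID if the server issued one.
    void capture(HttpHeaders responseHeaders) {
        responseHeaders.firstValue(SESSION_HEADER).ifPresent(sessionId::set);
    }

    // Builds a POST that carries the session ID only when one is known (stateless otherwise).
    HttpRequest newPost(String jsonRpcBody) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(endpoint)
            .header("Content-Type", "application/json")
            .header("Accept", "application/json, text/event-stream")
            .POST(HttpRequest.BodyPublishers.ofString(jsonRpcBody));
        String id = sessionId.get();
        if (id != null) {
            builder.header(SESSION_HEADER, id);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        SessionTrackerSketch tracker =
            new SessionTrackerSketch(URI.create("http://localhost:3000/mcp")); // placeholder address
        System.out.println(tracker.newPost("{}").headers().map());
        // Simulate an initialize response that carries a session ID.
        tracker.capture(HttpHeaders.of(Map.of("mcp-session-id", List.of("xyz789")), (k, v) -> true));
        System.out.println(tracker.newPost("{}").headers().map());
    }
}
```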

Currently, several core contributors from the Spring AI Alibaba community are actively involved in the development of the official MCP SDK, including bug fixes and the implementation of the Streamable HTTP solution. We have already submitted the relevant pull requests (PRs) to the official community. Below are the PRs for the community's Streamable solution:

  1. https://github.com/modelcontextprotocol/java-sdk/pull/144
  2. https://github.com/modelcontextprotocol/java-sdk/pull/163

Official Implementations and Reference Materials

1. Spring AI Alibaba official website: https://java2ai.com/

2. Spring AI Alibaba open-source project source repository: https://github.com/alibaba/spring-ai-alibaba

3. mcp-streamable-http: https://www.claudemcp.com/blog/mcp-streamable-http

4. MCP java-sdk: https://github.com/modelcontextprotocol/java-sdk

5. Streamable HTTP: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http
