Real-time workflows let you access large language models (LLMs) that comply with the OpenAI API specifications.

## Access self-developed LLMs

You can integrate LLMs that comply with the OpenAI specifications into your workflow. Currently, only streaming requests are supported.

Follow the steps below to access a self-developed LLM:

In the LLM node, select **Access Self-developed Model (Based on OpenAI Specifications)** and configure the following parameters:
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| ModelId | String | Yes | The model name, which corresponds to the `model` field of the OpenAI specifications. | abc |
| API-KEY | String | Yes | The API authentication credential, which corresponds to the `api_key` field of the OpenAI specifications. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| HTTPS URL of Destination Model | String | Yes | The URL of the model service, which corresponds to the `base_url` field of the OpenAI specifications. **Note**: In line with the OpenAI specifications, `/chat/completions` is automatically appended to this URL when requests are sent. | http://www.abc.com |
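For reference, these three parameters play the same roles as the corresponding arguments of an OpenAI-compatible client. The sketch below is purely illustrative and uses the `openai` Python package with the placeholder values from the table above; it is not part of the workflow configuration itself.

```python
# Illustrative only: how ModelId, API-KEY, and the destination URL map onto
# an OpenAI-compatible client call. All values are placeholders from the table.
from openai import OpenAI

client = OpenAI(
    api_key="AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI",  # API-KEY
    base_url="http://www.abc.com",  # HTTPS URL of Destination Model
)

# The workflow currently supports streaming requests only.
stream = client.chat.completions.create(
    model="abc",  # ModelId
    messages=[{"role": "user", "content": "How is the weather today?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```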
When the real-time workflow runs, data is assembled in the OpenAI standard format and sent as POST requests to the HTTPS base URL that you configured for the self-developed model. The input parameters are as follows:
| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| messages | Array | Yes | The context of historical conversations. A maximum of 20 records is retained; earlier questions and answers appear at the front of the array. **Note**: The system automatically combines the user's current speech with the historical records as input to the LLM. | [{"role": "user", "content": "How is the weather today?"}, {"role": "assistant", "content": "The weather is sunny today."}, {"role": "user", "content": "How will the weather be tomorrow?"}] |
| model | String | Yes | The model name. | abc |
| stream | Boolean | Yes | Specifies whether to use streaming. Currently, only streaming requests are supported. | true |
| extendData | Object | Yes | The supplementary information. | {"instanceId": "68e00b6640e*****3e943332fee7", "channelId": "123", "sentenceId": "3", "userData": "{\"aaaa\":\"bbbb\"}"} |
| extendData.instanceId | String | Yes | The instance ID. | 68e00b6640e*****3e943332fee7 |
| extendData.channelId | String | Yes | The channel ID. | 123 |
| extendData.sentenceId | Int | Yes | The Q&A ID. **Note**: For the same user inquiry, all AI responses use the same ID. | 3 |
| extendData.userData | String | No | The value of the UserData field that is passed when the instance is started. | {"aaaa":"bbbb"} |
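To make the request format concrete, the following sketch reproduces the kind of POST request the workflow issues, using the placeholder endpoint, API key, and field values from the tables above (the `Bearer` authorization scheme matches what the sample server below expects):

```python
# Sketch of the POST request the workflow sends. Endpoint, key, and payload
# values are placeholders taken from the examples in the tables above.
import requests

payload = {
    "model": "abc",
    "stream": True,
    "messages": [
        {"role": "user", "content": "How is the weather today?"},
        {"role": "assistant", "content": "The weather is sunny today."},
        {"role": "user", "content": "How will the weather be tomorrow?"},
    ],
    "extendData": {
        "instanceId": "68e00b6640e*****3e943332fee7",
        "channelId": "123",
        "sentenceId": "3",
        "userData": '{"aaaa":"bbbb"}',
    },
}

resp = requests.post(
    "http://www.abc.com/chat/completions",  # configured base URL + /chat/completions
    headers={"Authorization": "Bearer AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI"},
    json=payload,
    stream=True,  # the response arrives as a text/event-stream
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))
```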
## Custom LLM server (OpenAI specifications)

The following Python example implements a minimal server that conforms to the request and response formats described above:

```python
import json
import time

from flask import Flask, Response, jsonify, request
from loguru import logger

app = Flask(__name__)

API_KEY = "YOURAPIKEY"


@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
    # The key arrives as "Authorization: Bearer <key>"; reject anything else.
    auth_header = request.headers.get('Authorization', '')
    parts = auth_header.split()
    if len(parts) != 2 or parts[1] != API_KEY:
        return jsonify({"error": "Unauthorized"}), 401

    data = request.json
    logger.info(f"data is {data}")

    # Log headers and query parameters for debugging.
    task_id = request.args.get('task_id')
    room_id = request.args.get('room_id')
    for header, value in request.headers.items():
        logger.info(f"{header}: {value}")
    logger.info("\nQuery Parameters:")
    for key, value in request.args.items():
        logger.info(f"{key}: {value}")
    logger.info(f"task_id: {task_id}, room_id: {room_id}")

    stream = data.get('stream', False)
    if stream:
        return Response(generate_stream_response(data),
                        content_type='text/event-stream')
    else:
        return jsonify(generate_response(data))


def generate_response(data):
    response = "This is a simulated AI assistant response. In actual applications, a real AI model should be called here."
    return {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": data['model'],
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": response
            },
            "finish_reason": "stop"
        }],
        # Token counts are approximated by character counts in this example.
        "usage": {
            "prompt_tokens": sum(len(m['content']) for m in data['messages']),
            "completion_tokens": len(response),
            "total_tokens": sum(len(m['content']) for m in data['messages']) + len(response)
        }
    }


def generate_stream_response(data):
    response = "This is a simulated AI assistant streaming response. In actual applications, a real AI model should be called here."
    words = list(response)
    for i, word in enumerate(words):
        chunk = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": data['model'],
            "choices": [{
                "index": 0,
                "delta": {
                    "content": word,
                    # Example tool call (here, a "hangup" function with no arguments).
                    "tool_calls": [
                        {
                            "index": 0,
                            "id": "call_abc123",
                            "type": "function",
                            "function": {
                                "name": "hangup",
                                "arguments": "{}"
                            }
                        }
                    ]
                },
                "finish_reason": None if i < len(words) - 1 else "stop"
            }]
        }
        logger.info(chunk)
        # Server-sent events (SSE): each chunk is prefixed with "data: ".
        yield f"data: {json.dumps(chunk)}\n\n"
        time.sleep(0.1)  # Simulate processing time
    # Signal the end of the stream.
    yield "data: [DONE]\n\n"


if __name__ == '__main__':
    logger.info(f"Server is running with API_KEY: {API_KEY}")
    app.run(port=8083, debug=True)
```
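To sanity-check the sample server, you can point any OpenAI-compatible client at it. Below is a minimal sketch using the `openai` Python package, assuming the server above is running locally on port 8083 with the example `API_KEY`:

```python
# Quick local test of the sample server above; assumes it is running on
# port 8083 and that api_key matches API_KEY in the server code.
from openai import OpenAI

client = OpenAI(
    api_key="YOURAPIKEY",
    base_url="http://127.0.0.1:8083/v1",  # client appends /chat/completions
)

stream = client.chat.completions.create(
    model="abc",
    messages=[{"role": "user", "content": "How is the weather today?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```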