
Intelligent Media Services: Access LLMs

Last Updated: Dec 09, 2025

Real-time workflows let you access large language models (LLMs) through interfaces that comply with the OpenAI API specification.

Access self-developed LLMs

You can integrate LLMs that comply with the OpenAI API specification into your workflow. Currently, only streaming requests are supported.

Follow the steps below to access a self-developed LLM:

  1. In the LLM node, select Access Self-developed Model (Based on OpenAI Specifications) and configure the following parameters. A sketch showing how these values map onto an OpenAI-style client call follows the table.

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| ModelId | String | Yes | The model name, which corresponds to the model field of the OpenAI specification. | abc |
| API-KEY | String | Yes | The API authentication information, which corresponds to the api_key field of the OpenAI specification. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| HTTPS URL of Destination Model | String | Yes | The URL of the model service, which corresponds to the base_url field of the OpenAI specification. Note: Alibaba Cloud automatically appends /chat/completions to the base_url. For example, if base_url is http://www.abc.com, requests are sent to http://www.abc.com/chat/completions. | http://www.abc.com |
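For reference, here is a minimal sketch of how these three values map onto an OpenAI-style client call. It assumes the official openai Python SDK (v1 or later) and uses the placeholder values from the table above; they are not real credentials.

Python

from openai import OpenAI

# Placeholder values from the table above (not real credentials).
client = OpenAI(
    api_key="AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI",  # API-KEY
    base_url="http://www.abc.com",  # HTTPS URL of Destination Model; the SDK appends /chat/completions
)

# The workflow supports streaming requests only, so stream=True.
stream = client.chat.completions.create(
    model="abc",  # ModelId
    messages=[{"role": "user", "content": "How is the weather today?"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")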

  2. When the real-time workflow runs, the data is assembled in the OpenAI standard format and sent via POST requests to the HTTPS base URL that you configured for the self-developed model. The input parameters are listed below, followed by a sketch of an assembled request body.

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| messages | Array | Yes | The context of historical conversations. A maximum of 20 records can be retained, with earlier questions and answers at the front of the array. Note: The system automatically combines the user's current speech with the historical records as the input to the LLM. | [{'role': 'user', 'content': 'How is the weather today?'},{'role': 'assistant', 'content': 'The weather is sunny today.'},{'role': 'user', 'content': 'How will the weather be tomorrow?'}] |
| model | String | Yes | The model name. | abc |
| stream | Boolean | Yes | Specifies whether to use streaming. Currently, only streaming requests are supported. | True |
| extendData | Object | Yes | The supplementary information. | {'instanceId':'68e00b6640e*****3e943332fee7','channelId':'123','sentenceId':'3','userData':'{"aaaa":"bbbb"}'} |
| extendData.instanceId | String | Yes | The instance ID. | 68e00b6640e*****3e943332fee7 |
| extendData.channelId | String | Yes | The channel ID. | 123 |
| extendData.sentenceId | Int | Yes | The Q&A ID. Note: All AI responses to the same user question use the same ID. | 3 |
| extendData.userData | String | No | The value of the UserData field that is passed when the instance is started. | {"aaaa":"bbbb"} |

Custom LLM server (OpenAI specifications)

Python

import json
import time
from loguru import logger
from flask import Flask, request, jsonify, Response

app = Flask(__name__)

API_KEY = "YOURAPIKEY"

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
    # Validate the API key from the Authorization header ("Bearer <key>").
    # Splitting defensively avoids an IndexError on malformed headers.
    auth_header = request.headers.get('Authorization', '')
    parts = auth_header.split()
    if len(parts) != 2 or parts[1] != API_KEY:
        return jsonify({"error": "Unauthorized"}), 401

    data = request.json
    logger.info(f"data is {data}")
    task_id = request.args.get('task_id')
    room_id = request.args.get('room_id')
    for header, value in request.headers.items():
        logger.info(f"{header}: {value}")

    # Log query parameters
    logger.info("\nQuery Parameters:")
    for key, value in request.args.items():
        logger.info(f"{key}: {value}")

    logger.info(f"task_id: {task_id}, room_id: {room_id}")
    stream = data.get('stream', False)

    # The real-time workflow always sends stream=true; the non-streaming
    # branch below is kept for local testing only.
    if stream:
        return Response(generate_stream_response(data), content_type='text/event-stream')
    else:
        return jsonify(generate_response(data))

def generate_response(data):
    response = "This is a simulated AI assistant response. In actual applications, a real AI model should be called here."

    return {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": data['model'],
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": response
            },
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": sum(len(m['content']) for m in data['messages']),
            "completion_tokens": len(response),
            "total_tokens": sum(len(m['content']) for m in data['messages']) + len(response)
        }
    }

def generate_stream_response(data):
    response = "This is a simulated AI assistant streaming response. In actual applications, a real AI model should be called here."
    # Stream the reply one character at a time to simulate token-by-token output
    chars = list(response)
    for i, char in enumerate(chars):
        chunk = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": data['model'],
            "choices": [{
                "index": 0,
                "delta": {
                    "content": char,
                    # Illustrative tool call showing how a function such as
                    # hangup can be signaled. "arguments" must be a JSON
                    # string; a real server would include tool_calls only
                    # when a tool should actually be invoked.
                    "tool_calls": [
                        {
                            "id": "call_abc123",
                            "type": "function",
                            "function": {
                                "name": "hangup",
                                "arguments": "{}"
                            }
                        }
                    ]
                },
                "finish_reason": None if i < len(chars) - 1 else "stop"
            }]
        }
        logger.info(chunk)
        # Server-sent events: each chunk is a "data:" line followed by a blank line
        yield f"data: {json.dumps(chunk)}\n\n"
        time.sleep(0.1)  # Simulate processing time

    yield "data: [DONE]\n\n"

if __name__ == '__main__':
    logger.info(f"Server is running with API_KEY: {API_KEY}")
    app.run(port=8083, debug=True)
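
To exercise the sample server locally, you can stream a request with the requests library. The following test client is a sketch, not part of the workflow itself; the payload mirrors the input parameters described above, and the URL and API key match the sample server's defaults.

Python

import json
import requests

url = "http://127.0.0.1:8083/v1/chat/completions"
headers = {"Authorization": "Bearer YOURAPIKEY"}  # Matches API_KEY in the sample server
payload = {
    "model": "abc",
    "stream": True,
    "messages": [{"role": "user", "content": "How is the weather today?"}],
    "extendData": {"instanceId": "test-instance", "channelId": "123", "sentenceId": 1},
}

# Read the SSE stream line by line and print the content deltas.
with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # Skip blank keep-alive lines
        body = line[len("data: "):]
        if body == "[DONE]":
            break
        chunk = json.loads(body)
        print(chunk["choices"][0]["delta"].get("content", ""), end="")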