
Microservices Engine: AI cache

Last Updated: Jan 02, 2025

This topic describes the ai-cache plug-in, which caches the results returned by large language models (LLMs). With its default configuration, the plug-in can cache results for APIs that use the OpenAI protocol format. Both streaming and non-streaming responses can be cached.

Running attributes

Plug-in execution stage: authentication stage. Plug-in execution priority: 10.

Configuration description

| Name | Data type | Required | Default value | Description |
|---|---|---|---|---|
| cacheKeyFrom.requestBody | string | No | "messages.@reverse.0.content" | A GJSON PATH expression that specifies which string in the request body is used as the cache key. |
| cacheValueFrom.responseBody | string | No | "choices.0.message.content" | A GJSON PATH expression that specifies which string in a non-streaming response body is cached as the value. |
| cacheStreamValueFrom.responseBody | string | No | "choices.0.delta.content" | A GJSON PATH expression that specifies which string in a streaming response body is cached as the value. |
| cacheKeyPrefix | string | No | "higress-ai-cache" | The prefix of the Redis cache key. |
| cacheTTL | integer | No | 0 | The expiration time of cached entries. Unit: seconds. The default value 0 indicates that cached entries never expire. |
| redis.serviceName | string | Yes | - | The name of the Redis service. The value is a fully qualified domain name (FQDN) that includes the service type, such as my-redis.dns or redis.my-ns.svc.cluster.local. |
| redis.servicePort | integer | No | 6379 | The port of the Redis service. |
| redis.timeout | integer | No | 1000 | The timeout period of Redis requests. Unit: milliseconds. |
| redis.username | string | No | - | The username that is used to log on to the Redis instance. |
| redis.password | string | No | - | The password that is used to log on to the Redis instance. |
| returnResponseTemplate | string | No | {"id":"from-cache","choices":[%s],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}} | The template of the HTTP response that is returned on a cache hit. %s marks the part that is replaced by the cached value. |
| returnStreamResponseTemplate | string | No | data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n | The template of the HTTP streaming response that is returned on a cache hit. %s marks the part that is replaced by the cached value. |
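The response templates can be overridden like any other parameter. The following sketch changes only the model name that a cache hit reports; "qwen-max" is a hypothetical model name, and the rest of the string is copied from the default returnResponseTemplate, including the position of the %s placeholder. Single quotes are used so that YAML passes the embedded JSON through unchanged. The streaming template is left at its default here because its \n escapes would need separate care when quoted in YAML.

```yaml
# Hypothetical override: report "qwen-max" instead of "gpt-4o" on cache hits.
# Apart from the model field, the value is the default returnResponseTemplate.
returnResponseTemplate: '{"id":"from-cache","choices":[%s],"model":"qwen-max","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}'
redis:
  serviceName: my-redis.dns   # required parameter
```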

Configuration example

```yaml
redis:
  serviceName: my-redis.dns
  timeout: 2000
```
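The example above sets only the required redis.serviceName and raises the Redis timeout from the default 1000 ms to 2000 ms; all other parameters keep their default values. The following sketch sets more of the parameters from the table explicitly. It assumes that dotted parameter names such as cacheKeyFrom.requestBody map to nested YAML keys, in the same way that redis.serviceName does above. The key prefix, TTL, and Redis credentials are hypothetical values.

```yaml
cacheKeyFrom:
  requestBody: "messages.@reverse.0.content"   # default: content of the last message
cacheValueFrom:
  responseBody: "choices.0.message.content"    # default for non-streaming responses
cacheStreamValueFrom:
  responseBody: "choices.0.delta.content"      # default for streaming responses
cacheKeyPrefix: "my-app-ai-cache"              # hypothetical prefix
cacheTTL: 3600                                 # cached entries expire after one hour
redis:
  serviceName: redis.my-ns.svc.cluster.local
  servicePort: 6379
  timeout: 2000                                # milliseconds
  username: my-user                            # hypothetical credentials
  password: my-password
```

Setting cacheTTL to a non-zero value keeps Redis from accumulating entries indefinitely; with the default value 0, cached entries never expire.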

Advanced usage

  • By default, the cache key is extracted with the GJSON PATH expression messages.@reverse.0.content, which reverses the order of the elements in the messages array and takes the content of the first element, that is, the content of the last message.

  • GJSON PATH supports conditional queries. For example, to use the content of the last message whose role is user as the key, use the expression messages.@reverse.#(role=="user").content.

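  • To combine the contents of all messages whose role is user into an array and use that array as the key, use the expression messages.@reverse.#(role=="user")#.content. Because the messages array is reversed first, the contents appear from the most recent to the oldest.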

  • The pipe syntax is also supported. For example, to use the content of the second-to-last message whose role is user as the key, use the expression messages.@reverse.#(role=="user")#.content|1.

  • For more syntax details, see the official GJSON documentation. You can test expressions in the GJSON Playground.
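As a sketch of how these expressions are applied, the following configuration switches the cache key from the last message to the last message whose role is user. It assumes that cacheKeyFrom.requestBody maps to a nested YAML key, as redis.serviceName does in the configuration example. The messages array in the comments is a hypothetical request body that shows what each expression from the list above would extract.

```yaml
# Hypothetical request body used to illustrate the expressions:
# "messages": [
#   {"role": "system",    "content": "You are a helpful assistant."},
#   {"role": "user",      "content": "What is Higress?"},
#   {"role": "assistant", "content": "Higress is an AI gateway."},
#   {"role": "user",      "content": "How do I enable caching?"}
# ]
#
# messages.@reverse.0.content                  -> "How do I enable caching?"
# messages.@reverse.#(role=="user").content    -> "How do I enable caching?"
# messages.@reverse.#(role=="user")#.content   -> ["How do I enable caching?","What is Higress?"]
# messages.@reverse.#(role=="user")#.content|1 -> "What is Higress?"
cacheKeyFrom:
  # Single quotes keep the double quotes and the # characters literal.
  requestBody: 'messages.@reverse.#(role=="user").content'
redis:
  serviceName: my-redis.dns
```

The conditional expression yields a different key from the default only when the last message in the array is not a user message, for example when a request ends with an assistant or tool message.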