Instance concurrency specifies the maximum number of concurrent requests that each function instance can process at a time. You can use the instance concurrency feature of Function Compute to manage resource usage during traffic peaks and mitigate the impact of cold starts. This improves performance and reduces costs.
Instance concurrency is typically configured together with instance specifications to optimize function performance and reduce costs.
How it works
By default, each Function Compute instance processes one request at a time. When you set instance concurrency to a value greater than 1, a single instance accepts multiple concurrent requests up to the configured limit. Function Compute creates a new instance only when the number of requests that are concurrently processed by existing instances exceeds the specified value.
Example: Three requests arrive at the same time, and each takes 10 seconds to process. If instance concurrency is set to 10 (and this load does not exceed the per-instance queries per second (QPS) limit), a single instance can process up to 10 requests at a time.
Concurrency = 1: Function Compute creates three instances. Total billed execution duration: 30 seconds.
Concurrency = 10: Function Compute creates one instance to handle all three requests. Total billed execution duration: 10 seconds.
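The billing arithmetic in this example can be sketched as follows (the three requests and the 10-second duration come from the example above; the variable names are illustrative):

```javascript
// Worked billing arithmetic from the example above: three requests that
// each take 10 seconds to process.
const requests = 3;
const durationSeconds = 10;

// Concurrency = 1: each request occupies its own instance, so billed time adds up.
const billedAtConcurrency1 = requests * durationSeconds; // 30 seconds
// Concurrency = 10: all three requests overlap on a single instance.
const billedAtConcurrency10 = durationSeconds; // 10 seconds

console.log(billedAtConcurrency1, billedAtConcurrency10);
```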
When to use instance concurrency
Instance concurrency works best when your function spends most of its time waiting for downstream responses (network calls, database queries, file I/O). Waiting for a response consumes little CPU or memory, so letting one instance process multiple requests concurrently saves costs and improves the responsiveness and throughput of applications.
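This overlap of I/O waits can be illustrated with a standalone sketch (plain Node.js, not the Function Compute API). Three simulated requests each "wait" 100 ms on a downstream call; because the waits overlap on one event loop, total wall-clock time is close to one request's duration rather than the sum:

```javascript
// Simulated downstream call (e.g. a database query) that only waits.
const simulatedRequest = () =>
  new Promise((resolve) => setTimeout(resolve, 100));

async function main() {
  const start = Date.now();
  // The three waits overlap instead of running in series, so this takes
  // roughly 100 ms rather than 300 ms.
  await Promise.all([simulatedRequest(), simulatedRequest(), simulatedRequest()]);
  console.log(`elapsed ≈ ${Date.now() - start} ms`);
}

main();
```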
Benefits
Lower costs. Multiple requests share a single instance, reducing total billed execution duration. This is especially effective for I/O-heavy functions.
Shared state across requests. Requests processed by the same instance share resources such as database connection pools, reducing the total number of connections.
Fewer cold starts. Fewer instances means fewer cold start events.
Fewer VPC IP addresses consumed. For functions running in a virtual private cloud (VPC), fewer instances means fewer IP addresses used.
Make sure the vSwitch associated with your VPC has at least two available IP addresses. Otherwise, services may become unavailable and requests may fail.
When to keep concurrency at 1
Keep instance concurrency at 1 in these scenarios:
CPU-intensive functions where concurrent requests compete for CPU and degrade each other's performance.
Memory-intensive functions where multiple concurrent requests may exceed instance memory limits and crash the process.
Functions with non-thread-safe global state that cannot be easily synchronized.
Functions that require per-request logging through the X-Fc-Log-Result response header (not supported when concurrency > 1).
Choosing a concurrency value
The right value depends on your workload type and instance specifications.
| Workload type | Characteristics | Recommended concurrency |
|---|---|---|
| I/O-bound | Spends most time waiting for network, database, or file I/O | Higher values |
| CPU-bound | Spends most time on computation | Lower values (consider keeping at 1) |
| Mixed | Combination of I/O and CPU work | Start low and adjust based on monitoring |
You can estimate an initial concurrency value with the following heuristic:
Estimated concurrency ≈ Expected QPS per instance × Average response time (seconds)

For example, if a single instance handles 20 requests per second and each request takes 0.5 seconds: estimated concurrency ≈ 20 × 0.5 = 10.
Monitor actual utilization and adjust based on observed performance.
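The heuristic above can be written as a small helper. This is a sketch: the function name is illustrative, and the result is clamped to the supported range of 1 to 200 from the Limits section below:

```javascript
// Estimated concurrency ≈ QPS per instance × average response time (seconds),
// clamped to the supported range of 1–200.
function estimateConcurrency(qpsPerInstance, avgResponseSeconds) {
  const raw = qpsPerInstance * avgResponseSeconds;
  return Math.min(200, Math.max(1, Math.round(raw)));
}

console.log(estimateConcurrency(20, 0.5)); // 20 × 0.5 = 10
```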
Limits
| Item | Constraint |
|---|---|
| Supported runtime environments | Custom runtimes, custom container images |
| Instance concurrency range | 1 to 200 |
| X-Fc-Log-Result response header | Not supported when instance concurrency > 1 |
Configure instance concurrency
Specify instance concurrency when creating a function, or modify it on an existing function.
To change the concurrency of an existing function:
Open the function details page.
Click the Configurations tab.
In the Instance Configuration section, click Modify.
Set the instance concurrency value (1 to 200).
Save the configuration.
Billing impact
The billing duration of a function instance depends on the concurrency setting. For details, see Billing overview.
Concurrency = 1
Each instance processes one request at a time, and each instance is billed from when its request starts to when it finishes. Because no requests share an instance, the total billed duration equals the sum of all request durations.
Concurrency > 1
An instance processes multiple requests concurrently. The billing duration spans from when the first request starts to when the last request finishes. Because requests overlap, the total billed duration is typically much less than the sum of individual request durations.
Concurrency and scaling
Instance concurrency directly affects how many total requests your functions can handle in a region.
Formula:
Maximum concurrent requests = Maximum instances × Instance concurrency

By default, Function Compute supports up to 100 on-demand instances per region. If instance concurrency is set to 10, up to 1,000 requests can be processed concurrently across the region.
When concurrent requests exceed this capacity, Function Compute returns the ResourceExhausted throttling error.
To increase the maximum number of instances in a region, contact us.
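The capacity formula above can be sketched as follows (the default of 100 on-demand instances per region comes from the text; the function names are illustrative):

```javascript
// Region capacity = maximum instances × instance concurrency.
function maxConcurrentRequests(maxInstances, instanceConcurrency) {
  return maxInstances * instanceConcurrency;
}

// Requests beyond this capacity are throttled with ResourceExhausted.
function isThrottled(concurrentRequests, maxInstances, instanceConcurrency) {
  return concurrentRequests > maxConcurrentRequests(maxInstances, instanceConcurrency);
}

console.log(maxConcurrentRequests(100, 10)); // 1000
```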
Best practices
Logging
When instance concurrency is set to 1, function logs are returned in the X-Fc-Log-Result response header if the request includes X-Fc-Log-Type: Tail. When concurrency is greater than 1, this header is not available because logs from concurrent requests cannot be isolated.
Node.js-specific issue: When concurrency > 1, console.info() cannot associate log entries with the correct request ID. All entries may show the same request ID. Example of interleaved logs:
2019-11-06T14:23:37.587Z req1 [info] logger begin
2019-11-06T14:23:37.587Z req1 [info] ctxlogger begin
2019-11-06T14:23:37.587Z req2 [info] logger begin
2019-11-06T14:23:37.587Z req2 [info] ctxlogger begin
2019-11-06T14:23:40.587Z req1 [info] ctxlogger end
2019-11-06T14:23:40.587Z req2 [info] ctxlogger end
2019-11-06T14:23:40.587Z req2 [info] logger end
2019-11-06T14:23:40.587Z req2 [info] logger end

Use context.logger.info() instead of console.info() to preserve correct request ID association:
exports.handler = (event, context, callback) => {
console.info('logger begin');
context.logger.info('ctxlogger begin');
setTimeout(function() {
context.logger.info('ctxlogger end');
console.info('logger end');
callback(null, 'hello world');
}, 3000);
};

Error handling
When an instance processes multiple requests concurrently, an unhandled exception in one request can crash the process and affect all other in-flight requests. Wrap request processing logic in try-catch blocks to isolate failures.
Node.js example:
exports.handler = (event, context, callback) => {
  try {
    JSON.parse(event);
  } catch (ex) {
    // Report the error and stop; without the return, callback would be
    // invoked a second time below.
    callback(ex);
    return;
  }
  callback(null, 'hello world');
};

Shared variables
When multiple requests run concurrently on the same instance, concurrent modifications to shared variables can cause race conditions. Use synchronization mechanisms to protect shared state.
Java example:
public class App implements StreamRequestHandler {
    private static int counter = 0;

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        // Lock on the class object because counter is static; locking on
        // "this" would not protect the field if multiple handler instances exist.
        synchronized (App.class) {
            counter = counter + 1;
        }
        outputStream.write(new String("hello world").getBytes());
    }
}