Configure instance concurrency to improve resource utilization - Function Compute

Instance concurrency sets the maximum number of concurrent requests a single function instance can handle. By allowing one instance to serve multiple requests at once, Function Compute reduces cold starts, cuts costs, and maintains throughput during traffic spikes.

Note

Configure instance concurrency together with instance specifications to get the best performance-to-cost ratio.

How it works

By default, each Function Compute instance processes one request at a time. When you raise instance concurrency above 1, a single instance accepts up to that many requests simultaneously. Function Compute only creates a new instance when all existing instances are fully occupied.

Example: Three requests arrive at the same time, each taking 10 seconds.

Concurrency = 1: Function Compute creates three instances. Total billed duration: 30 seconds.
Concurrency = 10: One instance handles all three requests. Total billed duration: 10 seconds.
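The arithmetic in this example fits in a tiny model. It is a sketch built on two assumptions taken from the description above: all requests arrive at the same instant, and Function Compute fills existing instances up to the concurrency limit before creating new ones, billing each instance only while it is busy. The name simulateBurst is hypothetical.

```javascript
// Sketch: requestCount requests arrive together and each runs for durationSec.
// Assumes instances are filled up to `concurrency` before a new one is created,
// and that each instance is billed only while it is busy.
function simulateBurst(requestCount, durationSec, concurrency) {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer');
  }
  // Each instance takes at most `concurrency` requests, so round up.
  const instances = Math.ceil(requestCount / concurrency);
  // All requests start and finish together, so every instance is busy for durationSec.
  const billedSeconds = instances * durationSec;
  return { instances, billedSeconds };
}

console.log(simulateBurst(3, 10, 1));  // { instances: 3, billedSeconds: 30 }
console.log(simulateBurst(3, 10, 10)); // { instances: 1, billedSeconds: 10 }
```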

When to use instance concurrency

Instance concurrency works best for I/O-bound functions: functions that spend most of their time waiting on network calls, database queries, or file I/O. While a request waits, the instance's CPU sits largely idle, so letting the same instance serve additional requests during that time adds little or no billed duration and significantly improves throughput.

Benefits

Lower costs. Multiple requests share one instance, shrinking total billed duration. The savings are largest for I/O-heavy functions.
Shared resource pools. Requests on the same instance share resources such as database connection pools, reducing the total number of connections your backend must maintain.
Fewer cold starts. Fewer instances spinning up means fewer cold start delays for your users.
Lower VPC IP consumption. For functions in a virtual private cloud (VPC), fewer instances means fewer IP addresses consumed.

Important

Make sure the vSwitch associated with your VPC has at least two available IP addresses. Otherwise, services may become unavailable and requests may fail.
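The shared-resource benefit depends on creating the resource once per instance rather than once per request. The Node.js sketch below assumes a pool created at module scope, which is initialized when the instance starts and reused by every request that instance serves. ConnectionPool and handleRequest are hypothetical stand-ins, not a real database driver's API, and the "query" is only a timer.

```javascript
// Hypothetical pool: opens at most maxConnections and queues extra callers.
class ConnectionPool {
  constructor(maxConnections) {
    this.maxConnections = maxConnections;
    this.opened = 0;   // connections opened so far
    this.idle = [];    // connections ready for reuse
    this.waiters = []; // callers waiting for a connection
  }

  // Resolves with a connection, opening a new one only while under the cap.
  acquire() {
    if (this.idle.length > 0) return Promise.resolve(this.idle.pop());
    if (this.opened < this.maxConnections) {
      this.opened += 1;
      return Promise.resolve({ id: this.opened });
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hands the connection to the next waiter, or parks it as idle.
  release(conn) {
    const next = this.waiters.shift();
    if (next) next(conn);
    else this.idle.push(conn);
  }
}

// Module scope: created once per instance and shared by all of its requests.
const pool = new ConnectionPool(2);

// Hypothetical handler body: hold a connection only for the simulated query.
async function handleRequest(queryMs) {
  const conn = await pool.acquire();
  try {
    await new Promise((resolve) => setTimeout(resolve, queryMs));
    return conn.id;
  } finally {
    pool.release(conn);
  }
}
```

Four concurrent calls to handleRequest on this instance open only two connections; with concurrency set to 1, the same four requests would run on four instances, each opening at least one connection of its own.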

When to keep concurrency at 1

Keep instance concurrency at 1 in these scenarios:

CPU-intensive functions where concurrent requests compete for CPU cycles and slow each other down.
Memory-intensive functions where multiple concurrent requests risk exceeding instance memory limits and crashing the process.
Functions with non-thread-safe global state that you cannot easily protect with synchronization.
Functions that require per-request logging via the X-Fc-Log-Result response header, which is not available when concurrency > 1.

Choosing a concurrency value

The right value depends on your workload type and instance specifications.

Workload type	Characteristics	Recommended concurrency
I/O-bound	Spends most time waiting for network, database, or file I/O	Higher values
CPU-bound	Spends most time on computation	Lower values (consider keeping at 1)
Mixed	Combination of I/O and CPU work	Start low and adjust based on monitoring

Use the following formula to estimate a starting value:

Estimated concurrency ≈ Expected QPS per instance × Average response time (seconds)

For example, if a single instance handles 20 requests per second and each request takes 0.5 seconds: estimated concurrency ≈ 20 × 0.5 = 10.
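The formula is an application of Little's law: the average number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. Below is a sketch of it as a helper; rounding up and clamping to the documented range of 1 to 200 are choices made for this sketch rather than documented behavior, and estimateConcurrency is a hypothetical name.

```javascript
// Starting-point estimate: expected QPS per instance × average response time (s).
// Rounding up and clamping to 1..200 are choices made for this sketch.
function estimateConcurrency(qpsPerInstance, avgResponseSec) {
  const raw = qpsPerInstance * avgResponseSec;
  // Subtract a tiny epsilon so floating-point noise just above an integer
  // does not round an exact result up to the next value.
  const rounded = Math.ceil(raw - 1e-9);
  return Math.min(200, Math.max(1, rounded));
}

console.log(estimateConcurrency(20, 0.5)); // 10
console.log(estimateConcurrency(500, 2));  // 200 (capped at the maximum)
```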

Monitor actual utilization after deployment and adjust as needed.

Limits

Item	Constraint
Supported runtime environments	Custom runtimes, custom container images
Instance concurrency range	1 to 200
X-Fc-Log-Result response header	Not supported when instance concurrency > 1

Configure instance concurrency

Set instance concurrency when you create a function, or update it on an existing function at any time.

To update concurrency on an existing function:

Open the function details page.
Click the Configurations tab.
In the Instance Configuration section, click Modify.
Set the instance concurrency value (1–200).
Save the configuration.
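If you manage functions through the API or an SDK instead of the console, checking the value against the 1 to 200 range on the client side catches mistakes before a request is sent. This sketch assumes the value is carried in a field named instanceConcurrency in the function configuration; confirm the field name against the API reference for your Function Compute version. The helper buildConcurrencyUpdate is hypothetical.

```javascript
// Hypothetical helper: validates a concurrency value and builds the partial
// function configuration to send in an update request.
function buildConcurrencyUpdate(value) {
  if (!Number.isInteger(value) || value < 1 || value > 200) {
    throw new RangeError(`instance concurrency must be an integer from 1 to 200, got ${value}`);
  }
  // Assumed field name; verify it against the UpdateFunction API reference.
  return { instanceConcurrency: value };
}

console.log(buildConcurrencyUpdate(10)); // { instanceConcurrency: 10 }
```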

Billing impact

The billing duration of a function instance depends on the concurrency setting. For details, see Billing overview.

Concurrency = 1

The instance handles one request at a time. Billing starts when the first request arrives and stops when the last request finishes. Because requests run sequentially, the total billed duration equals the sum of all individual request durations.

Concurrency > 1

The instance processes multiple requests at the same time. Billing runs from the start of the first request to the end of the last, and periods in which requests overlap are billed only once, so the total billed duration is typically much shorter than the sum of individual request durations.
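To see how overlap shortens the bill, the busy time of one instance can be computed as the union of its request intervals. This sketch rests on assumptions of our own: all listed requests fit on a single instance, and periods with no running request are not billed. The names billedSeconds and sumOfDurations are hypothetical, and times are in seconds.

```javascript
// Busy time of one instance: total time during which at least one request runs.
// Each request is [startSec, endSec]; overlapping periods are counted once.
function billedSeconds(requests) {
  const sorted = [...requests].sort((a, b) => a[0] - b[0]);
  let total = 0;
  let curStart = null;
  let curEnd = null;
  for (const [start, end] of sorted) {
    if (curEnd === null || start > curEnd) {
      // Gap before this request: close the previous busy period, open a new one.
      if (curEnd !== null) total += curEnd - curStart;
      curStart = start;
      curEnd = end;
    } else {
      // Overlaps the current busy period: extend it if this request ends later.
      curEnd = Math.max(curEnd, end);
    }
  }
  if (curEnd !== null) total += curEnd - curStart;
  return total;
}

// With concurrency = 1 each request runs on its own (or back to back),
// so the billed time is simply the sum of the durations.
function sumOfDurations(requests) {
  return requests.reduce((acc, [start, end]) => acc + (end - start), 0);
}

const reqs = [[0, 10], [2, 12], [5, 15]];
console.log(sumOfDurations(reqs)); // 30
console.log(billedSeconds(reqs));  // 15
```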

Concurrency and scaling

Instance concurrency multiplies your region-wide request capacity:

Maximum concurrent requests = Maximum instances × Instance concurrency

By default, Function Compute supports up to 100 on-demand instances per region. With instance concurrency set to 10, the region can handle 1,000 concurrent requests.

When demand exceeds this capacity, Function Compute returns the ResourceExhausted throttling error.
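The capacity formula and the throttling condition take only a few lines to check. This is a simplified model that treats ResourceExhausted as occurring exactly when in-flight requests exceed maximum instances × instance concurrency; in practice other limits may also apply. Both function names are hypothetical.

```javascript
// Region-wide capacity: maximum instances × instance concurrency.
function maxConcurrentRequests(maxInstances, instanceConcurrency) {
  return maxInstances * instanceConcurrency;
}

// Simplified model: true when the in-flight load would exceed capacity,
// which is the situation in which ResourceExhausted is returned.
function exceedsCapacity(inFlight, maxInstances, instanceConcurrency) {
  return inFlight > maxConcurrentRequests(maxInstances, instanceConcurrency);
}

console.log(maxConcurrentRequests(100, 10)); // 1000
console.log(exceedsCapacity(1200, 100, 10)); // true
```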

Note

To increase the maximum number of instances in a region, contact us.

Best practices

Logging

When concurrency is 1, function logs are returned in the X-Fc-Log-Result response header for any request that includes X-Fc-Log-Type: Tail. When concurrency exceeds 1, this header is unavailable because logs from concurrent requests cannot be isolated per-request.

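Node.js-specific issue: With concurrency > 1, console.info() cannot reliably tie log entries to the correct request ID. Entries written after an asynchronous wait can be attributed to whichever request started most recently. In the output below, both logger end entries show req2: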

2019-11-06T14:23:37.587Z req1 [info] logger begin
2019-11-06T14:23:37.587Z req1 [info] ctxlogger begin
2019-11-06T14:23:37.587Z req2 [info] logger begin
2019-11-06T14:23:37.587Z req2 [info] ctxlogger begin
2019-11-06T14:23:40.587Z req1 [info] ctxlogger end
2019-11-06T14:23:40.587Z req2 [info] ctxlogger end
2019-11-06T14:23:40.587Z req2 [info] logger end
2019-11-06T14:23:40.587Z req2 [info] logger end

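The following handler produced this output. Entries written with context.logger.info() keep the correct request ID, while entries written with console.info() do not, so use context.logger.info() in functions that run with concurrency > 1: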

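exports.handler = (event, context, callback) => {
    // console.info: not reliably tagged with this request's ID when concurrency > 1
    console.info('logger begin');
    // context.logger: always tagged with this request's ID
    context.logger.info('ctxlogger begin');

    // Simulate a 3-second I/O wait, during which another request can start
    setTimeout(function() {
        context.logger.info('ctxlogger end');
        console.info('logger end'); // may be attributed to a different request
        callback(null, 'hello world');
    }, 3000);
};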

Error handling

When an instance handles multiple concurrent requests, an unhandled exception in one request can crash the entire process and fail all in-flight requests. Wrap your handler logic in try-catch blocks to contain failures.

Node.js example:

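exports.handler = (event, context, callback) => {
    try {
        // A malformed payload throws here instead of crashing the process
        JSON.parse(event);
    } catch (ex) {
        // Fail only this request, and return so the success
        // callback below is not also invoked
        callback(ex);
        return;
    }

    callback(null, 'hello world');
};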
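Repeating the try-catch in every handler is easy to forget. One option is a small wrapper; the sketch below assumes callback-style handlers like the one above, and isolateErrors is a hypothetical helper, not part of the Function Compute runtime. It reports a synchronous throw through the current request's callback and guarantees the callback fires at most once.

```javascript
// Hypothetical helper: wraps a callback-style handler so a synchronous throw
// fails only the current request, and the callback is invoked at most once.
function isolateErrors(handler) {
  return (event, context, callback) => {
    let done = false;
    const once = (err, result) => {
      if (done) return; // ignore duplicate completions
      done = true;
      callback(err, result);
    };
    try {
      handler(event, context, once);
    } catch (err) {
      once(err);
    }
  };
}

// In a deployed function this would be assigned to exports.handler.
const handler = isolateErrors((event, context, callback) => {
  const body = JSON.parse(event); // throws on malformed JSON
  callback(null, `hello ${body.name}`);
});
```

Like a plain try-catch, the wrapper only catches errors thrown synchronously. An exception thrown later inside a timer or event callback still escapes to the process, so asynchronous code needs its own error handling.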

Shared variables

Concurrent requests on the same instance can modify shared variables simultaneously, causing race conditions. Use synchronization to protect any shared state.

Java example:

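import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.StreamRequestHandler;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class App implements StreamRequestHandler
{
    // Shared by all requests that run concurrently on this instance
    private static int counter = 0;

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        // Lock on the class, not `this`: counter is static, so the lock must be shared by every App object
        synchronized (App.class) {
            counter = counter + 1;
        }
        outputStream.write("hello world".getBytes(StandardCharsets.UTF_8));
    }
}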
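The Java example guards against threads. In a single Node.js process, handler code runs on one event loop thread, so a plain counter = counter + 1 cannot be interrupted midway. Shared state can still be corrupted when a read-modify-write spans an await, because another request can run during the wait. The sketch below shows that case and one fix; Mutex, incrementUnsafe, and incrementSafe are hypothetical helpers, not part of the Function Compute runtime, and the 10 ms timer stands in for real I/O.

```javascript
// Minimal promise-chain mutex: each caller waits for the previous holder.
class Mutex {
  constructor() {
    this._tail = Promise.resolve();
  }

  // Runs fn exclusively and returns its result.
  runExclusive(fn) {
    const result = this._tail.then(() => fn());
    // Keep the chain usable even if fn rejects.
    this._tail = result.catch(() => {});
    return result;
  }
}

// Module-scope state shared by all requests on this instance.
let counter = 0;
const counterLock = new Mutex();

// Stand-in for an asynchronous call such as a cache or database round trip.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Unsafe: another request can run between the read and the write.
async function incrementUnsafe() {
  const current = counter;
  await delay(10);
  counter = current + 1;
}

// Safe: the whole read-modify-write happens while holding the lock.
async function incrementSafe() {
  await counterLock.runExclusive(async () => {
    const current = counter;
    await delay(10);
    counter = current + 1;
  });
}
```

Starting from counter = 0, two concurrent incrementUnsafe() calls leave counter at 1 because one update is lost, while two concurrent incrementSafe() calls leave it at 2.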