Common issues and solutions for OSS Connector for AI/ML inference acceleration.
When troubleshooting, check the connector runtime log. The default path is /var/log/oss-connector/connector.log.
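To narrow down a failure quickly, it can help to pull only the most recent error lines from the runtime log. The following is a minimal Python sketch, not part of the connector: the helper name recent_errors is illustrative, and it assumes that error entries contain the literal text ERROR.

```python
from collections import deque

LOG_PATH = "/var/log/oss-connector/connector.log"  # default path from this page


def recent_errors(lines, limit=20):
    """Return the last `limit` lines that contain the marker "ERROR"."""
    # Assumption: error entries carry the literal text "ERROR".
    matches = deque(maxlen=limit)
    for line in lines:
        if "ERROR" in line:
            matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    try:
        with open(LOG_PATH, errors="replace") as log:
            for entry in recent_errors(log):
                print(entry)
    except OSError as exc:
        print(f"cannot read {LOG_PATH}: {exc}")
```

If the log path has been changed in the config file, adjust LOG_PATH accordingly.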
Initialization errors
The program exits immediately and prints "failed to launch oss connector: xxx" to stderr, where xxx is the specific error message. Common initialization errors:
| Error message | Description |
| --- | --- |
| MODEL_DIR not set | |
| OSS_PATH not set | |
| OSS_ENDPOINT not set | |
| OSS_AUTHORIZATION_FILE_PATH not accessible | The access credential file path specified by |
| OSS_ACCESS_KEY_SECRET not set | |
| failed to infer region from endpoint | |
| failed to parse config file | The JSON configuration file cannot be parsed. Check the file format. |
| failed to initialize log path | The log file cannot be initialized. |
| invalid uri | |
| failed to list objects | ListObjects failed. Verify your OSS credentials and oss:ListObjects permission on the target bucket path. |
| failed to download metadata | Failed to download data to the |
| failed to acquire uds lock | Failed to acquire the UDS file lock. If multiple connector main processes must run, set a unique |
| failed to initialize connector | Other initialization error. Check the connector runtime log at the default path /var/log/oss-connector/connector.log for details. |
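Several of the errors above come from missing environment variables, so a quick check before launch can catch them early. The following is a minimal Python sketch, not part of the connector: the helper name missing_vars is illustrative, the list covers only the three variables with a plain "not set" error, and treating an empty value as unset is an assumption.

```python
import os

# Variables whose absence is reported in the table above.
REQUIRED_VARS = ("MODEL_DIR", "OSS_PATH", "OSS_ENDPOINT")


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    # Assumption: an empty value is treated the same as an unset variable.
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    for name in missing_vars(os.environ):
        print(f"{name} not set")
```

Run it in the same shell or container that starts the connector, so that it sees the same environment.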
Model file loading errors
huggingface_hub reports a Repo id error. Example:
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/var/model/qwen/Qwen3-32B/'. Use `repo_type` argument if needed.
safe_open reports an incomplete metadata error when loading the model file. Example:
[rank0]: with safe_open(st_file, framework="pt") as f:
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
Possible causes:
- Connector runtime error: Check the connector runtime log at the default path /var/log/oss-connector/connector.log for the specific error.
- Connector feature not enabled: Loading models from MODEL_DIR requires the LD_PRELOAD and ENABLE_CONNECTOR environment variables.
- Mount path conflict: Switching OSS_PATH values under the same MODEL_DIR can cause cache conflicts. Clear MODEL_DIR before switching models.
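When safe_open reports MetadataIncompleteBuffer, one possible reason is that the file the process sees is shorter or longer than its header declares. The following minimal Python sketch compares the two sizes. It relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then tensor data); the helper name safetensors_sizes is illustrative and not part of any library.

```python
import json
import os
import struct


def safetensors_sizes(path):
    """Return (expected_size, actual_size) in bytes for a .safetensors file."""
    actual = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError("file is shorter than the 8-byte length prefix")
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > actual - 8:
            raise ValueError("file ends inside the JSON header")
        header = json.loads(f.read(header_len))
    # Tensor data follows the header; the largest end offset marks its extent.
    data_end = max(
        (
            entry["data_offsets"][1]
            for name, entry in header.items()
            if name != "__metadata__"
        ),
        default=0,
    )
    return 8 + header_len + data_end, actual
```

If the two values differ, the file as seen by the process is truncated or otherwise inconsistent. To inspect files served through MODEL_DIR, the check presumably needs to run with the same LD_PRELOAD and ENABLE_CONNECTOR settings as the inference process.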
Running as a non-root user
Running the connector as a non-root user may produce these errors:
ERROR|log_output_file:Fail to open log file /var/log/oss-connector/default.log
ERROR|log_output_file:Fail to open log file /var/log/oss-connector/connector.log.1845009
ERROR|new_log_output_file:Failed to open log file /var/log/oss-connector/audit.log
ERROR|bind:failed to bind to '/run/modelconnector.sock' errno=13(Permission denied)
ERROR|operator():failed to bind to path /run/modelconnector.sock errno=13(Permission denied)
ERROR|init:failed to init uds server errno=13(Permission denied)
ERROR|init_library:failed to create connector
A non-root user lacks write permissions on the default log, config, and UDS paths. Use CONNECTOR_CONFIG_PATH and CONNECTOR_UDS_PATH to point to a writable directory, and configure a writable log path in the config file.
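Before choosing replacement paths, it can be useful to confirm which directories the current user can actually write to. The following is a minimal Python sketch, not part of the connector: the helper name unwritable_dirs is illustrative, and the two default directories are taken from the error messages above.

```python
import os

# Directories taken from the error messages above.
DEFAULT_DIRS = ("/var/log/oss-connector", "/run")


def unwritable_dirs(paths):
    """Return the paths that are missing or not writable by the current user."""
    return [
        p
        for p in paths
        # Creating files in a directory needs both write and search permission.
        if not (os.path.isdir(p) and os.access(p, os.W_OK | os.X_OK))
    ]


if __name__ == "__main__":
    for path in unwritable_dirs(DEFAULT_DIRS):
        print(f"not writable: {path}")
```

The same function can be called with a candidate directory to verify it before using it for CONNECTOR_CONFIG_PATH, CONNECTOR_UDS_PATH, or the log path.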
Model file size exceeds node memory
The connector prefetches the entire model into memory by default. If the model is larger than the available memory, loading may fail with a segmentation fault or the process may be killed by the OOM killer.
Solutions:
- Limit the prefetch cache size with CONNECTOR_MAX_CACHE_ADVISE_GB or the pretech.maxCacheAdviseGB config parameter. This may reduce loading performance.
- Use a node with more memory.
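To decide whether a cache limit is needed, the model size can be compared with the memory the node reports as available. The following is a rough Python sketch under stated assumptions: it reads the MemAvailable field in the Linux /proc/meminfo format, the 50% headroom is an arbitrary illustrative choice, the helper names are hypothetical, and the result is only a candidate value for CONNECTOR_MAX_CACHE_ADVISE_GB, assuming that setting takes a whole number of gigabytes.

```python
GIB = 2**30


def mem_available_bytes(meminfo_text):
    """Extract MemAvailable from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024  # the field is reported in kB
    raise ValueError("MemAvailable not found")


def suggest_cache_gb(available_bytes, model_bytes, headroom=0.5):
    """Return a candidate cache cap in GiB, or None if the model fits."""
    # Assumption: only `headroom` of the available memory is given to the cache.
    budget = int(available_bytes * headroom)
    if model_bytes <= budget:
        return None
    return max(budget // GIB, 1)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        available = mem_available_bytes(f.read())
    print(suggest_cache_gb(available, model_bytes=64 * GIB))  # example model size
```

A return value of None means the model fits within the chosen budget and no limit is suggested.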