This topic describes how to use the Alibaba Cloud Linux 3 AI Extension Edition with Alibaba Cloud AI container images to improve performance.
Enable the keentune optimization tool.
The keentune optimization tool is pre-installed on the Alibaba Cloud Linux 3 AI Extension Edition image. This tool provides optimizations for various scenarios. Follow these steps to enable optimization for AI scenarios.
systemctl stop tuned
systemctl disable tuned
systemctl start keentune-target
systemctl enable keentune-target
systemctl enable keentuned
systemctl start keentuned
keentune profile set ai_train.profile
The keentune optimizations require an OS restart to take effect. To disable the optimizations, run the keentune profile rollback command. This change also requires an OS restart.
Install Docker.
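The enable sequence above can be wrapped in a small script with basic error handling. This is a sketch, not something shipped with the image; the DRY_RUN switch (default on here, so the script only prints the commands) is an addition for previewing what would run:

```shell
#!/bin/sh
# Sketch: apply the keentune AI-training profile in one pass.
# DRY_RUN=1 (the default here) prints each command instead of executing it;
# set DRY_RUN=0 to actually run them, then reboot for the changes to apply.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

run systemctl stop tuned
run systemctl disable tuned
run systemctl start keentune-target
run systemctl enable keentune-target
run systemctl enable keentuned
run systemctl start keentuned
run keentune profile set ai_train.profile
```

Running the script without arguments lists the seven commands; DRY_RUN=0 executes them in order and stops at the first failure.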
To install Docker and its related components, see Train models using PyTorch GPU images.
Obtain the test image.
docker pull ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/openclip-bevformer:v0.1-torch2.6-cuda12.6-py3.10-ubuntu22.04
Download the datasets.
The container image does not include datasets. After you download the image, run the following commands to download the required model datasets:
OpenCLIP training and inference datasets
# Download the training dataset
mkdir -p /workspace/dataset && cd /workspace/dataset
wget -O mscoco.parquet "https://hf-mirror.com/datasets/ChristophSchuhmann/MS_COCO_2017_URL_TEXT/resolve/main/mscoco.parquet?download=true"
pip3 install img2dataset webdataset==0.2.86 numpy==1.23.5 --ignore-installed && NO_ALBUMENTATIONS_UPDATE=1 img2dataset --url_list mscoco.parquet --input_format "parquet" --url_col "URL" --caption_col "TEXT" --output_format webdataset --output_folder COCO_2017_Captions-webdataset-592k-256x256-296shards --processes_count 16 --thread_count 64 --image_size 256 --number_sample_per_shard 2000
# Download the inference dataset
mkdir -p /workspace/dataset && cd /workspace/dataset
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar --no-check-certificate
mkdir -p ILSVRC2012_img_val
tar xvf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val
cd ILSVRC2012_img_val/ && wget https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh && bash valprep.sh
For more information, see GitHub - mlfoundations/open_clip: An open source implementation of CLIP.
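The img2dataset call above packs the 591,753 captioned images into shards of 2,000 samples each, which is where the 00000 to 00295 shard range in the later training command comes from. A quick sanity check of that layout (plain arithmetic, not part of the image):

```shell
# Verify the shard layout img2dataset should produce for the training set.
TOTAL_SAMPLES=591753    # matches --train-num-samples in the training command
PER_SHARD=2000          # matches --number_sample_per_shard above

# Ceiling division: the last shard is only partially filled.
NUM_SHARDS=$(( (TOTAL_SAMPLES + PER_SHARD - 1) / PER_SHARD ))
LAST_SHARD=$(printf "%05d.tar" $((NUM_SHARDS - 1)))

echo "$NUM_SHARDS"     # 296
echo "$LAST_SHARD"     # 00295.tar
```

If the output folder contains fewer .tar files than this, some download workers failed and the training command's shard range will not match.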
BEVFormer training dataset
mkdir -p /workspace/BEVFormer/data && cd /workspace/BEVFormer/data
wget https://d36yt3mvayqw5m.cloudfront.net/public/v1.0/v1.0-mini.tgz
mkdir -p nuscenes && tar -xzf v1.0-mini.tgz -C ./nuscenes/
wget https://d36yt3mvayqw5m.cloudfront.net/public/v1.0/can_bus.zip
unzip -q can_bus.zip
cd /workspace/BEVFormer && python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini --canbus ./data
For more information, see BEVFormer/docs/prepare_dataset.md at master · fundamentalvision/BEVFormer.
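Before running create_data.py it is worth confirming that both archives extracted into the expected places. The sketch below checks for the usual nuScenes v1.0-mini directories plus can_bus; the exact set of entries is an assumption based on the archive layouts, not something the image enforces:

```shell
#!/bin/sh
# Sketch: verify the nuScenes mini layout under the BEVFormer data root.
# The expected sub-directories are an assumption based on the v1.0-mini
# archive contents plus the extracted can_bus archive.
check_nuscenes_layout() {
    # $1 = data root; prints any missing entries and returns non-zero if found
    missing=""
    for d in nuscenes/maps nuscenes/samples nuscenes/sweeps nuscenes/v1.0-mini can_bus; do
        [ -d "$1/$d" ] || missing="$missing $d"
    done
    [ -z "$missing" ] && return 0
    echo "Missing under $1:$missing"
    return 1
}

check_nuscenes_layout /workspace/BEVFormer/data || echo "Finish the download steps above before running tools/create_data.py."
```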
Run tests in the container image.
OpenCLIP
Training command:
cd /workspace/open_clip/src && torchrun --nproc_per_node 8 -m open_clip_train.main --model RN50 --train-data /workspace/dataset/COCO_2017_Captions-webdataset-592k-256x256-296shards/\{00000..00295\}.tar --train-num-samples 591753 --dataset-type webdataset --batch-size 1152 --precision amp --workers 8 --epochs 4 --log-every-n-steps 1 --torchcompile
Inference command:
cd /workspace/open_clip/src && torchrun --nproc_per_node 1 -m open_clip_train.main --imagenet-val /workspace/dataset/ILSVRC2012_img_val --model RN50 --batch-size 1152 --workers 8 --pretrained openai
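The training command launches 8 processes, each with a per-GPU batch size of 1152, so the effective global batch and the approximate number of steps per epoch follow directly. An arithmetic sketch, shown for orientation only:

```shell
# Effective batch geometry of the OpenCLIP training run above.
NPROC=8               # --nproc_per_node
BATCH_PER_GPU=1152    # --batch-size is per process in open_clip
TRAIN_SAMPLES=591753  # --train-num-samples

GLOBAL_BATCH=$((NPROC * BATCH_PER_GPU))
STEPS_PER_EPOCH=$((TRAIN_SAMPLES / GLOBAL_BATCH))

echo "$GLOBAL_BATCH"       # 9216
echo "$STEPS_PER_EPOCH"    # 64
```

With --log-every-n-steps 1 in the command above, each epoch therefore produces roughly 64 log lines per training run.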
BEVFormer
Training command:
cd /workspace/BEVFormer && TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1 tools/dist_train.sh projects/configs/bevformer/bevformer_base.py 8
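The TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1 prefix in the command above is a PyTorch environment variable that forces cuBLAS matrix multiplies to use TF32 regardless of what the training script sets. PyTorch reads the variable itself at startup; the helper below only mirrors the "set to 1" convention for pre-launch checks and is not PyTorch's actual parsing code:

```shell
#!/bin/sh
# Sketch: check whether the TF32 cuBLAS override is active for a launch.
tf32_override_enabled() {
    [ "${TORCH_ALLOW_TF32_CUBLAS_OVERRIDE:-0}" = "1" ]
}

TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
if tf32_override_enabled; then
    echo "TF32 cuBLAS override active for the training processes."
fi
```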