Blade EAS Plugin automatically optimizes models during EAS service deployment through configuration file settings.
Prerequisites and limitations
TensorFlow and PyTorch processors in EAS integrate the Blade runtime SDK. Configure the EAS service configuration file to optimize models deployed with these processors.
Blade EAS Plugin optimizes models before EAS deployment. Optimization takes 3 to 10 minutes depending on model complexity. The plugin runs once during initial EAS service deployment. Subsequent scale-out and scale-in operations use the optimized model.
You can enable Blade EAS Plugin only when you create a service by using the eascmd client. For eascmd client configuration, see Download and authenticate the client and Command reference.
To enable Blade EAS Plugin, add the plugins field to the EAS service configuration file. For details, see plugins field. The following sections demonstrate Blade EAS Plugin configuration.
plugins field
Add the plugins field to the EAS service configuration file to enable Blade EAS Plugin. This field contains a list of one or more dictionary elements. For field details, see Create a service.
Each dictionary element contains the following keys:
Table 1. Fields in plugins

| Key | Required | Description |
| --- | --- | --- |
| command | Yes | Optimization command to run. For values, see Mappings between processors and plugin runtime images. |
| image | Yes | Registry address of the Blade EAS Plugin runtime image. Blade EAS Plugin supports TensorFlow and PyTorch frameworks on CPU and GPU (CUDA 10.0) devices. See Mappings between processors and plugin runtime images for processor names and plugin registry addresses. |
| resource | No | Resource group for executing optimization. This differs from the top-level resource field in the service description file, which specifies the resource group for the EAS service. Required for GPU optimization. In the China (Hangzhou) and China (Shanghai) regions, use the T4_8CORE resource group. In China (Shanghai), the V100_8CORE and P4_8CORE resource groups are also supported. Note: For GPUs, the resource group running Blade EAS Plugin must use the same card type as the resource group for the EAS service. |
| gpu | No | Number of GPUs for optimization. Typically 1. |
| config | No | Advanced optimization configuration items. Currently supports only the model_info subkey, which configures one model. The key is the model file name. The value supports multiple optimization items, which are described in Table 3. List of optimization items. |
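Put together, these keys form one dictionary element of the plugins list. The following is a minimal sketch of such an element; the command and image values are the CPU values used in the examples later in this topic:

```json
{
  "plugins": [
    {
      "command": "blade --mode eas_plugin --optimize_for cpu",
      "image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest"
    }
  ]
}
```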
Table 2. Mappings between processors and plugin runtime images

| Device type | Key | Value |
| --- | --- | --- |
| CPU | image (Blade EAS Plugin registry address) | registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest |
| CPU | command (plugin optimization command) | blade --mode eas_plugin --optimize_for cpu |
| CPU | processor | For example, tensorflow_cpu_1.15, as used in the examples in this topic. |
| GPU | image (Blade EAS Plugin registry address) | registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest |
| GPU | command (plugin optimization command) | blade --mode eas_plugin --optimize_for gpu |
| GPU | processor | For example, tensorflow_gpu_1.15, as used in the examples in this topic. |
Table 3. List of optimization items

| Optimization item | Description |
| --- | --- |
| optimization_level | Optimization level. Two levels are available: o1 and o2. At the o2 level, Blade additionally performs INT8 quantization. See Auxiliary data for optimization. |
| test_data | Test data file. Optional. The test data file must be included in the path or compressed package specified by model_path. Provide test data for normal model inference, especially for PyTorch model optimization. For test data file generation, see Auxiliary data for optimization. |
| calibration_dataset | Quantization calibration data file. Optional. The quantization calibration data file must be included in the path or compressed package specified by model_path. If not specified, Blade performs online INT8 quantization. If specified, Blade performs offline INT8 quantization. Provide more than 100 pieces of calibration data. For calibration data file generation, see Auxiliary data for optimization. |
| inputs | List of strings. Optional. Specifies model input node names. If not specified, Blade uses nodes without upstream nodes as input nodes. Not required for PyTorch models. |
| outputs | List of strings. Optional. Specifies model output node names. If not specified, Blade uses nodes without downstream nodes as output nodes. Not required for PyTorch models. |
| input_shapes | Possible shapes of input tensors. Use this item to improve optimization in specific scenarios. The inner list element count must equal the number of input tensors, and each element is a string representing an input shape. To express multiple sets of possible shapes, add elements to the outer list. |
| input_ranges | Value range for the elements of each input tensor. The inner list element count must equal the number of input tensors, and each element is a string representing a value range, written in square brackets with real numbers or characters. To express multiple sets of possible value ranges, add elements to the outer list. |
| quantization | JSON dictionary. Currently supports only the weight_adjustment key, which indicates whether to adjust model parameters to reduce quantization precision loss. |
Auxiliary data for optimization
At the o1 optimization level, providing test_data makes optimization more targeted. At the o2 optimization level, providing calibration_dataset guides Blade to perform offline INT8 quantization. Both parameters must conform to the Blade auxiliary data format. The auxiliary data formats for TensorFlow and PyTorch are as follows:
- TensorFlow auxiliary data is in list-of-feed-dict format. Each feed dict key is a string, and each value is a NumPy ndarray. Save the auxiliary data file as an .npy file.
- PyTorch auxiliary data is in list-of-tensor-tuple format. Save the auxiliary data file as a .pth file.
The following sample code generates an auxiliary data file for TensorFlow:
import numpy as np

calib_data = list()
for i in range(10):
    feed_dict = {
        'image_placeholder:0': np.ones((8, 3, 224, 224), dtype=np.float32),
        'threshold_placeholder:0': np.float32(0.5),
    }
    calib_data.append(feed_dict)
np.save("calib_data.npy", calib_data)
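Before uploading, you can load the saved file back with NumPy to confirm it round-trips in the expected list-of-feed-dict form. A minimal sketch; note that loading requires allow_pickle=True because the feed dicts are stored as pickled objects:

```python
import numpy as np

# Re-create a minimal auxiliary data file, then load it back to check its structure.
calib_data = list()
for i in range(10):
    feed_dict = {
        'image_placeholder:0': np.ones((8, 3, 224, 224), dtype=np.float32),
        'threshold_placeholder:0': np.float32(0.5),
    }
    calib_data.append(feed_dict)
np.save("calib_data.npy", calib_data)

# Loading requires allow_pickle=True because the feed dicts are pickled objects.
loaded = np.load("calib_data.npy", allow_pickle=True)
assert len(loaded) == 10
assert loaded[0]['image_placeholder:0'].shape == (8, 3, 224, 224)
```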
The following sample code generates an auxiliary data file for PyTorch:
import torch

calib_data = list()
for i in range(10):
    image = torch.ones(8, 3, 224, 224)
    threshold = torch.tensor(0.5)
    feed_tuple = (image, threshold)
    calib_data.append(feed_tuple)
torch.save(calib_data, 'calib_data.pth')
Without Blade EAS Plugin
The following is a simple EAS service configuration file without Blade EAS Plugin:
{
"name": "blade_eas_plugin_test",
"model_path": "oss://<yourBucket>/<pathToYourModel>/",
"processor": "tensorflow_cpu_1.15",
"metadata": {
"instance": 1,
"memory": 4000
},
"resource": "<yourEASResource>"
}
For field details in the EAS service configuration file, see Create a service.
Save the service configuration file as service.json. Run the following command to create a TensorFlow 1.15 service deployed on CPU:
eascmd create service.json
Sample output:
+-------------------+-------------------------------------------------------------------------------------------------+
| Internet Endpoint | http://123456789012****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Intranet Endpoint | http://123456789012****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Token | owic823JI3kRmMDZlOTQzMTA3ODhmOWIzMmVkZmZmZGQyNmQ1N2M5**** |
+-------------------+-------------------------------------------------------------------------------------------------+
[OK] Service is now creating
[OK] Schedule process to node cn-shanghai.i-uf6hv6kfua25q1k8****
[OK] Fetching processor from [http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/release/3.18.0/py3.6.8_cpu_tf1.15.0_torch1.6.0_abiprecxx11/TENSORFLOW_SDK_CPU.d12d3dc-91024d0-1.15.0-Linux.tar.gz]
[OK] Successfully downloaded all artifacts
[OK] Building image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117170541
[OK] Pushing image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117170541
[OK] Successfully pushed image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117170541
[OK] Successfully created ingress
[OK] Successfully synchronized resources
[OK] Waiting [Total: 1, Pending: 1, Running: 0]
[OK] Running [Total: 1, Pending: 0, Running: 1]
[OK] Service is running
Basic configuration
To enable Blade EAS Plugin, add the plugins field to the EAS service configuration file. This field is a list. In the following example, the list contains one dictionary element for the Blade optimization plugin:
{
"name": "blade_eas_plugin_test",
"model_path": "oss://<yourBucket>/<pathToYourModel>/",
"processor": "tensorflow_cpu_1.15",
"metadata": {
"instance": 1,
"memory": 4000
},
"plugins": [
{
"command": "blade --mode eas_plugin --optimize_for cpu",
"image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest"
}
],
"resource": "<yourEASResource>"
}
In the example, fields other than plugins follow the PAI-EAS service configuration file format. For details, see Create a service. The dictionary element in the plugins list contains two keys:
- command: Optimization command to run. --mode eas_plugin indicates the EAS plugin optimization mode, and --optimize_for cpu indicates CPU inference optimization.
- image: Registry address of the Blade EAS Plugin runtime image. All CPU optimizations use the runtime image registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest.
The example completes the optimization configuration on a CPU device without test data. Save the service configuration file as service1.json, and create the service by using the create command of the eascmd client:
eascmd create service1.json
Sample output:
+-------------------+-------------------------------------------------------------------------------------------------+
| Internet Endpoint | http://123456789012****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Intranet Endpoint | http://123456789012****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Token | owic823JI3kRmMDZlOTQzMTA3ODhmOWIzMmVkZmZmZGQyNmQ1N2M5**** |
+-------------------+-------------------------------------------------------------------------------------------------+
[OK] Service is now creating
[OK] Fetching processor from [http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/release/3.18.0/py3.6.8_cpu_tf1.15.0_torch1.6.0_abiprecxx11/TENSORFLOW_SDK_CPU.d12d3dc-91024d0-1.15.0-Linux.tar.gz]
[OK] Successfully downloaded all artifacts
[OK] Executing plugin eas-plugin-73d70d54: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest
[OK] Building image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117172259
[OK] Pushing image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117172259
[OK] Successfully pushed image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117172259
[OK] Successfully created ingress
[OK] Successfully patch resources
[OK] Waiting [Total: 1, Pending: 1, Running: 0]
[OK] Running [Total: 1, Pending: 0, Running: 1]
[OK] Service is running
Compared with configuration without Blade EAS Plugin, the logs contain an additional line indicating successful Blade optimization execution:
[OK] Executing plugin eas-plugin-73d70d54: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest
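When scripting deployments, you can use this marker line to confirm from captured eascmd output that the plugin actually ran. A minimal sketch, assuming the command output has been captured into a string (the log text here is abridged from the sample output above):

```python
# Hypothetical captured output from `eascmd create service1.json` (abridged).
log = """[OK] Successfully downloaded all artifacts
[OK] Executing plugin eas-plugin-73d70d54: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest
[OK] Service is running"""

# The "Executing plugin" marker only appears when Blade EAS Plugin was executed.
plugin_ran = any(line.startswith("[OK] Executing plugin") for line in log.splitlines())
print(plugin_ran)
```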
Advanced configuration
Providing more model information improves optimization accuracy and acceleration ratio. The following example shows a GPU service description file with additional optimization parameters:
{
"name": "blade_eas_plugin_test",
"metadata": {
"cpu": 4,
"gpu": 1,
"instance": 1,
"memory": 4096,
"cuda": "10.0"
},
"model_path": "oss://<yourBucket>/<pathToYourModel>/",
"plugins": [
{
"command": "blade --mode eas_plugin --optimize_for gpu",
"image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest",
"resource": "T4_8CORE",
"gpu": 1,
"config": {
"model_info": {
"frozen.pb": {
"optimization_level": "o1",
"inputs": [
"input_ids_a_1"
],
"outputs": [
"l2_normalize"
],
"test_data": "test_len9240_bc1.npy"
}
}
}
}
],
"processor": "tensorflow_gpu_1.15",
"resource": "<yourEASResource>"
}
In the example, fields other than plugins follow the EAS service configuration file format. For details, see Create a service and Fields in plugins. frozen.pb is the model file name, which indicates that the TensorFlow model in the frozen.pb file is optimized.
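The nested plugins structure can also be assembled in code, which avoids JSON quoting mistakes in the deeply nested config.model_info block. A minimal sketch in Python; all field values mirror the GPU example above:

```python
import json

# Assemble the Blade plugin entry for the GPU example; values mirror the
# service description file shown above.
plugin = {
    "command": "blade --mode eas_plugin --optimize_for gpu",
    "image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest",
    "resource": "T4_8CORE",
    "gpu": 1,
    "config": {
        "model_info": {
            "frozen.pb": {
                "optimization_level": "o1",
                "inputs": ["input_ids_a_1"],
                "outputs": ["l2_normalize"],
                "test_data": "test_len9240_bc1.npy",
            }
        }
    },
}

service = {
    "name": "blade_eas_plugin_test",
    "model_path": "oss://<yourBucket>/<pathToYourModel>/",
    "processor": "tensorflow_gpu_1.15",
    "metadata": {"cpu": 4, "gpu": 1, "instance": 1, "memory": 4096, "cuda": "10.0"},
    "plugins": [plugin],
    "resource": "<yourEASResource>",
}

# Write the service description file for `eascmd create`.
with open("service2.json", "w") as f:
    json.dump(service, f, indent=2)
```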
Save the service configuration file as service2.json. Create the service using the create command in the eascmd client tool:
eascmd create service2.json
Sample output:
+-------------------+-------------------------------------------------------------------------------------------------+
| Internet Endpoint | http://123456789012****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Intranet Endpoint | http://123456789012****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Token | owic823JI3kRmMDZlOTQzMTA3ODhmOWIzMmVkZmZmZGQyNmQ1N2M5**** |
+-------------------+-------------------------------------------------------------------------------------------------+
[OK] Service is now creating
[OK] Schedule process to node cn-shanghai.i-uf642ocg20xinsme****
[OK] Downloading oss file: oss://blade-qa/test_assets/tf_security_textcnn/
[OK] Fetching processor from [http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/release/3.18.0/py3.6.8_cu100_tf1.15.0_torch1.6.0_abiprecxx11/TENSORFLOW_SDK_GPU.d12d3dc-91024d0-1.15.0-Linux.tar.gz]
[OK] Successfully downloaded all artifacts
[OK] Executing plugin eas-plugin-7126ee68: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest
[OK] Building image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117191732
[OK] Pushing image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117191732
[OK] Successfully pushed image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117191732
[OK] Successfully created ingress
[OK] Successfully synchronized resources
[OK] Waiting [Total: 1, Pending: 1, Running: 0]
[OK] Running [Total: 1, Pending: 0, Running: 1]
[OK] Service is running
The logs contain an additional entry compared with configuration without Blade EAS Plugin. This entry indicates successful Blade optimization execution:
[OK] Executing plugin eas-plugin-7126ee68: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest