Blade EAS Plugin automatically optimizes models during EAS service deployment through configuration file settings.
Prerequisites and limitations
TensorFlow and PyTorch processors in EAS integrate the Blade runtime SDK. Configure the EAS service configuration file to optimize models deployed with these processors.
Blade EAS Plugin optimizes models before EAS deployment. Optimization takes 3 to 10 minutes depending on model complexity. The plugin runs once during initial EAS service deployment. Subsequent scale-out and scale-in operations use the optimized model.
You can enable Blade EAS Plugin only when you create a service by using the eascmd client. For eascmd client configuration, see Download and authenticate the client and Command reference.
To enable Blade EAS Plugin, add the plugins field to the EAS service configuration file. For details, see plugins field. The following sections demonstrate Blade EAS Plugin configuration.
plugins field
Add the plugins field to the EAS service configuration file to enable Blade EAS Plugin. This field contains a list of one or more dictionary elements. For field details, see Create a service.
Each dictionary element contains the following keys:
Table 1. Fields in plugins

| Key | Required | Description |
| --- | --- | --- |
| command | Yes | Optimization command to run. For values, see Mappings between processors and plugin runtime images. |
| image | Yes | Registry address of the Blade EAS Plugin runtime image. Blade EAS Plugin supports TensorFlow and PyTorch frameworks on CPU and GPU (CUDA 10.0) devices. See Mappings between processors and plugin runtime images for processor names and plugin registry addresses. |
| resource | No | Resource group for executing optimization. This differs from the top-level resource field in the service description file, which specifies the resource group for the EAS service. Required for GPU optimization. In the China (Hangzhou) and China (Shanghai) regions, use the T4_8CORE resource group. In China (Shanghai), the V100_8CORE and P4_8CORE resource groups are also supported. Note: For GPUs, the resource group running Blade EAS Plugin must use the same card type as the resource group for the EAS service. |
| gpu | No | Number of GPUs for optimization. Typically 1. |
| config | No | Advanced optimization configuration items. Currently supports only the model_info subkey, which configures one model. The key is the model file name. The value supports multiple optimization items, which are described in Table 3. List of optimization items. |
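Put together, these keys form one dictionary element of the plugins list. The following is a minimal sketch of such an element; the command and image values are the CPU values used in the examples later in this topic:

```json
{
  "plugins": [
    {
      "command": "blade --mode eas_plugin --optimize_for cpu",
      "image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest"
    }
  ]
}
```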
Table 2. Mappings between processors and plugin runtime images

| Device type | Key | Value |
| --- | --- | --- |
| CPU | image (Blade EAS Plugin registry address) | registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest |
| CPU | command (plugin optimization command) | blade --mode eas_plugin --optimize_for cpu |
| CPU | processor | For example, tensorflow_cpu_1.15, as used in the examples in this topic. |
| GPU | image (Blade EAS Plugin registry address) | registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest |
| GPU | command (plugin optimization command) | blade --mode eas_plugin --optimize_for gpu |
| GPU | processor | For example, tensorflow_gpu_1.15, as used in the examples in this topic. |
Table 3. List of optimization items

| Optimization item | Description |
| --- | --- |
| optimization_level | Optimization level. Two levels are available: o1 and o2. At the o2 level, Blade additionally performs INT8 quantization. See Auxiliary data for optimization. |
| test_data | Test data file. Optional. The test data file must be included in the path or compressed package specified by model_path. Provide test data for normal model inference, especially for PyTorch model optimization. For test data file generation, see Auxiliary data for optimization. |
| calibration_dataset | Quantization calibration data file. Optional. The quantization calibration data file must be included in the path or compressed package specified by model_path. If not specified, Blade performs online INT8 quantization. If specified, Blade performs offline INT8 quantization. Provide more than 100 pieces of calibration data. For calibration data file generation, see Auxiliary data for optimization. |
| inputs | List of strings. Optional. Specifies model input node names. If not specified, Blade uses nodes without upstream nodes as input nodes. Not required for PyTorch models. |
| outputs | List of strings. Optional. Specifies model output node names. If not specified, Blade uses nodes without downstream nodes as output nodes. Not required for PyTorch models. |
| input_shapes | Possible shapes of input tensors. Use this item to improve optimization in specific scenarios. The inner list element count must equal the number of input tensors, and each element is a string representing an input shape. To express multiple sets of possible shapes, add elements to the outer list. |
| input_ranges | Value range for the elements of each input tensor. The inner list element count must equal the number of input tensors, and each element is a string representing a value range, written in square brackets with real numbers or characters. To express multiple sets of possible value ranges, add elements to the outer list. |
| quantization | JSON dictionary. Currently supports only the weight_adjustment key, which indicates whether to adjust model parameters to reduce quantization precision loss. |
Auxiliary data for optimization
At the o1 optimization level, providing test_data makes optimization more targeted. At the o2 optimization level, providing calibration_dataset guides Blade to perform offline INT8 quantization. Both parameters must conform to the Blade auxiliary data format. The auxiliary data formats for TensorFlow and PyTorch are as follows:
- TensorFlow auxiliary data is in list-of-feed-dict format. Each feed dict key is a string, and each value is a NumPy ndarray. Save the auxiliary data file as an .npy file.
- PyTorch auxiliary data is in list-of-tensor-tuple format. Save the auxiliary data file as a .pth file.
The following sample code generates an auxiliary data file for TensorFlow:
import numpy as np

calib_data = list()
for i in range(10):
    feed_dict = {
        'image_placeholder:0': np.ones((8, 3, 224, 224), dtype=np.float32),
        'threshold_placeholder:0': np.float32(0.5),
    }
    calib_data.append(feed_dict)
np.save("calib_data.npy", calib_data)
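Before uploading, you can load the saved file back with NumPy to confirm it round-trips in the expected list-of-feed-dict form. A minimal sketch; note that loading requires allow_pickle=True because the feed dicts are stored as pickled objects:

```python
import numpy as np

# Re-create a minimal auxiliary data file, then load it back to check its structure.
calib_data = list()
for i in range(10):
    feed_dict = {
        'image_placeholder:0': np.ones((8, 3, 224, 224), dtype=np.float32),
        'threshold_placeholder:0': np.float32(0.5),
    }
    calib_data.append(feed_dict)
np.save("calib_data.npy", calib_data)

# Loading requires allow_pickle=True because the feed dicts are pickled objects.
loaded = np.load("calib_data.npy", allow_pickle=True)
assert len(loaded) == 10
assert loaded[0]['image_placeholder:0'].shape == (8, 3, 224, 224)
```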
The following sample code generates an auxiliary data file for PyTorch:
import torch

calib_data = list()
for i in range(10):
    image = torch.ones(8, 3, 224, 224)
    threshold = torch.tensor(0.5)
    feed_tuple = (image, threshold)
    calib_data.append(feed_tuple)
torch.save(calib_data, 'calib_data.pth')
Without Blade EAS Plugin
The following is a simple EAS service configuration file without Blade EAS Plugin:
{
"name": "blade_eas_plugin_test",
"model_path": "oss://<yourBucket>/<pathToYourModel>/",
"processor": "tensorflow_cpu_1.15",
"metadata": {
"instance": 1,
"memory": 4000
},
"resource": "<yourEASResource>"
}
For field details in the EAS service configuration file, see Create a service.
Save the service configuration file as service.json. Run the following command to create a TensorFlow 1.15 service deployed on CPU:
eascmd create service.json
Sample output:
+-------------------+-------------------------------------------------------------------------------------------------+
| Internet Endpoint | http://123456789012****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Intranet Endpoint | http://123456789012****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Token | owic823JI3kRmMDZlOTQzMTA3ODhmOWIzMmVkZmZmZGQyNmQ1N2M5**** |
+-------------------+-------------------------------------------------------------------------------------------------+
[OK] Service is now creating
[OK] Schedule process to node cn-shanghai.i-uf6hv6kfua25q1k8****
[OK] Fetching processor from [http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/release/3.18.0/py3.6.8_cpu_tf1.15.0_torch1.6.0_abiprecxx11/TENSORFLOW_SDK_CPU.d12d3dc-91024d0-1.15.0-Linux.tar.gz]
[OK] Successfully downloaded all artifacts
[OK] Building image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117170541
[OK] Pushing image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117170541
[OK] Successfully pushed image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117170541
[OK] Successfully created ingress
[OK] Successfully synchronized resources
[OK] Waiting [Total: 1, Pending: 1, Running: 0]
[OK] Running [Total: 1, Pending: 0, Running: 1]
[OK] Service is running
Basic configuration
To enable Blade EAS Plugin, add the plugins field to the EAS service configuration file. This field is a list. In the following example, the list contains one dictionary element for the Blade optimization plugin:
{
"name": "blade_eas_plugin_test",
"model_path": "oss://<yourBucket>/<pathToYourModel>/",
"processor": "tensorflow_cpu_1.15",
"metadata": {
"instance": 1,
"memory": 4000
},
"plugins": [
{
"command": "blade --mode eas_plugin --optimize_for cpu",
"image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest"
}
],
"resource": "<yourEASResource>"
}
In the example, fields other than plugins follow the PAI-EAS service configuration file format. For details, see Create a service. The dictionary element in the plugins list contains two keys:
- command: Optimization command to run. --mode eas_plugin indicates the EAS plugin optimization mode, and --optimize_for cpu indicates CPU inference optimization.
- image: Registry address of the Blade EAS Plugin runtime image. All CPU optimizations use the runtime image registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest.
The example completes the optimization configuration on a CPU device without test data. Save the service configuration file as service1.json, and create the service by using the create command of the eascmd client:
eascmd create service1.json
Sample output:
+-------------------+-------------------------------------------------------------------------------------------------+
| Internet Endpoint | http://123456789012****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Intranet Endpoint | http://123456789012****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Token | owic823JI3kRmMDZlOTQzMTA3ODhmOWIzMmVkZmZmZGQyNmQ1N2M5**** |
+-------------------+-------------------------------------------------------------------------------------------------+
[OK] Service is now creating
[OK] Fetching processor from [http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/release/3.18.0/py3.6.8_cpu_tf1.15.0_torch1.6.0_abiprecxx11/TENSORFLOW_SDK_CPU.d12d3dc-91024d0-1.15.0-Linux.tar.gz]
[OK] Successfully downloaded all artifacts
[OK] Executing plugin eas-plugin-73d70d54: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest
[OK] Building image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117172259
[OK] Pushing image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117172259
[OK] Successfully pushed image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117172259
[OK] Successfully created ingress
[OK] Successfully patch resources
[OK] Waiting [Total: 1, Pending: 1, Running: 0]
[OK] Running [Total: 1, Pending: 0, Running: 1]
[OK] Service is running
Compared with configuration without Blade EAS Plugin, the logs contain an additional line indicating successful Blade optimization execution:
[OK] Executing plugin eas-plugin-73d70d54: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest
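When scripting deployments, you can use this marker line to confirm from captured eascmd output that the plugin actually ran. A minimal sketch, assuming the command output has been captured into a string (the log text here is abridged from the sample output above):

```python
# Hypothetical captured output from `eascmd create service1.json` (abridged).
log = """[OK] Successfully downloaded all artifacts
[OK] Executing plugin eas-plugin-73d70d54: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:cpu_latest
[OK] Service is running"""

# The "Executing plugin" marker only appears when Blade EAS Plugin was executed.
plugin_ran = any(line.startswith("[OK] Executing plugin") for line in log.splitlines())
print(plugin_ran)
```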
Advanced configuration
Providing more model information improves optimization accuracy and acceleration ratio. The following example shows a GPU service description file with additional optimization parameters:
{
"name": "blade_eas_plugin_test",
"metadata": {
"cpu": 4,
"gpu": 1,
"instance": 1,
"memory": 4096,
"cuda": "10.0"
},
"model_path": "oss://<yourBucket>/<pathToYourModel>/",
"plugins": [
{
"command": "blade --mode eas_plugin --optimize_for gpu",
"image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest",
"resource": "T4_8CORE",
"gpu": 1,
"config": {
"model_info": {
"frozen.pb": {
"optimization_level": "o1",
"inputs": [
"input_ids_a_1"
],
"outputs": [
"l2_normalize"
],
"test_data": "test_len9240_bc1.npy"
}
}
}
}
],
"processor": "tensorflow_gpu_1.15",
"resource": "<yourEASResource>"
}
In the example, fields other than plugins follow the EAS service configuration file format. For details, see Create a service and Fields in plugins. frozen.pb is the model file name, which indicates that the TensorFlow model in the frozen.pb file is optimized.
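The nested plugins structure can also be assembled in code, which avoids JSON quoting mistakes in the deeply nested config.model_info block. A minimal sketch in Python; all field values mirror the GPU example above:

```python
import json

# Assemble the Blade plugin entry for the GPU example; values mirror the
# service description file shown above.
plugin = {
    "command": "blade --mode eas_plugin --optimize_for gpu",
    "image": "registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest",
    "resource": "T4_8CORE",
    "gpu": 1,
    "config": {
        "model_info": {
            "frozen.pb": {
                "optimization_level": "o1",
                "inputs": ["input_ids_a_1"],
                "outputs": ["l2_normalize"],
                "test_data": "test_len9240_bc1.npy",
            }
        }
    },
}

service = {
    "name": "blade_eas_plugin_test",
    "model_path": "oss://<yourBucket>/<pathToYourModel>/",
    "processor": "tensorflow_gpu_1.15",
    "metadata": {"cpu": 4, "gpu": 1, "instance": 1, "memory": 4096, "cuda": "10.0"},
    "plugins": [plugin],
    "resource": "<yourEASResource>",
}

# Write the service description file for `eascmd create`.
with open("service2.json", "w") as f:
    json.dump(service, f, indent=2)
```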
Save the service configuration file as service2.json. Create the service using the create command in the eascmd client tool:
eascmd create service2.json
Sample output:
+-------------------+-------------------------------------------------------------------------------------------------+
| Internet Endpoint | http://123456789012****.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Intranet Endpoint | http://123456789012****.vpc.cn-shanghai.pai-eas.aliyuncs.com/api/predict/blade_eas_plugin_test |
| Token | owic823JI3kRmMDZlOTQzMTA3ODhmOWIzMmVkZmZmZGQyNmQ1N2M5**** |
+-------------------+-------------------------------------------------------------------------------------------------+
[OK] Service is now creating
[OK] Schedule process to node cn-shanghai.i-uf642ocg20xinsme****
[OK] Downloading oss file: oss://blade-qa/test_assets/tf_security_textcnn/
[OK] Fetching processor from [http://pai-blade.oss-cn-zhangjiakou.aliyuncs.com/release/3.18.0/py3.6.8_cu100_tf1.15.0_torch1.6.0_abiprecxx11/TENSORFLOW_SDK_GPU.d12d3dc-91024d0-1.15.0-Linux.tar.gz]
[OK] Successfully downloaded all artifacts
[OK] Executing plugin eas-plugin-7126ee68: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest
[OK] Building image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117191732
[OK] Pushing image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117191732
[OK] Successfully pushed image registry-vpc.cn-shanghai.aliyuncs.com/eas/blade_eas_plugin_test_cn-shanghai:v0.0.1-20211117191732
[OK] Successfully created ingress
[OK] Successfully synchronized resources
[OK] Waiting [Total: 1, Pending: 1, Running: 0]
[OK] Running [Total: 1, Pending: 0, Running: 1]
[OK] Service is running
The logs contain an additional entry compared with configuration without Blade EAS Plugin. This entry indicates successful Blade optimization execution:
[OK] Executing plugin eas-plugin-7126ee68: registry.cn-shanghai.aliyuncs.com/eas/pai-blade-deploy:gpu_latest