Deploy inference services by using PAI SDK for Python - Platform For AI

Platform for AI (PAI) provides an SDK for Python that includes easy-to-use high-level APIs. You can use the SDK to deploy models as inference services in PAI. This topic describes how to use PAI SDK for Python to deploy an inference service in PAI.

Overview

PAI SDK for Python includes the high-level APIs pai.model.Model and pai.predictor.Predictor. You can use the SDK to deploy a model to Elastic Algorithm Service (EAS) of PAI and test the model service.

To deploy an inference service by using the SDK, perform the following steps:

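1. Specify the configurations of the inference service in a pai.model.InferenceSpec object. The configurations include the processor or image that is used to deploy the service.
2. Create a pai.model.Model object by using the InferenceSpec object and the model file.
3. Call the pai.model.Model.deploy() method to deploy the inference service. In this method, specify deployment information such as the required resources and the service name.
4. Obtain the pai.predictor.Predictor object that the deploy method returns, and then call its predict method to send inference requests.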

Sample code:

from pai.model import InferenceSpec, Model, container_serving_spec
from pai.image import retrieve, ImageScope

# 1. Use a PyTorch image provided by PAI for model inference. 
torch_image = retrieve("PyTorch", framework_version="latest",
    image_scope=ImageScope.INFERENCE)


# 2. Specify the configurations of the inference service in the InferenceSpec object. 
inference_spec = container_serving_spec(
    # The startup command of the inference service. 
    command="python app.py",
    source_dir="./src/",
    # The image used for model inference. 
    image_uri=torch_image.image_uri,
)


# 3. Create a Model object. 
model = Model(
    # Use a model file stored in an Object Storage Service (OSS) bucket. 
    model_data="oss://<YourBucket>/path-to-model-data",
    inference_spec=inference_spec,
)

# 4. Deploy the model as an online inference service in EAS and obtain a Predictor object. 
predictor = model.deploy(
    service_name="example_torch_service",
    instance_type="ecs.c6.xlarge",
)

# 5. Test the inference service. 
res = predictor.predict(data=data)

The following sections describe how to use the SDK for Python to deploy an inference service and provide the corresponding sample code.

Configure InferenceSpec

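You can deploy an inference service by using a processor or an image. The pai.model.InferenceSpec object defines the configurations of the inference service, such as the processor or image used to deploy the service, the service storage path, the warm-up request file, and remote procedure call (RPC) batching.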

Deploy an inference service by using a built-in processor

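A processor is a package that contains online prediction logic. You can use a processor to directly deploy a model as an inference service. PAI provides built-in processors that support common machine learning model formats, such as TensorFlow SavedModel, PyTorch TorchScript, XGBoost, LightGBM, and PMML. For more information, see "Built-in processors".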

Sample InferenceSpec configurations:

# Use a built-in TensorFlow processor. 
tf_infer_spec = InferenceSpec(processor="tensorflow_cpu_2.3")


# Use a built-in PyTorch processor. 
torch_infer_spec = InferenceSpec(processor="pytorch_cpu_1.10")

# Use a built-in XGBoost processor. 
xgb_infer_spec = InferenceSpec(processor="xgboost")

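You can configure additional features for the inference service in the InferenceSpec object, such as warm-up requests and RPC settings. For information about advanced configurations, see "Parameters of model services".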

# Configure the properties of InferenceSpec. 
tf_infer_spec.warm_up_data_path = "oss://<YourOssBucket>/path/to/warmup-data" # Specify the path of the warmup request file. 
tf_infer_spec.metadata.rpc.keepalive=1000 # Specify the maximum processing time for a single request. 

print(tf_infer_spec.warm_up_data_path)
print(tf_infer_spec.metadata.rpc.keepalive)

Deploy an inference service by using an image

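Processors simplify model deployment, but they cannot meet custom deployment requirements, especially when the model or inference service has complex dependencies. To address this, PAI supports flexible model deployment by using images.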

You can package the model code and its dependencies into a Docker image and push the image to Alibaba Cloud Container Registry (ACR). Then, you can create an InferenceSpec object based on the Docker image.

from pai.model import InferenceSpec, Model, container_serving_spec

# Call the container_serving_spec method to create an InferenceSpec object from an image. 
container_infer_spec = container_serving_spec(
    # The image used to run the inference service. 
    image_uri="<CustomImageUri>",
    # The port on which the inference service listens. The inference requests are forwarded to this port by PAI. 
    port=8000,
    environment_variables=environment_variables,
    # The startup command of the inference service. 
    command=command,
    # The Python package required by the inference service. 
    requirements=[
        "scikit-learn",
        "fastapi==0.87.0",
    ],
)


print(container_infer_spec.to_dict())

m = Model(
    model_data="oss://<YourOssBucket>/path-to-tensorflow-saved-model",
    inference_spec=container_infer_spec,
)
p = m.deploy(
    instance_type="ecs.c6.xlarge"
)

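If you use a custom image, you must integrate your inference code into the container, build the image, and then push the image to ACR. PAI SDK for Python simplifies this process. You can add your code to a base image to create a custom image, so you do not need to build an image from scratch. In the pai.model.container_serving_spec() method, you can set the source_dir parameter to the on-premises directory that contains the inference code. The SDK automatically packages the directory, uploads it to an OSS bucket, and mounts the OSS path to the container. You can specify a startup command to start the inference service.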
```
from pai.model import InferenceSpec, container_serving_spec

inference_spec = container_serving_spec(
    # The on-premises directory that contains the inference code. The directory is uploaded to an OSS bucket, and the OSS path is mounted to the container. Default container path: /ml/usercode/. 
    source_dir="./src",
    # The startup command of the inference service. If you specify the source_dir parameter, the /ml/usercode directory is used as the working directory of the container by default. 
    command="python run.py",
    image_uri="<ServingImageUri>",
    requirements=[
        "fastapi",
        "uvicorn",
    ]
)
print(inference_spec.to_dict())
```
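
The contents of the startup script are up to you. As a rough illustration only, a `run.py` placed in `./src` could look like the following sketch. It uses the standard `http.server` module instead of FastAPI to stay dependency-free, and the `predict` function, the JSON request layout, and port 8000 are assumptions made for this example, not something the SDK prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Hypothetical stand-in for real model inference: sums each input row."""
    return [sum(row) for row in rows]


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body that the client sent.
        length = int(self.headers.get("Content-Length", 0))
        rows = json.loads(self.rfile.read(length) or b"[]")
        body = json.dumps(predict(rows)).encode("utf-8")
        # Return the predictions as a JSON response.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real service would log requests.
        pass


def make_server(port=8000):
    """Create an HTTP server that listens on all interfaces at the given port."""
    return HTTPServer(("0.0.0.0", port), InferenceHandler)


# To run the service inside the container:
# make_server(8000).serve_forever()
```

A real service would load the mounted model files at startup and replace `predict` with actual inference.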

If you want to add more code or models to the container, you can call the pai.model.InferenceSpec.mount() method to mount an on-premises directory or an OSS path to the container.

# Upload the on-premises data to OSS and mount the OSS path to the /ml/tokenizers directory in the container. 
inference_spec.mount("./bert_tokenizers/", "/ml/tokenizers/")

# Mount the OSS path to the /ml/data directory in the container. 
inference_spec.mount("oss://<YourOssBucket>/path/to/data/", "/ml/data/")
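
The mount method accepts either an on-premises path or an OSS URI as the source. If you need to tell the two apart in your own scripts, for example to validate input before you call mount, a small helper such as the following sketch may be useful. The `split_oss_uri` function is hypothetical and not part of the SDK. It only assumes the `oss://<bucket>/<path>` form used in this topic.

```python
from urllib.parse import urlparse


def split_oss_uri(source):
    """Return (bucket, object_key) for an oss:// URI, or None for any other path."""
    parsed = urlparse(source)
    if parsed.scheme != "oss":
        return None  # Treat everything else as an on-premises path.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_oss_uri("oss://my-bucket/path/to/data/"))
print(split_oss_uri("./bert_tokenizers/"))
```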

Obtain public images provided by PAI

PAI provides multiple inference images based on common machine learning frameworks, such as TensorFlow, PyTorch, and XGBoost. To obtain inference images, set the image_scope parameter to ImageScope.INFERENCE in the pai.image.list_images and pai.image.retrieve methods.

from pai.image import retrieve, ImageScope, list_images

# Obtain all PyTorch inference images provided by PAI. 
for image_info in list_images(framework_name="PyTorch", image_scope=ImageScope.INFERENCE):
    print(image_info)


# Obtain PyTorch 1.12 images for CPU-based inference. 
retrieve(framework_name="PyTorch", framework_version="1.12", image_scope=ImageScope.INFERENCE)

# Obtain PyTorch 1.12 images for GPU-based inference. 
retrieve(framework_name="PyTorch", framework_version="1.12", accelerator_type="GPU", image_scope=ImageScope.INFERENCE)

# Obtain the images that support the latest version of PyTorch for GPU-based inference. 
retrieve(framework_name="PyTorch", framework_version="latest", accelerator_type="GPU", image_scope=ImageScope.INFERENCE)

Deploy an inference service and send inference requests

Deploy an inference service

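Create a pai.model.Model object by using a pai.model.InferenceSpec object and the model_data parameter. Then, call the deploy method to deploy the model. The model_data parameter specifies the path of the model. The value can be an OSS URI or an on-premises path. If you specify an on-premises path, the model files stored in the path are uploaded to an OSS bucket and then loaded from the OSS bucket into the inference service.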

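In the deploy method, specify the parameters of the inference service, such as the required resources, the number of instances, and the service name. For information about advanced configurations, see "Parameters of model services".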

from pai.model import Model, InferenceSpec
from pai.predictor import Predictor

model = Model(
    # The path of the model, which can be an OSS URI or an on-premises path. If you specify an on-premises path, the model file stored in the path is uploaded to an OSS bucket by default. 
    model_data="oss://<YourBucket>/path-to-model-data",
    inference_spec=inference_spec,
)

# Deploy the inference service in EAS. 
predictor = model.deploy(
    # The name of the inference service. 
    service_name="example_xgb_service",
    # The instance type used for the inference service. 
    instance_type="ecs.c6.xlarge",
    # The number of instances. 
    instance_count=2,
    # Optional. Use a dedicated resource group for service deployment. By default, the public resource group is used. 
    # resource_id="<YOUR_EAS_RESOURCE_GROUP_ID>",
    options={
        "metadata.rpc.batching": True,
        "metadata.rpc.keepalive": 50000,
        "metadata.rpc.max_batch_size": 16,
        "warm_up_data_path": "oss://<YourOssBucketName>/path-to-warmup-data",
    },
)
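
The keys in options use dotted paths such as `metadata.rpc.keepalive`, which correspond to nested fields of the service configuration. The following sketch shows one way to picture that mapping by expanding dotted keys into a nested dictionary. The `expand_options` helper is hypothetical and written for illustration. It is not how the SDK is documented to process options.

```python
def expand_options(options):
    """Expand dotted keys into a nested dictionary (illustrative helper)."""
    config = {}
    for dotted_key, value in options.items():
        node = config
        *parents, leaf = dotted_key.split(".")
        for name in parents:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


nested = expand_options({
    "metadata.rpc.batching": True,
    "metadata.rpc.keepalive": 50000,
    "warm_up_data_path": "oss://<YourOssBucketName>/path-to-warmup-data",
})
print(nested)
```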

You can also use the resource_config parameter to specify the amount of resources used for service deployment, such as the number of vCPUs and the memory size of each service instance.

from pai.model import ResourceConfig

predictor = model.deploy(
    service_name="dedicated_rg_service",
    # Specify the number of vCPUs and the memory size of each service instance. 
    # In this example, each service instance has two vCPUs and 4,000 MB of memory. 
    resource_config=ResourceConfig(
        cpu=2,
        memory=4000,
    ),
)

Send requests to an inference service

The pai.model.Model.deploy method calls EAS API operations to deploy the inference service and returns the corresponding pai.predictor.Predictor object. You can use the predict and raw_predict methods of the Predictor object to send inference requests.

Note

The input and output of the pai.predictor.Predictor.raw_predict method do not need to be processed by a serializer.

from pai.predictor import Predictor, EndpointType, RawResponse

# Deploy an inference service. 
predictor = model.deploy(
    instance_type="ecs.c6.xlarge",
    service_name="example_xgb_service",
)

# The inference service to which the inference request is sent. 
predictor = Predictor(
    service_name="example_xgb_service",
    # By default, you can access the inference service over the Internet. To access the inference service over a virtual private cloud (VPC) endpoint, you can set the endpoint type to INTRANET. In this case, the client must be deployed in the VPC. 
    # endpoint_type=EndpointType.INTRANET
)

# Use the predict method to send a request to the inference service and obtain the result. The input and output are processed by a serializer. 
res = predictor.predict(data_in_nested_list)


# Use the raw_predict method to send a request to the inference service in a more flexible manner. 
response: RawResponse = predictor.raw_predict(
    # The input data of the bytes type and file-like objects can be directly passed to the HTTP request body. 
    # Other data is serialized into JSON-formatted data and then passed to the HTTP request body. 
    data=data_in_nested_list,
    # path="predict", # The path of HTTP requests. Default value: "/". 
    # headers=dict(), # The request header. 
    # method="POST", # The HTTP request method. 
    # timeout=30, # The request timeout period. Unit: seconds. 
)

# Obtain the returned HTTP body and header. 
print(response.content, response.headers)
# Deserialize the returned JSON-formatted data into a Python object. 
print(response.json())

    
# Stop the inference service. 
predictor.stop_service()
# Start the inference service. 
predictor.start_service()
# Delete the inference service. 
predictor.delete_service()
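
The comments in the preceding example state that bytes and file-like objects are passed to the HTTP request body as is, and that other data is serialized into JSON. The following sketch restates that rule in plain Python. The `to_request_body` function is a hypothetical illustration, not the SDK's implementation.

```python
import io
import json


def to_request_body(data):
    """Turn request data into bytes following the rule described above."""
    if isinstance(data, (bytes, bytearray)):
        return bytes(data)  # Bytes are passed through unchanged.
    if hasattr(data, "read"):
        content = data.read()  # File-like objects are read in full.
        return content if isinstance(content, bytes) else content.encode("utf-8")
    # Everything else is serialized into JSON.
    return json.dumps(data).encode("utf-8")


print(to_request_body([[2, 3, 4], [4, 5, 6]]))
print(to_request_body(io.BytesIO(b"raw-bytes")))
```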

Use serializers to process input and output

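When you call the pai.predictor.Predictor.predict method for model inference, the input Python data must be serialized into a data format that the inference service supports, and the returned result must be deserialized into a readable or operable Python object. The Predictor object uses a serializer class to perform serialization and deserialization.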

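When you call the predict(data=<PredictionData>) method, the Predictor object calls the serializer.serialize method to serialize the value of the data parameter into bytes. The serialized request data is then passed to the inference service in the HTTP request body.
When the inference service returns an HTTP response, the Predictor object calls the serializer.deserialize method to deserialize the response. The predict method returns the deserialized result.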

PAI SDK for Python provides multiple built-in serializers for common data formats. These serializers can process the input and output of the built-in processors provided by PAI.
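
To make the serialize, send, and deserialize flow concrete, the following sketch imitates the round trip with a stand-in serializer and a fake service function. `SimpleJsonSerializer`, `fake_service`, and `predict` are names invented for this illustration. The real SDK classes may behave differently in detail.

```python
import json


class SimpleJsonSerializer:
    """Illustrative serializer with the two-method shape described above."""

    def serialize(self, data):
        # Python object -> bytes for the HTTP request body.
        return json.dumps(data).encode("utf-8")

    def deserialize(self, data):
        # Bytes from the HTTP response body -> Python object.
        return json.loads(data.decode("utf-8"))


def fake_service(request_body):
    """Hypothetical service: returns the sum of each input row as JSON bytes."""
    rows = json.loads(request_body)
    return json.dumps([sum(row) for row in rows]).encode("utf-8")


def predict(data, serializer):
    request_body = serializer.serialize(data)      # Step 1: serialize the input.
    response_body = fake_service(request_body)     # Step 2: call the service.
    return serializer.deserialize(response_body)   # Step 3: deserialize the output.


print(predict([[2, 3, 4], [4, 5, 6]], SimpleJsonSerializer()))
```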

JsonSerializer
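JsonSerializer serializes objects into JSON strings and deserializes JSON strings into objects. The input data of the predict method can be a NumPy array or a list. The JsonSerializer.serialize method serializes the input data into a JSON string. The JsonSerializer.deserialize method deserializes the returned JSON string into a Python object.
Specific built-in processors, such as the XGBoost processor and the PMML processor, accept and return only JSON-formatted data. By default, JsonSerializer is used to process the input and output of these processors.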

from pai.model import Model, InferenceSpec
from pai.predictor import Predictor
from pai.serializers import JsonSerializer

# In the deploy method, specify the serializer that you want to use. 
p = Model(
    inference_spec=InferenceSpec(processor="xgboost"),
    model_data="oss://<YourOssBucket>/path-to-xgboost-model"
).deploy(
    instance_type="ecs.c6.xlarge",
    # Optional. By default, JsonSerializer is used to process the input and output of the XGBoost processor. 
    serializer=JsonSerializer()
)

# You can also specify a serializer when you create a Predictor object. 
p = Predictor(
    service_name="example_xgb_service",
    serializer=JsonSerializer(),
)

# The returned result is a list. 
res = p.predict([[2,3,4], [4,5,6]])

TensorFlowSerializer
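You can use the built-in TensorFlow processor to deploy TensorFlow models in the SavedModel format to PAI. The input and output of TensorFlow services are protocol buffers messages. For more information about the data format, see "tf_predict.proto".
PAI SDK for Python provides a built-in TensorFlowSerializer, which lets you send inference requests as NumPy arrays. The serializer serializes NumPy arrays into protocol buffers messages and deserializes the returned protocol buffers messages into NumPy arrays.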

import numpy
from pai.model import Model, InferenceSpec

# Deploy a model service by using the TensorFlow processor. 
tf_predictor = Model(
    inference_spec=InferenceSpec(processor="tensorflow_cpu_2.7"),
    model_data="oss://<YourOssBucket>/path-to-tensorflow-saved-model"
).deploy(
    instance_type="ecs.c6.xlarge",
    # Optional. By default, TensorFlowSerializer is used to process the input and output of the TensorFlow processor. 
    # serializer=TensorFlowSerializer(),
)

# You can obtain the service signature by calling an API operation. 
print(tf_predictor.inspect_signature_def())

# The input of the TensorFlow processor is of the dictionary type. The dictionary key is the name of the input signature. The dictionary value is the specific input data. 
tf_result = tf_predictor.predict(data={
    "flatten_input": numpy.zeros(28*28*2).reshape((-1, 28, 28))
})

assert tf_result["dense_1"].shape == (2, 10)

PyTorchSerializer
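You can use the built-in PyTorch processor to deploy PyTorch models in the TorchScript format to PAI. The input and output of PyTorch services are protocol buffers messages. For more information about the data format, see "tf_predict.proto".
PAI SDK for Python provides a built-in PyTorchSerializer, which lets you send inference requests as NumPy arrays. The serializer serializes NumPy arrays into protocol buffers messages and deserializes the returned protocol buffers messages into NumPy arrays.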

import numpy
from pai.model import Model, InferenceSpec

# Deploy a model service by using the PyTorch processor. 
torch_predictor = Model(
    inference_spec=InferenceSpec(processor="pytorch_cpu_1.10"),
    model_data="oss://<YourOssBucket>/path-to-torch_script-model"
).deploy(
    instance_type="ecs.c6.xlarge",
    # Optional. By default, PyTorchSerializer is used to process the input and output of the PyTorch processor. 
    # serializer=PyTorchSerializer(),
)

#1. Convert the input data into a format supported by the model service. 
#2. Use a list or tuple for multiple inputs. Each element is a NumPy array. 
torch_result = torch_predictor.predict(data=numpy.zeros(28 * 28 * 2).reshape((-1, 28, 28)))
assert torch_result.shape == (2, 10)

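Custom serializers
You can use the pai.serializers.SerializerBase class to create a custom serializer based on the data formats that the inference service supports.
This section uses a custom NumpySerializer as an example to show how serialization and deserialization are performed.
1. Client: The NumpySerializer.serialize method is called to serialize the NumPy array or pandas DataFrame input into the .npy format. The serialized data is sent to the server.
2. Server: The inference service deserializes the received .npy data, generates the inference result, and serializes the result into the .npy format. The serialized result is returned to the client.
3. Client: The NumpySerializer.deserialize method is called to deserialize the returned result into a NumPy array.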
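
The client side of these steps can be sketched as follows. This is a minimal illustration that assumes a subclass of SerializerBase only needs to provide serialize and deserialize methods. The fallback base class exists only so that the sketch also runs where the PAI SDK is not installed, and the server side is not shown.

```python
import io

import numpy as np

try:
    from pai.serializers import SerializerBase
except ImportError:
    # Fallback so the sketch also runs where the PAI SDK is not installed.
    class SerializerBase:
        pass


class NumpySerializer(SerializerBase):
    """Serialize NumPy arrays to the .npy format and back."""

    def serialize(self, data):
        # Write the array into an in-memory buffer in .npy format.
        buffer = io.BytesIO()
        np.save(buffer, np.asarray(data), allow_pickle=False)
        return buffer.getvalue()

    def deserialize(self, data):
        # Read the .npy bytes returned by the service into an array.
        return np.load(io.BytesIO(data), allow_pickle=False)


serializer = NumpySerializer()
payload = serializer.serialize(np.arange(6).reshape(2, 3))
restored = serializer.deserialize(payload)
print(restored.shape)
```

An instance of such a class could then be passed through the serializer parameter in the same way as JsonSerializer in the earlier examples.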

Deploy an inference service in an on-premises environment

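PAI SDK for Python also lets you deploy an inference service in an on-premises environment by using a custom image. To run the inference service on premises, set the instance_type parameter to local in the model.deploy method. The SDK uses a Docker container to run the inference service on your on-premises machine. The model is automatically downloaded from the OSS bucket and mounted to the container that runs on the on-premises machine.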

from pai.predictor import LocalPredictor
from pai.serializers import JsonSerializer

p: LocalPredictor = model.deploy(
    # Specify to deploy the inference service in an on-premises environment. 
    instance_type="local",
    serializer=JsonSerializer()
)

p.predict(data)

# Delete the Docker container. 
p.delete_service()
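
Because on-premises deployment runs the service in a Docker container, it can help to confirm that the Docker CLI is available before you call deploy. The following check is a convenience sketch and not part of the SDK. `docker_available` is a hypothetical helper, and finding the executable does not guarantee that the Docker daemon is running.

```python
import shutil


def docker_available():
    """Return True if a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if not docker_available():
    print("Docker was not found on PATH; on-premises deployment may fail.")
```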
