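Run GPU asynchronous inference jobs with Serverless Devs - FC

This topic describes how to use Serverless Devs to invoke a GPU function based on asynchronous tasks and pass the invocation results to the configured asynchronous destination functions.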

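Background information

GPU-accelerated instances

As machine learning, and deep learning in particular, is widely adopted, CPUs cannot meet the computing power requirements generated by large numbers of vector, matrix, and tensor operations. These requirements include high-precision computation in training scenarios and low-precision computation in inference scenarios. In 2007, NVIDIA launched the Compute Unified Device Architecture (CUDA) framework, a programmable general-purpose computing platform. Researchers and developers have since rewritten many algorithms to improve performance by tens or even thousands of times. Since machine learning became popular, GPUs have become one of the basic capabilities behind various tools, algorithms, and frameworks.

During Apsara Conference 2021, Alibaba Cloud Function Compute officially released GPU-accelerated instances that use the Turing architecture. Serverless developers can use GPU hardware to accelerate AI training and inference tasks, which improves the efficiency of model training and inference services.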

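Asynchronous tasks

Function Compute provides full-stack capabilities for distributing, executing, and monitoring asynchronous tasks. This lets you focus on writing the task processing logic: you only need to create and submit task processing functions. Function Compute provides monitoring features for each phase, such as asynchronous task logs, metrics, and duration statistics. Function Compute also provides features such as instance auto scaling, task deduplication, termination of specified tasks, and batch pausing, resuming, and deletion of tasks. For more information, see "Overview".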

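Scenarios

In non-real-time and offline AI inference scenarios, AI training scenarios, and audio and video production scenarios, GPU functions are invoked based on asynchronous tasks. This allows developers to focus on their business and achieve business goals quickly. The following points describe how this is achieved.

GPU resources can be used in 1/8, 1/4, 1/2, or exclusive mode by means of GPU virtualization technology. This allows GPU-accelerated instances to be configured in a fine-grained manner.
A range of mature asynchronous task processing capabilities is provided, such as asynchronous mode management, task deduplication, task monitoring, task retry, event triggering, result callback, and task orchestration.
Developers can focus on code development and business goals without performing O&M on GPU clusters, such as driver and CUDA version management, machine operations management, and faulty GPU card management.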
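
As a rough illustration of the slicing options above, the following sketch maps each share of a card to a GPU memory size. It assumes that a full card exposes 16384 MB, which is only inferred from the s.yaml example later in this topic (a 1/8 share configured as gpuMemorySize: 2048). The constant and function names are hypothetical, and actual limits should be checked against the Function Compute documentation.

```python
from fractions import Fraction

# Assumption: a full card has 16384 MB, inferred from the 1/8 share
# (gpuMemorySize: 2048) used in the s.yaml example in this topic.
FULL_CARD_MB = 16384

SUPPORTED_SHARES = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1))

def gpu_memory_for_share(share: Fraction) -> int:
    """Return the GPU memory in MB for a given share of one card."""
    if share not in SUPPORTED_SHARES:
        raise ValueError(f"unsupported share: {share}")
    return int(FULL_CARD_MB * share)

for share in SUPPORTED_SHARES:
    print(f"{share} of a card -> gpuMemorySize: {gpu_memory_for_share(share)}")
```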

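How it works

This topic describes how to deploy a GPU function and implement result callbacks. In this topic, the tgpu_basic_func GPU function is deployed, the async-callback-succ-func function is specified as the callback function for successful invocations, and the async-callback-fail-func function is specified as the callback function for failed invocations. The following table describes these functions.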

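Function	Description	Runtime environment	Instance type	Trigger type
tgpu_basic_func	A function that runs near-real-time and offline AI tasks on GPU-accelerated instances of Function Compute	Custom Container	GPU-accelerated instance	HTTP function
async-callback-succ-func	The destination callback function for successful task executions	Python 3	Elastic instance	Event function
async-callback-fail-func	The destination callback function for failed task executions	Python 3	Elastic instance	Event function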
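
The relationship between the three functions in the table can be pictured with a small local simulation. This is only a conceptual sketch in plain Python: the function names are hypothetical stand-ins, and it does not reflect how Function Compute implements dispatching internally.

```python
def gpu_task(payload):
    """Stand-in for tgpu_basic_func; raises to simulate a failed run."""
    if payload.get("fail"):
        raise RuntimeError("simulated GPU task failure")
    return {"result": "GPU 0: simulated"}

def on_success(result):
    """Stand-in for async-callback-succ-func."""
    return f"success callback received: {result}"

def on_failure(error):
    """Stand-in for async-callback-fail-func."""
    return f"failure callback received: {error}"

def run_async_task(payload):
    """Run the task, then route the outcome to exactly one callback."""
    try:
        result = gpu_task(payload)
    except Exception as exc:
        return on_failure(str(exc))
    return on_success(result)

print(run_async_task({}))
print(run_async_task({"fail": True}))
```

Exactly one of the two callbacks runs for each task, which mirrors the onSuccess and onFailure destinations configured for the GPU function in Step 3.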

The following figure shows the workflow.

Before you begin

Step 1: Deploy the callback function for successful invocations

Initialize the project.

s init devsapp/start-fc-event-python3 -d async-succ-callback

The following sample shows the directory structure of the created project.

├── async-succ-callback
│   ├── code
│   │   └── index.py
│   └── s.yaml

Go to the project directory.
```
cd async-succ-callback
```

Modify the parameter settings in the project files based on your business requirements.

Edit the s.yaml file. Example:

edition: 1.0.0
name: hello-world-app
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For more information about how to use keys, visit https://www.serverless-devs.com/serverless-devs/tool.
access: "default"

vars: # The global variables
  region: "cn-shenzhen"

services:
  helloworld: # The name of the service or module.
    component: fc
    props:
      region: ${vars.region}
      service:
        name: "async-callback-service"
        description: 'async callback service'
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: tgpu-prj-sh             # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: tgpu-logstore-sh       # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: "async-callback-succ-func"
        description: 'async callback succ func'
        runtime: python3
        codeUri: ./code
        handler: index.handler
        memorySize: 128
        timeout: 60

Edit the index.py file. Example:

# -*- coding: utf-8 -*-
import logging

# To enable the initializer feature
# please implement the initializer function as below:
# def initializer(context):
#   logger = logging.getLogger()
#   logger.info('initializing')

def handler(event, context):
  logger = logging.getLogger()
  logger.info('hello async callback succ')
  return 'hello async callback succ'
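
In practice a callback function usually inspects the event it receives instead of returning a fixed string. The following sketch shows one way to do that. The field names requestContext, requestId, responseContext, statusCode, and functionError are assumptions about the layout of the destination event and should be verified against the Function Compute documentation; the sample payload is hand-written for a local check and is not real service output.

```python
import json
import logging

def summarize_destination_event(event):
    """Extract a few fields from a destination event (assumed layout)."""
    if isinstance(event, (bytes, bytearray)):
        event = event.decode("utf-8")
    data = json.loads(event) if isinstance(event, str) else dict(event)
    request = data.get("requestContext") or {}
    response = data.get("responseContext") or {}
    return {
        "request_id": request.get("requestId"),
        "status_code": response.get("statusCode"),
        "function_error": response.get("functionError"),
    }

def handler(event, context):
    logger = logging.getLogger()
    summary = summarize_destination_event(event)
    logger.info("async callback received: %s", summary)
    return json.dumps(summary)

# Local check with a hand-written sample payload (not real service output).
sample = json.dumps({
    "requestContext": {"requestId": "example-request-id"},
    "responseContext": {"statusCode": 200, "functionError": ""},
}).encode("utf-8")
print(handler(sample, None))
```

Missing fields are read with get, so an event that lacks these keys produces None values rather than an exception.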

Deploy the code to Function Compute.
```
s deploy
```
You can view the deployed function in the Function Compute console.
Invoke and debug the function from your on-premises machine.
```
s invoke
```
After the invocation is complete, hello async callback succ is returned.

Step 2: Deploy the callback function for failed invocations

Initialize the project.

s init devsapp/start-fc-event-python3 -d async-fail-callback

The following sample shows the directory structure of the created project.

├── async-fail-callback
│   ├── code
│   │   └── index.py
│   └── s.yaml

Go to the project directory.
```
cd async-fail-callback
```

Modify the parameter settings in the project files based on your business requirements.

Edit the s.yaml file. Example:

edition: 1.0.0
name: hello-world-app
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For more information about how to use keys, visit https://www.serverless-devs.com/serverless-devs/tool.
access: "default"

vars: # The global variables
  region: "cn-shenzhen"

services:
  helloworld: # The name of the service or module.
    component: fc
    props:
      region: ${vars.region}
      service:
        name: "async-callback-service"
        description: 'async callback service'
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: tgpu-prj-sh             # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: tgpu-logstore-sh       # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: "async-callback-fail-func"
        description: 'async callback fail func'
        runtime: python3
        codeUri: ./code
        handler: index.handler
        memorySize: 128
        timeout: 60

Edit the index.py file. Example:

# -*- coding: utf-8 -*-
import logging

# To enable the initializer feature
# please implement the initializer function as below:
# def initializer(context):
#   logger = logging.getLogger()
#   logger.info('initializing')

def handler(event, context):
  logger = logging.getLogger()
  logger.info('hello async callback fail')
  return 'hello async callback fail'

Deploy the code to Function Compute.
```
s deploy
```
You can view the deployed function in the Function Compute console.
Invoke and debug the function from your on-premises machine.
```
s invoke
```
After the invocation is complete, hello async callback fail is returned.

Step 3: Deploy the GPU function

Create a project directory.
```
mkdir fc-gpu-async-job && cd fc-gpu-async-job
```

Create files based on the following directory structure. When you create the files, use your actual parameter settings.

Directory structure:

fc-gpu-async-job
├── code
│   ├── app.py
│   └── Dockerfile
└── s.yaml

Edit the s.yaml file. Example:

edition: 1.0.0
name: gpu-container-demo
# access specifies the key information required by the current application.
# For information about how to configure keys, visit https://www.serverless-devs.com/serverless-devs/command/config.
# For information about the order in which keys are used, visit https://www.serverless-devs.com/serverless-devs/tool.
access: default
vars:
  region: cn-shenzhen
services:
  customContainer-demo:
    component: devsapp/fc
    props:
      region: ${vars.region}
      service:
        name: tgpu_basic_service
        internetAccess: true
        # Obtain the logConfig configuration document from https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/service.md#logconfig.
        logConfig:
          project: aliyun****          # The project that stores the request logs. You must create the project in Simple Log Service in advance. We recommend that you configure this item.
          logstore: func****     # The Logstore that stores the request logs. You must create the Logstore in Simple Log Service in advance. We recommend that you configure this item.
          enableRequestMetrics: true
          enableInstanceMetrics: true
          logBeginRule: DefaultRegex
      function:
        name: tgpu_basic_func
        description: test gpu basic
        handler: not-used
        timeout: 600
        caPort: 9000
        # You can select an appropriate GPU-accelerated instance type based on the actual GPU memory usage. The following example shows the 1/8 virtualized GPU specification:
        instanceType: fc.gpu.tesla.1
        gpuMemorySize: 2048
        cpu: 1
        memorySize: 4096
        diskSize: 512
        instanceConcurrency: 1
        runtime: custom-container
        customContainerConfig:
          # Specify the information about your image. You must create a Container Registry Personal Edition or Enterprise Edition instance in advance. You must also create a namespace and an image repository.
          image: registry.cn-shenzhen.aliyuncs.com/my****/my****
          # Enable image acceleration. This feature can optimize the cold start of gigabyte-level images.
          accelerationType: Default
        codeUri: ./code
        # Asynchronous mode configurations
        # For more information, see https://gitee.com/devsapp/fc/blob/main/docs/zh/yaml/function.md#asyncconfiguration.
        asyncConfiguration:
          destination:
            # Specify the Alibaba Cloud Resource Name (ARN) of the callback function for failed invocations.
            onFailure: "acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-fail-func"
            # Specify the ARN of the callback function for successful invocations.
            onSuccess: "acs:fc:cn-shenzhen:164901546557****:services/async-callback-service.LATEST/functions/async-callback-succ-func"
          statefulInvocation: true
      triggers:
        - name: httpTrigger
          type: http
          config:
            authType: anonymous
            methods:
              - GET
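
The onSuccess and onFailure values in the example follow a visible pattern of region, account ID, service, qualifier, and function name. The following helper, whose name and signature are hypothetical, assembles a string in that same pattern so that the two destinations stay consistent. The account ID is the masked placeholder from the example and must be replaced with your own.

```python
def destination_arn(region, account_id, service, function, qualifier="LATEST"):
    """Build a destination ARN in the pattern used by onSuccess/onFailure above."""
    return (
        f"acs:fc:{region}:{account_id}:"
        f"services/{service}.{qualifier}/functions/{function}"
    )

# The account ID below is the masked placeholder from the example.
account_id = "164901546557****"
for name in ("async-callback-succ-func", "async-callback-fail-func"):
    print(destination_arn("cn-shenzhen", account_id, "async-callback-service", name))
```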

Edit the Dockerfile. Example:

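# Note: each FROM starts a new build stage, so only the last FROM (ubuntu)
# determines the base of the final image.
FROM nvidia/cuda:11.0-base
FROM ubuntu
WORKDIR /usr/src/app
# Install Python 3 to run app.py.
RUN apt-get update && apt-get install -y python3
COPY . .
# The port must match caPort in s.yaml.
EXPOSE 9000
CMD [ "python3", "-u", "/usr/src/app/app.py" ]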

Edit the app.py file. Example:

# -*- coding: utf-8 -*-
# Python 3 only: http.server is not available in Python 2.

from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import os
import time

# Listen on all interfaces; the port must match caPort in s.yaml.
host = ('0.0.0.0', 9000)

class Request(BaseHTTPRequestHandler):
    def do_GET(self):
        # Simulate a long-running task.
        print("simulate long execution scenario, sleep 10 seconds")
        time.sleep(10)

        # Query the GPUs visible to the instance.
        print("show me GPU info")
        msg = os.popen("nvidia-smi -L").read()
        data = {'result': msg}
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

if __name__ == '__main__':
    server = HTTPServer(host, Request)
    print("Starting server, listen at: %s:%s" % host)
    server.serve_forever()
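
The example above reads the GPU list with os.popen, which returns an empty string if nvidia-smi is missing or fails. As a possible alternative, the following sketch uses subprocess.run so that a missing binary, a timeout, or a non-zero exit code is reported in the response. The gpu_info function and the error field are hypothetical additions for illustration, not part of the original example.

```python
import json
import shutil
import subprocess

def gpu_info(timeout_seconds=10):
    """Return the output of `nvidia-smi -L`, or an error description."""
    if shutil.which("nvidia-smi") is None:
        return {"result": "", "error": "nvidia-smi not found on PATH"}
    try:
        completed = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return {"result": "", "error": str(exc)}
    # Report stderr only when the command exited with a failure code.
    error = completed.stderr.strip() if completed.returncode != 0 else ""
    return {"result": completed.stdout, "error": error}

print(json.dumps(gpu_info()))
```

Inside do_GET, the dictionary returned by gpu_info() could be serialized in place of the data dictionary.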

Deploy the code to Function Compute.
```
s deploy
```
You can check the deployed GPU function and its asynchronous configuration in the Function Compute console.
Invoke and debug the function from your on-premises machine.
```
s invoke
```
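After the invocation is complete, the response produced by app.py is returned: a JSON object whose result field contains the output of nvidia-smi -L.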
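Submit an asynchronous task.
1. Check whether image acceleration for the GPU function is ready.
  We recommend that you start the asynchronous task after the image acceleration status changes to [Available]. Otherwise, exceptions such as link timeouts may occur.
2. Log on to the Function Compute console. Find the tgpu_basic_func GPU function. On the [Asynchronous Tasks] tab, click [Submit Task].
After the execution is complete, the task status changes to [Successful].
You can then find async-callback-succ-func, the callback function configured for successful invocations. Choose [Logs] > [Invocation Request List], find the result row of the asynchronous request, and check whether the invocation was successful.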
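
Besides the console, a task can in principle be submitted by sending a request to the HTTP trigger of the function. The following sketch only builds and prints such a request. The trigger URL is a hypothetical placeholder, and the header names X-Fc-Invocation-Type and X-Fc-Stateful-Async-Invocation-Id are assumptions that should be verified against the Function Compute API reference before use.

```python
import urllib.request
import uuid

# Hypothetical placeholder; replace with the URL of your HTTP trigger.
TRIGGER_URL = "https://example.invalid/tgpu_basic_func"

def build_async_request(url, invocation_id=None):
    """Build a GET request that asks for an asynchronous, stateful invocation."""
    headers = {
        # Assumed header names; verify them against the API reference.
        "X-Fc-Invocation-Type": "Async",
        "X-Fc-Stateful-Async-Invocation-Id": invocation_id or f"job-{uuid.uuid4().hex}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")

request = build_async_request(TRIGGER_URL, invocation_id="job-demo-001")
print(request.get_method(), request.full_url)
for name, value in request.header_items():
    print(f"{name}: {value}")

# To actually submit the task, set TRIGGER_URL and uncomment the lines below.
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status)
```

GET is used because the HTTP trigger in the s.yaml example allows only that method. Reusing the same invocation ID for a retried submission is one way to take advantage of task deduplication, assuming the service treats the ID as the deduplication key.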

Additional information

For more information about best practices for GPU functions, see "Use cases for serverless GPU applications".