We recommend that you use the official Elastic Algorithm Service (EAS) SDKs provided by Machine Learning Platform for AI (PAI) to call services deployed based on models. The SDKs reduce the time required to define the call logic and improve call stability. This topic describes EAS SDK for Python and provides demos that show how to use the SDK to call services whose inputs and outputs are of commonly used types.

Installation command

pip install -U eas-prediction --user
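
You can verify the installation with a quick import check. A minimal sketch:

# The import succeeds only if the eas-prediction package is installed.
from eas_prediction import PredictClient
print('eas-prediction is installed')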

Methods

PredictClient

PredictClient(endpoint, service_name)
  • Description: Creates a client object of the PredictClient class.
  • Parameters:
    • endpoint: the endpoint of the server.

      To call a service in regular mode, set this parameter to the endpoint of the default gateway. Example: 182848887922***.cn-shanghai.pai-eas.aliyuncs.com.

      If you want to use a Virtual Private Cloud (VPC) direct connection, set this parameter to the common endpoint of the current region. For example, if the current region is China (Shanghai), set this parameter to pai-eas-vpc.cn-shanghai.aliyuncs.com.

    • service_name: the name of the service.
set_endpoint(endpoint)
  • Description: Specifies the endpoint of the server.
  • Parameter: endpoint: the endpoint of the server.

    To call a service in regular mode, set this parameter to the endpoint of the default gateway. Example: 182848887922***.cn-shanghai.pai-eas.aliyuncs.com.

    If you want to use a Virtual Private Cloud (VPC) direct connection, set this parameter to the common endpoint of the current region. For example, if the current region is China (Shanghai), set this parameter to pai-eas-vpc.cn-shanghai.aliyuncs.com.

set_service_name(service_name)
  • Description: Specifies the name of the service.
  • Parameter: service_name: the name of the service.
set_endpoint_type(endpoint_type)
  • Description: Specifies the gateway type of the server.
  • Parameter: endpoint_type: the gateway type to be used. The following gateway types are supported:
    • ENDPOINT_TYPE_GATEWAY: the default gateway.
    • ENDPOINT_TYPE_DIRECT: a VPC direct connection channel.

    If you do not call this method, the default gateway is used to access the service.
set_token(token)
  • Description: Specifies the token for service access.
  • Parameter: token: the token for service access.
set_retry_count(max_retry_count)
  • Description: Sets the maximum number of retries allowed after a request failure.
  • Parameter: max_retry_count: the maximum number of retries allowed after a request failure. Default value: 5.
    Notice: The client must resend requests when process errors occur on the server, when server errors occur, or when persistent connections to gateways are closed. Therefore, do not set this parameter to 0.
set_max_connection_count(max_connection_count)
  • Description: Sets the maximum number of persistent connections allowed in the connection pool of the client. To achieve better performance, the client establishes persistent connections to the server and stores the persistent connections in the connection pool. Each time you initiate a request, the client uses an idle connection in the connection pool to access the required service.
  • Parameter: max_connection_count: the maximum number of persistent connections allowed in the connection pool. Default value: 100.
set_timeout(timeout)
  • Description: Sets the timeout period of a request.
  • Parameter: timeout: the timeout period of a request. Unit: milliseconds. Default value: 5000.
init()
  • Description: Initializes the client object. The parameters that are set by the preceding methods take effect only after you call the init() method.
predict(request)
  • Description: Sends a prediction request to the online prediction service.
  • Parameter: request: the request object to be sent. Requests of various types are supported, such as StringRequest and TFRequest objects.
  • Return value: the response to the prediction request.
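
The following is a minimal sketch of how the PredictClient methods fit together. The endpoint, service name, and token are placeholders that you must replace with the values of your own service:

#!/usr/bin/env python

from eas_prediction import PredictClient
from eas_prediction import StringRequest

client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'your_service_name')
client.set_token('<your-service-token>')
client.set_retry_count(5)              # maximum number of retries after a request failure
client.set_max_connection_count(100)   # maximum number of persistent connections in the pool
client.set_timeout(5000)               # request timeout in milliseconds
client.init()                          # the preceding settings take effect only after init()

response = client.predict(StringRequest('[{"fea1": 1, "fea2": 2}]'))
print(response)
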
StringRequest

StringRequest(request_data)
  • Description: Creates an object of the StringRequest class.
  • Parameter: request_data: the request string to be sent.
StringResponse

to_string()
  • Description: Converts a response of the StringResponse class to a string.
  • Return value: the response body of the request.
TFRequest

TFRequest(signature_name)
  • Description: Creates an object of the TFRequest class.
  • Parameter: signature_name: the signature name of the model of the service to be called.
add_feed(self, input_name, shape, data_type, content)
  • Description: Specifies the input tensor of the TensorFlow model of the online prediction service to be called.
  • Parameters:
    • input_name: the alias of the input tensor.
    • shape: the shape of the input tensor.
    • data_type: the data type of the input tensor. The following data types are supported:
      • TFRequest.DT_FLOAT
      • TFRequest.DT_DOUBLE
      • TFRequest.DT_INT8
      • TFRequest.DT_INT16
      • TFRequest.DT_INT32
      • TFRequest.DT_INT64
      • TFRequest.DT_STRING
      • TFRequest.DT_BOOL
    • content: the data of the input tensor. Specify the value in the form of a one-dimensional array.
add_fetch(self, output_name)
  • Description: Specifies the alias of the output tensor to export from the TensorFlow model.
  • Parameter: output_name: the alias of the output tensor to be exported.

    If the TensorFlow model is in the SavedModel format, this parameter is optional. If this parameter is not specified, all output tensors are exported.

    If the TensorFlow model is a frozen model, this parameter is required.

to_string()
  • Description: Serializes the protocol buffer (PB) object into a string. The PB object is created by using the TFRequest class and is used to transmit requests.
  • Return value: the string that is obtained after the TFRequest-based serialization is complete.
TFResponse

get_tensor_shape(output_name)
  • Description: Queries the shape of the output tensor identified by the specified alias.
  • Parameter: output_name: the alias of the output tensor whose shape you want to query.
  • Return value: the shape of the output tensor.
get_values(output_name)
  • Description: Queries the data of the specified output tensor.
  • Parameter: output_name: the alias of the output tensor whose data you want to query.
  • Return value: a one-dimensional array that contains the tensor data. The data type of the array matches the data type of the output tensor. You can call this method together with the get_tensor_shape() method to restore the data to a multi-dimensional array, as shown in the following sketch.
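
For example, the flat data can be restored to its original shape. This is a minimal sketch; it assumes that numpy is installed, that resp is a TFResponse returned by predict(), and that 'scores' is the alias of an output tensor:

import numpy as np

flat = resp.get_values('scores')         # one-dimensional tensor data
shape = resp.get_tensor_shape('scores')  # for example, [1, 10]
tensor = np.array(flat).reshape(shape)   # restore the multi-dimensional layout
print(tensor)
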
TorchRequest

TorchRequest()
  • Description: Creates an object of the TorchRequest class.
add_feed(self, index, shape, data_type, content)
  • Description: Specifies the input tensor of the PyTorch model of the online prediction service to be called.
  • Parameters:
    • index: the index of the input tensor.
    • shape: the shape of the input tensor.
    • data_type: the data type of the input tensor. The following data types are supported:
      • TorchRequest.DT_FLOAT
      • TorchRequest.DT_DOUBLE
      • TorchRequest.DT_INT8
      • TorchRequest.DT_INT16
      • TorchRequest.DT_INT32
      • TorchRequest.DT_INT64
      • TorchRequest.DT_STRING
      • TorchRequest.DT_BOOL
    • content: the data of the input tensor. Specify the value in the form of a one-dimensional array.
add_fetch(self, output_index)
  • Description: Specifies the index of the output tensor to export from the PyTorch model. This method is optional. If you do not call this method, all output tensors are exported.
  • Parameter: output_index: the index of the output tensor to be exported.
to_string()
  • Description: Serializes the PB object into a string. The PB object is created by using the TorchRequest class and is used to transmit requests.
  • Return value: the string that is obtained after the TorchRequest-based serialization is complete.
TorchResponse

get_tensor_shape(output_index)
  • Description: Queries the shape of the output tensor identified by the specified index.
  • Parameter: output_index: the index of the output tensor whose shape you want to query.
  • Return value: the shape of the output tensor identified by the specified index.
get_values(output_index)
  • Description: Queries the data of the specified output tensor.
  • Parameter: output_index: the index of the output tensor whose data you want to query.
  • Return value: a one-dimensional array that contains the tensor data. The data type of the array matches the data type of the output tensor. You can call this method together with the get_tensor_shape() method to restore the data to a multi-dimensional array, as shown in the following sketch.
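
The pattern is the same as for TFResponse, except that output tensors are addressed by index instead of alias. A minimal sketch, assuming that numpy is installed and that resp is a TorchResponse returned by predict():

import numpy as np

flat = resp.get_values(0)          # data of the first output tensor
shape = resp.get_tensor_shape(0)   # shape of the first output tensor
print(np.array(flat).reshape(shape))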

Demos

  • Input and output as strings
    If you deploy models as services by using custom processors, strings are commonly used to call the services. This applies, for example, to a service deployed based on a Predictive Model Markup Language (PMML) model. The following demo is for your reference:
    #!/usr/bin/env python
    
    from eas_prediction import PredictClient
    from eas_prediction import StringRequest
    
    if __name__ == '__main__':
        # Replace the endpoint, service name, and token with the values of your own service.
        client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'scorecard_pmml_example')
        client.set_token('YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****')
        client.init()
    
        request = StringRequest('[{"fea1": 1, "fea2": 2}]')
        for x in range(0, 1000000):
            resp = client.predict(request)
            print(resp)
  • Call a TensorFlow model
    If you use TensorFlow to deploy models as services, you must use the TFRequest and TFResponse classes to call the services. The following demo is for your reference:
    #!/usr/bin/env python
    
    from eas_prediction import PredictClient
    from eas_prediction import TFRequest
    
    if __name__ == '__main__':
        client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'mnist_saved_model_example')
        client.set_token('YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1YmU0N2FjMTAy****')
        client.init()
    
        # Construct a request with the model's signature name, then feed the input tensor.
        req = TFRequest('predict_images')
        req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
        for x in range(0, 1000000):
            resp = client.predict(req)
            print(resp)
  • Use a VPC direct connection channel to call a service
    You can use a VPC direct connection channel to access only the services that are deployed in the dedicated resource group for EAS. In addition, to use the channel, the dedicated resource group for EAS and the specified vSwitch must be connected to the VPC. For more information, see Dedicated resource groups and VPC direct connection channel. Compared with the regular mode, this mode contains an additional line of code: client.set_endpoint_type(ENDPOINT_TYPE_DIRECT). You can use this mode in high-concurrency and heavy-traffic scenarios. The following demo is for your reference:
    #!/usr/bin/env python
    
    from eas_prediction import PredictClient
    from eas_prediction import TFRequest
    from eas_prediction import ENDPOINT_TYPE_DIRECT
    
    if __name__ == '__main__':
        client = PredictClient('http://pai-eas-vpc.cn-hangzhou.aliyuncs.com', 'mnist_saved_model_example')
        client.set_token('M2FhNjJlZDBmMzBmMzE4NjFiNzZhMmUxY2IxZjkyMDczNzAzYjFi****')
        client.set_endpoint_type(ENDPOINT_TYPE_DIRECT)  # Use the VPC direct connection channel.
        client.init()
    
        request = TFRequest('predict_images')
        request.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
        for x in range(0, 1000000):
            resp = client.predict(request)
            print(resp)
  • Call a PyTorch model
    If you use PyTorch to deploy models as services, you must use the TorchRequest and TorchResponse classes to call the services. The following demo is for your reference:
    #!/usr/bin/env python
    
    import time
    
    from eas_prediction import PredictClient
    from eas_prediction import TorchRequest
    
    if __name__ == '__main__':
        client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'pytorch_gpu_wl')
        client.init()
    
        # Feed the first input tensor (index 0) with a 1x3x224x224 float tensor.
        req = TorchRequest()
        req.add_feed(0, [1, 3, 224, 224], TorchRequest.DT_FLOAT, [1] * 150528)
        # To export only the first output tensor, uncomment the following line:
        # req.add_fetch(0)
    
        st = time.time()
        timer = 0
        for x in range(0, 10):
            resp = client.predict(req)
            timer += (time.time() - st)
            st = time.time()
            print(resp.get_tensor_shape(0))
            # print(resp)
        print("average response time: %s s" % (timer / 10))
  • Call a Blade processor-based model
    If you use Blade processors to deploy models as services, you must use the BladeRequest and BladeResponse classes to call the services. The following demo is for your reference:
    #!/usr/bin/env python
    
    import time
    
    from eas_prediction import PredictClient
    from eas_prediction import BladeRequest
    
    if __name__ == '__main__':
        client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'nlp_model_example')
        client.init()
    
        req = BladeRequest()
    
        req.add_feed('input_data', 1, [1, 360, 128], BladeRequest.DT_FLOAT, [0.8] * 85680)
        req.add_feed('input_length', 1, [1], BladeRequest.DT_INT32, [187])
        req.add_feed('start_token', 1, [1], BladeRequest.DT_INT32, [104])
        req.add_fetch('output', BladeRequest.DT_FLOAT)
        st = time.time()
        timer = 0
        for x in range(0, 10):
            resp = client.predict(req)
            timer += (time.time() - st)
            st = time.time()
            # print(resp)
            # print(resp.get_values('output'))
            print(resp.get_tensor_shape('output'))
        print("average response time: %s s" % (timer / 10) )
  • Call a Blade processor-based model that is compatible with default TensorFlow methods
    You can use the TFRequest and TFResponse classes to call a Blade processor-based model that is compatible with default TensorFlow methods supported by EAS. The following demo is for your reference:
    #!/usr/bin/env python
    
    import time
    
    from eas_prediction import PredictClient
    from eas_prediction.blade_tf_request import TFRequest  # Import the Blade-compatible TFRequest class.
    
    if __name__ == '__main__':
        client = PredictClient('http://182848887922****.cn-shanghai.pai-eas.aliyuncs.com', 'nlp_model_example')
        client.init()
    
        req = TFRequest(signature_name='predict_words')
    
        req.add_feed('input_data', [1, 360, 128], TFRequest.DT_FLOAT, [0.8] * 85680)
        req.add_feed('input_length', [1], TFRequest.DT_INT32, [187])
        req.add_feed('start_token', [1], TFRequest.DT_INT32, [104])
        req.add_fetch('output')
        st = time.time()
        timer = 0
        for x in range(0, 10):
            resp = client.predict(req)
            timer += (time.time() - st)
            st = time.time()
            # print(resp)
            # print(resp.get_values('output'))
            print(resp.get_tensor_shape('output'))
        print("average response time: %s s" % (timer / 10) )