Invoke EAS AI Endpoints via Go SDK - Platform For AI - Alibaba Cloud

This guide explains the APIs of the official SDK for Go and provides complete code examples for common input and output formats.

Note

For information about the SDK's use cases and invocation principles, see Service invocation SDKs.

Prerequisites

When you call an inference service using the SDK for Go, the Go package manager automatically downloads the SDK source code from GitHub during compilation. You do not need to install the SDK beforehand. If you need to customize the invocation logic, you can download the SDK for Go source code and modify it locally.

To import the SDK, use the following code:

import (
    "github.com/pai-eas/eas-golang-sdk/eas"
)

Quick start

Select the Request class that corresponds to your model's input data format. The following example shows a minimal, end-to-end invocation using a string request.

package main

import (
    "fmt"
    "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    client := eas.NewPredictClient("182848887922****.cn-shanghai.pai-eas.aliyuncs.com", "my_service")
    client.SetToken("YOUR_SERVICE_TOKEN")
    client.Init()

    resp, err := client.StringPredict("[{}]")
    if err != nil {
        fmt.Printf("failed to predict: %v\n", err.Error())
    } else {
        fmt.Printf("%v\n", resp)
    }
}

API reference

The SDK for Go provides the following classes, grouped by purpose:

Group	Description
Main client	PredictClient: Configures service information such as endpoint, service name, and token, sends requests, and receives responses.
Input and output	TFRequest / TFResponse: Wraps requests and responses for TensorFlow models. TorchRequest / TorchResponse: Wraps requests and responses for PyTorch models. For string-based scenarios, you do not need dedicated Request or Response classes. Pass a string directly to the `StringPredict()` method and receive a string response.
Queuing service	QueueClient: An asynchronous queue client used to send data, subscribe to data pushes, and query queue status. types.Watcher: A watcher, created by `QueueClient.Watch()`, that receives pushed data.

PredictClient

The main client class used to configure service information, send requests, and receive prediction results.

Method	Description
`NewPredictClient(endpoint string, serviceName string) *PredictClient`	Creates a PredictClient object. Parameters: endpoint: Required. The endpoint address of the server. For a standard service, set this parameter to the default gateway endpoint. serviceName: Required. The name of the service. Returns a `PredictClient` object.
`SetEndpoint(endpointName string)`	Sets the service endpoint. endpointName specifies the endpoint address of the server. For a standard service, set this parameter to the default gateway endpoint.
`SetServiceName(serviceName string)`	Sets the name of the service to call. serviceName specifies the name of the service to call.
`SetEndpointType(endpointType string)`	Sets the gateway type for the server. endpointType specifies the gateway type. The system supports the following gateway types: "DEFAULT": The default gateway. This type is used if no gateway type is specified. "DIRECT": Accesses the service through a high-speed direct connection channel.
`SetToken(token string)`	Sets the token for service access. token specifies the authentication token used to access the service.
`SetHttpTransport(transport *http.Transport)`	Sets the Transport property of the HTTP client. transport specifies the Transport object used when sending HTTP requests.
`SetRetryCount(max_retry_count int)`	Sets the number of retries for failed requests. max_retry_count specifies the number of times to retry a failed request. The default value is 5. Important: Requests are automatically retried if they fail because of a server process error, a server error, or a dropped persistent gateway connection. Therefore, do not set this parameter to 0.
`SetTimeout(timeout int)`	Sets the request timeout. timeout specifies the request timeout duration in milliseconds (ms). The default value is 5000.
`Init()`	Initializes the PredictClient object. After setting parameters with the preceding methods, you must call `Init()` to apply the changes.
`Predict(request Request) (Response, error)`	Submits a prediction request to an inference service. Parameter: request: An object that implements the Request interface, such as `StringRequest`, `TFRequest`, or `TorchRequest`. Returns: An object that implements the Response interface, such as `StringResponse`, `TFResponse`, or `TorchResponse`, and an error that is non-nil if the request fails.
`StringPredict(request string) (string, error)`	Submits a string prediction request to an inference service. request specifies the request string to be sent. Returns a string containing the service response and an error that is non-nil if the request fails.
`TorchPredict(request TorchRequest) (TorchResponse, error)`	Submits a PyTorch prediction request to an inference service. request specifies an object of the TorchRequest class. Returns the corresponding TorchResponse object and an error that is non-nil if the request fails.
`TFPredict(request TFRequest) (TFResponse, error)`	Submits a TensorFlow prediction request to an inference service. request specifies an object of the TFRequest class. Returns the corresponding TFResponse object and an error that is non-nil if the request fails.

TFRequest

Builds input data for TensorFlow models.

Method	Description
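`TFRequest{}` with `SetSignatureName(signatureName string)`	Creates a `TFRequest` object and sets its signature. As the TensorFlow example in this guide shows, declare the object as `eas.TFRequest{}` and then call `SetSignatureName()`. signatureName specifies the Signature Name of the requested model.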
`AddFeed(?)(inputName string, shape []int64, content []?)`	Specifies an input tensor for the TensorFlow service request. Parameters: inputName: The alias of the input tensor. shape: The TensorShape of the input tensor. content: The content of the input tensor, flattened into a one-dimensional array. Supported types include INT32, INT64, FLOAT32, FLOAT64, STRING, and BOOL. The `(?)` in the method name stands for the data type, for example, `AddFeedInt32()` or `AddFeedFloat32()`. For other data types, you must construct the request in PB format by referring to the source code.
`AddFetch(outputName string)`	Specifies the alias of an output tensor to retrieve. outputName specifies the alias of the output tensor to retrieve. For a SavedModel model, this parameter is optional. If this parameter is not set, all outputs are returned. For a frozen model, this parameter is required.
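
The `content` argument of the `AddFeed` methods must already be flattened. The following is a minimal sketch of flattening a two-dimensional input and deriving its shape with the standard library only. It assumes row-major order, which this guide does not state explicitly, and `flatten2D` is a hypothetical helper, not an SDK function.

```go
package main

import "fmt"

// flatten2D copies a rectangular [][]float32 into a one-dimensional slice
// in row-major order (an assumption) and returns the matching shape.
// Hypothetical helper, not part of the SDK.
func flatten2D(rows [][]float32) ([]float32, []int64, error) {
	if len(rows) == 0 {
		return nil, []int64{0, 0}, nil
	}
	width := len(rows[0])
	flat := make([]float32, 0, len(rows)*width)
	for i, row := range rows {
		if len(row) != width {
			return nil, nil, fmt.Errorf("row %d has %d values, want %d", i, len(row), width)
		}
		flat = append(flat, row...)
	}
	return flat, []int64{int64(len(rows)), int64(width)}, nil
}

func main() {
	flat, shape, err := flatten2D([][]float32{{1, 2, 3}, {4, 5, 6}})
	if err != nil {
		fmt.Println("flatten failed:", err)
		return
	}
	fmt.Println(shape, flat) // [2 3] [1 2 3 4 5 6]
}
```

The returned `shape` and `flat` values correspond to the `shape` and `content` parameters of a call such as `AddFeedFloat32()`.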

TFResponse

Parses output data from TensorFlow models.

Method	Description
`GetTensorShape(outputName string) []int64`	Gets the TensorShape of the output tensor for the specified alias. outputName specifies the alias of the tensor whose output shape you want to retrieve. Returns: The tensor shape, with each dimension represented in an array.
`Get(?)Val(outputName string) [](?)`	Gets the data vector of the output tensor. The output is a one-dimensional array. Use this method with the `GetTensorShape()` method to reconstruct the multi-dimensional tensor. Supported types include FLOAT, DOUBLE, INT, INT64, STRING, and BOOL. The method name indicates the data type, for example, `GetFloatVal()`. outputName specifies the alias of the tensor whose output data you want to retrieve. Returns: A one-dimensional array that contains the flattened data of the output tensor.
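
The table above says to combine the flat output with `GetTensorShape()` to rebuild the tensor. The following is a minimal sketch of that step for a two-dimensional output, written with the standard library only. It assumes row-major order, and `reshape2D` is a hypothetical helper, not an SDK function.

```go
package main

import "fmt"

// reshape2D splits a flat slice into rows according to a two-dimensional
// shape. The rows share memory with flat; no data is copied.
// Hypothetical helper, not part of the SDK.
func reshape2D(flat []float32, shape []int64) ([][]float32, error) {
	if len(shape) != 2 {
		return nil, fmt.Errorf("expected a 2-D shape, got %d dimensions", len(shape))
	}
	rows, cols := int(shape[0]), int(shape[1])
	if rows < 0 || cols < 0 || rows*cols != len(flat) {
		return nil, fmt.Errorf("shape %v does not match %d values", shape, len(flat))
	}
	out := make([][]float32, rows)
	for r := range out {
		out[r] = flat[r*cols : (r+1)*cols]
	}
	return out, nil
}

func main() {
	// Stand-ins for the values returned by GetFloatVal() and GetTensorShape().
	flat := []float32{0.1, 0.9, 0.8, 0.2}
	shape := []int64{2, 2}

	tensor, err := reshape2D(flat, shape)
	if err != nil {
		fmt.Println("reshape failed:", err)
		return
	}
	fmt.Println(tensor) // [[0.1 0.9] [0.8 0.2]]
}
```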

TorchRequest

Builds input data for PyTorch models.

Method	Description
`TorchRequest{}`	Creates a `TorchRequest` object. As the PyTorch example in this guide shows, declare the object as `eas.TorchRequest{}`.
`AddFeed(?)(index int, shape []int64, content []?)`	Specifies an input tensor for the PyTorch service request. Parameters: index: The index of the input tensor. shape: The TensorShape of the input tensor. content: The content of the input tensor, flattened into a one-dimensional array. Supported types include INT32, INT64, FLOAT32, and FLOAT64. The `(?)` in the method name stands for the data type, for example, `AddFeedInt32()` or `AddFeedFloat32()`. For other data types, you must construct the request in PB format by referring to the source code.
`AddFetch(outputIndex int)`	Specifies by index an output tensor to retrieve. This method is optional. If you do not call this method, all outputs are returned. outputIndex specifies the index of the output tensor.
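
The length of the flattened `content` must equal the product of the dimensions in `shape`. For example, a `[1, 3, 224, 224]` tensor holds 1 × 3 × 224 × 224 = 150528 values, which is the size used in the PyTorch example in this guide. The following sketch checks this before a request is built. `numElements` and `checkFeed` are hypothetical helpers, not SDK functions.

```go
package main

import "fmt"

// numElements returns the product of all dimensions in shape.
func numElements(shape []int64) int64 {
	n := int64(1)
	for _, d := range shape {
		n *= d
	}
	return n
}

// checkFeed returns an error when the flattened content length does not
// match the number of elements implied by shape.
func checkFeed(shape []int64, contentLen int) error {
	if want := numElements(shape); int64(contentLen) != want {
		return fmt.Errorf("shape %v needs %d values, got %d", shape, want, contentLen)
	}
	return nil
}

func main() {
	shape := []int64{1, 3, 224, 224}
	content := make([]float32, numElements(shape))
	fmt.Println(len(content), checkFeed(shape, len(content))) // 150528 <nil>
}
```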

TorchResponse

Parses output data from PyTorch models.

Method	Description
`GetTensorShape(outputIndex int) []int64`	Gets the TensorShape of the output tensor for the specified index. outputIndex specifies the index of the output tensor. Returns: The tensor shape, with each dimension represented in an array.
`Get(?)Val(outputIndex int) [](?)`	Gets the data vector of the output tensor. The output is a one-dimensional array. Use this method with the `GetTensorShape()` method to reconstruct the multi-dimensional tensor. Supported types include FLOAT, DOUBLE, INT, and INT64. The method name indicates the data type, for example, `GetFloatVal()`. outputIndex specifies the index of the tensor whose output data you want to retrieve. Returns: A one-dimensional array that contains the flattened data of the output tensor.
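
When only a few elements of a multi-dimensional output are needed, it can be simpler to index into the flat array than to rebuild the whole tensor. The following sketch computes the flat offset of a multi-dimensional index. It assumes row-major order, which this guide does not state explicitly, and `flatIndex` is a hypothetical helper, not an SDK function.

```go
package main

import "fmt"

// flatIndex maps a multi-dimensional index to its offset in a flattened,
// row-major array with the given shape. Hypothetical helper, not part of the SDK.
func flatIndex(shape, idx []int64) (int64, error) {
	if len(shape) != len(idx) {
		return 0, fmt.Errorf("index has %d dimensions, shape has %d", len(idx), len(shape))
	}
	var offset int64
	for d := range shape {
		if idx[d] < 0 || idx[d] >= shape[d] {
			return 0, fmt.Errorf("index %d is out of range for dimension %d of size %d", idx[d], d, shape[d])
		}
		// Horner-style accumulation: each step scales by the current dimension.
		offset = offset*shape[d] + idx[d]
	}
	return offset, nil
}

func main() {
	shape := []int64{1, 3, 224, 224}
	off, err := flatIndex(shape, []int64{0, 2, 10, 5})
	if err != nil {
		fmt.Println("bad index:", err)
		return
	}
	fmt.Println(off) // 102597 = 2*224*224 + 10*224 + 5
}
```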

QueueClient

Interacts with the EAS queuing service to produce, consume, and manage data.

Method	Description
`NewQueueClient(endpoint, queueName, token string) (*QueueClient, error)`	Creates a `QueueClient` object. Parameters: endpoint: The endpoint address of the server. queueName: The name of the queuing service. token: The token for the queuing service. Returns a `QueueClient` object and an error, if any.
`Truncate(ctx context.Context, index uint64) error`	Truncates data in the queue before the specified index, retaining only the data from the specified index onwards. Parameters: ctx: The context for the current operation. index: The index of the data in the queue to be truncated.
`Put(ctx context.Context, data []byte, tags types.Tags) (index uint64, requestId string, err error)`	Writes a record to the queue. Parameters: ctx: The context for the current operation. data: The content of the data to be written to the queue. tags: The tags to attach to the data record. Returns: index: The index of the current data record in the queue. You can use this index to query data from the queue. requestId: The auto-generated request ID for the data record. This ID is a special tag that you can also use to query the record. err: The error, if any.
`GetByIndex(ctx context.Context, index uint64) (dfs []types.DataFrame, err error)`	Gets a record from the queue by its index. The record is automatically deleted from the queue upon retrieval. Parameters: ctx: The context for the current operation. index: The index of the data to be queried from the queue. Returns: dfs: The query result, which is encapsulated as DataFrame objects. err: The error, if any.
`GetByRequestId(ctx context.Context, requestId string) (dfs []types.DataFrame, err error)`	Gets a record from the queue by its request ID. The record is automatically deleted from the queue upon retrieval. Parameters: ctx: The context for the current operation. requestId: The request ID of the data to be queried from the queue. Returns: dfs: The query result, which is encapsulated as DataFrame objects. err: The error, if any.
`Get(ctx context.Context, index uint64, length int, timeout time.Duration, autoDelete bool, tags types.Tags) (dfs []types.DataFrame, err error)`	Queries data from the queue based on specified conditions. The `GetByIndex()` and `GetByRequestId()` methods are simple wrappers for the `Get()` method. Parameters: ctx: The context for the current operation. index: The starting index of the data to be queried. length: The number of data records to be queried. A maximum of length records starting from index (inclusive) are returned. timeout: The waiting time for the query. If length records become available within this period, the call returns immediately. Otherwise, the call returns after the timeout duration is reached. autoDelete: A boolean that specifies whether to automatically delete the queried data from the queue. If `false`, the data can be queried repeatedly. You can manually delete the data by calling the `Del()` method. tags: Queries data that contains specified tags. This parameter is of the `map[string]string` type. The system iterates through `length` records starting from the specified `index` and returns the data that contains the specified tags. Returns: dfs: The query result, which is encapsulated as DataFrame objects. err: The error, if any.
`Del(ctx context.Context, indexes ...uint64)`	Deletes data with the specified indexes from the queue. Parameters: ctx: The context for the current operation. indexes: A list of indexes for the data to be deleted from the queue.
`Attributes() (attrs types.Attributes, err error)`	Gets the attribute information of the queue, including the total length and the current data length. attrs: The attribute information of the queue, which is of the `map[string]string` type.
`Watch(ctx context.Context, index, window uint64, indexOnly bool, autocommit bool) (watcher types.Watcher, err error)`	Subscribes to data in the queue. The queuing service pushes data to the client based on specified conditions. Parameters: ctx: The context for the current operation. index: The starting index for the subscription. window: The subscription window size. This is the maximum number of records that the queuing service can push to a single client instance at a time. Note: The server does not push new records until the previous ones are committed. If you commit N records, the queuing service pushes N more records. This ensures that the client does not process more records than the window size at any given time and helps limit concurrency. indexOnly: A boolean that specifies whether to push only index values. autocommit: A boolean that specifies whether to automatically commit a record after it is pushed. We recommend that you set this to `false`. After you receive and process a record, commit it manually. If an instance fails before committing, the queuing service redistributes the uncommitted data to other instances for processing. Returns: A watcher object that can be used to read the pushed data.
`Commit(ctx context.Context, indexes ...uint64) error`	Commits records with the specified indexes. Note: Committing indicates that the data pushed by the queuing service has been processed. The service can then remove the data from the queue and will not push it to other instances. Parameters: ctx: The context for the current operation. indexes: A list of indexes for the data to be committed in the queue.
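
The interaction between the `window` parameter of `Watch()` and `Commit()` can be illustrated with a toy, in-memory model. The following sketch only mimics the behavior described in the table above: at most `window` records are outstanding, and each commit frees one slot. It is not the queuing service's implementation, and the `windowSim` type is hypothetical.

```go
package main

import "fmt"

// windowSim is a toy model of the push window. It is not the queuing
// service's implementation; all names here are hypothetical.
type windowSim struct {
	next     uint64          // index of the next record to push
	last     uint64          // last index currently stored in the queue
	window   int             // maximum number of pushed, uncommitted records
	inFlight map[uint64]bool // records pushed but not yet committed
}

// push returns the indexes that can be delivered without exceeding the window.
func (w *windowSim) push() []uint64 {
	var out []uint64
	for len(w.inFlight) < w.window && w.next <= w.last {
		w.inFlight[w.next] = true
		out = append(out, w.next)
		w.next++
	}
	return out
}

// commit marks records as processed, which frees window slots.
func (w *windowSim) commit(indexes ...uint64) {
	for _, i := range indexes {
		delete(w.inFlight, i)
	}
}

func main() {
	w := &windowSim{next: 0, last: 9, window: 5, inFlight: map[uint64]bool{}}
	fmt.Println(w.push()) // [0 1 2 3 4]
	fmt.Println(w.push()) // [] because the window is full
	w.commit(0, 1)
	fmt.Println(w.push()) // [5 6] because two slots were freed
}
```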

types.Watcher

Reads pushed data from the subscription channel of the queuing service.

Method	Description
`FrameChan() <-chan types.DataFrame`	Returns a channel object. Data pushed from the server is written to this channel, from which you can loop to read the data. Returns: A channel object for reading pushed data.
`Close()`	Closes a watcher object and its backend data connection. Note: A client can have only one active watcher object at a time. You must close the current watcher object before creating a new one.

Examples

Synchronous inference by format

Choose the code sample based on your service's input and output types.

String

If you deploy a service with a custom processor, you typically invoke it using strings, for example, when calling a PMML model service. The following program shows a complete example.

package main

import (
        "fmt"
        "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    client := eas.NewPredictClient("182848887922****.cn-shanghai.pai-eas.aliyuncs.com", "scorecard_pmml_example")
    client.SetToken("YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****")
    client.Init()
    req := "[{\"fea1\": 1, \"fea2\": 2}]"
    for i := 0; i < 100; i++ {
        resp, err := client.StringPredict(req)
        if err != nil {
            fmt.Printf("failed to predict: %v\n", err.Error())
        } else {
            fmt.Printf("%v\n", resp)
        }
    }
}
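
Hand-escaped JSON strings such as the `req` value above become error-prone as the input grows. The following sketch builds an equivalent request string with `encoding/json` from the standard library. `buildRequest` is a hypothetical helper, and the feature names are the ones used in the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildRequest serializes a list of feature records into a JSON array string.
// Hypothetical helper, not part of the SDK.
func buildRequest(records []map[string]float64) (string, error) {
	b, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	req, err := buildRequest([]map[string]float64{{"fea1": 1, "fea2": 2}})
	if err != nil {
		fmt.Println("failed to build request:", err)
		return
	}
	fmt.Println(req) // [{"fea1":1,"fea2":2}]
}
```

The resulting string can be passed to `StringPredict()` in place of the literal. It differs from the literal only in whitespace.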

TensorFlow

For TensorFlow models, use TFRequest and TFResponse as the input and output data formats, respectively. The following program shows a complete example.

package main

import (
        "fmt"
        "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    client := eas.NewPredictClient("182848887922****.cn-shanghai.pai-eas.aliyuncs.com", "mnist_saved_model_example")
    client.SetToken("YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1YmU0N2FjMTAy****")
    client.Init()

    tfreq := eas.TFRequest{}
    tfreq.SetSignatureName("predict_images")
    tfreq.AddFeedFloat32("images", []int64{1, 784}, make([]float32, 784))

    for i := 0; i < 100; i++ {
        resp, err := client.TFPredict(tfreq)
        if err != nil {
            fmt.Printf("failed to predict: %v\n", err)
        } else {
            fmt.Printf("%v\n", resp)
        }
    }
}

PyTorch

For PyTorch models, use TorchRequest and TorchResponse as the input and output data formats, respectively. The following program shows a complete example.

package main

import (
        "fmt"
        "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    client := eas.NewPredictClient("182848887922****.cn-shanghai.pai-eas.aliyuncs.com", "pytorch_resnet_example")
    client.SetTimeout(500)
    client.SetToken("ZjdjZDg1NWVlMWI2NTU5YzJiMmY5ZmE5OTBmYzZkMjI0YjlmYWVl****")
    client.Init()
    req := eas.TorchRequest{}
    req.AddFeedFloat32(0, []int64{1, 3, 224, 224}, make([]float32, 150528))
    req.AddFetch(0)
    for i := 0; i < 10; i++ {
        resp, err := client.TorchPredict(req)
        if err != nil {
            fmt.Printf("failed to predict: %v\n", err)
        } else {
            fmt.Println(resp.GetTensorShape(0), resp.GetFloatVal(0))
        }
    }
}

VPC direct connection

A VPC direct connection lets you access services deployed in an Elastic Algorithm Service (EAS) dedicated resource group. Before you use this mode, connect the resource group to the specified vSwitch. For information about how to purchase an EAS dedicated resource group and configure network connectivity, see Use EAS resource groups and Configure EAS to access public or internal resources. Compared with a standard invocation, the code passes the VPC direct connection endpoint to the client and adds one line: client.SetEndpointType(eas.EndpointTypeDirect). This mode suits high-traffic, high-concurrency services. The following is a code sample:

package main

import (
        "fmt"
        "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    // Format of a VPC direct connection endpoint: {uid}.vpc.{region-id}.pai-eas.aliyuncs.com. You can find the endpoint on the Invocation Information tab of the service details page in the EAS console.
    client := eas.NewPredictClient("182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com", "scorecard_pmml_example")
    client.SetToken("YWFlMDYyZDNmNTc3M2I3MzMwYmY0MmYwM2Y2MTYxMTY4NzBkNzdj****")
    client.SetEndpointType(eas.EndpointTypeDirect)
    client.Init()
    req := "[{\"fea1\": 1, \"fea2\": 2}]"
    for i := 0; i < 100; i++ {
        resp, err := client.StringPredict(req)
        if err != nil {
            fmt.Printf("failed to predict: %v\n", err.Error())
        } else {
            fmt.Printf("%v\n", resp)
        }
    }
}
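
The two endpoint styles used in this guide differ only by the `vpc` label. The following sketch assembles both from a UID and a region ID. The VPC format comes from the comment in the sample above. The default gateway format is inferred from the other examples in this guide rather than stated, so copy the exact endpoint from the EAS console when in doubt. Both function names are hypothetical.

```go
package main

import "fmt"

// gatewayEndpoint builds an endpoint in the form used by the standard
// examples in this guide (inferred format, not an official definition).
func gatewayEndpoint(uid, regionID string) string {
	return fmt.Sprintf("%s.%s.pai-eas.aliyuncs.com", uid, regionID)
}

// vpcDirectEndpoint builds an endpoint in the documented form
// {uid}.vpc.{region-id}.pai-eas.aliyuncs.com.
func vpcDirectEndpoint(uid, regionID string) string {
	return fmt.Sprintf("%s.vpc.%s.pai-eas.aliyuncs.com", uid, regionID)
}

func main() {
	uid, region := "182848887922****", "cn-shanghai"
	fmt.Println(gatewayEndpoint(uid, region))
	fmt.Println(vpcDirectEndpoint(uid, region))
}
```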

Client connection parameters

You can set the connection parameters for the client by using the http.Transport property. The following example demonstrates how to configure these settings:

package main

import (
        "fmt"
        "net/http"
        "time"

        "github.com/pai-eas/eas-golang-sdk/eas"
)

func main() {
    // Format of a VPC direct connection endpoint: {uid}.vpc.{region-id}.pai-eas.aliyuncs.com. You can find the endpoint on the Invocation Information tab of the service details page in the EAS console.
    client := eas.NewPredictClient("182848887922****.vpc.cn-shanghai.pai-eas.aliyuncs.com", "network_test")
    client.SetToken("MDAwZDQ3NjE3OThhOTI4ODFmMjJiYzE0MDk1NWRkOGI1MmVhMGI0****")
    client.SetEndpointType(eas.EndpointTypeDirect)
    // Set the transport before calling Init() so that the settings take effect.
    client.SetHttpTransport(&http.Transport{
        MaxConnsPerHost:       300,
        TLSHandshakeTimeout:   100 * time.Millisecond,
        ResponseHeaderTimeout: 200 * time.Millisecond,
        ExpectContinueTimeout: 200 * time.Millisecond,
    })
    client.Init()

    // Placeholder request body; replace it with the input that your service expects.
    resp, err := client.StringPredict("[{}]")
    if err != nil {
        fmt.Printf("failed to predict: %v\n", err)
    } else {
        fmt.Printf("%v\n", resp)
    }
}

Queuing service

You can use QueueClient to send data to a queuing service, query data, query the status of the queuing service, and subscribe to data pushes from the queuing service. In this example, one goroutine sends data to the queuing service, while another uses a watcher to subscribe to and receive that data.

Note

When you deploy an asynchronous inference service in EAS, an input queue and an output queue are automatically generated. The addresses are typically in the following formats:

Input queue: <domain>/api/predict/<service_name>

Output queue: <domain>/api/predict/<service_name>/sink

Use <service_name> or <service_name>/sink to build the QueueClient based on your requirements.

    const (
        QueueEndpoint = "182848887922****.cn-shanghai.pai-eas.aliyuncs.com"
        // For example, if the EAS service name is test_qservice, the input queue name is test_qservice, and the output queue name is test_qservice/sink.
        QueueName     = "test_qservice"
        QueueToken    = "YmE3NDkyMzdiMzNmMGM3ZmE4ZmNjZDk0M2NiMDA3OTZmNzc1MTUx****"
    )
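    queue, err := eas.NewQueueClient(QueueEndpoint, QueueName, QueueToken)
    if err != nil {
        fmt.Printf("Failed to create the queue client: %v\n", err)
        return
    }

    // truncate all messages in the queue
    attrs, err := queue.Attributes()
    if err != nil {
        fmt.Printf("Failed to get the queue attributes: %v\n", err)
        return
    }
    if index, ok := attrs["stream.lastEntry"]; ok {
        idx, _ := strconv.ParseUint(index, 10, 64)
        queue.Truncate(context.Background(), idx+1)
    }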

    ctx, cancel := context.WithCancel(context.Background())

    // create a goroutine to send messages to the queue
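    go func() {
        // Reuse a single ticker; creating one per iteration leaks timers.
        ticker := time.NewTicker(time.Microsecond)
        defer ticker.Stop()
        i := 0
        for {
            select {
            case <-ticker.C:
                _, _, err := queue.Put(context.Background(), []byte(strconv.Itoa(i)), types.Tags{})
                if err != nil {
                    fmt.Printf("Failed to put data into the queue: %v\n", err)
                }
                i += 1
            case <-ctx.Done():
                // Use return here: a bare break would exit only the select, not the loop.
                return
            }
        }
    }()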

    // create a watcher to watch the messages from the queue
    watcher, err := queue.Watch(context.Background(), 0, 5, false, false)
    if err != nil {
        fmt.Printf("Failed to create a watcher to watch the queue: %v\n", err)
        return
    }

    // read messages from the queue and commit manually
    for i := 0; i < 100; i++ {
        df := <-watcher.FrameChan()
        err := queue.Commit(context.Background(), df.Index.Uint64())
        if err != nil {
            fmt.Printf("Failed to commit index: %v(%v)\n", df.Index, err)
        }
    }

    // everything is done, close the watcher
    watcher.Close()
    cancel()
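
The input and output queue names described in the note at the start of this section follow a fixed pattern, so they can be derived from the service name. The following sketch does this with the standard library only. `queueNames` and `queueAddress` are hypothetical helpers, not SDK functions, and the address format is the "typical" one quoted in the note.

```go
package main

import "fmt"

// queueNames returns the input and output queue names for an asynchronous
// inference service: <service_name> and <service_name>/sink.
func queueNames(serviceName string) (input, output string) {
	return serviceName, serviceName + "/sink"
}

// queueAddress returns <domain>/api/predict/<queue_name>.
func queueAddress(domain, queueName string) string {
	return domain + "/api/predict/" + queueName
}

func main() {
	in, out := queueNames("test_qservice")
	domain := "182848887922****.cn-shanghai.pai-eas.aliyuncs.com"
	fmt.Println(queueAddress(domain, in))
	fmt.Println(queueAddress(domain, out))
}
```

A producer would typically build its `QueueClient` with the input name, and a consumer of inference results with the output name.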

Troubleshooting

For troubleshooting common issues when calling services with the SDK for Go, including their causes and solutions, see the "Troubleshoot invocation exceptions" section in Service invocation SDKs.

For a complete list of service status codes, error messages, and recommended actions, see Appendix: Service status codes and common errors.
