Use the SelectObject method to run SQL statements against CSV and JSON files stored in OSS — without downloading the entire file first.
Prerequisites
Before you begin, make sure you have:
An OSS bucket in the
cn-hangzhouregion (or another supported region)The
oss:GetObjectpermission on the target object. For details, see Grant custom permissions to a RAM userAccess credentials configured as environment variables. For details, see Configure access credentials
Limitations
Only CSV and JSON files are supported as input.
The sample code uses the
cn-hangzhouregion with a public endpoint. To access OSS from another Alibaba Cloud service in the same region, use the internal endpoint instead. For a full list of regions and endpoints, see Regions and endpoints.
How it works
Each SelectObject request requires two configuration objects:
Input serialization (
InputSerializationSelect): Specifies the format of the source file — CSV or JSON — and how to parse it. For CSV files, setFileHeaderInfotoUseto treat the first row as a column header. For JSON files, setJSONTypetoDOCUMENTfor a single JSON object.Output serialization (
OutputSerializationSelect): Controls the format of the query result. SetOutputHeadertotrueto include column headers in the output.
The SQL expression uses ossobject as the table alias. For example, select * from ossobject limit 10 returns the first 10 rows.
The result is returned as a stream. Use io.ReadAll to read it fully into memory.
Method definition
func (c *Client) SelectObject(ctx context.Context, request *SelectObjectRequest, optFns ...func(*Options)) (*SelectObjectResult, error)Request parameters
| Parameter | Type | Description |
|---|---|---|
ctx | context.Context | The request context. Use this to set an overall timeout for the request. |
request | *SelectObjectRequest | The request parameters. For details, see SelectObjectRequest. |
optFns | ...func(*Options) | (Optional) Operation-level configuration options. For details, see Options. |
Response parameters
| Parameter | Type | Description |
|---|---|---|
result | *SelectObjectResult | The response. Valid only when err is nil. For details, see SelectObjectResult. |
err | error | The request status. Non-nil if the request fails. |
Sample code
Query a CSV file
The following sample runs select * from ossobject limit 10 against a CSV file. The FileHeaderInfo field is set to Use, which tells OSS to treat the first row as a header and include it when resolving column references.
package main
import (
"context"
"flag"
"io"
"log"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)
// Define global variables.
var (
region string // The storage region.
bucketName string // The bucket name.
objectName string // The object name.
)
// The init function is used to initialize command line parameters.
func init() {
flag.StringVar(®ion, "region", "", "The region in which the bucket is located.")
flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
flag.StringVar(&objectName, "object", "", "The name of the object.")
}
func main() {
// Parse command line parameters.
flag.Parse()
// Check whether the bucket name is empty.
if len(bucketName) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, bucket name required")
}
// Check whether the region is empty.
if len(region) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, region required")
}
// Check whether the object name is empty.
if len(objectName) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, object name required")
}
// Load the default configurations and set the credential provider and region.
cfg := oss.LoadDefaultConfig().
WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
WithRegion(region)
// Create an OSS client.
client := oss.NewClient(cfg)
// Create a request to select an object.
request := &oss.SelectObjectRequest{
Bucket: oss.Ptr(bucketName), // The bucket name.
Key: oss.Ptr(objectName), // The object name.
SelectRequest: &oss.SelectRequest{
Expression: oss.Ptr("select * from ossobject limit 10"), // Define an SQL expression to query the first 10 rows of data in the object.
InputSerializationSelect: oss.InputSerializationSelect{
CsvBodyInput: &oss.CSVSelectInput{
FileHeaderInfo: oss.Ptr("Use"),
},
},
OutputSerializationSelect: oss.OutputSerializationSelect{
OutputHeader: oss.Ptr(true),
},
},
}
// Execute the request to select the object.
result, err := client.SelectObject(context.TODO(), request)
if err != nil {
log.Fatalf("failed to select object %v", err)
}
content, err := io.ReadAll(result.Body)
if err != nil {
log.Fatalf("failed to read object %v", err)
}
// Print the result of selecting the object.
log.Printf("select object result:%#v\n", string(content))
}Query a JSON file
The following sample runs the same SQL expression against a JSON file. The JSONType field is set to DOCUMENT, which treats the file as a single JSON object.
package main
import (
"context"
"flag"
"io"
"log"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)
// Define global variables.
var (
region string // The storage region.
bucketName string // The bucket name.
objectName string // The object name.
)
// The init function is used to initialize command line parameters.
func init() {
flag.StringVar(®ion, "region", "", "The region in which the bucket is located.")
flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
flag.StringVar(&objectName, "object", "", "The name of the object.")
}
func main() {
// Parse command line parameters.
flag.Parse()
// Check whether the bucket name is empty.
if len(bucketName) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, bucket name required")
}
// Check whether the region is empty.
if len(region) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, region required")
}
// Check whether the object name is empty.
if len(objectName) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, object name required")
}
// Load the default configurations and set the credential provider and region.
cfg := oss.LoadDefaultConfig().
WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
WithRegion(region)
// Create an OSS client.
client := oss.NewClient(cfg)
// Create a request to select an object.
request := &oss.SelectObjectRequest{
Bucket: oss.Ptr(bucketName), // The bucket name.
Key: oss.Ptr(objectName), // The object name.
SelectRequest: &oss.SelectRequest{
Expression: oss.Ptr("select * from ossobject limit 10"), // Define an SQL expression to query the first 10 rows of data in the object.
InputSerializationSelect: oss.InputSerializationSelect{
JsonBodyInput: &oss.JSONSelectInput{
JSONType: oss.Ptr("DOCUMENT"),
},
},
OutputSerializationSelect: oss.OutputSerializationSelect{
OutputHeader: oss.Ptr(true),
},
},
}
// Send the request to select the object.
result, err := client.SelectObject(context.TODO(), request)
if err != nil {
log.Fatalf("failed to select object %v", err)
}
content, err := io.ReadAll(result.Body)
if err != nil {
log.Fatalf("failed to read object %v", err)
}
// Print the result of selecting the object.
log.Printf("select object result:%#v\n", string(content))
}Error handling
If SelectObject returns an error, inspect the error message for details. For the complete API reference, see SelectObject.
What's next
For the complete sample code, see GitHub sample.
For the full API reference, see SelectObject.
For an overview of the SelectObject feature, see Query files.