You can call this operation to execute SQL statements on an object and obtain the execution result.

Note
  • You must have the read permissions on the object.
  • HTTP status code 206 is returned if the operation is successfully performed. If the SQL statements are not valid or cannot match any objects, HTTP status code 400 is returned.
  • For more information about the functions of SelectObject, see SelectObject in OSS Developer Guide.

Request syntax

Request syntax for either CSV or JSON objects.
  • Request syntax for CSV objects
    POST /object? x-oss-process=csv/select HTTP/1.1 
    HOST: BucketName.oss-cn-hangzhou.aliyuncs.com 
    Date: time GMT
    Content-Length: ContentLength
    Content-MD5: MD5Value 
    Authorization: Signature
    <? xml  version="1.0"  encoding="UTF-8"? >
    <SelectRequest>    
        <Expression>Base64-encoded SQL statement. Example: c2VsZWN0IGNvdW50KCopIGZyb20gb3Nzb2JqZWN0IHdoZXJlIF80ID4gNDU=</Expression>
        <InputSerialization>
            <CompressionType>None|GZIP</CompressionType>
            <CSV>            
                <FileHeaderInfo>
                    NONE|IGNORE|USE
                </FileHeaderInfo>
                <RecordDelimiter>Base64-encoded character</RecordDelimiter>
                <FieldDelimiter>Base64-encoded character</FieldDelimiter>
                <QuoteCharacter>Base64-encoded character</QuoteCharacter>
                <CommentCharacter>Base64-encoded character</CommentCharacter>
                <Range>line-range=start-end|split-range=start-end</Range>
                <AllowQuotedRecordDelimiter>true|false</AllowQuotedRecordDelimiter>
            </CSV>   
            </InputSerialization>
            <OutputSerialization>
                 <CSV>
                 <RecordDelimiter>Base64-encoded character</RecordDelimiter>
                 <FieldDelimiter>Base64-encoded character</FieldDelimiter>
    
                </CSV>
                <KeepAllColumns>false|true</KeepAllColumns>
                <OutputRawData>false|true</OutputRawData>
                <EnablePayloadCrc>true</EnablePayloadCrc>
                <OutputHeader>false</OutputHeader>
           </OutputSerialization>
         <Options>
            <SkipPartialDataRecord>false</SkipPartialDataRecord>
            <MaxSkippedRecordsAllowed>
            max allowed number of records skipped
            </MaxSkippedRecordsAllowed>
        </Options>
    </SelectRequest>
  • Request syntax for JSON objects
    POST /object? x-oss-process=json/select HTTP/1.1 
    HOST: BucketName.oss-cn-hangzhou.aliyuncs.com 
    Date: time GMT
    Content-Length: ContentLength
    Content-MD5: MD5Value 
    Authorization: Signature
    <? xml  version="1.0"  encoding="UTF-8"? >
    <SelectRequest>    
        <Expression>
            Base64-encoded SQL statement. Example: c2VsZWN0IGNvdW50KCopIGZyb20gb3Nzb2JqZWN0IHdoZXJlIF80ID4gNDU=
        </Expression>
        <InputSerialization>
            <CompressionType>None|GZIP</CompressionType>
            <JSON>
                <Type>DOCUMENT|LINES</Type>
                <Range>
                line-range=start-end|split-range=start-end
                </Range>
                <ParseJsonNumberAsString> true|false
                </ParseJsonNumberAsString>
            </JSON>
        </InputSerialization>
        <OutputSerialization>
            <JSON>
                <RecordDelimiter>
                    Base64 of record delimiter
                </RecordDelimiter>
            </JSON>
            <OutputRawData>false|true</OutputRawData>
                     <EnablePayloadCrc>true</EnablePayloadCrc>
        </OutputSerialization>
        <Options>
            <SkipPartialDataRecord>
                false|true
            </SkipPartialDataRecord>
            <MaxSkippedRecordsAllowed>
                max allowed number of records skipped
               </MaxSkippedRecordsAllowed>
            </Options>
    </SelectRequest>

Request elements

Element Type Description
SelectRequest Container

Specifies the container that stores the SelectObject request.

Child node: Expression, InputSerialization, and OutputSerialization

Parent node: none

Expression String

Specifies the Base64-encoded SQL statement.

Child node: none

Parent node: SelectRequest

InputSerialization Container

Optional. This element specifies the input serialization parameters.

Child node: CompressionType, CSV, and JSON

Parent node: SelectRequest

OutputSerialization Container

Optional. This element specifies the output serialization parameters.

Child node: CSV, JSON, and OutputRawData

Parent node: SelectRequest

CSV(InputSerialization) Container

Optional. This element specifies the input serialization parameters when the CSV object is queried.

Child node: FileHeaderInfo, RecordDelimiter, FieldDelimiter, QuoteCharacter, CommentCharacter, and Range

Parent node: InputSerialization

CSV(OutputSerialization) Container

Optional. This element specifies the output serialization parameters when the CSV object is queried.

Child node: RecordDelimiter and FieldDelimiter

Parent node: OutputSerialization

JSON(InputSerialization) Container

Optional. This element specifies the input serialization parameters when the JSON object is queried.

Child node: Type, Range, and ParseJsonNumberAsString

JSON(OutputSerialization) Container

Optional. This element specifies the output serialization parameters when the JSON object is queried.

Child node: RecordDelimiter

Type Enumeration

Specifies the type of the input JSON object: DOCUMENT or LINES.

OutputRawData Boolean (false by default)

Optional. This element specifies the output data as raw data.

Child node: none

Parent node: OutputSerialization

Note
  • If OutputRawData is specified in the request, OSS returns data accordingly.
  • If OutputRawData is not specified in the request, OSS automatically selects a format and returns it in the response.
  • If OutputRawData is set to True and the SQL statement takes a long time to return data, the HTTP request may time out.
CompressionType Enumeration

Specifies the compression type of the object: None|GZIP.

Child node: none

Parent node: InputSerialization

FileHeaderInfo Enumeration

Optional. This element specifies the header information of the CSV object.

Valid values:

  • Use: indicates that the CSV object contains header information. You can use the column name in the CSV object as the column name in the SelectObject operation.
  • Ignore: indicates that the CSV object contains header information. However, you cannot use the column name in the CSV object as the column name in the SelectObject operation.
  • None: the default value, indicating that the CSV object does not contain header information.

Child node: none

Parent node: CSV (input)

RecordDelimiter String

Optional. This element specifies a Base64-encoded delimiter. Default value: \n. The value of this element before being encoded can be an ANSI value of up to two characters in length. For example, \n is used to indicate a line break in Java code.

Child node: none

Parent node: CSV (input and output) and JSON (output)

FieldDelimiter String

Optional. This element specifies the delimiter used to separate columns in the CSV object. The value of this element must be Base64-encoded. Default value: ,. The value of this element before being encoded must be a one-character ANSI value. For example, , is used to indicate a comma in Java code.

Child node: none

Parent node: CSV (input and output)

QuoteCharacter String

Optional. This element specifies a Base64-encoded quote character used in the CSV object. Default value: \". In a CSV object, line breaks and column delimiters enclosed in quotation marks are processed as normal characters. The value of this element before being encoded must be a one-character ANSI value. For example, \" is used to indicate a quote character in Java code.

Child node: none

Parent node: CSV (input)

CommentCharacter String Specifies the comment character used in the CSV object. The value of this element must be Base64-encoded. The default value is null.
Range String
Optional. This element specifies the query range. The following two query methods are supported:
Note Select Meta must be created for objects that are queried based on Range. For more information about Select Meta, see CreateSelectObjectMeta.
  • Query by row: line-range=start-end. For example, line-range=10-20 indicates that rows 10 to 20 will be scanned.
  • Query by split: split-range=start-end. For example, split-range=10-20 indicates that splits 10 to 20 will be scanned.

The start and end parameters of the preceding functions are both inclusive. The two preceding parameters have the same format as that of the range parameter in the range get function.

This parameter can only be used when the object is in CSV format or when the JSON Type is LINES.

Child node: none

Parent node: CSV (input) and JSON (output)

KeepAllColumns Boolean

Optional. A value of true indicates that all columns in the CSV object are included in the returned result. The default value is false. However, only columns included in the SELECT clause have values. The columns in the returned result are sorted in ascending order of the column numbers. Example:

select _5, _1 from ossobject.

If you set KeepAllColumns to true and six columns are included in the CSV object, the following result is returned for the preceding SELECT clause:

Value of 1st column,,,,Value of 5th column,\n

Child node: none

Parent node: OutputSerialization (CSV)

EnablePayloadCrc Boolean

Indicates that each frame includes a CRC-32 value for verification. The client can calculate the CRC-32 value of each payload and compare it with the included CRC-32 value to verify data integrity.

Child node: none

Parent node: OutputSerialization

Options Container

Specifies other optional parameters.

Type: container

Child node: SkipPartialDataRecord and MaxSkippedRecordsAllowed

Parent node: SelectRequest

OutputHeader Boolean

A value of true indicates that the header information of the CSV object is included in the beginning of the returned result.

Type: Boolean. Default value: false.

Child node: none

Parent node: OutputSerialization

SkipPartialDataRecord Boolean

Specifies whether to ignore rows without data. If this parameter is set to false, OSS will process the row data as null without reporting errors. If this parameter is set to true, rows without data are skipped. If the number of skipped rows has exceeded the maximum allowed number, OSS reports an error and stops processing the data.

Type: Boolean. Default value: false.

Child node: none

Parent node: Options

MaxSkippedRecordsAllowed Int

Specifies the maximum allowed number of skipped rows. If a row does not match the type specified in the SQL statement, or if one or more columns in a row are missing and the value of SkipPartialDataRecord is True, the row is skipped. If the number of skipped rows has exceeded the value of this parameter, OSS reports an error and stops processing the data.

Note If a row in a CSV object is not properly formatted, OSS stops processing the data and reports an error because this format error may result in incorrect resolution to the CSV object. For example, a column in the row includes continual odd numbered quote characters. This parameter can be used to adjust the tolerance for irregular data but cannot be applied to invalid CSV objects.

Type: integer. Default value: 0.

Child node: none

Parent node: Options

ParseJsonNumberAsString Boolean

Parses integers and floating-point numbers in a JSON object into strings. The precision of floating-point numbers in a JSON object degrades when the numbers are parsed. Therefore, we recommend that you set the value of this parameter to true if you want to retain the raw data. To use the parsed numbers in calculations, you can use the CAST function in an SQL statement to convert the parsed data into the required type, such as int, double, or decimal.

Default value: false

Child node: none

Parent node: JSON

AllowQuotedRecordDelimiter Boolean Specifies whether the CSV object contains line breaks in quotation marks (").

For example, the value of a column is "abc\ndef" and \n is a line break, you must set this value to true. If this parameter is set to false, SelectObject allows you to specify a range in the request header for more efficient multipart query.

Default value: true

Child node: none

Parent node: InputSerialization

Response body

  • If the HTTP status code included in the response is 4xx, the request has not passed the SQL syntax check or that the request contains errors. In this case, the body format of the returned error message is the same as that of the error message returned for a GetObject request.
  • If the HTTP status code included in the response is 5xx, an error occurs in the server. In this case, the body format of the returned error message is the same as that of the error message returned for a GetObject request.
  • HTTP status code 206 is returned when the operation is successfully performed. In this case:
    • If the value of header x-oss-select-output-raw is true, the object data (not including frame-based data) was successfully returned. The client can obtain the data in the same manner as the GetObject operation.
    • If the value of x-oss-select-output-raw is false, the result is returned as frames.
  • A returned frame is in Version|Frame-Type | Payload Length | Header Checksum | Payload | Payload Checksum<1 byte><--3 bytes--><---4 bytes----><-------4 bytes--><variable><----4bytes----------> format.
    Note The Checksum is CRC-32 in all frames. All integers in a frame are big-endian. Currently, the value of Version is 1.

Frame type

The following table describes three frame types supported by SelectObject.
Frame type Value Payload format Description
Data Frame 8388609 offset | data<-8 bytes><---variable-> A data frame includes the data returned for the SelectObject request. The offset parameter is an 8-bit integer that indicates the current scanning location (the offset from the file header) and is used to report the progress of the operation.
Continuous Frame 8388612 offset<----8 bytes--> A continuous frame is used to report the progress of an operation and keep an HTTP connection. If no data is returned for a query request within 5 seconds, a continuous frame is returned.
End Frame 8388613 offset | total scanned bytes | http status code | error message<--8bytes-><--8bytes---------><----4 bytes--------><-variable------> An end frame is used to return the final status of an operation, including the scanned bytes and the possible error messages.
  • The offset parameter indicates the final location offset after scanning.
  • The total scanned bytes parameter indicates the size of the scanned data.
  • The http status code parameter indicates the final status of the operation.
    Note SelectObject is a stream operation. Only the first data block is processed when the response header is sent. If the first data block matches the SQL statement, the HTTP status code in the response header is 206, which indicates that the operation was successful. However, the final status code may not be 206 because the subsequent data blocks may be invalid. The status code in the response header cannot be modified. Therefore, HTTP status code is included in the end frame to indicate the final status of the operation. The client uses the status code included in the end frame to determine whether the operation is successful.
  • The error message parameter includes error messages, including the number of each skipped row and the total quantity of skipped rows.
    Note The format of error messages included in an end frame is ErrorCodes.DetailMessage. The ErrorCodes section contains one or more ErrorCodes separated by commas (,). ErrorCodes and DetailMessage are separated by a period (.).

Sample requests

Sample requests for either CSV or JSON objects.
  • Sample request for CSV objects
    POST /oss-select/bigcsv_normal.csv? x-oss-process=csv%2Fselect HTTP/1.1
    Date: Fri, 25 May 2018 22:11:39 GMT
    Content-Type:
    Authorization: OSS LTAIJPLocA0fD:FC/9JRbBGRw4o2QqdaL246Px****
    User-Agent: aliyun-sdk-dotnet/2.8.0.0(windows 16.7/16.7.0.0/x86;4.0.30319.42000)
    Content-Length: 748
    Expect: 100-continue
    Connection: keep-alive
    Host: host name
    <? xml version="1.0"? >
    <SelectRequest>
        <Expression>c2VsZWN0IGNvdW50KCopIGZyb20gb3Nzb2JqZWN0IHdoZXJlIF80ID4gNDU=
        </Expression>
        <InputSerialization>
            <Compression>None</Compression>
            <CSV>
                <FileHeaderInfo>Ignore</FileHeaderInfo>
                <RecordDelimiter>Cg==</RecordDelimiter>
                <FieldDelimiter>LA==</FieldDelimiter>
                <QuoteCharacter>Ig==</QuoteCharacter>
                <CommentCharacter>Iw==</CommentCharacter/>
            </CSV>
        </InputSerialization>
        <OutputSerialization>
            <CSV>
                <RecordDelimiter>Cg==</RecordDelimiter>
                <FieldDelimiter>LA==</FieldDelimiter>
                <QuoteCharacter>Ig==</QuoteCharacter>            
            </CSV>
            <KeepAllColumns>false</KeepAllColumns>
                <OutputRawData>false</OutputRawData>
        </OutputSerialization>
    </SelectRequest>
  • Sample request for JSON objects
    POST /oss-select/sample_json.json? x-oss-process=json%2Fselect HTTP/1.1
    Host: host name
    Accept-Encoding: identity
    User-Agent: aliyun-sdk-python/2.6.0(Darwin/16.7.0/x86_64;3.5.4)
    Accept: */*
    Connection: keep-alive
    date: Mon, 10 Dec 2018 18:28:11 GMT
    authorization: OSS AccessKeySignature
    Content-Length: 317
    <SelectRequest>
        <Expression>c2VsZWN0ICogZnJvbSBvc3NvYmplY3Qub2JqZWN0c1sqXSB3aGVyZSBwYXJ0eSA9ICdEZW1vY3JhdCc=
        </Expression>
        <InputSerialization>
        <JSON>
            <Type>DOCUMENT</Type>
        </JSON>
        </InputSerialization>
        <OutputSerialization>
        <JSON>
            <RecordDelimiter>LA==</RecordDelimiter>
        </JSON>
        </OutputSerialization>
        <Options />
    </SelectRequest>

Regular expressions in an SQL statement

The regular expression in an SQL statement is SELECT select-list from table where_opt limit_opt.
Note SELECT, OSSOBJECT, and WHERE are keywords that cannot be modified.
select_list: column name
            | column index (example: _1, _2. column index applies only to CSV objects)
            | json path (example: s.contacts.firstname. json path applies only to JSON objects)
            | function(column index | column name)
            | function(json_path) (applies only to JSON objects)
            | select_list AS alias
Note The following functions are supported: AVG, SUM, MAX, MIN, COUNT, and CAST (type conversion function). You can use only the wildcard (*) after COUNT.
table: OSSOBJECT

      | OSSOBJECT json_path (applies only to JSON objects)

For CSV objects, table must be OSSOBJECT. For JSON objects (including DOCUMENT and LINES type objects), you can specify a json_path after OSSOBJECT.

json_path: ['string '] (The quotation marks around string can be deleted if the string does not include a space or a wildcard (*). In this case, ['string '] is equivalent to .'string '.

          | [n] (Used to indicate the nth element in an array. The value of n is counted from 0.)

          | [*] (Used to indicate any child element in an array or object.)

          | .'string ' (The quotation marks around string can be deleted if the string does not include a space or a wildcard (*).)

          | json_path jsonpath (You can concatenate multiple elements in a json path, such as [n].property1.attributes[*].)
Where_opt:
| WHERE expr
expr:
| literal value
| column name
| column index
| json path (applies only to JSON objects)
| expr op expr
| expr OR expr
| expr AND expr
| expr IS NULL
| expr IS NOT NULL
| (column name | column index | json path) IN (value1, value2,...)
| (column name | column index | json path) NOT in (value1, value2,...)
| (column name | column index | json path) between value1 and value2
| NOT (expr)
| expr op expr
| (expr)
| cast (column index |column name | json path | literal as INT|DOUBLE|)
  • op: includes the following operators: >, <, >=, <=, ! =, =, ,, LIKE, +, -, *, /, %, and ||.
  • cast: You can use the CAST function to convert data in a column from one type into another.
  • The combination of an aggregation function and limit: Select avg(cast(_1 as int)) from ossobject limit 100. The preceding statement calculates the average values of the first columns in the first 100 rows. This function differs from the MySQL statement because only a single row is returned for an aggregation function in SelectObject operations. This makes it unnecessary to limit the output. Limit is performed before aggregation functions in SelectObject operations.

Limits for SQL statements

Limits for SQL statements are as follows:
  • Only text objects encoded in UTF-8 and UTF-8 text objects compressed in GZIP format are supported. The deflate format is not supported for GZIP objects.
  • An SQL statement can only query a single object. The following clauses are not supported: JOIN, ORDER BY, GROUP BY, and HAVING.
  • A WHERE clause cannot include aggregation conditions. For example, the following clause is not allowed: WHERE max(cast(age as int)) > 100.
  • A maximum of 1,000 columns can be specified for an SQL statement. The column name in an SQL statement can be up to 1,024 bytes in length.
  • A maximum of five wildcards (%) are supported in a LIKE clause. Both the percent sign (%) and the asterisk (*) are wildcards that represent zero or more characters. The ESCAPE keyword is supported in SQL LIKE clauses, and is used to escape special characters such as percent signs (%), asterisks (*), and question marks (?) into normal strings.
  • A maximum of 1,024 constants are supported in an IN clause.
  • The Projection after SELECT can be a column name, a CSV column index (such as _1 and _2), an aggregation function, or a CAST function. Other expressions are not supported, such as select _1 + _2 from ossobject.
  • The maximum column size and row size for a CSV object are 256 KB.
  • The json path after FROM supports a JSON node with a maximum size of 512 KB. The path can contain up to 10 levels, and an array can contain a maximum of 5,000 elements. The fields after SELECT and WHERE must be from the nodes corresponding to the json path after FROM.
  • In SQL statements for a JSON object, the SELECT or WHERE expressions cannot include the array wildcard ([*]). The array wildcard ([*]) can only be included in the json path after FROM. For example, select * from ossobject.contacts[*] instead of select s.contacts[*] from ossobject s is supported.
  • The maximum size of an SQL statement is 16 KB. A maximum of 20 expressions can be added after WHERE. Each statement supports up to 10 levels and 100 aggregation operations.

Data error handling

The following section lists common methods for handling data errors.
  • Some columns are missing in some rows in a CSV object.

    If the value of SkipPartialDataRecord is not specified or is set to False, OSS calculates the expressions in the SQL statement by processing the values of the missing columns as null.

    If the value of SkipPartialDataRecord is set to True, OSS ignores the rows in which some columns are missing. In this case, if the value of MaxSkippedRecordsAllowed is not specified or is set to a value smaller than the number of skipped rows, OSS reports an error by sending HTTP status code 400 or including status code 400 in the end frame.

    Assume that the SQL statement is select _1, _3 from ossobject and the data in a row of the CSV object is "John, company A".

    • If the value of SkipPartialDataRecord is set to False, "John,\n" is returned.
    • If the value of SkipPartialDataRecord is set to True, this row is skipped.
  • Some keys are missing in a JSON object.

    Some JSON objects may not include the keys specified in the SQL statement.

    • If the value of SkipPartialDataRecord is not specified or is set to False, OSS calculates the expressions in the SQL statement by processing the missing keys as null.
    • If the value of SkipPartialDataRecord is True, OSS skips the data in the JSON node. In this case, if the value of MaxSkippedRecordsAllowed is not specified or is set to a value smaller than the number of skipped rows, OSS reports an error by sending HTTP status code 400 or including status code 400 in the end frame.

    Assume that the SQL statement is select s.firstName, s.lastName , s.age from ossobject.contacts[*] s and the value of a JSON node is {"firstName":"John", "lastName":"Smith"}.

    • If the value of SkipPartialDataRecord is not specified or is set to False, {"firstName":"John", "lastName":"Smith"} is returned.
    • If the value of SkipPartialDataRecord is set to True, this row is skipped.
    Note For keys in the returned data of a request for JSON objects, the output JSON objects can only be LINES. The Key value in the output result is determined based on the following rules.
    • Assume that the SQL statement is select * from ossobject.... If * corresponds to a JSON object ({...}), it is returned. If * corresponds to a String or an Array, it is returned as a DummyKey _1.

      When the data is {"Age":5}select * from ossobject.Age s where s = 5, {"_1":5} is returned because 5 that corresponds to * is not a JSON object. When the SQL statement is select * from ossobject s where s.Age = 5, {"Age":5} corresponding to * is returned.

    • If the SQL statement does not use select * but specifies columns, the returned content is in {"{Column 1}": Value, "{Column 2}": Value...} format. {Column n} can be generated in the following ways:
      • If the Alias of the column is specified in the SELECT clause, the Alias applies.
      • If the column is the key of a JSON object, this key is used as the output key value.
      • If the column is an element of a JSON array or an aggregate function, prefix the column with serial number (starting from 1) plus _ as the output key value.
      Assume that the data is {"contacts":{"Age":35, "Children":["child1", "child2", "child3"]}}:
      • When the SQL statement is select s.contacts.Age, s.contacts.Children[0] from ossobject s, Age is the key of the input JSON object, and Children[0] indicates the first element of Children of the array type and is the second column in the output content. {"Age":35, "_2":"child1"} is returned.
      • When the SQL statement is select max(cast(s.Age as int)) from ossobject.contacts s, and the selected column is an aggregate function, the column is prefixed with _1 plus its serial number in the output. {"_1":35} is returned.
      • When the Alias of the column is specified in the SQL statement select s.contacts.Age, s.contacts.Children[0] as firstChild from ossobject, {"Age":35, "firstChild":"child1"} is returned.
    • Keys matching in JSON objects and SQL statements are case-sensitive. For example, "select s. Age" and "select s. age" are different.
  • The data type of some columns in a CSV object does not match the SQL statement.

    If the data type of a row in a CSV object does not match the type specified in the SQL statement, the row is skipped. If the number of skipped rows has exceeded the value of MaxSkippedRecordsAllowed, OSS stops processing data and returns HTTP status code 400.

    Assume that the SQL statement is select _1, _3 from ossobject where _3 > 5. If the value of a row in the CSV object is John, Company A, To be hired, this row is skipped because the third column in the row is not of the integer type.

  • The data type of some keys in a JSON object does not match the SQL statement.

    Assume that the SQL statement is select s.name from ossobject s where s.aliren_age > 5. If the value of a JSON node is {"Name":"John", "aliren_age":To be hired}, this node is skipped.

CreateSelectObjectMeta

You can call this operation to obtain information about the target CSV object, such as the total number of rows, the total number of columns in the CSV object, and the number of splits. If the information does not exist in the object, the entire CSV object is scanned to analyze and record the preceding information. The information obtained in the first API call is used when the API is called again, so that the entire CSV object does not need to be scanned again.

Note
  • You must have the write permissions on the target object before you can perform the CreateSelectObjectMeta operation.
  • HTTP status code 200 is returned if the operation is successfully performed. If the target object is not a valid CSV or JSON LINES object, or the specified delimiter does not match any CSV objects, HTTP status code 400 is returned.
  • Request syntax
    • Request syntax for CSV objects
      POST  /samplecsv? x-oss-process=csv/meta
      <CsvMetaRequest>
          <InputSerialization>
              <CompressionType>None</CompressionType>
              <CSV>
                  <RecordDelimiter>Base64-encoded character</RecordDelimiter>
                  <FieldDelimiter>Base64-encoded character</FieldDelimiter>
                  <QuoteCharacter>Base64-encoded character</QuoteCharacter>
              </CSV>
          </InputSerialization>
          <OverwriteIfExists>false|true</OverwriteIfExists>
      </CsvMetaRequest>
    • Request syntax for JSON objects
      POST  /samplecsv? x-oss-process=json/meta
      <JsonMetaRequest>
          <InputSerialization>
              <CompressionType>None</CompressionType>
              <JSON>
                  <Type>LINES</Type>
              </JSON>
          </InputSerialization>
          <OverwriteIfExists>false|true</OverwriteIfExists>
      </JsonMetaRequest>
  • Request elements
    Element Type Description
    CsvMetaRequest Container

    Specifies the container that stores the CreateSelectObjectMeta request for CSV objects.

    Child node: InputSerialization

    Parent node: none

    JsonMetaRequest Container

    Specifies the container that stores the CreateSelectObjectMeta request for JSON objects.

    Child node: InputSerialization

    Parent node: none

    InputSerialization Container

    Optional. This element specifies the input serialization parameters.

    Child node: CompressionType, CSV, and JSON

    Parent node: CsvMetaRequest and JsonMetaRequest

    OverwriteIfExists Boolean

    Optional. This element specifies whether to recalculate the SelectMeta and overwrite the existing data. A value of false indicates that the result is directly returned if the SelectMeta already exists. The default value is false.

    Child node: none

    Parent node: CsvMetaRequest and JsonMetaRequest

    CompressionType Enumeration

    Optional. This element specifies the compression type of the object. Only None is supported currently.

    Child node: none

    Parent node: InputSerialization

    RecordDelimiter String

    Optional. This element specifies a Base64-encoded delimiter used to separate rows in the CSV object. The default value is \n. The value of this element before being encoded can be an ANSI value of up to two characters. For example, \n is used to indicate a line break in Java code.

    Child node: none

    Parent node: CSV

    FieldDelimiter String

    Optional. This element specifies a Base64-encoded delimiter used to separate columns in the CSV object. Default value: ,.

    The value of this element before being encoded must be a one-character ANSI value. For example, , is used to indicate a comma in Java code.

    Child node: none

    Parent node: CSV (input and output)

    QuoteCharacter String

    Optional. This element specifies a Base64-encoded quote character used in the CSV object. Default value: \". In a CSV object, line breaks and column delimiters enclosed in quotation marks are processed as normal characters. The value of this element before being encoded must be a one-character ANSI value. For example, \" is used to indicate a quote character in Java code.

    Child node: none

    Parent node: CSV (input)

    CSV Container

    Specifies the format of the input CSV object.

    Child node: RecordDelimiter, FieldDelimiter, and QuoteCharacter

    Parent node: InputSerialization

    JSON Container

    Specifies the format of the input JSON object.

    Child node: Type

    Parent node: InputSerialization

    Type Enumeration

    Specifies the type of the JSON object.

    Valid values: LINES

  • Response body

    Similar to SelectObject, the results for CreateSelectObjectMeta are also returned as frames.

    Frame type Value Payload format Description
    Meta End Frame (CSV) 8388614

    offset | total scanned bytes | status| splits count | rows count | columns count | error message

    <-8 bytes><------8 bytes------><--4bytes><--4 bytes--><--8 bytes><--4 bytes---><variable size>

    Reports the final status of the CreateSelectObjectMeta operation.

    • offset: an 8-bit integer, indicates the offset when the scanning is complete.
    • total scanned bytes: an 8-bit integer, indicates the size of the scanned data.
    • status: a 4-bit integer, indicates the final status of the operation.
    • splits_count: a 4-bit integer, indicates the total number of splits.
    • rows_count: an 8-bit integer, indicates the total number of rows.
    • cols_count: a 4-bit integer, indicates the total number of columns.
    • error_message: includes detailed error messages. If no error occurs, the value of this parameter is null.
    Meta End Frame (JSON) 8388615

    offset | total scanned bytes | status| splits count | rows count | error message

    <-8 bytes><------8 bytes------><--4bytes><--4 bytes--><--8 bytes><variable size>

    Reports the final status of the CreateSelectObjectMeta operation.

    • offset: an 8-bit integer, indicates the offset when the scanning is complete.
    • total scanned bytes: an 8-bit integer, indicates the size of the scanned data.
    • status: a 4-bit integer, indicates the final status of the operation.
    • splits_count: a 4-bit integer, indicates the total number of splits.
    • rows_count: an 8-bit integer, indicates the total number of rows.
    • error_message: includes detailed error messages. If no error occurs, the value of this parameter is null.
  • Sample requests
    • Sample request for CSV objects
      POST /oss-select/bigcsv_normal.csv? x-oss-process=csv%2Fmeta HTTP/1.1
      Date: Fri, 25 May 2018 23:06:41 GMT
      Content-Type:
      Authorization: OSS AccessKeySignature
      User-Agent: aliyun-sdk-dotnet/2.8.0.0(windows 16.7/16.7.0.0/x86;4.0.30319.42000)
      Content-Length: 309
      Expect: 100-continue
      Connection: keep-alive
      Host: Host
      <? xml version="1.0"? >
      <CsvMetaRequest>
          <InputSerialization>
              <CSV>
                  <RecordDelimiter>Cg==</RecordDelimiter>
                  <FieldDelimiter>LA==</FieldDelimiter>
                  <QuoteCharacter>Ig==</QuoteCharacter>
              </CSV>
          </InputSerialization>
          <OverwriteIfExisting>false</OverwriteIfExisting>
      </CsvMetaRequest>
    • Sample request for JSON objects
      POST /oss-select/sample.json? x-oss-process=json%2Fmeta HTTP/1.1
      Date: Fri, 25 May 2018 23:06:41 GMT
      Content-Type:
      Authorization: OSS AccessKeySignature
      User-Agent: aliyun-sdk-dotnet/2.8.0.0(windows 16.7/16.7.0.0/x86;4.0.30319.42000)
      Content-Length: 309
      Expect: 100-continue
      Connection: keep-alive
      Host: Host
      <? xml version="1.0"? >
      <JsonMetaRequest>
          <InputSerialization>
              <JSON>
                  <Type>LINES</Type>
              </JSON>
          </InputSerialization>
          <OverwriteIfExisting>false</OverwriteIfExisting>
      </JsonMetaRequest>

Supported time formats

You can convert any of the following listed formats into a timestamp without specifying the time format. For example, the string cast­(’20121201' as timestamp) is automatically parsed as a timestamp: December 1, 2012.

The following table describes the recognized time formats.

Format Description
YYYYMMDD year month day
YYYY/MM/DD year/month/day
DD/MM/YYYY/ day/month/year
YYYY-MM-DD year-month-day
DD-MM-YY day-month-year
DD.MM.YY day.month.year
HH:MM:SS.mss hour:minute:second.millisecond
HH:MM:SS hour:minute:second
HH MM SS mss hour minute second millisecond
HH.MM.SS.mss hour.minute.second.millisecond
HHMM hour minute
HHMMSSmss hour minute second millisecond
YYYYMMDD HH:MM:SS.mss year month day hour:minute:second.millisecond
YYYY/MM/DD HH:MM:SS.mss year/month/day hour:minute:second.millisecond
DD/MM/YYYY HH:MM:SS.mss day/month/year hour:minute:second.millisecond
YYYYMMDD HH:MM:SS year month day hour:minute:second
YYYY/MM/DD HH:MM:SS year/month/day hour:minute:second
DD/MM/YYYY HH:MM:SS day/month/year hour:minute:second
YYYY-MM-DD HH:MM:SS.mss year-month-day hour:minute:second.millisecond
DD-MM-YYYY HH:MM:SS.mss day-month-year hour:minute:second.millisecond
YYYY-MM-DD HH:MM:SS year-month-day hour:minute:second
YYYYMMDDTHH:MM:SS year month day T hour:minute:second
YYYYMMDDTHH:MM:SS.mss year month day T hour:minute:second.millisecond
DD-MM-YYYYTHH:MM:SS.mss day-month-year T hour:minute:second.millisecond
DD-MM-YYYYTHH:MM:SS day-month-year T hour:minute:second
YYYYMMDDTHHMM year month day T hour minute
YYYYMMDDTHHMMSS year month day T hour minute second
YYYYMMDDTHHMMSSMSS year month day T hour minute second millisecond
ISO8601-0

year-month-day T hour:minute+hour:minute, or year-month-day T hour:minute-hour:minute

"+" indicates that the time in the current time zone is ahead of standard UTC time. "-" indicates that the time in the current time zone is behind the standard UTC time. In this format, ISO 8601-0 can be used.

ISO8601-1

year-month-day T hour:minute+hour:minute, or year-month-day T hour:minute-hour:minute

"+" indicates that the time in the current time zone is in front of standard UTC time. "-" indicates that the time in the current time zone is behind the standard UTC time. In this format, ISO 8601-1 can be used.

CommonLog Example: 28/Feb/2017:12:30:51 +0700
RFC822 Example: Tue, 28 Feb 2017 12:30:51 GMT
?D/?M/YY day/month/year, in which the day and month can be one or two digits.
?D/?M/YY ?H:?M day/month/year/hour:minute, in which the day, month, hour, and minute can be one or two digits.
?D/?M/YY ?H:?M:?S day/month/year/hour:minute:second, in which the day, month, hour, minute, and second can be one or two digits.

The formats in the following table are ambiguous. You must specify a time format when you use strings in these formats. For example, the cast('20121201' as timestamp format 'YYYYDDMM') statement incorrectly parses the string 20121201 as January 12, 2012.

Format Description
YYYYDDMM year day month
YYYY/DD/MM year/day/month
MM/DD/YYYY month/day/year
YYYY-DD-MM year-day-month
MM-DD-YYYY month-day-year
MM.DD.YYYY month.day.year

Error codes

SelectObject returns error codes in the following two ways:
  • The HTTP status code is included in the response header and the error code is included in the response body, which is the same as other OSS requests. Error code returned in this manner indicates that the SQL statement contained input or data errors, such as an invalid SQL statement.
  • The error code is included in the end frame of the response body. Error code returned in this manner indicates that the data is not correct or does not match the SQL statement. For example, a string exists in a column of which the type is specified as integer in the SQL statement. In this case, a part of data is processed and returned to the client, and the HTTP status code is 206.

Some error codes such as InvalidCSVLine can be returned as an HTTP status code in the response header or the end frame based on the location of the error row within the CSV object.

Error code Description HTTP status code HTTP status code in end frame
InvalidSqlParameter Invalid SQL parameter.

Indicates that the SQL statement in the request is null, the SQL statement size has exceeded the limit, or the SQL statement is not Base64-encoded.

400 None
InvalidInputFieldDelimiter Invalid column delimiter in the input CSV object.

Indicates that the parameter is not Base64-encoded or that the size of the parameter is greater than 1 byte after being decoded.

400 None
InvalidInputRecordDelimiter Invalid row delimiter in the input CSV object. Indicates that the parameter is not Base64-encoded or that the size of the parameter is greater than 2 bytes after being decoded. 400 None
InvalidInputQuote Invalid quote character in the input CSV object. Indicates that the parameter is not Base64-encoded or that the size of the parameter is greater than 1 byte after being decoded. 400 None
InvalidOutputFieldDelimiter Invalid column delimiter in the output CSV object. Indicates that the parameter is not Base64-encoded or that the size of the parameter is greater than 1 byte after being decoded. 400 None
InvalidOutputRecordDelimiter Invalid column delimiter in the output CSV object. Indicates that the parameter is not Base64-encoded or that the size of the parameter is greater than 2 bytes after being decoded. 400 None
UnsupportedCompressionFormat Invalid Compression parameter. Indicates that the value of the parameter is not NONE or GZIP (case-insensitive). 400 None
InvalidCommentCharacter Invalid comment character in the CSV object. Indicates that the parameter is not Base64-encoded or that the size of the parameter is greater than 1 byte after being decoded. 400 None
InvalidRange Invalid Range parameter. Indicates that the parameter is not prefixed with line-range= or split-range=, or that the range value does not meet the HTTP standard for Range. 400 None
DecompressFailure Indicates that the value of Compression is GZIP and that the object cannot be decompressed. 400 None
InvalidMaxSkippedRecordsAllowed Indicates that the value of MaxSkippedRecordsAllowed is not an integer. 400 None
SelectCsvMetaUnavailable Indicates that the Range parameter is specified and the CreateSelectObjectMeta operation has been called, but the target object does not include CSV Meta. 400 None
InvalidTextEncoding Indicates that the object is not UTF-8 encoded. 400 None
InvalidOSSSelectParameters Indicates that the EnablePayloadCrc and OutputRawData parameters are both set to True, which results in conflicts. 400 None
InternalError Indicates that an OSS system error has occurred. 500 or 206 None or 500
SqlSyntaxError Indicates that the syntax of the Base64-decoded SQL statement is invalid. 400 None
SqlExceedsMaxInCount Indicates that the number of values included in the SQL IN clause has exceeded 1,024. 400 None
SqlExceedsMaxColumnNameLength Indicates that the size of the column name has exceeded 1,024 bytes. 400 None
SqlInvalidColumnIndex Indicates that the column index in the SQL statement is less than 1 byte or greater than 1,000 bytes in length. 400 None
SqlAggregationOnNonNumericType Indicates that an aggregation function is used in a non-numeric column. 400 None
SqlInvalidAggregationOnTimestamp Indicates that the SUM or AVG aggregation function is used in the timestamp column. 400 None
SqlValueTypeOfInMustBeSame Indicates that values of different types are included in the SQL IN clause. 400 None
SqlInvalidEscapeChar Indicates that an invalid escape character such as a question mark (?), percent sign (%), or an asterisk (*) was specified in the SQL LIKE clause. 400 None
SqlOnlyOneEscapeCharIsAllowed Indicates that the size of the escape character in the SQL LIKE clause is greater than 1 byte in length. 400 None
SqlNoCharAfterEscapeChar Indicates that there are no characters after the escape character in the SQL LIKE clause. 400 None
SqlInvalidLimitValue Indicates that the number after the SQL Limit clause is smaller than 1. 400 None
SqlExceedsMaxWildCardCount Indicates that the number of wildcards (*) or (%) has exceeded the limit in the SQL LIKE clause. 400 None
SqlExceedsMaxConditionCount Indicates that the number of conditional expressions in the SQL WHERE clause has exceeded the limit. 400 None
SqlExceedsMaxConditionDepth Indicates that the depth of the conditional tree in the SQL WHERE clause has exceeded the limit. 400 None
SqlOneColumnCastToDifferentTypes Indicates that a column was converted into different types by including the CAST function in the SQL statement. 400 None
SqlOperationAppliedToDifferentTypes Indicates that an operator was used for two objects of different types in the SQL statement. For example, this error code is returned if col1 in _col1 > 3 is a string. 400 None
SqlInvalidColumnName Indicates that a column name used in the SQL statement was not included in the header of the CSV object. 400 None
SqlNotSupportedTimestampFormat Indicates that the timestamp format specified in the SQL CAST clause is not supported. 400 None
SqlNotMatchTimestampFormat Indicates that the timestamp format specified in the SQL CAST clause does not match the timestamp string. 400 None
SqlInvalidTimestampValue Indicates that no timestamp format is specified in the SQL CAST clause and the provided string cannot be converted into a timestamp. 400 None
SqlInvalidLikeOperand Indicates that column names or indexes are not specified in the SQL LIKE clause, the specified column is not of the string type, or the right column in the LIKE clause is of the string type. 400 None
SqlInvalidMixOfAggregationAndColumn Indicates that the SQL SELECT clause includes column names and indexes for both aggregation functions and non-aggregation functions. 400 None
SqlExceedsMaxAggregationCount Indicates that the number of aggregation functions included in the SQL SELECT clause has exceeded the limit. 400 None
SqlInvalidMixOfStarAndColumn Indicates that an asterisk (*), a column name, and a column index are included within the same SQL statement. 400 None
SqlInvalidKeepAllColumnsWithAggregation Indicates that the SQL statement includes aggregation functions while the KeepAllColumns parameter is set to True. 400 None
SqlInvalidKeepAllColumnsWithDuplicateColumn Indicates that the SQL statement includes repeated column names or column indexes while the KeepAllColumns parameter is set to True. 400 None
SqlInvalidSqlAfterAnalysis Indicates that the SQL statement is not supported because it is too complex to resolve. 400 None
InvalidArithmeticOperand Indicates that the SQL statement contains arithmetical operations performed on non-numeric constants or columns. 400 None
SqlInvalidAndOperand Indicates that the expressions connected by the AND operator in the SQL statement are not of the Boolean type. 400 None
SqlInvalidOrOperand Indicates that the expressions connected by the OR operator in the SQL statement are not of the Boolean type. 400 None
SqlInvalidNotOperand Indicates that the expressions connected by the NOT operator in the SQL statement are not of the Boolean type. 400 None
SqlInvalidIsNullOperand Indicates that the SQL statement contains the IS NULL operator performed on a constant. 400 None
SqlComparerOperandTypeMismatch Indicates that the SQL statement contains the comparison operator performed on two objects of different types. 400 None
SqlInvalidConcatOperand Indicates that the SQL statement contains two constants connected by the concatenation operator (||). 400 None
SqlUnsupportedSql Indicates that the SQL statement is too complex so that the size of the generated SQL plan has exceeded the limit. 400 None
HeaderInfoExceedsMaxSize Indicates that the size of the header information specified in the SQL statement has exceeded the limit. 400 None
OutputExceedsMaxSize Indicates that a row of output results has exceeded the size limit. 400 None
InvalidCsvLine Indicates that a row in the CSV object is invalid or has exceeded the size limit, or that the number of skipped rows has exceeded the value of MaxSkippedRecordsAllowed. 206 or 400 400 or None
NegativeRowIndex Indicates that the value of the array index in the SQL statement is a negative number. 400 None
ExceedsMaxNestedColumnDepth Indicates that the nested levels of the JSON object in the SQL statement have exceeded the level limit. 400 None
NestedColumnNotSupportInCsv Indicates that the SQL statement contains nested columns that include periods (.) or arrays with brackets ([]), which are not supported for SQL statements of CSV objects. 400 None
TableRootNodeOnlySupportInJson Indicates that the root node path can only be specified after From ossobject in JSON objects. 400 None
JsonNodeExceedsMaxSize Indicates that the size of the root node in the JSON object has exceeded the limit. 400 or 206 None or 400
InvalidJsonData Indicates that the JSON data is formatted incorrectly. 400 or 206 None or 400
ExceedsMaxJsonArraySize Indicates that the number of elements in an array in the root node of the JSON object has exceeded the limit. 400 or 206 None or 400
WildCardNotAllowed Indicates that the wildcard (*) cannot be used in SQL SELECT clauses or SQL WHERE clauses in the JSON object. For example, the following statement is not supported: select s.a.b[*] from ossobject where a.c[*] > 0. 400 None
JsonNodeExceedsMaxDepth Indicates that the depth of the root node of the JSON object has exceeded the limit. 400 or 206 None or 400