This topic describes how to create, read from, and write to TEXTFILE-formatted OSS external tables.
Usage notes
OSS external tables do not support the cluster property.
The size of a single file cannot exceed 2 GB. You must split files that are larger than 2 GB.
MaxCompute and OSS must be in the same region.
Create an external table
Syntax
For the syntax for creating external tables in different formats, see OSS external tables.
Simplified syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED AS TEXTFILE
LOCATION '<oss_location>';
Full syntax
To parse CSV data stored in the TEXTFILE format, see CSV and TSV external tables.
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
[WITH SERDEPROPERTIES ('<property_name>'='<property_value>', ...)]
STORED AS TEXTFILE
LOCATION '<oss_location>'
[TBLPROPERTIES ('<tbproperty_name>'='<tbproperty_value>', ...)];
Common parameters
For detailed descriptions of the following common parameters, see Basic syntax parameters.
mc_oss_extable_name, col_name, data_type, table_comment, oss_location
Specific parameters
SERDEPROPERTIES parameters
Parameter | Use case | Description | Value | Default |
field.delim | Use this property to specify the column delimiter for data in the TEXTFILE format. | Specifies the character that separates columns in the data files. | A single-character string. | \001 |
TBLPROPERTIES parameters
Parameter | Use case | Description | Value | Default |
mcfed.mapreduce.output.fileoutputformat.compress | Add this property when you need to write TEXTFILE data files to OSS in a compressed format. | TEXTFILE compression property. When this parameter is set to True, MaxCompute writes TEXTFILE data files to OSS in a compressed format. Otherwise, the files are not compressed. | True or False | False |
mcfed.mapreduce.output.fileoutputformat.compress.codec | Use this property to specify the compression codec when writing TEXTFILE data to OSS. | TEXTFILE compression property. Sets the compression algorithm for TEXTFILE data files. | A codec class name, for example, 'org.apache.hadoop.io.compress.SnappyCodec'. | None |
io.compression.codecs | Use this property when the source data files in OSS are in Raw-Snappy format. | Required to allow MaxCompute to read Raw-Snappy compressed data from OSS. | com.aliyun.odps.io.compress.SnappyRawCodec | None |
odps.external.data.output.prefix (compatible with odps.external.data.prefix) | Use this property to add a custom prefix to output file names. | Adds the specified prefix to the names of output files. | A valid string, for example, 'mc_'. | None |
odps.external.data.enable.extension | Use this property to display the file extension of output files. | If set to True, the names of output files include the file extension. | True or False | False |
odps.external.data.output.suffix | Use this property to add a custom suffix to output file names. | Must contain only letters (a-z, A-Z), digits (0-9), and underscores (_). | A valid string, for example, '_hangzhou'. | None |
odps.external.data.output.explicit.extension | Use this property to add a custom extension to output file names. | Adds the specified extension to the names of output files. | A valid string, for example, "txt". | None |
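As a minimal sketch of how the output-naming properties combine, the statement below creates a hypothetical table that prefixes, suffixes, and keeps the extension on its output files. The table name, columns, and OSS path are placeholders; adjust them for your environment.

```sql
-- Hypothetical example: combine the output-naming properties above.
-- Table name, columns, and OSS path are placeholders.
CREATE EXTERNAL TABLE mc_oss_naming_demo (
    id   INT,
    name STRING
)
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/<your-bucket>/naming-demo/'
TBLPROPERTIES (
    'odps.external.data.output.prefix'='mc_',        -- prefix output file names with mc_
    'odps.external.data.output.suffix'='_hangzhou',  -- append _hangzhou to output file names
    'odps.external.data.enable.extension'='True'     -- keep the file extension visible
);
```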
Write data
For more information about the write syntax of MaxCompute, see Write syntax.
Query data
For more information about the SELECT syntax, see Query syntax.
For more information about query plan optimization, see Query optimization.
Examples
Create an OSS external table
If a data file has fewer columns than the external table's schema, MaxCompute pads the missing columns with NULL values when it reads the data. Conversely, if a data file has more columns than are defined in the external table, the extra columns are ignored.
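As a hypothetical illustration of this positional padding (table name, columns, and OSS path are placeholders), consider a three-column table defined over data files that contain only two columns per line:

```sql
-- Hypothetical illustration of positional padding.
-- Suppose the files under LOCATION contain two comma-separated columns per line, such as: 1,a
CREATE EXTERNAL TABLE pad_demo (
    c1 INT,
    c2 STRING,
    c3 DOUBLE   -- no third column in the data files: read as NULL
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'=',')
STORED AS TEXTFILE
LOCATION 'oss://<endpoint>/<your-bucket>/pad-demo/';
-- SELECT * FROM pad_demo; returns NULL for c3 in every row.
-- If the files instead had a fourth column, that extra column would be ignored.
```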
Prerequisites
A MaxCompute project has been created. For more information, see Create a MaxCompute project.
An OSS bucket and directory have been prepared. For more information, see Create a bucket and Manage directories.
MaxCompute is deployed only in some regions. To avoid potential cross-region data connectivity issues, we recommend using a bucket in the same region as your MaxCompute project.
The required permissions are granted:
The permission to access OSS. Alibaba Cloud accounts, RAM users, or RAM roles can be used to access OSS external tables. For more information about authorization, see STS-mode authorization for OSS.
The CreateTable permission in your MaxCompute project. For more information about table operation permissions, see MaxCompute permissions.
Example 1: Compressed data
This example creates a TEXTFILE external table from a Snappy-compressed file and demonstrates how to read from and write to the table.
In the oss-mc-test bucket described in Sample data, create a folder named Demo-textfile and upload the Snappy sample file to this folder.
Create a TEXTFILE external table for data in the Snappy compression format.
CREATE EXTERNAL TABLE external_tb_textfile
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitude DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole')
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo-textfile/'
TBLPROPERTIES (
    'mcfed.mapreduce.output.fileoutputformat.compress'='True',
    'mcfed.mapreduce.output.fileoutputformat.compress.codec'='org.apache.hadoop.io.compress.SnappyCodec'
);
Read data from the TEXTFILE external table.
SELECT * FROM external_tb_textfile;
Sample output:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 12       | 76        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW        |
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:01 | NE        |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:02 | NE        |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:03 | W         |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:04 | S         |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:05 | S         |
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:06 | N         |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:07 | SW        |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:08 | NE        |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:09 | N         |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
Example 2: Specifying a delimiter
This example creates a TEXTFILE external table to read a text file that uses the vertical bar (|) as the delimiter and demonstrates how to write data to the external table.
In the oss-mc-test bucket described in Sample data, create a folder named Demo-textfile-txt and upload the vehicle.txt sample file to this folder.
Create a TEXTFILE external table and specify the delimiter.
CREATE EXTERNAL TABLE text_textfile_test01
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitude DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'='|')
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/oss-mc-test/Demo-textfile-txt/'
TBLPROPERTIES ('odps.external.data.enable.extension'='True');
Read data from the TEXTFILE external table.
SELECT * FROM text_textfile_test01;
Sample output:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 1        | 51        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:00 | S         |
| 1         | 2        | 13        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:01 | NE        |
| 1         | 3        | 48        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:02 | NE        |
| 1         | 4        | 30        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:03 | W         |
| 1         | 5        | 47        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:04 | S         |
| 1         | 6        | 9         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:05 | S         |
| 1         | 7        | 53        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:06 | N         |
| 1         | 8        | 63        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:07 | SW        |
| 1         | 9        | 4         | 1     | 46.81006         | -92.08174         | 9/14/2014 0:08 | NE        |
| 1         | 10       | 31        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:09 | N         |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
Write data to the TEXTFILE external table and then query the data.
INSERT INTO text_textfile_test01 VALUES (1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10','E');
SELECT * FROM text_textfile_test01 WHERE recordId=12;
Sample output:
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| vehicleid | recordid | patientid | calls | locationlatitude | locationlongitude | recordtime     | direction |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
| 1         | 12       | 76        | 1     | 46.81006         | -92.08174         | 9/14/2014 0:10 | E         |
+-----------+----------+-----------+-------+------------------+-------------------+----------------+-----------+
Supported data types
For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).
For more information about SmartParse, see Flexible type compatibility of SmartParse.
Type | Supported | Type | Supported |
TINYINT | STRING | ||
SMALLINT | DATE | ||
INT | DATETIME | ||
BIGINT | TIMESTAMP | ||
BINARY | TIMESTAMP_NTZ | ||
FLOAT | BOOLEAN | ||
DOUBLE | ARRAY | ||
DECIMAL(precision,scale) | MAP | ||
VARCHAR(n) | STRUCT | ||
CHAR(n) | JSON |
Supported compression formats
When you read data from or write data to compressed OSS files, you must add the WITH SERDEPROPERTIES clause to the CREATE TABLE statement. For more information, see SERDEPROPERTIES parameters.
You can read from and write to TEXTFILE files that are compressed with Snappy or LZO.
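The TBLPROPERTIES table above lists io.compression.codecs for reading Raw-Snappy source files, but no example in this topic uses it. The following is a minimal sketch of such a table; the table name, columns, and OSS path are placeholders for illustration.

```sql
-- Hypothetical example: read Raw-Snappy compressed TEXTFILE data from OSS.
-- Table name, columns, and OSS path are placeholders.
CREATE EXTERNAL TABLE raw_snappy_demo (
    id    INT,
    value STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS TEXTFILE
LOCATION 'oss://<endpoint>/<your-bucket>/raw-snappy/'
TBLPROPERTIES (
    -- required so that MaxCompute can decode Raw-Snappy source files
    'io.compression.codecs'='com.aliyun.odps.io.compress.SnappyRawCodec'
);
```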
Schema evolution support
For TEXTFILE external tables, columns in the table schema are mapped to columns in the data files by position.
The Data compatibility column in the following table describes how a schema evolution operation affects data reads. It specifies whether the table can correctly read existing data (conforming to the original schema) and new data (conforming to the modified schema).
Operation | Supported | Description | Data compatibility |
Add a column | Yes | Columns are mapped by position, and missing columns in data files are read as NULL. | Compatible. For existing data files, the new column is read as NULL. |
Drop a column | Yes | This operation is not recommended. Because TEXTFILE external tables map columns by position, dropping a column causes a mismatch between the schema and existing data. | Incompatible with existing data. |
Change column order | Yes | This operation is not recommended. Because TEXTFILE external tables map columns by position, reordering columns causes a mismatch between the schema and existing data. | Incompatible with existing data. |
Change a column's data type | Yes | For the list of supported conversions, see Change the data type of a column. | Compatible |
Rename a column | Yes | The data files are not affected; column mapping remains positional. | Compatible |
Change a column's comment | Yes | The comment must be a valid string of no more than 1,024 bytes. Otherwise, MaxCompute reports an error. | Compatible |
Change the NOT NULL constraint | No | This operation is not supported. Columns are nullable by default. | Not applicable |
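The compatible operations above can be sketched with ALTER TABLE statements. This is a sketch against the text_textfile_test01 table from Example 2; the new column and names are illustrative, so verify the exact syntax against the MaxCompute ALTER TABLE reference before use.

```sql
-- Hypothetical sketches of the schema-evolution operations above.

-- Add a column: existing data files read the new column as NULL.
ALTER TABLE text_textfile_test01 ADD COLUMNS (speed DOUBLE);

-- Rename a column: data files are untouched; mapping stays positional.
ALTER TABLE text_textfile_test01 CHANGE COLUMN direction RENAME TO heading;

-- Change a column's comment (must be at most 1,024 bytes).
ALTER TABLE text_textfile_test01 CHANGE COLUMN calls COMMENT 'number of calls';
```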