This topic describes how to create, read from, and write to CSV and TSV external tables in Object Storage Service (OSS).
Prerequisites
Ensure your Alibaba Cloud account, RAM user, or RAM role has the required permissions to access OSS external tables. For more information about authorization, see STS authorization for OSS.
(Optional) Prepare an OSS bucket, directories, and data files. For more information, see Create buckets, Manage directories, and Simple upload.
MaxCompute can automatically create directories in OSS, so you can read from or write to an external table with a single SQL statement without creating directories manually. Manual directory creation is no longer required, but the legacy method is still supported.
A MaxCompute project has been created. For more information, see Create a MaxCompute project.
MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
The Alibaba Cloud account or RAM user has the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.
Limits
The cluster attribute is not supported for OSS external tables.
A single file cannot exceed 3 GB. If a file exceeds this limit, we recommend that you split it.
Notes
An error is reported when you read data from a CSV or TSV file if the number of columns in the file does not match the number of columns in the external table's DDL statement. For example: FAILED: ODPS-0123131:User defined function exception - Traceback:java.lang.RuntimeException: SCHEMA MISMATCH:xxx.
To control this behavior, you can set the odps.sql.text.schema.mismatch.mode parameter at the session level:
SET odps.sql.text.schema.mismatch.mode=error: If the number of columns does not match, the system reports an error.
SET odps.sql.text.schema.mismatch.mode=truncate: If the source data has more columns than the table definition, the extra columns are discarded. If the source data has fewer columns than the table definition, the missing columns are populated with NULL.
SET odps.sql.text.schema.mismatch.mode=ignore: Rows whose number of columns does not match the table definition are discarded.
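As an illustrative sketch of the session-level switch (the table name is a hypothetical placeholder; only the SET statements come from this topic):

```sql
-- Hypothetical external table my_csv_ext.
-- Discard extra columns and pad missing ones with NULL:
SET odps.sql.text.schema.mismatch.mode=truncate;
SELECT * FROM my_csv_ext;

-- Revert to the default behavior, which fails on any column-count mismatch:
SET odps.sql.text.schema.mismatch.mode=error;
SELECT * FROM my_csv_ext;
```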
Data type support
For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).
For more information about Smart Parse, see More flexible type parsing with Smart Parse.
Data type support is defined per parser: com.aliyun.odps.CsvStorageHandler/TsvStorageHandler (built-in text data parser) and org.apache.hadoop.hive.serde2.OpenCSVSerde (built-in open source data parser). The data types covered are TINYINT, SMALLINT, INT, BIGINT, BINARY, FLOAT, DOUBLE, DECIMAL(precision,scale), VARCHAR(n), CHAR(n), STRING, DATE, DATETIME, TIMESTAMP, TIMESTAMP_NTZ, BOOLEAN, ARRAY, MAP, STRUCT, and JSON.
Compression format support
When you read from or write to compressed OSS files, you must add the with serdeproperties property to the CREATE TABLE statement. For more information, see with serdeproperties parameters.
Compression format support is defined per parser: com.aliyun.odps.CsvStorageHandler/TsvStorageHandler (built-in text data parser) and org.apache.hadoop.hive.serde2.OpenCSVSerde (built-in open source data parser). The formats covered are GZIP, SNAPPY, and LZO.
Create an external table
Syntax
For more information about the syntax for creating external tables in different formats, see OSS external tables.
Built-in text data parser
CSV format
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'com.aliyun.odps.CsvStorageHandler'
[WITH serdeproperties ( ['<property_name>'='<property_value>',...] )]
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

TSV format

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'com.aliyun.odps.TsvStorageHandler'
[WITH serdeproperties ( ['<property_name>'='<property_value>',...] )]
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];
Built-in open source data parser
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
[WITH serdeproperties ( ['<property_name>'='<property_value>',...] )]
STORED AS TEXTFILE
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];
Common parameters
For more information about common parameters, see Basic syntax parameters.
Unique parameters
with serdeproperties parameters
Applicable parser | Property name | Scenario | Description | Property value | Default value
Built-in text data parser (CsvStorageHandler/TsvStorageHandler) | odps.text.option.gzip.input.enabled | Add this property to read data from GZIP-compressed CSV or TSV files. | CSV and TSV compression property. Set this parameter to True to allow MaxCompute to read compressed files. Otherwise, the read operation fails. | True or False | False
Built-in text data parser | odps.text.option.gzip.output.enabled | Add this property to write data to OSS in the GZIP compression format. | CSV and TSV compression property. Set this parameter to True to allow MaxCompute to write data to OSS in the GZIP compression format. Otherwise, the data is not compressed. | True or False | False
Built-in text data parser | odps.text.option.header.lines.count | Add this property to ignore the first N rows of a CSV or TSV data file in OSS. | When MaxCompute reads a data file from OSS, it ignores the specified number of rows. | A non-negative integer | 0
Built-in text data parser | odps.text.option.null.indicator | Add this property to define how NULL values are parsed in a CSV or TSV data file in OSS. | The string specified by this parameter is parsed as NULL in SQL. | String | Empty string
Built-in text data parser | odps.text.option.ignore.empty.lines | Add this property to define how empty lines are handled in a CSV or TSV data file in OSS. | If this parameter is set to True, MaxCompute ignores empty lines in the data file. Otherwise, empty lines are read. | True or False | True
Built-in text data parser | odps.text.option.encoding | Add this property if the encoding of the CSV or TSV data file in OSS is not the default encoding. | Make sure that the encoding specified here matches the encoding of the OSS data file. Otherwise, MaxCompute cannot read the data. | A valid file encoding | UTF-8
Built-in text data parser | odps.text.option.delimiter | Add this property to specify the column delimiter for a CSV or TSV data file. | Make sure that the specified column delimiter can correctly separate each column of the OSS data file. Otherwise, the data read by MaxCompute may be misaligned. | A single character | A comma (,)
Built-in text data parser | odps.text.option.use.quote | Add this property if a field in the CSV or TSV data file contains a line break (CRLF), a double quotation mark, or a comma. | When a field in a CSV file contains a line break, a double quotation mark, or a comma, the whole field must be enclosed in double quotation marks. Set this parameter to True so that such quoted fields are parsed correctly. | True or False | False
Built-in text data parser | odps.sql.text.option.flush.header | Add this property to write the first row of each file block as a header when writing data to OSS. | This parameter is effective only for CSV files. | True or False | False
Built-in text data parser | odps.sql.text.schema.mismatch.mode | Add this property to specify how rows are handled when the number of data columns in the OSS file does not match the number of columns in the external table schema. | Specifies how to handle rows with a mismatched number of columns. Note: This setting has no effect when the odps.text.option.use.quote parameter is set to True. | error, truncate, or ignore | error
Built-in open source data parser (OpenCSVSerde) | separatorChar | Add this property to specify the column delimiter for CSV data saved in the TEXTFILE format. | Specifies the column delimiter for CSV data. | A single string | A comma (,)
Built-in open source data parser | quoteChar | Add this property if a field in the CSV data saved in the TEXTFILE format contains a line break, a double quotation mark, or a comma. | Specifies the quote character for CSV data. | A single string | None
Built-in open source data parser | escapeChar | Add this property to specify the escape rule for CSV data saved in the TEXTFILE format. | Specifies the escape character for CSV data. | A single string | None
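As a hedged sketch of how several of these with serdeproperties properties combine (the table names, OSS paths, and chosen property values are illustrative assumptions, not from this topic):

```sql
-- Illustrative sketch: paths, table names, and values are placeholders.
-- Built-in text parser: skip one header row, read the string \N as NULL,
-- use a semicolon delimiter, and ignore empty lines.
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_props
(
    id   BIGINT,
    name STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.text.option.header.lines.count'='1',
    'odps.text.option.null.indicator'='\N',
    'odps.text.option.delimiter'=';',
    'odps.text.option.ignore.empty.lines'='true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo-props/';

-- Built-in open source parser: quoted fields with a backslash escape.
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_opencsv_quoted
(
    id   STRING,
    note STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH serdeproperties (
    'separatorChar'=',',
    'quoteChar'='"',
    'escapeChar'='\\'
)
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo-quoted/';
```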
tblproperties parameters
Applicable parser | Property name | Scenario | Description | Property value | Default value
Built-in open source data parser (OpenCSVSerde) | skip.header.line.count | Add this property to ignore the first N rows of a CSV file saved in the TEXTFILE format. | When MaxCompute reads data from OSS, it ignores the specified number of rows starting from the first row. | A non-negative integer | None
Built-in open source data parser | skip.footer.line.count | Add this property to ignore the last N rows of a CSV file saved in the TEXTFILE format. | When MaxCompute reads data from OSS, it ignores the specified number of rows starting from the last row. | A non-negative integer | None
Built-in open source data parser | mcfed.mapreduce.output.fileoutputformat.compress | Add this property to write TEXTFILE data files to OSS in a compressed format. | TEXTFILE compression property. Set this parameter to True to allow MaxCompute to write TEXTFILE data files to OSS in a compressed format. Otherwise, the data is not compressed. | True or False | False
Built-in open source data parser | mcfed.mapreduce.output.fileoutputformat.compress.codec | Add this property to write TEXTFILE data files to OSS in a compressed format. | TEXTFILE compression property. Sets the compression method for TEXTFILE data files. Note: Only four compression methods are supported for TEXTFILE data files. | A supported compression codec class name | None
Built-in open source data parser | io.compression.codecs | Add this property if the OSS data file is in the Raw-Snappy format. | Set this parameter to the specified codec class to allow MaxCompute to read the compressed data. Otherwise, MaxCompute cannot read the data. | com.aliyun.odps.io.compress.SnappyRawCodec | None
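The io.compression.codecs property has no worked example later in this topic, so here is a hedged sketch (the table name and OSS path are hypothetical placeholders; the codec class comes from the table above):

```sql
-- Illustrative sketch: path and table name are placeholders.
-- Reads a Raw-Snappy-compressed TEXTFILE via the codec listed above.
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_raw_snappy
(
    id   STRING,
    name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo-snappy/'
TBLPROPERTIES (
    'io.compression.codecs'='com.aliyun.odps.io.compress.SnappyRawCodec'
);
```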
Write data
For more information about the write syntax in MaxCompute, see Write syntax.
Query analysis
For more information about the SELECT syntax, see Query syntax.
For more information about optimizing query plans, see Query optimization.
More flexible type parsing with Smart Parse
MaxCompute SQL uses Data types (Version 2.0) to read and write data for CSV-formatted OSS external tables. This provides compatibility with various value formats in CSV files, whereas previously only values in strict formats were supported. The following table provides more details.
Type | Parsing behavior
BOOLEAN | The input is trimmed before parsing. If the trimmed value is not in the supported enumeration, parsing fails.
TINYINT | 8-bit signed integer. An error is reported if the value is outside the range -128 to 127.
SMALLINT | 16-bit signed integer. An error is reported if the value is outside the range -32768 to 32767.
INT | 32-bit signed integer. An error is reported if the value is outside the range -2147483648 to 2147483647.
BIGINT | 64-bit signed integer. An error is reported if the value is outside the range -9223372036854775808 to 9223372036854775807.
FLOAT | Special values (case-insensitive) include NaN, Inf, -Inf, Infinity, and -Infinity. An error is reported if the value is outside the value range. If the precision is exceeded, the value is rounded and truncated.
DOUBLE | Special values (case-insensitive) include NaN, Inf, -Inf, Infinity, and -Infinity. An error is reported if the value is outside the value range. If the precision is exceeded, the value is rounded and truncated.
DECIMAL(precision,scale), for example DECIMAL(15,2) | An error is reported if the integer part exceeds precision minus scale digits. If the fractional part exceeds the scale, the value is rounded and truncated.
CHAR(n), for example CHAR(7) | The maximum length is 255. If the length is less than n, the value is padded with spaces; the padding spaces are not used in comparisons. If the length exceeds n, the string is truncated.
VARCHAR(n), for example VARCHAR(7) | The maximum length is 65535. If the length exceeds n, the string is truncated.
STRING | The length limit is 8 MB.
DATE | You can customize the parsing method for input by setting odps.text.option.date.io.format in tblproperties.
TIMESTAMP_NTZ | OpenCSVSerde does not support this type because it is not compatible with the Hive data format.
DATETIME | Parsed and output in the system time zone. For example, if the system time zone is Asia/Shanghai, values are interpreted in that time zone.
TIMESTAMP | Parsed and output in the system time zone. For example, if the system time zone is Asia/Shanghai, values are interpreted in that time zone.
Notes
For any data type, an empty string in a CSV data file is parsed as NULL when read into the table.
Complex types such as STRUCT, ARRAY, and MAP are not supported because their values can easily conflict with common CSV delimiters, such as the comma (,), which can cause parsing to fail.
BINARY and INTERVAL types are not currently supported. To use these types, you can contact MaxCompute technical support.
For the INT, SMALLINT, TINYINT, BIGINT, FLOAT, DOUBLE, and DECIMAL data types, the default parsing capability is flexible. If you want to parse only basic number strings, you can set the odps.text.option.smart.parse.level parameter to naive in tblproperties. This configures the parser to support only regular number strings, such as "123" and "123.456". Parsing other string formats results in an error.
For the four date and time-related types (DATE, DATETIME, TIMESTAMP, and TIMESTAMP_NTZ), the underlying code uses java.time.format.DateTimeFormatter for processing. MaxCompute has several built-in default parsing formats. You can also define multiple parsing formats and one output format by setting odps.text.option.<date|datetime|timestamp|timestamp_ntz>.io.format in tblproperties. Separate parsing formats with a number sign (#). Custom formats have a higher priority than MaxCompute's built-in parsing formats, and the output format uses the first custom format. For example, if you define a custom format string for the DATE type as pattern1#pattern2#pattern3, you can parse strings that match pattern1, pattern2, or pattern3. However, when you write data to a file, the output uses the pattern1 format. For more information, see DateTimeFormatter.
Do not use 'z' (time-zone name) as a custom time zone format, especially for users in China, because 'z' can be ambiguous in some contexts. For example, 'CST' usually means China Standard Time (UTC+8) in China. However, when 'CST' is parsed by java.time.format.DateTimeFormatter, it is identified as US Central Standard Time (UTC-6). This can lead to unexpected input or output. Use the 'x' (zone-offset) or 'VV' (time-zone ID) pattern instead.
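A hedged sketch combining the two tblproperties described above (the table name, OSS path, and chosen patterns are illustrative assumptions; the property names come from this topic):

```sql
-- Illustrative sketch: path, table name, and patterns are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_strict
(
    amount   DECIMAL(15,2),
    happened DATE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo-strict/'
TBLPROPERTIES (
    -- accept only plain number strings such as "123" and "123.456"
    'odps.text.option.smart.parse.level'='naive',
    -- parse either pattern; output always uses the first one (MM/dd/yyyy)
    'odps.text.option.date.io.format'='MM/dd/yyyy#yyyy-MM-dd'
);
```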
Examples
When you run the following sample code, replace <uid> with your Alibaba Cloud account ID.
The role used in the following examples is aliyunodpsdefaultrole. If you want to use a different role, replace aliyunodpsdefaultrole with the name of the target role and grant the target role permissions to access OSS.
Create an OSS external table using the built-in text data parser
Example 1: Non-partitioned table
Map the table to the Demo1/ folder in the sample data. The following sample code shows how to create an OSS external table.

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external1
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo1/';
-- You can run the DESC EXTENDED mc_oss_csv_external1; command to view the schema of the created OSS external table.

Query the non-partitioned external table.

SELECT * FROM mc_oss_csv_external1;

The following result is returned:

+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
| 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
| 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
| 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
| 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
| 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
| 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
| 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
| 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
| 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Write data to the non-partitioned external table and check whether the data is written.

INSERT INTO mc_oss_csv_external1 VALUES (1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');
SELECT * FROM mc_oss_csv_external1 WHERE recordId=12;

The following result is returned:

+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+

A new file is generated in the Demo1/ path in OSS.
Example 2: Partitioned table
Map the table to the Demo2/ folder in the sample data. The following sample code shows how to create an OSS external table and import partition data.

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external2
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING
)
PARTITIONED BY (
    direction STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo2/';
-- You can run the DESC EXTENDED mc_oss_csv_external2; command to view the schema of the created external table.

Import partition data. If the OSS external table that you created is a partitioned table, you must also import the partition data. For more information, see Syntax for adding partition data to an OSS external table.

MSCK REPAIR TABLE mc_oss_csv_external2 ADD PARTITIONS;
--This is equivalent to the following statement.
ALTER TABLE mc_oss_csv_external2
    ADD PARTITION (direction = 'N')
        PARTITION (direction = 'NE')
        PARTITION (direction = 'S')
        PARTITION (direction = 'SW')
        PARTITION (direction = 'W');

Query the partitioned external table.

SELECT * FROM mc_oss_csv_external2 WHERE direction='NE';

The following result is returned:

+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
| 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
| 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+

Write data to the partitioned external table and check whether the data is written.

INSERT INTO mc_oss_csv_external2 PARTITION (direction='NE') VALUES (1,12,76,1,46.81006,-92.08174,'9/14/2014 0:10');
SELECT * FROM mc_oss_csv_external2 WHERE direction='NE' AND recordId=12;

The following result is returned:

+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | NE         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+

A new file is generated in the Demo2/direction=NE path in OSS.
Example 3: Compressed data
This example shows how to create a GZIP-compressed CSV external table and perform read and write operations.
Create an internal table and write test data to it.

CREATE TABLE vehicle_test
(
    vehicleid INT,
    recordid INT,
    patientid INT,
    calls INT,
    locationlatitute DOUBLE,
    locationLongitude DOUBLE,
    recordtime STRING,
    direction STRING
);
INSERT INTO vehicle_test VALUES (1,1,51,1,46.81006,-92.08174,'9/14/2014 0:00','S');

Create a GZIP-compressed CSV external table and map it to the Demo3/ (compressed data) folder in the sample data. The following sample statement is used to create an OSS external table:

CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_csv_external3
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
PARTITIONED BY (dt STRING)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    'odps.text.option.gzip.input.enabled'='true',
    'odps.text.option.gzip.output.enabled'='true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo3/';
-- Import partition data.
MSCK REPAIR TABLE mc_oss_csv_external3 ADD PARTITIONS;
-- You can run the DESC EXTENDED mc_oss_csv_external3; command to view the schema of the created external table.

Use the MaxCompute client to read data from OSS. The following is a sample command:

Note: If the compressed data in OSS is in an open source data format, you must add the set odps.sql.hive.compatible=true; command before the SQL statement and submit them together.

--Enable a full table scan. This setting is valid only for the current session.
SET odps.sql.allow.fullscan=true;
SELECT recordId, patientId, direction FROM mc_oss_csv_external3 WHERE patientId > 25;

The following result is returned:

+------------+------------+------------+
| recordid   | patientid  | direction  |
+------------+------------+------------+
| 1          | 51         | S          |
| 3          | 48         | NE         |
| 4          | 30         | W          |
| 5          | 47         | S          |
| 7          | 53         | N          |
| 8          | 63         | SW         |
| 10         | 31         | N          |
+------------+------------+------------+

Read data from the internal table and write it to the OSS external table.

You can run the INSERT OVERWRITE or INSERT INTO command on the external table using the MaxCompute client to write data to OSS.

INSERT INTO TABLE mc_oss_csv_external3 PARTITION (dt='20250418') SELECT * FROM vehicle_test;

After the command is successfully executed, you can view the exported file in the OSS folder.
Create an OSS external table and specify the first row of the corresponding OSS file as the header
Create the Demo11 folder in the oss-mc-test bucket from the sample data and run the following statements:
--Create an external table.
CREATE EXTERNAL TABLE mf_oss_wtt
(
id BIGINT,
name STRING,
tran_amt DOUBLE
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties (
'odps.text.option.header.lines.count' = '1',
'odps.sql.text.option.flush.header' = 'true'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo11/';
--Insert data.
INSERT OVERWRITE TABLE mf_oss_wtt VALUES (1, 'val1', 1.1),(2, 'value2', 1.3);
--Query data.
--When you create the table, you can define all fields as STRING. Otherwise, an error occurs when the header is read.
--Alternatively, add the 'odps.text.option.header.lines.count' = '1' parameter when you create the table to skip the header.
SELECT * FROM mf_oss_wtt;

The following result is returned:
+----------+--------+------------+
| id | name | tran_amt |
+----------+--------+------------+
| 1 | val1 | 1.1 |
| 2 | value2 | 1.3 |
+----------+--------+------------+

Create an OSS external table with a mismatched column count
Create the demo folder in the oss-mc-test bucket from the sample data and upload the test.csv file. The test.csv file contains the following content.

1,kyle1,this is desc1
2,kyle2,this is desc2,this is two
3,kyle3,this is desc3,this is three, I have 4 columns

Create an external table.

Set the behavior for rows with a mismatched number of columns to TRUNCATE.

--Delete the table.
DROP TABLE test_mismatch;
--Create a new external table.
CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch
(
    id STRING,
    name STRING,
    dect STRING,
    col4 STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'truncate')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';

Set the behavior for rows with a mismatched number of columns to IGNORE.

--Delete the table.
DROP TABLE test_mismatch01;
--Create a new external table.
CREATE EXTERNAL TABLE IF NOT EXISTS test_mismatch01
(
    id STRING,
    name STRING,
    dect STRING,
    col4 STRING
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
WITH serdeproperties ('odps.sql.text.schema.mismatch.mode' = 'ignore')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/';

Query the table data.

Query the test_mismatch table.

SELECT * FROM test_mismatch;
--Result
+----+-------+---------------+---------------+
| id | name  | dect          | col4          |
+----+-------+---------------+---------------+
| 1  | kyle1 | this is desc1 | NULL          |
| 2  | kyle2 | this is desc2 | this is two   |
| 3  | kyle3 | this is desc3 | this is three |
+----+-------+---------------+---------------+

Query the test_mismatch01 table.

SELECT * FROM test_mismatch01;
--Result
+----+-------+---------------+-------------+
| id | name  | dect          | col4        |
+----+-------+---------------+-------------+
| 2  | kyle2 | this is desc2 | this is two |
+----+-------+---------------+-------------+
Create an OSS external table using the built-in open source parser
This example shows how to create an OSS external table using the built-in open source parser to read a comma-separated file and ignore the first and last rows of data.
Create the demo-test folder in the oss-mc-test bucket from the sample data and upload the test file test.csv. The test file contains the following data.

1,1,51,1,46.81006,-92.08174,9/14/2014 0:00,S
1,2,13,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,3,48,1,46.81006,-92.08174,9/14/2014 0:00,NE
1,4,30,1,46.81006,-92.08174,9/14/2014 0:00,W
1,5,47,1,46.81006,-92.08174,9/14/2014 0:00,S
1,6,9,1,46.81006,-92.08174,9/15/2014 0:00,S
1,7,53,1,46.81006,-92.08174,9/15/2014 0:00,N
1,8,63,1,46.81006,-92.08174,9/15/2014 0:00,SW
1,9,4,1,46.81006,-92.08174,9/15/2014 0:00,NE
1,10,31,1,46.81006,-92.08174,9/15/2014 0:00,N

Create an external table, specify the delimiter as a comma, and set parameters to ignore the first and last rows.

CREATE EXTERNAL TABLE ext_csv_test08
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH serdeproperties (
    "separatorChar" = ","
)
STORED AS TEXTFILE
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/***/'
-- Set parameters to ignore the first and last rows.
TBLPROPERTIES (
    "skip.header.line.count"="1",
    "skip.footer.line.count"="1"
);

Read the external table.

SELECT * FROM ext_csv_test08;
-- The result shows that only 8 rows were read, ignoring the first and last rows.
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongitude | recordtime     | direction  |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
| 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
| 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | NE         |
| 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | W          |
| 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          |
| 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | S          |
| 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | N          |
| 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | SW         |
| 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/15/2014 0:00 | NE         |
+------------+------------+------------+------------+------------------+-------------------+----------------+------------+
Create a CSV-formatted OSS external table with custom date and time data types
For more information about custom parsing and output formats for CSV time types, see More flexible type parsing with Smart Parse.
Create a CSV external table for various time data types (datetime, timestamp, and timestamp_ntz).
CREATE EXTERNAL TABLE test_csv
(
    col_date DATE,
    col_datetime DATETIME,
    col_timestamp TIMESTAMP,
    col_timestamp_ntz TIMESTAMP_NTZ
)
STORED BY 'com.aliyun.odps.CsvStorageHandler'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/demo/'
TBLPROPERTIES (
    'odps.text.option.date.io.format' = 'MM/dd/yyyy',
    'odps.text.option.datetime.io.format' = 'yyyy-MM-dd-HH-mm-ss x',
    'odps.text.option.timestamp.io.format' = 'yyyy-MM-dd HH-mm-ss VV',
    'odps.text.option.timestamp_ntz.io.format' = 'yyyy-MM-dd HH:mm:ss.SS'
);
INSERT OVERWRITE TABLE test_csv VALUES (DATE'2025-02-21', DATETIME'2025-02-21 08:30:00', TIMESTAMP'2025-02-21 12:30:00', TIMESTAMP_NTZ'2025-02-21 16:30:00.123456789');

After you insert the data, the content of the CSV file is:

02/21/2025,2025-02-21-08-30-00 +08,2025-02-21 12-30-00 Asia/Shanghai,2025-02-21 16:30:00.12

Read the data again to see the result.

SELECT * FROM test_csv;

The following result is returned:

+------------+---------------------+---------------------+------------------------+
| col_date   | col_datetime        | col_timestamp       | col_timestamp_ntz      |
+------------+---------------------+---------------------+------------------------+
| 2025-02-21 | 2025-02-21 08:30:00 | 2025-02-21 12:30:00 | 2025-02-21 16:30:00.12 |
+------------+---------------------+---------------------+------------------------+