All Products
Search
Document Center

TSV

Last Updated: Jul 25, 2019

This topic describes how to create tables for TSV data in DLA.

Prerequisites

Before creating tables, you must create an OSS schema first. In the example, all tables use the following OSS schema:

  1. CREATE SCHEMA dla_oss_db with DBPROPERTIES(
  2. catalog='oss',
  3. location 'oss://dlaossfile1/dla/'
  4. );

Single character-separated files

TSV files are also plain text files. Columns are separated by tabs.

For example, the OSS instance has a file oss://dlaossfile1/dla/tsv_single.tsv whose columns are separated by tabs. It stores the following data:

  1. Beijing China 010
  2. ShangHai China 021
  3. Tianjin China 022

The following is a sample statement for creating a table based on the file:

  1. CREATE EXTERNAL TABLE tsv_single (
  2. city STRING,
  3. country STRING,
  4. code INT
  5. )
  6. ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  7. STORED AS TEXTFILE
  8. LOCATION 'oss://dlaossfile1/dla/';

The query result is as follows:

  1. SELECT * FROM dla_oss_db.tsv_single
  2. |----------|--------------|
  3. | city |country| code |
  4. |----------|----- |-----|
  5. | Beijing | China| 10 |
  6. | ShangHai | China| 21 |
  7. | Tianjin | China| 22 |

Multi character-separated files

Separators of fields in TSV files can contain multiple characters. For example, fields of each column in tsv_multidelimit.tsv are separated by ||. || can be replaced by a specific separator string.

  1. https://farm7.staticflickr.com/5769/21094803716_da3cea21b8_o.jpg||7825604||EEGQ0ZQ/OxJAIGzfJ6LMvw==
  2. https://c1.staticflickr.com/4/3743/11539980784_b52f835317_o.jpg||4227077||oc+H0KgW0sxyEgAlfOiQDQ==

The following is a sample statement for creating a table based on the file:

  1. CREATE EXTERNAL TABLE tsv_multidelimit (
  2. image_address STRING,
  3. id STRING,
  4. name STRING
  5. )
  6. ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
  7. with serdeproperties(
  8. "field.delim"="||"
  9. )
  10. STORED AS TEXTFILE LOCATION 'oss://dlaossfile1/dla/';

The query result is as follows:

  1. SELECT * FROM dla_oss_db.tsv_multidelimit
  2. |-----------------------------------------------------------------|---|--------|
  3. | image_address | id| name |
  4. | https://farm7.staticflickr.com/5769/21094803716_da3cea21b8_o.jpg
  5. | 7825604| EEGQ0ZQ/OxJAIGzfJ6LMvw==|
  6. | https://c1.staticflickr.com/4/3743/11539980784_b52f835317_o.jpg
  7. | 4227077| oc+H0KgW0sxyEgAlfOiQDQ==|