All Products
Search
Document Center

Execute INSERT OVERWRITE to insert data into an OSS partition table

Last Updated: May 13, 2019

This topic describes how to use the INSERT OVERWRITE statement to overwrite data in an OSS partition table.

Prerequisites

When DLA performs an INSERT OVERWRITE operation, it deletes historical data in an OSS partition table before writing new data. Therefore, you must follow these steps to grant DLA the permission to delete OSS files first.

Step 1: Create a custom authorization policy

  1. Use your DLA account to log on to the RAM console.

  2. In the left-side navigation pane, choose Policies. On the Polices page that appears, click Create Policy. On the Create Custom Policy page, configure the parameters as follows:

    • Configuration Mode: Select Script.

    • Policy Document: Copy and paste the following content into the text box. Replace “your-bucket-name” with your OSS bucket name. If you want DLA permissions at a smaller granularity, enter the complete path of the OSS file that you want to delete.

      1. {
      2. "Version": "1",
      3. "Statement": [
      4. {
      5. "Action": [
      6. "oss:DeleteObject"
      7. ],
      8. "Resource": "acs:oss:*:*:<your-bucket-name>/*",
      9. "Effect": "Allow"
      10. }
      11. ]
      12. }

      1

Step 2: Grant the authorization policy to the role of DLA

  1. In the left-side navigation pane, choose RAM Roles. On the RAM Roles page that appears, locate the target RAM role AliyunOpenAnalyticsAccessingOSSRole.

    2

  2. Click the role name. On the Permissions tab page that appears by default, click Add Permissions.

    3

  3. In the Add Permissions dialog box that appears, select Custom Policy . Then, select the policy you created in step 1 and grant it to the role of DLA (AliyunOpenAnalyticsAccessingOSSRole).

    4

Procedure

Step 1: Create an OSS schema

  1. CREATE SCHEMA dla_oss_demo with DBPROPERTIES(
  2. catalog='oss',
  3. location= 'oss://bucket-name/dla/'
  4. );

location: indicates the file storage directory in the OSS bucket, which must end with a slash (/).

Step 2: Create an OSS partition table

  1. CREATE EXTERNAL TABLE dla (
  2. prod_name string,
  3. cnt bigint)
  4. PARTITIONED BY (y string, m string)
  5. ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  6. STORED AS TEXTFILE
  7. LOCATION 'oss://bucket-name/dla/';

OSS files are stored in the following path:

  • https://bucket-name.oss-cn-hangzhou.aliyuncs.com/dla/y=2019/m=3/cnt.txt

    5

  • https://bucket-name.oss-cn-hangzhou.aliyuncs.com/dla/y=2018/m=12/cnt2.txt

    6

Step 3: Run the MSCK command to update partition information

  1. MSCK REPAIR TABLE dla;

Step 4: Run the SHOW PARTITIONS command to view partition information

  1. show partitions dla;
  2. +-----------+
  3. | y=2018/m=12|
  4. | y=2019/m=2 |

Step 5: Query data in the partition table

  1. select * from dla;
  2. +---------------------------+
  3. |prod_name | cnt | y | m |
  4. +---------------------------+
  5. |Mouse | 1 |2019 | 3 |
  6. |Monitor | 2 |2019 | 3 |
  7. |Keyboard | 4 |2019 | 3 |
  8. |Desk | 100 |2018 | 12 |
  9. |Chair | 150 |2018 | 12 |

Step 6: Insert a data record into the partition table

  1. insert into dla values ("Hard disk",10,"2019","3");
  2. +---------------------------+
  3. select * from dla;
  4. +---------------------------+
  5. |prod_name | cnt | y | m |
  6. +---------------------------+
  7. |Hard disk | 10 |2019 | 3 |
  8. |Mouse | 1 |2019 | 3 |
  9. |Monitor | 2 |2019 | 3 |
  10. |Keyboard | 4 |2019 | 3 |
  11. |Desk | 100 |2018 | 12 |
  12. |Chair | 150 |2018 | 12 |

Step 7: Execute the INSERT OVERWRITE statement to insert the data into the partition table

  1. insert overwrite dla values ("Computer",10,"2019","3");
  2. select * from dla;
  3. +---------------------------+
  4. |prod_name | cnt | y | m |
  5. +---------------------------+
  6. |Computer | 10 |2019 | 3 |
  7. |Desk | 100 |2018 | 12 |
  8. |Chair | 150 |2018 | 12 |

After the INSERT OVERWRITE statement is executed, the original data in the table is replaced with the new data record. In the OSS bucket, the data in the path of dla/y=2018/m=12/ remains unchanged, whereas the cnt.txt file in dla/y=2019/m=3/ has been deleted, leaving only the data record that is inserted by the INSERT OVERWRITE statement.