Data import

Last Updated: May 07, 2018

Data can be imported into MaxCompute by using Tunnel commands, MaxCompute Studio, or the Tunnel SDK, as described in the following sections.

For data export, see the download commands in Tunnel Commands.

Tunnel commands

Data preparation

In the following example, the contents of a local file ‘wc_example.txt’ are as follows:

  Hello World!

The file is saved into the directory: D:\odps\odps\bin.

Create a MaxCompute table

To import the data created in the preceding step, a MaxCompute table must be created. An example is as follows:

  CREATE TABLE wc_in (word string);

Run Tunnel command

To import the data in the MaxCompute console, run the following Tunnel command:

  tunnel upload D:\odps\odps\bin\wc_example.txt wc_in;

After the command runs successfully, check the records in the table wc_in:

  odps@ $odps_project>select * from wc_in;
  ID = 20150918110501864g5z9c6
  Log view:
  http://webconsole.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=odps_public_dev&i=20150918
  QWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL29kcHNfcHVibGljX2Rldi9pbnN0YW5jZXMvMjAxNTA5MTgxMTA1MDE4NjRnNXo5YzYiXX1dLC
  +--------------+
  | word         |
  +--------------+
  | Hello World! |
  +--------------+

Note:

  • For more information about Tunnel commands, such as how to import data into a partitioned table, see Tunnel Operation.

  • If the table has multiple columns, you can specify the column separator with the -fd parameter.
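For example, if a local file uses commas as the column separator and a matching two-column table exists (the file name wc_example2.txt and table name wc_in2 below are hypothetical, chosen only for illustration), the upload command might look like this:

```
tunnel upload D:\odps\odps\bin\wc_example2.txt wc_in2 -fd ",";
```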

MaxCompute Studio

Before using MaxCompute Studio, make sure that you have installed MaxCompute Studio and configured the project space connection.

Data preparation

In the following example, the contents of a local file ‘wc_example.txt’ are as follows:

  Hello World!

The file is saved into the directory: D:\odps\odps\bin.

Create a MaxCompute table

To import the data created in the preceding step, a MaxCompute table must be created first. Right-click tables&views in the project and proceed as follows:

(Screenshot: create table)

If the statement executes successfully, the table is created.

Upload data files

In the tables&views list of the project, right-click the table name created in the preceding step. If the table name does not appear in the list, click the refresh button.

(Screenshot: load data)

For more information, see Import and Export Data.

Tunnel SDK

Scenario

Upload data into MaxCompute, where the project is "odps_public_dev", the table name is "tunnel_sample_test", and the partition is pt=20150801,dt=hangzhou.
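The partition spec above follows the Tunnel convention of comma-separated key=value pairs. As a rough plain-Java illustration of that string format (the SDK itself provides com.aliyun.odps.PartitionSpec for this; the class below is only a hypothetical sketch, not the SDK implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of parsing a partition spec string such as
// "pt=20150801,dt=hangzhou" into key/value pairs. In real code, use the
// SDK class com.aliyun.odps.PartitionSpec instead.
public class PartitionSpecSketch {
    public static Map<String, String> parse(String spec) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            // Split each "key=value" pair on the first '='.
            String[] kv = pair.trim().split("=", 2);
            // Strip optional surrounding single quotes from the value.
            parts.put(kv[0].trim(), kv[1].trim().replaceAll("^'|'$", ""));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("pt=20150801,dt=hangzhou"));
        // prints {pt=20150801, dt=hangzhou}
    }
}
```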

Procedure

  1. Create a table and add corresponding partitions:

    CREATE TABLE IF NOT EXISTS tunnel_sample_test(
        id STRING,
        name STRING)
    PARTITIONED BY (pt STRING, dt STRING); -- Create the table.
    ALTER TABLE tunnel_sample_test
        ADD IF NOT EXISTS PARTITION (pt='20150801', dt='hangzhou'); -- Add the partition.
  2. Create the program directory structure of UploadSample as follows:

    |---pom.xml
    |---src
        |---main
            |---java
                |---com
                    |---aliyun
                        |---odps
                            |---tunnel
                                |---example
                                    |---UploadSample.java
    • pom.xml: the Maven project file.
    • UploadSample.java: the Tunnel example source file.
  3. Write the UploadSample program as follows:

    package com.aliyun.odps.tunnel.example;

    import java.io.IOException;
    import java.util.Date;

    import com.aliyun.odps.Column;
    import com.aliyun.odps.Odps;
    import com.aliyun.odps.PartitionSpec;
    import com.aliyun.odps.TableSchema;
    import com.aliyun.odps.account.Account;
    import com.aliyun.odps.account.AliyunAccount;
    import com.aliyun.odps.data.Record;
    import com.aliyun.odps.data.RecordWriter;
    import com.aliyun.odps.tunnel.TableTunnel;
    import com.aliyun.odps.tunnel.TableTunnel.UploadSession;
    import com.aliyun.odps.tunnel.TunnelException;

    public class UploadSample {
        private static String accessId = "####";
        private static String accessKey = "####";
        private static String tunnelUrl = "http://dt-corp.odps.aliyun-inc.com";
        private static String odpsUrl = "http://service-corp.odps.aliyun-inc.com/api";
        private static String project = "odps_public_dev";
        private static String table = "tunnel_sample_test";
        private static String partition = "pt=20150801,dt=hangzhou";

        public static void main(String[] args) {
            Account account = new AliyunAccount(accessId, accessKey);
            Odps odps = new Odps(account);
            odps.setEndpoint(odpsUrl);
            odps.setDefaultProject(project);
            try {
                TableTunnel tunnel = new TableTunnel(odps);
                tunnel.setEndpoint(tunnelUrl);
                PartitionSpec partitionSpec = new PartitionSpec(partition);
                UploadSession uploadSession = tunnel.createUploadSession(project,
                        table, partitionSpec);
                System.out.println("Session Status is : "
                        + uploadSession.getStatus().toString());
                TableSchema schema = uploadSession.getSchema();
                RecordWriter recordWriter = uploadSession.openRecordWriter(0);
                // Fill one record with a sample value for each column type.
                Record record = uploadSession.newRecord();
                for (int i = 0; i < schema.getColumns().size(); i++) {
                    Column column = schema.getColumn(i);
                    switch (column.getType()) {
                        case BIGINT:
                            record.setBigint(i, 1L);
                            break;
                        case BOOLEAN:
                            record.setBoolean(i, true);
                            break;
                        case DATETIME:
                            record.setDatetime(i, new Date());
                            break;
                        case DOUBLE:
                            record.setDouble(i, 0.0);
                            break;
                        case STRING:
                            record.setString(i, "sample");
                            break;
                        default:
                            throw new RuntimeException("Unknown column type: "
                                    + column.getType());
                    }
                }
                // Write the same record ten times, then commit the session.
                for (int i = 0; i < 10; i++) {
                    recordWriter.write(record);
                }
                recordWriter.close();
                uploadSession.commit(new Long[]{0L});
                System.out.println("upload success!");
            } catch (TunnelException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    Note:

    The AccessKeyId and AccessKeySecret are left as placeholders in the preceding example. In actual use, replace them with your own AccessKeyId and AccessKeySecret.
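    As a minimal sketch of one way to avoid hardcoding credentials, they could be read from environment variables. The variable names ODPS_ACCESS_ID and ODPS_ACCESS_KEY below are assumptions for illustration, not an official convention:

```java
// Sketch: resolve MaxCompute credentials from environment variables,
// falling back to a placeholder when a variable is unset. The variable
// names used here are hypothetical.
public class CredentialEnv {
    static String envOrDefault(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        String accessId = envOrDefault("ODPS_ACCESS_ID", "####");
        String accessKey = envOrDefault("ODPS_ACCESS_KEY", "####");
        // Report whether real credentials were supplied via the environment.
        System.out.println("accessId from environment: " + !accessId.equals("####"));
    }
}
```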

  4. The configuration of pom.xml is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.aliyun.odps.tunnel.example</groupId>
        <artifactId>UploadSample</artifactId>
        <version>1.0-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>com.aliyun.odps</groupId>
                <artifactId>odps-sdk-core-internal</artifactId>
                <version>0.20.7</version>
            </dependency>
        </dependencies>
        <repositories>
            <repository>
                <id>alibaba</id>
                <name>alibaba Repository</name>
                <url>http://mvnrepo.alibaba-inc.com/nexus/content/groups/public/</url>
            </repository>
        </repositories>
    </project>
  5. Compile and run:

    Compile the program UploadSample:

    mvn package

    Run the program UploadSample. Here, Eclipse is used to import the Maven project:

    1. Right-click on the Java program and click Import > Maven > Existing Maven Projects.

    2. Right-click on ‘UploadSample.java’ and click Run As > Run Configurations.

    3. Click Run. After running successfully, the console shows as follows:

      Session Status is : NORMAL
      upload success!
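    As an alternative to running from Eclipse, the packaged program can also be launched from the command line. This is a sketch that assumes the exec-maven-plugin is resolvable from the configured repository; the credentials and endpoints in the source must be valid for the upload to succeed:

```
mvn exec:java -Dexec.mainClass="com.aliyun.odps.tunnel.example.UploadSample"
```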
  6. Check the running result.
    Enter the following statement on the console:

    select * from tunnel_sample_test;

    The result is shown as follows:

    +--------+--------+----------+----------+
    | id     | name   | pt       | dt       |
    +--------+--------+----------+----------+
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    | sample | sample | 20150801 | hangzhou |
    +--------+--------+----------+----------+


Other import methods

In addition to the MaxCompute console and the Tunnel Java SDK, data can also be imported through Alibaba Cloud DTplus products, Sqoop, Fluentd, Flume, LogStash, and other tools.
