All Products
Search
Document Center

DataWorks:Create tables and upload data

Last Updated:Aug 17, 2023

This topic describes how to create tables in the DataWorks console and upload data from your on-premises machine to the tables. The bank_data and result_table tables are used in the examples.

Prerequisites

A MaxCompute compute engine is associated with the workspace in which you want to create tables. The MaxCompute service is available in a workspace only after you associate a MaxCompute compute engine with the workspace on the Computing engine information tab of the Workspace page in the DataWorks console. For more information, see Create and manage workspaces.

Background information

The bank_data table is used to store business data and the result_table table is used to store data analytics results.

Create the bank_data table

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the Create icon and choose Create Table > MaxCompute > Table. Create a MaxCompute table

    Alternatively, you can find the desired workflow in the Business Flow section, right-click MaxCompute, and then select Create Table.

  3. In the Create Table dialog box, configure the Path parameter, set the Name parameter to bank_data, and then click Create.

    Important

    If multiple MaxCompute compute engines are associated with the current workspace as compute engine instances, you must select a MaxCompute compute engine instance from the Engine Instance drop-down list in the Create Table dialog box.

  4. On the table configuration tab, click DDL.

  5. In the DDL dialog box, enter the following statement and click Generate Table Schema:

    CREATE TABLE IF NOT EXISTS bank_data
    (
     age             BIGINT COMMENT 'Age',
     job             STRING COMMENT 'Job type',
     marital         STRING COMMENT 'Marital status',
     education       STRING COMMENT 'Education level',
     default         STRING COMMENT 'Credit card',
     housing         STRING COMMENT 'Mortgage',
     loan            STRING COMMENT 'Loan',
     contact         STRING COMMENT 'Contact information',
     month           STRING COMMENT 'Month',
     day_of_week     STRING COMMENT 'Day of the week',
     duration        STRING COMMENT 'Duration',
     campaign        BIGINT COMMENT 'Number of contacts during the campaign',
     pdays           DOUBLE COMMENT 'Interval from the last contact',
     previous        DOUBLE COMMENT 'Number of contacts with the customer',
     poutcome        STRING COMMENT 'Result of the previous marketing campaign',
     emp_var_rate    DOUBLE COMMENT 'Employment change rate',
     cons_price_idx  DOUBLE COMMENT 'Consumer price index',
     cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
     euribor3m       DOUBLE COMMENT 'Euro deposit rate',
     nr_employed     DOUBLE COMMENT 'Number of employees',
     y               BIGINT COMMENT 'Time deposit available or not'
    );

    For more information about the SQL syntax for creating tables, see Create tables.

  6. In the Confirm message, click OK.

  7. Configure the Display Name parameter in the General section and click Commit to DEV and Commit to PROD.

    Note

    A workspace in standard mode is used in the example. If you are using a workspace in basic mode, you only need to click Commit to PROD.

  8. In the left-side navigation pane, click the Workspace Tables icon.

  9. In the Workspace Tables pane, double-click the name of the created table to view the table information.

Create the result_table table

  1. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the Create icon and choose Create Table > Table.

    Alternatively, you can find the desired workflow in the Business Flow section, right-click MaxCompute, and then select Create Table.

  2. In the Create Table dialog box, configure the Path parameter, set the Name parameter to result_table, and then click Create.

  3. In the DDL Statement dialog box, enter the following statement and click Generate Table Schema:

    CREATE TABLE IF NOT EXISTS result_table
    (  
     education   STRING COMMENT 'Education level',
     num         BIGINT COMMENT 'Number of persons'
    );
  4. In the Confirm message, click OK.

  5. Configure the Display Name parameter in the General section and click Commit to DEV and Commit to PROD.

  6. In the left-side navigation pane, click the Workspace Tables icon.

  7. In the Workspace Tables pane, double-click the name of the created table to view the table information.

Upload a file from your on-premises machine to the bank_data table

DataWorks allows you to perform the following operations:

  • Upload a text file from your on-premises machine to a table in a workspace.

  • Use Data Integration to import business data from various data sources to a workspace.

Note

You must take note of the following items when you upload a file from your on-premises machine:

  • File format: The file must be in the .txt, .csv, or .log format.

  • File size: The size of the file cannot exceed 30 MB.

    If you need to upload a file that is greater than 30 MB in size, use one of the following methods:

    • Upload the file to Object Storage Service (OSS) and import the object data to a MaxCompute table by using an external table of MaxCompute. For more information about how to upload a file to OSS, see Upload objects. For more information about how to use external tables of MaxCompute, see External table.

    • Upload the file to OSS and synchronize the object data to a MaxCompute table by using Data Integration. For more information about how to upload a file to OSS, see Upload objects. For more information about how to synchronize data from OSS to MaxCompute, see Configure a batch synchronization node by using the codeless UI.

    • Use the data upload feature provided by DataAnalysis.

  • Table type: You can import data from a file to a partitioned table or a non-partitioned table. If you import data from a file to a partitioned table, values in the partition columns cannot contain special characters such as ampersands (&) and asterisks (*).

In this example, the file banking.txt is uploaded from an on-premises machine to the bank_data table. To upload the file, perform the following steps:

  1. Click the Import icon on the DataStudio page. Upload the file on the DataStudio page

  2. In the Import Local Data to Table in Development Environment dialog box, enter at least three letters in the search box to search for tables, select the bank_data from the search results, and then click Next.

    Note

    If you cannot find the table that you created, you can manually synchronize the table to DataWorks in DataMap. Then, you can search for the table by keyword in the dialog box again. For more information about how to manually synchronize a table, see Manually synchronize a table.

  3. In the dialog box that appears, click Browse next to Select File to select the desired file. The value of the Select Data Import Method parameter is Upload Local File and cannot be changed. Then, configure the other parameters.

    Upload a file

    Parameter

    Description

    Select Data Import Method

    The method that can be used to upload data. Default value: Upload Local File. The default value cannot be changed.

    File Format

    Valid values: CSV and Custom Text File.

    Select File

    The file that you want to upload. To upload a file, click Browse and select the file to upload.

    Select Delimiter

    The delimiter used in the file. Valid values: Comma (,), Tab, Semicolon (;), Space, |, #, and &. In this example, Comma (,) is selected.

    Original Character Set

    The character set of the file. Valid values: GBK, UTF-8, CP936, and ISO-8859. In this example, GBK is selected.

    Import First Row

    The row from which data is to be imported. In this example, 1 is selected.

    First Row as Field Names

    Specifies whether to use the first row as the header row. In this example, Yes is not selected.

    Preview

    The preview result of the data that you want to upload.

  4. Click Next.

  5. Select a matching mode for the fields in the source file and destination table. In this example, By Location is selected.

  6. Click Import Data.

What to do next

You have learned how to create tables and upload data from your on-premises machine to the created tables. You can proceed with the next tutorial. In the next tutorial, you will learn how to create, configure, and commit a workflow, and perform data computing and analytics in the workspace. For more information, see Create a workflow.