Before you process data in a MaxCompute project, you must select development tools and prepare the required environment based on your business requirements. This topic describes how to prepare an environment and install the required development tools.

Prerequisites

A MaxCompute project is created. For more information about how to create a MaxCompute project, see Create a MaxCompute project.

Background information

MaxCompute supports the following development tools.

Query editor (MaxCompute console)
  Manual installation required: No
  Scenarios:
    • When you use and test MaxCompute for the first time, you can use the query editor to explore the core features of MaxCompute by using public datasets.
    • If you are a data analyst, you can use the query editor to query data and then switch to the analysis mode to analyze the query results in Excel Online. Analyzing the results online reduces how often data is transferred and helps keep data secure. You can also download the query results to your device for analysis.
    • If you are a security administrator, you can find the required project and click Project permission management in the Actions column to manage role permissions. This feature is in trial use, so in most scenarios you must run commands to manage permissions. The query editor can run most security commands without additional configuration.

MaxCompute client
  Manual installation required: Yes
  Scenarios: The MaxCompute client is a command-line client that is suitable for all scenarios. You can use it to write and run commands that process data.

DataWorks
  Manual installation required: No
  Scenarios: DataWorks provides comprehensive visual features based on MaxCompute projects, such as data development, data integration, and data services. If you need to schedule jobs periodically, we recommend that you use DataWorks.

MaxCompute Studio
  Manual installation required: Yes
  Scenarios: MaxCompute Studio is a development plug-in based on IntelliJ IDEA that makes data development easier and faster. If you are familiar with IntelliJ IDEA, we recommend that you use MaxCompute Studio.

Prepare an environment

The preceding development tools have the following environment requirements.

Query editor (MaxCompute console): We recommend that you use the latest version of Google Chrome.
MaxCompute client: Java 8 or later must be installed.
DataWorks: We recommend that you use the latest version of Google Chrome.
MaxCompute Studio:
  • Your computer runs Windows, macOS, or Linux.
  • IntelliJ IDEA 2018.2.4 or later is installed. The Ultimate and free Community editions of IntelliJ IDEA are supported, as is PyCharm.
  • Java Runtime Environment (JRE) 1.8 is installed. The latest version of IntelliJ IDEA is bundled with JRE 1.8, so if you use the latest version of IntelliJ IDEA, you do not need to install JRE 1.8 separately.
  • Java Development Kit (JDK) 1.8 is installed. JDK 1.8 is required only if you want to develop and debug user-defined functions (UDFs) in Java.
    Note MaxCompute Studio 0.28.0 and later support JDK 1.9. Earlier versions of MaxCompute Studio support only JDK 1.8.
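The MaxCompute client requires Java 8 or later. The following Python sketch shows one way to check this before you install the client. It assumes that `java` is on your PATH and that `java -version` prints a quoted version string such as "1.8.0_292" or "11.0.2" to standard error, which is the usual behavior of Oracle and OpenJDK builds. The function names are hypothetical and are not part of any MaxCompute tool.

```python
import re
import subprocess

# Minimum Java major version for the MaxCompute client (Java 8 or later).
MIN_CLIENT_JAVA = 8


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output.

    Java 8 and earlier use the legacy "1.x" scheme ("1.8.0_292" -> 8);
    Java 9 and later start with the major number ("11.0.2" -> 11).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no quoted version string found")
    parts = re.split(r"[._+-]", match.group(1))
    major = int(parts[0])
    if major == 1 and len(parts) > 1:
        major = int(parts[1])
    return major


def installed_java_major() -> int:
    """Run `java -version` and return the installed major version."""
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return java_major_version(result.stderr)


def meets_client_requirement(major: int) -> bool:
    """True if the major version satisfies the client's Java 8+ requirement."""
    return major >= MIN_CLIENT_JAVA


print(java_major_version('java version "1.8.0_292"'))  # 8
print(java_major_version('openjdk version "11.0.2" 2019-01-15'))  # 11
```

On your own machine, `meets_client_requirement(installed_java_major())` returns True when the installed runtime is Java 8 or later.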

Install and configure the MaxCompute client

Note MaxCompute client V0.27.0 and later support the MaxCompute V2.0 data type edition. We recommend that you use the MaxCompute V2.0 data type edition. For more information about the supported data types, see MaxCompute V2.0 data type edition.

To install and configure the MaxCompute client, perform the following steps:

  1. Download the MaxCompute client installation package.
  2. Decompress the downloaded package to obtain the bin, conf, lib, and plugins folders.
  3. Open the conf folder and configure the odps_config.ini file.
    The following example shows the content in the odps_config.ini file.
    project_name=
    access_id=
    access_key=
    end_point=
    log_view_host=
    https_check=
    # confirm threshold for query input size(unit: GB)
    data_size_confirm=
    # this url is for odpscmd update
    update_url=
    # download sql results by instance tunnel
    use_instance_tunnel=
    # the max records when download sql results by instance tunnel
    instance_tunnel_max_record=
    # IMPORTANT:
    #   If leaving tunnel_endpoint untouched, console will try to automatically get one from odps service, which might charge networking fees in some cases.
    #   Please refer to Configure endpoints
    # tunnel_endpoint=
    
    # use set.<key>=
    # e.g. set.odps.sql.select.output.format=

    In the odps_config.ini file, lines that start with a number sign (#) are comments. The parameters in the file are described as follows.

    project_name
      Required: Yes
      Description: The name of the MaxCompute project that you want to access.
      If you create a workspace in standard mode, the project names in the production environment and the development environment differ: the names of projects in the development environment end with _dev. Make sure that you specify the correct one. For more information, see Basic mode and standard mode.
      You can log on to the MaxCompute console and view the names of MaxCompute projects on the Project Management tab.
      Example: doc_test_dev

    access_id
      Required: Yes
      Description: The AccessKey ID of your Alibaba Cloud account or of a RAM user within the Alibaba Cloud account. You can obtain the AccessKey ID on the Security Management page.

    access_key
      Required: Yes
      Description: The AccessKey secret that corresponds to the AccessKey ID. You can obtain the AccessKey secret on the Security Management page.

    end_point
      Required: Yes
      Description: The endpoint of MaxCompute. Set this parameter based on the region and the network connection method that you selected when you created the MaxCompute project. For more information about the endpoints for each region and network, see Endpoints.
      Notice If the endpoint that you configure is invalid, an error occurs when you access MaxCompute.
      Example: http://service.cn-hangzhou.maxcompute.aliyun.com/api

    log_view_host
      Required: No
      Description: The Logview URL. You can use this URL to view detailed runtime information about a job, which helps you locate job errors. Set the value to http://logview.odps.aliyun.com.
      Note We recommend that you set this parameter. If you do not set it, you cannot use Logview to locate the cause of job errors.
      Example: http://logview.odps.aliyun.com

    https_check
      Required: No
      Description: Specifies whether to enable HTTPS access. If HTTPS access is enabled, requests to access MaxCompute projects are encrypted. Valid values:
      • True: HTTPS access is used.
      • False: HTTP access is used.
      Default value: False.
      Example: True

    data_size_confirm
      Required: No
      Description: The confirmation threshold for the input data size of a query, in GB, as the comment in the file indicates. No range limit applies. We recommend that you set this parameter to 100.
      Example: 100

    update_url
      Required: No
      Description: A reserved parameter.

    use_instance_tunnel
      Required: No
      Description: Specifies whether to use InstanceTunnel to download the results of SQL statements. Valid values:
      • True: InstanceTunnel is used to download the results of SQL statements.
      • False: InstanceTunnel is not used to download the results of SQL statements.
      Default value: False.
      Example: True

    instance_tunnel_max_record
      Required: No
      Description: The maximum number of records that the client returns when it downloads the results of SQL statements by using InstanceTunnel. You must specify this parameter if use_instance_tunnel is set to True. Maximum value: 10000.
      Example: 10000

    tunnel_endpoint
      Required: No
      Description: The public endpoint of MaxCompute Tunnel. If you do not specify this parameter, traffic is automatically routed to the Tunnel endpoint that corresponds to the network where MaxCompute resides. As the comment in the file warns, automatic routing might incur network fees in some cases. If you specify this parameter, traffic is routed to the specified endpoint and automatic routing is not performed. For more information about the Tunnel endpoints for each region and network, see Endpoints.
      Example: http://dt.cn-hangzhou.maxcompute.aliyun.com

    set.<key>
      Required: No
      Description: A property of the MaxCompute project, where <key> is the property name. For more information about the properties of MaxCompute projects, see Properties.
      Example: set.odps.sql.decimal.odps2=true
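Before you start the client, you may want to check odps_config.ini for common mistakes. The following Python sketch reads the file content and checks it against the parameter descriptions above. The four required parameters must be non-empty. When use_instance_tunnel is True, instance_tunnel_max_record must be set and must not exceed 10000. The sketch also collects the set.<key> entries. The check that end_point starts with http:// or https:// is an assumption based on the example value. The helper names are hypothetical and this is not a feature of the MaxCompute client. Because the file has no [section] header, the sketch prepends a dummy one so that Python's configparser can read it.

```python
import configparser

# Parameters marked "Required: Yes" in the parameter descriptions.
REQUIRED_KEYS = ("project_name", "access_id", "access_key", "end_point")
# Documented maximum for instance_tunnel_max_record.
MAX_TUNNEL_RECORDS = 10000


def load_odps_config(text: str) -> dict:
    """Parse odps_config.ini content into a flat dict of key -> value.

    odps_config.ini has no [section] header, so a dummy section is
    prepended for configparser. Lines starting with '#' are comments.
    """
    parser = configparser.ConfigParser(interpolation=None, comment_prefixes=("#",))
    parser.optionxform = str  # keep keys such as set.odps.* exactly as written
    parser.read_string("[odps]\n" + text)
    return dict(parser["odps"])


def validate_odps_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if not cfg.get(key, "").strip():
            problems.append(f"{key} is required but empty")
    end_point = cfg.get("end_point", "").strip()
    # Assumption: end_point is a full URL, as in the documented example.
    if end_point and not end_point.startswith(("http://", "https://")):
        problems.append("end_point should start with http:// or https://")
    if cfg.get("use_instance_tunnel", "").strip().lower() == "true":
        raw = cfg.get("instance_tunnel_max_record", "").strip()
        if not raw:
            problems.append(
                "instance_tunnel_max_record is required when use_instance_tunnel is True"
            )
        elif not raw.isdigit() or int(raw) > MAX_TUNNEL_RECORDS:
            problems.append(
                f"instance_tunnel_max_record must be an integer no greater than {MAX_TUNNEL_RECORDS}"
            )
    return problems


def project_properties(cfg: dict) -> dict:
    """Collect set.<key> entries as {key: value}."""
    prefix = "set."
    return {k[len(prefix):]: v for k, v in cfg.items() if k.startswith(prefix)}


sample = """project_name=doc_test_dev
access_id=<your-access-key-id>
access_key=<your-access-key-secret>
end_point=http://service.cn-hangzhou.maxcompute.aliyun.com/api
use_instance_tunnel=True
instance_tunnel_max_record=10000
# tunnel_endpoint=
set.odps.sql.decimal.odps2=true
"""
cfg = load_odps_config(sample)
print(validate_odps_config(cfg))  # []
print(project_properties(cfg))  # {'odps.sql.decimal.odps2': 'true'}
```

The sketch reports problems as a list instead of raising on the first error, so you can fix every issue in one pass.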

Install and configure MaxCompute Studio

To install and configure MaxCompute Studio, perform the following steps:

  1. Install IntelliJ IDEA
    MaxCompute Studio is a plug-in that is integrated with IntelliJ IDEA. To install MaxCompute Studio, you must install IntelliJ IDEA first.
  2. Install MaxCompute Studio
    Install the MaxCompute Studio plug-in in IntelliJ IDEA.
  3. Configure MaxCompute Studio
    Configure the settings of MaxCompute Studio.
  4. Connect to a MaxCompute project
    After you connect MaxCompute Studio to a MaxCompute project, you can view information about the project in MaxCompute Studio.

What to do next

After you complete the preparations, you can start using MaxCompute with the development tool that you selected.
  • If you use the query editor to process data, see Query editor.
  • If you use the MaxCompute client to process data, see MaxCompute client.
  • If you use DataWorks to process data, follow the instructions in Quick start of DataWorks.
  • If you use MaxCompute Studio to process data, follow the instructions in MaxCompute Studio.