All Products
Search
Document Center

MaxCompute:Use Kettle to schedule MaxCompute jobs

Last Updated:Mar 26, 2026

Kettle is an open source extract, transform, load (ETL) tool that connects to MaxCompute through a Java Database Connectivity (JDBC) driver. This topic walks you through installing the driver, configuring a database connection in Spoon, building an ETL workflow, and verifying the results.

Background information

Kettle is an open source ETL tool written in Java. It runs on the Windows, UNIX, and Linux operating systems and provides a GUI. Kettle supports a wide range of input and output data sources, including databases such as Oracle, MySQL, and DB2, and open source big data systems such as the Hadoop Distributed File System (HDFS), HBase, Cassandra, and MongoDB.For more information about Kettle, see Kettle.

Prerequisites

Before you begin, make sure you have:

  • A MaxCompute project. See Create a MaxCompute project.

  • An AccessKey pair for your Alibaba Cloud account. Go to AccessKey Management to get your AccessKey ID and AccessKey secret.

  • The MaxCompute JDBC driver package (V3.2.8 or later) downloaded from GitHub Releases. Download the file whose name ends with jar-with-dependencies.jar — this variant bundles all required dependencies, so you do not need to configure the classpath separately. This topic uses odps-jdbc-3.2.9-jar-with-dependencies.jar.

  • Kettle downloaded from the Hitachi Vantara community site and extracted to a local directory. This topic uses Kettle 8.2.0.0-342.

Step 1: Install the JDBC driver

Copy odps-jdbc-3.2.9-jar-with-dependencies.jar (or whichever version you downloaded) into the data-integration/lib directory inside your Kettle installation.

Save the driver package

To confirm the driver is in place, check that the file appears in data-integration/lib with the correct name before launching Spoon.

Step 2: Connect Kettle to MaxCompute

Find your connection values

You need two values to build the connection URL:

  • MaxCompute endpoint — depends on the region where your project resides. See Endpoints for the full list.

  • MaxCompute project name — log in to the MaxCompute console, select your region in the top navigation bar, and find the project name on the Project management tab.

The project name refers to your MaxCompute project, not the DataWorks workspace associated with it.

The connection URL format is:

jdbc:odps:<MaxCompute_endpoint>?project=<MaxCompute_project_name>

Replace <MaxCompute_endpoint> and <MaxCompute_project_name> with your actual values.

Configure the connection in Spoon

  1. In the data-integration directory, launch Spoon:

    • Windows: double-click Spoon.bat

    • macOS: double-click Spoon

  2. In the menu bar, choose File > New > Job.

    Create a job

  3. Click the View tab. In the navigation tree, right-click Database connections and select New.

    Establish a data connection

  4. In the General tab of the dialog box, set the following parameters:

  5. Click Test. If the connection succeeds, click OK and then Confirm.

Step 3: Create a job scheduling workflow

This step creates an ETL workflow that loads CSV data from Object Storage Service (OSS) into a MaxCompute internal table using the LOAD command. The sample data is described in Use a built-in extractor or a storage handler to import data.

The workflow uses three jobs connected in sequence:

ETL workflow
  1. Click the Design tab. Drag the three job objects from the left panel onto the canvas, then connect them in the order shown above. To connect two objects, select the source object and press Shift while clicking the destination object.

    Create a workflow

  2. Right-click each job object and select Edit. In the SQL dialog box, configure the following parameters, then click OK. Repeat for all three jobs. Create table

    ParameterDescription
    Job entry nameThe job name, such as Create table, Load from OSS, or Processing
    ConnectionSelect the database connection you created in Step 2, such as MaxCompute
    Send SQL as singleLeave unchecked
    SQL ScriptThe SQL script for each job (see below)
    CREATE TABLE ambulance_data_csv_load (
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongtitue DOUBLE,
    recordTime STRING,
    direction STRING);

    Load from OSS

    LOAD OVERWRITE TABLE ambulance_data_csv_load
    FROM
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/mc-test/data_location/'
    STORED BY 'com.aliyun.odps.CsvStorageHandler'
    WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole', -- Alibaba Cloud Resource Name (ARN) of the AliyunODPSDefaultRole role. Get it from the Roles page in the Resource Access Management (RAM) console.
    'odps.text.option.delimiter'=','
    );

    Processing

    INSERT OVERWRITE TABLE ambulance_data_csv SELECT * FROM ambulance_data_csv_load;

    Edit core objects

Step 4: Run the workflow

  1. Click the Run icon in the upper-left corner of the job tab. In the Run Options dialog box, click Run.

    Run

  2. (Optional) If prompted to save the workflow, click Yes and enter a name such as mc.

    Save the workflow

  3. Monitor progress in the directed acyclic graph (DAG) view or the Execution Results section. The workflow is complete when all jobs show a success status.

    Workflow completed

Step 5: Verify the results

  1. Click the View tab, then expand the database connection under your job (such as mc).

  2. Right-click the database connection (such as MaxCompute) and select SQL Editor.

    SQL Editor

  3. In the Simple SQL editor dialog box, run the following query and review the output in the Examine preview data dialog box.

    SELECT * FROM ambulance_data_csv;

    If the query returns rows, the data has been successfully loaded into the MaxCompute table.

    Execute the SQL script