MaxCompute: Query unstructured data

Last Updated: Jun 20, 2026

MaxCompute V2.0 allows you to use external tables to access storage services such as Object Storage Service (OSS) and Tablestore. MaxCompute Studio provides code templates to help you query unstructured data. This topic describes how to use MaxCompute Studio to query unstructured data.

Prerequisites

The following prerequisites are met:

Write a StorageHandler, Extractor, or Outputer program

  1. In the Project pane, right-click the source code directory of the module (that is, src > main > java), and select New > MaxCompute Java.

  2. Enter a Name, select Extractor, StorageHandler, or Outputer as the type, and then press Enter.

    • Name: The name of the MaxCompute Java class. If a Package has not been created, enter packagename.classname to automatically create a Package.

    • Type: Extractor, StorageHandler, or Outputer.

      Note

      You can select Extractor, StorageHandler, or Outputer based on your business requirements.

      • Extractor: the class in which you implement custom logic for reading unstructured data.

      • StorageHandler: the wrapper class that specifies which Extractor and Outputer classes are used.

      • Outputer: the class in which you implement custom logic for writing unstructured data.

  3. After the class is created, develop the Java program in the code editor. The template is automatically filled with framework code, so you only need to write the logic code that your requirements call for.
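To make the division of roles described above more concrete, the following is a minimal, self-contained sketch. It does not use the MaxCompute SDK: the class DelimitedTextHandlerSketch, the nested interfaces RowExtractor and RowOutputer, and the sample row are all hypothetical stand-ins invented for illustration. The sketch assumes a handler that is configured with a delimiter and hands out one object for reading rows and one for writing them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the StorageHandler role. NOT the MaxCompute SDK API.
final class DelimitedTextHandlerSketch {

    // Hypothetical stand-in for the Extractor role: raw lines in, column values out.
    interface RowExtractor {
        List<String[]> extract(List<String> lines);
    }

    // Hypothetical stand-in for the Outputer role: column values in, raw lines out.
    interface RowOutputer {
        List<String> output(List<String[]> rows);
    }

    private final String delimiter;

    DelimitedTextHandlerSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    // The handler only decides which reader is used; the reader holds the logic.
    RowExtractor extractor() {
        return lines -> {
            List<String[]> rows = new ArrayList<>();
            for (String line : lines) {
                // Pattern.quote is needed because String.split takes a regex
                // and '|' is a regex metacharacter. The -1 keeps empty trailing columns.
                rows.add(line.split(Pattern.quote(delimiter), -1));
            }
            return rows;
        };
    }

    // The handler only decides which writer is used; the writer holds the logic.
    RowOutputer outputer() {
        return rows -> {
            List<String> lines = new ArrayList<>();
            for (String[] row : rows) {
                lines.add(String.join(delimiter, row));
            }
            return lines;
        };
    }

    public static void main(String[] args) {
        DelimitedTextHandlerSketch handler = new DelimitedTextHandlerSketch("|");
        // Made-up sample row, for illustration only.
        List<String[]> rows = handler.extractor().extract(Arrays.asList("1|2|30"));
        System.out.println(Arrays.toString(rows.get(0)));
        System.out.println(handler.outputer().output(rows).get(0));
    }
}
```

In a real project, the classes generated by the MaxCompute Java template take the place of these stand-ins, and their method signatures are defined by the SDK rather than by this sketch.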

Debug the Extractor or Outputer program

You can refer to the example unit tests in the examples directory to write test cases.

import ...
public class ExtractorTest {
    private String ambulanceFullSchema =
            "vehicle:bigint;id:bigint;patient:bigint;calls:bigint;latitude:d...";
    private String speechDataFullSchema = "sentence_snr:double;id:string";
    @Test
    public void testTextExtractor() throws Exception {
        /**
         * Equivalent to the following SQL:
         *    CREATE EXTERNAL TABLE  ambulance_data_external
         *    ( vehicle bigint, id bigint, patient bigint, calls bigint,
         *      Latitude double, Longitude double, time string, direction string)
         *    STORED BY 'com.aliyun.odps.udf.example.text.TextStorageHandler'
         *    LOCATION 'oss://.../data/ambulance_csv/'
         *    USING 'jar_file_name.jar';
         *
         *    SELECT * FROM ambulance_data_external;
         */
        Column[] externalTableSchema = UnstructuredUtils.parseSchemaString(
                ambulanceFullSchema);
    }
}
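The test above builds its column array from a schema string in the form name:type;name:type. As a rough illustration of that format only, the sketch below parses such a string into an ordered map by using the standard library. SchemaStringSketch and its parse method are hypothetical names. This is not how UnstructuredUtils.parseSchemaString is implemented, and it returns plain strings rather than Column objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration. NOT the SDK's UnstructuredUtils.
final class SchemaStringSketch {

    // Parses "name:type;name:type" into a map of column name to type name,
    // preserving the order in which the columns appear.
    static Map<String, String> parse(String schema) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String part : schema.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators
            }
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected name:type but got: " + trimmed);
            }
            columns.put(trimmed.substring(0, colon).trim(),
                        trimmed.substring(colon + 1).trim());
        }
        return columns;
    }

    public static void main(String[] args) {
        // Same value as speechDataFullSchema in the test class above.
        Map<String, String> columns = parse("sentence_snr:double;id:string");
        columns.forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}
```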

Package and upload the program

After you debug the program, package it into a JAR file and upload the file to the MaxCompute server as a resource. For more information, see Package a Java program, upload the package, and create a MaxCompute UDF.

Query unstructured data

  1. In the Project tool window, right-click scripts and select New > MaxCompute SQL Script.

  2. In the Script Name field, enter a script name. From the MaxCompute Project drop-down list, select the target project. Then, click OK.

  3. In the code editor, enter the SQL statement that creates the external table, and then click the Run icon.

    --name:osd
    --author:liuyi
    --create time:2017-03-07 17:06
    set odps.service.mode=off;
    CREATE EXTERNAL TABLE IF NOT EXISTS myun_src_external
    (
    vehicleId bigint,
    recordId bigint,
    patientId bigint,
    calls bigint,
    locationLatitute double,
    locationLongtitue double,
    recordTime string,
    direction string
    )
    STORED BY 'myun.MyStorageHandler'
    WITH SERDEPROPERTIES('delimiter'='|')
    LOCATION 'oss://oss-cn-hangzhou-zmf.aliyuncs.com/074799/demo/SampleData/CSV/src/'
    USING 'myun.jar';

  4. Create another MaxCompute SQL script, enter the following query statement, and then click the Run icon to query data.

    set odps.task.major.version=unstructured_data;
    select * from myun_src_external where patientId > 25;
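The LOCATION clause in step 3 packs several pieces of information into one URI. The sketch below splits such a value by using java.net.URI. It assumes the layout oss://endpoint/bucket/directory/, which is how the sample value appears to be structured; the class OssLocationSketch and its field names are hypothetical and are not part of MaxCompute.

```java
import java.net.URI;

// Hypothetical helper for illustration; assumes oss://<endpoint>/<bucket>/<directory>/.
final class OssLocationSketch {
    final String endpoint;
    final String bucket;
    final String directory;

    private OssLocationSketch(String endpoint, String bucket, String directory) {
        this.endpoint = endpoint;
        this.bucket = bucket;
        this.directory = directory;
    }

    static OssLocationSketch parse(String location) {
        URI uri = URI.create(location);
        if (!"oss".equals(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an oss:// location: " + location);
        }
        String path = uri.getPath(); // for example "/bucket/dir/"
        int slash = path.length() < 2 ? -1 : path.indexOf('/', 1);
        if (slash < 0) {
            throw new IllegalArgumentException("Missing bucket or directory: " + location);
        }
        // First path segment is treated as the bucket, the rest as the directory.
        return new OssLocationSketch(uri.getHost(),
                                     path.substring(1, slash),
                                     path.substring(slash + 1));
    }

    public static void main(String[] args) {
        // LOCATION value copied from the CREATE EXTERNAL TABLE statement in step 3.
        OssLocationSketch loc = parse(
            "oss://oss-cn-hangzhou-zmf.aliyuncs.com/074799/demo/SampleData/CSV/src/");
        System.out.println("endpoint:  " + loc.endpoint);
        System.out.println("bucket:    " + loc.bucket);
        System.out.println("directory: " + loc.directory);
    }
}
```

Checking the value this way before you run the statement can catch a missing bucket or a wrong scheme early, but it does not verify that the directory exists or that the project is authorized to read it.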

References

Example: Create an OSS external table by using a custom extractor