MaxCompute: Query unstructured data

Last Updated: Dec 21, 2023

MaxCompute V2.0 allows you to use external tables to access storage services such as Object Storage Service (OSS) and Tablestore. MaxCompute Studio provides code templates to help you query unstructured data. This topic describes how to use MaxCompute Studio to query unstructured data.

Prerequisites

The following prerequisites are met:

Write a StorageHandler, Extractor, or Outputer program

  1. In the left-side navigation pane of the Project tab, choose src > main > java, right-click java, and then choose New > MaxCompute Java.

  2. Configure Name, select Extractor, StorageHandler, or Outputer, and then press Enter.

    • Name: the name of the MaxCompute Java class that you want to create. If the package does not exist yet, enter the name in the Package name.Class name format, and the system automatically creates the package.

    • Select Extractor, StorageHandler, or Outputer as the class type.

      Note

      You can select Extractor, StorageHandler, or Outputer based on your business requirements.

      • Extractor: the class in which you define custom logic for reading unstructured data.

      • StorageHandler: the class that specifies which Extractor and Outputer classes are used to read and write the data.

      • Outputer: the class in which you define custom logic for writing unstructured data.

  3. After the class is created, develop the Java program in the code editor. The Java template is automatically filled with framework code, so you only need to write the logic code based on your requirements.
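The logic code that you add to an Extractor often comes down to turning one line of raw text into column values. The following is a minimal sketch of that parsing step only. It assumes delimited text input and deliberately does not use the MaxCompute SDK types, so it can be run on its own; the class name DelimitedLineParser and its parameters are hypothetical and are not part of the generated template.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the MaxCompute SDK): splits one line of
// delimited text into a fixed number of column values.
final class DelimitedLineParser {
    private final String delimiter;
    private final int columnCount;

    DelimitedLineParser(String delimiter, int columnCount) {
        if (delimiter == null || delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        if (columnCount <= 0) {
            throw new IllegalArgumentException("columnCount must be positive");
        }
        this.delimiter = delimiter;
        this.columnCount = columnCount;
    }

    // Returns the column values of one line, or throws if the line does not
    // match the expected number of columns.
    String[] parse(String line) {
        // Pattern.quote treats the delimiter literally (for example "|");
        // the -1 limit keeps trailing empty columns.
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        if (parts.length != columnCount) {
            throw new IllegalArgumentException(
                "expected " + columnCount + " columns but found " + parts.length);
        }
        return parts;
    }
}
```

In a real Extractor, logic of this kind would be called for each line read from the input, and the resulting values would be converted to the column types of the external table.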

Debug the Extractor or Outputer program

Write your test cases to debug your Extractor or Outputer program based on the unit test examples in the examples directory.
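A test case of this kind usually feeds a small in-memory input to the reading logic and checks the records that come out. The following sketch illustrates that pattern under stated assumptions: it does not use the MaxCompute SDK or the unit test utilities in the examples directory, and the class LineRecordReader and its methods are hypothetical stand-ins for your own reading logic.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the reading logic of an Extractor.
final class LineRecordReader {

    // Reads every non-empty line from the stream and splits it into columns.
    static List<String[]> readAll(InputStream in, String delimiter) {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                records.add(line.split(Pattern.quote(delimiter), -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // Wraps sample text in a stream so that a test needs no external file.
    static InputStream sample(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

A test can then call LineRecordReader.readAll(LineRecordReader.sample("1,a\n2,b\n"), ",") and compare the returned records with the expected column values before the program is packaged.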

Package and upload the program

After you debug the program, package it into a JAR file and upload the file to the MaxCompute server as a resource. For more information, see Package a Java program, upload the package, and create a MaxCompute UDF.

Query unstructured data

  1. In the Project tool window, right-click scripts under your MaxCompute project and choose New > MaxCompute SQL Script.

  2. Enter the name of an SQL script in the Script Name field, select a MaxCompute project from the MaxCompute Project drop-down list, and then click OK.

  3. In the code editor, enter the SQL statement that creates an external table, and then click the Run icon.

  4. Create another MaxCompute SQL script, enter a query statement that reads from the external table, and then click the Run icon to query data.

References

Example: Create an OSS external table by using a custom extractor