MaxCompute 2.0 allows you to use external tables to access Object Storage Service (OSS) and Tablestore. MaxCompute Studio provides code templates to help you develop unstructured data queries. This topic describes how to use MaxCompute Studio to query unstructured data.

Prerequisites

Before you begin, perform the following operations:

Write StorageHandlers, Extractors, or Outputters

  1. In the Project tool window, click your MaxCompute Java module and choose src > main > java. Then, right-click java and choose New > MaxCompute Java.
  2. Create an Extractor class. Specify Name and Kind, and click OK.
    • Name: the name of the MaxCompute Java class. If the package does not exist yet, enter the name in the packagename.classname format. The system then creates the package automatically.
    • Kind: the category of the MaxCompute Java class. Select Extractor. Supported categories include custom functions (UDF, UDAF, and UDTF), MapReduce (Driver, Mapper, and Reducer), and unstructured data development frameworks (StorageHandler, Extractor, and Outputer).
      Note To create a StorageHandler or Outputter class instead, set Kind to StorageHandler or Outputer, respectively.
  3. After you create the Extractor class, develop the Java program in the editor. The Java template is automatically populated with framework code, so you only need to write the logic code that your business requires.
  4. Use the same method to create a StorageHandler and an Outputter.
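
The logic that you add to the generated Extractor template is typically plain Java that turns raw text into column values. The following is a minimal sketch of that logic only, written without the MaxCompute SDK so that it runs on its own. It assumes a pipe-delimited text file and a hypothetical three-column schema (id BIGINT, name STRING, score DOUBLE). DelimitedLineParser and its methods are illustrative names, not part of the generated template or the SDK. In a real Extractor, this logic would sit inside the methods that the template generates.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: parses one delimited text line into typed column values. */
final class DelimitedLineParser {
    // Assumed schema for this sketch: id BIGINT, name STRING, score DOUBLE.
    private static final int COLUMN_COUNT = 3;
    private final String delimiterRegex;

    DelimitedLineParser(String delimiter) {
        // Quote the delimiter so that characters such as '|' are not treated as regex operators.
        this.delimiterRegex = Pattern.quote(delimiter);
    }

    /** Returns {Long id, String name, Double score}. An empty field becomes null. */
    Object[] parse(String line) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        String[] parts = line.split(delimiterRegex, -1);
        if (parts.length != COLUMN_COUNT) {
            throw new IllegalArgumentException(
                "Expected " + COLUMN_COUNT + " fields but found " + parts.length + ": " + line);
        }
        String id = parts[0].trim();
        String name = parts[1];
        String score = parts[2].trim();
        return new Object[] {
            id.isEmpty() ? null : Long.valueOf(id),
            name.isEmpty() ? null : name,
            score.isEmpty() ? null : Double.valueOf(score)
        };
    }
}
```

For example, new DelimitedLineParser("|").parse("42|alice|3.5") yields the values 42, "alice", and 3.5. A line with the wrong number of fields raises an IllegalArgumentException, which makes malformed input easy to spot while you debug.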

Debug Extractors or Outputters

Refer to the unit test examples in the examples directory, and write your own test cases to debug your Extractor and Outputter.
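
The unit test examples themselves are not reproduced in this topic. As a rough illustration of the kind of check that such a test case can perform, the following sketch verifies that a line produced by Outputter-style logic can be read back by Extractor-style logic. It uses plain Java rather than the MaxCompute SDK or a test framework, and RoundTripCheck, format, parse, and roundTrips are hypothetical names chosen for this sketch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/** Hypothetical round-trip check: what an Outputter writes, an Extractor should read back. */
final class RoundTripCheck {

    /** Outputter-side logic: joins column values into one line. A null value becomes an empty field. */
    static String format(String delimiter, String... columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            if (columns[i] != null) {
                sb.append(columns[i]);
            }
        }
        return sb.toString();
    }

    /** Extractor-side logic: splits a line into column values. An empty field becomes null. */
    static String[] parse(String delimiter, String line) {
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                parts[i] = null;
            }
        }
        return parts;
    }

    /** Returns true if the columns survive a write followed by a read unchanged. */
    static boolean roundTrips(String delimiter, String... columns) {
        return Arrays.equals(columns, parse(delimiter, format(delimiter, columns)));
    }

    public static void main(String[] args) {
        // A normal record, including a null column, should round-trip.
        System.out.println(roundTrips("|", "1", null, "x"));
        // A value that contains the delimiter is split into extra fields and should not round-trip.
        System.out.println(roundTrips("|", "a|b", "c"));
    }
}
```

The second call in main is the more useful one during debugging: it exposes that this simple format has no escaping, so a value that contains the delimiter is split into extra fields when it is read back.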

Package and upload Java programs

After you debug the Java program, package it into a JAR file and upload the file to the MaxCompute server as a resource. For more information, see Package, upload, and register.

Query unstructured data

  1. In the Project tool window, right-click scripts under your MaxCompute project, and choose New > MaxCompute SQL Script.
  2. Enter the name of the SQL script in the Script Name field, select a MaxCompute project from the MaxCompute Project drop-down list, and then click OK.
  3. Enter SQL statements to create an external table in the editor.
  4. Enter the query statement and click the Run MaxCompute SQL Script icon to query data.
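
The statements entered in steps 3 and 4 generally follow the CREATE EXTERNAL TABLE ... STORED BY ... LOCATION ... USING pattern, followed by an ordinary SELECT. The sketch below renders one such pair of statements as text so that their shape is visible. Every value in it is a placeholder: the table name, the columns, the com.example.TextStorageHandler class, the 'delimiter' property (assumed to be read by a custom Extractor), the OSS path, and the JAR name are hypothetical. ExternalTableSql is an illustrative helper, not part of MaxCompute Studio.

```java
/** Hypothetical helper that renders the statements typed into the SQL script editor. */
final class ExternalTableSql {

    /** Builds a CREATE EXTERNAL TABLE statement for a custom StorageHandler packaged in a JAR resource. */
    static String createTable(String table, String handlerClass, String location, String jar) {
        return String.join("\n",
            "CREATE EXTERNAL TABLE IF NOT EXISTS " + table,
            "(",
            "  id BIGINT,",
            "  name STRING,",
            "  score DOUBLE",
            ")",
            // The StorageHandler class that was written and uploaded earlier.
            "STORED BY '" + handlerClass + "'",
            // Assumption: the custom Extractor reads a 'delimiter' property.
            "WITH SERDEPROPERTIES ('delimiter'='|')",
            // The OSS directory that holds the data files.
            "LOCATION '" + location + "'",
            // The JAR resource that contains the handler classes.
            "USING '" + jar + "';");
    }

    /** Builds a simple query against the external table. */
    static String query(String table) {
        return "SELECT id, name, score FROM " + table + " WHERE score > 60;";
    }

    public static void main(String[] args) {
        String table = "demo_external_table"; // placeholder table name
        System.out.println(createTable(
            table,
            "com.example.TextStorageHandler",      // placeholder class name
            "oss://<endpoint>/<bucket>/<path>/",   // placeholder OSS location
            "text_handler.jar"));                  // placeholder JAR resource name
        System.out.println(query(table));
    }
}
```

Running main prints the two statements, which can then be adapted and pasted into the SQL script editor. Replace each placeholder with the values for your own project before you run the script.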