MaxCompute: Overview of external tables

Last Updated: Feb 02, 2024

MaxCompute allows you to use external tables to query and analyze data that is stored in external storage systems, such as Object Storage Service (OSS). This way, you can manage external data without importing it into MaxCompute internal storage, which makes data processing more flexible.

Background information

MaxCompute SQL provides an entry point for distributed data processing and allows you to process and store exabytes of offline data. The computing framework of MaxCompute continues to evolve to meet the requirements of growing big data workloads and new use scenarios. Early versions of MaxCompute provided powerful computing capabilities for internal data in special formats. MaxCompute now also supports the processing of external data.

MaxCompute SQL processes structured data that is stored in MaxCompute internal tables in the CFile column store format. To run computations on external user data, including text and unstructured data, you must first use separate tools to import the data into MaxCompute tables. For example, to process OSS data in MaxCompute, you can use one of the following methods:

  • Use OSS SDK or other tools to download data from OSS. Then, use MaxCompute Tunnel to import the downloaded data to a MaxCompute table.

  • Write a user-defined function (UDF) to call OSS SDK and access OSS data.

However, both methods have drawbacks.

  • The first method requires data transfer operations outside the MaxCompute system. If a large amount of OSS data needs to be processed, you must run the transfer in parallel yourself to speed it up. As a result, you cannot fully utilize the large-scale computing capabilities of MaxCompute.

  • The second method requires access permissions for the UDF. It also requires developers to control the number of parallel jobs and to handle data partitioning themselves.
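To make the second drawback concrete, the following minimal sketch shows the kind of bookkeeping that a UDF-based approach leaves to the developer: splitting a list of object keys across a fixed number of parallel workers. The key names and the worker count are hypothetical, and round-robin assignment is only one simple scheme chosen for illustration, not something MaxCompute prescribes.

```python
def partition_keys(keys, num_workers):
    """Assign object keys to workers round-robin; return one list per worker."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    shards = [[] for _ in range(num_workers)]
    # Sort first so that the assignment is deterministic across runs.
    for index, key in enumerate(sorted(keys)):
        shards[index % num_workers].append(key)
    return shards


# Hypothetical object keys; a real job would list them from the bucket.
keys = [f"logs/part-{i:04d}.csv" for i in range(10)]
shards = partition_keys(keys, 3)

for worker_id, shard in enumerate(shards):
    print(worker_id, len(shard), shard[0])
```

Every key lands in exactly one shard, but the developer still has to choose the worker count, rebalance when file sizes are uneven, and handle retries for failed shards.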

MaxCompute provides external tables to address these issues. External tables are used to process data that is stored outside MaxCompute internal tables. You execute a simple DDL statement to create an external table in MaxCompute and associate it with an external data source. You can then read data from and write data to that source in various formats. In most cases, external tables can be accessed like standard MaxCompute tables, so you can fully utilize the computing capabilities of MaxCompute SQL to process external data.
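As a rough illustration of the DDL step, the following Python sketch assembles a CREATE EXTERNAL TABLE statement for CSV files in OSS. The table name, columns, bucket path, and role ARN are hypothetical placeholders. The clause layout (STORED BY with the built-in CSV storage handler, WITH SERDEPROPERTIES, LOCATION) is assumed from the commonly documented form for OSS external tables, so check the DDL reference for the options that your project needs. The function only builds a string and does not connect to MaxCompute.

```python
def build_external_table_ddl(table, columns, location, role_arn=None):
    """Return a CREATE EXTERNAL TABLE statement for CSV data stored in OSS."""
    col_defs = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    lines = [
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (",
        col_defs,
        ")",
        # Built-in handler for CSV files (assumed; other formats use other clauses).
        "STORED BY 'com.aliyun.odps.CsvStorageHandler'",
    ]
    if role_arn is not None:
        # Role that MaxCompute assumes to read from OSS (hypothetical value below).
        lines += [
            "WITH SERDEPROPERTIES (",
            f"  'odps.properties.rolearn' = '{role_arn}'",
            ")",
        ]
    lines.append(f"LOCATION '{location}';")
    return "\n".join(lines)


ddl = build_external_table_ddl(
    table="sales_csv_ext",  # hypothetical table name
    columns=[("order_id", "BIGINT"), ("region", "STRING"), ("amount", "DOUBLE")],
    location="oss://oss-cn-hangzhou-internal.aliyuncs.com/example-bucket/sales/",
    role_arn="acs:ram::1234567890:role/aliyunodpsdefaultrole",
)
print(ddl)
```

After such a table exists, a query against it is written the same way as a query against an internal table, for example `SELECT region, SUM(amount) FROM sales_csv_ext GROUP BY region;`.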

Note
  • If you use an external table, the data in this table is not stored in MaxCompute, and you are not charged for the storage of the table data.

  • Full search is supported for external tables.

  • Tunnel commands and Tunnel SDK cannot be used for external tables. You can use Tunnel to upload data to MaxCompute internal tables. You can also use OSS SDK for Python to upload data to OSS and map the data to external tables in MaxCompute.

  • You can create, search for, configure, and process external tables in the DataWorks console. You can also query and analyze data by using the external table feature. For more information, see External table.

  • If you use external tables, MaxCompute charges only computing fees, based on the billing rules for MaxCompute computing resources. Data in external tables is not stored in MaxCompute, so MaxCompute does not charge storage fees for it. For more information about storage fees, see the billing rules of the service that stores the data. If you use a public endpoint of MaxCompute to access an external table, you are also charged for Internet traffic and data downloads. For more information about MaxCompute fees, see Overview.

Examples

This section describes how to use MaxCompute external tables to process unstructured data:

References