Project-level Information Schema (To be phased out) - MaxCompute

Information Schema is the metadata service of MaxCompute. This topic describes the basic concepts, features, and limits of Information Schema.

MaxCompute Information Schema provides information such as project metadata and historical usage data. Fields and views that are specific to MaxCompute are added to ANSI SQL-92 Information Schema. MaxCompute provides a public project named Information Schema. You can query the metadata and historical usage data of your project by accessing the read-only views provided by this public project.

Limits

Information Schema provides metadata views of the current project. You are not authorized to access cross-project metadata. If you want to query and analyze the metadata of multiple projects, you must obtain the metadata of each project and integrate the metadata.
Quasi-real-time views are provided for metadata system tables. For applications that require high metadata timeliness, we recommend that you use an SDK or CLI to obtain the metadata of a specified object.
Metadata and historical data of jobs are stored in the Information Schema project. To create a snapshot of the historical data or obtain historical job data of more than 14 days, you can back up Information Schema data to a specified project on a regular basis.

Obtain the Information Schema service

From March 1, 2024, MaxCompute no longer automatically provides the project-level Information Schema service for new projects. By default, no package of project-level Information Schema is provided for the projects that are created on March 1, 2024 and onwards. If you want to query metadata, you can use tenant-level Information Schema to obtain more comprehensive information. For more information about how to use tenant-level Information Schema, see Tenant-level Information Schema.

Before you use Information Schema in an existing project, you must obtain the permissions to access the project metadata. To obtain the permissions, install the Information Schema permission package as the project owner or a RAM user that is assigned the Super_Administrator role. For more information, see Assign a role to a user. You can use one of the following methods to install the permission package:

Log on to the MaxCompute client and run the following command:
```
install package Information_Schema.systables;
```
Log on to the DataWorks console and go to the SQL Query page. For more information about ad hoc queries, see Use the ad-hoc query feature to execute SQL statements (optional). Run the following command:
```
install package Information_Schema.systables;
```

After the package is installed, you can use Information Schema to query the metadata of the current project. Data is stored in the Information Schema project. You do not need to pay for metadata storage.

You can run the following command to query the views provided by the Information Schema project:

odps@myproject1> describe package Information_Schema.systables;

The following figure shows the query result. information_schema 截图返回结果

Query a metadata view

If you want to query a metadata view, you must prefix the project name Information Schema to the view name. The format is Information Schema.view_name.

If the project that you access is myproject1, you can run the following command to query the metadata of all tables in myproject1:

odps@myproject1>select * from Information_Schema.tables;

The Information Schema project also contains the job history view. This view allows you to query the job history of the current project. You can run the following command to query historical jobs by date:

odps@myproject1>select * from Information_Schema.tasks_history where ds='yyyymmdd' limit 100;

Access authorization

The views provided by Information Schema contain all the user data at the project level. By default, the owner of a project can view the user data of this project. Other users or roles in the project must be granted the required permissions to view the data. For more information, see Cross-project resource access based on packages.

Syntax of the statements that are used to grant permissions to users or roles:

grant <actions> on package Information_Schema.systables to user <user_name>;
grant <actions> on package Information_Schema.systables to role <role_name>;

actions: the permissions that you want to grant. Set the value to Read.
user_name: an Alibaba Cloud account or RAM user that is added to the project.
You can run the list users; command on the MaxCompute client to obtain user accounts.
role_name: a role that is added to the project.
You can run the list roles; command on the MaxCompute client to query the name of the role.

Example:

grant read on package Information_Schema.systables to user RAM$Bob@aliyun.com:user01;

Metadata views

The metadata views of the Information Schema service allow you to browse and retrieve metadata.

The usage information views of the Information Schema service allow you to optimize jobs and plan resources. For example, you can analyze the metrics of a job, such as the resource consumption, running duration, and amount of processed data.

Different views have different validity periods or default retention periods. Data that exceeds the retention period is inaccessible. You can manually export data from Information Schema to a MaxCompute table to back up the data at a specified interval. This backup applies to historical data that requires a longer storage period.

The fees for Information Schema vary based on the following scenarios:

For projects that use pay-as-you-go computing resources, you are charged based on the SQL statements that you execute to query a view of Information Schema. Range-clustered tables are used in the views of Information Schema to reduce the amount of input query data and improve query performance. If you query the TASKS_HISTORY and TUNNELS_HISTORY views, query the data of the previous day after 6:00:00 every day to avoid querying the data of the current day. This minimizes the amount of input query data and reduces query costs.
For projects that use subscription computing resources, the compute units (CUs) that you purchased are consumed when you query views of Information Schema.
You do not need to pay storage fees for the views of Information Schema.

Note

When you export data, we recommend that you explicitly specify the field names of the view. If you do not explicitly specify the field names of the view, you may fail to back up data by running insert into select * from information_schema.*** after some fields are inserted into a MaxCompute table.

The following table describes the metadata views.

Type	View	Timeliness and retention period	Delay
Metadata information	TABLES	Quasi-real-time view	Online data is displayed in metadata views with a delay of about 3 hours.
	PARTITIONS	Quasi-real-time view
	COLUMNS	Quasi-real-time view
	UDFS	Quasi-real-time view
	RESOURCES	Quasi-real-time view
	UDF_RESOURCES	Quasi-real-time view
	USERS	Quasi-real-time view
	ROLES	Quasi-real-time view
	USER_ROLES	Quasi-real-time view
	PACKAGE_OBJECTS	Quasi-real-time view
	INSTALLED_PACKAGES	Quasi-real-time view
	SCHEMA_PRIVILEGES	Quasi-real-time view
	TABLE_PRIVILEGES	Quasi-real-time view
	COLUMN_PRIVILEGES	Quasi-real-time view
	UDF_PRIVILEGES	Quasi-real-time view
	RESOURCE_PRIVILEGES	Quasi-real-time view
	TABLE_LABELS	Quasi-real-time view
	COLUMN_LABELS	Quasi-real-time view
	TABLE_LABEL_GRANTS	Quasi-real-time view
	COLUMN_LABEL_GRANTS	Quasi-real-time view
Usage information	TASKS	Real-time snapshots of running jobs	Online data is displayed in metadata views with a delay of a few seconds. This view is in public preview without SLA guarantee and will be available in the future.
	TASKS_HISTORY	Quasi-real-time view. Historical data is stored in a partitioned table, and data from the last 14 days is retained.	Online data is displayed in metadata views with a delay of approximately 3 hours.
	TUNNELS_HISTORY	Quasi-real-time view. Historical data is stored in a partitioned table, and data from the last 14 days is retained.

TABLES

Displays the information about a table in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project.
table_name	STRING	The name of the table.
table_type	STRING	The type of the table. Valid values: MANAGED_TABLE VIRTUAL_VIEW EXTERNAL_TABLE
is_partitioned	BOOLEAN	Specifies whether the table is a partitioned table.
owner_id	STRING	The ID of the table owner.
owner_name	STRING	Optional. The Alibaba Cloud account of the table owner.
create_time	DATETIME	The time when the data table was created.
last_modified_time	DATETIME	The time when the table data was last modified.
data_length	BIGINT	If the table is a non-partitioned table, the value of this parameter is the size of the table data. If the table is a partitioned table, the system does not calculate the size of the table data. In this case, the value of this parameter is NULL. The PARTITIONS view includes the data size of each partition in a partitioned table. Unit: bytes.
table_comment	STRING	The comments on the table.
life_cycle	BIGINT	Optional. The lifecycle of the table.
is_archived	BOOLEAN	A reserved field.
table_exstore_type	STRING	A reserved field.
cluster_type	STRING	The clustering type of the MaxCompute table. Valid values: HASH and RANGE.
number_buckets	BIGINT	Optional. The number of buckets in the clustered table. The value 0 indicates that the number of buckets dynamically changes during job execution.
view_original_text	STRING	The view text in the table of the VIRTUAL_VIEW type.

PARTITIONS

Displays information about a table partition in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project.
table_name	STRING	The name of the table.
partition_name	STRING	The name of the partition. Example: `ds='20190130'`.
create_time	DATETIME	The time when the partition was created.
last_modified_time	DATETIME	The time when the table was last modified.
data_length	BIGINT	The size of the data in the partition. Unit: bytes.
is_archived	BOOLEAN	A reserved field.
is_exstore	BOOLEAN	A reserved field.
cluster_type	STRING	Optional. The clustering type of the MaxCompute table. Valid values: HASH and RANGE.
number_buckets	BIGINT	Optional. The number of buckets in the clustered table. The value 0 indicates that the number of buckets dynamically changes during job execution.

COLUMNS

Displays information about a table column in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project.
table_name	STRING	The name of the table.
column_name	STRING	The name of the column.
ordinal_position	BIGINT	The serial number of the column.
column_default	STRING	The default value of the column.
is_nullable	BOOLEAN	Optional. The value is fixed to YES.
data_type	STRING	The data type of the column.
column_comment	STRING	The comments on the column.
is_partition_key	BOOLEAN	Specifies whether the column is a partition key.

UDFS

Displays information about a user-defined function (UDF) in a project.

Field	Data type	Description
udf_catalog	STRING	The value is fixed to `odps`.
udf_schema	STRING	The name of the project.
udf_name	STRING	The name of the UDF.
owner_id	STRING	The ID of the UDF owner.
owner_name	STRING	Optional. The Alibaba Cloud account of the UDF owner.
create_time	DATETIME	The time when the UDF was created.
last_modified_time	DATETIME	The last time when the UDF was modified.

RESOURCES

Displays information about a resource in a project.

Field	Data type	Description
resource_catalog	STRING	The value is fixed to `odps`.
resource_schema	STRING	The name of the project.
resource_name	STRING	The name of the resource.
resource_type	STRING	The type of the resource. Valid values: Py and Jar.
owner_id	STRING	The ID of the resource owner.
owner_name	STRING	Optional. The Alibaba Cloud account of the resource owner.
create_time	DATETIME	The time when the resource was created.
last_modified_time	DATETIME	The time when the resource was last modified.
size	BIGINT	The storage space used by the resource.
comment	STRING	The comments on the resource.
is_temp_resource	BOOLEAN	Specifies whether the resource is a temporary resource.

UDF_RESOURCES

Displays information about the dependent resource of a UDF in a project.

Field	Data type	Description
udf_catalog	STRING	The value is fixed to `odps`.
udf_schema	STRING	The name of the project.
udf_name	STRING	The name of the UDF.
resource_schema	STRING	The name of the project to which the resource belongs.
resource_name	STRING	The name of the resource.

USERS

Displays the list of users in a project.

Field	Data type	Description
user_catalog	STRING	Valid values: ALIYUN and RAM.
user_schema	STRING	The name of the project.
user_name	STRING	Optional. The name of the user.
user_id	STRING	The ID of the Alibaba Cloud user.
user_label	STRING	The label of the user.

ROLES

Displays the list of roles in a project.

Field	Data type	Description
role_catalog	STRING	The value is fixed to `odps`.
role_schema	STRING	The name of the project.
role_name	STRING	The name of the role.
role_label	STRING	The label of the role.
comment	STRING	The comments on the role.

USER_ROLES

Displays information about a role that a user assumes in a project.

Field	Data type	Description
user_role_catalog	STRING	The value is fixed to `odps`.
user_role_schema	STRING	The name of the project.
role_name	STRING	The name of the role.
user_name	STRING	The name of the user.
user_id	STRING	The ID of the user.

PACKAGE_OBJECTS

Displays the object information of a package in a project.

Field	Data type	Description
package_catalog	STRING	The value is fixed to `odps`.
package_schema	STRING	The name of the project.
package_name	STRING	The name of the package.
object_type	STRING	The type of the package object.
object_name	STRING	The name of the package object.
column_name	STRING	The name of the table column.
allowed_privileges	VECTOR<STRING>	The shared permissions.
allowed_label	STRING	The shared label.

INSTALLED_PACKAGES

Displays information about an installed package in a project.

Field	Data type	Description
installed_package_catalog	STRING	The value is fixed to `odps`.
installed_package_schema	STRING	The name of the project.
package_project	STRING	The name of the project in which the package was created.
package_name	STRING	The name of the package.
installed_time	DATETIME	Reserved. The time when the package was installed.
allowed_label	STRING	The shared label.

SCHEMA_PRIVILEGES

Displays information about a schema permission in a project.

Field	Data type	Description
user_catalog	STRING	The value is fixed to `odps`.
user_schema	STRING	The name of the project.
grantee	STRING	The name of the user.
user_id	STRING	The ID of the user.
grantor	STRING	The account that grants the permission. The current value is NULL.
privilege_type	STRING	The type of the permission.

TABLE_PRIVILEGES

Displays information about a table permission in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project to which the table belongs.
table_name	STRING	The name of the table.
grantee	STRING	The name of the user.
user_id	STRING	The ID of the user.
grantor	STRING	The account that grants the permission. The current value is NULL.
privilege_type	STRING	The type of the permission.
user_schema	STRING	The name of the project to which the user belongs.

COLUMN_PRIVILEGES

Displays information about a column permission in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project to which the table belongs.
table_name	STRING	The name of the table.
column_name	STRING	The name of the column.
grantee	STRING	The name of the user.
user_id	STRING	The ID of the user.
grantor	STRING	Optional. The current value is NULL.
privilege_type	STRING	The type of the permission.
user_schema	STRING	The name of the project to which the user belongs.

UDF_PRIVILEGES

Displays information about a UDF permission in a project.

Field	Data type	Description
udf_catalog	STRING	The value is fixed to `odps`.
udf_schema	STRING	The name of the project.
udf_name	STRING	The name of the UDF.
user_schema	STRING	The name of the project to which the user belongs.
grantee	STRING	The name of the user.
user_id	STRING	The ID of the user.
grantor	STRING	The account that grants the permission. The current value is NULL.
privilege_type	STRING	The type of the permission.

RESOURCE_PRIVILEGES

Displays information about a resource permission in a project.

Field	Data type	Description
resource_catalog	STRING	The value is fixed to `odps`.
resource_schema	STRING	The name of the project.
resource_name	STRING	The name of the resource.
user_schema	STRING	The name of the project to which the user belongs.
grantee	STRING	The name of the user.
user_id	STRING	The ID of the user.
grantor	STRING	The account that grants the permission. The current value is NULL.
privilege_type	STRING	The type of the permission.

TABLE_LABELS

Displays information about a table label in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project.
table_name	STRING	The name of the table.
label_type	STRING	The type of the label. The value is fixed to NULL.
label_level	STRING	The level of the label.

COLUMN_LABELS

Displays information about a table column label in a project.

Field	Data type	Description
table_catalog	STRING	The value is fixed to `odps`.
table_schema	STRING	The name of the project.
table_name	STRING	The name of the table.
column_name	STRING	The name of the column.
label_type	STRING	The type of the label. The value is fixed to NULL.
label_level	STRING	The level of the label.

TABLE_LABEL_GRANTS

Displays the authorization information of a table label in a project.

Field	Data type	Description
table_label_grant_catalog	STRING	The value is fixed to `odps`.
table_label_grant_schema	STRING	The name of the project to which the user belongs.
user	STRING	The name of the user.
user_id	STRING	The ID of the user.
table_schema	STRING	The name of the project to which the table belongs.
table_name	STRING	The name of the table.
grantor	STRING	The account that grants the permission. The current value is NULL.
label_level	STRING	The granted level of the label.
expired	DATETIME	The time when the authorization expires.

COLUMN_LABEL_GRANTS

Displays the authorization information of a table column label in a project.

Field	Data type	Description
column_label_grant_catalog	STRING	The value is fixed to `odps`.
column_label_grant_schema	STRING	The name of the project to which the user belongs.
user	STRING	The name of the user.
user_id	STRING	The ID of the user.
table_schema	STRING	The name of the project to which the table belongs.
table_name	STRING	The name of the table.
column_name	STRING	The name of the column.
grantor	STRING	The account that grants the permission. The current value is NULL.
label_level	STRING	The granted level of the label.
expired	DATETIME	The time when the authorization expires.

TASKS

Displays the real-time snapshots of jobs. This view is used to monitor jobs in real time.

Important

The TASKS view is in the internal testing process and its fields and field content may be changed. This view has no SLA guarantee. Use this view with caution. For more information about the subsequent changes of the release status, see Service notices.

Field	Data type	Description
project_name	STRING	The name of the project.
task_name	STRING	The name of the job.
task_type	STRING	The type of the job. Valid values: SQL: SQL jobs CUPID: Spark or Mars jobs SQLCost: SQL cost estimation jobs SQLRT: MaxCompute Query Acceleration (MCQA) SQL jobs LOT: MapReduce jobs PS: Parameter Server of PAI AlgoTask: Machine Learning Platform for AI (PAI) jobs
inst_id	STRING	The ID of the instance that is created for the job.
status	STRING	The status of the job when data is collected. Valid values: Running and Waiting.
owner_id	STRING	The ID of the Alibaba Cloud account that submits the job.
owner_name	STRING	The name of the Alibaba Cloud account that submits the job.
start_time	DATETIME	The time when the job starts.
priority	BIGINT	The priority of the job. This parameter is applicable only to jobs that use subscription resources.
signature	STRING	The job signature.
queue_name	STRING	The name of the compute queue.
cpu_usage	BIGINT	The current CPU utilization. The value of this field is calculated by using the following formula: Number of CPU cores × 100.
mem_usage	BIGINT	The current memory usage. Unit: MB.
gpu_usage	BIGINT	The current GPU usage. The value of this field is calculated by using the following formula: Number of GPUs × 100.
total_cpu_usage	BIGINT	The accumulated CPU utilization. The value of this field is calculated by using the following formula: Number of CPU cores × 100 × Running duration of the job (seconds).
total_mem_usage	BIGINT	The accumulated memory usage. The value of this field is calculated by using the following formula: Memory size (MB) × Running duration of the job (seconds).
total_gpu_usage	BIGINT	The accumulated GPU usage. The value of this field is calculated by using the following formula: Number of GPUs × 100 × Running duration of the job (seconds).
cpu_min_ratio	BIGINT	The ratio of the CPU utilization of the job to the total CPU utilization. This parameter is applicable only to jobs that use subscription resources.
mem_min_ratio	BIGINT	The ratio of the memory consumed by the job to the total memory usage. This parameter is applicable only to jobs that use subscription resources.
gpu_min_ratio	BIGINT	The ratio of the GPUs consumed by the job to the total GPU usage. This parameter is applicable only to jobs that use subscription resources.
cpu_max_ratio	BIGINT	The ratio of the CPU utilization of the job to the maximum CPU utilization. This parameter is applicable only to jobs that use subscription resources.
mem_max_ratio	BIGINT	The ratio of the memory consumed by the job to the maximum memory usage. This parameter is applicable only to jobs that use subscription resources.
gpu_max_ratio	BIGINT	The ratio of the GPUs consumed by the job to the maximum GPU usage. This parameter is applicable only to jobs that use subscription resources.
settings	STRING	The custom scheduling settings of an upper-layer application, such as DataWorks.
additional_info	STRING	The additional information. This is a reserved field.

TASKS_HISTORY

Displays the job execution history in a MaxCompute project. Data from the last 14 days is retained.

Field	Data type	Description
task_catalog	STRING	The value is fixed to `odps`.
task_schema	STRING	The name of the project.
task_name	STRING	The name of the job.
task_type	STRING	The type of the job. Valid values: SQL: SQL jobs CUPID: Spark or Mars jobs SQLCost: SQL cost estimation jobs SQLRT: MaxCompute Query Acceleration (MCQA) SQL jobs LOT: MapReduce jobs PS: Parameter Server of PAI AlgoTask: Machine Learning Platform for AI (PAI) jobs
inst_id	STRING	The ID of the instance.
status	STRING	The status of the job when data is collected. This is not a real-time state. Valid values: Terminated: The job is complete. Failed: The job fails. Cancelled: The job is canceled.
owner_id	STRING	The ID of the Alibaba Cloud account.
owner_name	STRING	The name of the Alibaba Cloud account.
result	STRING	The error information displayed if an error occurs in an SQL job.
start_time	DATETIME	The time when the job starts.
end_time	DATETIME	The end time of the job. If the job does not end on the current day, this value is NULL.
input_records	BIGINT	The number of records read by the job.
output_records	BIGINT	The number of records generated by the job.
input_bytes	BIGINT	The amount of scanned data, which is the same as that displayed on LogView.
output_bytes	BIGINT	The number of output bytes.
input_tables	STRING	The job input tables in the [project.table1,project.table2] format. Some jobs, such as SQL COST jobs, do not have such information.
output_tables	STRING	The job output tables in the [project.table1,project.table2] format.
operation_text	STRING	The source XML file of the query statement. If the size of the source XML file exceeds 256 KB, set the value to NULL.
signature	STRING	Optional. The job signature.
complexity	DOUBLE	Optional. The job complexity. This parameter is available only for SQL jobs.
cost_cpu	DOUBLE	The CPU utilization of the job. The value 100 indicates that 1 CPU core multiplies the job running duration in seconds. For example, if 10 CPU cores run for five seconds, cost_cpu is 5000, which is calculated by using the following formula: 10 × 100 × 5.
cost_mem	DOUBLE	The memory consumed by the job. The value of this field is calculated by using the following formula: Memory size (MB) × Running duration of the job (seconds).
settings	STRING	The information that is scheduled by the upper layer application or specified by users. The information is saved in the JSON format. The information includes the following fields: USERAGENT, BIZID, SKYNET_ID, and SKYNET_NODENAME.
ds	STRING	The date when the data was collected. Example: 20190101.

TUNNELS_HISTORY

Displays historical data that is uploaded and downloaded at the same time over a data tunnel. Data of the previous 14 days is retained.

Field	Data type	Description
tunnel_catalog	STRING	The value is fixed to `odps`.
tunnel_schema	STRING	The name of the project.
session_id	STRING	The session ID, which is saved in the format of `TIMESTAMP (YYYYMMDDHHmmss, 14 characters) + IP address (8 characters) + numHex (8 characters)` Example: 2013060414484474e5e60a00000002.
operate_type	STRING	The type of the operation. Valid values: UPLOADLOG DOWNLOADLOG DOWNLOADINSTANCELOG
tunnel_type	STRING	The type of the tunnel. Valid values: TUNNEL LOG and TUNNEL INSTANCE LOG.
request_id	STRING	The ID of the request.
object_type	STRING	The type of object on which the operation is performed. Valid values: TABLE and INSTANCE.
object_name	STRING	The table name or instance ID.
partition_spec	STRING	The partition information. Example: `time=20130222,loc=beijing`
data_size	BIGINT	The size of data. Unit: bytes.
block_id	BIGINT	The ID of the block uploaded by using the tunnel. This parameter is available only if operate_type is set to UPLOADLOG. Otherwise, this parameter is left empty.
offset	BIGINT	The number of records to skip before data is downloaded. By default, the download starts from record 0.
length	BIGINT	The number of records to download or upload in the current session. The number of downloaded records is equal to the value of this parameter.
owner_id	STRING	The ID of the Alibaba Cloud account.
owner_name	STRING	The name of the Alibaba Cloud account.
start_time	DATETIME	The start time of the request.
end_time	DATETIME	The end time of the request.
client_ip	STRING	The IP address of the client that initiates the request.
user_agent	STRING	The information about the user agent, which is the client that initiates the request. The information may be the Java version or the operating system.
columns	STRING	The columns that are specified when the data is downloaded over a data tunnel.
ds	STRING	The date when the data was collected. Example: 20190101.