
Public dataset reference

Last Updated: Jun 14, 2023

If you have activated MaxCompute, you can use the SQL analytics feature of MaxCompute to query tables in the public datasets. This is a quick way to get started with MaxCompute. This topic describes the public datasets of MaxCompute and how to query and analyze their data by using SQL analytics.

The public datasets of MaxCompute mainly consist of the datasets for predicting the click-through rate (CTR) of ads displayed on Taobao.com. The datasets are provided by Alibaba Group. For more information about the fields in the datasets, see Tianchi dataset. The data is stored in the MAXCOMPUTE_PUBLIC_DATA project of MaxCompute.

Disclaimer

Data in the public datasets of MaxCompute is provided only for product testing. The data is not updated periodically, and its accuracy is not guaranteed. Do not use the data in production environments.

Precautions

MaxCompute uses a special authorization mechanism that grants all MaxCompute users access to the public datasets. When you use the public datasets, take note of the following items:

  • All data is stored in the public MaxCompute project MAXCOMPUTE_PUBLIC_DATA. No MaxCompute users are members of this project. Therefore, when you write an SQL script to access the public datasets, you must prefix each table name with the project name in the project_name.table_name format. Sample statement:

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;
    Note

    You are not charged for accessing the data in the public datasets. However, you are charged for the computing resources that your query statements consume. For more information about billing rules, see Computing pricing.

  • The tables in the public datasets are not displayed on the Data Map page of DataWorks, because they reside in a different project and can be accessed only across projects.
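The project prefix is required on every table reference in a statement, not only on the first one. The following query is a minimal sketch of that rule: it computes a per-brand click-through rate by joining two public tables. It assumes a second table named ad_feature and the columns adgroup_id, clk, and brand, all taken from the Tianchi dataset description. These names are hypothetical here and may differ in the MAXCOMPUTE_PUBLIC_DATA project, so check the table schemas before running it. A query like this scans the full tables and therefore incurs computing fees.

```sql
-- Sketch only: ad_feature, adgroup_id, clk, and brand are assumed names
-- based on the Tianchi dataset and may differ in MAXCOMPUTE_PUBLIC_DATA.
-- Both table references carry the MAXCOMPUTE_PUBLIC_DATA. prefix.
SELECT   f.brand,
         COUNT(*)              AS impressions,
         SUM(s.clk)            AS clicks,
         SUM(s.clk) / COUNT(*) AS ctr          -- clicks per impression
FROM     MAXCOMPUTE_PUBLIC_DATA.raw_sample s
JOIN     MAXCOMPUTE_PUBLIC_DATA.ad_feature f
ON       s.adgroup_id = f.adgroup_id
GROUP BY f.brand
ORDER BY impressions DESC
LIMIT    10;                                   -- ORDER BY is paired with LIMIT
```

Omitting the prefix on either table causes MaxCompute to look for that table in your current project instead of in MAXCOMPUTE_PUBLIC_DATA.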

Public datasets

The following tables describe the details of each public dataset in the MAXCOMPUTE_PUBLIC_DATA project.