If you have activated MaxCompute, you can use the MaxCompute query editor to obtain and query the tables from public datasets. This helps you quickly get started with MaxCompute. This topic describes the public datasets of MaxCompute and how to use the MaxCompute query editor to query and analyze data in the public datasets.

The open data of MaxCompute mainly refers to the data in the datasets of the click rate predictions for ads displayed on Taobao.com. The datasets are provided by Alibaba Group. For more information about the fields in the datasets, see Tianchi dataset. The data is stored in the MAXCOMPUTE_PUBLIC_DATA project of MaxCompute.

Disclaimer

Data in the public datasets of MaxCompute is only for product testing. The data is not periodically updated and its accuracy is not ensured. Therefore, do not use the data in the production process.

Usage notes

You can authorize all MaxCompute users to access the public datasets by using a special authorization mechanism of MaxCompute. When you use the public datasets, take note of the following items:
  • All data is stored in the public MaxCompute project MAXCOMPUTE_PUBLIC_DATA. No MaxCompute users belong to this project. Therefore, when you compile an SQL script to access the public datasets, you must specify the project name before the table name. The following statement shows an example:
    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample limit 10;
    Note You can view the data in the public datasets free of charge. However, you are charged when you execute query statements. For more information about billing rules,see Computing pricing.
  • You cannot find the tables in the public datasets on the Data Map page of DataWorks because cross-project access is required.

Public datasets

The following tables describe the details of each public dataset in the MAXCOMPUTE_PUBLIC_DATA project.

  • Raw samples

    Raw samples consist of the ad click logs within an eight-day period of more than one million users that are randomly sampled from Taobao.com.

    Project name MAXCOMPUTE_PUBLIC_DATA
    Table name raw_sample
    Update cycle Fixed data is provided and is not updated.
    Schema query DESC MAXCOMPUTE_PUBLIC_DATA.table_name;
    Query example SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample limit 10;
  • Basic ad information

    This dataset contains basic information about some ads in the raw_sample table.

    Project name MAXCOMPUTE_PUBLIC_DATA
    Table name ad_feature
    Update cycle Fixed data is provided and is not updated.
    Schema query DESC MAXCOMPUTE_PUBLIC_DATA.table_name;
    Query example SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.ad_feature limit 10;
  • Basic user information

    This dataset contains basic information about all users in the raw_sample table.

    Project name MAXCOMPUTE_PUBLIC_DATA
    Table name user_profile
    Update cycle Fixed data is provided and is not updated.
    Schema query DESC MAXCOMPUTE_PUBLIC_DATA.table_name;
    Query example SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.user_profile limit 10;
  • User behavior logs

    This dataset contains the shopping behavior of all users in the raw_sample table within a 22-day period.

    Project name MAXCOMPUTE_PUBLIC_DATA
    Table name behavior_log
    Update cycle Fixed data is provided and is not updated.
    Schema query DESC MAXCOMPUTE_PUBLIC_DATA.table_name;
    Query example SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.behavior_log limit 10;