Alibaba Cloud provides open source public datasets that are stored in the public storage of Alibaba Cloud. You can register these datasets without copying them to your own storage and then use them in data processing and modeling tasks. This topic describes the public datasets provided by Machine Learning Platform for AI and how to download them.

Background information

The CIFAR-10 dataset

CIFAR-10 is an open source dataset that is widely used for image classification in deep learning. The dataset consists of 60,000 images in the following 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. All images are stored in the following three folders:
  • train: stores 50,000 images. These images are used for training.
  • test: stores 10,000 images. These images are used for testing.
  • predict: stores several images. These images are used for prediction.
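Once the dataset is on a local disk, the folder sizes listed above can be checked with a short script. The following is a minimal sketch, not part of the official tooling. It assumes that the three folders sit directly under one directory, that images may be nested in subfolders, and that images use the file extensions listed in the code. The `count_images` helper and the extension list are hypothetical.

```python
from pathlib import Path

# Assumption: images use one of these extensions; adjust if the dataset differs.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images(root, folders=("train", "test", "predict")):
    """Return {folder name: number of image files found recursively}."""
    counts = {}
    for name in folders:
        folder = Path(root) / name
        if not folder.is_dir():
            # A missing folder is reported as zero instead of raising an error.
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for path in folder.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images` on the directory that contains the three folders returns one count per folder, which can be compared with the 50,000 training images and 10,000 test images described above.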
The following section describes the paths in which the CIFAR-10 dataset is stored and how to download the dataset:
  • Dataset addresses
    • China (Hangzhou): oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/cifar10/qince_data/
    • China (Shanghai): oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/cifar10/qince_data/
    • China (Beijing): oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/cifar10/qince_data/
    • China (Shenzhen): oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/cifar10/qince_data/
  • How to download

    Use the ossutil CLI provided by Object Storage Service (OSS) to download the dataset to your on-premises machine. Example:

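    1. Download and install ossutil. For more information, see Download and installation.
    2. Run the following command to download the dataset to your on-premises machine. The example uses the macOS binary ossutilmac64 and the local directory /Users/tongxin/Desktop. Replace them with the ossutil binary and the destination directory for your environment. For more information about ossutil commands, see ossutil.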
      ./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/cifar10/qince_data /Users/tongxin/Desktop
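The dataset addresses above include the regional endpoint in the host part, whereas the ossutil command refers to the bucket by name only. The following is a minimal sketch of that conversion. It assumes that every address follows the pattern oss://<bucket>.<endpoint>/<path>. The helper names `split_dataset_address` and `ossutil_cp_args` are hypothetical, and the default binary name is taken from the example command.

```python
from urllib.parse import urlparse


def split_dataset_address(address):
    """Split 'oss://<bucket>.<endpoint>/<path>' into (bucket, endpoint, path)."""
    parsed = urlparse(address)
    if parsed.scheme != "oss" or "." not in parsed.netloc:
        raise ValueError(f"not a dataset address: {address!r}")
    # The bucket name is everything before the first dot of the host part.
    bucket, endpoint = parsed.netloc.split(".", 1)
    return bucket, endpoint, parsed.path.strip("/")


def ossutil_cp_args(address, dest, ossutil="./ossutilmac64"):
    """Build the argument list for a recursive ossutil download."""
    bucket, _endpoint, path = split_dataset_address(address)
    return [ossutil, "cp", "-r", f"oss://{bucket}/{path}", dest]
```

The returned list can be passed to `subprocess.run(args, check=True)`. For the China (Hangzhou) address and the destination /Users/tongxin/Desktop, it reproduces the command shown above.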

The PASCAL VOC 2007 dataset

PASCAL VOC 2007 is an open source dataset that is widely used for object detection and image segmentation. The dataset is also used to benchmark models such as Faster R-CNN and YOLO. The dataset consists of the following folders: Annotations, ImageSets, JPEGImages, SegmentationClass, and SegmentationObject.

The following section describes the paths in which the PASCAL VOC 2007 dataset is stored and how to download the dataset:
  • Dataset addresses
    • China (Hangzhou): oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/VOCdevkit/VOC2007/
    • China (Shanghai): oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/VOCdevkit/VOC2007/
    • China (Beijing): oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/VOCdevkit/VOC2007/
    • China (Shenzhen): oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/VOCdevkit/VOC2007/
  • How to download

    Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. Example:

    1. Download and install ossutil. For more information, see Download and installation.
    2. Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
      ./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/VOCdevkit/VOC2007 /Users/tongxin/Desktop
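After the download, each XML file in the Annotations folder describes the objects in one image of the JPEGImages folder. The following is a minimal sketch of how such a file can be read. It assumes the standard PASCAL VOC annotation layout: an `annotation` root element with a `filename` element and one `object` element per labeled object, each containing `name` and `bndbox`. The `parse_voc_annotation` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return (image filename, [(class name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    objects = []
    # Only direct <object> children are read; nested <part> elements are ignored.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), coords))
    return filename, objects
```

To use the function on a downloaded file, pass the text of any XML file in the Annotations folder, for example the result of `pathlib.Path(path).read_text()`.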

Image classification dataset for content moderation

The image classification dataset for content moderation is intended for use with the image classification-based content moderation solutions provided by Machine Learning Platform for AI. The dataset consists of a training set and a test set. You can use the content moderation solution to streamline data preparation, model building, and model deployment based on your business scenarios. This allows you to develop your own risk control system.

The following section describes the paths in which the image classification dataset for content moderation is stored and how to download the dataset:
  • Dataset addresses
    • China (Shanghai): oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/image_inspection_cls/
  • How to download
    Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. Example:
    1. Download and install ossutil. For more information, see Download and installation.
    2. Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
      ./ossutilmac64 cp -r oss://pai-vision-data-sh/data/image_inspection_cls /Users/tongxin/Desktop

Object detection dataset for content moderation

The object detection dataset for content moderation is intended for use with the object detection solutions provided by Machine Learning Platform for AI. The dataset consists of a training set, an evaluation set, and a test set. You can use the content moderation solution to streamline data preparation, model building, and model deployment based on your business scenarios. This allows you to develop your own risk control system.

The following section describes the paths in which the object detection dataset for content moderation is stored and how to download the dataset:
  • Dataset addresses
    • China (Hangzhou): oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/image_inspection_det/
    • China (Shanghai): oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/image_inspection_det/
    • China (Beijing): oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/image_inspection_det/
    • China (Shenzhen): oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/image_inspection_det/
  • How to download
    Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. Example:
    1. Download and install ossutil. For more information, see Download and installation.
    2. Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
      ./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/image_inspection_det /Users/tongxin/Desktop

The Deepfashion2 dataset

Deepfashion2 is an open source dataset that is widely used for matching similar clothing images and for clothing image retrieval. Machine Learning Platform for AI provides more than 310,000 clothing images from the Deepfashion2 dataset. You can use the image matching and image retrieval solution provided by Machine Learning Platform for AI to streamline data preparation, model building, and model deployment. This allows you to develop your own image retrieval system.

The following section describes the paths in which the Deepfashion2 dataset is stored and how to download the dataset:
  • Dataset addresses
    • China (Hangzhou): oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/deepfashion2/train_crop/
    • China (Shanghai): oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/deepfashion2/train_crop/
    • China (Beijing): oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/deepfashion2/train_crop/
    • China (Shenzhen): oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/deepfashion2/train_crop/
  • How to download
    Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. Example:
    1. Download and install ossutil. For more information, see Download and installation.
    2. Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
      ./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/deepfashion2/train_crop /Users/tongxin/Desktop
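
The addresses listed in this topic follow one pattern: a region-specific bucket and endpoint, followed by a dataset-specific path. The following is a minimal sketch that collects them in a lookup table. The buckets, endpoints, and paths are copied from the addresses above. The short dataset keys and the `dataset_address` helper are hypothetical names chosen for illustration.

```python
# Region -> (bucket, endpoint), as they appear in the addresses in this topic.
REGIONS = {
    "China (Hangzhou)": ("pai-vision-data-hz2", "oss-cn-hangzhou.aliyuncs.com"),
    "China (Shanghai)": ("pai-vision-data-sh", "oss-cn-shanghai.aliyuncs.com"),
    "China (Beijing)": ("pai-vision-data-bj", "oss-cn-beijing.aliyuncs.com"),
    "China (Shenzhen)": ("pai-vision-data-sz", "oss-cn-shenzhen.aliyuncs.com"),
}
ALL_REGIONS = tuple(REGIONS)

# Hypothetical short key -> (path in the bucket, regions listed for the dataset).
DATASETS = {
    "cifar10": ("data/cifar10/qince_data/", ALL_REGIONS),
    "voc2007": ("data/VOCdevkit/VOC2007/", ALL_REGIONS),
    "moderation_cls": ("data/image_inspection_cls/", ("China (Shanghai)",)),
    "moderation_det": ("data/image_inspection_det/", ALL_REGIONS),
    "deepfashion2": ("data/deepfashion2/train_crop/", ALL_REGIONS),
}


def dataset_address(dataset, region):
    """Return the OSS address of a dataset in a region listed for it."""
    path, regions = DATASETS[dataset]
    if region not in regions:
        # Only regions that this topic lists for the dataset are accepted.
        raise ValueError(f"{dataset} is not listed for {region}")
    bucket, endpoint = REGIONS[region]
    return f"oss://{bucket}.{endpoint}/{path}"
```

The table makes the one exception explicit: the image classification dataset for content moderation is listed only for China (Shanghai), so requesting it for another region raises an error instead of returning an address that this topic does not list.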