The public datasets are open source datasets that Alibaba Cloud stores in its public storage. You can register a public dataset and use it in data processing and modeling tasks without creating a replica in your own storage. This topic describes the public datasets provided by Machine Learning Platform for AI and how to download them.
Background information
The CIFAR-10 dataset
- train: stores 50,000 images. These images are used for training.
- test: stores 10,000 images. These images are used for testing.
- predict: stores several images. These images are used for prediction.
- Dataset addresses
- China (Hangzhou):
oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/cifar10/qince_data/
- China (Shanghai):
oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/cifar10/qince_data/
- China (Beijing):
oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/cifar10/qince_data/
- China (Shenzhen):
oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/cifar10/qince_data/
- How to download
Use the ossutil CLI provided by Object Storage Service (OSS) to download the dataset to your on-premises machine. The following example downloads from the China (Hangzhou) bucket:
- Download and install ossutil. For more information, see Download and installation.
- Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/cifar10/qince_data /Users/tongxin/Desktop
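As a quick sanity check after the download, the file counts in the local copy can be compared with the numbers listed for the train and test directories above. The following is a minimal sketch and is not part of ossutil or Machine Learning Platform for AI: the function names are hypothetical, and it assumes that every file under train and test is an image (label or index files, if the dataset contains any, would inflate the counts).

```python
from pathlib import Path

# Expected image counts, taken from the directory description above.
# The predict directory holds "several" images, so no exact count is checked.
EXPECTED = {"train": 50_000, "test": 10_000}


def count_files(root):
    """Return {split: number of regular files under root/split}."""
    root = Path(root)
    counts = {}
    for split in ("train", "test", "predict"):
        split_dir = root / split
        if split_dir.is_dir():
            # rglob("*") also descends into subdirectories, if any exist.
            counts[split] = sum(1 for p in split_dir.rglob("*") if p.is_file())
        else:
            counts[split] = 0
    return counts


def check_counts(counts):
    """Return a list of mismatch messages; the list is empty if all counts match."""
    return [
        f"{split}: expected {expected}, found {counts.get(split, 0)}"
        for split, expected in EXPECTED.items()
        if counts.get(split, 0) != expected
    ]
```

Call `check_counts(count_files(local_dir))`, where `local_dir` is the downloaded qince_data directory. An empty list means that the train and test counts match the documented sizes.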
The PASCAL VOC 2007 dataset
PASCAL VOC 2007 is an open source dataset that is widely used for object detection and image segmentation. The dataset is also used to benchmark models such as Faster R-CNN and YOLO. The dataset consists of the following directories: Annotations, ImageSets, JPEGImages, SegmentationClass, and SegmentationObject.
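To illustrate how the Annotations directory relates to object detection, the following sketch reads one annotation in the standard PASCAL VOC XML layout and extracts the class name and bounding box of each object. It assumes that the copy stored in OSS keeps the standard VOC layout. The function name is hypothetical and the sample XML is a hand-written, abbreviated illustration rather than a file copied from the dataset.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text):
    """Return the image file name and a list of labeled boxes from VOC XML."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Coordinates are pixel positions: (xmin, ymin, xmax, ymax).
        bbox = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({"name": obj.findtext("name"), "bbox": bbox})
    return {"filename": root.findtext("filename"), "objects": objects}


# Hand-written sample for illustration only (assumption, not dataset content).
SAMPLE = """<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

result = parse_voc_annotation(SAMPLE)
print(result["filename"], result["objects"])
```

For a downloaded copy, pass the text of a file from the Annotations directory instead of `SAMPLE`. The matching image is expected under JPEGImages with the file name given in the annotation.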
- Dataset addresses
- China (Hangzhou):
oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/VOCdevkit/VOC2007/
- China (Shanghai):
oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/VOCdevkit/VOC2007/
- China (Beijing):
oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/VOCdevkit/VOC2007/
- China (Shenzhen):
oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/VOCdevkit/VOC2007/
- How to download
Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. The following example downloads from the China (Hangzhou) bucket:
- Download and install ossutil. For more information, see Download and installation.
- Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/VOCdevkit/VOC2007 /Users/tongxin/Desktop
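The example command targets the China (Hangzhou) bucket, but the same dataset is listed in four regions. The following sketch shows one way to build the equivalent `cp -r` command for another region. The bucket names are taken from the dataset addresses in this topic. The region keys, the function name, and the default binary name are assumptions made for illustration, so adjust the binary name to the ossutil build for your operating system.

```python
import shlex

# Bucket names as they appear in the dataset addresses in this topic.
# The region keys are an illustrative naming choice (assumption).
REGION_BUCKETS = {
    "cn-hangzhou": "pai-vision-data-hz2",
    "cn-shanghai": "pai-vision-data-sh",
    "cn-beijing": "pai-vision-data-bj",
    "cn-shenzhen": "pai-vision-data-sz",
}


def build_download_command(region, dataset_path, local_dir,
                           ossutil="./ossutilmac64"):
    """Return the ossutil argument list that copies a dataset recursively."""
    bucket = REGION_BUCKETS[region]  # raises KeyError for an unlisted region
    source = f"oss://{bucket}/{dataset_path.strip('/')}"
    return [ossutil, "cp", "-r", source, local_dir]


cmd = build_download_command("cn-beijing", "data/VOCdevkit/VOC2007/", "/tmp/voc2007")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` after ossutil is installed and configured. Not every dataset in this topic is listed in every region, so check the dataset addresses first.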
Image classification dataset for content moderation
The image classification dataset for content moderation is intended for use with the content moderation solution based on image classification that Machine Learning Platform for AI provides. The dataset consists of a training set and a test set. You can use the content moderation solution to streamline data preparation, model building, and model deployment based on your business scenarios, and develop your own risk control system.
- Dataset addresses
- China (Shanghai):
oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/image_inspection_cls/
- How to download
Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. The following example downloads from the China (Shanghai) bucket:
- Download and install ossutil. For more information, see Download and installation.
- Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
./ossutilmac64 cp -r oss://pai-vision-data-sh/data/image_inspection_cls /Users/tongxin/Desktop
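The dataset addresses in this topic include the regional endpoint in the host part, whereas the download commands use only the bucket name. The following sketch converts one form into the other. It assumes that the first label of the host is the bucket name and the rest is the endpoint, which matches the addresses and commands shown here. The function names are hypothetical.

```python
from urllib.parse import urlparse


def split_oss_address(address):
    """Split oss://<bucket>.<endpoint>/<key> into (bucket, endpoint, key)."""
    parsed = urlparse(address)
    if parsed.scheme != "oss" or not parsed.netloc:
        raise ValueError(f"not an OSS address: {address!r}")
    # Assumption: the bucket name is everything before the first dot.
    bucket, _, endpoint = parsed.netloc.partition(".")
    return bucket, endpoint, parsed.path.lstrip("/")


def to_bucket_path(address):
    """Return the short oss://<bucket>/<key> form used in the commands."""
    bucket, _, key = split_oss_address(address)
    return f"oss://{bucket}/{key.rstrip('/')}"


address = "oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/image_inspection_cls/"
print(split_oss_address(address))
print(to_bucket_path(address))
```

The short form printed last is the source argument of the `cp -r` command above. The endpoint part identifies the region of the bucket.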
Object detection dataset for content moderation
The object detection dataset for content moderation is intended for use with the content moderation solution based on object detection that Machine Learning Platform for AI provides. The dataset consists of a training set, an evaluation set, and a test set. You can use the content moderation solution to streamline data preparation, model building, and model deployment based on your business scenarios, and develop your own risk control system.
- Dataset addresses
- China (Hangzhou):
oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/image_inspection_det/
- China (Shanghai):
oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/image_inspection_det/
- China (Beijing):
oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/image_inspection_det/
- China (Shenzhen):
oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/image_inspection_det/
- How to download
Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. The following example downloads from the China (Hangzhou) bucket:
- Download and install ossutil. For more information, see Download and installation.
- Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/image_inspection_det /Users/tongxin/Desktop
The DeepFashion2 dataset
DeepFashion2 is an open source dataset that is widely used for clothing image matching and clothing image retrieval. Machine Learning Platform for AI provides more than 310,000 clothing images from the DeepFashion2 dataset. You can use the image matching and image retrieval solution provided by Machine Learning Platform for AI to streamline data preparation, model building, and model development, and develop your own image retrieval system.
- Dataset addresses
- China (Hangzhou):
oss://pai-vision-data-hz2.oss-cn-hangzhou.aliyuncs.com/data/deepfashion2/train_crop/
- China (Shanghai):
oss://pai-vision-data-sh.oss-cn-shanghai.aliyuncs.com/data/deepfashion2/train_crop/
- China (Beijing):
oss://pai-vision-data-bj.oss-cn-beijing.aliyuncs.com/data/deepfashion2/train_crop/
- China (Shenzhen):
oss://pai-vision-data-sz.oss-cn-shenzhen.aliyuncs.com/data/deepfashion2/train_crop/
- How to download
Use the ossutil CLI provided by OSS to download the dataset to your on-premises machine. The following example downloads from the China (Hangzhou) bucket:
- Download and install ossutil. For more information, see Download and installation.
- Run the following command to download the dataset to your on-premises machine. For more information about ossutil commands, see ossutil.
./ossutilmac64 cp -r oss://pai-vision-data-hz2/data/deepfashion2/train_crop /Users/tongxin/Desktop