Data Science Workshop (DSW) of Platform for AI (PAI) is a one-stop AI development platform tailored for algorithm developers. DSW integrates multiple cloud development environments, such as JupyterLab, WebIDE, and Terminal, for coding, debugging, and running tasks. DSW provides various heterogeneous computing resources and open-source images, and supports mounting Object Storage Service (OSS), Apsara File Storage NAS (NAS), and Cloud Parallel File Storage (CPFS) datasets. You can manage the lifecycle of DSW instances and develop in DSW easily and efficiently.
Features
One-stop service
DSW allows you to mount storage, such as OSS, NAS, and CPFS, access MaxCompute data, and use Deep Learning Containers (DLC) and Elastic Algorithm Service (EAS) tools.
DSW allows you to implement AI development that covers data processing, coding, debugging, model training, and model deployment.
Flexibility and ease of use
DSW provides various heterogeneous computing resources, including public resource groups and dedicated resource groups. You can flexibly configure and manage resources in DSW.
DSW provides images of multiple open-source frameworks and supports custom images.
DSW provides built-in development environments, such as Notebook, WebIDE, and Terminal, to meet various development requirements.
In addition to Python, DSW supports writing and running R code and SQL statements.
Fine-grained management
DSW allows you to configure scheduled stop for an instance or auto stop for idle instances to reduce costs.
DSW monitors CPU, GPU, and memory usage in real time so that you can analyze resource usage, adjust task allocation, and optimize code performance promptly.
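The idea behind auto stop for idle instances can be illustrated with a minimal sketch. This is not how DSW implements the feature. It only shows one plausible rule: stop an instance when every utilization sample in a trailing time window falls below a threshold. The `Sample` type, the `should_auto_stop` function, and the default thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One utilization reading (hypothetical structure, not a DSW API)."""
    minute: int          # minutes since monitoring started
    cpu_percent: float   # CPU utilization, 0-100
    gpu_percent: float   # GPU utilization, 0-100


def should_auto_stop(
    samples: Sequence[Sample],
    idle_minutes: int = 60,       # assumed window length
    cpu_threshold: float = 5.0,   # assumed idle threshold
    gpu_threshold: float = 5.0,   # assumed idle threshold
) -> bool:
    """Return True if the instance looks idle for the whole trailing window.

    Expects samples sorted by ascending minute.
    """
    if not samples:
        return False
    window_start = samples[-1].minute - idle_minutes
    # Require data that reaches back to the start of the window;
    # otherwise there is not enough history to call the instance idle.
    if samples[0].minute > window_start:
        return False
    window = [s for s in samples if s.minute >= window_start]
    return all(
        s.cpu_percent < cpu_threshold and s.gpu_percent < gpu_threshold
        for s in window
    )


if __name__ == "__main__":
    # Two hours of readings taken every 10 minutes, all near zero.
    history = [Sample(m, 1.0, 0.0) for m in range(0, 121, 10)]
    print(should_auto_stop(history))
```

Requiring a full window of history avoids stopping an instance that has only just started and has not yet had time to run any work.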
Scenario-based tutorials
DSW provides Notebook Gallery as a content platform for developers. You can use the Notebook Gallery tutorials on large language models (LLMs) and AI content generation to quickly get started with development.
Enterprise-class capabilities
DSW lets workspace administrators allocate global resources and configure resource reclaim policies.
Scenarios
Machine learning and data science
DSW supports the JupyterLab interactive programming environment and provides various images, such as PyTorch and TensorFlow images. You can perform tasks such as data engineering, model development and training, and visual analysis without having to handle resource O&M or environment configuration.
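As a small illustration, a first notebook cell might confirm which frameworks the selected image actually provides. The sketch below uses only the Python standard library. The function name `installed_frameworks` is hypothetical, and the module names `torch` and `tensorflow` are the usual import names for PyTorch and TensorFlow rather than anything specific to DSW.

```python
import importlib.util


def installed_frameworks(candidates=("torch", "tensorflow")):
    """Map each top-level module name to True if it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in candidates
    }


if __name__ == "__main__":
    for name, present in installed_frameworks().items():
        status = "available" if present else "not installed"
        print(f"{name}: {status}")
```

The check locates each module without importing it, so it stays fast even when a large framework is present.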
Generative AI and LLM
Notebook Gallery provides use cases and best practices for common scenarios, such as Stable Diffusion, Llama2, and Tongyi Qianwen, that you can open in Notebook. You can run the tutorials directly in DSW and build your own code on them.
AI and big data integration
In addition to Python and R, DSW supports big data integration. You can use the SQL File plug-in to query MaxCompute data sources with SQL statements, or use Notebook to connect to E-MapReduce clusters and submit Spark jobs.