Advanced dataset management and usage features - Container Service for Kubernetes

Configure the access mode of a dataset

Container Service for Kubernetes (ACK) lets you set the access mode of a dataset, such as ReadOnlyMany or ReadWriteOnce. The access mode determines whether the dataset can be mounted read-only by many nodes or read-write by a single node, so you can match access permissions to the requirements of each application. This helps keep dataset access efficient and secure in Kubernetes clusters and is suitable for big data and AI scenarios. For more information, see Configure the access mode of a dataset.
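
The following Python snippet is a minimal sketch of what such a configuration might look like. It assumes the Fluid Dataset resource (`data.fluid.io/v1alpha1`) with an `accessModes` field; the dataset name and the OSS path are hypothetical, and the helper `dataset_manifest` is introduced here only for illustration.

```python
import json

# Only the two modes named in this section; other modes may also be accepted.
NAMED_ACCESS_MODES = {"ReadOnlyMany", "ReadWriteOnce"}


def dataset_manifest(name, mount_point, access_mode="ReadOnlyMany"):
    """Build a Dataset manifest as a plain dict (illustrative helper)."""
    if access_mode not in NAMED_ACCESS_MODES:
        raise ValueError(f"unexpected access mode: {access_mode}")
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed Fluid API group/version
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "mounts": [{"mountPoint": mount_point, "name": name}],
            "accessModes": [access_mode],
        },
    }


if __name__ == "__main__":
    # Hypothetical dataset name and bucket path.
    manifest = dataset_manifest("demo", "oss://example-bucket/data/")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.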

Periodically update a dataset

ACK lets you use a DataLoad job to update a dataset periodically. The job synchronizes the latest data from an external data source, such as Object Storage Service (OSS) or Hadoop Distributed File System (HDFS), to the dataset on a schedule. Periodic updates keep the dataset current with its source and are suitable for scenarios in which the source data changes over time, such as data analysis and machine learning training. For more information, see Periodically update a dataset by running a DataLoad job.
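
As a rough illustration, a scheduled DataLoad job could be described as follows. This is a sketch that assumes the Fluid DataLoad resource with `policy: Cron` and a `schedule` field; the names, the target path, and the helper `dataload_manifest` are hypothetical.

```python
import json


def dataload_manifest(name, dataset_name, namespace="default", schedule="0 * * * *"):
    """Build a DataLoad manifest that runs on a cron schedule (illustrative helper)."""
    if len(schedule.split()) != 5:
        raise ValueError("expected a five-field cron expression")
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed Fluid API group/version
        "kind": "DataLoad",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The dataset to refresh.
            "dataset": {"name": dataset_name, "namespace": namespace},
            # Re-sync metadata so that new or changed files are picked up.
            "loadMetadata": True,
            # Load everything under the dataset root (assumed path).
            "target": [{"path": "/", "replicas": 1}],
            "policy": "Cron",
            "schedule": schedule,
        },
    }


if __name__ == "__main__":
    # Hypothetical names; this schedule fires every 30 minutes.
    print(json.dumps(dataload_manifest("demo-load", "demo", schedule="*/30 * * * *"), indent=2))
```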

Share datasets across namespaces

ACK lets applications in different namespaces of a Kubernetes cluster share the same dataset. You configure a sharing policy for the dataset so that the data can be reused instead of duplicated, while access permissions and security isolation between namespaces are preserved. Cross-namespace sharing is suitable for multi-team collaboration and distributed computing scenarios. For more information, see Share datasets across namespaces.
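
One way this might be expressed is sketched below. It assumes that a dataset in the consuming namespace can reference the source dataset through a `dataset://<namespace>/<name>` mount point and is paired with a ThinRuntime of the same name; treat both as assumptions to verify against the linked topic. All names and the helper `shared_dataset_manifests` are hypothetical.

```python
import json

API_VERSION = "data.fluid.io/v1alpha1"  # assumed Fluid API group/version


def shared_dataset_manifests(name, namespace, source_namespace, source_name):
    """Return a Dataset and a ThinRuntime that reference a dataset in another namespace."""
    metadata = {"name": name, "namespace": namespace}
    dataset = {
        "apiVersion": API_VERSION,
        "kind": "Dataset",
        "metadata": dict(metadata),
        "spec": {
            # Assumed reference syntax: dataset://<source namespace>/<source dataset>
            "mounts": [{"mountPoint": f"dataset://{source_namespace}/{source_name}"}],
        },
    }
    runtime = {
        "apiVersion": API_VERSION,
        "kind": "ThinRuntime",
        "metadata": dict(metadata),
    }
    return [dataset, runtime]


if __name__ == "__main__":
    # Hypothetical: namespace team-b reuses the dataset "demo" owned by team-a.
    for manifest in shared_dataset_manifests("shared", "team-b", "team-a", "demo"):
        print(json.dumps(manifest, indent=2))
```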

Use JindoRuntime to persist storage for the JindoFS master

JindoRuntime is a data runtime in Fluid that accelerates access to OSS and HDFS. You can configure JindoRuntime to persist the storage of the JindoFS master so that the status of the runtime can be recovered after a cluster restart or a node failure. This improves the availability and stability of the data acceleration service. For more information, see Use JindoRuntime to persist storage for the JindoFS master.

Schedule pods based on cache affinity

ACK lets you schedule pods based on cache affinity. By placing computing jobs close to the nodes that hold the cached data, ACK reduces data transmission latency and improves overall system performance. This feature is suitable for scenarios that require high-frequency data access and high-concurrency processing, such as distributed computing and AI training. For more information, see Schedule pods based on cache affinity.
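
The idea behind cache affinity can be illustrated with a toy ranking function. This is only a conceptual sketch and not the ACK or Fluid scheduler: the node names, the `cached_on` mapping, and the function `rank_nodes` are all hypothetical.

```python
def rank_nodes(cached_on, dataset):
    """Order node names so that nodes caching `dataset` come first.

    `cached_on` maps a node name to the set of dataset names cached on it.
    Nodes within the same group are ordered by name.
    """
    return sorted(cached_on, key=lambda node: (dataset not in cached_on[node], node))


if __name__ == "__main__":
    # Hypothetical cluster state.
    cluster = {
        "node-a": set(),
        "node-b": {"demo"},
        "node-c": {"demo", "other"},
    }
    print(rank_nodes(cluster, "demo"))
```

In this toy model, a job that reads the dataset would be placed on the first node in the list, and nodes without the cache remain available as a fallback.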

The auto recovery feature provided by Fluid for FUSE file systems

In Fluid, applications access distributed storage systems, such as OSS and HDFS, through a FUSE file system provided by the runtime. The auto recovery feature keeps data access continuous and reliable when a failure occurs or a node becomes unavailable: the FUSE client gives applications transparent access to distributed storage, and the data access path is quickly restored when required so that your business keeps running. For more information, see Enable the auto recovery feature for FUSE mount targets.
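
On the application side, a recovered mount can only become visible inside a running container if mount events propagate from the host. The sketch below builds a pod manifest that sets the standard Kubernetes `mountPropagation: HostToContainer` option on the dataset volume. Whether auto recovery requires this setting, and how the feature is enabled, are assumptions to check against the linked topic; the pod name, image, and claim name are hypothetical.

```python
import json


def pod_manifest(name, image, claim_name, mount_path="/data"):
    """Build a pod that mounts a dataset PVC with host-to-container mount propagation."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [
                        {
                            "name": "data",
                            "mountPath": mount_path,
                            # Lets mounts re-created on the host appear in the container.
                            "mountPropagation": "HostToContainer",
                        }
                    ],
                }
            ],
            "volumes": [
                # Assumes the dataset is exposed as a PVC with this name.
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name, image, and claim name.
    print(json.dumps(pod_manifest("demo-app", "nginx", "demo"), indent=2))
```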

Summary

Configure the access mode of a dataset: Sets the access permissions and mode of a dataset so that data access is efficient and secure.
Periodically update a dataset: Uses DataLoad jobs to refresh datasets on a schedule so that the data stays current and accurate.
Share datasets across namespaces: Lets multiple teams and distributed computing jobs share and reuse a dataset, which improves resource utilization.
Use JindoRuntime to persist storage for the JindoFS master: Allows the runtime status to be recovered, which keeps the data acceleration service highly available and stable.
Schedule pods based on cache affinity: Places computing jobs close to cached data, which reduces latency and improves overall system efficiency.
Enable the auto recovery feature for FUSE mount targets: Provides transparent access to distributed storage and restores the data access path after a failure.

Together, these features help you manage the lifecycle of datasets, optimize data access performance, and maintain high system availability and data security.