Fluid is an open source, Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications in cloud-native scenarios. Fluid makes datasets observable, auto-scalable, and portable by managing and scheduling cache runtimes. This topic uses JindoFS as an example to describe how to periodically update a dataset by running a DataLoad job on a schedule.
Prerequisites
A Container Service for Kubernetes (ACK) Pro cluster that runs Kubernetes 1.18 or later is created. For more information, see Create an ACK managed cluster.
The cloud-native AI suite is installed and the ack-fluid component is deployed.
Important: If you have already installed open source Fluid, uninstall it before you deploy the ack-fluid component.
Make sure that the ack-fluid version is 1.0.3.
If you have not installed the cloud-native AI suite, enable Fluid acceleration when you install the suite. For more information, see Deploy the cloud-native AI suite.
If you have installed the cloud-native AI suite, log on to the ACK console and deploy ack-fluid from the Cloud-native AI Suite page.
A kubectl client is connected to the ACK Pro cluster. For more information, see Connect to a cluster by using kubectl.
Step 1: Upload data to OSS
Run the following command to download the test data:
wget https://archive.apache.org/dist/hbase/2.5.2/RELEASENOTES.md
Install ossutil and create an Object Storage Service (OSS) bucket. For more information, see Install ossutil.
Run the following command to upload the test data to the OSS bucket:
ossutil64 cp RELEASENOTES.md oss://<bucket-name>/<path>/RELEASENOTES.md
Step 2: Create a dataset and a JindoRuntime
Create a file named mySecret.yaml to store the AccessKey ID and AccessKey secret that are used to access OSS. The following YAML template provides an example:
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.oss.accessKeyId: ****** # Enter the AccessKey ID.
  fs.oss.accessKeySecret: ****** # Enter the AccessKey secret.
Run the following command to create a Secret:
kubectl create -f mySecret.yaml
Expected output:
secret/mysecret created
Create a file named dataset.yaml. The file is used to create a dataset and a JindoRuntime.
Run the following command to deploy dataset.yaml to create a JindoRuntime and a dataset:
kubectl create -f dataset.yaml
Expected output:
dataset.data.fluid.io/demo created
jindoruntime.data.fluid.io/demo created
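The contents of dataset.yaml are not reproduced in this topic. The following is a minimal sketch of what the file might contain, assuming the Fluid data.fluid.io/v1alpha1 API, a memory-backed cache with a 10 GiB quota (consistent with the CACHE CAPACITY value reported in this step), and the mysecret Secret created earlier. The <oss-endpoint> placeholder, the mount name, and the tiered-store watermarks are illustrative assumptions, not values taken from this topic.

```shell
# Sketch only: writes a hypothetical dataset.yaml into the current directory.
# Replace every <placeholder> before you apply the file.
cat > dataset.yaml <<'EOF'
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: oss://<bucket-name>/<path>   # Directory that holds RELEASENOTES.md.
      name: demo                               # Assumed mount name.
      path: /
      options:
        fs.oss.endpoint: <oss-endpoint>        # Assumed placeholder for the bucket endpoint.
      encryptOptions:                          # Read the credentials from the mysecret Secret.
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo            # Must match the dataset name so that the two are bound.
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10Gi     # Matches the 10.00GiB cache capacity in the expected output.
        high: "0.99"    # Assumed watermark values.
        low: "0.95"
EOF
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the YAML, so the placeholders are written to the file literally.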
Run the following command to check whether the dataset is deployed:
kubectl get dataset
Expected output:
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo   588.90KiB        0.00B    10.00GiB         0.0%                Bound   2m7s
Step 3: Create a DataLoad job that runs periodically
Create a file named dataload.yaml.
Run the following command to deploy dataload.yaml to create a DataLoad job:
kubectl apply -f dataload.yaml
Expected output:
dataload.data.fluid.io/cron-dataload created
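The contents of dataload.yaml are not reproduced in this topic. As a rough sketch, assuming the Fluid data.fluid.io/v1alpha1 DataLoad resource with its Cron policy, the file could look like the following. The two-minute schedule and the default namespace are assumptions chosen for illustration; use a cron expression that matches how often the data in OSS changes.

```shell
# Sketch only: writes a hypothetical dataload.yaml into the current directory.
cat > dataload.yaml <<'EOF'
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: cron-dataload
spec:
  dataset:
    name: demo              # The dataset created in Step 2.
    namespace: default      # Assumed namespace.
  policy: Cron              # Run the job repeatedly instead of once.
  schedule: "*/2 * * * *"   # Assumed schedule: every 2 minutes.
EOF
```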
Run the following command to query the status of the DataLoad job:
kubectl get dataload
Expected output. If the value in the PHASE column is Complete, the data is loaded and you can proceed to the next step:
NAME            DATASET   PHASE      AGE   DURATION
cron-dataload   demo      Complete   68s   8s
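Instead of rerunning kubectl get dataload by hand, the check can be scripted. The helper below is a minimal sketch, not part of Fluid: wait_for_dataload is a hypothetical function name, and it assumes that the PHASE column is backed by the .status.phase field of the DataLoad resource.

```shell
# Hypothetical helper: poll a DataLoad until its phase is Complete.
# Usage: wait_for_dataload <name> [attempts] [interval-seconds]
wait_for_dataload() {
  name=$1
  attempts=${2:-30}
  interval=${3:-5}
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    # Assumption: .status.phase holds the value shown in the PHASE column.
    phase=$(kubectl get dataload "$name" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    if [ "$phase" = "Complete" ]; then
      echo "DataLoad $name is Complete"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Timed out waiting for DataLoad $name (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example, wait_for_dataload cron-dataload 30 5 checks up to 30 times at 5-second intervals and returns a non-zero exit code if the job never reaches Complete.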
Run the following command to query the status of the dataset:
kubectl get dataset
Expected output:
NAME   UFS TOTAL SIZE   CACHED      CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo   588.90KiB        588.90KiB   10.00GiB         100.0%              Bound   5m50s
The output indicates that all data in OSS is loaded to the cache.
Step 4: Create an application pod to access data in OSS
Create a file named app.yaml that defines an application pod to access the RELEASENOTES.md file.
Run the following command to create an application pod:
kubectl create -f app.yaml
Expected output:
pod/nginx created
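The contents of app.yaml are not reproduced in this topic. The sketch below shows one plausible version: a pod named nginx that mounts the dataset at /data through the persistent volume claim that Fluid creates with the same name as the dataset (demo). The container image and volume name are assumptions. The wait_for_pod_ready function is a hypothetical convenience wrapper around kubectl wait for the readiness check that the next step depends on.

```shell
# Sketch only: writes a hypothetical app.yaml into the current directory.
cat > app.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx              # Assumed image.
      volumeMounts:
        - mountPath: /data      # Path used by the commands in this step.
          name: demo-vol
  volumes:
    - name: demo-vol            # Assumed volume name.
      persistentVolumeClaim:
        claimName: demo         # PVC named after the dataset.
EOF

# Hypothetical helper: block until the pod reports the Ready condition.
# Usage: wait_for_pod_ready <pod-name> [timeout]
wait_for_pod_ready() {
  pod=$1
  timeout=${2:-120s}
  kubectl wait --for=condition=Ready "pod/$pod" --timeout="$timeout"
}
```

After kubectl create -f app.yaml succeeds, running wait_for_pod_ready nginx returns once the pod is ready, or fails after the timeout.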
After the application pod is ready, run the following command to access the data in OSS:
kubectl exec -it nginx -- ls -lh /data
Expected output:
total 589K
-rwxrwxr-x 1 root root 589K Jul 31 04:20 RELEASENOTES.md
Run the following command to append the string "hello, crondataload." to the RELEASENOTES.md file:
echo "hello, crondataload." >> RELEASENOTES.md
Run the following command to upload the RELEASENOTES.md file to OSS:
ossutil64 cp RELEASENOTES.md oss://<bucket-name>/<path>/RELEASENOTES.md
When prompted, press y to overwrite the existing object. Expected output:
cp: overwrite "oss://<bucket-name>/<path>/RELEASENOTES.md"(y or N)? y
Succeed: Total num: 1, size: 21. OK num: 1(upload 1 files).
average speed 0(byte/s)
81.827978(s) elapsed
Run the following command to query the status of the DataLoad job:
kubectl describe dataload cron-dataload
Expected output:
...
Status:
  Conditions:
    Last Probe Time:       2023-08-24T06:44:08Z
    Last Transition Time:  2023-08-24T06:44:08Z
    Status:                True
    Type:                  Complete
  Duration:                8s
  Last Schedule Time:      2023-08-24T06:44:00Z # The most recent time when the DataLoad job started.
  Last Successful Time:    2023-08-24T06:44:08Z # The most recent time when the DataLoad job was completed.
  Phase:                   Complete
...
Run the following command to query the status of the dataset:
kubectl get dataset
Expected output:
NAME   UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo   588.90KiB        1.15MiB   10.00GiB         100.0%              Bound   10m
The output indicates that the updated file is loaded to the cache.
Run the following command to view the updated file in the application pod:
kubectl exec -it nginx -- tail /data/RELEASENOTES.md
Expected output:
hello, crondataload.
The output indicates that the application pod can access the updated file.
(Optional) Step 5: Clear data
If you no longer need the data acceleration feature, delete the related resources.
Run the following commands to delete the application pod, the dataset, and the JindoRuntime:
kubectl delete -f app.yaml
kubectl delete -f dataset.yaml
Expected output:
pod "nginx" deleted
dataset.data.fluid.io "demo" deleted
jindoruntime.data.fluid.io "demo" deleted