By Che Yang, Maintainer of the Fluid Community and Xie Yuandong, Committer of the Fluid Community
Today, more and more data-intensive applications, such as big data and AI workloads, are deployed and run on Kubernetes. However, the design of data-intensive computing frameworks diverges from the flexible, cloud-native approach to application orchestration, and this mismatch has led to data access and computation bottlenecks.
Fluid, a cloud-native data orchestration engine, accelerates data access for applications by abstracting datasets and combining a distributed cache with the scheduler.
Auto scaling is one of the core features of Kubernetes, but it has mostly focused on stateless workloads. Fluid brings auto scaling to the distributed cache, so the data cache can grow and shrink flexibly. Based on the runtime, Fluid exposes performance metrics such as the cache space and the proportion of the cache already in use. Combined with its ability to scale runtime resources, this lets Fluid scale the data cache on demand.
This capability is very important for big data applications in Internet scenarios because most of them are implemented as end-to-end pipelines made up of multiple steps.
The end-to-end pipeline contains different types of computing tasks. In practice, each task is handled by a specialized system, such as TensorFlow, PyTorch, Spark, or Presto. Because these systems are independent of each other, an external file system is often used to pass data from one stage to the next. This frequent use of file systems for data exchange incurs significant input/output (I/O) overhead, which often becomes the bottleneck of the workflow.
Fluid is well suited to this scenario. Users can create a dataset that distributes data to Kubernetes compute nodes and serves as the medium for data exchange, which avoids remote reads and writes and improves data usage efficiency. The difficulty lies in estimating and reserving resources for the temporary data cache. Before data is produced and consumed, its size is hard to estimate accurately. An estimate that is too high wastes reserved resources, whereas one that is too low increases the risk of data write failures. Scaling on demand is therefore more user-friendly. The goal is an effect similar to the page cache: the layer is transparent to end users, yet the acceleration it provides is real.
Fluid implements cache auto scaling by customizing the horizontal pod autoscaler (HPA) mechanism. Auto scaling is triggered when the amount of cached data reaches a certain proportion of the cache space, at which point the cache space is expanded. For example, suppose the trigger condition is that more than 75% of the cache space is occupied and the total cache space is 10 GB. Once the cached data exceeds 7.5 GB, for instance when it reaches 8 GB, the expansion mechanism is triggered.
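As a minimal sketch of this trigger condition, the check can be written as a comparison between the occupied share of the cache and the threshold. The function name and signature are illustrative assumptions and not part of Fluid; the real HPA controller works with a metric ratio and a tolerance rather than a bare comparison.

```python
def should_scale_up(cached_gb: float, capacity_gb: float, threshold_pct: float) -> bool:
    """Return True once the occupied share of the cache exceeds the threshold."""
    used_pct = cached_gb * 100 / capacity_gb
    return used_pct > threshold_pct


# Values from the example above: 10 GB of cache space, 75% trigger condition.
print(should_scale_up(7.0, 10.0, 75.0))  # False: 70% is below the threshold
print(should_scale_up(8.0, 10.0, 75.0))  # True: 80% exceeds the threshold
```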
The following example shows the auto scaling of Fluid.
Kubernetes 1.18 or later is recommended. Before version 1.18, the HPA scaling policy was hard-coded and could not be customized. Starting with version 1.18, users can customize the scaling policy, for example by defining the cooldown period after a scale-up.
1. Install jq to parse JSON objects. This example uses CentOS, so jq can be installed with the yum command.
yum install -y jq
2. Download and install the latest version of Fluid
git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid
3. Deploy or configure Prometheus
Metrics exposed by AlluxioRuntime's cache engine are collected here via Prometheus. If Prometheus does not exist in the cluster:
$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml
If Prometheus already exists in the cluster, add the following configuration to the Prometheus configuration file:
scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_monitor]
      regex: alluxio_runtime_metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_release]
      target_label: fluid_runtime
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_name]
      target_label: pod
      replacement: $1
      action: replace
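To make the effect of the relabel rules above more concrete, the following is a simplified sketch of how keep and replace actions behave. It is not Prometheus code: it only models the two actions used here, assumes the default regex (.*) when none is given, and the pod name in the sample target is hypothetical.

```python
import re

# The relabel rules from the scrape configuration above, as plain dicts.
RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_monitor"],
     "regex": "alluxio_runtime_metrics", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"],
     "regex": "web", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_namespace"],
     "target_label": "namespace", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_service_label_release"],
     "target_label": "fluid_runtime", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_endpoint_address_target_name"],
     "target_label": "pod", "replacement": "$1", "action": "replace"},
]


def relabel(labels, rules=RULES):
    """Apply keep/replace rules; return None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ";" and the regex must match fully.
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if rule["action"] == "keep" and match is None:
            return None
        if rule["action"] == "replace" and match is not None:
            # Prometheus writes the first capture group as $1.
            template = rule["replacement"].replace("$1", r"\g<1>")
            labels[rule["target_label"]] = match.expand(template)
    return labels


# Hypothetical discovered endpoint for the "spark" runtime.
target = {
    "__meta_kubernetes_service_label_monitor": "alluxio_runtime_metrics",
    "__meta_kubernetes_endpoint_port_name": "web",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_label_release": "spark",
    "__meta_kubernetes_endpoint_address_target_name": "spark-master-0",
}
result = relabel(target)
print(result["namespace"], result["fluid_runtime"], result["pod"])
# prints: default spark spark-master-0
```

The namespace, fluid_runtime, and pod labels produced here are the ones the adapter rule in step 6 relies on.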
4. Verify whether Prometheus is installed
$ kubectl get ep -n kube-system prometheus-svc
NAME ENDPOINTS AGE
prometheus-svc 10.76.0.2:9090 6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-svc NodePort 172.16.135.24 <none> 9090:32114/TCP 2m7s
Install Grafana to visualize monitoring metrics and verify the monitoring data. For more information, please see the document (article in Chinese).
5. Deploy the metrics server
Check whether the cluster includes a metrics server. If the kubectl top node command prints correct memory and CPU figures, the cluster metrics server is configured correctly.
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
192.168.1.204 93m 2% 1455Mi 10%
192.168.1.205 125m 3% 1925Mi 13%
192.168.1.206 96m 2% 1689Mi 11%
Otherwise, manually run the following command:
kubectl create -f integration/metrics-server
6. Deploy custom-metrics-api components
Two components are needed to scale based on custom metrics. The first collects metrics from the application and stores them in the Prometheus time-series database. The second, k8s-prometheus-adapter, exposes the collected metrics through the Kubernetes custom metrics API. The first component was deployed in step 3. The second is deployed as follows.
If custom-metrics-api is already configured, add the dataset-related rules to the adapter's ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
      - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))
Otherwise, manually run the following command:
kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api
Note: The custom-metrics-api connects to the Prometheus address in the cluster, so replace the Prometheus URL with the address you actually use.
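The metricsQuery in the adapter rule defines capacity_used_rate as the used capacity times 100, divided by the total capacity, rounded up. The sketch below reproduces that arithmetic in Python. The function name is an illustrative assumption, and the sample values are the cached sizes reported for the spark dataset in this walkthrough.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def capacity_used_rate(used_bytes: float, total_bytes: float) -> int:
    """Mirror of ceil(Cluster_CapacityUsed * 100 / Cluster_CapacityTotal)."""
    return math.ceil(used_bytes * 100 / total_bytes)


print(capacity_used_rate(0, 1 * GIB))              # 0
print(capacity_used_rate(1020.92 * MIB, 1 * GIB))  # 100
```

Because of the ceil, a cache that is about 99.7% full is already reported as 100.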
Check custom metrics
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "custom.metrics.k8s.io/v1beta1",
"resources": [
{
"name": "pods/capacity_used_rate",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "datasets.data.fluid.io/capacity_used_rate",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "namespaces/capacity_used_rate",
"singularName": "",
"namespaced": false,
"kind": "MetricValueList",
"verbs": [
"get"
]
}
]
}
7. Submit the dataset used for the test.
$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created
8. Check whether the dataset is in the available state. The dataset holds 2.71 GiB of data in total, while Fluid currently provides a single cache node with a maximum cache capacity of 1 GiB. The cache is therefore too small to hold the full dataset.
$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s
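As a rough estimate, the number of cache replicas needed to hold the whole dataset follows from the figures above. The sketch assumes that every AlluxioRuntime replica contributes its full 1 GiB quota to the cache; the helper name is hypothetical.

```python
import math


def replicas_to_hold(dataset_gib: float, quota_per_replica_gib: float) -> int:
    """Smallest replica count whose combined quota covers the dataset."""
    return math.ceil(dataset_gib / quota_per_replica_gib)


print(replicas_to_hold(2.71, 1.0))  # 3
```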
9. When the dataset is in the available status, check whether the metrics can be obtained from the custom-metrics-api.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
"kind": "MetricValueList",
"apiVersion": "custom.metrics.k8s.io/v1beta1",
"metadata": {
"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
},
"items": [
{
"describedObject": {
"kind": "Dataset",
"namespace": "default",
"name": "spark",
"apiVersion": "data.fluid.io/v1alpha1"
},
"metricName": "capacity_used_rate",
"timestamp": "2021-04-04T07:24:52Z",
"value": "0"
}
]
}
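A script that consumes this response needs only the value field of the matching item. The following is a small sketch of that extraction, using a trimmed copy of the output above. The helper names are illustrative, and the quantity parser handles only plain numbers and the milli suffix (for example "1500m"), which is an assumption about the values this metric returns.

```python
import json


def parse_quantity(text: str) -> float:
    """Parse a plain or milli-suffixed quantity such as "85" or "1500m"."""
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def dataset_metric(response_text: str, dataset: str) -> float:
    """Return the metric value reported for the named Dataset."""
    body = json.loads(response_text)
    for item in body["items"]:
        if item["describedObject"]["name"] == dataset:
            return parse_quantity(item["value"])
    raise KeyError(dataset)


SAMPLE = """
{
  "kind": "MetricValueList",
  "items": [
    {
      "describedObject": {"kind": "Dataset", "namespace": "default", "name": "spark"},
      "metricName": "capacity_used_rate",
      "value": "0"
    }
  ]
}
"""

print(dataset_metric(SAMPLE, "spark"))  # 0.0
```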
10. Create an HPA task
$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "90"
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF
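The sample configuration has two main parts: the scaling rules and the scaling sensitivity.
Scaling rules: scale-up is triggered when the capacity_used_rate metric of the spark Dataset exceeds the target value of 90, that is, when cached data occupies more than 90% of the total cache capacity. The scaling target is the AlluxioRuntime, and the minimum and maximum numbers of replicas are 1 and 4, respectively. The Dataset and AlluxioRuntime objects must be in the same namespace.
Scaling sensitivity: Kubernetes 1.18 and later allow the scale-up and scale-down behavior to be tuned separately. In this example, the periodSeconds field is set to 600 seconds (10 minutes), and at most two replicas are added per period, never exceeding maxReplicas. A cooldown after scale-up can be configured through the stabilizationWindowSeconds field, for example 20 minutes; the sample above leaves it unset. Scale-down is disabled through selectPolicy: Disabled.
11. View the HPA configuration. The current proportion of cache space in use is 0, far below the condition for triggering a scale-up.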
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 0/90 1 4 1 33s
$ kubectl describe hpa
Name: spark
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 07 Apr 2021 17:36:39 +0800
Reference: AlluxioRuntime/spark
Metrics: ( current / target )
"capacity_used_rate" on Dataset/spark (target value): 0 / 90
Min replicas: 1
Max replicas: 4
Behavior:
Scale Up:
Stabilization Window: 0 seconds
Select Policy: Max
Policies:
- Type: Pods Value: 2 Period: 600 seconds
Scale Down:
Select Policy: Disabled
Policies:
- Type: Percent Value: 100 Period: 15 seconds
AlluxioRuntime pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events: <none>
12. Create a data preloading (DataLoad) task
$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME DATASET PHASE AGE DURATION
spark spark Executing 15s Unfinished
13. At this point, the amount of cached data is close to the cache capacity provided by Fluid (1 GiB), and the condition for auto scaling is met.
$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s
The HPA status shows that the scale-up of the AlluxioRuntime has started, with at most 2 replicas added per scaling period.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 100/90 1 4 2 4m20s
$ kubectl describe hpa
Name: spark
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 07 Apr 2021 17:56:31 +0800
Reference: AlluxioRuntime/spark
Metrics: ( current / target )
"capacity_used_rate" on Dataset/spark (target value): 100 / 90
Min replicas: 1
Max replicas: 4
Behavior:
Scale Up:
Stabilization Window: 0 seconds
Select Policy: Max
Policies:
- Type: Pods Value: 2 Period: 600 seconds
Scale Down:
Select Policy: Disabled
Policies:
- Type: Percent Value: 100 Period: 15 seconds
AlluxioRuntime pods: 2 current / 3 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 3
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 21s horizontal-pod-autoscaler New size: 2; reason: Dataset metric capacity_used_rate above target
Normal SuccessfulRescale 6s horizontal-pod-autoscaler New size: 3; reason: Dataset metric capacity_used_rate above target
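The two rescale events above (1 to 2, then 2 to 3) are consistent with the documented HPA rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), capped by the Pods policy of 2 added replicas per 600 seconds. The sketch below is a simplified reading of that rule, not the controller's implementation: it assumes the default tolerance of 0.1 and ignores stabilization windows, missing metrics, and pod readiness. All names are illustrative.

```python
import math


def next_replicas(current, metric, target, period_start, pods_per_period,
                  max_replicas, tolerance=0.1):
    """One simplified HPA decision for an Object metric with a Value target."""
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no change
    desired = math.ceil(current * ratio)
    # Pods policy: at most pods_per_period replicas added within one period.
    desired = min(desired, period_start + pods_per_period, max_replicas)
    # scaleDown.selectPolicy is Disabled in the sample, so never go below current.
    return max(desired, current)


replicas = 1
for _ in range(3):
    # The metric stays at 100 while the cache is full; the period started at 1 replica.
    replicas = next_replicas(replicas, metric=100, target=90, period_start=1,
                             pods_per_period=2, max_replicas=4)
    print(replicas)
# prints 2, then 3, then 3 (the per-period cap of 1 + 2 replicas is reached)
```

Under the same assumptions, a metric of 85 against a target of 90 falls within the tolerance, so the replica count stays unchanged.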
14. After a while, the cache space of the dataset has grown from 1 GiB to 3 GiB, and data caching is nearly complete.
$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m
Meanwhile, the HPA status shows that the runtime backing the dataset now has 3 replicas and that the capacity_used_rate, the proportion of cache space already in use, is 85%, which does not trigger a further scale-up.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 85/90 1 4 3 11m
15. Clean up the environment
kubectl delete hpa spark
kubectl delete dataset spark
Fluid combines Prometheus, the Kubernetes HPA, and custom metrics to trigger auto scaling based on the proportion of occupied cache space, which enables on-demand use of the cache. Users therefore gain more flexibility when using the distributed cache to accelerate data access. In the future, scheduled scaling will be supported to make scaling more predictable.
Fluid code repository: https://github.com/fluid-cloudnative/fluid.git
You are welcome to follow the project and contribute code.