KServe is an open source project that simplifies the deployment and management of machine learning models on Kubernetes through declarative APIs defined in YAML files. ack-kserve builds on open source KServe with additional optimizations and integrates tightly with Alibaba Cloud services such as storage, logging, and networking, which simplifies the deployment and operation of KServe. This topic describes how to deploy and manage the ack-kserve component in an ACK cluster.
Prerequisites
You have an ACK managed cluster Pro with Kubernetes version 1.22 or later. For more information, see Create an ACK managed cluster.
The Nginx Ingress Controller component is installed. For more information, see Manage the Nginx Ingress Controller component.
Step 1: Install the cert-manager component
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Helm page.
Click Create in the upper-left corner. On the Basic Information page, enter the Application Name, search for and select cert-manager in the Chart section, and then click Next.
On the Parameters page, confirm the Chart Version and Parameters information, and then click OK.
After the deployment is successful, you can view the Helm component information of cert-manager on the Helm page.
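To confirm from the command line that cert-manager is up before you continue, a small wrapper around kubectl wait can help. This is a minimal sketch rather than part of the official procedure: the function name wait_for_pods and the 120s default timeout are hypothetical, and the cert-manager namespace is the one used in the FAQ of this topic.

```shell
# Hypothetical helper: block until every pod in a namespace reports Ready,
# or fail when the timeout expires.
wait_for_pods() {
  ns=$1
  timeout=${2:-120s}   # assumed default; adjust to your cluster
  kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout="$timeout"
}

# Example (requires access to the cluster):
#   wait_for_pods cert-manager 180s
```

A non-zero exit status means that at least one pod did not become Ready within the timeout.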
Step 2: Install the ack-kserve component
The ack-kserve component is deployed in RawDeployment mode by default and integrated with the Nginx Ingress Controller component.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Helm page.
Click Create in the upper-left corner. On the Basic Information page, enter the Application Name, search for and select ack-kserve in the Chart section, and then click Next.
On the Parameters page, confirm the Chart Version and Parameters information, and then click OK.
After the deployment is successful, you can view the Helm component information of ack-kserve on the Helm page.
Verify that ack-kserve is running.
Run the following command to check the status of the pods:
kubectl get pod -n kserve
If the STATUS of every pod in the output is Running, the ack-kserve component is installed.
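When the namespace contains several pods, scanning the STATUS column by eye is error-prone. The following sketch filters the table output of kubectl get pod and prints only the pods that need attention. The function name pods_not_ready is hypothetical, and the sketch assumes the default column layout (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pod` table output on stdin and
# print the name of every pod that is not Running with all containers ready.
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY column, for example 1/1
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example (requires access to the cluster):
#   kubectl get pod -n kserve | pods_not_ready
# No output means every pod is Running and ready.
```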
(Optional) Step 3: View or update the ack-kserve component
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Helm page.
View the details of the ack-kserve component.
On the Helm page, click Actions in the row of the ack-kserve component and select Details to view the basic information, parameter configuration, and historical versions of the component.
Update the ack-kserve component information.
On the Helm page, click Actions in the row of the ack-kserve component and select Update to update the version and parameters of the component.
(Optional) Step 4: Clean up resources and uninstall components
To avoid wasting resources, delete the KServe custom resources (CRs) and CustomResourceDefinitions (CRDs) in the cluster before you uninstall the ack-kserve component.
Important: Before you delete CRs and CRDs, make sure that your workloads no longer use them. Deleting a CRD also deletes the CRs of that type. Deleted CRs cannot be recovered.
After you confirm that your workloads no longer use them, delete all KServe CRs in the cluster. Deleting the CRs typically involves the following commands:
# View all isvc resources in the cluster.
kubectl get isvc --all-namespaces
# Save all isvc resources in the cluster.
kubectl get isvc --all-namespaces -o yaml > isvc.yaml.bak
# Delete isvc resources after confirming that your workloads no longer use them.
# Note: --all deletes only the isvc resources in the current namespace. Add -n <namespace> to target another namespace.
kubectl delete isvc --all
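Because kubectl delete isvc --all acts on one namespace at a time, clusters with InferenceServices in several namespaces need one delete per namespace. The following sketch derives those commands from the listing so that you can review them before running anything. The function name isvc_delete_commands is hypothetical, and the sketch assumes that the first column of the listing is the namespace, as it is with --all-namespaces.

```shell
# Hypothetical helper: read `kubectl get isvc --all-namespaces --no-headers`
# output on stdin and print one delete command per namespace for review.
# It only prints the commands; it does not run them.
isvc_delete_commands() {
  awk 'NF { print $1 }' | sort -u | while read -r ns; do
    printf 'kubectl delete isvc --all -n %s\n' "$ns"
  done
}

# Example (requires access to the cluster):
#   kubectl get isvc --all-namespaces --no-headers | isvc_delete_commands
```

Printing the commands instead of executing them keeps the irreversible step under manual control.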
Delete the KServe CRD resources in the cluster.
Before you delete a CRD, delete all CRs that depend on it. Otherwise, the CRD deletion may fail.
kubectl delete crd clusterservingruntimes.serving.kserve.io
kubectl delete crd clusterstoragecontainers.serving.kserve.io
kubectl delete crd inferencegraphs.serving.kserve.io
kubectl delete crd inferenceservices.serving.kserve.io
kubectl delete crd predictors.serving.kserve.io
kubectl delete crd servingruntimes.serving.kserve.io
kubectl delete crd trainedmodels.serving.kserve.io
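After the deletion, it can be useful to check that no CRD of the group is left behind. The following sketch filters the output of kubectl get crd -o name by API group suffix. The function name crds_in_group is hypothetical, and the sketch assumes the usual customresourcedefinition.apiextensions.k8s.io/<name> output format.

```shell
# Hypothetical helper: read `kubectl get crd -o name` output on stdin and
# print the CRDs whose name ends with ".<group>".
crds_in_group() {
  awk -F/ -v suffix=".$1" '{
    name = $2                        # part after the resource-type prefix
    start = length(name) - length(suffix) + 1
    if (start > 1 && substr(name, start) == suffix) print name
  }'
}

# Example (requires access to the cluster):
#   kubectl get crd -o name | crds_in_group serving.kserve.io
# No output means no KServe CRDs remain.
```

The same helper works for other groups, for example cert-manager.io, because it matches on the suffix only.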
Uninstall the ack-kserve component.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Helm page.
On the Helm page, click Delete in the Operation column of the ack-kserve component and follow the on-screen instructions to uninstall it.
Uninstall the cert-manager component.
Warning: Before you uninstall the cert-manager component, make sure that no other components in the cluster use it. Otherwise, your workloads may become unavailable.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Helm page.
On the Helm page, click Delete in the Operation column of the cert-manager component and follow the on-screen instructions to uninstall it.
Run the following command to delete the cert-manager CRD resources in the cluster.
kubectl delete crd certificaterequests.cert-manager.io
kubectl delete crd certificates.cert-manager.io
kubectl delete crd challenges.acme.cert-manager.io
kubectl delete crd clusterissuers.cert-manager.io
kubectl delete crd issuers.cert-manager.io
kubectl delete crd orders.acme.cert-manager.io
FAQ
Issue: The following error occurs when you install the ack-kserve component:
failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
Cause: The ack-kserve component depends on the cert-manager component. If cert-manager is not installed or is not ready in the cluster, this error occurs when you install ack-kserve.
Solution:
Run the following command to confirm whether the cert-manager component is installed in the cluster.
kubectl get crd | grep certificates.cert-manager.io
The expected output is shown below, indicating that the cert-manager component is installed in the cluster.
certificates.cert-manager.io 2024-05-06T07:09:17Z
If there are no cert-manager CRD resources in the cluster, see Step 1 to install the cert-manager component.
Run the following command to confirm whether the cert-manager component is ready.
kubectl -n cert-manager get po
The expected output is shown below, indicating that all pods of the cert-manager component are ready.
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-7f4bb44d5b-jrrfn              1/1     Running   0          23h
cert-manager-cainjector-79544456cc-qp5pp   1/1     Running   0          23h
cert-manager-webhook-f74ccb647-7m5dt       1/1     Running   0          23h
If all pods are ready, uninstall the ack-kserve component as described in Step 4, and then reinstall it to resolve the error.
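The two checks can be combined into a single preflight step that runs before ack-kserve is installed or reinstalled. The following is a minimal sketch under stated assumptions: the function name cert_manager_preflight and the 120s timeout are hypothetical, and the Deployment name cert-manager-webhook is inferred from the pod names in the expected output rather than confirmed by this topic.

```shell
# Hypothetical preflight check to run before installing ack-kserve.
# Assumes the cert-manager webhook runs as the Deployment cert-manager-webhook
# in the cert-manager namespace (inferred from the pod names above).
cert_manager_preflight() {
  if ! kubectl get crd certificates.cert-manager.io >/dev/null 2>&1; then
    echo "cert-manager CRDs not found: install cert-manager first"
    return 1
  fi
  if ! kubectl -n cert-manager rollout status deployment/cert-manager-webhook \
      --timeout=120s >/dev/null 2>&1; then
    echo "cert-manager webhook is not ready yet"
    return 2
  fi
  echo "cert-manager is installed and its webhook is ready"
}

# Example (requires access to the cluster):
#   cert_manager_preflight && echo "safe to install ack-kserve"
```

The check targets the webhook because the error in this FAQ is returned by the cert-manager webhook endpoint. A successful rollout is a reasonable signal of readiness but not a guarantee that the webhook certificate is already trusted.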