This topic describes common issues and solutions for ossfs 2.0 persistent volumes (PVs).
Quick navigation

| Category | Question |
| --- | --- |
| Mount | OSS persistent volume mount fails<br>How to mount a single file from an OSS bucket using a persistent volume<br>How to mount an OSS Bucket across different accounts<br>How to use CoreDNS to resolve OSS access endpoints<br>How to use a specific ARN or ServiceAccount for RRSA-based authorization |
| Scale-out | Do I need to scale out a volume when the actual storage capacity exceeds the volume's configuration? |
| Usage | How to restart the ossfs 2.0 process |
Mount
OSS persistent volume mount fails
Symptoms
When you mount an OSS PV using ossfs 2.0, the pod fails to start and a FailedMount error is reported in the pod events.
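To view the FailedMount details, you can inspect the events of the affected pod (the pod name and namespace are placeholders):

```shell
kubectl -n <namespace> describe pod <pod-name>
```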
Causes
Cause 1: The mount fails because the AccessKey lacks the required permissions.
Cause 2: The event log contains the error message `failed to get secret secrets "xxx" is forbidden: User "serverless-xxx" cannot get resource "secrets" in API group "" in the namespace "xxx"`. For applications that run on virtual nodes (ACS pods), if a PersistentVolumeClaim (PVC) uses the `nodePublishSecretRef` field to specify authentication credentials, the Secret must be in the same namespace as the PVC.
Cause 3: The event log contains the error message `FailedMount /run/fuse.ossfs/xxxxxx/mounter.sock: connect: no such file or directory`. This error occurs because the ossfs 2.0 pod did not start correctly or was deleted unexpectedly.
Cause 4: The OSS Bucket is configured for mirroring-based back-to-origin, and the mount directory has not been synchronized from the source.
Cause 5: The OSS Bucket is configured for static website hosting. When ossfs 2.0 checks the mount directory in OSS, the request is redirected to files such as index.html.
Solutions
Solution for Cause 1
Verify that the access policy of the RAM user that is used for mounting meets the requirements. For more information, see Use a statically provisioned ossfs 2.0 volume.
Check whether the AccessKey used for mounting has been disabled or rotated.
Important: Changes to the AccessKey in the Secret that is specified by the `nodePublishSecretRef` field in the PV do not take effect immediately. You must restart the ossfs 2.0 client pod. For more information about how to restart the client, see How to restart the ossfs 2.0 process.
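To check which AccessKey ID the mount uses, you can decode the Secret referenced by `nodePublishSecretRef`; the Secret name, namespace, and the `akId` key name are assumptions that you should verify against your own configuration:

```shell
# Print the AccessKey ID stored in the Secret (the key name akId is an assumption).
kubectl -n <namespace> get secret <secret-name> -o jsonpath='{.data.akId}' | base64 -d
```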
Solution for Cause 2
Create the required Secret in the same namespace as the PVC. When you create the new PV, set the `nodePublishSecretRef` field to this Secret, as shown in the sketch below. For more information, see Method 2: Authenticate using a RAM user's AccessKey.
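A minimal sketch, assuming the PVC is in namespace `<ns>` and the Secret uses the `akId` and `akSecret` keys (the key names are an assumption; follow the linked documentation for the exact format):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: <ns>   # Must be the same namespace as the PVC, as required above.
stringData:
  akId: <access-key-id>           # Assumption: key names may differ in your setup.
  akSecret: <access-key-secret>
```

In the PV, reference the Secret through `nodePublishSecretRef`:

```yaml
spec:
  csi:
    nodePublishSecretRef:
      name: oss-secret
      namespace: <ns>
```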
Solution for Cause 3
Verify that the ossfs 2.0 pod exists.
In the following command, `PV_NAME` specifies the name of the mounted OSS PV, and `NODE_NAME` specifies the name of the node where the application pod that requires the volume is located.

```shell
kubectl -n ack-csi-fuse get pod -l csi.alibabacloud.com/volume-id=<PV_NAME> -owide | grep <NODE_NAME>
```

If the pod exists but is in an abnormal state, troubleshoot the pod to resolve the issue. Make sure that the pod is in the Running state, and then restart the application pod to trigger a remount.
If the pod does not exist, proceed to the next step.
(Optional) Check audit logs or other records to determine whether the pod was deleted unexpectedly.
Common reasons for unexpected deletion include cleanup scripts, node drain operations, and node auto-healing. Adjust your configurations to prevent this issue from recurring.
Check for any remaining VolumeAttachment resources.
```shell
kubectl get volumeattachment | grep <PV_NAME> | grep <NODE_NAME>
```

Restart the application pod to trigger a remount and verify that the ossfs 2.0 pod is created successfully.
Solution for Cause 4
You must synchronize the data from the source before you mount the volume. For more information, see Back-to-origin configurations.
Solution for Cause 5
You must disable or adjust the static website hosting configuration before you mount the volume. For more information, see Host a static website.
How to mount a single file from an OSS bucket using a persistent volume
The ossfs 2.0 tool can mount a path from an OSS bucket as a file system in a pod, but it cannot mount a single file directly. To allow a pod to access a specific file in OSS, you can use the `subPath` field of `volumeMounts`.
For example, to mount the a.txt and b.txt files from the /subpath directory of an OSS bucket to two different pods at the /path/to/file/ path, use the following PV configuration:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-oss
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-oss
    volumeAttributes:
      bucket: bucket
      path: subpath # The parent path of a.txt and b.txt
      url: "oss-cn-hangzhou.aliyuncs.com"
      fuseType: ossfs2
```

After you create the corresponding PVC, configure the `volumeMounts` in the pod as follows:
```yaml
volumeMounts:
  - mountPath: /path/to/file # The mount path in the pod that corresponds to bucket:/subpath
    name: oss-pvc # Must be the same as the volume name in the pod's volumes field
    subPath: a.txt # Or b.txt. This is the relative path of the file in bucket:/subpath
```

After mounting, the full path to access a.txt in the pod is /path/to/file/a.txt, which corresponds to bucket:/subpath/a.txt.
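This `volumeMounts` entry pairs with a `volumes` entry in the same pod spec. A minimal sketch, assuming the PVC bound to the pv-oss PV is named `pvc-oss` (a hypothetical name):

```yaml
volumes:
  - name: oss-pvc              # Referenced by name in volumeMounts above.
    persistentVolumeClaim:
      claimName: pvc-oss       # Assumption: the PVC bound to the pv-oss PV.
```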
For more information, see Use a statically provisioned ossfs 2.0 volume.
How to mount an OSS Bucket across different accounts
You can use RRSA-based authorization to mount an OSS Bucket across different accounts.
Ensure that your cluster and CSI component versions meet the requirements for RRSA-based authorization.
The following steps describe how to mount an OSS Bucket that resides in Account B to a cluster that resides in Account A. Before you create a volume with RRSA-based authorization, you must complete the RAM authorization prerequisites.
In Account B:
Create a RAM role named roleB that trusts Account A. For more information, see Create a RAM role for a trusted Alibaba Cloud account.
Grant roleB permissions to access the OSS Bucket that you want to mount.
In the RAM console, go to the details page of roleB and copy its ARN, such as `acs:ram::130xxxxxxxx:role/roleB`.
In Account A:
Create a RAM role named roleA for RRSA-based authorization for the application. Set the trusted entity type to OIDC IdP.
Grant roleA permission to assume roleB. For more information, see 1. Enable RRSA in a cluster (statically provisioned volume) or Use a dynamically provisioned ossfs 1.0 volume (dynamically provisioned volume).
roleA does not need an access policy for OSS. However, it requires an access policy that includes the `sts:AssumeRole` API action, such as the system policy `AliyunSTSAssumeRoleAccess`.
Configure the volume in the cluster:
When you create the volume, set the `assumeRoleArn` parameter to the ARN of roleB:
- Statically provisioned volume (PV): In `volumeAttributes`, add `assumeRoleArn: <ARN of roleB>`.
- Dynamically provisioned volume (StorageClass): In `parameters`, add `assumeRoleArn: <ARN of roleB>`.
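Put together, the `volumeAttributes` of a statically provisioned cross-account PV might look like the following sketch. The bucket, endpoint, and ARN values are placeholders, and the RRSA fields follow the statically provisioned example shown later in this topic:

```yaml
volumeAttributes:
  bucket: <bucket-in-account-b>
  url: "oss-cn-hangzhou.aliyuncs.com"               # Placeholder endpoint.
  authType: "rrsa"
  fuseType: "ossfs2"
  roleName: roleA                                   # The RAM role in Account A.
  assumeRoleArn: "acs:ram::130xxxxxxxx:role/roleB"  # The ARN of roleB in Account B.
```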
How to use CoreDNS to resolve OSS access endpoints
To point an OSS access endpoint to a domain name inside your cluster, you can configure a specific DNS policy for the pod where ossfs runs. This forces the pod to prioritize the cluster's CoreDNS during mounting.
This feature is supported only in CSI component versions v1.34.2 and later. To upgrade, see Upgrade the CSI component.
- Statically provisioned volume (PV): In the `spec.csi.volumeAttributes` field of the PV, add the `dnsPolicy` field: `dnsPolicy: ClusterFirstWithHostNet`.
- Dynamically provisioned volume (StorageClass): In the `parameters` field of the StorageClass, add the `dnsPolicy` field: `dnsPolicy: ClusterFirstWithHostNet`.
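For example, in a statically provisioned PV, the field sits alongside the other `volumeAttributes` (an excerpt only; other attributes are omitted):

```yaml
spec:
  csi:
    volumeAttributes:
      fuseType: "ossfs2"
      dnsPolicy: ClusterFirstWithHostNet  # Resolve the OSS endpoint through the cluster's CoreDNS first.
```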
How to use a specific ARN or ServiceAccount for RRSA-based authorization
When you use RRSA for OSS PV authentication, you may have requirements that the default setup cannot meet, such as using a third-party OIDC IdP or a non-default ServiceAccount.
To address this, you can specify the RAM role name using the roleName configuration item in the PV. The CSI storage plugin then automatically retrieves the default Role ARN and OIDC Provider ARN. For more customized RRSA-based authorization, you can modify the PV configuration as follows.
The `roleArn` and `oidcProviderArn` parameters must be configured together. If you set these parameters, you do not need to configure `roleName`.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-oss
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-oss # Must be the same as the PV name.
    volumeAttributes:
      bucket: "oss"
      url: "oss-cn-hangzhou.aliyuncs.com"
      authType: "rrsa"
      fuseType: "ossfs2"
      oidcProviderArn: "<oidc-provider-arn>"
      roleArn: "<role-arn>"
      #roleName: "<role-name>" # After you configure roleArn and oidcProviderArn, roleName becomes invalid.
      serviceAccountName: "csi-fuse-<service-account-name>"
```

| Parameter | Description |
| --- | --- |
| oidcProviderArn | The ARN of the OIDC IdP. You can obtain it after you create the OIDC IdP. For more information, see Manage an OIDC IdP. |
| roleArn | The ARN of the RAM role. You can obtain it after you create a RAM role that trusts the OIDC IdP. For an example, see Example of role-based SSO that uses OIDC. |
| serviceAccountName | Optional. The name of the ServiceAccount that the pod for the ossfs container uses. The name must start with `csi-fuse-` and you must create the ServiceAccount in advance. If this parameter is not set, CSI uses its default maintained ServiceAccount. |
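If you specify a custom `serviceAccountName`, create the ServiceAccount before you create the PV. A minimal sketch, assuming it belongs in the `ack-csi-fuse` namespace where the `csi-fuse-ossfs2-*` pods run (the namespace is an assumption based on the commands in this topic):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-fuse-my-sa     # The name must start with csi-fuse-.
  namespace: ack-csi-fuse  # Assumption: the namespace that runs the csi-fuse-ossfs2-* pods.
```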
Scale-out
Do I need to scale out a volume when the actual storage capacity exceeds the volume's configuration?
OSS does not limit the capacity of a bucket or subdirectory, nor does it provide a capacity quota feature. Therefore, the `.spec.capacity` field of the PV and the `.spec.resources.requests.storage` field of the PVC are ignored and do not take effect. You only need to ensure that the capacity values of the bound PV and PVC are consistent.
If the actual storage capacity exceeds the configuration, normal use is not affected, and you do not need to scale out the volume.
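For example, the following values are nominal and are ignored by OSS, but they must match for the PVC to bind to the PV:

```yaml
# PV
spec:
  capacity:
    storage: 5Gi   # Nominal value; OSS does not enforce it.
---
# PVC
spec:
  resources:
    requests:
      storage: 5Gi # Must be consistent with the PV's capacity.
```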
Usage
How to restart the ossfs 2.0 process
Symptoms
After you modify the authorization information or the ossfs 2.0 version, the running ossfs 2.0 process is not automatically updated.
Cause
After an ossfs 2.0 process starts, you cannot dynamically change its configuration, such as authentication credentials. Modifying the configuration requires restarting both the ossfs 2.0 process (the pod named `csi-fuse-ossfs2-*` in the `ack-csi-fuse` namespace) and the corresponding application pods, which causes service interruption. For this reason, CSI does not automatically apply changes to a running ossfs 2.0 process.
During normal operations, CSI fully manages the lifecycle of an ossfs 2.0 volume. If you manually terminate the pod that runs ossfs 2.0, CSI cannot trigger the automatic recovery or recreation of the volume.
Solution
Restarting the ossfs 2.0 process requires restarting the application pods that use the corresponding OSS PV. Proceed with caution.
Identify the application pods that use the current FUSE pod.
Identify the `csi-fuse-ossfs2-*` pod that you want to change. Replace `<pv-name>` with the PV name and `<node-name>` with the node name.

```shell
kubectl -n ack-csi-fuse get pod -l csi.alibabacloud.com/volume-id=<pv-name> -owide | grep <node-name>
```

Identify all pods that are mounting this OSS PV. Replace `<ns>` with the namespace and `<pvc-name>` with the PVC name.

```shell
kubectl -n <ns> describe pvc <pvc-name>
```

The expected output contains a Used By field:

```
Used By:  oss-static-94849f647-4****
          oss-static-94849f647-6****
          oss-static-94849f647-h****
          oss-static-94849f647-v****
          oss-static-94849f647-x****
```

Find the application pods that are mounted through `csi-fuse-ossfs2-xxxx`. These are the pods that are running on the same node as `csi-fuse-ossfs2-xxxx`.

```shell
kubectl -n <ns> get pod -owide | grep cn-beijing.192.168.XX.XX
```

Expected output:

```
NAME                         READY   STATUS    RESTARTS   AGE     IP              NODE                       NOMINATED NODE   READINESS GATES
oss-static-94849f647-4****   1/1     Running   0          10d     192.168.xx.xx   cn-beijing.192.168.xx.xx   <none>           <none>
oss-static-94849f647-6****   1/1     Running   0          7m36s   192.168.xx.xx   cn-beijing.192.168.xx.xx   <none>           <none>
```
Restart the application and the ossfs 2.0 process.
Delete all application pods (in the preceding example, `oss-static-94849f647-4****` and `oss-static-94849f647-6****`) at the same time, for example, by using `kubectl scale` as shown in the sketch at the end of this section. When no application pods are using the mount, the `csi-fuse-ossfs2-xxxx` pod is automatically reclaimed. After the number of replicas is restored, the volume is remounted with the new PV configuration, and CSI creates a new `csi-fuse-ossfs2-yyyy` pod.
If you cannot ensure that these pods are deleted at the same time (for example, deleting pods managed by a Deployment, StatefulSet, or DaemonSet immediately triggers a restart), or if the pods can tolerate OSS read and write failures, run the following command to find the VolumeAttachment resource that corresponds to the volume:

```shell
kubectl get volumeattachment | grep <pv-name> | grep cn-beijing.192.168.XX.XX
```

Expected output:

```
csi-bd463c719189f858c2394608da7feb5af8f181704b77a46bbc219b**********   ossplugin.csi.alibabacloud.com   <pv-name>   cn-beijing.192.168.XX.XX   true   12m
```

Directly delete the VolumeAttachment. At this point, read and write operations to OSS from the application pod return a disconnected error.
Then, restart the application pods one by one. The restarted pods resume read and write operations to OSS through the new `csi-fuse-ossfs2-yyyy` pod created by CSI.
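For the simultaneous-deletion approach described above, the following sketch uses `kubectl scale` with the Deployment from the example output (`oss-static`); the replica count of 5 is an assumption based on the five pods in the Used By list:

```shell
# Scale the Deployment to zero so that all replicas release the OSS mount at once.
kubectl -n <ns> scale deployment oss-static --replicas=0
# Confirm that the old csi-fuse-ossfs2-* pod has been reclaimed.
kubectl -n ack-csi-fuse get pod -l csi.alibabacloud.com/volume-id=<pv-name>
# Restore the replicas; the volume is remounted with the new configuration.
kubectl -n <ns> scale deployment oss-static --replicas=5
```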