This topic describes common issues and solutions for when you cannot mount a Cloud Parallel File Storage (CPFS) file system on a Linux operating system using a CPFS-POSIX or CPFS-NFS client.
Overview
POSIX client mount targets
CPFS-POSIX client mounting
Can I only use a POSIX client to mount and access a CPFS file system?
What do I do if the "not active on:" error is returned when I mount a CPFS file system?
What do I do if the "Command failed" error is returned when I mount a CPFS file system?
What do I do if the "cpfs.sh is running already" error is returned when I mount a CPFS file system?
How do I purge the residual configuration information of an unmounted ECS instance?
CPFS-NFS client mounting
CPFS scale-out
What do I do if the "Insufficient inventory" error is returned when I create a POSIX client mount target?
Problem:
When you create a POSIX client mount target for a file system in the Cloud Parallel File Storage (CPFS) console, an "Insufficient inventory" error is returned.
Cause:
When you create a POSIX mount target, CPFS automatically creates three pay-as-you-go Elastic Compute Service (ECS) instances (ecs.g*.large) in your Alibaba Cloud account. These instances are used to manage the CPFS-POSIX client cluster. To create a POSIX client mount target, your Alibaba Cloud account must be in good standing and able to purchase ECS instances.
Solution:
Log on to the ECS console and go to the Custom Launch tab to check the inventory of the required ECS instance type. This helps prevent mount target creation failures that are caused by insufficient inventory of the specified instance type.
What do I do if the "Insufficient number of IP addresses in the vSwitch" error is returned when I create a POSIX client mount target?
Problem:
When you create a POSIX client mount target for a file system in the CPFS console, the error message 'The number of specified vSwitch IP addresses is insufficient' is returned.
Cause:
The storage nodes of a CPFS file system use IP addresses that are allocated from the vSwitch specified by the POSIX client mount target. Each storage node requires one IP address. The storage nodes of a CPFS file system can require up to 160 IP addresses.
Solution:
Log on to the virtual private cloud (VPC) console to query the number of available IP addresses in the vSwitch of the destination VPC. Make sure that the vSwitch has a sufficient number of available IP addresses.
Why am I unable to create a POSIX client mount target?
If you cannot find the mount target that you just created in the CPFS console, check whether your Alibaba Cloud account has an overdue payment. If your Alibaba Cloud account has an overdue payment, you cannot create a mount target for the CPFS file system. You must add funds to your account before you can create the mount target.
How many CPFS file systems can I mount on an ECS instance?
You can mount a maximum of one CPFS file system on an ECS instance.
What do I do if the "unsupported OS for 'X86_64' architecture" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the operating system of the compute node is not supported. You must change the operating system of the compute node. For more information about the operating systems that are supported by CPFS clients, see Limits.
[ FATAL ] You cannot add cpfs-client-001 node because it has an unsupported OS for 'X86_64' architecture.
What do I do if the "make sure kernel-devel version is consistent with kernel" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the kernel-devel and kernel-headers packages are not installed on the ECS instance on which you want to mount the file system, or the versions of the installed packages do not match the running kernel.
No package kernel-devel-3.10.0-957.21.3.el7.x86_64 available.
Error: Nothing to do
please make sure kernel-devel version is consistent with kernel
Run the following command to check whether the packages are installed on the ECS instance:
rpm -qa | grep kernel-devel-`uname -r`
If an empty result is returned, the packages are not correctly installed on the ECS instance. You must reinstall the packages on the ECS instance. For more information, see Step 1: Prepare the environment.
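For example, on CentOS or RHEL you can reinstall matching packages as sketched below; this assumes that packages for your running kernel are still available in the configured YUM repositories.
yum install -y "kernel-devel-$(uname -r)" "kernel-headers-$(uname -r)"  # install packages that match the running kernel
rpm -qa | grep "kernel-devel-$(uname -r)"                               # verify: the installed package is now listed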
What do I do if the "ssh: connect to host A port 22: Connection timed out" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the network connection between the ECS instance and the file system is broken.
====> start check ssh
try ssh root@a.b.c.d by /root/.ssh/id_rsa.pub
ssh: connect to host a.b.c.d port 22: Connection timed out
Identify the cause and resolve the issue based on the following information:
Possible cause | Solution
The network connection between the ECS instance (a.b.c.d) and the POSIX client control plane node (qr-001) is broken. | Check the network connectivity and run the mount command again.
The ECS instance (a.b.c.d) is not added to the qr-sg security group. | Check the security group configuration and try to mount the file system again. For more information, see Configure a security group.
The ECS instance (a.b.c.d) and the CPFS mount target are not in the same VPC. | Select an ECS instance that is in the same VPC as the mount target.
The IP address of the ECS instance (a.b.c.d) does not exist. | Check the instance status of the ECS instance.
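If the cause is unclear, the following quick checks from the ECS instance can help narrow it down; a.b.c.d is the placeholder IP address of the control plane node from the error message.
ping -c 3 a.b.c.d     # check basic network reachability to the control plane node
ssh -v root@a.b.c.d   # verbose output shows at which step the SSH connection stalls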
What do I do if the "not active on:<hostname>" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the CPFS service cannot be started on the ECS instance on which you want to mount the file system.
[ WARN ] GPFS is not active on: hostname. Consult the install toolkit logs for possible errors
during install. The GPFS service can also be started manually by running GPFS command
'mmstartup -N Node[,Node...]'
[ FATAL ] GPFS NOT ACTIVE
Identify the cause and resolve the issue based on the following information:
The security group of the ECS instance that you want to mount is incorrectly configured, or the ECS instance is not added to the qr-sg security group. For more information, see Configure a security group.
A CPFS file system requires the ECS instance to have more than 4 GB of available memory. If the ECS instance has insufficient memory, an error is reported. Confirm the available memory of the ECS instance.
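For example, you can confirm the available memory with the free command, which is present on common Linux distributions:
free -h    # the "available" column must show more than 4 GB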
What do I do if the "Command failed" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the available memory of the ECS instance is less than 4 GB. You must upgrade the memory of the ECS instance and run the cpfs add <ip> command again to mount the file system.
[ WARN ] GPFS is not active on: hostname. Consult the install toolkit logs for possible errors
during install. The GPFS service can also be started manually by running GPFS command
'mmstartup -N Node[,Node...]'
[ FATAL ] GPFS NOT ACTIVE
What do I do if the "cpfs.sh is running already" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, a mount or unmount task is already in progress. Wait for a period of time and then try to mount the file system again.
cpfs.sh is running already, pid: xyz
What do I do if the "connect to host B port 22: Connection timed out" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system on ECS instance A, an abnormal ECS instance B exists in the current CPFS file system cluster.
# cpfs add A
connect to host B port 22: Connection timed out
B hostname is invalid
Failed to add node.
Troubleshoot and fix the abnormal ECS instance B based on the following instructions. Then, try the mount operation again.
On the control plane node qr-001, run mmgetstate -a to check whether the status of ECS instance B is normal. The `active` state indicates a normal status.
If the status of instance B is normal, submit a ticket to the CPFS team for further troubleshooting.
If the status of instance B is abnormal, determine whether to continue using the instance.
If you want to continue using the instance, submit a ticket to the CPFS team to resolve the instance status.
If you no longer want to use the instance, run the mmdelnode -N <id> --force command to purge the node information. Example:
mmdelnode -N iZuf61mhwoc9flkufs0**** --force
Do you want to continue? (yes/no) yes
mmdelnode: [W] Could not cleanup the following unreached nodes:
iZuf61mhwoc9flkufs0****
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
After the information is purged, delete the host information of instance B from the /etc/hosts file. In this example, iZuf61mhwoc9flkufs0**** is the ID of the destination ECS instance.
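If you prefer to script the /etc/hosts cleanup, a minimal sketch follows; it assumes the residual entry contains the instance ID shown above, so back up the file and review the result before relying on it.
cp /etc/hosts /etc/hosts.bak                        # back up the file before editing
sed -i '/iZuf61mhwoc9flkufs0\*\*\*\*/d' /etc/hosts  # delete lines that contain the instance ID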
What do I do if the "[FATAL] B: Could not connect to B via ssh" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system on ECS instance A, an ECS instance B exists in the current CPFS file system cluster for which the installation was interrupted and residual configurations remain.
[ FATAL ] ssh: connect to host B port 22: Connection timed out
[ FATAL ] B: Could not connect to B via ssh.
Versions earlier than 2.2.0
Delete the residual configuration information of ECS instance B from the /usr/lpp/mmfs/5.0.5.0/installer/configuration/clusterdefinition.txt file.
[node4]
fqdn = B
os = rhel7
arch = x86_64
ip_address = 192.168.6.37
is_admin_node = False
is_object_store = False
is_nfs = False
is_smb = False
is_hdfs = False
is_protocol_node = False
is_nsd_server = False
access_ips =
is_quorum_node = False
is_manager_node = False
is_gui_server = False
is_ems_node = False
is_callhome_node = False
is_broker_node = False
is_node_offline = False
is_node_reachable = True
is_node_excluded = False
is_mestor_node = False
Version 2.2.0 and later
Delete the residual configuration information of ECS instance B from the /usr/lpp/mmfs/5.1.2.0/ansible-toolkit/ansible/ibm-spectrum-scale-install-infra/vars/scale_clusterdefinition.json file.
{ "fqdn": "iZuf6hn0blj1g377w4xxxxZ", "os": "rhel7", "arch": "x86_64", "ip_address": "172.19.0.100", "is_admin_node": false, "is_object_store": false, "is_nfs": false, "is_smb": false, "is_hdfs": false, "is_protocol_node": false, "is_nsd_server": false, "is_quorum_node": false, "is_manager_node": false, "is_gui_server": false, "is_ems_node": false, "is_callhome_node": false, "is_broker_node": false, "is_node_offline": false, "is_node_reachable": true, "is_node_excluded": false, "is_mestor_node": false, "scale_daemon_nodename": "iZuf6hn0blj1g377w4xxxxZ" }
What do I do if the "No GPFS admin node specified" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, you are running the command on the wrong node.
[ FATAL ] No GPFS admin node specified. specify an admin node using 'spectrumscale node add <node name or IP> -a'.
Confirm that you are running the CPFS command on the qr-001 node.
What do I do if the "Failed to resolve domain: file-system-id.region.cpfs.aliyuncs.com" error is returned when I mount a CPFS file system?
Cause
The file-system-id.region.cpfs.aliyuncs.com parameter in the mount command was not replaced with the mount address of the export directory.
Solution
Log on to the NAS console. In the Actions column of the destination CPFS file system, click Manage. On the page that appears, click the Protocol Service tab. In the Actions column, click Export Directory. In the Export Directory panel, obtain the mount address. Then, replace the file-system-id.region.cpfs.aliyuncs.com parameter in the mount command with the mount address that you obtained. Run the mount command again to mount the file system.
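For reference, a corrected mount command might look like the following sketch; the mount address, export directory (/share), and local mount point (/mnt/cpfs) are illustrative placeholders, and the exact command for your file system is shown in the Export Directory panel.
sudo mount -t nfs cpfs-00abc123****.cn-hangzhou.cpfs.aliyuncs.com:/share /mnt/cpfs  # hypothetical mount address and paths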
What do I do if an error occurs when I mount a CPFS file system on a cloud computer?
If an error message such as 'Cannot mount' or 'Mount failed' is displayed on the interface when you mount a CPFS file system on a cloud computer, check whether the network between your cloud computer and CPFS is connected. To do so, perform the following steps:
Run the following command to query the IP address of the mount target domain name.
The following sample command uses cpfs-009e40ab9c6476e6-001a3e8bf745b****.cn-hangzhou.cpfs.aliyuncs.com as the mount target domain name. You must replace it with your actual mount target domain name.
dig -t txt cpfs-009e40ab9c6476e6-001a3e8bf745b****.cn-hangzhou.cpfs.aliyuncs.com
Run the ping command to ping the IP address that you obtained in the previous step to check for network connectivity. If the network is disconnected, check your network configurations.
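For example, the connectivity check might look like this; 192.0.2.10 is a placeholder for the IP address returned by the dig command.
ping -c 4 192.0.2.10    # replace with the IP address from the dig output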
What do I do if a YUM repository error occurs when I run the cpfs add command to mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the YUM repository configuration for CentOS 8 is invalid.
Errors during downloading metadata for repository 'appstream':
Status code: 404 for http://mirrors.cloud.aliyuncs.com/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 100.100.XX.XX)
Error: Failed to download metadata for repo 'appstream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
The CentOS 8 operating system has reached its end of life (EOL), and the Linux community no longer maintains this operating system version. You must switch to an available YUM repository.
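One common workaround is to repoint YUM at the CentOS vault archive, as sketched below; the URLs and file paths follow the standard CentOS 8 layout, but verify them for your image, and note that Alibaba Cloud images may ship their own archived mirror configuration.
sed -i -e 's|^mirrorlist=|#mirrorlist=|' \
       -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://vault.centos.org|' \
       /etc/yum.repos.d/CentOS-*.repo    # comment out dead mirror lists and use the vault archive
yum clean all && yum makecache           # rebuild the metadata cache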
What do I do if a CPFS client fails to start?
Problem:
When you run mmgetstate -a on the control plane node qr-001 to check the ECS instance status, the instance status is `down`.
When you run the /usr/lpp/mmfs/bin/mmstartup command, the following information is returned:
… mmfslinux.ko kernel extension does not exist. Use mmbuildgpl command to create the needed kernel extension for your kernel …
Cause:
The kernel of the current ECS instance was upgraded.
Solution:
Run the /usr/lpp/mmfs/bin/mmbuildgpl command to rebuild the kernel extension. Sample response:
mmbuildgpl: Building GPL (5.1.X.X) module begins at Fri Dec 3 16:05:33 CST 2021.
--------------------------------------------------------
Verifying Kernel Header...
  kernel version = 41800305 (418000305012001, 4.18.0-305.12.1.el8_4.x86_64, 4.18.0-305.12.1)
  module include dir = /lib/modules/4.18.0-305.12.1.el8_4.x86_64/build/include
  module build dir = /lib/modules/4.18.0-305.12.1.el8_4.x86_64/build
  kernel source dir = /usr/src/linux-4.18.0-305.12.1.el8_4.x86_64/include
  Found valid kernel header file under /usr/src/kernels/4.18.0-305.12.1.el8_4.x86_64/include
Getting Kernel Cipher mode...
  Will use skcipher routines
Verifying Compiler...
  make is present at /bin/make
  cpp is present at /bin/cpp
  gcc is present at /bin/gcc
  g++ is present at /bin/g++
  ld is present at /bin/ld
Verifying libelf devel package...
  Verifying elfutils-libelf-devel is installed ...
    Command: /bin/rpm -q elfutils-libelf-devel
    The required package elfutils-libelf-devel is installed
Verifying Additional System Headers...
  Verifying kernel-headers is installed ...
    Command: /bin/rpm -q kernel-headers
    The required package kernel-headers is installed
make World ...
make InstallImages ...
--------------------------------------------------------
mmbuildgpl: Building GPL module completed successfully at Fri Dec 3 16:05:54 CST 2021.
--------------------------------------------------------
Run the /usr/lpp/mmfs/bin/mmstartup command to restart the GPFS service on the ECS instance.
Run the /usr/lpp/mmfs/bin/mmmount all command to mount the file system again.
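You can then confirm that the node has recovered; the command below assumes the standard CPFS client installation path used elsewhere in this topic.
/usr/lpp/mmfs/bin/mmgetstate -a    # the state of the repaired node should be "active"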
How do I purge the residual configuration information of an unmounted ECS instance?
First, confirm that the CPFS file system is unmounted from the ECS instance. For more information, see Unmount a file system. Then, run the mmdelnode -N <id> --force command to purge the residual configuration information of the unmounted ECS instance. Example:
mmdelnode -N iZuf61mhwoc9flkufs0**** --force
Do you want to continue? (yes/no) yes
mmdelnode: [W] Could not cleanup the following unreached nodes:
iZuf61mhwoc9flkufs0****
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes. This is an
asynchronous process.
In this example, iZuf61mhwoc9flkufs0**** is the ID of the destination ECS instance.
What do I do if the "Insufficient inventory" error is returned during a scale-out operation?
A CPFS file system scale-out operation depends on the inventory of CPFS storage nodes and the number of available IP addresses in the vSwitch of the mount target. Go to the VPC console to view the current number of available IP addresses. A CPFS file system requires a maximum of 164 available IP addresses. Make sure that the vSwitch has a sufficient number of available IP addresses.
Is historical data automatically balanced after a CPFS file system is scaled out?
After a CPFS file system is scaled out, data balancing is not performed by default. This means that historical data is still stored on the original storage nodes and is not automatically migrated to the new storage nodes.
The data balancing process consumes network and disk bandwidth of storage nodes, which causes frontend I/O performance to decrease. In addition, the larger the volume of historical data in the file system, the longer the data balancing process takes. Because most services do not require automatic data balancing after a scale-out operation, a CPFS file system does not automatically balance data after it is scaled out.
Can I only use a POSIX client to mount and access a CPFS file system?
No. CPFS supports mounting and accessing file systems by using either CPFS-POSIX clients or CPFS-NFS clients. Mutual access between CPFS-POSIX clients and CPFS-NFS clients is also supported. For example, if you create a file and modify its content by using a CPFS-POSIX client, the modified content is visible from a CPFS-NFS client, and vice versa. For more information, see Client description.
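A quick way to verify this interoperability is sketched below; the mount paths /cpfs/myfs and /mnt/cpfs are illustrative placeholders for wherever each client mounted the same file system.
echo "hello from POSIX" > /cpfs/myfs/interop-test.txt   # on the CPFS-POSIX client
cat /mnt/cpfs/interop-test.txt                          # on the CPFS-NFS client; prints: hello from POSIX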