This topic describes common issues and solutions for when you cannot mount a Cloud Parallel File Storage (CPFS) file system on a Linux operating system using a CPFS-POSIX or CPFS-NFS client.
Overview
POSIX client mount targets
CPFS-POSIX client mounting
Can I only use a POSIX client to mount and access a CPFS file system?
What do I do if the "not active on:" error is returned when I mount a CPFS file system?
What do I do if the "Command failed" error is returned when I mount a CPFS file system?
What do I do if the "cpfs.sh is running already" error is returned when I mount a CPFS file system?
How do I purge the residual configuration information of an unmounted ECS instance?
CPFS-NFS client mounting
CPFS scale-out
What do I do if the "Insufficient inventory" error is returned when I create a POSIX client mount target?
Problem:
When you create a POSIX client mount target for a file system in the Cloud Parallel File Storage (CPFS) console, an "Insufficient inventory" error is returned.
Cause:
When you create a POSIX mount target, CPFS automatically creates three pay-as-you-go Elastic Compute Service (ECS) instances (ecs.g*.large) in your Alibaba Cloud account. These instances are used to manage the CPFS-POSIX client cluster. To create a POSIX client mount target, your Alibaba Cloud account must be in good standing and able to purchase ECS instances.
Solution:
Log on to the ECS console and go to the Custom Launch tab to check the inventory of the required ECS instance type. This helps prevent mount target creation failures that are caused by insufficient inventory of the specified instance type.
What do I do if the "Insufficient number of IP addresses in the vSwitch" error is returned when I create a POSIX client mount target?
Problem:
When you create a POSIX client mount target for a file system in the CPFS console, the error message 'The number of specified vSwitch IP addresses is insufficient' is returned.
Cause:
The storage nodes of a CPFS file system use IP addresses that are allocated from the vSwitch specified by the POSIX client mount target. Each storage node requires one IP address. The storage nodes of a CPFS file system can require up to 160 IP addresses.
Solution:
Log on to the virtual private cloud (VPC) console to query the number of available IP addresses in the vSwitch of the destination VPC. Make sure that the vSwitch has a sufficient number of available IP addresses.
Why am I unable to create a POSIX client mount target?
If you cannot find the mount target that you just created in the CPFS console, check whether your Alibaba Cloud account has an overdue payment. If your Alibaba Cloud account has an overdue payment, you cannot create a mount target for the CPFS file system. You must add funds to your account before you can create the mount target.
How many CPFS file systems can I mount on an ECS instance?
You can mount a maximum of one CPFS file system on an ECS instance.
What do I do if the "unsupported OS for 'X86_64' architecture" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the operating system of the compute node is not supported. You must change the operating system of the compute node. For more information about the operating systems that are supported by CPFS clients, see Limits.
[ FATAL ] You cannot add cpfs-client-001 node because it has an unsupported OS for 'X86_64' architecture.
What do I do if the "make sure kernel-devel version is consistent with kernel" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the kernel-devel and kernel-headers packages are not installed on the ECS instance on which you want to mount the file system, or the versions of the installed packages do not match the running kernel.
No package kernel-devel-3.10.0-957.21.3.el7.x86_64 available.
Error: Nothing to do
please make sure kernel-devel version is consistent with kernel
Run the following command to check whether the packages are installed on the ECS instance:
rpm -qa | grep kernel-devel-`uname -r`
If an empty result is returned, the packages are not correctly installed on the ECS instance. You must reinstall the packages on the ECS instance. For more information, see Step 1: Prepare the environment.
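For example, on CentOS or RHEL you can reinstall matching packages as sketched below; this assumes that packages for your running kernel are still available in the configured YUM repositories.
yum install -y "kernel-devel-$(uname -r)" "kernel-headers-$(uname -r)"  # install packages that match the running kernel
rpm -qa | grep "kernel-devel-$(uname -r)"                               # verify: the installed package is now listed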
What do I do if the "ssh: connect to host A port 22: Connection timed out" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the network connection between the ECS instance and the file system is broken.
====> start check ssh
try ssh root@a.b.c.d by /root/.ssh/id_rsa.pub
ssh: connect to host a.b.c.d port 22: Connection timed out
Identify the cause and resolve the issue based on the following information:
Possible cause | Solution
The network connection between the ECS instance (a.b.c.d) and the POSIX client control plane node (qr-001) is broken. | Check the network connectivity and run the mount command again.
The ECS instance (a.b.c.d) is not added to the qr-sg security group. | Check the security group configuration and try to mount the file system again. For more information, see Configure a security group.
The ECS instance (a.b.c.d) and the CPFS mount target are not in the same VPC. | Select an ECS instance that is in the same VPC as the mount target.
The IP address of the ECS instance (a.b.c.d) does not exist. | Check the instance status of the ECS instance.
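If the cause is unclear, the following quick checks from the ECS instance can help narrow it down; a.b.c.d is the placeholder IP address of the control plane node from the error message.
ping -c 3 a.b.c.d     # check basic network reachability to the control plane node
ssh -v root@a.b.c.d   # verbose output shows at which step the SSH connection stalls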
What do I do if the "not active on:<hostname>" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the CPFS service cannot be started on the ECS instance on which you want to mount the file system.
[ WARN ] GPFS is not active on: hostname. Consult the install toolkit logs for possible errors
during install. The GPFS service can also be started manually by running GPFS command
'mmstartup -N Node[,Node...]'
[ FATAL ] GPFS NOT ACTIVE
Identify the cause and resolve the issue based on the following information:
The security group of the ECS instance that you want to mount is incorrectly configured, or the ECS instance is not added to the qr-sg security group. For more information, see Configure a security group.
A CPFS file system requires the ECS instance to have more than 4 GB of available memory. If the ECS instance has insufficient memory, an error is reported. Confirm the available memory of the ECS instance.
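For example, you can confirm the available memory with the free command, which is present on common Linux distributions:
free -h    # the "available" column must show more than 4 GB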
What do I do if the "Command failed" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the available memory of the ECS instance is less than 4 GB. You must upgrade the memory of the ECS instance and run the cpfs add <ip> command again to mount the file system.
[ WARN ] GPFS is not active on: hostname. Consult the install toolkit logs for possible errors
during install. The GPFS service can also be started manually by running GPFS command
'mmstartup -N Node[,Node...]'
[ FATAL ] GPFS NOT ACTIVE
What do I do if the "cpfs.sh is running already" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, a mount or unmount task is already in progress. Wait for a period of time and then try to mount the file system again.
cpfs.sh is running already, pid: xyz
What do I do if the "connect to host B port 22: Connection timed out" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system on ECS instance A, an abnormal ECS instance B exists in the current CPFS file system cluster.
# cpfs add A
connect to host B port 22: Connection timed out
B hostname is invalid
Failed to add node.
Troubleshoot and fix the abnormal ECS instance B based on the following instructions. Then, try the mount operation again.
On the control plane node qr-001, run mmgetstate -a to check whether the status of ECS instance B is normal. The `active` state indicates a normal status.
If the status of instance B is normal, submit a ticket to the CPFS team for further troubleshooting.
If the status of instance B is abnormal, determine whether to continue using the instance.
If you want to continue using the instance, submit a ticket to the CPFS team to resolve the instance status.
If you no longer want to use the instance, run the mmdelnode -N <id> --force command to purge the node information. Example:
mmdelnode -N iZuf61mhwoc9flkufs0**** --force
Do you want to continue? (yes/no) yes
mmdelnode: [W] Could not cleanup the following unreached nodes:
iZuf61mhwoc9flkufs0****
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
After the information is purged, delete the host information of instance B from the /etc/hosts file. In this example, iZuf61mhwoc9flkufs0**** is the ID of the destination ECS instance.
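If you prefer to script the /etc/hosts cleanup, a minimal sketch follows; it assumes the residual entry contains the instance ID shown above, so back up the file and review the result before relying on it.
cp /etc/hosts /etc/hosts.bak                        # back up the file before editing
sed -i '/iZuf61mhwoc9flkufs0\*\*\*\*/d' /etc/hosts  # delete lines that contain the instance ID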
What do I do if the "[FATAL] B: Could not connect to B via ssh" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system on ECS instance A, an ECS instance B exists in the current CPFS file system cluster for which the installation was interrupted and residual configurations remain.
[ FATAL ] ssh: connect to host B port 22: Connection timed out
[ FATAL ] B: Could not connect to B via ssh.
Versions earlier than 2.2.0
Delete the residual configuration information of ECS instance B from the /usr/lpp/mmfs/5.0.5.0/installer/configuration/clusterdefinition.txt file.
[node4]
fqdn = B
os = rhel7
arch = x86_64
ip_address = 192.168.6.37
is_admin_node = False
is_object_store = False
is_nfs = False
is_smb = False
is_hdfs = False
is_protocol_node = False
is_nsd_server = False
access_ips =
is_quorum_node = False
is_manager_node = False
is_gui_server = False
is_ems_node = False
is_callhome_node = False
is_broker_node = False
is_node_offline = False
is_node_reachable = True
is_node_excluded = False
is_mestor_node = False
Version 2.2.0 and later
Delete the residual configuration information of ECS instance B from the /usr/lpp/mmfs/5.1.2.0/ansible-toolkit/ansible/ibm-spectrum-scale-install-infra/vars/scale_clusterdefinition.json file.
{ "fqdn": "iZuf6hn0blj1g377w4xxxxZ", "os": "rhel7", "arch": "x86_64", "ip_address": "172.19.0.100", "is_admin_node": false, "is_object_store": false, "is_nfs": false, "is_smb": false, "is_hdfs": false, "is_protocol_node": false, "is_nsd_server": false, "is_quorum_node": false, "is_manager_node": false, "is_gui_server": false, "is_ems_node": false, "is_callhome_node": false, "is_broker_node": false, "is_node_offline": false, "is_node_reachable": true, "is_node_excluded": false, "is_mestor_node": false, "scale_daemon_nodename": "iZuf6hn0blj1g377w4xxxxZ" }
What do I do if the "No GPFS admin node specified" error is returned when I mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, you are running the command on the wrong node.
[ FATAL ] No GPFS admin node specified. specify an admin node using 'spectrumscale node add <node name or IP> -a'.
Confirm that you are running the CPFS command on the qr-001 node.
What do I do if the "Failed to resolve domain: file-system-id.region.cpfs.aliyuncs.com" error is returned when I mount a CPFS file system?
Cause
The file-system-id.region.cpfs.aliyuncs.com parameter in the mount command was not replaced with the mount address of the export directory.
Solution
Log on to the NAS console. In the Actions column of the destination CPFS file system, click Manage. On the page that appears, click the Protocol Service tab. In the Actions column, click Export Directory. In the Export Directory panel, obtain the mount address. Then, replace the file-system-id.region.cpfs.aliyuncs.com parameter in the mount command with the mount address that you obtained. Run the mount command again to mount the file system.
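For reference, a corrected mount command might look like the following sketch; the mount address, export directory (/share), and local mount point (/mnt/cpfs) are illustrative placeholders, and the exact command for your file system is shown in the Export Directory panel.
sudo mount -t nfs cpfs-00abc123****.cn-hangzhou.cpfs.aliyuncs.com:/share /mnt/cpfs  # hypothetical mount address and paths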
What do I do if an error occurs when I mount a CPFS file system on a cloud computer?
If an error message such as 'Cannot mount' or 'Mount failed' is displayed on the interface when you mount a CPFS file system on a cloud computer, check whether the network between your cloud computer and CPFS is connected. To do so, perform the following steps:
Run the following command to query the IP address of the mount target domain name.
The following sample command uses cpfs-009e40ab9c6476e6-001a3e8bf745b****.cn-hangzhou.cpfs.aliyuncs.com as the mount target domain name. You must replace it with your actual mount target domain name.
dig -t txt cpfs-009e40ab9c6476e6-001a3e8bf745b****.cn-hangzhou.cpfs.aliyuncs.com
Run the ping command to ping the IP address that you obtained in the previous step to check for network connectivity. If the network is disconnected, check your network configurations.
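For example, the connectivity check might look like this; 192.0.2.10 is a placeholder for the IP address returned by the dig command.
ping -c 4 192.0.2.10    # replace with the IP address from the dig output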
What do I do if a YUM repository error occurs when I run the cpfs add command to mount a CPFS file system?
If this error message is returned when you mount a CPFS file system, the YUM repository configuration for CentOS 8 is invalid.
Errors during downloading metadata for repository 'appstream':
Status code: 404 for http://mirrors.cloud.aliyuncs.com/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 100.100.XX.XX)
Error: Failed to download metadata for repo 'appstream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
The CentOS 8 operating system has reached its end of life (EOL), and the Linux community no longer maintains this operating system version. You must switch to an available YUM repository.
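One common workaround is to repoint YUM at the CentOS vault archive, as sketched below; the URLs and file paths follow the standard CentOS 8 layout, but verify them for your image, and note that Alibaba Cloud images may ship their own archived mirror configuration.
sed -i -e 's|^mirrorlist=|#mirrorlist=|' \
       -e 's|^#baseurl=http://mirror.centos.org|baseurl=https://vault.centos.org|' \
       /etc/yum.repos.d/CentOS-*.repo    # comment out dead mirror lists and use the vault archive
yum clean all && yum makecache           # rebuild the metadata cache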
What do I do if a CPFS client fails to start?
Problem:
When you run mmgetstate -a on the control plane node qr-001 to check the ECS instance status, the instance status is `down`.
When you run the /usr/lpp/mmfs/bin/mmstartup command, the following information is returned:
… mmfslinux.ko kernel extension does not exist. Use mmbuildgpl command to create the needed kernel extension for your kernel …
Cause:
The kernel of the current ECS instance was upgraded.
Solution:
Run the /usr/lpp/mmfs/bin/mmbuildgpl command to rebuild the kernel extension. Sample response:
mmbuildgpl: Building GPL (5.1.X.X) module begins at Fri Dec 3 16:05:33 CST 2021.
--------------------------------------------------------
Verifying Kernel Header...
  kernel version = 41800305 (418000305012001, 4.18.0-305.12.1.el8_4.x86_64, 4.18.0-305.12.1)
  module include dir = /lib/modules/4.18.0-305.12.1.el8_4.x86_64/build/include
  module build dir = /lib/modules/4.18.0-305.12.1.el8_4.x86_64/build
  kernel source dir = /usr/src/linux-4.18.0-305.12.1.el8_4.x86_64/include
  Found valid kernel header file under /usr/src/kernels/4.18.0-305.12.1.el8_4.x86_64/include
Getting Kernel Cipher mode...
  Will use skcipher routines
Verifying Compiler...
  make is present at /bin/make
  cpp is present at /bin/cpp
  gcc is present at /bin/gcc
  g++ is present at /bin/g++
  ld is present at /bin/ld
Verifying libelf devel package...
  Verifying elfutils-libelf-devel is installed ...
    Command: /bin/rpm -q elfutils-libelf-devel
    The required package elfutils-libelf-devel is installed
Verifying Additional System Headers...
  Verifying kernel-headers is installed ...
    Command: /bin/rpm -q kernel-headers
    The required package kernel-headers is installed
make World ...
make InstallImages ...
--------------------------------------------------------
mmbuildgpl: Building GPL module completed successfully at Fri Dec 3 16:05:54 CST 2021.
--------------------------------------------------------
Run the /usr/lpp/mmfs/bin/mmstartup command to restart the GPFS service on the ECS instance.
Run the /usr/lpp/mmfs/bin/mmmount all command to mount the file system again.
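You can then confirm that the node has recovered; the command below assumes the standard CPFS client installation path used elsewhere in this topic.
/usr/lpp/mmfs/bin/mmgetstate -a    # the state of the repaired node should be "active"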
How do I purge the residual configuration information of an unmounted ECS instance?
First, confirm that the CPFS file system is unmounted from the ECS instance. For more information, see Unmount a file system. Then, run the mmdelnode -N <id> --force command to purge the residual configuration information of the unmounted ECS instance. Example:
mmdelnode -N iZuf61mhwoc9flkufs0**** --force
Do you want to continue? (yes/no) yes
mmdelnode: [W] Could not cleanup the following unreached nodes:
iZuf61mhwoc9flkufs0****
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all affected nodes. This is an
asynchronous process.
In this example, iZuf61mhwoc9flkufs0**** is the ID of the destination ECS instance.
What do I do if the "Insufficient inventory" error is returned during a scale-out operation?
A CPFS file system scale-out operation depends on the inventory of CPFS storage nodes and the number of available IP addresses in the vSwitch of the mount target. Go to the VPC console to view the current number of available IP addresses. A CPFS file system requires a maximum of 164 available IP addresses. Make sure that the vSwitch has a sufficient number of available IP addresses.
Is historical data automatically balanced after a CPFS file system is scaled out?
After a CPFS file system is scaled out, data balancing is not performed by default. This means that historical data is still stored on the original storage nodes and is not automatically migrated to the new storage nodes.
The data balancing process consumes network and disk bandwidth of storage nodes, which causes frontend I/O performance to decrease. In addition, the larger the volume of historical data in the file system, the longer the data balancing process takes. Because most services do not require automatic data balancing after a scale-out operation, a CPFS file system does not automatically balance data after it is scaled out.
Can I only use a POSIX client to mount and access a CPFS file system?
No. CPFS supports mounting and accessing file systems by using either CPFS-POSIX clients or CPFS-NFS clients. Mutual access between CPFS-POSIX clients and CPFS-NFS clients is also supported. For example, if you create a file and modify its content by using a CPFS-POSIX client, the modified content is visible from a CPFS-NFS client, and vice versa. For more information, see Client description.
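A quick way to verify this interoperability is sketched below; the mount paths /cpfs/myfs and /mnt/cpfs are illustrative placeholders for wherever each client mounted the same file system.
echo "hello from POSIX" > /cpfs/myfs/interop-test.txt   # on the CPFS-POSIX client
cat /mnt/cpfs/interop-test.txt                          # on the CPFS-NFS client; prints: hello from POSIX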