CloudOps Orchestration Service: ACS-CS-DedicatedMigration

Last Updated: Dec 24, 2024

Template name

ACS-CS-DedicatedMigration

Template description

Hibernates master nodes and backs up etcd data in a Container Service for Kubernetes (ACK) dedicated cluster and uploads the backup data to an Object Storage Service (OSS) bucket.
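In migrate mode, the sleepOrWakeupControlPlane task moves every file in /etc/kubernetes/manifests to /etc/kubernetes/manifests.backup so that kubelet stops the control plane static Pods. It then checks whether a kube-apiserver container is still running, with crictl when kubelet is configured with the containerd socket and with docker otherwise. It checks up to 150 times at 2-second intervals, so the wait can last about 300 seconds, which exceeds the 240-second timeout of that task. If kube-apiserver is still running when the loop ends, the manifests are moved back and the task fails. The POSIX shell sketch below isolates that bounded polling pattern; the function names wait_until_gone and count_apiserver are hypothetical and not part of the template.

```shell
#!/bin/sh
# Hypothetical helper, not part of the template: run a check command up to
# <attempts> times, sleeping <interval> seconds between runs, until the
# command prints 0 (no matching container left). Returns 0 on success and
# 1 if the container is still present after the last attempt.
# Usage: wait_until_gone <attempts> <interval> <command> [args...]
wait_until_gone() {
    attempts=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Strip whitespace because some `wc -l` implementations pad their output
        count=$("$@" | tr -d '[:space:]')
        if [ "$count" = "0" ]; then
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    return 1
}

# Example check for a Docker-based node, equivalent to the template's
# `docker ps | grep kube-apiserver | wc -l` (not executed here):
count_apiserver() { docker ps | grep kube-apiserver | wc -l; }
# wait_until_gone 150 2 count_apiserver
```

Passing the check as a command keeps the loop independent of the container runtime: a crictl-based check can be swapped in without changing the polling logic.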

Template type

Automated

Owner

Alibaba Cloud

Input parameters

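Parameter | Description | Data type | Required | Default value | Limit
targets | The Elastic Compute Service (ECS) instances on which the commands are run. Select the master nodes of the ACK dedicated cluster. | Json | Yes | - | -
BucketName | The name of the OSS bucket to which the etcd snapshot is uploaded. | String | Yes | - | -
OSSEndpoint | The endpoint of the OSS bucket to which the etcd snapshot is uploaded. | String | Yes | - | -
ClusterID | The ID of the ACK dedicated cluster. If action is set to migrate, the ID must start with c. | String | Yes | - | -
regionId | The region ID. | String | No | {{ ACS::RegionId }} | -
workingDir | The directory in which the commands are run on the ECS instances. | String | No | /root | -
rateControl | The rate control settings for running the commands on multiple instances. | Json | No | {"Mode": "Concurrency", "MaxErrors": 0, "Concurrency": 5} | -
action | The operation to perform. Valid values: migrate (hibernates the control plane, backs up etcd, and uploads the snapshot to OSS) and rollback (wakes up the control plane). | String | No | rollback | -
OOSAssumeRole | The RAM role that is assumed by CloudOps Orchestration Service (OOS). | String | No | "" | -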

Output parameters

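Parameter | Description | Data type
sleepOrWakeupControlPlaneOutputs | The command output of the sleepOrWakeupControlPlane task on each instance. | List
etcdCheckoutOutputs | The command output of the etcdCheckout task, which creates the etcd snapshot on the etcd leader and uploads it to OSS. | List
findLeaderOutputs | The ID of each instance on which the file /tmp/etcdsnap/sign exists, that is, the etcd leader. Other instances return empty output. | List
readSignOutputs | The content of /tmp/etcdsnap/sign on the etcd leader: a JSON object that contains the base64-encoded sa.key, sa.pub, front-proxy-ca.crt, and front-proxy-ca.key files (fields sakey, sapub, frontcrt, and frontkey) and oss_url, a signed URL of the snapshot that is valid for 2400 seconds. The file is deleted after it is read. | List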
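On the etcd leader, the etcdCheckout task writes /tmp/etcdsnap/sign as a single-line JSON object with the fields sakey, sapub, frontcrt, and frontkey (the base64-encoded /etc/kubernetes/pki/sa.key, sa.pub, front-proxy-ca.crt, and front-proxy-ca.key files) and oss_url (a signed URL of the snapshot). The readSign task prints that object, so each non-empty item of readSignOutputs is expected to have this shape. The sketch below extracts one field with sed, assuming the object stays on one line and no value contains a double quote; a JSON parser such as jq is the more robust choice when it is available. The function name sign_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the template: print one string field from
# the flat JSON object that the readSign task prints (the contents of
# /tmp/etcdsnap/sign). Assumes the object is on a single line and that no
# value contains a double quote, which holds for base64 text and signed URLs.
# Usage: sign_field <json> <field>
sign_field() {
    printf '%s\n' "$1" | sed -n "s/.*\"$2\":\"\([^\"]*\)\".*/\1/p"
}

# Example (placeholder values, not executed here): restore the service
# account key and print the signed snapshot URL.
# sign_json='{"sakey":"...","sapub":"...","frontcrt":"...","frontkey":"...","oss_url":"..."}'
# sign_field "$sign_json" sakey | base64 -d > sa.key
# sign_field "$sign_json" oss_url
```

Because the key fields were produced with `base64 -w0`, piping them through `base64 -d` restores the original file contents byte for byte.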

Permission policy that is required to execute the template

{
    "Version": "1",
    "Statement": [
        {
            "Action": [
                "ecs:DescribeInstances",
                "ecs:DescribeInvocationResults",
                "ecs:DescribeInvocations",
                "ecs:RunCommand"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
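The template declares migrate and rollback as the allowed values of action, and its etcdCheckout script additionally exits with an error in migrate mode unless ClusterID starts with c. That ClusterID rule is checked only on the instances, after the control plane has already been put to sleep, so a bad value leads to a sleep, a failure, and a rollback. A hypothetical pre-flight check that mirrors these rules, plus the required ClusterID, BucketName, and OSSEndpoint parameters, could look like this in POSIX shell; check_params is an illustrative name, not part of the template.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of the template. It mirrors:
#   - action must be migrate or rollback (AllowedValues in the template)
#   - ClusterID, BucketName, and OSSEndpoint are required (no default value)
#   - for migrate, ClusterID must start with "c" (the etcdCheckout script
#     exits with an error otherwise; in rollback mode it exits before the check)
# Usage: check_params <action> <cluster_id> <bucket_name> <oss_endpoint>
check_params() {
    case "$1" in
        migrate|rollback) ;;
        *) echo "action must be migrate or rollback" >&2; return 1 ;;
    esac
    if [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ]; then
        echo "ClusterID, BucketName, and OSSEndpoint are required" >&2
        return 1
    fi
    if [ "$1" = "migrate" ]; then
        case "$2" in
            c*) ;;
            *) echo "ClusterID must start with c" >&2; return 1 ;;
        esac
    fi
    return 0
}
```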

Template content

FormatVersion: OOS-2019-06-01
Description:
  en: Hibernate the control plane, create an etcd snapshot, and upload it to an OSS bucket
  name-en: ACS-CS-DedicatedMigration   
  categories:
    - others
Parameters:
  regionId:
    Type: String
    Label:
      en: RegionId      
    AssociationProperty: RegionId
    Default: '{{ ACS::RegionId }}'
  workingDir:
    Label:
      en: WorkingDir     
    Type: String
    Default: /root
  rateControl:
    Label:
      en: RateControl     
    Type: Json
    AssociationProperty: RateControl
    Default:
      Mode: Concurrency
      MaxErrors: 0
      Concurrency: 5
  targets:
    Label:
      en: TargetInstance      
    Type: Json
    AssociationProperty: Targets
    AssociationPropertyMetadata:
      ResourceType: 'ALIYUN::ECS::Instance'
      RegionId: regionId
  action:
    Type: String
    Label:
      en: Action       
    Default: rollback
    AllowedValues:
      - migrate
      - rollback
  OOSAssumeRole:
    Label:
      en: OOSAssumeRole      
    Type: String
    Default: ''
  BucketName:
    Label:
      en: BucketName      
    Type: String
  OSSEndpoint:
    Label:
      en: OSSEndpoint       
    Type: String
  ClusterID:
    Label:
      en: ClusterID       
    Type: String
RamRole: '{{ OOSAssumeRole }}'
Tasks:
  - Name: getInstance
    Description:
      en: Views the ECS instances      
    Action: ACS::SelectTargets
    Properties:
      ResourceType: ALIYUN::ECS::Instance
      RegionId: '{{ regionId }}'
      Filters:
        - '{{ targets }}'
    Outputs:
      instanceIds:
        Type: List
        ValueSelector: Instances.Instance[].InstanceId
  - Action: ACS::ECS::RunCommand
    OnError: rollback
    Description:
      en: Execute cloud assistant command      
    Properties:
      regionId: '{{ regionId }}'
      commandContent: |-
        #!/bin/bash
        set -e
        if [ "{{action}}" = "migrate" ]; then
            mkdir -p /etc/kubernetes/manifests.backup
            if_move=$(ls /etc/kubernetes/manifests/ | wc -l)
            if [ "$if_move" != "0" ]; then
                mv -f /etc/kubernetes/manifests/* /etc/kubernetes/manifests.backup/
            fi
            is_ok=0
            set +e
            ps -o cmd -p `pidof kubelet` | grep 'container-runtime-endpoint=/var/run/containerd/containerd.sock'
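            if [ $?  -ne 0 ]; then
                # kubelet is not configured with the containerd socket: wait for kube-apiserver to stop under Docker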
                for ((integer = 0; integer < 150; integer++)); do
                    count=$(docker ps | grep kube-apiserver | wc -l)
                    if [ "$count" = "0" ]; then
                        is_ok=1
                        break
                    else
                        sleep 2
                    fi
                done
            else
                # kubelet uses the containerd socket: wait for kube-apiserver to stop, checking with crictl
                for ((integer = 0; integer < 150; integer++)); do
                    count=$(crictl --runtime-endpoint /var/run/containerd/containerd.sock  ps |grep kube-apiserver | wc -l)
                    if [ "$count" = "0" ]; then
                        is_ok=1
                        break
                    else
                        sleep 2
                    fi
                done
            fi
            set -e
            if [ "$is_ok" == "0" ]; then
                mv -f /etc/kubernetes/manifests.backup/* /etc/kubernetes/manifests/
                echo "Rollback finished"
                exit 1
            else
                echo "The control plane is sleeping now."
            fi
        elif [ "{{action}}" = "rollback" ]; then
            mkdir -p /etc/kubernetes/manifests.backup
            if_move=$(ls /etc/kubernetes/manifests.backup/ | wc -l)
            if [ "$if_move" != "0" ]; then
                mv -f /etc/kubernetes/manifests.backup/* /etc/kubernetes/manifests/
            fi
            echo "The control plane is awake now."
        else
            echo "action must be migrate or rollback"
            exit 1
        fi
      instanceId: '{{ ACS::TaskLoopItem }}'
      commandType: RunShellScript
      workingDir: '{{ workingDir }}'
      timeout: 240
    Loop:
      Items: '{{ getInstance.instanceIds }}'
      RateControl: '{{ rateControl }}'
      Outputs:
        commandOutputs:
          AggregateType: Fn::ListJoin
          AggregateField: commandOutput
    Outputs:
      commandOutput:
        ValueSelector: invocationOutput
        Type: String
    Name: sleepOrWakeupControlPlane
  - Action: ACS::ECS::RunCommand
    OnError: rollback
    Description:
      en: Execute cloud assistant command       
    Properties:
      regionId: '{{ regionId }}'
      commandContent: |-
        #!/bin/bash
        set -e
        if [ "{{action}}" = "rollback" ]; then
            exit 0
        fi
        # Build the local etcd client endpoint from the IPv4 address of eth0
        IP=$(/sbin/ifconfig eth0 | grep inet | grep -v 127.0.0.1 | grep -v inet6 | awk '{print $2}' | tr -d "addr:")
        ENDPOINT="https://$IP:2379"
        echo "ENDPOINT:  "$ENDPOINT
        set +e
        # Only the etcd leader continues past this check
        ETCDCTL_API=3 /usr/bin/etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints=$ENDPOINT endpoint status | grep true
        if [ $?  -ne 0 ]; then
            # Not the leader: skip the backup on this node
            exit 0
        fi
        set -e
        yum install curl wget jq -y
        if [ !  -f "/tmp/ossutil64" ]; then
            # Download ossutil if it is not already present
            wget -c -t 10 -O /tmp/ossutil64 https://oos-public-{{regionId}}.oss-{{regionId}}-internal.aliyuncs.com/x64/ossutil64
            if [ $?  -ne 0 ]; then
                # Download failed
                exit 1
            fi
            chmod +x /tmp/ossutil64
        fi
        if [ !  -f "/tmp/modify-prefix-v2" ]; then
            # Download the modify-prefix-v2 tool if it is not already present
            wget -c -t 10 -O /tmp/modify-prefix-v2 https://aliacs-k8s-{{regionId}}.oss-{{regionId}}-internal.aliyuncs.com/public/pkg/etcd/modify-prefix-v2
            if [ $?  -ne 0 ]; then
                # Download failed
                exit 1
            fi
            chmod +x /tmp/modify-prefix-v2
        fi
        if !  [[ {{ClusterID}} =~ ^c.* ]];then
            # The cluster ID is expected to start with c
            exit 1
        fi
        echo "clusterID: {{ClusterID}}"
        # Name the snapshot etcd_<ClusterID>_<timestamp>
        TIMESTAMP=$(date "+%Y%m%d%H%M%S")
        mkdir -p /tmp/etcdsnap
        set -x
        SNAP_NAME=etcd_{{ClusterID}}_$TIMESTAMP
        # Destination key prefix passed to modify-prefix-v2 change-prefix
        DestPrefix="/"{{ClusterID}}
        ETCDCTL_API=3 /usr/bin/etcdctl --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-server.pem --key=/var/lib/etcd/cert/etcd-server-key.pem --endpoints=$ENDPOINT snapshot save /tmp/etcdsnap/$SNAP_NAME
        set +e
        /tmp/modify-prefix-v2 change-prefix --db=/tmp/etcdsnap/$SNAP_NAME  --dest-prefix=$DestPrefix 
        if [ $?  -ne 0 ]; then
            # Changing the key prefix failed
            exit 1
        fi
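        set -e
        # Stop tracing so the STS credentials below are not written to the command output
        set +x
        # Read temporary credentials of the instance RAM role from the ECS metadata service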
        ROLE=$(curl -s 100.100.100.200/latest/meta-data/ram/security-credentials/)
        ROLERES=$(curl -s 100.100.100.200/latest/meta-data/ram/security-credentials/$ROLE)
        AccessKeyId=$(echo $ROLERES | jq .AccessKeyId|sed 's/\"//g')
        AccessKeySecret=$(echo $ROLERES | jq .AccessKeySecret|sed 's/\"//g')
        SecurityToken=$(echo $ROLERES | jq .SecurityToken|sed 's/\"//g')
        # put object to oss
        echo "begin put object to oss"
        set +e
        /tmp/ossutil64 -t $SecurityToken -i $AccessKeyId -k $AccessKeySecret -e {{OSSEndpoint}} cp /tmp/etcdsnap/$SNAP_NAME oss://{{BucketName}}/$SNAP_NAME
        if [ $?  -ne 0 ]; then
            # Upload failed
            exit 1
        fi
        set -e
        # sign
        oss_url=$(/tmp/ossutil64 -t $SecurityToken -i $AccessKeyId -k $AccessKeySecret -e {{OSSEndpoint}} sign --timeout 2400 oss://{{BucketName}}/$SNAP_NAME | grep -v "elapsed" | tr -d '\n')
        set +x
        sakey=$(cat /etc/kubernetes/pki/sa.key | base64 -w0)
        sapub=$(cat /etc/kubernetes/pki/sa.pub | base64 -w0)
        frontcrt=$(cat /etc/kubernetes/pki/front-proxy-ca.crt | base64 -w0)
        frontkey=$(cat /etc/kubernetes/pki/front-proxy-ca.key | base64 -w0)
        echo "{\"sakey\":\"$sakey\",\"sapub\":\"$sapub\",\"frontcrt\":\"$frontcrt\",\"frontkey\":\"$frontkey\",\"oss_url\":\"$oss_url\"}" >/tmp/etcdsnap/sign
      instanceId: '{{ ACS::TaskLoopItem }}'
      commandType: RunShellScript
      workingDir: '{{ workingDir }}'
      timeout: 600
    Loop:
      Items: '{{ getInstance.instanceIds }}'
      RateControl: '{{ rateControl }}'
      Outputs:
        commandOutputs:
          AggregateType: Fn::ListJoin
          AggregateField: commandOutput
    Outputs:
      commandOutput:
        ValueSelector: invocationOutput
        Type: String
    Name: etcdCheckout
  - Action: 'ACS::ECS::RunCommand'
    OnError: rollback
    Description:
      en: Execute cloud assistant command
    Properties:
      regionId: '{{ regionId }}'
      commandContent: |-
        #!/bin/bash
        if [ "{{action}}" = "rollback" ]; then
            exit 0
        fi
        if [ -e  /tmp/etcdsnap/sign ]; then
            curl --retry 10 -sSL 100.100.100.200/latest/meta-data/instance-id
        fi
      instanceId: '{{ ACS::TaskLoopItem }}'
      commandType: RunShellScript
      workingDir: '{{ workingDir }}'
      timeout: 60
    Loop:
      Items: '{{ getInstance.instanceIds }}'
      RateControl: '{{ rateControl }}'
      Outputs:
        commandOutputs:
          AggregateType: 'Fn::ListJoin'
          AggregateField: commandOutput
    Outputs:
      commandOutput:
        ValueSelector: invocationOutput
        Type: String
    Name: findLeader
  - Action: 'ACS::ECS::RunCommand'
    OnError: rollback
    OnSuccess: ACS::END
    Description:
      en: Execute cloud assistant command      
    Properties:
      regionId: '{{ regionId }}'
      commandContent: |-
        #!/bin/bash
        if [ "{{action}}" = "rollback" ]; then
            exit 0
        fi
        if [ -e  /tmp/etcdsnap/sign ]; then
            cat /tmp/etcdsnap/sign
            rm -rf /tmp/etcdsnap/sign
        fi
      instanceId: '{{ ACS::TaskLoopItem }}'
      commandType: RunShellScript
      workingDir: '{{ workingDir }}'
      timeout: 60
    Loop:
      Items:
        'Fn::Intersection':
          - '{{ getInstance.instanceIds }}'
          - '{{ findLeader.commandOutputs }}'
      RateControl: '{{ rateControl }}'
      Outputs:
        commandOutputs:
          AggregateType: Fn::ListJoin
          AggregateField: commandOutput
    Outputs:
      commandOutput:
        ValueSelector: invocationOutput
        Type: String
    Name: readSign
  - Action: ACS::ECS::RunCommand
    Description:
      en: Execute cloud assistant command       
    Properties:
      regionId: '{{ regionId }}'
      commandContent: |-
        #!/bin/bash
        set -e
        mkdir -p /etc/kubernetes/manifests.backup
        if_move=$(ls /etc/kubernetes/manifests.backup/ | wc -l)
        if [ "$if_move" != "0" ]; then
            mv -f /etc/kubernetes/manifests.backup/* /etc/kubernetes/manifests/
        fi
        echo "The control plane is awake now."
      instanceId: '{{ ACS::TaskLoopItem }}'
      commandType: RunShellScript
      workingDir: '{{ workingDir }}'
      timeout: 240
    Loop:
      Items: '{{ getInstance.instanceIds }}'
      RateControl: '{{ rateControl }}'
      Outputs:
        commandOutputs:
          AggregateType: Fn::ListJoin
          AggregateField: commandOutput
    Outputs:
      commandOutput:
        ValueSelector: invocationOutput
        Type: String
    Name: rollback
Outputs:
  sleepOrWakeupControlPlaneOutputs:
    Type: List
    Value: '{{ sleepOrWakeupControlPlane.commandOutputs }}'
  etcdCheckoutOutputs:
    Type: List
    Value: '{{ etcdCheckout.commandOutputs }}'
  findLeaderOutputs:
    Type: List
    Value: '{{ findLeader.commandOutputs }}'
  readSignOutputs:
    Type: List
    Value: '{{ readSign.commandOutputs }}'
Metadata:
  ALIYUN::OOS::Interface:
    ParameterGroups:
      - Parameters:
          - ClusterID
          - action
          - BucketName
          - OSSEndpoint
          - workingDir
        Label:
          default:             
            en: Configure Parameters
      - Parameters:
          - regionId
          - targets
        Label:
          default:           
            en: Select ECS Instance
      - Parameters:
          - rateControl
          - OOSAssumeRole
        Label:
          default:            
            en: Control Options
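In migrate mode, the etcdCheckout task names the snapshot etcd_<ClusterID>_<TIMESTAMP>, where TIMESTAMP is the output of date "+%Y%m%d%H%M%S" on the etcd leader, and uploads it with ossutil to oss://<BucketName>/<snapshot name>. The helper below rebuilds that location; snapshot_uri is a hypothetical name, and the sketch only reproduces the naming scheme, it does not contact OSS.

```shell
#!/bin/sh
# Hypothetical helper, not part of the template: rebuild the OSS location
# that the etcdCheckout task uses for a snapshot:
#   oss://<BucketName>/etcd_<ClusterID>_<TIMESTAMP>
# where TIMESTAMP is produced on the node by: date "+%Y%m%d%H%M%S".
# Without a timestamp, the function prints the prefix shared by all
# snapshots of the cluster.
# Usage: snapshot_uri <bucket_name> <cluster_id> [timestamp]
snapshot_uri() {
    printf 'oss://%s/etcd_%s_%s\n' "$1" "$2" "${3:-}"
}
```

Because the timestamp is generated on the node, the exact object name of a run is easiest to take from the oss_url field in readSignOutputs. The prefix form can be used to list all backups of a cluster, for example with ossutil ls.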