CDN: Run scripts to purge and prefetch content

Last Updated: Sep 12, 2024

Alibaba Cloud CDN provides scripts that can be used to automatically purge and prefetch content such as files and directories from origin servers in batches. Compared with manual operations, scripts greatly simplify the process. This topic describes how to use a Python script. A Windows operating system is used in the example.

Overview

After you specify a file that contains the URLs to be purged or prefetched, the script splits the URL list into batches based on the configured batch size and submits the batches one at a time. The script automatically detects whether a purge or prefetch task is completed, and the next task does not start until the current one ends. The feature works as follows:

  1. Process URLs in batches: If your URL list contains 100 URLs and you set a maximum of 10 URLs per batch, the script divides the list into 10 batches of 10 URLs each. If you change the batch size, the number of batches changes accordingly. For example, if you set a maximum of 20 URLs per batch, the script divides the 100 URLs into 5 batches of 20 URLs each.

  2. Run tasks by batch: When you run a script, the script submits purge or prefetch requests in sequence by batch. Tasks in each batch are executed concurrently.

  3. Proceed to the next batch of tasks only after the current batch is completed: After purge or prefetch tasks in a batch are completed, the script continues to execute the tasks in the next batch. This process is automatically performed without manual intervention.
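
The batching arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration and not the code that the script itself uses; the function name split_into_batches and the sample URLs are hypothetical.

```python
def split_into_batches(urls, batch_size):
    """Split a list of URLs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


# Hypothetical sample data: 100 URLs.
urls = [f"http://example.com/file{n}.jpg" for n in range(1, 101)]

batches_of_10 = split_into_batches(urls, 10)  # 10 batches of 10 URLs each
batches_of_20 = split_into_batches(urls, 20)  # 5 batches of 20 URLs each
print(len(batches_of_10), len(batches_of_20))
```

If the number of URLs is not a multiple of the batch size, the last batch in this sketch simply contains the remaining URLs.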

Scenarios

We recommend that you use scripts in the following scenarios:

  • Purge and prefetch operations are performed manually because no developer is available. The cost of operations and maintenance (O&M) is high.

  • The number of URLs to be purged or prefetched is large, and submitting them manually in batches is inefficient.

  • Whether the purge and prefetch tasks run as expected must be manually checked, which consumes a large amount of resources and time.

Limits

The Python version in the operating system must be 3.x. You can run the python --version or python3 --version command to check whether the Python version meets the requirements.

Before you begin

  1. Create an AccessKey pair for a Resource Access Management (RAM) user. An Alibaba Cloud account has all permissions on resources. If the AccessKey pair of your Alibaba Cloud account is leaked, your resources are exposed to great risks. We recommend that you use the AccessKey pair of a RAM user. For information about how to obtain the AccessKey pair, see Create an AccessKey pair.

  2. Grant the RAM user the permissions on domain name resources. In this example, the AliyunCDNFullAccess system policy is attached to the RAM user.

    1. Use a system policy.

      • AliyunCDNFullAccess: grants full access to Alibaba Cloud CDN resources.

    2. Use a custom policy.

      For more information about how to create custom policies, see Create custom policies.

  3. Configure the AccessKey pair in environment variables. For more information, see Configure environment variables in Linux, macOS, and Windows.
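
If the AccessKey pair is stored in environment variables, a small helper can read it so that the values do not have to be typed in plain text. The following is a sketch under the assumption that the variables are named ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET; adjust the names if you configured different ones. The load_credentials function is hypothetical and is not part of the SDK.

```python
import os


def load_credentials(env=None):
    """Return (access_key_id, access_key_secret) read from environment variables.

    The variable names are an assumption; change them to match your setup.
    """
    env = os.environ if env is None else env
    ak = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    sk = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if not ak or not sk:
        raise RuntimeError("AccessKey environment variables are not set")
    return ak, sk


# Example with an explicit mapping instead of the real environment.
sample_env = {
    "ALIBABA_CLOUD_ACCESS_KEY_ID": "yourAccessKey",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET": "yourAccessKeySecret",
}
ak, sk = load_credentials(sample_env)
```

Calling load_credentials() without an argument reads from the real process environment.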

Step 1: Install dependencies

  1. Run the following command to install Alibaba Cloud CDN SDK for Python. The current version is v20180510.

    pip install aliyun-python-sdk-cdn

  2. Run the following command to install the core library of Alibaba Cloud SDK for Python. The current version is 2.6.0.

    pip install aliyun-python-sdk-core
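
To confirm that both packages are available before you run the script, you can check whether their modules can be found. The sketch below uses only the Python standard library; the helper name missing_modules is hypothetical, and the module names are the ones that the sample script in this topic imports.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# aliyunsdkcore and aliyunsdkcdn are the import names used by the sample script.
required = ["aliyunsdkcore", "aliyunsdkcdn"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```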

Step 2: Prepare a URL list file

Create a file that contains a list of URLs to be purged or prefetched, such as urllist.txt. Enter one URL per line. Make sure that each URL starts with http:// or https:// and is in valid format. Sample content:

http://example.com/file1.jpg
http://example.com/file2.jpg
http://example.com/file3.jpg
...
http://example.com/fileN.jpg
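
Before you submit a large list, it can help to check the file for lines that do not meet the format requirement. The following sketch is an illustration only: the function name find_invalid_urls and the sample lines are hypothetical, and the check is limited to the http:// or https:// prefix described above.

```python
import re

# A line is accepted if it starts with http:// or https:// followed by any text.
URL_PATTERN = re.compile(r"^https?://.+")


def find_invalid_urls(lines):
    """Return (line_number, text) pairs for lines that are not http(s) URLs."""
    invalid = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not URL_PATTERN.match(text):
            invalid.append((number, text))
    return invalid


sample = [
    "http://example.com/file1.jpg",
    "https://example.com/file2.jpg",
    "example.com/file3.jpg",  # missing scheme
    "",                       # blank line
]
print(find_invalid_urls(sample))
```

To check a real file, pass the open file object, for example find_invalid_urls(open("urllist.txt")).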

Step 3: Create a script

Save the following code as a script and name it Refresh.py. This file name is an example. You can specify a custom name for the script.

Script sample code

#!/usr/bin/env python3
# coding=utf-8
# __author__ = 'aliyun.cdn'
# __date__ = '2021-04-23'

'''Check Package'''
try:
    # Import the required libraries.
    import os, re, sys, getopt, time, json, logging
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.acs_exception.exceptions import ClientException, ServerException
    from aliyunsdkcdn.request.v20180510.RefreshObjectCachesRequest import RefreshObjectCachesRequest
    from aliyunsdkcdn.request.v20180510.PushObjectCacheRequest import PushObjectCacheRequest
    from aliyunsdkcdn.request.v20180510.DescribeRefreshTasksRequest import DescribeRefreshTasksRequest
    from aliyunsdkcdn.request.v20180510.DescribeRefreshQuotaRequest import DescribeRefreshQuotaRequest

# Capture import exceptions.
except ImportError as e:
    sys.exit(f"[error] Please pip install aliyun-python-sdk-cdn and aliyun-python-sdk-core. Details: {e}")

# Initialize log entries.
logging.basicConfig(level=logging.DEBUG, filename='./RefreshAndPredload.log')

# Define a global variable class to store information such as AccessKey ID, AccessKey secret, and file directory.
class Envariable(object):
    LISTS = []
    REGION = 'cn-zhangzhou'
    AK = None
    SK = None
    FD = None
    CLI = None
    TASK_TYPE = None
    TASK_AREA = None
    TASK_OTYPE = None

    # Set the AccessKey ID.
    @staticmethod
    def set_ak(ak):
        Envariable.AK = ak

    # Obtain the AccessKey ID.
    @staticmethod
    def get_ak():
        return Envariable.AK

    # Set the AccessKey secret.
    @staticmethod
    def set_sk(sk):
        Envariable.SK = sk

    # Obtain the AccessKey secret.
    @staticmethod
    def get_sk():
        return Envariable.SK

    # Set the file directory.
    @staticmethod
    def set_fd(fd):
        Envariable.FD = fd

    # Obtain the file directory.
    @staticmethod
    def get_fd():
        return Envariable.FD

    # Set the type of the task.
    @staticmethod
    def set_task_type(task_type):
        Envariable.TASK_TYPE = task_type

    # Obtain the type of the task.
    @staticmethod
    def get_task_type():
        return Envariable.TASK_TYPE

    # Set the region of the task.
    @staticmethod
    def set_task_area(task_area):
        Envariable.TASK_AREA = task_area

    # Obtain the region of the task.
    @staticmethod
    def get_task_area():
        return Envariable.TASK_AREA

    # Set the object type of the task.
    @staticmethod
    def set_task_otype(task_otype):
        Envariable.TASK_OTYPE = task_otype

    # Obtain the object type of the task.
    @staticmethod
    def get_task_otype():
        return Envariable.TASK_OTYPE

    # Create an AcsClient object.
    @staticmethod
    def set_acs_client():
        Envariable.CLI = AcsClient(Envariable.get_ak(), Envariable.get_sk(), Envariable.REGION)

    # Obtain an AcsClient object.
    @staticmethod
    def get_acs_client():
        return Envariable.CLI


class InitHandler(object):
    def __init__(self, ak, sk, region):
        try:
            self.client = AcsClient(ak, sk, region)
        except Exception:
            logging.info("[error]: initial AcsClient failed")
            exit(1)


class BaseCheck(object):
    def __init__(self):
        self.invalidurl = ''
        self.lines = 0
        self.urllist = Envariable.get_fd()

    # Check the quota.
    def printQuota(self):
        try:
            if Envariable.get_acs_client():
                client = Envariable.get_acs_client()
            else:
                Envariable.set_acs_client()
                client = Envariable.get_acs_client()
            quotas = DescribeRefreshQuotaRequest()
            quotaResp = json.loads(Envariable.get_acs_client().do_action_with_exception(quotas))
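        except Exception as e:
            # Record the underlying error so that quota query failures can be diagnosed.
            logging.error(f"\n[error]: failed to query the purge and prefetch quota: {e}\n")
            sys.exit(1)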

        if Envariable.TASK_TYPE:
            if Envariable.TASK_TYPE == 'push':
                if self.lines > int(quotaResp['PreloadRemain']):
                    sys.exit("\n[error]:PreloadRemain is not enough {0}".format(quotaResp['PreloadRemain']))
                return True
            if Envariable.TASK_TYPE == 'clear':
                if Envariable.get_task_otype() == 'File' and self.lines > int(quotaResp['UrlRemain']):
                    sys.exit("\n[error]:UrlRemain is not enough {0}".format(quotaResp['UrlRemain']))
                elif Envariable.get_task_otype() == 'Directory' and self.lines > int(quotaResp['DirRemain']):
                    sys.exit("\n[error]:DirRemain is not enough {0}".format(quotaResp['DirRemain']))
                else:
                    return True

    # Verify the URL format.
    def urlFormat(self):
        with open(self.urllist, "r") as f:
            for line in f.readlines():
                self.lines += 1
                # Each URL must start with http:// or https://.
                if not re.match(r'^https?://', line):
                    self.invalidurl = line + '\n' + self.invalidurl
            if self.invalidurl != '':
                sys.exit("\n[error]: URL format is illegal \n{0}".format(self.invalidurl))
            return True

# The batch processing class, which divides the URL list into multiple batches based on a specific batch size.
class doTask(object):
    @staticmethod
    def urlencode_pl(inputs_str):
        len_str = len(inputs_str)
        if inputs_str == "" or len_str <= 0:
            return ""
        result_end = ""
        for chs in inputs_str:
            if chs.isalnum() or chs in {":", "/", ".", "-", "_", "*"}:
                result_end += chs
            elif chs == ' ':
                result_end += '+'
            else:
                result_end += f'%{ord(chs):02X}'
        return result_end

    # Process URLs in batches.
    @staticmethod
    def doProd():
        gop = 20 # Define the maximum number of URLs in each batch.
        mins = 1
        maxs = gop
        with open(Envariable.get_fd(), "r") as f:
            for line in f.readlines():
                line = doTask.urlencode_pl(line.strip()) + "\n"
                Envariable.LISTS.append(line)
                if mins >= maxs:
                    yield Envariable.LISTS
                    Envariable.LISTS = []
                    mins = 1
                else:
                    mins += 1
        if Envariable.LISTS:
            yield Envariable.LISTS

    # Execute the purge or prefetch task.
    @staticmethod
    def doRefresh(lists):
        try:
            if Envariable.get_acs_client():
                client = Envariable.get_acs_client()
            else:
                Envariable.set_acs_client()
                client = Envariable.get_acs_client()

            if Envariable.get_task_type() == 'clear':
                taskID = 'RefreshTaskId'
                request = RefreshObjectCachesRequest()
                if Envariable.get_task_otype():
                    request.set_ObjectType(Envariable.get_task_otype())
            elif Envariable.get_task_type() == 'push':
                taskID = 'PushTaskId'
                request = PushObjectCacheRequest()
                if Envariable.get_task_area():
                    request.set_Area(Envariable.get_task_area())

            taskreq = DescribeRefreshTasksRequest()
            request.set_accept_format('json')
            request.set_ObjectPath(lists)
            response = json.loads(client.do_action_with_exception(request))
            print(response)

            timeout = 0
            while True:
                count = 0
                taskreq.set_accept_format('json')
                taskreq.set_TaskId(response[taskID])
                taskresp = json.loads(client.do_action_with_exception(taskreq))
                print(f"[{response[taskID]}] is doing... ...")
                for t in taskresp['Tasks']['CDNTask']:
                    if t['Status'] != 'Complete':
                        count += 1
                if count == 0:
                    logging.info(f"[{response[taskID]}] is finish")
                    break
                elif timeout > 5:
                    logging.info(f"[{response[taskID]}] timeout")
                    break
                else:
                    timeout += 1
                    time.sleep(5)
                    continue
        except Exception as e:
            logging.info(f"\n[error]: {e}")
            sys.exit(1)


class Refresh(object):
    def main(self, argv):
        if len(argv) < 1:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        try:
            opts, args = getopt.getopt(argv, "hi:k:n:r:t:a:o:")
        except getopt.GetoptError as e:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")

        for opt, arg in opts:
            if opt == '-h':
                self.help()
                sys.exit()
            elif opt == '-i':
                Envariable.set_ak(arg)
            elif opt == '-k':
                Envariable.set_sk(arg)
            elif opt == '-r':
                Envariable.set_fd(arg)
            elif opt == '-t':
                Envariable.set_task_type(arg)
            elif opt == '-a':
                Envariable.set_task_area(arg)
            elif opt == '-o':
                Envariable.set_task_otype(arg)
            else:
                sys.exit(f"\n[usage]: {sys.argv[0]} -h ")

        try:
            if not (Envariable.get_ak() and Envariable.get_sk() and Envariable.get_fd() and Envariable.get_task_type()):
                sys.exit("\n[error]: The '-i', '-k', '-r', and '-t' parameters are required\n")
            if Envariable.get_task_type() not in {"push", "clear"}:
                sys.exit("\n[error]: taskType Error, '-t' option in 'push' or 'clear'\n")
            if Envariable.get_task_area() and Envariable.get_task_otype():
                sys.exit("\n[error]: -a and -o cannot exist at same time\n")
            if Envariable.get_task_area():
                if Envariable.get_task_area() not in {"domestic", "overseas"}:
                    sys.exit("\n[error]: Area value Error, '-a' option in 'domestic' or 'overseas'\n")
            if Envariable.get_task_otype():
                if Envariable.get_task_otype() not in {"File", "Directory"}:
                    sys.exit("\n[error]: ObjectType value Error, '-o' option in 'File' or 'Directory'\n")
                if Envariable.get_task_type() == 'push':
                    sys.exit("\n[error]: '-o' can be used only with '-t clear'; use '-a' with '-t push'\n")
        except Exception as e:
            logging.info(f"\n[error]: Parameter {e} error\n")
            sys.exit(1)

        handler = BaseCheck()
        if handler.urlFormat() and handler.printQuota():
            for g in doTask.doProd():
                doTask.doRefresh(''.join(g))
                time.sleep(1)

    def help(self):
        print("\nscript options explain: \
                    \n\t -i <AccessKey>                  The AccessKey ID that is used to log on to Alibaba Cloud. You can view your AccessKey pair in the Alibaba Cloud Management Console. \
                    \n\t -k <AccessKeySecret>            The AccessKey secret that is used to log on to Alibaba Cloud. You can view your AccessKey secret in the Alibaba Cloud Management Console. \
                    \n\t -r <filename>                   The file path and file name. After the script is executed, the script reads the URLs in the file. Each line contains only one URL. Encode URLs that contain special characters. The encoded URLs must start with http or https. \
                    \n\t -t <taskType>                   The type of the task. Set the value to clear to create a purge task. Set the value to push to create a prefetch task. \
                    \n\t -a [String,<domestic|overseas>] Optional. The regions in which the content will be prefetched. The default value is overseas. \
                    \n\t    domestic                     Chinese mainland only. \
                    \n\t    overseas                     Global (excluding the Chinese mainland). \
                    \n\t -o [String,<File|Directory>]    Optional. The type of the resource to be purged. \
                    \n\t    File                         File (default value). \
                    \n\t    Directory                    Directory.")


if __name__ == '__main__':
    fun = Refresh()
    fun.main(sys.argv[1:])

Code execution process

  1. Divide the file into batches by the number specified by gop (20 in the sample script).

  2. Process URLs of each batch sequentially.

  3. Proceed to the next batch after the current batch is completed.

Note

You can change the size of each batch by configuring the gop variable.
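
The wait-then-continue behavior can be illustrated in isolation. The sketch below mirrors the idea of the polling loop in the sample script without calling the CDN API: fetch_statuses is a hypothetical callable that stands in for the task status query, and max_retries and interval are illustrative parameters.

```python
import time


def wait_until_complete(fetch_statuses, max_retries=5, interval=5, sleep=time.sleep):
    """Poll until every task reports 'Complete' or the retries are used up.

    Returns True if all tasks completed and False if polling gave up.
    """
    retries = 0
    while True:
        statuses = fetch_statuses()
        if all(status == "Complete" for status in statuses):
            return True
        if retries > max_retries:
            return False
        retries += 1
        sleep(interval)


# Simulated task: reports 'Refreshing' twice, then 'Complete'.
responses = iter([["Refreshing"], ["Refreshing"], ["Complete"]])
done = wait_until_complete(lambda: next(responses), sleep=lambda seconds: None)
print(done)
```

Passing the sleep function as a parameter keeps the sketch testable without real delays. As in the sample script, giving up after the retries are exhausted does not mean that the task failed; it only means that the task was not yet complete when polling stopped.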

View the help information

After you create the script, run the python $script -h command in a command-line interface (CLI), such as Command Prompt, PowerShell, or Terminal, to display the command-line help information of the script.

Note

$script is a placeholder for the file name of your Python script. For example, if the file name of your script is Refresh.py, run the python Refresh.py -h command.

Run the following command in a CLI, such as Command Prompt, PowerShell, or Terminal. The script displays help information about the usage and parameters of the script.

python Refresh.py -h

After you run the command, the following content is returned:

script options explain:
              -i <AccessKey>                  //The AccessKey ID that is used to log on to Alibaba Cloud. You can view your AccessKey pair in the Alibaba Cloud Management Console.
              -k <AccessKeySecret>            //The AccessKey secret that is used to log on to Alibaba Cloud. You can view your AccessKey secret in the Alibaba Cloud Management Console.
              -r <filename>                   //The file path and file name. After the script is executed, the script reads the URLs in the file. Each line contains only one URL. Encode URLs that contain special characters. The encoded URLs must start with http or https.
              -t <taskType>                   //The type of the task. Set the value to clear to create a purge task. Set the value to push to create a prefetch task.
              -a [String,<domestic|overseas>] //Optional. The regions in which the content will be prefetched. The default value is overseas.
                   domestic                   //Chinese mainland only.
                   overseas                   //Global (excluding the Chinese mainland).
              -o [String,<File|Directory>]    //Optional. The type of the resource to be purged.
                   File                       //File (default value).
                   Directory                  //Directory.

Step 4: Run the script

Run the following command in a CLI, such as Command Prompt, PowerShell, or Terminal:

python Refresh.py -i <YourAccessKey> -k <YourAccessKeySecret> -r <PathToUrlFile> -t <TaskType>

Note

<YourAccessKey>: the AccessKey ID of the RAM user that you prepared in the Before you begin section.

<YourAccessKeySecret>: the AccessKey secret of the same RAM user.

<PathToUrlFile>: the path to the file that contains the list of URLs. Example: urllist.txt.

<TaskType>: the task type. Valid values: clear (purge) and push (prefetch).

Sample commands

  • Assume that the AccessKey ID is yourAccessKey, the AccessKey secret is yourAccessKeySecret, the URL list file is urllist.txt, the URL list file and the Refresh.py script are in the same directory, and the task type is clear (purge). Run the following command in a CLI, such as Command Prompt, PowerShell, or Terminal:

    python Refresh.py -i yourAccessKey -k yourAccessKeySecret -r urllist.txt -t clear

  • If the URL list file is in a different directory, such as D:\example\filename\urllist.txt, run the following command in a CLI, such as Command Prompt, PowerShell, or Terminal:

    python Refresh.py -i yourAccessKey -k yourAccessKeySecret -r D:\example\filename\urllist.txt -t clear

Sample output:

python Refresh.py -i yourAccessKey -k yourAccessKeySecret -r urllist.txt -t clear
{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D770', 'RefreshTaskId': '18392588710'}
[18392588710] is doing... ...
{'RequestId': '5BEAD371-9D82-5DA5-BE60-58EC2C915E82', 'RefreshTaskId': '18392588804'}
[18392588804] is doing... ...
{'RequestId': 'BD0B3D22-66CF-5B1D-A995-D912A5EA8E2F', 'RefreshTaskId': '18392588804'}
[18392588804] is doing... ...
[18392588804] is doing... ...
[18392588804] is doing... ...
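
The command line used in Step 4 can also be assembled programmatically, for example when the script is started from another Python program. The following sketch is an assumption-based illustration: build_command is a hypothetical helper, and its checks follow the option rules described in the help information (-a applies to prefetch tasks, -o applies to purge tasks, and the two cannot be combined).

```python
import sys


def build_command(script, access_key_id, access_key_secret, url_file, task_type,
                  area=None, object_type=None):
    """Build the argument list for running the sample script."""
    if task_type not in {"clear", "push"}:
        raise ValueError("task_type must be 'clear' or 'push'")
    if area is not None and object_type is not None:
        raise ValueError("area (-a) and object_type (-o) cannot be combined")
    if area is not None and area not in {"domestic", "overseas"}:
        raise ValueError("area must be 'domestic' or 'overseas'")
    if object_type is not None:
        if object_type not in {"File", "Directory"}:
            raise ValueError("object_type must be 'File' or 'Directory'")
        if task_type != "clear":
            raise ValueError("object_type (-o) applies only to purge tasks")
    command = [sys.executable, script,
               "-i", access_key_id, "-k", access_key_secret,
               "-r", url_file, "-t", task_type]
    if area is not None:
        command += ["-a", area]
    if object_type is not None:
        command += ["-o", object_type]
    return command


cmd = build_command("Refresh.py", "yourAccessKey", "yourAccessKeySecret",
                    "urllist.txt", "clear", object_type="Directory")
print(cmd[1:])
```

The returned list can be passed to subprocess.run(cmd, check=True). Building the command as a list avoids shell quoting problems with paths such as D:\example\filename\urllist.txt.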