CDN: Use a script to refresh and prefetch content

Last Updated: Aug 26, 2025

Alibaba Cloud CDN provides a Python script that runs refresh or prefetch tasks in batches. You can use the script to quickly refresh and prefetch files or directories without submitting each batch manually. This topic describes how to use the script, using the Windows operating system as an example.

Features

After you specify a file that contains the URLs to refresh or prefetch, the script splits the URLs into batches of a configurable size. After the script submits a batch, it automatically checks the batch status and starts the next batch only after the current batch is complete. The logic is as follows:

  1. Process in batches: Assume that your URL list contains 100 URLs and the maximum number of URLs per batch is 10. The script splits the list into 10 batches of 10 URLs each. The number of batches changes with the batch size. For example, if you set the batch size to 20, the script splits the 100 URLs into 5 batches of 20 URLs each.

  2. Run tasks in batches: When the script starts, it sequentially submits refresh or prefetch requests for each batch. The tasks in each batch run concurrently.

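  3. Wait for a batch to complete before starting the next batch: After the refresh or prefetch tasks in one batch are complete, the script proceeds to the next batch. If a batch does not complete within the wait limit defined in the script, the script logs a timeout and moves on to the next batch. This process is automatic and requires no manual intervention.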
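
The logic above can be sketched in a few lines of Python. This is an illustration only, not the code of the script in this topic: `submit` and `is_complete` are hypothetical callables that stand in for the CDN API calls, and the in-memory fake backend exists only so that the sketch runs without credentials.

```python
import time
from typing import Callable, Iterator, List


def split_batches(urls: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size URLs."""
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


def run_in_batches(urls: List[str], batch_size: int,
                   submit: Callable, is_complete: Callable,
                   poll_interval: float = 0.0) -> List[int]:
    """Submit one batch, wait until it completes, then move to the next."""
    finished = []
    for batch in split_batches(urls, batch_size):
        task_id = submit(batch)
        while not is_complete(task_id):
            time.sleep(poll_interval)
        finished.append(task_id)
    return finished


# Hypothetical stand-ins for the real API:
# every fake task reports completion on its second status check.
_checks = {}


def fake_submit(batch):
    task_id = len(_checks) + 1
    _checks[task_id] = 0
    return task_id


def fake_is_complete(task_id):
    _checks[task_id] += 1
    return _checks[task_id] >= 2


urls = [f"http://example.com/file{i}.jpg" for i in range(1, 101)]
print(len(list(split_batches(urls, 10))))  # 10 batches of 10 URLs
print(len(list(split_batches(urls, 20))))  # 5 batches of 20 URLs
print(run_in_batches(urls, 20, fake_submit, fake_is_complete))
```

The sketch waits indefinitely for each batch to keep the control flow visible. A real run should cap the wait, as the sample script in Step 3 does.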

Scenarios

You can use scripts in the following scenarios:

  • Refresh and prefetch tasks are submitted manually due to a lack of developer resources, which results in high operations and maintenance (O&M) costs.

  • The number of URLs to refresh or prefetch is large, and manually submitting them in batches is inefficient.

  • You must manually or programmatically check whether refresh and prefetch tasks are completed, which is time-consuming and resource-intensive.

Limits

Make sure that your operating system has Python 3.x installed. You can run the python --version or python3 --version command to check your Python version.

Prerequisites

  1. Because an Alibaba Cloud account has full permissions on your resources, leaked AccessKey pairs pose a high security risk. We recommend that you create and use an AccessKey pair for a Resource Access Management (RAM) user instead. For more information, see Create an AccessKey pair.

  2. Grant the RAM user the permissions required to manage CDN resources. In this example, the AliyunCDNFullAccess system policy is used.

    1. Use a system policy.

      • AliyunCDNFullAccess: Grants full access to CDN resources.

    2. Use a custom policy.

      For more information about how to create custom policies, see Create custom policies.
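
As a starting point for a custom policy, the following sketch builds a policy document that allows only the four API operations that the script in this topic calls. The action names are an assumption: they follow the `cdn:<OperationName>` pattern and should be checked against the CDN authorization documentation before use. `Resource` is set to `*` for simplicity.

```python
import json

# Operations that the sample script calls (taken from its SDK method names).
# The "cdn:" action prefix is assumed here, not confirmed by this topic.
OPERATIONS = [
    "RefreshObjectCaches",
    "PushObjectCache",
    "DescribeRefreshTasks",
    "DescribeRefreshQuota",
]

policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [f"cdn:{name}" for name in OPERATIONS],
            "Resource": "*",
        }
    ],
}

# Print the JSON document so that it can be pasted into the policy editor.
print(json.dumps(policy, indent=2))
```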

Step 1: Install dependencies

Run the following command to install the Alibaba Cloud CDN software development kit (SDK) for Python. This example uses the SDK package for API version 2018-05-10 (alibabacloud_cdn20180510).

pip install alibabacloud_cdn20180510
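
To confirm that everything the script imports is available, you can run a quick check such as the following sketch. The package list is taken from the import statements of the sample script in Step 3. `missing_packages` is a helper name introduced here for illustration.

```python
import importlib.util

# Top-level packages imported by the sample script in this topic.
REQUIRED = [
    "alibabacloud_cdn20180510",
    "alibabacloud_credentials",
    "alibabacloud_tea_openapi",
    "alibabacloud_tea_util",
]


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```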

Step 2: Prepare a URL file

Create a file that contains a list of URLs to refresh or prefetch, such as urllist.txt. Enter one URL per line. Make sure that each URL starts with http:// or https:// and is in a valid URL format. The following code provides an example:

http://example.com/file1.jpg
http://example.com/file2.jpg
http://example.com/file3.jpg
...
http://example.com/fileN.jpg
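
Before you run the script, you may want to check the file yourself. The following sketch reports every line that does not look like an http or https URL. `invalid_urls` is a hypothetical helper, and the check is slightly stricter than a simple prefix match because it also requires a host name.

```python
from urllib.parse import urlsplit


def invalid_urls(lines):
    """Return (line_number, text) pairs for lines that are not http(s) URLs."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        url = raw.strip()
        parts = urlsplit(url)
        # A usable entry needs an http/https scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append((number, url))
    return bad


sample = [
    "http://example.com/file1.jpg\n",
    "https://example.com/dir/\n",
    "example.com/file3.jpg\n",
    "ftp://example.com/file4.jpg\n",
]
print(invalid_urls(sample))
# [(3, 'example.com/file3.jpg'), (4, 'ftp://example.com/file4.jpg')]
```

To check a real file, pass the open file object, for example `invalid_urls(open("urllist.txt"))`. Blank lines are also reported as invalid.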

Step 3: Create a script

Save the following code as a script. This topic uses Refresh.py as the file name, but you can choose any name.

Sample script

#!/usr/bin/env python3
# coding=utf-8
# __author__ = 'aliyun.cdn'
# __date__ = '2025-08-15'

# SDK installation command: pip install alibabacloud_cdn20180510

'''Check Package'''
# Import the required libraries.
import re, sys, getopt, time, logging, os

try:
    from alibabacloud_cdn20180510.client import Client as Cdn20180510Client
    from alibabacloud_credentials.models import Config as CreConfig
    from alibabacloud_credentials.client import Client as CredentialClient
    from alibabacloud_tea_openapi.models import Config
    from alibabacloud_cdn20180510 import models as cdn_20180510_models
    from alibabacloud_tea_util import models as util_models

# Catch import exceptions.
except ImportError as e:
    sys.exit(f"[error] Please pip install alibabacloud_cdn20180510. Details: {e}")

# Initialize logging.
logging.basicConfig(level=logging.DEBUG, filename='./RefreshAndPredload.log')

# Define a global variable class to store information such as AccessKey ID, AccessKey secret, and file directory.
class Envariable(object):
    LISTS = []
    # For Endpoints, see https://api.aliyun.com/product/Cdn
    ENDPOINT = 'cdn.aliyuncs.com'
    AK = None
    SK = None
    FD = None
    CLI = None
    TASK_TYPE = None
    TASK_AREA = None
    TASK_OTYPE = None

    # Set the AccessKey ID.
    @staticmethod
    def set_ak():
        Envariable.AK = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')

    # Get the AccessKey ID.
    @staticmethod
    def get_ak():
        return Envariable.AK

    # Set the AccessKey secret.
    @staticmethod
    def set_sk():
        Envariable.SK = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')

    # Get the AccessKey secret.
    @staticmethod
    def get_sk():
        return Envariable.SK

    # Set the file directory.
    @staticmethod
    def set_fd(fd):
        Envariable.FD = fd

    # Get the file directory.
    @staticmethod
    def get_fd():
        return Envariable.FD

    # Set the task type.
    @staticmethod
    def set_task_type(task_type):
        Envariable.TASK_TYPE = task_type

    # Get the task type.
    @staticmethod
    def get_task_type():
        return Envariable.TASK_TYPE
    
    # Set the task area.
    @staticmethod
    def set_task_area(task_area):
        Envariable.TASK_AREA = task_area

    # Get the task area.
    @staticmethod
    def get_task_area():
        return Envariable.TASK_AREA

    # Set the task object type.
    @staticmethod
    def set_task_otype(task_otype):
        Envariable.TASK_OTYPE = task_otype

    # Get the task object type.
    @staticmethod
    def get_task_otype():
        return Envariable.TASK_OTYPE

    # Create a new client.
    @staticmethod
    def set_acs_client():
        try:
            # Use the AccessKey pair to initialize the Credentials client.
            credentialsConfig = CreConfig(
                # Credential type.
                type='access_key',
                # Set to the AccessKey ID.
                access_key_id=Envariable.get_ak(),
                # Set to the AccessKey secret.
                access_key_secret=Envariable.get_sk(),
            )
            credentialClient = CredentialClient(credentialsConfig)

            cdnConfig = Config(credential=credentialClient)
            # Configure the service endpoint.
            cdnConfig.endpoint = Envariable.ENDPOINT
            # Initialize the CDN client.
            Envariable.CLI = Cdn20180510Client(cdnConfig)
        except Exception as e:
            logging.error(f"Failed to create client: {e}")
            raise

    # Get the client.
    @staticmethod
    def get_acs_client():
        return Envariable.CLI


# Module-level initializer function.
def initialize_credentials_and_client():
    """Initializes the AccessKey pair and client when the module is loaded."""
    try:
        # Initialize the AccessKey pair from environment variables.
        Envariable.set_ak()
        Envariable.set_sk()
        
        # Check whether the AccessKey pair is obtained.
        if not Envariable.get_ak() or not Envariable.get_sk():
            logging.warning("AK or SK not found in environment variables")
            return False
            
        # Initialize the client.
        Envariable.set_acs_client()
        logging.info("Credentials and client initialized successfully")
        return True
    except Exception as e:
        logging.error(f"Failed to initialize credentials and client: {e}")
        return False


# Run initialization when the module is loaded.
_initialization_success = initialize_credentials_and_client()





class BaseCheck(object):
    def __init__(self):
        self.invalidurl = ''
        self.lines = 0
        self.urllist = Envariable.get_fd()

    # Check the quota.
    def printQuota(self):
        try:
            client = Envariable.get_acs_client()
            if not client:
                raise Exception("CDN client not initialized")
            
            # Use the SDK to make the call.
            request = cdn_20180510_models.DescribeRefreshQuotaRequest()
            runtime = util_models.RuntimeOptions()
            response = client.describe_refresh_quota_with_options(request, runtime)
            quotaResp = response.body.to_map()
        except Exception as e:
            logging.error(f"\n[error]: initial Cdn20180510Client failed: {e}\n")
            sys.exit(1)

        if Envariable.TASK_TYPE:
            if Envariable.TASK_TYPE == 'push':
                if self.lines > int(quotaResp['PreloadRemain']):
                    sys.exit("\n[error]:PreloadRemain is not enough {0}".format(quotaResp['PreloadRemain']))
                return True
            if Envariable.TASK_TYPE == 'clear':
                if Envariable.get_task_otype() == 'File' and self.lines > int(quotaResp['UrlRemain']):
                    sys.exit("\n[error]:UrlRemain is not enough {0}".format(quotaResp['UrlRemain']))
                elif Envariable.get_task_otype() == 'Directory' and self.lines > int(quotaResp['DirRemain']):
                    sys.exit("\n[error]:DirRemain is not enough {0}".format(quotaResp['DirRemain']))
                else:
                    return True

    # Verify the URL format.
    def urlFormat(self):
        try:
            with open(self.urllist, "r") as f:
                for line in f.readlines():
                    self.lines += 1
                    if not re.match(r'^https?://', line):
                        self.invalidurl = line + '\n' + self.invalidurl
                if self.invalidurl != '':
                    sys.exit("\n[error]: URL format is illegal \n{0}".format(self.invalidurl))
                return True
        except FileNotFoundError:
            sys.exit(f"\n[error]: File not found: {self.urllist}\n")
        except Exception as e:
            sys.exit(f"\n[error]: Failed to read file {self.urllist}: {e}\n")

# Batch processing class that splits the URL list into multiple batches of a specified size.
class doTask(object):
    @staticmethod
    def urlencode_pl(inputs_str):
        len_str = len(inputs_str)
        if inputs_str == "" or len_str <= 0:
            return ""
        result_end = ""
        for chs in inputs_str:
            if chs.isalnum() or chs in {":", "/", ".", "-", "_", "*"}:
                result_end += chs
            elif chs == ' ':
                result_end += '+'
            else:
                result_end += f'%{ord(chs):02X}'
        return result_end

    # Process URLs in batches.
    @staticmethod
    def doProd():
        gop = 20  # Defines the maximum number of URLs per batch.
        mins = 1
        maxs = gop
        current_batch = []  # Use a local variable instead of a global variable.
        
        try:
            with open(Envariable.get_fd(), "r") as f:
                for line in f.readlines():
                    line = doTask.urlencode_pl(line.strip()) + "\n"
                    current_batch.append(line)
                    if mins >= maxs:
                        yield current_batch
                        current_batch = []
                        mins = 1
                    else:
                        mins += 1
            if current_batch:
                yield current_batch
        except FileNotFoundError:
            sys.exit(f"\n[error]: File not found: {Envariable.get_fd()}\n")
        except Exception as e:
            sys.exit(f"\n[error]: Failed to read file {Envariable.get_fd()}: {e}\n")

    # Run the refresh or prefetch task.
    @staticmethod
    def doRefresh(lists):
        try:
            client = Envariable.get_acs_client()
            if not client:
                raise Exception("CDN client not initialized")

            runtime = util_models.RuntimeOptions()
            taskID = None
            response_data = None

            if Envariable.get_task_type() == 'clear':
                taskID = 'RefreshTaskId'
                request = cdn_20180510_models.RefreshObjectCachesRequest()
                if Envariable.get_task_otype():
                    request.object_type = Envariable.get_task_otype()
                request.object_path = lists
                response = client.refresh_object_caches_with_options(request, runtime)
                response_data = response.body.to_map()
            elif Envariable.get_task_type() == 'push':
                taskID = 'PushTaskId'
                request = cdn_20180510_models.PushObjectCacheRequest()
                if Envariable.get_task_area():
                    request.area = Envariable.get_task_area()
                request.object_path = lists
                response = client.push_object_cache_with_options(request, runtime)
                response_data = response.body.to_map()

            if response_data and taskID:
                print(response_data)

                timeout = 0
                while True:
                    count = 0
                    # Use the SDK to query the task status.
                    taskreq = cdn_20180510_models.DescribeRefreshTasksRequest()
                    taskreq.task_id = response_data[taskID]
                    taskresp = client.describe_refresh_tasks_with_options(taskreq, runtime)
                    taskresp_data = taskresp.body.to_map()
                    print(f"[{response_data[taskID]}] is doing... ...")
                    
                    for t in taskresp_data['Tasks']['CDNTask']:
                        if t['Status'] != 'Complete':
                            count += 1
                    if count == 0:
                        logging.info(f"[{response_data[taskID]}] is finish")
                        break
                    elif timeout >= 5:  # Wait for a maximum of 50 seconds (5 × 10 seconds).
                        logging.info(f"[{response_data[taskID]}] timeout after 50 seconds")
                        break
                    else:
                        timeout += 1
                        time.sleep(10)  # Check the status every 10 seconds.
                        continue
        except Exception as e:
            logging.error(f"\n[error]: {e}")
            sys.exit(1)


class Refresh(object):
    def main(self, argv):
        if len(argv) < 1:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        try:
            opts, args = getopt.getopt(argv, "hr:t:a:o:")
        except getopt.GetoptError as e:
            sys.exit(f"\n[usage]: {sys.argv[0]} -h ")

        for opt, arg in opts:
            if opt == '-h':
                self.help()
                sys.exit()
            elif opt == '-r':
                Envariable.set_fd(arg)
            elif opt == '-t':
                Envariable.set_task_type(arg)
            elif opt == '-a':
                Envariable.set_task_area(arg)
            elif opt == '-o':
                Envariable.set_task_otype(arg)
            else:
                sys.exit(f"\n[usage]: {sys.argv[0]} -h ")
        
        # Check the initialization status only when it is not a help command.
        if not _initialization_success:
            sys.exit("\n[error]: Failed to initialize credentials and client. Please check environment variables.\n")

        try:
            if not (Envariable.get_ak() and Envariable.get_sk() and Envariable.get_fd() and Envariable.get_task_type()):
                sys.exit("\n[error]: Must set environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET, and parameters '-r', '-t'\n")
            if Envariable.get_task_type() not in {"push", "clear"}:
                sys.exit("\n[error]: taskType Error, '-t' option in 'push' or 'clear'\n")
            if Envariable.get_task_area() and Envariable.get_task_otype():
                sys.exit("\n[error]: -a and -o cannot exist at same time\n")
            if Envariable.get_task_area():
                if Envariable.get_task_area() not in {"domestic", "overseas"}:
                    sys.exit("\n[error]: Area value Error, '-a' option in 'domestic' or 'overseas'\n")
            if Envariable.get_task_otype():
                if Envariable.get_task_otype() not in {"File", "Directory"}:
                    sys.exit("\n[error]: ObjectType value Error, '-o' option in 'File' or 'Directory'\n")
                if Envariable.get_task_type() == 'push':
                    sys.exit("\n[error]: '-o' can be used only with '-t clear'; use '-a' with '-t push'\n")
        except Exception as e:
            logging.error(f"\n[error]: Parameter {e} error\n")
            sys.exit(1)

        handler = BaseCheck()
        if handler.urlFormat() and handler.printQuota():
            for g in doTask.doProd():
                doTask.doRefresh(''.join(g))
                time.sleep(1)

    def help(self):
        print("\nscript options explain: \
                    \n\t -r <filename>                   The file path and file name. After the script runs, it reads the URLs from the file. Each line must contain one URL. URLs with special characters must be URL-encoded. Each URL must start with http or https. \
                    \n\t -t <taskType>                   The task type. `clear`: refresh. `push`: prefetch. \
                    \n\t -a [String,<domestic|overseas>] Optional. The prefetch scope. If you do not set this parameter, resources are prefetched globally.\
                    \n\t    domestic                     The Chinese mainland only. \
                    \n\t    overseas                     Global (excluding the Chinese mainland). \
                    \n\t -o [String,<File|Directory>]    Optional. The type of content to refresh. \
                    \n\t    File                         File (default). \
                    \n\t    Directory                    Directory.")


if __name__ == '__main__':
    fun = Refresh()
    fun.main(sys.argv[1:])

Code execution flow

  1. Splits the URL file into batches based on the value of the gop variable (20 in the sample script).

  2. Processes the URLs of each batch sequentially.

  3. Proceeds to the next batch after the current batch is complete.

Note

You can change the size of each batch by changing the value of the gop variable.
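
To see how the value of gop affects a run, the following sketch computes the number of batches for a given URL count. `batch_count` is a helper introduced here for illustration, and the figure of 250 URLs is an arbitrary example.

```python
import math


def batch_count(total_urls: int, gop: int) -> int:
    """Number of batches needed when each batch holds at most gop URLs."""
    return math.ceil(total_urls / gop)


for gop in (10, 20, 100):
    print(f"gop={gop}: {batch_count(250, gop)} batches")
# gop=10: 25 batches
# gop=20: 13 batches
# gop=100: 3 batches
```

Because the script waits for each batch before it submits the next one, a smaller gop value means more batches and more waiting rounds, and a larger value means fewer, bigger requests.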

View help information

After you create the script, you can run python $script -h in a command-line interface (CLI), such as Command Prompt, PowerShell, or Terminal, to view the help information for the script.

Note

$script is a placeholder for the file name of the Python script. For example, if your script is named Refresh.py, you can run python Refresh.py -h.

For example, run the following command to display the help information for Refresh.py and its parameters:

python Refresh.py -h

The following output is returned:

script options explain:
              -r <filename>                    //The file path and file name. After the script runs, it reads the URLs from the file. Each line must contain one URL. URLs with special characters must be URL-encoded. Each URL must start with http or https.
              -t <taskType>                    //The task type. `clear`: refresh. `push`: prefetch.
              -a [String,<domestic|overseas>]  //Optional. The prefetch scope. If you do not set this parameter, resources are prefetched globally.
                   domestic                    //The Chinese mainland only.
                   overseas                    //Global (excluding the Chinese mainland).
              -o [String,<File|Directory>]     //Optional. The type of content to refresh.
                   File                        //File (default).
                   Directory                   //Directory.
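
The script parses these options with the getopt module from the Python standard library. The following sketch uses the same option string as the sample script to show how a command line is broken into option and value pairs. `parse` is a helper introduced here for illustration.

```python
import getopt

# Same option string as the sample script:
# -h takes no value; -r, -t, -a, and -o each take one value.
OPTSTRING = "hr:t:a:o:"


def parse(argv):
    """Return the parsed options as a dict, such as {'-r': 'urllist.txt'}."""
    opts, _args = getopt.getopt(argv, OPTSTRING)
    return dict(opts)


print(parse(["-r", "urllist.txt", "-t", "clear", "-o", "Directory"]))
# {'-r': 'urllist.txt', '-t': 'clear', '-o': 'Directory'}
print(parse(["-r", "urllist.txt", "-t", "push", "-a", "overseas"]))
# {'-r': 'urllist.txt', '-t': 'push', '-a': 'overseas'}
```

An option that requires a value but is given none, such as a trailing `-r`, raises `getopt.GetoptError`, which the script reports as a usage error.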

Step 4: Set the Alibaba Cloud AccessKey in an environment variable

When you use an AccessKey to call an API, we recommend that you do not include the AccessKey in your code. The script provided in Step 3 reads the AccessKey ID and AccessKey secret from environment variables. For more information about how to set your Alibaba Cloud AccessKey before you run the code, see Configure environment variables in Linux, macOS, and Windows.

Important

In Linux and macOS, an Alibaba Cloud AccessKey configured with the export command is valid only for the current session. The AccessKey becomes invalid after the session ends.

To make the AccessKey permanent, add the export command to the startup configuration file of your operating system.
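
To confirm that the variables are visible to Python in the current session, you can run a check such as the following sketch. The variable names are the ones that the script in Step 3 reads. `missing_env_vars` is a helper introduced here for illustration, and the values themselves are deliberately never printed.

```python
import os

# Environment variables that the sample script reads.
REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("Both AccessKey variables are set.")
```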

Step 5: Run the script

Run the following command in a CLI, such as Command Prompt, PowerShell, or Terminal:

python Refresh.py -r <PathToUrlFile> -t <TaskType>

Note

<PathToUrlFile>: The path to the file that contains the list of URLs, such as urllist.txt.

<TaskType>: The type of the task. Valid values: clear (refresh) and push (prefetch).

Resource refresh command

  • If the URL file is named urllist.txt and is in the same directory as the Refresh.py script, run the following command to refresh the URLs:

    python Refresh.py -r urllist.txt -t clear

  • If the file is in a different directory, such as D:\example\filename\urllist.txt, specify the full path:

    python Refresh.py -r D:\example\filename\urllist.txt -t clear

The following is a sample output:

python Refresh.py -r urllist.txt -t clear
{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D770', 'RefreshTaskId': '18392588710'}
[18392588710] is doing... ...

If the error message Failed to initialize credentials and client. Please check environment variables. is returned, set the Alibaba Cloud AccessKey in an environment variable as described in Step 4, and then run the command in the same terminal window.

Resource prefetch command

  • If the URL file is named urllist.txt and is in the same directory as the Refresh.py script, run the following command to prefetch the URLs:

    python Refresh.py -r urllist.txt -t push

  • If the file is in a different directory, such as D:\example\filename\urllist.txt, specify the full path:

    python Refresh.py -r D:\example\filename\urllist.txt -t push

The following is a sample output:

python Refresh.py -r urllist.txt -t push
{'RequestId': 'C1686DCA-F3B5-5575-ADD1-05F96617D771', 'PushTaskId': '18392588711'}
[18392588711] is doing... ...

If the error message Failed to initialize credentials and client. Please check environment variables. is returned, set the Alibaba Cloud AccessKey in an environment variable as described in Step 4, and then run the command in the same terminal window.
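
The console shows only the submission response and the progress lines. Whether a task finished or timed out is written to the log file that the script configures with logging.basicConfig. The following sketch groups task IDs by outcome based on the two messages that the sample script logs. The sample lines are made up for illustration and assume the default logging format, and `summarize` is a helper introduced here.

```python
import re

# Matches the messages that the sample script logs for each task, such as
# "[18392588710] is finish" or "[18392588710] timeout after 50 seconds".
PATTERN = re.compile(r"\[([^\]]+)\] (is finish|timeout)")


def summarize(log_lines):
    """Group task IDs by outcome based on the script's log messages."""
    result = {"finished": [], "timed_out": []}
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            key = "finished" if match.group(2) == "is finish" else "timed_out"
            result[key].append(match.group(1))
    return result


# Illustrative lines only; a real log also contains other entries.
sample_log = [
    "INFO:root:[18392588710] is finish",
    "INFO:root:[18392588711] timeout after 50 seconds",
]
print(summarize(sample_log))
# {'finished': ['18392588710'], 'timed_out': ['18392588711']}
```

To summarize a real run, pass the open log file to `summarize` instead of the sample list. A task that timed out in the script may still complete later on the CDN side, so treat the timed_out list as tasks to check again.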