Function Compute (2.0): Dynamically add watermarks to PDF files by using a Function Compute node in DataWorks

Last Updated: Nov 17, 2023

This topic describes how to use a Function Compute node in DataWorks to call the Function Compute service and periodically add watermarks to incremental PDF files in Object Storage Service (OSS).

Background information

You can perform custom configurations for various features in Function Compute and create a Function Compute node in DataWorks to call the Function Compute service.

Prerequisites

  • DataWorks is activated. For more information, see Activate DataWorks.

  • Function Compute is activated. For more information, see Quickly create a function.

  • Object Storage Service (OSS) is activated. For more information, see Activate OSS. A bucket is created, and the PDF files to which you want to add watermarks are uploaded to the bucket. In this example, the 2023-08-15 directory is created in the bucket-testxxxx bucket and the example.pdf file is uploaded to the directory.

Limits

  • Limits on features

    DataWorks allows you to invoke only event functions. If you want to periodically schedule an event processing function in DataWorks, you must create an event function rather than an HTTP function to process event requests in Function Compute.

  • Limits on regions

    You can use the features provided by Function Compute only in the workspaces that are created in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, UK (London), US (Silicon Valley), US (Virginia), Germany (Frankfurt), Australia (Sydney), and India (Mumbai).
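To make the event function requirement more concrete, the following minimal Python sketch shows what the entry point of an event function might look like. It assumes the `handler(event, context)` signature of the Function Compute Python runtime. The `parse_watermark_event` helper and the `DEFAULTS` table are hypothetical names introduced for illustration, with default values copied from the parameter comments in this topic. Treating `pdf_file` as required is also an assumption. This is not the source code of the start-pdf-watermark template.

```python
import json

# Hypothetical defaults, copied from the parameter comments in this topic.
DEFAULTS = {
    "pagesize": [595.275590551181, 841.8897637795275],  # A4 paper size in points
    "font": "Helvetica",
    "font_size": 30,
    "font_color": [0, 0, 0],
    "rotate": 0,
    "opacity": 0.1,
    "density": [141.73228346456693, 141.73228346456693],
}


def parse_watermark_event(event):
    """Decode the event payload and fill in the documented defaults."""
    if isinstance(event, (bytes, bytearray)):
        event = event.decode("utf-8")
    params = json.loads(event)
    # Assumption: both parameters are treated as required in this sketch.
    for required in ("pdf_file", "mark_text"):
        if required not in params:
            raise ValueError(f"missing required parameter: {required}")
    return {**DEFAULTS, **params}


def handler(event, context):
    """Event function entry point: receives the raw event and a context object."""
    params = parse_watermark_event(event)
    # A real function would read params["pdf_file"] from OSS and add the watermark here.
    return json.dumps(params)
```

An HTTP function would instead receive a request object, which is why DataWorks cannot invoke it with a plain JSON event such as the one above.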

Step 1: Create a Function Compute application

  1. Log on to the Function Compute console. In the left-side navigation pane, click Applications.

  2. On the Applications page, click Create Application.

  3. On the Create Application page, select Use a Template to Create an Application. In the search box, enter start-pdf-watermark and click the search icon. Move the pointer over the start-pdf-watermark application template that is displayed after the search and click Create Now.

    Note

    You can visit GitHub to view the source code of the start-pdf-watermark application template. The implementation logic of the application is to add a specified watermark to a PDF file in OSS and write the PDF file with the watermark back to the same OSS path.

  4. On the Create Application page, configure the parameters.


    Parameter

    Description

    Deployment Type

    Set the value to Directly Deploy.

    Application Name

    The system automatically generates a name that meets the related requirements. You can change the name based on your business requirements.

    Role Name

    AliyunFCServerlessDevsRole is selected by default. You can change the value based on your business requirements.

    • When you deploy applications in Serverless Application Center, make sure that Function Compute is granted the required permissions. For example, permissions are required to deploy specific service and function resources and to access other Alibaba Cloud services, such as Virtual Private Cloud (VPC), Apsara File Storage NAS (NAS), and Simple Log Service. You must first associate a RAM role with the application or environment and set Function Compute as the trusted service. Serverless Application Center can then call the AssumeRole operation to obtain a Security Token Service (STS) token and assume the RAM role to access Alibaba Cloud services.

    • To simplify authorization, Serverless Application Center provides the default role AliyunFCServerlessDevsRole. This role has permissions on some of the Alibaba Cloud resources that Serverless Application Center accesses. You can log on to the Resource Access Management (RAM) console to view the permissions of the AliyunFCServerlessDevsRole role.

    Region

    The region in which you want to create the application. After you select a region, only OSS buckets in the selected region are available for the OSS bucket name parameter.

    Service name

    The system automatically generates a name that meets the related requirements. You can change the name based on your business requirements.

    Function name

    The system automatically generates a name that meets the related requirements. You can change the name based on your business requirements.

    Time Zone

    The time zone to which the selected region belongs is selected by default. You can change the value based on your business requirements.

    OSS bucket name

    The name of the OSS bucket that you want to use. Only OSS buckets that reside in the region specified by the Region parameter are available.

    RAM's ARN

    AliyunFCDefaultRole is selected by default. You can change the value based on your business requirements.

    When a function is executed, Function Compute needs to access other Alibaba Cloud resources. For example, Function Compute needs to write function logs to the specified Logstore in Simple Log Service, pull images from Container Registry, or connect to virtual private clouds (VPCs). To simplify authorization, Function Compute provides the default RAM role AliyunFCDefaultRole. This role has the permissions that are required by Function Compute to access specific Alibaba Cloud resources. For more information about how to create the AliyunFCDefaultRole role and how to bind the role, see Activate Function Compute. You can log on to the RAM console to view the details of the AliyunFCDefaultRole role.

  5. Click Create and Deploy Default Environment. If Deployed is displayed next to Deployment Status on the right side of the details page that appears, the application is created and deployed.

  6. In the upper part of the page, click Application Details to go to the details page of the application.

  7. On the details page of the application, click Default Environment in the Environment Name column. The Environment Details tab appears.

  8. In the Resource Information section of the Environment Details tab, click the value of Function to go to the details page of the function.

  9. On the details page of the function, click the Test Function tab. On the Test Function tab, expand Configure test events and configure the following parameters.

    • Event name: Enter a name for the test event in the Event Name field.

    • Event content: Enter JSON-formatted code in the code editor. In the example, the following code is entered.

      Important

      If you directly copy the following code, you must delete the double forward slashes (//) and the comments that follow them. Otherwise, the code may fail the JSON format verification.

      // The following code provides an example of how to add the watermark DataWorks to a PDF file named example.pdf in the /2023-08-15/ path. The font of the watermark is Helvetica, and the font size is 20. For information about the parameters in the code, see the comments next to the parameters.
      {
          "pdf_file": "/2023-08-15/example.pdf",  // The path of the PDF file in the OSS bucket.
          "mark_text": "DataWorks",    // The watermark text. If you want to add a watermark to a PDF file, this parameter is required.
          "pagesize": [595.275590551181, 841.8897637795275], // Optional. The default value is the A4 paper size (21 cm, 29.7 cm). 1 cm is equivalent to 28.346456692913385 points.
          "font": "Helvetica",     // Optional. The font of the watermark. The default value is Helvetica. If you want to add a watermark in Chinese to the PDF file, you can set this parameter to zenhei or microhei.
          "font_size": 20,         // Optional. The font size of the watermark. The default value is 30.
          "font_color": [0, 0, 0], // The font color of the watermark, in the RGB format. The default color is black.
          "rotate": 30,            // Optional. The rotation angle of the watermark. The default value is 0.
          "opacity": 0.1,          // Optional. The transparency of the watermark. The default value is 0.1. The value 1 indicates that the watermark is not transparent.
          "density": [198.4251968503937, 283.46456692913387] // The density of the watermark, in points. The value in this example indicates an interval of 7 cm on the X-axis and an interval of 10 cm on the Y-axis between watermark texts. The default value is [141.73228346456693, 141.73228346456693], which indicates an interval of 5 cm on each axis.
      }
  10. Click Test Function. If the code runs successfully, you can view the watermarked PDF file in the specified OSS path. In this example, the example-out.pdf file is generated.


    You can view the source PDF file and the generated PDF file in OSS.
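Instead of deleting the comments from the sample event by hand, you can strip them with a small script before you paste the content into the code editor. The following Python sketch is one possible approach. The `strip_line_comments` helper is a hypothetical name, and the sketch assumes that comments always start with `//` and run to the end of the line, as in the sample event in this topic.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON string literals."""
    cleaned_lines = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        cut = len(line)
        for i, ch in enumerate(line):
            if escaped:
                escaped = False          # the character after a backslash is literal
            elif ch == "\\" and in_string:
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string and line.startswith("//", i):
                cut = i                  # comment starts here; drop the rest of the line
                break
        cleaned_lines.append(line[:cut].rstrip())
    return "\n".join(cleaned_lines)


annotated = '''
// Example event with explanatory comments.
{
    "pdf_file": "/2023-08-15/example.pdf",  // The path of the PDF file.
    "mark_text": "DataWorks",    // The watermark text.
    "font_size": 20              // Optional.
}
'''

# json.loads raises an error if any comment is left behind.
event = json.loads(strip_line_comments(annotated))
print(json.dumps(event, indent=4))
```

The scanner tracks whether it is inside a string literal, so the single slashes in the `pdf_file` path are left untouched.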

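The pagesize and density parameters in the test event are expressed in points, and the comments in the event give the conversion factor of 28.346456692913385 points per centimeter. The following Python sketch shows how such values could be derived from centimeters. The `cm_to_points` and `points_to_cm` helpers are hypothetical names used only for illustration.

```python
# 1 inch equals 72 points and 2.54 cm, which gives about 28.3465 points per cm.
POINTS_PER_CM = 72 / 2.54


def cm_to_points(cm):
    """Convert a length in centimeters to points."""
    return cm * POINTS_PER_CM


def points_to_cm(points):
    """Convert a length in points to centimeters."""
    return points / POINTS_PER_CM


# A4 paper size (21 cm x 29.7 cm) and a 7 cm x 10 cm watermark interval.
pagesize = [cm_to_points(21), cm_to_points(29.7)]
density = [cm_to_points(7), cm_to_points(10)]

print(pagesize)
print(density)
# Interval that corresponds to the default density value, in centimeters.
print(round(points_to_cm(141.73228346456693), 6))
```

The computed values agree with the ones in the test event to within floating-point rounding.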

Step 2: Create and configure a Function Compute node in the DataWorks console

  1. Log on to the DataWorks console.

  2. In the left-side navigation pane, click Workspaces.

  3. In the top navigation bar, select the region that you specify in Step 1: Create a Function Compute application.

  4. On the Workspaces page, find the desired workspace and click its name to go to the Workspace Details page. If you do not have workspaces in the selected region, you must create a workspace in the region. For more information, see Create a workspace.

  5. In the left-side navigation pane, choose Data Modeling and Development > DataStudio.

  6. In the Scheduled Workflow pane of the DataStudio page, find the desired workflow, click its name, right-click General, and then choose Create Node > Function Compute. In the Create Node dialog box, enter a name in the Name field and click Confirm to create a Function Compute node.

  7. On the configuration tab of the Function Compute node, configure the parameters.


    Parameter

    Description

    Select Service

    Select the service name that you specify in Substep 4 in Step 1. In this example, fc-pdf-test is selected. For information about how to create a service, see Quickly create a function.

    Select Version Or Alias

    Select the version or alias of the service that you want to use for subsequent function invocation. If you select Default Version, the Version parameter is displayed, and the value of the Version parameter is fixed as LATEST. In this example, Default Version is selected.

    • Service version

      Function Compute provides the service-level versioning feature, which allows you to release one or more versions for a service. A version is similar to a service snapshot that contains the information such as the service settings, and the code and settings of functions that belong to the service. A version does not contain trigger information. When you release a version, the system generates a snapshot for the service and assigns a version number that is associated with the snapshot for future use. For more information about how to release a version, see Manage versions.

    • Version alias

      Function Compute allows you to create an alias for a service version. An alias points to a specific version of a service. You can use an alias to perform version release, rollback, or canary release with ease. An alias is dependent on a service or a version. When you use an alias to access a service or function, Function Compute parses the alias into the version to which the alias points. This way, the invoker does not need to know the specific version to which the alias points. For information about how to create an alias, see Manage aliases.

    Select Function

    Select the function name that you specify in Substep 4 in Step 1. In this example, pdf_add_watermark is selected. For information about how to create a function, see Manage functions.

    Note

    DataWorks allows you to invoke only event functions. If you want to periodically schedule an event processing function in DataWorks, you must create an event function rather than an HTTP function to process event requests in Function Compute.

    Invocation Method

    In this example, Synchronous Invocation is selected. For more information about invocation methods of functions, see Function invocation.

    • Synchronous Invocation: When you synchronously invoke a function, an event directly triggers the function, and Function Compute executes the function and waits for a response. After the function is invoked, Function Compute returns the execution results of the function.

    • Asynchronous Invocation: When you asynchronously invoke a function, Function Compute immediately returns a response after the request is persisted instead of returning a response only after the request execution is complete.

      If your function has the logic that is time-consuming, resource-consuming, or error-prone, you can use this method to allow your programs to respond to traffic spikes in an efficient and reliable manner.

    Variable

    The values to pass to the variables that the function code uses when the function is invoked. In this example, the JSON content that you configure in Substep 9 in Step 1 is modified and used to add watermarks to incremental PDF files in OSS on a daily basis.

    // The following code provides an example of how to add a watermark to a PDF file named example.pdf in a path that is in the /${current_date}/ format.
    {
        "pdf_file": "/${current_date}/example.pdf",  // The path of the PDF file in the OSS bucket.
        "mark_text": "DataWorks",    // The watermark text. If you want to add a watermark to a PDF file, this parameter is required.
        "pagesize": [595.275590551181, 841.8897637795275], // Optional. The default value is the A4 paper size (21 cm, 29.7 cm). 1 cm is equivalent to 28.346456692913385 points.
        "font": "Helvetica",     // Optional. The font of the watermark. The default value is Helvetica. If you want to add a watermark in Chinese to the PDF file, you can set this parameter to zenhei or microhei.
        "font_size": 20,         // Optional. The font size of the watermark. The default value is 30.
        "font_color": [0, 0, 0], // The font color of the watermark, in the RGB format. The default color is black.
        "rotate": 30,            // Optional. The rotation angle of the watermark. The default value is 0.
        "opacity": 0.1,          // Optional. The transparency of the watermark. The default value is 0.1. The value 1 indicates that the watermark is not transparent.
        "density": [198.4251968503937, 283.46456692913387] // The density of the watermark, in points. The value in this example indicates an interval of 7 cm on the X-axis and an interval of 10 cm on the Y-axis between watermark texts. The default value is [141.73228346456693, 141.73228346456693], which indicates an interval of 5 cm on each axis.
    }
    Note
    • The value of pdf_file is in the /${current_date}/example.pdf format. ${current_date} indicates that a variable named current_date is used.

    • When DataWorks schedules the Function Compute node, DataWorks replaces ${current_date} with an actual value. You can configure the variable when you configure scheduling parameters for the Function Compute node. For example, if DataWorks runs the Function Compute node on August 15, 2023, the value of pdf_file is /2023-08-15/example.pdf. If DataWorks schedules the Function Compute node on August 16, 2023, the value of pdf_file is /2023-08-16/example.pdf.

    • The business system only needs to generate incremental PDF files in the specified OSS path every day, based on specific time-related rules, before DataWorks starts to schedule the Function Compute node. DataWorks then schedules the Function Compute node to add watermarks to the incremental PDF files every day.

    • For this example, you must upload a PDF file to a path that is in the /${current_date}/ format in OSS before DataWorks starts to schedule the Function Compute node. For example, you can upload a PDF file named example.pdf to the /2023-08-15/ path.

  8. Optional. Debug and run the Function Compute node. After the configuration is complete, click the Run icon in the top toolbar of the configuration tab of the Function Compute node. In the Runtime Parameters dialog box, select the resource group that you want to use to run the Function Compute node, assign constant values to the variables that are used in the code, and then click OK to test whether the code logic of the Function Compute node is correct. For example, if you assign 2023-08-15 to the ${current_date} variable, DataWorks runs the Function Compute node to add a watermark to the example.pdf file stored in the /2023-08-15/ path.

  9. Configure scheduling properties for the Function Compute node to periodically schedule and run the node. DataWorks provides scheduling parameters, which are used to implement dynamic parameter passing in node code in scheduling scenarios. You can click Properties in the right-side navigation pane of the configuration tab of the Function Compute node. In the Parameters section of the Properties tab, you can configure scheduling parameters for the Function Compute node. In this example, the current_date scheduling parameter is added, and $[yyyy-mm-dd] is assigned to the scheduling parameter as the value. yyyy-mm-dd indicates the year, month, and day when the Function Compute node is run. For more information about scheduling parameter configurations, see Supported formats of scheduling parameters. For more information about scheduling properties, see Overview.
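DataWorks performs the ${current_date} replacement itself when it schedules the node. To preview what the resulting payload looks like for a given run date, you can mimic the substitution locally. The following Python sketch uses `string.Template`, which happens to accept the same ${name} placeholder syntax. The `render_variable` helper is a hypothetical name, the Variable content is reduced to two fields for brevity, and the sketch assumes that the scheduling parameter resolves to the run date in the yyyy-mm-dd form.

```python
import json
from datetime import date, timedelta
from string import Template

# The Variable content of the node, reduced to two fields for brevity.
VARIABLE_TEMPLATE = Template(
    '{"pdf_file": "/${current_date}/example.pdf", "mark_text": "DataWorks"}'
)


def render_variable(run_date):
    """Substitute ${current_date} with run_date in the yyyy-mm-dd form."""
    rendered = VARIABLE_TEMPLATE.substitute(
        current_date=run_date.strftime("%Y-%m-%d")
    )
    return json.loads(rendered)


# Preview the OSS paths that three consecutive daily runs would process.
first_run = date(2023, 8, 15)
for offset in range(3):
    payload = render_variable(first_run + timedelta(days=offset))
    print(payload["pdf_file"])
```

Each printed path is the location where the business system must have uploaded example.pdf before the corresponding run starts.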

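The difference between the two options of the Invocation Method parameter can be illustrated with a local analogy. The following Python sketch does not call Function Compute. The `add_watermark` function is a hypothetical stand-in that only simulates work, and a thread pool plays the role of the service that accepts a request and processes it later.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def add_watermark(pdf_file):
    """Hypothetical stand-in for the remote function; it only simulates work."""
    time.sleep(0.1)
    return f"watermarked {pdf_file}"


def invoke_sync(pdf_file):
    """Synchronous style: block until the function finishes, then return its result."""
    return add_watermark(pdf_file)


def invoke_async(executor, pdf_file):
    """Asynchronous style: queue the request and return immediately with a handle."""
    return executor.submit(add_watermark, pdf_file)


with ThreadPoolExecutor(max_workers=2) as executor:
    # The result is available as soon as the call returns.
    print(invoke_sync("/2023-08-15/example.pdf"))

    # The call returns before the work is done; the result arrives later.
    pending = invoke_async(executor, "/2023-08-16/example.pdf")
    print("request accepted, finished yet:", pending.done())
    print(pending.result())  # wait for the queued work to complete
```

With synchronous invocation, the node run reflects the outcome of the function directly, which is why it is used in this example.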

Step 3: Commit and deploy the Function Compute node

Function Compute nodes can be automatically scheduled only after they are committed and deployed to the production environment.

  1. Save and commit the Function Compute node.

    Click the Save and Submit icons in the top toolbar on the configuration tab of the Function Compute node to save and commit the Function Compute node. When you commit a node, enter a change description as prompted and specify whether to perform code review and smoke testing.

    Note
    • You can commit the node only after you configure the Rerun and Parent Nodes parameters on the Properties tab.

    • If the code review feature is enabled, a node can be deployed only after the code of the node is approved by a specified reviewer. For more information, see Code review.

    • To ensure that the node you created can be run as expected, we recommend that you perform smoke testing before you deploy the node. For more information, see Perform smoke testing.

  2. Optional. Deploy the Function Compute node.

    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Differences between workspaces in basic mode and workspaces in standard mode and Deploy nodes.

What to do next

After you commit and deploy the Function Compute node to Operation Center in the production environment, you can perform O&M operations on the node in Operation Center. For more information, see Perform basic O&M operations on auto triggered nodes.
