This topic provides answers to frequently asked questions about fully managed Flink, including questions about console operations, network connectivity, and JAR packages.

How do I upload a JAR package in the Object Storage Service (OSS) console?

  1. In the console of fully managed Flink, view the OSS bucket of the current cluster.
  2. Log on to the OSS console and upload the JAR package to the /artifacts/namespaces directory of the OSS bucket.
  3. In the left-side navigation pane of the console of fully managed Flink, click Artifacts to view the JAR package that you uploaded in the OSS console.

How do I configure the parameters that are related to checkpoints and state backends?

In the upper-right corner of the job details page, click Configure. On the right side of the Draft Editor page, click the Advanced tab. In the panel that appears, add the required configuration entries to the Additional Configuration section and save them for the settings to take effect. A sample configuration is provided after the following table, which describes the parameters that you can configure for fully managed Flink in addition to the parameters provided by Apache Flink.
Parameter: state.backend
Description: The stream state storage system. Valid values:
  • GeminiStateBackend: specifies the stream state storage system provided by Alibaba Cloud. This is the default value.
  • RocksDB: specifies the stream state storage system provided by Apache Flink.

Parameter: table.exec.state.ttl
Description: The time-to-live (TTL) of state data in SQL jobs.
Remarks:
  • For Ververica Runtime (VVR) versions earlier than 4.0, this parameter is empty by default, which indicates that the state data does not expire.
  • For VVR 4.0 and later, this parameter is set to 129,600,000 milliseconds (1.5 days) by default.
Note: We recommend that you set this parameter to 129,600,000 milliseconds, which is equal to 1.5 days.

Parameter: state.backend.gemini.ttl.ms
Description: The TTL of state data in DataStream jobs and Python jobs.
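The following is a minimal sample of what the Additional Configuration section may contain. The values shown are illustrative; adjust them to your own state-size and freshness requirements.

  # Use Gemini (the default) as the state backend.
  state.backend: GeminiStateBackend
  # TTL of state data for SQL jobs, in milliseconds (129600000 ms = 1.5 days).
  table.exec.state.ttl: 129600000
  # TTL of state data for DataStream and Python jobs, in milliseconds.
  state.backend.gemini.ttl.ms: 129600000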
For more information about the parameters that are related to checkpoints and state backends in Apache Flink, see Checkpoints and State Backends.

How do I find the job that triggers an alert?

The alert event contains a JobID and a Deployment ID. However, the JobID changes after a failover is performed. Therefore, you must use the Deployment ID to find the job for which the error is returned. The Deployment ID is not displayed in the console of fully managed Flink. You must obtain it from the URL of the job.
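As an illustration, the deployment ID typically appears as a path segment after deployments/ in the job URL. The exact console URL layout may vary; the following placeholder form is an assumption, not a guaranteed format.

  https://.../deployments/<Deployment ID>/...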

How do I view information about a workspace, such as the workspace ID?

Log on to the console of fully managed Flink, find the workspace whose information you want to view, and then choose More > Workspace Details in the Actions column.

How does a fully managed Flink cluster access the Internet?

  • Background information
    By default, fully managed Flink clusters cannot access the Internet. Therefore, Alibaba Cloud provides NAT gateways to enable communication between virtual private clouds (VPCs) and the Internet. This way, users of fully managed Flink clusters can access the Internet by using user-defined extensions (UDXs) or DataStream code.
  • Solution
    Create a NAT gateway in the VPC. Then, create a source network address translation (SNAT) entry to bind the vSwitch that is associated with the fully managed Flink cluster to an elastic IP address (EIP). This way, the cluster can access the Internet by using the EIP. A sketch for verifying connectivity follows the steps. Procedure:
    1. Create a NAT gateway. For more information, see Create a NAT gateway.
    2. Create an SNAT entry. For more information, see Create an SNAT entry.
    3. Bind the vSwitch that is associated with the fully managed Flink cluster to an EIP. For more information, see Associate an EIP with a NAT gateway.
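    To verify that the cluster can access the Internet after the configuration, you can run a small piece of job code that issues an HTTP request, for example from a UDX. The following is a minimal sketch of a user-defined scalar function for this purpose; the class name HttpProbe and any URL you probe are illustrative, not part of any product API.
      import org.apache.flink.table.functions.ScalarFunction;

      import java.io.IOException;
      import java.net.HttpURLConnection;
      import java.net.URL;

      // Probes a URL and returns the HTTP status code, or -1 if the request fails.
      public class HttpProbe extends ScalarFunction {
          public int eval(String urlString) {
              try {
                  HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
                  conn.setConnectTimeout(5000);
                  conn.setReadTimeout(5000);
                  conn.setRequestMethod("GET");
                  return conn.getResponseCode();
              } catch (IOException e) {
                  // A negative value indicates that the Internet is not reachable.
                  return -1;
              }
          }
      }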

How does Realtime Compute for Apache Flink access the storage resources across VPCs?

You can use the following methods to allow Realtime Compute for Apache Flink to access the storage resources across VPCs:
  • Submit a ticket and select VPC as the product name. Express Connect or another product is required to establish connections between VPCs. You are charged when you use this method.
  • Use VPN gateways to establish VPN connections between VPCs. For more information, see Establish IPsec-VPN connections between two VPCs.
  • Unsubscribe from the storage service that resides in a different VPC from fully managed Flink. Then, purchase a storage service that resides in the same VPC as fully managed Flink.
  • Release the fully managed Flink cluster and then purchase another fully managed Flink cluster that is in the same VPC as the storage service.
  • Enable Internet access for fully managed Flink. This way, fully managed Flink can access storage services over the Internet. By default, fully managed Flink clusters cannot access the Internet. For more information, see How does a fully managed Flink cluster access the Internet?
    Note The Internet has a longer latency than internal networks. If you have high performance requirements, we recommend that you do not enable Internet access for fully managed Flink.

How do I troubleshoot dependency conflicts of Flink?

  • Problem description
    • An error caused by a conflict with a Flink or Hadoop dependency is reported, such as one of the following exceptions:
      java.lang.AbstractMethodError
      java.lang.ClassNotFoundException
      java.lang.IllegalAccessError
      java.lang.IllegalAccessException
      java.lang.InstantiationError
      java.lang.InstantiationException
      java.lang.InvocationTargetException
      java.lang.NoClassDefFoundError
      java.lang.NoSuchFieldError
      java.lang.NoSuchFieldException
      java.lang.NoSuchMethodError
      java.lang.NoSuchMethodException
    • No error is reported, but one of the following issues occurs:
      • Logs are not generated or the Log4j configuration does not take effect.
        In most cases, this issue occurs because a dependency in the JAR file of your job contains its own Log4j configuration. To resolve this issue, check whether such a dependency exists. If it does, configure exclusions in the dependency to remove the Log4j configuration.
        Note If you use different versions of Log4j, you must use maven-shade-plugin to relocate the Log4j-related classes, as in the sketch below.
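        The following is a sketch of a maven-shade-plugin relocation that moves Log4j classes into a private namespace. The shadedPattern value is an illustrative placeholder; choose a namespace that belongs to your project.
          <build>
            <plugins>
              <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                  <execution>
                    <phase>package</phase>
                    <goals>
                      <goal>shade</goal>
                    </goals>
                    <configuration>
                      <relocations>
                        <relocation>
                          <!-- Move conflicting Log4j classes into a private namespace. -->
                          <pattern>org.apache.log4j</pattern>
                          <shadedPattern>com.example.shaded.org.apache.log4j</shadedPattern>
                        </relocation>
                      </relocations>
                    </configuration>
                  </execution>
                </executions>
              </plugin>
            </plugins>
          </build>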
      • The remote procedure call (RPC) fails.

        By default, errors caused by dependency conflicts during Flink's Akka-based RPC calls are not recorded in logs. To check these errors, you must enable debug logging; a sample Log4j configuration is shown at the end of this item.

        For example, a debug log records the message "Cannot allocate the requested resources. Trying to allocate ResourceProfile{xxx}". However, the JobManager log stops at the message "Registering TaskManager with ResourceID xxx" and does not display further information until a resource request timeout occurs and a NoResourceAvailableException message appears. In addition, TaskManagers continuously report the error message "Cannot allocate the requested resources. Trying to allocate ResourceProfile{xxx}".

        Cause: After you enable debug logging, the RPC error message InvocationTargetException appears. In this case, slots fail to be allocated for TaskManagers, and the status of the TaskManagers becomes inconsistent. As a result, the ResourceManager continuously fails to allocate slots.
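        As a sketch, assuming your deployment uses Flink's standard Log4j 2 properties template, you can raise the log level of the RPC-related packages as follows. The logger names shown here are assumptions; adapt them to the packages that appear in your own logs.
          # Illustrative Log4j 2 properties fragment to enable debug logging for RPC.
          logger.akka.name = akka
          logger.akka.level = DEBUG
          logger.rpc.name = org.apache.flink.runtime.rpc
          logger.rpc.level = DEBUG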

  • Causes
    • The JAR package of your job contains unnecessary dependencies, such as the dependencies for basic configurations, Flink, Hadoop, and Log4j. As a result, dependency conflicts occur and cause some issues.
    • The dependency that corresponds to the connector that is required for your job is not included in the JAR package.
  • Troubleshooting
    • Check whether the pom.xml file of your job contains unnecessary dependencies.
    • Run the jar tf foo.jar command to view the content of the JAR package and determine whether the package contains the content that causes dependency conflicts.
    • Run the mvn dependency:tree command to check the dependency tree of your job and determine whether dependency conflicts exist. An illustrative example of the output follows this list.
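    For example, output of mvn dependency:tree might look like the following (coordinates and versions are illustrative). A transitive hadoop-common entry pulled in through a direct dependency such as foo:bar is the kind of conflict source to look for:
      [INFO] com.example:my-flink-job:jar:1.0
      [INFO] +- org.apache.flink:flink-streaming-java_2.11:jar:1.13.0:provided
      [INFO] \- foo:bar:jar:1.0:compile
      [INFO]    \- org.apache.hadoop:hadoop-common:jar:2.8.5:compile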
  • Solution
    • We recommend that you set scope to provided for the dependencies for basic configurations. This way, the dependencies for basic configurations are not included in the JAR package of your job.
      • DataStream Java
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-streaming-java_2.11</artifactId>
          <version>${flink.version}</version>
          <scope>provided</scope>
        </dependency>
      • DataStream Scala
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-streaming-scala_2.11</artifactId>
          <version>${flink.version}</version>
          <scope>provided</scope>
        </dependency>
      • DataSet Java
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-java</artifactId>
          <version>${flink.version}</version>
          <scope>provided</scope>
        </dependency>
      • DataSet Scala
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-scala_2.11</artifactId>
          <version>${flink.version}</version>
          <scope>provided</scope>
        </dependency>
    • Add the dependencies that correspond to the connectors required for the job, and set scope to compile. This way, the dependencies that correspond to the required connectors are included in the JAR package. The default value of scope is compile. In the following code, the Kafka connector is used as an example.
      <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-connector-kafka_2.11</artifactId>
          <version>${flink.version}</version>
      </dependency>
    • We recommend that you do not add the dependencies for Flink, Hadoop, or Log4j. Take note of the following exceptions:
      • If the job has direct dependencies for basic configurations or connectors, we recommend that you set scope to provided. Sample code:
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <scope>provided</scope>
        </dependency>
      • If the job has indirect dependencies for basic configurations or connectors, we recommend that you configure exclusions to remove the dependencies. Sample code:
        <dependency>
            <groupId>foo</groupId>
            <artifactId>bar</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-common</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

How do I resolve the domain name of the service on which a Flink job depends?

If your self-managed Flink job depends on the domain name of a service, a domain name resolution failure may be reported after you migrate the job to fully managed Flink. To resolve this issue, use one of the following methods based on your scenario:
  • You have a self-managed DNS, fully managed Flink can connect to it over a VPC, and the DNS can resolve domain names normally.
    In this case, you can perform DNS resolution by using the job template of fully managed Flink. For example, the IP address of your self-managed DNS is 192.168.0.1. Perform the following steps:
    1. Log on to the Realtime Compute for Apache Flink console.
    2. On the Fully Managed Flink tab, find the workspace that you want to manage and click Console in the Actions column.
    3. In the left-side navigation pane, choose Administration > Deployment Defaults.
    4. In the Additional Configuration section, add the following code:
      env.java.opts: >-
        -Dsun.net.spi.nameservice.provider.1=default
        -Dsun.net.spi.nameservice.provider.2=dns,sun
        -Dsun.net.spi.nameservice.nameservers=192.168.0.1
      Note If your self-managed DNS has multiple IP addresses, we recommend that you separate the IP addresses with commas (,).
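      For example, with two DNS servers (the second address is illustrative), the last line of the configuration becomes:
        -Dsun.net.spi.nameservice.nameservers=192.168.0.1,192.168.0.2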
    5. Create and run a job in the console of fully managed Flink.

      If the error message "UnknownHostException" still appears, domain names cannot be resolved. In this case, submit a ticket.

  • You do not have a self-managed DNS or fully managed Flink cannot connect to the self-managed DNS over a VPC.
    In this case, use Alibaba Cloud DNS PrivateZone to resolve the domain names. For example, the VPC in which fully managed Flink resides is named vpc-flinkxxxxxxx, and the domain names and IP addresses that your Flink job needs to access are aaa.test.com (127.0.0.1), bbb.test.com (127.0.0.2), and ccc.test.com (127.0.0.3). To resolve the domain names, perform the following steps:
    1. Activate Alibaba Cloud DNS PrivateZone. For more information, see Activate Alibaba Cloud DNS PrivateZone.
    2. Add a zone and use the common suffix of the service that your Flink job needs to access as the zone name. For more information, see Add a zone.
    3. Associate the zone with the VPC in which fully managed Flink resides. For more information, see Associate a zone with a VPC or disassociate a zone from a VPC.
    4. Add DNS records to the zone. For more information, see Add DNS records.
    5. In the console of fully managed Flink, create and run a job or stop and rerun an existing job.

      If the error message "UnknownHost" still appears, domain names cannot be resolved. In this case, submit a ticket.