Development and debugging - Realtime Compute for Apache Flink

How should I declare a DDL statement when submitting it with DML statements in the same script?
How do I write multiple INSERT INTO statements?
How do I pass special characters in Entry point Main Arguments?
Why does uploading a UDF JAR package fail after multiple modifications?
Why are fields misaligned when a POJO class is used as the UDTF return type?
How do I resolve Flink dependency conflicts?
Error: Could not parse type at position 50: <IDENTIFIER> expected but was <KEYWORD>. Input type string: ROW
Error when writing to a table: "Invalid primary key. Column 'xxx' is nullable."
Why does a JSON file open in the browser instead of being downloaded?

Declaring DDL with DML statements

When you submit DDL and DML statements together in the same script, declare the DDL statement as CREATE TEMPORARY TABLE instead of CREATE TABLE. Otherwise, clicking Validate fails with an error similar to the following.

CREATE TABLE datagen_source (a bigint, b int, c varchar)
  WITH ('connector' = 'datagen');
CREATE TABLE print_sink(C bigint, var1 int)
  WITH ('connector' = 'print','logger' = 'true');
INSERT INTO print_sink SELECT a, 8 FROM datagen_source;
Error message:
org.apache.flink.table.gateway.api.vvr.utils.SqlValidationException: A sequence of multiple statements to execute is supported if the last statement is a 'SELECT' statement or 'INSERT INTO' statement or 'CREATE TABLE IF NOT EXISTS ... AS TABLE' statement ...BASE IF NOT EXISTS ... AS DATABASE' statement or 'AUTO OPTIMIZE TABLE|DATABASE' statements or multiple 'INSERT INTO' or 'CREATE TABLE IF NOT EXISTS ... AS TAB ...ATABASE IF NOT EXISTS ... AS DATABASE' statements wrapped in a 'BEGIN STATEMENT SET' block and all other statements are CREATE TEMPORARY TABLE|VIEW|[SYSTEM] FUNCTION, 'SHOW', DESCRIBE, 'USE' statements.
	at org.apache.flink.table.gateway.vvr.service.utils.SqlValidateUtils.validateDraft(SqlValidateUtils.java:107)
	at org.apache.flink.table.gateway.vvr.service.command.DraftCommand.getDraftType(DraftCommand.java:120)
	at org.apache.flink.table.gateway.vvr.service.command.DraftCommand.executeInternal(DraftCommand.java:71)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
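
A corrected version of the failing script might look like the following sketch. It reuses the table names and connectors from the example above and changes only CREATE TABLE to CREATE TEMPORARY TABLE; the rest of the script is assumed to be unchanged.

```sql
-- Both tables are declared as temporary tables so that they can be
-- submitted in the same script as the INSERT INTO statement.
CREATE TEMPORARY TABLE datagen_source (a bigint, b int, c varchar)
  WITH ('connector' = 'datagen');
CREATE TEMPORARY TABLE print_sink (C bigint, var1 int)
  WITH ('connector' = 'print', 'logger' = 'true');
INSERT INTO print_sink SELECT a, 8 FROM datagen_source;
```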

Multiple INSERT INTO statements

To form a single logical unit, wrap multiple INSERT INTO statements between BEGIN STATEMENT SET; and END;. For more information, see INSERT INTO statements. If you do not wrap the statements, clicking Validate fails with an error similar to the following.

CREATE TEMPORARY TABLE datagen_source (a bigint, b int, c varchar)
  WITH ('connector' = 'datagen');
CREATE TEMPORARY TABLE print_sink(C bigint, var1 int)
  WITH ('connector' = 'print','logger' = 'true');
CREATE TEMPORARY TABLE print_sink2(C bigint, var2 int)
  WITH ('connector' = 'print','logger' = 'true');
INSERT INTO print_sink SELECT a, b FROM datagen_source;
INSERT INTO print_sink2 SELECT a, b FROM datagen_source;
Error message:
org.apache.flink.table.gateway.api.vvr.utils.SqlValidationException: A sequence of multiple statements to execute is supported if the last statement is a 'SELECT' statement or 'INSERT INTO' statement or 'CREATE TABLE IF NOT EXISTS ... AS TABLE' statement or 'CREATE DATABASE IF NOT EXISTS ... AS DATABASE' statement or 'AUTO OPTIMIZE TABLE|DATABASE' statements or multiple 'INSERT INTO' or 'CREATE TABLE IF NOT EXISTS ... AS TABLE' or 'CREATE DATABASE IF NOT EXISTS ... AS DATABASE' statements wrapped in a 'BEGIN STATEMENT SET' block and all other statements are CREATE TEMPORARY TABLE|VIEW|[SYSTEM] FUNCTION, 'SHOW', DESCRIBE, 'USE' statements.
      at org.apache.flink.table.gateway.vvr.service.utils.SqlValidateUtils.validateDraft(SqlValidateUtils.java:79)
      at org.apache.flink.table.gateway.vvr.service.command.DraftCommand.getDraftType(DraftCommand.java:120)
      at org.apache.flink.table.gateway.vvr.service.command.DraftCommand.executeInternal(DraftCommand.java:71)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
      at org.apache.flink.table.gateway.service.context.SqlGatewaySecurityContext.runSecured(SqlGatewaySecurityContext.java:73)
      at org.apache.flink.table.gateway.vvr.service.command.AbstractCommand.wrapClassLoader(AbstractCommand.java:171)
      at org.apache.flink.table.gateway.vvr.service.command.AbstractCommand.execute(AbstractCommand.java:163)
      at org.apache.flink.table.gateway.vvr.service.command.CommandManager.lambda$execute$0(CommandManager.java:71)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
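
As a sketch of the wrapped form, the script above could be rewritten as follows. The table definitions are the same as in the failing example; only the BEGIN STATEMENT SET; and END; lines are added around the two INSERT INTO statements.

```sql
CREATE TEMPORARY TABLE datagen_source (a bigint, b int, c varchar)
  WITH ('connector' = 'datagen');
CREATE TEMPORARY TABLE print_sink (C bigint, var1 int)
  WITH ('connector' = 'print', 'logger' = 'true');
CREATE TEMPORARY TABLE print_sink2 (C bigint, var2 int)
  WITH ('connector' = 'print', 'logger' = 'true');

-- Both writes are submitted as one logical unit.
BEGIN STATEMENT SET;
INSERT INTO print_sink SELECT a, b FROM datagen_source;
INSERT INTO print_sink2 SELECT a, b FROM datagen_source;
END;
```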

Passing special characters in main arguments

Cause

When you pass special characters such as # and $ in Entry point Main Arguments, escaping them with a backslash (\) does not work, and the special characters are discarded.
Solution

On the Deployments page, click the name of the target deployment. In the Parameters section, add the parameter env.java.opts: -Dconfig.disable-inline-comment=true to the Other Configuration field. For more information, see How to configure custom deployment parameters.

UDF JAR upload failure after modification

Cause

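The UDF runtime enforces unique class names across JAR packages. If a modified JAR package contains classes with the same names as a JAR package that is already uploaded, the upload fails.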
Solution
- Delete the old JAR package and upload the new one.
- Upload the JAR package in the Additional Dependencies section and use a temporary function in your code. For information about how to use a temporary function, see Register a UDF. The following example shows the syntax.
```
CREATE TEMPORARY FUNCTION `cp_record_reduce` AS 'com.taobao.test.udf.blink.CPRecordReduceUDF';
```
  In the Additional Dependencies section, located in the Other Configuration panel on the right, provide the OSS URL of your UDF JAR package.

Field misalignment with POJO class as UDTF return type

Details

When you use a POJO class as the return type for a UDTF and explicitly declare an alias list for the returned columns in SQL, you might encounter a field misalignment issue. As a result, the actual fields might not be the ones you expect, even if the data types are consistent.

For example, if you use the following POJO class as the return type for a UDTF, package it, and register the function as a deployment-level UDF as described in Develop a custom function, the SQL validation fails.

package com.aliyun.example;
public class TestPojoWithoutConstructor {
	public int c;
	public String d;
	public boolean a;
	public String b;
}

package com.aliyun.example;
import org.apache.flink.table.functions.TableFunction;
public class MyTableFuncPojoWithoutConstructor extends TableFunction<TestPojoWithoutConstructor> {
	private static final long serialVersionUID = 1L;
	public void eval(String str1, Integer i2) {
		TestPojoWithoutConstructor p = new TestPojoWithoutConstructor();
		p.d = str1 + "_d";
		p.c = i2 + 2;
		p.b = str1 + "_b";
		collect(p);
	}
}

CREATE TEMPORARY FUNCTION MyTableFuncPojoWithoutConstructor as 'com.aliyun.example.MyTableFuncPojoWithoutConstructor';
CREATE TEMPORARY TABLE src ( 
  id STRING,
  cnt INT
) WITH (
  'connector' = 'datagen'
);
CREATE TEMPORARY TABLE sink ( 
  f1 INT,
  f2 STRING,
  f3 BOOLEAN,
  f4 STRING
) WITH (
 'connector' = 'print'
);
INSERT INTO sink
SELECT T.* FROM src, LATERAL TABLE(MyTableFuncPojoWithoutConstructor(id, cnt)) AS T(c, d, a, b);

The SQL validation returns the following error message:

org.apache.flink.table.api.ValidationException: SQL validation failed. Column types of query result and sink for 'vvp.default.sink' do not match.
Cause: Sink column 'f1' at position 0 is of type INT but expression in the query is of type BOOLEAN NOT NULL.
Hint: You will need to rewrite or cast the expression.
Query schema: [c: BOOLEAN NOT NULL, d: STRING, a: INT NOT NULL, b: STRING]
Sink schema:  [f1: INT, f2: STRING, f3: BOOLEAN, f4: STRING]
	at org.apache.flink.table.sqlserver.utils.FormatValidatorExceptionUtils.newValidationException(FormatValidatorExceptionUtils.java:41)

The fields returned from the UDTF are misaligned with the fields in the POJO class. In the query result, field c is of type BOOLEAN and field a is of type INT, which is the reverse of their definitions in the POJO class.

Cause

According to the type inference rules for POJO classes:
- If the POJO class has a parameterized constructor, Flink infers the return type based on the constructor's parameter order.
- If the POJO class does not have a parameterized constructor, Flink reorders the fields in alphabetical order by name.
In the example, the POJO class used as the UDTF's return type lacks a parameterized constructor, so the fields are returned in alphabetical order and the inferred type is (BOOLEAN a, VARCHAR(2147483647) b, INTEGER c, VARCHAR(2147483647) d). This inference is valid on its own. However, the SQL query explicitly renames the output columns with LATERAL TABLE(MyTableFuncPojoWithoutConstructor(id, cnt)) AS T(c, d, a, b). The alias list renames columns by position, so the name c is attached to the BOOLEAN field a, d to b, a to the INTEGER field c, and b to d. This conflict between positional aliasing and alphabetical field ordering causes the validation exception or, when the types happen to match, silent data misalignment.

Solution

If the POJO class lacks a parameterized constructor, remove the explicit alias list for the UDTF return fields and select the fields by name. For example, change the INSERT statement in the SQL to:

-- If the POJO class lacks a parameterized constructor, select the required fields by name.
-- When using T.*, you must be aware of the actual order of the returned fields.
INSERT INTO sink
SELECT T.c, T.d, T.a, T.b FROM src, LATERAL TABLE(MyTableFuncPojoWithoutConstructor(id, cnt)) AS T;

Alternatively, implement a parameterized constructor in the POJO class to control the order of the fields in the return type. In this case, the field order of the UDTF's output matches the parameter order of the constructor.

package com.aliyun.example;
public class TestPojoWithConstructor {
	public int c;
	public String d;
	public boolean a;
	public String b;
	// Using specific fields order instead of alphabetical order
	public TestPojoWithConstructor(int c, String d, boolean a, String b) {
		this.c = c;
		this.d = d;
		this.a = a;
		this.b = b;
	}
}
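
With such a constructor in place, a positional alias list lines up with the sink again. The following sketch reuses the src and sink tables from the example above. It assumes a hypothetical UDTF class, com.aliyun.example.MyTableFuncPojoWithConstructor, that extends TableFunction<TestPojoWithConstructor>; this class is not part of the original example and is named here only for illustration.

```sql
-- Hypothetical UDTF that returns TestPojoWithConstructor.
CREATE TEMPORARY FUNCTION MyTableFuncPojoWithConstructor AS 'com.aliyun.example.MyTableFuncPojoWithConstructor';

-- The constructor parameter order is (c, d, a, b), so the positional aliases
-- map to INT, STRING, BOOLEAN, STRING and match the sink columns f1 to f4.
INSERT INTO sink
SELECT T.* FROM src, LATERAL TABLE(MyTableFuncPojoWithConstructor(id, cnt)) AS T(c, d, a, b);
```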

Resolving Flink dependency conflicts

Symptoms
- The conflict manifests as clear errors thrown by Flink- or Hadoop-related classes.
```
java.lang.AbstractMethodError
java.lang.ClassNotFoundException
java.lang.IllegalAccessError
java.lang.IllegalAccessException
java.lang.InstantiationError
java.lang.InstantiationException
java.lang.InvocationTargetException
java.lang.NoClassDefFoundError
java.lang.NoSuchFieldError
java.lang.NoSuchFieldException
java.lang.NoSuchMethodError
java.lang.NoSuchMethodException
```
- Alternatively, the issue might present as unexpected behavior without a clear error message, such as:
  - Logs are not generated or the log4j configuration does not take effect.
    
    This issue is usually caused by log4j-related configurations included in the dependencies. Check if the deployment JAR package contains dependencies that carry log4j configurations. You can remove these configurations by using exclusions in your dependency definitions.
    
    Note
    If you must use a different version of log4j, use the maven-shade-plugin to relocate the log4j-related classes.
  - RPC call exceptions.
    
    Dependency conflicts affecting Flink's Akka RPC calls may cause exceptions that are not displayed in logs by default. You must enable debug logging to identify them.
    
    For example, the debug log shows Cannot allocate the requested resources. Trying to allocate ResourceProfile{xxx}. The JobManager (JM) log shows no activity after Registering TaskManager with ResourceID xxx until a NoResourceAvailableException timeout error occurs, while the TaskManager (TM) continuously reports the same Cannot allocate the requested resources error.
    
    Cause: With debug logging enabled, you can see that an InvocationTargetException is thrown during an RPC call. This error causes the TM slot allocation to fail mid-process, resulting in an inconsistent state. The ResourceManager (RM) then continuously and unsuccessfully attempts to allocate a slot, and cannot recover.
Causes
- The deployment JAR package contains unnecessary dependencies, such as base Flink, Hadoop, or log4j libraries, which cause dependency conflicts.
- Dependencies for a required connector are not included in the JAR package.
Troubleshooting
- Review the deployment's pom.xml file for unnecessary dependencies.
- Inspect the contents of the deployment JAR package by running the jar tf foo.jar command to check for conflicting files.
- Analyze the deployment's dependency tree for conflicts by running the mvn dependency:tree command.

Solution

As a best practice, set the scope of basic framework dependencies to provided. This prevents them from being bundled into the deployment JAR package.

DataStream Java

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

DataStream Scala

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.11</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

DataSet Java

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

DataSet Scala

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-scala_2.11</artifactId>
  <version>${flink.version}</version>
  <scope>provided</scope>
</dependency>

Add the required connector dependencies to your project. The default scope is compile, which correctly bundles them into the deployment JAR package. For example, to add the Kafka connector:
```
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>${flink.version}</version>
</dependency>
```
Do not add other Flink, Hadoop, or log4j dependencies. However:
- If the deployment has a direct dependency on base configuration or connector-related components, set the scope to provided. The following example shows the syntax.
```
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <scope>provided</scope>
</dependency>
```
- If the deployment has a transitive dependency on base configuration or connector-related components, remove it by using an exclusion. The following example shows the syntax.
```
<dependency>
    <groupId>foo</groupId>
    <artifactId>bar</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Error: Could not parse type at position 50: <IDENTIFIER> expected but was <KEYWORD>. Input type string: ROW

Error details

When you use a UDTF in the SQL editor, the syntax check reports the following error (shown as a red wavy underline).

Caused by: org.apache.flink.table.api.ValidationException: Could not parse type at position 50: <IDENTIFIER> expected but was <KEYWORD>. Input type string: ROW<resultId String,pointRange String,from String,to String,type String,pointScope String,userId String,point String,triggerSource String,time String,uuid String>

The following code is an example:

@FunctionHint(
    //input = @DataTypeHint("BYTES"),
    output = @DataTypeHint("ROW<resultId String,pointRange String,from String,to String,type String,pointScope String,userId String,point String,triggerSource String,time String,uuid String>"))
public class PointChangeMetaQPaser1 extends TableFunction<Row> {
    Logger logger = LoggerFactory.getLogger(this.getClass().getName());
    public void eval(byte[] bytes) {
        try {
            String messageBody = new String(bytes, "UTF-8");
            Map<String, String> resultDO = JSON.parseObject(messageBody, Map.class);
            logger.info("PointChangeMetaQPaser1 logger:" + JSON.toJSONString(resultDO));
            collect(Row.of(
                    getString(resultDO.get("resultId")),
                    getString(resultDO.get("pointRange")),
                    getString(resultDO.get("from")),
                    getString(resultDO.get("to")),
                    getString(resultDO.get("type")),
                    getString(resultDO.get("pointScope")),
                    getString(resultDO.get("userId")),
                    getString(resultDO.get("point")),
                    getString(resultDO.getOrDefault("triggerSource", "NULL")),
                    getString(resultDO.getOrDefault("time", String.valueOf(System.currentTimeMillis()))),
                    getString(resultDO.getOrDefault("uuid", String.valueOf(UUID.randomUUID())))
            ));
        } catch (Exception e) {
            logger.error("PointChangeMetaQPaser1 error", e);
        }
    }
    private String getString(Object o) {
        if (o == null) {
            return null;
        }
        return String.valueOf(o);
    }
}

Cause

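The type string passed to DataTypeHint uses a reserved keyword directly as a field name. In the example, position 50 of the type string is the field name to, which the type parser reads as a keyword instead of an identifier.
Solution
- Rename the fields to non-keywords. For example, rename to to fto and from to ffrom.
- Wrap field names that are reserved keywords in backticks (`), for example `to` String.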

Error: "Invalid primary key. Column 'xxx' is nullable."

Cause

Flink mandates that all primary key columns be explicitly declared as NOT NULL. Even if the data contains no NULL values, Flink rejects the operation before writing if a primary key column in the table creation statement allows NULL values (for example, INT NULL). This is not a runtime error but a semantic check during the DDL parsing phase.
Solution

Declare the primary key columns mentioned in the error as NOT NULL and recreate the table.
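
As a minimal sketch, a table definition that passes this check could look like the following. The table name, column names, and the print connector are placeholders chosen for illustration; substitute your own sink table and connector options.

```sql
-- Hypothetical table: the primary key column is explicitly declared NOT NULL.
CREATE TEMPORARY TABLE orders_sink (
  order_id BIGINT NOT NULL,
  amount DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'print'
);
```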

JSON file opens in browser instead of downloading

Symptom

When you click to download a JSON file from the Artifacts page, the browser does not trigger a download. Instead, it opens a new tab and displays the JSON content directly.
Cause

The JSON file in OSS is missing the Content-Disposition: attachment HTTP response header. This causes the browser to display the file's content directly instead of downloading it.
Solution
- Option 1: Re-upload the file
  
  This issue is fixed in platform version 4.5.0, but the fix applies only to files uploaded after that version was released. Re-upload any file that was uploaded earlier, or modify its metadata manually as described in Option 2.
- Option 2: Modify the OSS object metadata
  
  Manually modify the object metadata by adding the following standard HTTP attribute:
  - Header name: Content-Disposition
  - Header value: attachment
  For more information, see Manage object metadata.