This topic provides answers to some frequently asked questions about MaxCompute user-defined functions (UDFs) that are written in Java.

Class- or dependency-related issues

When you call a MaxCompute UDF, the following issues related to classes or dependencies may occur:
  • Issue 1: The error message ClassNotFoundException or Some dependencies are missing appears.
    • Causes:
      • Cause 1: The JAR file that is specified when you create the UDF is invalid.
      • Cause 2: One or more JAR files on which the UDF depends are not uploaded to MaxCompute. For example, the required third-party package is not uploaded.
      • Cause 3: The UDF is not called in the MaxCompute project in which the UDF is created. For example, a MaxCompute UDF is created in a development project but is called in a production project.
      • Cause 4: The required file does not exist or the resource type is invalid. For example, the type of the uploaded file is PY, but the file type specified in get_cache_file of the UDF code is FILE.
    • Solutions:
      • Solution to Cause 1: Check the content of the JAR file and confirm that the JAR file contains all the required classes. Then, repackage resources into a JAR file, and upload the file to your MaxCompute project. For more information about how to package resources into files and upload the files to your MaxCompute project, see Package, upload, and register a Java program.
      • Solution to Cause 2: Upload the required third-party package as a resource to the MaxCompute project. Then, add this package to the resource list when you create the UDF. For more information about how to upload resources and create UDFs, see Add resources and Create a function.
      • Solution to Cause 3: On the MaxCompute client, run the list functions; command in the project in which the MaxCompute UDF is called. Then, confirm that the UDF appears in the command output and that its classes and required resources are valid.
      • Solution to Cause 4: On the MaxCompute client, run the desc function <function_name>; command. Then, confirm that all the required files appear in the resource list in the command output. If the type of an uploaded file is inconsistent with the file type specified in get_cache_file, run the add <file_type> <file_name>; command to add the required file. For a minimal Java sketch of a UDF that reads such a resource, see the example after this list.
  • Issue 2: The error message NoClassDefFoundError or NoSuchMethodError appears.
    • Causes:
      • Cause 1: The version of the third-party library in the uploaded JAR file is inconsistent with the version of the built-in third-party library in MaxCompute.
      • Cause 2: The issue occurs due to Java sandbox limits. In this case, the standard error output of the job instance contains detailed error information such as java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "createClassLoader"). When a UDF runs in a distributed environment, it is subject to sandbox limits. For more information about the sandbox limits, see Java sandbox.
    • Solutions:
      • Solution to Cause 1: Use the relocation feature of maven-shade-plugin to move the conflicting packages in your UDF JAR file to a different package path, repackage the resources into a JAR file, and then upload the file to your MaxCompute project. For more information about how to package resources into files and upload the files to your MaxCompute project, see Package, upload, and register a Java program.
      • Solution to Cause 2: Follow the instructions provided in Issues related to Java sandbox limits.
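
For Cause 2 and Cause 4 of Issue 1, a quick end-to-end check is a UDF that reads its resource in the setup method. The following is a minimal Java sketch, not a definitive implementation: the resource name udf_dict.txt and the class name are placeholders, and it assumes the ExecutionContext.readResourceFileAsStream method for reading file resources that are attached to the function.

    import com.aliyun.odps.udf.ExecutionContext;
    import com.aliyun.odps.udf.UDF;
    import com.aliyun.odps.udf.UDFException;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    // Minimal sketch of a UDF that depends on a file resource.
    // "udf_dict.txt" is a placeholder: upload it with add file udf_dict.txt;
    // and include it in the resource list when you create the function.
    public class ResourceCheckUdf extends UDF {
        private String firstLine = "";

        @Override
        public void setup(ExecutionContext ctx) throws UDFException {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    ctx.readResourceFileAsStream("udf_dict.txt"), StandardCharsets.UTF_8))) {
                firstLine = reader.readLine();
            } catch (IOException e) {
                // A failure here usually means that the resource was not
                // uploaded or was not attached to the function.
                throw new UDFException(e);
            }
        }

        public String evaluate(String s) {
            return s + ":" + firstLine;
        }
    }

If setup fails with a resource-related exception, recheck the output of the desc function <function_name>; command before you debug the UDF logic itself.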

Issues related to Java sandbox limits

When you call a MaxCompute UDF, the following issue related to Java sandbox limits may occur:
  • Issue 1: An error occurs when a MaxCompute UDF is called to access local files, access the Internet, access a distributed file system, or create a Java thread. For Java code that illustrates these operations, see the sketch after this list.
    • Causes:
      • Cause 1: The error occurs due to sandbox limits when the MaxCompute UDF runs in a distributed environment. For more information about the sandbox limits, see Java sandbox.
      • Cause 2: The MaxCompute UDF does not support access to the Internet.
    • Solutions:
      • Solution to Cause 1: Run the set odps.isolation.session.enable=true; command to enable sandbox isolation at the session level. If the issue persists after the setting is enabled, fill in an application form to join the related DingTalk group and report the issue to the MaxCompute technical support team.
      • Solution to Cause 2: Fill in and submit a network connection application form based on your business requirements. After the MaxCompute technical support team receives the application, the team contacts you to establish the network connection. For more information about how to fill in application forms, see Network connection process.
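
The following Java sketch is illustrative only: each operation in it is the kind that the sandbox blocks in a distributed run, typically with java.security.AccessControlException. The class name, file path, and host are placeholders.

    import com.aliyun.odps.udf.UDF;

    import java.io.FileInputStream;
    import java.net.Socket;

    // Illustrative only: these operations are subject to Java sandbox
    // limits when the UDF runs in the distributed environment.
    public class SandboxedUdf extends UDF {
        public String evaluate(String s) throws Exception {
            // Reading a local file: blocked by the sandbox.
            try (FileInputStream in = new FileInputStream("/tmp/local.txt")) {
                in.read();
            }

            // Opening a network connection: blocked; Internet access
            // requires the network connection application process.
            try (Socket socket = new Socket("example.com", 80)) {
                socket.getOutputStream();
            }

            // Creating a thread: also subject to sandbox limits.
            new Thread(() -> {}).start();

            return s;
        }
    }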

Performance-related issues

When you call a MaxCompute UDF, the following performance issues may occur:
  • Issue 1: The error message kInstanceMonitorTimeout appears.
    • Cause: Data processing by the MaxCompute UDF times out. By default, a UDF must finish processing each batch of input data, which is 1024 rows at a time in most cases, within 1800s. This limit applies not to the total duration for which a worker runs but to the duration in which the UDF processes each small batch of data records. In most cases, MaxCompute SQL can process more than 10,000 rows of data per second, so this limit aims only to prevent infinite loops in a MaxCompute UDF, which would occupy CPU resources for a long period of time.
    • Solution:
      • If MaxCompute needs to process a large amount of data, you can call ExecutionContext.claimAlive in the Java class method of the MaxCompute UDF to reset the timer. For a minimal example, see the Java sketch at the end of this section.
      • Optimize the logic of the MaxCompute UDF code. After the optimization, you can configure the following parameters at the session level before you call the MaxCompute UDF to adjust how the UDF runs and accelerate data processing.
        • set odps.function.timeout=xxx;
          The timeout period for running a UDF. Default value: 1800. Unit: seconds. Valid values: 1 to 3600. You can increase the value of this parameter based on your business requirements.
        • set odps.stage.mapper.split.size=xxx;
          The input data amount of a Map worker. Default value: 256. Unit: MB. You can decrease the value of this parameter based on your business requirements.
        • set odps.sql.executionengine.batch.rowcount=xxx;
          The number of rows that MaxCompute can process at a time. Default value: 1024. You can decrease the value of this parameter based on your business requirements.
  • Issue 2: The error message errMsg:SigKill(OOM) or OutOfMemoryError appears.
    • Cause: MaxCompute runs jobs in three stages: Map, Reduce, and Join. If MaxCompute processes a large amount of data, each instance at each stage must process a large volume of data, which is time-consuming and can exceed the memory available to the instance.
    • Solution:
      • If the error is reported for the fuxi or runtime code, you can configure the following resource parameters to allocate more memory or more workers to the job.
        • set odps.stage.mapper.mem=xxx;
          The memory size of a Map worker. Default value: 1024. Unit: MB. You can increase the value of this parameter based on your business requirements.
        • set odps.stage.reducer.mem=xxx;
          The memory size of a Reduce worker. Default value: 1024. Unit: MB. You can increase the value of this parameter based on your business requirements.
        • set odps.stage.joiner.mem=xxx;
          The memory size of a Join worker. Default value: 1024. Unit: MB. You can increase the value of this parameter based on your business requirements.
        • set odps.stage.mapper.split.size=xxx;
          The input data amount of a Map worker. Default value: 256. Unit: MB. You can increase the value of this parameter based on your business requirements.
        • set odps.stage.reducer.num=xxx;
          The number of Reduce workers. You can increase the value of this parameter based on your business requirements.
        • set odps.stage.joiner.num=xxx;
          The number of Join workers. You can increase the value of this parameter based on your business requirements.
      • If this error is reported for Java code, you can adjust the preceding parameters and run the set odps.sql.udf.jvm.memory=xxx; command to increase the Java virtual machine (JVM) memory size.

For more information about the parameters, see SET operations.
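
As noted in the solution to Issue 1, a long-running UDF can reset the timeout timer itself. The following is a minimal Java sketch, assuming the ExecutionContext.claimAlive method mentioned above; the class name and the interval of 1000 rows are illustrative choices, not requirements.

    import com.aliyun.odps.udf.ExecutionContext;
    import com.aliyun.odps.udf.UDF;
    import com.aliyun.odps.udf.UDFException;

    // Minimal sketch: keep a handle to the ExecutionContext from setup()
    // and call claimAlive() periodically during expensive processing so
    // that a batch does not hit kInstanceMonitorTimeout.
    public class LongRunningUdf extends UDF {
        private ExecutionContext ctx;
        private long processed = 0;

        @Override
        public void setup(ExecutionContext ctx) throws UDFException {
            this.ctx = ctx;
        }

        public String evaluate(String s) {
            // ... expensive per-row work goes here ...
            if (++processed % 1000 == 0) {
                // Reset the data-processing timer.
                ctx.claimAlive();
            }
            return s;
        }
    }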

UDTF-related issues

When you call a Java UDTF, the following issues may occur:
  • Issue 1: The error message Semantic analysis exception - only a single expression in the SELECT clause is supported with UDTF's appears when you call a UDTF.
    • Cause: Other columns or expressions are specified together with the UDTF in the SELECT clause. A UDTF that is called directly in a SELECT clause must be the only expression in that clause. The following sample code shows an incorrect SQL statement:
      select b.*, 'x', udtffunction_name(v) from table lateral view udtffunction_name(v) b as f1, f2;
    • Solution: Call the Java UDTF with LATERAL VIEW in the SELECT statement so that other columns and expressions can be selected together with the UDTF output. Sample statement:
      select b.*, 'x' from table lateral view udtffunction_name(v) b as f1, f2;
  • Issue 2: The error message Semantic analysis exception - expect 2 aliases but have 0 appears.
    • Cause: The aliases of the output columns are not specified in the SELECT statement in which the UDTF is called.
    • Solution: You can use the AS clause to specify the aliases of the output columns in the SELECT statement in which the Java UDTF is called. A minimal Java UDTF skeleton follows this list. Sample statement:
      select udtffunction_name(paramname) as (col1, col2);
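
For reference, a Java UDTF that matches the two-column usage above (aliases f1 and f2) looks roughly like the following sketch. The class name is a placeholder, and the @Resolve signature string->string,bigint is an assumed example that declares one STRING input column and two output columns:

    import com.aliyun.odps.udf.UDFException;
    import com.aliyun.odps.udf.UDTF;
    import com.aliyun.odps.udf.annotation.Resolve;

    // Minimal sketch of a UDTF that takes one STRING column and emits
    // two columns per input row, which matches
    // lateral view udtffunction_name(v) b as f1, f2.
    @Resolve("string->string,bigint")
    public class SampleUdtf extends UDTF {
        @Override
        public void process(Object[] args) throws UDFException {
            String v = (String) args[0];
            if (v == null) {
                return;
            }
            // forward() emits one output row with two columns (f1, f2).
            forward(v, (long) v.length());
        }
    }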