MaxCompute:MaxFrame FAQ

Last Updated: Dec 26, 2025

This topic describes common errors in MaxFrame.

Problem 1: Error "invalid type INT for function UDF definition, you need to set odps.sql.type.system.odps2=true; to use it"

  • Cause: The job uses MaxCompute V2.0 data types, but the MaxCompute V2.0 data type edition is not enabled, so the job fails during execution.

  • Solution: Enable the MaxCompute V2.0 data type edition by setting a flag. The following example shows how to do this:

    from maxframe import config
    # Add this before new_session
    config.options.sql.settings = {
      "odps.sql.type.system.odps2": "true"
    }
    

Problem 2: Error "UDF : No module named 'cloudpickle'"

  • Cause: The required cloudpickle package is missing.

  • Solution: To resolve this issue, reference the MaxCompute base image. The following example shows how to do this:

    from maxframe import config
    # Add this before new_session
    config.options.sql.settings = {
      "odps.session.image": "common",
    }
    

Problem 3: How to reuse resources in a user-defined function (UDF) submitted through DataFrame operators such as apply

In some user-defined function (UDF) scenarios, you may need to create or destroy multiple resources, such as initializing database connections or loading models. You may want these operations to occur only once when each UDF is loaded.

To reuse resources, you can use a Python feature where the default values for function parameters are initialized only once.

For example, in the following UDF, the model is loaded only once.

def predict(s, _ctx={}):
  import os
  from ultralytics import YOLO
  # The initial value of _ctx is an empty dict, which is initialized only once during Python execution.
  # When using the model, check if it exists in _ctx. If not, load it and store it in the dict.
  if not _ctx.get("model", None):
    model = YOLO(os.path.join("./", "yolo11n.pt"))
    _ctx["model"] = model
  model = _ctx["model"]

  # Then, call the relevant model APIs.
  

The following example shows a UDF that needs to destroy resources. This example uses a custom class named MyConnector to create and close database connections.

class MyConnector:

  def __init__(self):
    # Create the database connection in __init__
    self.conn = create_connection()

  def __del__(self):
    # Close the database connection in __del__
    try:
      self.conn.close()
    except:
      pass


def process(s, connector=MyConnector()):
  # Directly call the database connection within the connector. You do not need to create and close the connection again inside the UDF.
  connector.conn.execute("xxxxx")
  
Note

The number of times initialization runs depends on the number of UDF workers. Each worker has a separate Python environment. For example, if a UDF call processes 100,000 rows of data and the task is assigned to 10 UDF workers, each worker processes 10,000 rows. In this case, initialization runs a total of 10 times. For each worker, the initialization process runs only once.

Problem 4: How to update the MaxFrame version in DataWorks resource groups (exclusive and general-purpose)

Problem 5: Best practices for using MaxFrame custom images

Problem 6: Query error "ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: java.lang.RuntimeException: sequence_row_id cannot be applied because of : no CMF"

Solution: Specify `index_col` when you read the table. The following example shows how to do this:

df2 = md.read_odps_table("tablename", index_col="column").to_pandas()
df2.reset_index(inplace=True)

Problem 7: Error "Cannot determine dtypes by calculating with enumerate data, please specify it as arguments" when using methods with UDFs, such as apply

  • Cause: MaxFrame attempts to infer the DataFrame or Series type that the UDF returns. These types are then used to check and build the DataFrame or Series for subsequent calculations. However, dtypes may not be retrieved correctly in the following situations:

    • The UDF cannot run in the current environment. This may be because of dependencies on custom images, third-party libraries, or incorrect input parameters.

    • output_type is specified, but the function's actual return type does not match it.

  • Solution: Modify the code or specify `dtypes` to inform MaxFrame of the UDF's return type. For example (a runnable sketch follows this list):

    • To return a DataFrame that contains one int column: df.apply(..., dtypes=pd.Series([np.int_]), output_type="dataframe")

    • To return a DataFrame that contains two columns, A and B: df.apply(..., dtypes={"A": np.int_, "B": np.str_}, output_type="dataframe")

    • To return a Series named flag with a bool type: df.apply(..., dtype="bool", name="flag", output_type="series")
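
    The following is a runnable sketch that declares the output schema explicitly. It assumes the usual imports and an active MaxFrame session; the input data, the lambda, and the column name are illustrative:

    import numpy as np
    import pandas as pd
    import maxframe.dataframe as md

    df = md.read_pandas(pd.DataFrame({"a": [1, 2, 3]}))

    # Declare the output schema explicitly so MaxFrame does not need to infer it.
    result = df.apply(
        lambda row: row * 2,
        axis=1,
        dtypes={"a": np.int_},
        output_type="dataframe",
    )
    result.execute()  # Requires an active MaxFrame session.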

Problem 8: How to add a flag in the same way as in SQL

from maxframe import config
config.options.sql.settings = {
    "odps.stage.mapper.split.size": "8",
    "odps.stage.joiner.num": "20"
}

Problem 9: How to reference third-party packages in MaxFrame development

For more information, see Reference third-party packages and images. To reference an uploaded MaxCompute resource in a UDF, use the @with_resources decorator:

from maxframe.udf import with_resources


@with_resources("resource_name")
def process(row):
    ...
    
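
If the dependencies are published on PyPI, you can also let MaxFrame install them for the UDF with @with_python_requirements, which Problem 12 below also references. The following is a minimal sketch; the package name is a placeholder, and the exact argument format may vary by MaxFrame version:

from maxframe.udf import with_python_requirements


@with_python_requirements("some_pypi_package")
def process(row):
    ...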

Problem 10: Task error "TypeError: Cannot accept arguments append_partitions"

Check your PyODPS version and upgrade it to 0.12.0 or later (for example, by running pip install -U pyodps) to resolve this issue.

Problem 11: How to parse many JSON string fields

MaxFrame SDK V1.0.0 and later supports parsing multiple JSON string fields with Series.mf.flatjson. For more information, see the following reference:

https://maxframe.readthedocs.io/en/latest/reference/dataframe/generated/maxframe.dataframe.Series.mf.flatjson.html

Problem 12: ODPS-0010000: Fuxi job failed - Job failed for unknown reason, cannot get jobstatus

  • Cause: The installation of dependencies fails when you use methods such as @with_python_requirements. This failure prevents the job from running.

  • Error message explanation: ODPS-0010000: Fuxi job failed - Job failed for unknown reason, cannot get jobstatus

    You can find more details in the stderr of the PythonPack Logview, such as the network connectivity failure shown below.

    Detailed error message

    ...
      File "/root/.venv/lib/python3.11/site-packages/odps/models/instance.py", line 469, in _call_with_retry
        return utils.call_with_retry(func, **retry_kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/odps/utils.py", line 996, in call_with_retry
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/odps/models/instance.py", line 556, in _get_resp
        return self._client.get(self.resource(), action="taskstatus")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/odps/rest.py", line 332, in get
        return self.request(url, "get", stream=stream, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/odps/rest.py", line 213, in request
        return self._request(url, method, stream=stream, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/odps/rest.py", line 310, in _request
        res = self.session.send(
              ^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
        r = adapter.send(request, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/.venv/lib/python3.11/site-packages/requests/adapters.py", line 700, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='service.cn-beijing-intranet.maxcompute.aliyun-inc.com', port=80): Max retries exceeded with url: /api/projects/odps_monitor_odps_cn_beijing_i/instances/20250620142836623g059jg5q8pk?taskstatus&curr_project=odps_monitor_odps_cn_beijing_i (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f326dca0510>: Failed to resolve 'service.cn-beijing-intranet.maxcompute.aliyun-inc.com' ([Errno -2] Name or service not known)"))
    [2025-06-20 14:28:40,903 pythonpack.taskcaller WARNING] Set pack result: Unexpected error occurred, is_succeeded=False
    
  • Solution:

    1. This is an internal PythonPack error. The node that packages and installs Pip dependencies may be temporarily unable to access the dependency repository. First, retry the job. If the issue persists, contact the MaxFrame team.

    2. For periodic jobs, you can cache the successful packaging result from PythonPack to ensure stability. You can then use the cached result in subsequent daily jobs. The following example shows how to cache the result:

      from maxframe import options
      # Set the pythonpack result to prod. This way, subsequent jobs directly use the cached pythonpack result.
      options.pythonpack.task.settings = {"odps.pythonpack.production": "true"}
      

      To ignore the cache, add force_rebuild=True in @with_python_requirements.

    3. Alternatively, you can avoid using PythonPack to install dependencies. You can package the required dependencies offline, upload them as a MaxFrame resource, and then reference them in the job. MaxFrame automatically adds the dependencies to the callable context.

      PyODPS-Pack is a tool that simplifies this process. PyODPS-Pack automatically loads a manylinux Docker container with the same environment for packaging to avoid compatibility issues. It currently runs on X86 Linux machines. Apple devices with M-series ARM chips are not supported at this time.

      To use a MaxCompute resource in MaxFrame, use @with_resources.

Problem 13: ODPS-0123055:User script exception

  • Cause: This is the most common type of error in MaxFrame. It occurs during the execution of a UDF in operators such as apply, apply_chunk, flatmap, map, and transform. The error message indicates that the submitted UDF threw a Python exception. The main causes are as follows:

    • The code has a logical error. Review the code logic.

    • The error handling logic is flawed and throws an unhandled exception. Check whether the try-except block correctly handles all possible exceptions.

    • The UDF accesses the network. By default, network access is disabled in MaxCompute UDF containers.

    • The output type declared with `dtype` or `dtypes` in the operator does not match the actual type returned by the UDF.

    • The UDF references dependencies that are missing from the runtime environment. This prevents the user's code from being deserialized correctly.

  • Error message explanation:

    Most ODPS-0123055:User script exception errors are Python exceptions. You can check the stderr of the failed instance.

    For example, running a JSON load operation on a non-JSON string causes an error. This is a common issue in data processing.

    def simple_failure(row):
        import json
    
        text = row["json_text"]
        data = json.loads(text)
        return data
    
    
    df = md.read_pandas(pd.DataFrame({"json_text": ["hello", "maxframe", "not json"]}))
    df.apply(
        simple_failure, axis=1, dtypes={"text": np.str_}, output_type="dataframe"
    ).execute()
    

    The corresponding error message is shown below. The message clearly indicates that the error occurred in the simple_failure function on line 5, which is the line data = json.loads(text):

    ScriptError: ODPS-0123055: InstanceId: 20250622063246442gquihia95z2
    ODPS-0123055:User script exception - Traceback (most recent call last):
      File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139907101614080.py", line 130, in wrapped
        return func(self, *args, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139907101614080.py", line 262, in process
        for result in self.user_func(*args):
      File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139907101614080.py", line 230, in user_func_caller
        _user_function_results = _user_function(data, *_args, **_kw_args)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/var/folders/_8/v9wr7xm54bz0rj5pl4p9dkww0000gn/T/ipykernel_18735/2599074506.py", line 5, in simple_failure
      File "/usr/ali/python3.11.7/lib/python3.11/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/ali/python3.11.7/lib/python3.11/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/ali/python3.11.7/lib/python3.11/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
     | fatalInstance: Odps/meta_dev_20250622063246442gquihia95z2_SQL_0_0_0_job_0/M1#0_0
    

    If the function can be serialized correctly, you can usually find the source of the error by analyzing the stack trace.

    If the UDF references dependencies that are missing from the runtime environment, it cannot be deserialized correctly. In that case, the error message indicates that a dependency cannot be found and identifies the object or module that caused deserialization to fail, as shown in the following example.

    # Assume xxhash is installed locally and imported
    import xxhash
    
    
    def type_failure(row):
        # Reference xxhash in the UDF
        return str(xxhash.xxh64("hello maxfrmae"))
    
    
    df = md.read_pandas(pd.DataFrame(np.random.randn(3, 5), columns=list("ABCDE")))
    df.apply(
        type_failure, axis=1, dtypes={"hash_value": np.str_}, output_type="dataframe"
    ).execute()
    

    The error produces the following exception stack. MaxFrame fails to unpickle the local function at runtime, and the message No module named 'xxhash' pinpoints the missing dependency.

    File "/home/admin/mf_udf_ref_20250622070426909g26q6zdfar2_user_udf_140362144866304.py", line 209, in __init__
    
        _user_function = cloudpickle.loads(base64.b64decode(b'gAWVnwIAAAAAAACMF2Nsb3VkcGlja2xlLmNsb3VkcGlja2xllIwOX21ha2VfZnVuY3Rpb26Uk5QoaACMDV9idWlsdGluX3R5cGWUk5SMCENvZGVUeXBllIWUUpQoSwFLAEsASwFLBUsDQ1CXAHQBAAAAAAAAAAAAAHQCAAAAAAAAAAAAAKACAAAAAAAAAAAAAAAAAAAAAAAAAABkAaYBAACrAQAAAAAAAAAApgEAAKsBAAAAAAAAAABTAJROjA5oZWxsbyBtYXhmcm1hZZSGlIwDc3RylIwGeHhoYXNolIwFeHhoNjSUh5SMA3Jvd5SFlIxNL3Zhci9mb2xkZXJzL184L3Y5d3I3eG01NGJ6MHJqNXBsNHA5ZGt3dzAwMDBnbi9UL2lweWtlcm5lbF8xODczNS81NTM2OTIzNjYucHmUjAx0eXBlX2ZhaWx1cmWUaBJLBEMdgADdCw6Ndo98inzQHCzRDy3UDy3RCy7UCy7QBC6UQwCUKSl0lFKUfZQojAtfX3BhY2thZ2VfX5ROjAhfX25hbWVfX5SMCF9fbWFpbl9flHVOTk50lFKUjBxjbG91ZHBpY2tsZS5jbG91ZHBpY2tsZV9mYXN0lIwSX2Z1bmN0aW9uX3NldHN0YXRllJOUaBx9lH2UKGgZaBKMDF9fcXVhbG5hbWVfX5RoEowPX19hbm5vdGF0aW9uc19flH2UjA5fX2t3ZGVmYXVsdHNfX5ROjAxfX2RlZmF1bHRzX1+UTowKX19tb2R1bGVfX5RoGowHX19kb2NfX5ROjAtfX2Nsb3N1cmVfX5ROjBdfY2xvdWRwaWNrbGVfc3VibW9kdWxlc5RdlIwLX19nbG9iYWxzX1+UfZRoDGgAjAlzdWJpbXBvcnSUk5RoDIWUUpRzdYaUhlIwLg=='), buffers=[ ])
    
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^-
      File "/usr/ali/python3.11.7/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 649, in subimport
        __import__(name)
    ModuleNotFoundError: No module named 'xxhash'
    
  • The following examples show the error messages for other common causes.

    • Incorrect return type:

      def type_failure(row):
          text = row["A"]
          # Return a float
          return text
      
      df = md.read_pandas(pd.DataFrame(np.random.randn(3, 5), columns=list("ABCDE")))
      
      # Declare that it returns a DataFrame containing a str column named A
      df.apply(type_failure, axis=1, dtypes={"A": np.str_}, output_type="dataframe").execute()
      

      The following message indicates that a unicode (str) value was expected but a float was received. The expected type comes from the `dtypes` or `dtype` declaration, so make sure the declared type matches the actual type that the function returns.

      ScriptError: ODPS-0123055: InstanceId: 202506220642291g87d6xot20d
      ODPS-0123055:User script exception - Traceback (most recent call last):
        File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139905326100480.py", line 130, in wrapped
          return func(self, *args, **kw)
                 ^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139905326100480.py", line 263, in process
          self.forward(*result)
      TypeError: return value expected <class 'unicode'> but <class 'float'> found, value: 1.8263596267666997
       | fatalInstance: Odps/meta_dev_202506220642291g87d6xot20d_SQL_0_0_0_job_0/M1#0_0
      
    • Accessing the network when network access is disabled:

      def request_aliyun_com(row):
          import requests
      
          url = "https://github.com/aliyun/alibabacloud-odps-maxframe-client"
          response = requests.get(url)
          return response.text
      
      
      df.apply(
          request_aliyun_com, axis=1, dtypes={"content": np.str_}, output_type="dataframe"
      ).execute()
      

      Corresponding error message:

      ScriptError: ODPS-0123055: InstanceId: 20250622070516226gzo61d9idlr
      ODPS-0123055:User script exception - Traceback (most recent call last):
        File "/usr/ali/python3.11.7/lib/python3.11/site-packages/urllib3/connection.py", line 196, in _new_conn
          sock = connection.create_connection(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/ali/python3.11.7/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
          raise err
        File "/usr/ali/python3.11.7/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
          sock.connect(sa)
      ConnectionRefusedError: [Errno 111] Connection refused
      
  • Solution:

    1. For exception errors, analyze the stack information to identify the function that caused the error, and then fix the function.

    2. After you fix the error, you can test it locally. To do this, construct the corresponding data and call the function as a normal Python function. For example:

    def udf_func(row):
        import json
    
        text = row["json_text"]
        data = json.loads(text)
        return data
    
    # Construct the input and test the function logic locally
    udf_func(pd.Series(['{"hello": "maxframe"}'], index=["json_text"]))
    
    3. For network access issues, you need to enable network access. For more information, see Network enablement process.

    4. For deserialization failures, check whether any unexpected dependencies were introduced. Also, check whether the dependency that is indicated in the error message was correctly installed using PythonPack or included in the runtime environment as a resource.

Problem 14: ODPS-0123144: Fuxi job failed - kInstanceMonitorTimeout CRASH_EXIT, usually caused by bad udf performance

  • Cause: The UDF timed out.

  • Error message explanation

    During UDF execution, you may encounter error messages such as kInstanceMonitorTimeout or CRASH_EXIT, usually caused by bad udf performance.

    This error usually means that the UDF timed out. In MaxCompute offline computing scenarios, UDF execution time is typically monitored by row batches. If a UDF does not finish processing a specified number of rows within a specified time, it times out and fails. The relevant configuration is as follows:

    from maxframe import options
    options.sql.settings = {
        # Batch size in rows. Default: 1024. Minimum: 1.
        "odps.sql.executionengine.batch.rowcount": "1",
        # Batch timeout in seconds. Default: 1800. Maximum: 3600.
        "odps.function.timeout": "3600",
    }
    
  • Solution

    Modify the batch size and batch timeout as needed.

Problem 15: ODPS-0123144: Fuxi job failed - fuxi job failed, Job exceed live limit

  • Cause: MaxCompute jobs have a maximum run time, which is 24 hours by default. A job that runs for more than 24 hours fails.

  • Error message explanation

    The underlying cluster scheduling system determines that the job has timed out and terminates it, and the job status changes to Failed. If the job runs on DataWorks, a different timeout limit may apply, and in that case the job status is Canceled. For more information, contact the DataWorks team.

  • Solution

    You can adjust the maximum job runtime as needed.

    from maxframe import options
    
    # Set the maximum survival time for the MaxFrame session
    options.session.max_alive_seconds = 72 * 60 * 60
    # Set the maximum idle timeout for the MaxFrame session
    options.session.max_idle_seconds = 72 * 60 * 60
    options.sql.settings = {
        # Set the maximum runtime for SQL jobs. Default: 24h. Maximum: 72h.
        "odps.sql.job.max.time.hours": 72,
    }
    

Problem 16: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: unable to retrive row count of file pangu://xxx

  • Cause: This error may occur when you use flags such as odps.sql.split.dop to specify the number of split tasks.

  • Error message explanation

    This error usually indicates that a meta file was not generated when the data was written to the source table. As a result, the metadata of the source table cannot be directly retrieved, and the source table cannot be accurately chunked.

  • Solution

    1. Use the odps.stage.mapper.split.size flag instead. The unit is megabytes (MB). The default value is 256, and the minimum value is 1.

    2. If precise chunking is required, consider regenerating the CMF. To do this, contact the MaxCompute team.

      In addition, to ensure that a meta file is generated when writing to a table, you can add the following flags:

      from maxframe import options
      
      options.sql.settings = {
          "odps.task.merge.enabled": "false",
          "odps.sql.reshuffle.dynamicpt": "false",
          "odps.sql.enable.dynaparts.stats.collection": "true",
          "odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
          "odps.sql.stats.collection.aggressive":"true",
      }
      

      We will consider better ways to ensure the stability of the precise split feature in the future.

Problem 17: ODPS-0130071:[x,y] Semantic analysis exception

  • Cause: In a `ReadOdpsQuery` scenario, this error usually indicates a semantic problem with the SQL query itself.

  • Error message explanation

    This error usually indicates a semantic problem with the SQL statement.

  • Solution

    1. Check the SQL syntax.

    2. Upgrade the MaxFrame client by running the following command: pip install --upgrade maxframe.

    3. If the issue persists, contact the MaxFrame team.

Problem 18: ODPS-0020041:StringOutOfMaxLength:String length X is larger than maximum Y

  • Cause: An oversized string is encountered when writing data to a table or during a shuffle process. The string length exceeds the maximum allowed length.

  • Error message explanation

    To ensure computing stability, MaxCompute limits the maximum length of a single readable and writable string at the storage layer to 268,435,456 characters.

  • Solution

    1. Consider truncating or discarding the data that may be causing the error. In ReadOdpsQuery, you can use LENGTH to filter the data.

    2. Consider compressing the data before storing it, for example, using gzip. This can significantly reduce the string length and size.

    import gzip

    def compress_string(input_string):
        """Compress a string using gzip."""
        encoded_string = input_string.encode('utf-8')
        compressed_bytes = gzip.compress(encoded_string)
        return compressed_bytes
    
    3. Contact the MaxCompute team for help with the specific data.

Problem 19: ODPS-0010000:System internal error - fuxi job failed, caused by: process exited with code 0

  • Cause: A job that contains a UDF or AI function fails.

  • Error message explanation

    This usually indicates that an out-of-memory (OOM) error occurred during the execution of the UDF or AI function.

  • Solution

    1. Contact the MaxCompute team to confirm the actual memory usage.

    2. Run the UDF or AI function with more memory.

      For a UDF, you can use @with_running_options to set the memory.

      from maxframe.udf import with_running_options

      @with_running_options(memory="8GB")
      def udf_func(row):
          return row
      

      For an AI function, you can set the memory in the function using running_options={"memory": "8GB"}.

Problem 20: ODPS-0123131:User defined function exception - internal error - Fatal Error Happended

  • Cause: Reading from or writing to an external table.

  • Error message explanation: This usually indicates that an internal error occurred while reading from or writing to an external table.

  • Solution: Contact the MaxCompute team.

Problem 21: ODPS-0010000:System internal error - com.aliyun.odps.metadata.common.MetastoreServerException: 0420111:Database not found

  • Cause: The specified schema, project, or table information cannot be found when reading from or writing to a table.

  • Error message explanation: The metadata that the computation depends on cannot be found. As a result, the job cannot run.

  • Solution:

    1. Check whether the project, schema, and table information used in the SQL is correct. If not, modify the information and retry the operation.

    2. Contact the MaxCompute team.

Problem 22: ODPS-0010000:System internal error - fuxi job failed, caused by: process killed by signal 7

  • Cause: A job that contains a UDF fails.

  • Error message explanation: The UDF sends an abnormal signal during runtime.

  • Solution:

    1. Check whether the UDF sends cancel, timeout, or other signals to the process.

    2. Contact the MaxCompute team for troubleshooting.

Problem 23: ODPS-0010000:System internal error - fuxi job failed, caused by: StdException:vector::_M_range_insert

  • Cause: This error is related to a job that contains a UDF.

  • Error message explanation: The UDF cannot request enough memory at runtime, which causes the vector insertion to fail. Check the business code, dependency libraries, and memory settings.

  • Solution:

    1. Check for memory issues in the UDF. Check whether native dependency libraries have memory issues and whether they are the latest versions. Increase the memory requested by the UDF.

    2. Contact the MaxCompute team for troubleshooting.

Problem 24: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: task:M1 instance count exceeds limit 99999

  • Cause: This can happen with any job if the source table is large. Incorrectly setting a split flag can also cause this issue.

  • Error message explanation

    In a MaxCompute SQL job, if no settings are configured, the source table is chunked and processed in a distributed manner by default. The default chunk size is 256 MB. If the total number of chunks created exceeds 99,999, this error occurs.

  • Solution:

    1. Use the odps.stage.mapper.split.size flag. The unit is megabytes (MB). The default value is 256, and the minimum value is 1. You can set a larger value to ensure that the total number of chunks is less than 99,999 (see the sketch after this list).

    2. Use the odps.sql.split.dop flag. The minimum value is 1. This flag specifies the expected target number of chunks.

    3. Due to various constraints, the final number of chunks may not equal the expected target number. Setting the number of chunks close to the upper limit may still cause an error. If both methods fail after you make multiple adjustments, contact the MaxCompute team.
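
    The following sketch shows how the flags from steps 1 and 2 can be set through options.sql.settings; the values are illustrative:

    from maxframe import options

    options.sql.settings = {
        # Illustrative: raise the chunk size to 1024 MB to reduce the number of chunks.
        "odps.stage.mapper.split.size": "1024",
        # Alternatively, specify the expected number of chunks directly.
        # "odps.sql.split.dop": "10000",
    }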

Problem 25: ODPS-0110061:Failed to run ddltask - ODPS-0130131:Table not found

  • Cause: This error may occur in long-running MaxFrame jobs that run for more than one day.

  • Error message explanation

    An internal MaxFrame Data Definition Language (DDL) task fails. This failure usually occurs when a single computation stage runs for more than 24 hours, which causes the table of an ancestor node in the same session to expire.

    These tables are typically temporary tables created during the computation process, specifically, tables generated after a df.execute() call. Sink tables specified by to_odps_table usually do not have this problem.

  • Solution:

    Set a longer time-to-live (TTL) for temporary tables. The unit is days. By default, the TTL of a temporary table is one day. If a computing job has multiple operators that may run across different days, you must set this parameter.

    from maxframe import options

    options.sql.settings = {
        "session.temp_table_lifecycle": 3,
    }
    

Problem 26: NoTaskServerResponseError

  • Cause: In a Jupyter Notebook, you create a MaxFrame session, run some jobs, and then pause for more than 1 hour before running the next script. This error may occur when you resume.

  • Error message explanation

    The MaxFrame session has expired and cannot be found.

  • Solution:

    1. Recreate the session. However, the computation state from the previous cells will not be preserved.

    2. If you expect a pause and want to continue running the job later, you must set the following parameter:

    from maxframe import options
    
    # Set the maximum idle time to 24 hours. The default is 1 hour.
    options.session.max_idle_seconds = 60 * 60 * 24
    

Problem 27: IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer: Error while type casting for column 'xx'

  • Cause: A column of type BIGINT or INT contains NULL or INF values, and the result is printed. This includes automatic printing in a Jupyter Notebook.

  • Error message explanation

    MaxFrame data is built on DataFrames. When data is loaded locally, it is converted to a pandas DataFrame. In pandas, integer (BIGINT and INT) columns cannot hold NULL values, so columns that contain NULL are treated as FLOAT.

  • Solution:

    The MaxFrame team is working to resolve this issue. However, the type system is complex, and a clear timeline cannot be provided at this time. For now, you can consider the following methods:

    1. Use `fillna` to fill NULL values before printing (see the sketch after this list).

    2. Use `astype` to convert to FLOAT before printing.

    3. Do not print the column unless necessary.
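
    A minimal sketch of the first two workarounds, assuming df is a MaxFrame DataFrame whose BIGINT column a may contain NULL values and that the pandas-compatible fillna and astype methods are used:

    # Fill NULL values before printing.
    df["a"].fillna(0).execute()

    # Alternatively, cast the column to FLOAT before printing.
    df["a"].astype("float64").execute()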

Problem 28: ODPS-0010000:System internal error - fuxi job failed, SQL job failed after failover for too many times

  • Cause

    1. The job includes a Reduce or Join operation.

    2. A large `split.dop` value or a small `split.size` value is set, which generates many mapper instances.

    3. A large `reducer.num` value or `joiner.num` value is set, which generates many reducer or joiner instances.

  • Error message explanation

    The shuffle data is too large, which causes a Job Master out-of-memory (OOM) error.

  • Solution:

    1. Reduce the number of mappers and reducers/joiners. The maximum number should not exceed 10,000 (see the sketch after this list).

    2. Contact the MaxCompute team.
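
    A sketch of step 1, using the flags named in the cause above; the values are illustrative and should keep the instance counts well below 10,000:

    from maxframe import options

    options.sql.settings = {
        # Larger chunks produce fewer mapper instances.
        "odps.stage.mapper.split.size": "1024",
        # Keep the number of reducer and joiner instances moderate.
        "odps.stage.reducer.num": "2000",
        "odps.stage.joiner.num": "2000",
    }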

Problem 29: ODPS-0010000:System internal error - task/common/task_resource_helper.cpp(747): OdpsException: ODPS-0020011:Invalid parameter - Total resource size must be <= 2048MB

  • Cause: The job contains a UDF, and the UDF depends on a large resource.

  • Error message explanation

    The total size of resources that a UDF can depend on is 2048 MB. Jobs that exceed this limit cannot run.

  • Solution:

    Try to use external volume acceleration to download the corresponding resources from Object Storage Service (OSS). This method provides faster download speeds and higher limits.

Problem 30: ODPS-0130071:[22,132] Semantic analysis exception - column values_list in source has incompatible type ARRAY/MAP/STRUCT

  • Cause: The data being processed contains arrays, maps, or structs.

  • Error message explanation

    1. This may be a type declaration issue. The expected target column is not of the array, map, or struct type.

    2. This may be a bug in the MaxFrame type system.

  • Solution:

    1. Upgrade the MaxFrame client by running pip install -U maxframe and then retry the operation.

    2. Contact the MaxFrame team for troubleshooting.

Problem 31: Shuffle output too large

Solution: Use the `odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize` flag to specify the shuffle space size in megabytes (MB).
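
A minimal sketch, assuming the flag is set through options.sql.settings like the other flags in this topic; the value is illustrative and in MB:

from maxframe import options

options.sql.settings = {
    # Illustrative: allow up to 4096 MB of shuffle output.
    "odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize": "4096",
}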