This topic describes common errors in MaxFrame.
Problem 1: Error "invalid type INT for function UDF definition, you need to set odps.sql.type.system.odps2=true; to use it"
Cause: This error occurs because you are using MaxCompute V2.0 data types, but the MaxCompute V2.0 data type version is not enabled. This causes the job to fail during execution.
Solution: To resolve this issue, enable the MaxCompute V2.0 data type using a flag. The following example shows how to do this:
from maxframe import config

# Add this before new_session
config.options.sql.settings = {
    "odps.sql.type.system.odps2": "true"
}
Problem 2: Error "UDF : No module named 'cloudpickle'"
Cause: The required cloudpickle package is missing.
Solution: To resolve this issue, reference the MaxCompute base image. The following example shows how to do this:
from maxframe import config

# Add this before new_session
config.options.sql.settings = {
    "odps.session.image": "common",
}
Problem 3: How to reuse resources in a user-defined function (UDF) submitted by a DataFrame (apply)
In some user-defined function (UDF) scenarios, you may need to create or destroy multiple resources, such as initializing database connections or loading models. You may want these operations to occur only once when each UDF is loaded.
To reuse resources, you can use a Python feature where the default values for function parameters are initialized only once.
For example, in the following UDF, the model is loaded only once.
def predict(s, _ctx={}):
    import os
    from ultralytics import YOLO
    # The initial value of _ctx is an empty dict, which is initialized only once during Python execution.
    # When using the model, check if it exists in _ctx. If not, load it and store it in the dict.
    if not _ctx.get("model", None):
        model = YOLO(os.path.join("./", "yolo11n.pt"))
        _ctx["model"] = model
    model = _ctx["model"]
    # Then, call the relevant model APIs.
The following example shows a UDF that needs to destroy resources. This example uses a custom class named MyConnector to create and close database connections.
class MyConnector:
    def __init__(self):
        # Create the database connection in __init__
        self.conn = create_connection()

    def __del__(self):
        # Close the database connection in __del__
        try:
            self.conn.close()
        except:
            pass

def process(s, connector=MyConnector()):
    # Directly call the database connection within the connector. You do not need to create and close the connection again inside the UDF.
    connector.conn.execute("xxxxx")
The number of times initialization runs depends on the number of UDF workers. Each worker has a separate Python environment. For example, if a UDF call processes 100,000 rows of data and the task is assigned to 10 UDF workers, each worker processes 10,000 rows. In this case, initialization runs a total of 10 times. For each worker, the initialization process runs only once.
Problem 4: How to update the MaxFrame version in DataWorks resource groups (exclusive and general-purpose)
Problem 5: Best practices for using MaxFrame custom images
Problem 6: Query error "ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: java.lang.RuntimeException: sequence_row_id cannot be applied because of : no CMF"
Solution: Add `index_col` to the query. The following example shows how to do this:
df2 = md.read_odps_table("tablename", index_col="column").to_pandas()
df2.reset_index(inplace=True)
Problem 7: Error "Cannot determine dtypes by calculating with enumerate data, please specify it as arguments" when using methods with UDFs, such as apply
Cause: MaxFrame attempts to infer the DataFrame or Series type that the UDF returns. These types are then used to check and build the DataFrame or Series for subsequent calculations. However, dtypes may not be retrieved correctly in the following situations:
The UDF cannot run in the current environment. This may be because of dependencies on custom images, third-party libraries, or incorrect input parameters.
If `output_type` is specified, the function's actual return type may not match the specified `output_type`.
Solution: Modify the code or specify `dtypes` to inform MaxFrame of the UDF's return type. For example:
To return a DataFrame that contains one int column:
df.apply(..., dtypes=pd.Series([np.int_]), output_type="dataframe")
To return a DataFrame that contains two columns, A and B:
df.apply(..., dtypes={"A": np.int_, "B": np.str_}, output_type="dataframe")
To return a Series named flag with a bool type:
df.apply(..., dtype="bool", name="flag", output_type="series")
Problem 8: How to add a flag in the same way as in SQL
from maxframe import config
config.options.sql.settings = {
"odps.stage.mapper.split.size": "8",
"odps.stage.joiner.num": "20"
}
Problem 9: How to reference third-party packages in MaxFrame development
For more information, see Reference third-party packages and images.
from maxframe.udf import with_resources
@with_resources("resource_name")
def process(row):
...
Problem 10: Task error "TypeError: Cannot accept arguments append_partitions"
Check your PyODPS version. You can resolve this issue by upgrading to version 0.12.0.
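A quick way to check the installed version locally before upgrading (a minimal sketch; the upgrade itself runs through pip):
import odps

# PyODPS exposes its version string; upgrade with `pip install -U pyodps` if it is older than 0.12.0
print(odps.__version__)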
Problem 11: How to parse many JSON string fields
The software development kit (SDK) for MaxFrame V1.0.0 and later supports parsing multiple JSON string fields.
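A minimal sketch of parsing multiple JSON string columns with `apply`; the column names and extracted keys are illustrative, and the sketch relies only on the `apply` pattern shown elsewhere in this topic:
import json
import numpy as np
import pandas as pd
import maxframe.dataframe as md

def parse_json_fields(row):
    # Parse two JSON string columns and return the extracted values as named fields
    a = json.loads(row["a_json"])
    b = json.loads(row["b_json"])
    return pd.Series([a.get("name"), b.get("value")], index=["name", "value"])

df = md.read_pandas(pd.DataFrame({
    "a_json": ['{"name": "foo"}'],
    "b_json": ['{"value": 1}'],
}))
df.apply(
    parse_json_fields, axis=1,
    dtypes={"name": np.str_, "value": np.int_},
    output_type="dataframe",
).execute()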
Problem 12: ODPS-0010000: Fuxi job failed - Job failed for unknown reason, cannot get jobstatus
Cause: The installation of dependencies fails when you use methods such as `@with_python_requirements`. This failure prevents the job from running.
Error message explanation:
ODPS-0010000: Fuxi job failed - Job failed for unknown reason, cannot get jobstatus
You can find more details in the stderr of the PythonPack Logview, such as a network connectivity failure.
Solution:
This is an internal PythonPack error. The node that packages and installs Pip dependencies may be temporarily unable to access the dependency repository. First, retry the job. If the issue persists, contact the MaxFrame team.
For periodic jobs, you can cache the successful packaging result from PythonPack to ensure stability. You can then use the cached result in subsequent daily jobs. The following example shows how to cache the result:
from maxframe import options

# Set the pythonpack result to prod. This way, subsequent jobs directly use the cached pythonpack result.
options.pythonpack.task.settings = {"odps.pythonpack.production": "true"}
To ignore the cache, add `force_rebuild=True` in `@with_python_requirements`.
Alternatively, you can avoid using PythonPack to install dependencies. You can package the required dependencies offline, upload them as a MaxFrame resource, and then reference them in the job. MaxFrame automatically adds the dependencies to the callable context.
PyODPS-Pack is a tool that simplifies this process. PyODPS-Pack automatically loads a manylinux Docker container with the same environment for packaging to avoid compatibility issues. It currently runs on X86 Linux machines. Apple devices with M-series ARM chips are not supported at this time.
To use a MaxCompute resource in MaxFrame, use `@with_resources`.
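A minimal sketch of referencing an offline-packaged dependency archive as a MaxCompute resource; the resource name `my_deps.tar.gz` is illustrative and assumes the archive was already uploaded as a MaxCompute resource:
from maxframe.udf import with_resources

@with_resources("my_deps.tar.gz")
def process(row):
    # The referenced resource is available in the callable context at runtime
    ...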
Problem 13: ODPS-0123055:User script exception
Cause: This is the most common type of error in MaxFrame. It occurs during the execution of a UDF in operators such as `apply`, `apply_chunk`, `flatmap`, `map`, and `transform`. The error message indicates that the submitted UDF threw a Python exception. The main causes are as follows:
The code has a logical error. Review the code logic.
The error handling logic is flawed and throws an unhandled exception. Check whether the `try-except` block correctly handles all possible exceptions.
The UDF accesses the network. By default, network access is disabled in MaxCompute UDF containers.
The output type declared with `dtype` or `dtypes` in the operator does not match the actual type returned by the UDF.
The UDF references dependencies that are missing from the runtime environment. This prevents the user's code from being deserialized correctly.
Error message explanation:
Most `ODPS-0123055:User script exception` errors are Python exceptions. You can check the stderr of the failed instance.
For example, running a JSON load operation on a non-JSON string causes an error. This is a common issue in data processing.
def simple_failure(row):
    import json
    text = row["json_text"]
    data = json.loads(text)
    return data

df = md.read_pandas(pd.DataFrame({"json_text": ["123", "456", "789"]}))
df.apply(
    simple_failure, axis=1, dtypes={"text": np.str_}, output_type="dataframe"
).execute()
The corresponding error message is shown below. The message clearly indicates that the error occurred in the `simple_failure` function on line 5, which is the line `data = json.loads(text)`:
ScriptError: ODPS-0123055: InstanceId: 20250622063246442gquihia95z2
ODPS-0123055:User script exception - Traceback (most recent call last):
  File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139907101614080.py", line 130, in wrapped
    return func(self, *args, **kw)
  File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139907101614080.py", line 262, in process
    for result in self.user_func(*args):
  File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139907101614080.py", line 230, in user_func_caller
    _user_function_results = _user_function(data, *_args, **_kw_args)
  File "/var/folders/_8/v9wr7xm54bz0rj5pl4p9dkww0000gn/T/ipykernel_18735/2599074506.py", line 5, in simple_failure
  File "/usr/ali/python3.11.7/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/ali/python3.11.7/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/ali/python3.11.7/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) | fatalInstance: Odps/meta_dev_20250622063246442gquihia95z2_SQL_0_0_0_job_0/M1#0_0
If the function can be serialized correctly, you can usually find the source of the error by analyzing the stack trace.
If the UDF references dependencies that are missing from the runtime environment, it cannot be serialized correctly. The error message indicates that a dependency cannot be found. The message clearly identifies the object or dependency that caused the serialization to fail, as shown in the following example.
# Assume xxhash is installed locally and imported
import xxhash

def type_failure(row):
    # Reference xxhash in the UDF
    return str(xxhash.xxh64("hello maxfrmae"))

df = md.read_pandas(pd.DataFrame(np.random.randn(3, 5), columns=list("ABCDE")))
df.apply(
    type_failure, axis=1, dtypes={"hash_value": np.str_}, output_type="dataframe"
).execute()
The error produces the following exception stack. MaxFrame failed to unpickle the local function during runtime, and the message `No module named 'xxhash'` is displayed. This is a precise error message.
  File "/home/admin/mf_udf_ref_20250622070426909g26q6zdfar2_user_udf_140362144866304.py", line 209, in __init__
    _user_function = cloudpickle.loads(base64.b64decode(b'gAWVnwIAAAAAAACMF2Nsb3VkcGlja2xl...'), buffers=[ ])
  File "/usr/ali/python3.11.7/lib/python3.11/site-packages/cloudpickle/cloudpickle.py", line 649, in subimport
    __import__(name)
ModuleNotFoundError: No module named 'xxhash'
The following sections describe the error codes and messages for other types of errors.
Incorrect return type:
def type_failure(row):
    text = row["A"]
    # Return a float
    return text

df = md.read_pandas(pd.DataFrame(np.random.randn(3, 5), columns=list("ABCDE")))
# Declare that it returns a DataFrame containing a str column named A
df.apply(type_failure, axis=1, dtypes={"A": np.str_}, output_type="dataframe").execute()
The message indicates that a unicode (str) was expected, but a float was received. This information is usually specified by `dtypes` or `dtype`. Make sure the declared type matches the actual type returned by the function.
ScriptError: ODPS-0123055: InstanceId: 202506220642291g87d6xot20d
ODPS-0123055:User script exception - Traceback (most recent call last):
  File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139905326100480.py", line 130, in wrapped
    return func(self, *args, **kw)
  File "/home/admin/mf_udf_ref_20250622062907997gvwps9irzzc_user_udf_139905326100480.py", line 263, in process
    self.forward(*result)
TypeError: return value expected <class 'unicode'> but <class 'float'> found, value: 1.8263596267666997 | fatalInstance: Odps/meta_dev_202506220642291g87d6xot20d_SQL_0_0_0_job_0/M1#0_0
Access when the network is disabled:
def request_aliyun_com(row):
    import requests
    url = "https://github.com/aliyun/alibabacloud-odps-maxframe-client"
    response = requests.get(url)
    return response.text

df.apply(
    request_aliyun_com, axis=1, dtypes={"content": np.str_}, output_type="dataframe"
).execute()
Corresponding error message:
ScriptError: ODPS-0123055: InstanceId: 20250622070516226gzo61d9idlr
ODPS-0123055:User script exception - Traceback (most recent call last):
  File "/usr/ali/python3.11.7/lib/python3.11/site-packages/urllib3/connection.py", line 196, in _new_conn
    sock = connection.create_connection(
  File "/usr/ali/python3.11.7/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/usr/ali/python3.11.7/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Solution:
For exception errors, analyze the stack information to identify the function that caused the error, and then fix the function.
After you fix the error, you can test it locally. To do this, construct the corresponding data and call the function as a normal Python function. For example:
def udf_func(row):
    import json
    text = row["json_text"]
    data = json.loads(text)
    return data

# Construct the input and test the function logic locally
udf_func(pd.Series(['{"hello": "maxframe"}'], index=["json_text"]))
For network access issues, you need to enable network access. For more information, see Network enablement process.
For deserialization failures, check whether any unexpected dependencies were introduced. Also, check whether the dependency that is indicated in the error message was correctly installed using PythonPack or included in the runtime environment as a resource.
Problem 14: ODPS-0123144: Fuxi job failed - kInstanceMonitorTimeout CRASH_EXIT, usually caused by bad udf performance
Cause: The UDF timed out.
Error message explanation
During UDF execution, you may encounter error messages such as `kInstanceMonitorTimeout` or `CRASH_EXIT, usually caused by bad udf performance`.
This error usually means that the UDF timed out. In MaxCompute offline computing scenarios, UDF execution time is typically monitored by row batches. If a UDF does not finish processing a specified number of rows within a specified time, it times out and fails. The relevant configuration is as follows:
from maxframe import options

options.sql.settings = {
    # Batch size. Default: 1024. Minimum: 1.
    "odps.sql.executionengine.batch.rowcount": "1",
    # Batch timeout. Default: 1800. Maximum: 3600.
    "odps.function.timeout": "3600",
}
Solution
Modify the batch size and batch timeout as needed.
Problem 15: ODPS-0123144: Fuxi job failed - fuxi job failed, Job exceed live limit
Cause: MaxCompute jobs have a maximum timeout period. The default is 24 hours. If a job runs for more than 24 hours, it fails.
Error message explanation
The underlying job scheduling system determines that the job has timed out and then terminates it. The job status changes to Failed. If the job runs on DataWorks, a different timeout period may apply. For more information, contact the DataWorks team. In this case, the job status is Canceled.
Solutions
You can adjust the maximum job runtime as needed.
from maxframe import options

# Set the maximum survival time for the MaxFrame session
options.session.max_alive_seconds = 72 * 60 * 60
# Set the maximum idle timeout for the MaxFrame session
options.session.max_idle_seconds = 72 * 60 * 60
options.sql.settings = {
    # Set the maximum runtime for SQL jobs. Default: 24h. Maximum: 72h.
    "odps.sql.job.max.time.hours": 72,
}
Problem 16: 0130071:[0,0] Semantic analysis exception - physical plan generation failed: unable to retrive row count of file pangu://xxx
Cause: This error may occur when you use flags such as `odps.sql.split.dop` to specify the number of split tasks.
Error message explanation
This error usually indicates that a meta file was not generated when the data was written to the source table. As a result, the metadata of the source table cannot be directly retrieved, and the source table cannot be accurately chunked.
Solution
Use the `odps.stage.mapper.split.size` flag instead. The unit is megabytes (MB). The default value is 256, and the minimum value is 1.
If precise chunking is required, consider regenerating the CMF. To do this, contact the MaxCompute team.
In addition, to ensure that a meta file is generated when writing to a table, you can add the following flags:
from maxframe import options

options.sql.settings = {
    "odps.task.merge.enabled": "false",
    "odps.sql.reshuffle.dynamicpt": "false",
    "odps.sql.enable.dynaparts.stats.collection": "true",
    "odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
    "odps.sql.stats.collection.aggressive": "true",
}
We will consider better ways to ensure the stability of the precise split feature in the future.
Problem 17: ODPS-0130071:[x,y] Semantic analysis exception
Cause: In a `ReadOdpsQuery` scenario, this error usually indicates a semantic problem with the SQL query itself.
Error message
This error usually indicates a semantic problem with the SQL statement.
Solution
Check the SQL syntax.
Upgrade the MaxFrame client by running the following command: `pip install --upgrade maxframe`.
If the issue persists, contact the MaxFrame team.
Problem 18: ODPS-0020041:StringOutOfMaxLength:String length X is larger than maximum Y
Cause: An oversized string is encountered when writing data to a table or during a shuffle process. The string length exceeds the maximum allowed length.
Error message explanation
To ensure computing stability, MaxCompute limits the maximum length of a single readable and writable string at the storage layer to 268,435,456 characters.
Solution
Consider truncating or discarding the data that may be causing the error. In ReadOdpsQuery, you can use LENGTH to filter the data.
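A minimal sketch of the LENGTH filter, assuming `read_odps_query` with illustrative table and column names:
import maxframe.dataframe as md

# Skip rows whose string column is close to the maximum allowed length
df = md.read_odps_query(
    "SELECT * FROM my_table WHERE LENGTH(big_text) < 100000000",
    index_col="id",
)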
Consider compressing the data before storing it, for example, using gzip. This can significantly reduce the string length and size.
import gzip

def compress_string(input_string):
    """Compresses a string using gzip."""
    encoded_string = input_string.encode('utf-8')
    compressed_bytes = gzip.compress(encoded_string)
    return compressed_bytes
Contact the MaxCompute team for support with specific data.
Problem 19: ODPS-0010000:System internal error - fuxi job failed, caused by: process exited with code 0
Cause: A job that contains a UDF or AI function fails.
Error message explanation
This usually indicates that an out-of-memory (OOM) error occurred during the execution of the UDF or AI function.
Solution
Contact the MaxCompute team to confirm the actual memory usage.
Run the UDF or AI function with more memory.
For a UDF, you can use `@with_running_options` to set the memory.
@with_running_options(memory="8GB")
def udf_func(row):
    return row
For an AI function, you can set the memory in the function using `running_options={"memory": "8GB"}`.
Problem 20: ODPS-0123131:User defined function exception - internal error - Fatal Error Happended
Cause: Reading from or writing to an external table.
Error message explanation: This usually indicates that an internal error occurred while reading from or writing to an external table.
Solution: Contact the MaxCompute team.
Problem 21: ODPS-0010000:System internal error - com.aliyun.odps.metadata.common.MetastoreServerException: 0420111:Database not found
Cause: The specified schema, project, or table information cannot be found when reading from or writing to a table.
Error message explanation: The metadata that the computation depends on cannot be found. As a result, the job cannot run.
Solution:
Check whether the project, schema, and table information used in the SQL is correct. If not, modify the information and retry the operation.
Contact the MaxCompute team.
Problem 22: ODPS-0010000:System internal error - fuxi job failed, caused by: process killed by signal 7
Cause: A job that contains a UDF fails.
Error message explanation: The UDF sends an abnormal signal during runtime.
Solution:
Check whether the UDF sends cancel, timeout, or other signals to the process.
Contact the MaxCompute team for troubleshooting.
Problem 23: ODPS-0010000:System internal error - fuxi job failed, caused by: StdException:vector::_M_range_insert
Cause: This error is related to a job that contains a UDF.
Error message explanation: The UDF cannot request enough memory at runtime, which causes the vector insertion to fail. Check the business code, dependency libraries, and memory settings.
Solution:
Check for memory issues in the UDF. Check whether native dependency libraries have memory issues and whether they are the latest versions. Increase the memory requested by the UDF.
Contact the MaxCompute team for troubleshooting.
Problem 24: ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: task:M1 instance count exceeds limit 99999
Cause: This can happen with any job if the source table is large. Incorrectly setting a split flag can also cause this issue.
Error message explanation
In a MaxCompute SQL job, if no settings are configured, the source table is chunked and processed in a distributed manner by default. The default chunk size is 256 MB. If the total number of chunks created exceeds 99,999, this error occurs.
Solution:
Use the `odps.stage.mapper.split.size` flag. The unit is megabytes (MB). The default value is 256, and the minimum value is 1. You can set a larger value to ensure that the total number of chunks is less than 99,999.
Use the `odps.sql.split.dop` flag. The minimum value is 1. This flag specifies the expected target number of chunks.
Due to various constraints, the final number of chunks may not equal the expected target number. Setting the number of chunks close to the upper limit may still cause an error. If both methods fail after you make multiple adjustments, contact the MaxCompute team.
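A minimal sketch of setting either flag before submitting the job; the values are illustrative:
from maxframe import options

options.sql.settings = {
    # Larger chunks keep the total chunk count below 99,999 for a large source table
    "odps.stage.mapper.split.size": "1024",
    # Or specify the expected number of chunks directly
    # "odps.sql.split.dop": "50000",
}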
Problem 25: ODPS-0110061:Failed to run ddltask - ODPS-0130131:Table not found
Cause: This error may occur in long-running MaxFrame jobs that run for more than one day.
Error message explanation
An internal MaxFrame Data Definition Language (DDL) task fails. This failure usually occurs when a single computation stage runs for more than 24 hours, which causes the table of an ancestor node in the same session to expire.
These tables are typically temporary tables created during the computation process, specifically, tables generated after a `df.execute()` call. Sink tables specified by `to_odps_table` usually do not have this problem.
Solution:
Set a longer time-to-live (TTL) for temporary tables. The unit is days. By default, the TTL of a temporary table is one day. If a computing job has multiple operators that may run across different days, you must set this parameter.
from maxframe import options

options.sql.settings = {
    "session.temp_table_lifecycle": 3,
}
Problem 26: NoTaskServerResponseError
Cause: In a Jupyter Notebook, you create a MaxFrame session and run some jobs. Then, you pause for more than 1 hour before running the next script. This error may occur.
Error message explanation
The MaxFrame session has expired and cannot be found.
Solution:
Recreate the session. However, the computation state from the previous cells will not be preserved.
If you expect a pause and want to continue running the job later, you must set the following parameter:
from maxframe import options

# Set the expiration to 24 hours. The default is 1 hour.
options.session.max_idle_seconds = 60 * 60 * 24
Problem 27: IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer: Error while type casting for column 'xx'
Cause: A column of type BIGINT or INT contains NULL or INF values, and the result is printed. This includes automatic printing in a Jupyter Notebook.
Error message explanation
MaxFrame data is built on DataFrames. When data is loaded locally, it is converted to a pandas DataFrame. In pandas, data of the BIGINT and INT types cannot be NULL. NULL values are treated as FLOAT.
Solution:
The MaxFrame team is working to resolve this issue. However, the type system is complex, and a clear timeline cannot be provided at this time. For now, you can consider the following methods:
Use `fillna` to fill NULL values before printing.
Use `astype` to convert to FLOAT before printing.
Do not print the column unless necessary.
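A minimal sketch of the first two workarounds; the table and column names are illustrative, and it assumes the pandas-compatible `fillna` and `astype` methods:
import maxframe.dataframe as md

df = md.read_odps_table("my_table", index_col="id")
# Fill NULL values before printing
print(df["xx"].fillna(0).execute())
# Or cast the column to FLOAT so that NULL values stay representable
print(df["xx"].astype("float").execute())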
Problem 28: ODPS-0010000:System internal error - fuxi job failed, SQL job failed after failover for too many times
Cause
The job includes a Reduce or Join operation.
A large `split.dop` value or a small `split.size` value is set, which generates many mapper instances.
A large `reducer.num` value or `joiner.num` value is set, which generates many reducer or joiner instances.
Error message explanation
The shuffle data is too large, which causes a Job Master out-of-memory (OOM) error.
Solution:
Reduce the number of mappers and reducers/joiners. The maximum number should not exceed 10,000.
Contact the MaxCompute team.
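A minimal sketch of lowering the instance counts with the stage flags mentioned above; the values are illustrative and should keep every stage below 10,000 instances:
from maxframe import options

options.sql.settings = {
    # A larger split size produces fewer mapper instances
    "odps.stage.mapper.split.size": "1024",
    # Cap the number of reducer and joiner instances
    "odps.stage.reducer.num": "2000",
    "odps.stage.joiner.num": "2000",
}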
Problem 29: ODPS-0010000:System internal error - task/common/task_resource_helper.cpp(747): OdpsException: ODPS-0020011:Invalid parameter - Total resource size must be <= 2048MB
Cause: The job contains a UDF, and the UDF depends on a large resource.
Error message explanation
The total size of resources that a UDF can depend on is 2048 MB. Jobs that exceed this limit cannot run.
Solution:
Try to use external volume acceleration to download the corresponding resources from Object Storage Service (OSS). This method provides faster download speeds and higher limits.
Problem 30: ODPS-0130071:[22,132] Semantic analysis exception - column values_list in source has incompatible type ARRAY/MAP/STRUCT
Cause: The data being processed contains arrays, maps, or structs.
Error message explanation
This may be a type declaration issue. The expected target column is not of the array, map, or struct type.
This may be a bug in the MaxFrame type system.
Solution:
Upgrade the MaxFrame client by running `pip install -U maxframe` and then retry the operation.
Contact the MaxFrame team for troubleshooting.
Problem 31: Shuffle output too large
Solution: Use the `odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize` flag to specify the shuffle space size in megabytes (MB).
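A minimal sketch of setting the flag through the session settings; the value is illustrative:
from maxframe import options

options.sql.settings = {
    # Shuffle space size in MB; raise it only as far as the job needs
    "odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize": "10240",
}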