MaxFrame provides an automatic packaging service that simplifies the management of third-party Python packages. During job development, you declare the external packages that the job depends on. When the job runs, these dependencies are automatically packaged and added to the job environment, so you do not need to upload the packages manually.
Precautions
If you access MaxFrame by using the standard MaxFrame SDK (on-premises MaxFrame client), make sure that the version of the MaxFrame SDK is V0.1.0b5 or later. For more information about how to access MaxFrame, see Preparations.
Description of the automatic packaging service
To use the automatic packaging service, decorate your function with the with_python_requirements function. The function has the following signature:
def with_python_requirements(
    *requirements: str,
    force_rebuild: bool = False,
    prefer_binary: bool = False,
    pre_release: bool = False,
): ...
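Because the options follow *requirements in the signature, Python treats force_rebuild, prefer_binary, and pre_release as keyword-only arguments, so they must be passed by name. The following sketch uses a hypothetical stand-in named record_requirements that mimics only this calling convention by attaching its arguments to the decorated function. It is not the MaxFrame implementation and does not package anything.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def record_requirements(
    *requirements: str,
    force_rebuild: bool = False,
    prefer_binary: bool = False,
    pre_release: bool = False,
) -> Callable[[F], F]:
    """Hypothetical stand-in with the same signature as with_python_requirements.

    It only records its arguments on the decorated function so that the
    calling convention can be inspected. No packaging happens here.
    """

    def decorator(func: F) -> F:
        func.requirements = list(requirements)  # one requirement string per argument
        func.pack_options = {
            "force_rebuild": force_rebuild,
            "prefer_binary": prefer_binary,
            "pre_release": pre_release,
        }
        return func

    return decorator


@record_requirements("scikit_learn>1.0", "xgboost>1.0", prefer_binary=True)
def train(row):
    return row


print(train.requirements)  # ['scikit_learn>1.0', 'xgboost>1.0']
print(train.pack_options["prefer_binary"])  # True
```

In Python, a positional True written after the requirement strings is collected into *requirements instead of setting an option, so always pass the options by name.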
Parameter description:
requirements: required. One or more external dependency packages, each passed as a separate string. Each declaration must follow the PEP 508 requirement specifier format used by pip, the Python package installer. Sample code:
@with_python_requirements("scikit_learn>1.0", "xgboost>1.0")
force_rebuild: optional. Specifies whether to package the dependencies again if a package for them already exists. Valid values:
False: Existing packages are reused and are not packaged again. Packages generated in this mode are stored as temporary resources and are deleted daily. This is the default value.
True: Packaging is always performed again by using the latest packages from the PyPI mirror, and the generated packages are stored as long-term resources.
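The reuse rule described for force_rebuild can be pictured as a cache keyed on the declared requirements. The following toy model is for illustration only: the PackageCache class, its order-insensitive key, and the retention labels are assumptions and do not reflect how MaxFrame stores packages internally. Daily deletion of temporary resources is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PackageCache:
    """Hypothetical model of the force_rebuild behavior, not MaxFrame internals."""

    # Maps a normalized requirement set to (build number, retention label).
    entries: Dict[Tuple[str, ...], Tuple[int, str]] = field(default_factory=dict)
    builds: int = 0

    def get_package(self, *requirements: str, force_rebuild: bool = False) -> Tuple[int, str]:
        # Assumption: the order of the declared requirements does not affect reuse.
        key = tuple(sorted(requirements))
        if not force_rebuild and key in self.entries:
            return self.entries[key]  # reuse the existing package
        self.builds += 1  # package the dependencies again
        retention = "long-term" if force_rebuild else "temporary (deleted daily)"
        self.entries[key] = (self.builds, retention)
        return self.entries[key]


cache = PackageCache()
print(cache.get_package("xgboost>1.0", "scikit_learn>1.0"))  # (1, 'temporary (deleted daily)')
print(cache.get_package("scikit_learn>1.0", "xgboost>1.0"))  # (1, 'temporary (deleted daily)')
print(cache.get_package("scikit_learn>1.0", "xgboost>1.0", force_rebuild=True))  # (2, 'long-term')
```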
prefer_binary: optional. Specifies whether to prefer binary wheel files over source distributions when packaging. Valid values:
False: Binary wheel files are not preferred. This is the default value.
True: Binary wheel files are preferred.
Setting this parameter to True has the same effect as passing --prefer-binary to pip: pip selects binary wheels over source distributions even when the source distributions are newer. Packaging is usually faster, but the latest version of a package may not be used.
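pip documents --prefer-binary as preferring binary packages over source packages even if the source packages are newer. The following sketch models that choice for a hypothetical list of (version, is_wheel) candidates. Real pip also considers platform compatibility tags, yanked releases, and full PEP 440 version ordering, which this simplified model ignores.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate for one package: (version tuple, is_binary_wheel).
Candidate = Tuple[Tuple[int, ...], bool]


def choose(candidates: List[Candidate], prefer_binary: bool = False) -> Optional[Candidate]:
    """Pick a candidate the way --prefer-binary is documented to behave (simplified)."""
    if not candidates:
        return None
    if prefer_binary:
        wheels = [c for c in candidates if c[1]]
        if wheels:
            # Newest wheel wins, even if a newer source distribution exists.
            return max(wheels, key=lambda c: c[0])
    # Default: newest version wins; among equal versions a wheel is preferred.
    return max(candidates, key=lambda c: (c[0], c[1]))


candidates = [((1, 2, 0), True), ((1, 3, 0), False)]  # 1.3.0 exists only as source
print(choose(candidates))  # ((1, 3, 0), False)
print(choose(candidates, prefer_binary=True))  # ((1, 2, 0), True)
```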
pre_release: optional. Specifies whether pre-release versions, such as alpha and beta releases, can be packaged. Valid values:
False: Pre-release versions are not packaged. This is the default value.
True: Pre-release versions can be packaged.
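PEP 440 marks pre-release versions with suffixes such as a1, b2, or rc1, and pip skips pre-release and development versions by default unless --pre is specified (with exceptions, such as a requirement that explicitly names a pre-release). The following sketch assumes that pre_release behaves like pip's --pre and filters a hypothetical list of version strings. Its regular expression covers only common PEP 440 forms, not the full grammar.

```python
import re
from typing import List

# Common PEP 440 pre-release and development markers right after a digit,
# for example "2.0a1", "2.0b2", "2.0rc1", "2.0.dev3". Simplified, not the full grammar.
_PRE_MARKER = re.compile(
    r"\d[._-]?(?:alpha|beta|preview|pre|rc|a|b|c|dev)\d*", re.IGNORECASE
)


def is_prerelease(version: str) -> bool:
    public = version.split("+", 1)[0]  # ignore any local version label
    return _PRE_MARKER.search(public) is not None


def visible_versions(versions: List[str], pre_release: bool = False) -> List[str]:
    """Filter versions like pip's default pre-release handling (simplified)."""
    if pre_release:
        return list(versions)
    return [v for v in versions if not is_prerelease(v)]


versions = ["1.9.0", "2.0.0a1", "2.0.0b2", "2.0.0rc1", "2.0.0.dev3", "1.9.0.post1"]
print(visible_versions(versions))  # ['1.9.0', '1.9.0.post1']
print(len(visible_versions(versions, pre_release=True)))  # 6
```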
Sample code
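The following example declares the jieba, cloudpickle, and pandas packages for a function that is applied to each row of a MaxFrame DataFrame. The packages are automatically packaged when the job runs.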
import os

from odps import ODPS
import maxframe.dataframe as md
from maxframe import new_session
o = ODPS(
    # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_ID to the AccessKey ID of your Alibaba Cloud account.
    # Set the environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET to the AccessKey secret of your Alibaba Cloud account.
    # We recommend that you do not directly use the strings of your AccessKey ID and AccessKey secret.
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    project='your-default-project',
    endpoint='your-end-point'
)
session = new_session(o)
data = [["abcd"], ["efgh"], ["ijkl"], ["mno"]]
md_df = md.DataFrame(data, columns=["col1"])
# Call the automatic packaging service.
from maxframe.udf import with_python_requirements
# Declare each requirement as a separate string.
@with_python_requirements("jieba==0.40", "cloudpickle", "pandas")
def process(row):
    import jieba
    row["col1"] = row["col1"] + "_" + jieba.__version__
    return row
md_result = (
    md_df.apply(
        process,
        axis=1,
        result_type="expand",
        output_type="dataframe",
        dtypes=md_df.dtypes.copy(),
    )
    .execute()
    .fetch()
)
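Because process is an ordinary Python function, its row logic can be checked locally before the job is submitted. The following sketch passes plain dictionaries instead of pandas rows and replaces jieba.__version__ with a hypothetical VERSION constant, so it runs without MaxFrame, pandas, or jieba installed. The function name process_local is also hypothetical.

```python
# Hypothetical stand-in for jieba.__version__. In the real job, the value comes
# from the jieba package that the automatic packaging service installs.
VERSION = "x.y"


def process_local(row: dict) -> dict:
    # Same transformation as process(): append "_" and the version to col1.
    row["col1"] = row["col1"] + "_" + VERSION
    return row


data = [["abcd"], ["efgh"], ["ijkl"], ["mno"]]
rows = [process_local({"col1": values[0]}) for values in data]
print([r["col1"] for r in rows])  # ['abcd_x.y', 'efgh_x.y', 'ijkl_x.y', 'mno_x.y']
```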