The custom algorithm upload feature of Machine Learning Studio lets you develop algorithms in SQL, Spark 2.0, or PySpark 2.0, encapsulate them as components, and upload them to Machine Learning Studio. You can also publish the algorithms to the AI marketplace. This topic describes how to upload and publish a custom algorithm.

Prerequisites

An algorithm package is developed. For more information, see Develop an algorithm package. For examples, see Algorithm package example.

Background information

After you upload algorithms developed in SQL, Spark 2.0, or PySpark 2.0 to Machine Learning Studio, you are charged for running the algorithms. For more information about the pricing, visit https://icms.alibaba-inc.com/content/learn/afbd15?l=1&m=16768&n=2325848.

Procedure

  1. Log on to the Machine Learning Platform for AI (PAI) console.
  2. In the left-side navigation pane, click Algorithm release. On the Publish Algorithms page, click Create a custom algorithm.
  3. In the Create a custom algorithm panel, configure the parameters and click OK.
    Parameter: Description
    Algorithm name: The name of the custom algorithm. The name can be up to 10 characters in length and can contain only letters. Example: pyspark.
    Algorithm unique identification: Required. The unique ID of the algorithm. You can use the ID to query information such as logs. The value can contain only letters and digits.
    Algorithm framework: The framework that the custom algorithm component uses. Valid values: SQL, SPARK, and PySPARK.
    Algorithm package: The algorithm package to upload. The required file type depends on the value of the Algorithm framework parameter:
    • SQL: Upload an SQL script.
    • SPARK: Upload a JAR package.
    • PySPARK: Upload a ZIP package.
      Note: The files that you want to run must be stored in the root directory of the ZIP package. If you repackage the files on macOS, the system adds an extra directory layer to the ZIP package, which may cause the PySpark job to fail. We recommend that you use Linux packaging commands.
    For more information, see Algorithm package example.
    Types of algorithms: The folder in Machine Learning Studio to which the custom algorithm is published. Valid values: Data processing, Data analysis, Text Analysis, and Other.
    Entrance parameters: The entry file or function. This parameter is required only if the Algorithm framework parameter is set to PySPARK. Example: read_example.mainFunc.
    Entrance class name: The name of the entry class. This parameter is required only if the Algorithm framework parameter is set to SPARK. Example: com.aliyun.odps.spark.examples.simhash.SimHashSpark.
    Remarks: The remarks. The value can be up to 30 characters in length and can contain only letters.
  4. On the Publish Algorithms page, find the required algorithm and click Add version in the Version Management column.
    Note: A version defines the display mode of an algorithm component. Only algorithms that have a version configured can be published.
  5. In the Add version panel, set the Version number parameter and click OK.
  6. In the Add version panel, click Go to configuration. The PAI-STUDIO Algorithm Configuration page appears.
  7. In the Basic Control section, drag and drop a control to the Parameter Settings section.
    For example, after you upload the pyspark.zip algorithm package, drag and drop Single Field Control from the Basic Control section to the Parameter Settings section twice.
  8. In the Parameter Settings section, click the controls in sequence to configure basic information. Click the Save icon to complete the configuration.
    Parameter: Description
    Name: The name of the parameter in the algorithm code to which the control maps. For example, if you set this parameter to idCol, idCol in the code is the input of this component.
    Tag: The display name of the control.
    converter: This parameter is empty by default.
    Input/Output Port: The input or output port of the component, such as Input#1.
    Data Type: The data type of the control. Valid values: BIGINT, DOUBLE, BOOLEAN, STRING, and DATETIME. You must select at least one data type.
  9. Go back to the Publish Algorithms page and click Edit version. In the Edit version panel, click Use this version. After this operation, the algorithm can be published.
  10. Go back to the Publish Algorithms page and click the drop-down icon next to Release to select a publishing method.
    • Post to PAI Studio: If you select this method, specify a region and project. After the algorithm component is published, the component can be used only in the specified project. Your Alibaba Cloud account and all its RAM users share this component. In this example, this method is used.
    • Publish to AI Marketplace: If you select this method, the algorithm component is published to the AI marketplace. All PAI users can download and use the component.
  11. Optional: If you selected Post to PAI Studio, go back to the PAI console. In the left-side navigation pane, click Studio-Modeling Visualization.
  12. On the PAI Visualization Modeling page, find the required project and click Machine Learning in the Operation column to check whether the algorithm component is published.
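The note on the Algorithm package parameter in step 3 requires that the files to run sit in the root directory of the ZIP package. The following Python sketch shows one way to build such a package without a wrapper directory, and to check that the entry file is at the root before you upload it. It is only an illustration and is not part of PAI: the function names build_flat_zip and entry_module_at_root are hypothetical, the sample directory and the placeholder body of mainFunc are made up, and reading an entry value such as read_example.mainFunc as "file name, then function name" is an assumption.

```python
import tempfile
import zipfile
from pathlib import Path


def build_flat_zip(src_dir, zip_path):
    """Zip the files under src_dir so that they sit at the root of the archive.

    Write zip_path outside src_dir so the archive does not include itself.
    Returns the list of archive member names that were written.
    """
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and macOS Finder metadata files.
            if not path.is_file() or path.name == ".DS_Store":
                continue
            # Member names are relative to src_dir, so src_dir itself
            # does not appear as an extra top-level directory.
            arcname = path.relative_to(src).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


def entry_module_at_root(zip_path, entry):
    """Return True if the file named by the entry value is at the archive root.

    Assumption: an entry such as 'read_example.mainFunc' refers to the
    function mainFunc in the file read_example.py.
    """
    module = entry.rsplit(".", 1)[0]
    with zipfile.ZipFile(zip_path) as zf:
        return f"{module}.py" in zf.namelist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical source directory containing a single entry file.
        src = Path(tmp) / "pyspark_algo"
        src.mkdir()
        (src / "read_example.py").write_text("def mainFunc():\n    pass\n")

        out = Path(tmp) / "pyspark.zip"
        print(build_flat_zip(src, out))
        print(entry_module_at_root(out, "read_example.mainFunc"))
```

Because the member names are computed relative to the source directory, the result does not depend on the operating system on which the script runs. If the check returns False, the entry file is either missing or nested in a subdirectory of the package.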