Singular value decomposition (SVD) is a matrix factorization in linear algebra. It generalizes the diagonalization of normal matrices to arbitrary rectangular matrices. SVD is widely used in fields such as signal processing and statistics.

Background information

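Formula for singular value decomposition: X = U S V', where X is the input matrix, the columns of U and V are the left and right singular vectors, S is the diagonal matrix of singular values, and V' is the transpose of V.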
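The formula can be checked by hand on a small matrix. The following sketch uses a hand-picked 2 × 2 matrix whose decomposition is known in closed form and confirms that U S V' reproduces X. The matrix, the helper names `matmul` and `transpose`, and the tolerance are illustrative assumptions; none of them come from the component itself.

```python
import math

# Hand-picked 2 x 2 example (illustrative only, not output of the svd component).
X = [[3.0, 0.0],
     [4.0, 5.0]]

r10 = math.sqrt(10.0)
r2 = math.sqrt(2.0)

# Columns of U are the left singular vectors.
U = [[1 / r10, -3 / r10],
     [3 / r10, 1 / r10]]
# S holds the singular values on its diagonal, largest first.
S = [[3 * math.sqrt(5.0), 0.0],
     [0.0, math.sqrt(5.0)]]
# Columns of V are the right singular vectors.
V = [[1 / r2, -1 / r2],
     [1 / r2, 1 / r2]]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


# Rebuild X from its factors: X = U S V'.
reconstructed = matmul(matmul(U, S), transpose(V))
max_error = max(abs(reconstructed[i][j] - X[i][j])
                for i in range(2) for j in range(2))
print("largest reconstruction error:", max_error)
```

The reconstruction error should be at the level of floating-point rounding. The same check applies to the component's output when all singular values are kept. With fewer singular values, U S V' is only an approximation of X.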

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    | Tab | Parameter | Description |
    |---|---|---|
    | Fields Setting | Feature Columns | The columns that are used to store key-value pairs. The keys and values are separated by colons (:), and multiple key-value pairs are separated by commas (,). |
    | Parameters Setting | Number of Reserved Singular Values | The number of top singular values to compute. By default, all singular values are computed. |
    | | Accuracy Error | The maximum error that is allowed. |
    | Tuning | Memory Size per Node | The memory size of each node. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024]. |
    | | Number of Nodes | The number of nodes. This parameter must be used with the Memory Size per Node parameter. The value of this parameter must be a positive integer. Valid values: [1,9999]. |
    | | Lifetime | The lifecycle of the output table. |
  • Use commands
    PAI -name svd
        -project algo_public
        -DinputTableName=bank_data
        -DselectedColNames=col0
        -DenableSparse=true
        -Dk=5
        -DoutputUTableName=u_table
        -DoutputVTableName=v_table
        -DoutputSTableName=s_table;
    | Parameter | Required | Description | Default value |
    |---|---|---|---|
    | inputTableName | Yes | The input table that is used for training. | N/A |
    | selectedColNames | No | The columns that are selected from the input table for training. Separate multiple columns with commas (,). If the input is in the sparse format, columns of the STRING type are supported. If the input is a regular data table, columns of the INT and DOUBLE types are supported. | All columns |
    | inputTablePartitions | No | The partitions that are selected from the input table for training. Set this parameter in the Partition_name=value format. To specify multi-level partitions, use the name1=value1/name2=value2 format. If you specify multiple partitions, separate them with commas (,). | All partitions |
    | outputUTableName | Yes | The output table for the U matrix (left singular vectors). The matrix has m × sgNum dimensions, where m is the number of rows in the input table and sgNum is the number of calculated singular values. | N/A |
    | outputSTableName | Yes | The output table for the S matrix (the diagonal matrix of singular values). The matrix has sgNum × sgNum dimensions, where sgNum is the number of calculated singular values. | N/A |
    | outputVTableName | Yes | The output table for the V matrix (right singular vectors). The matrix has n × sgNum dimensions, where n is the number of columns of the input matrix and sgNum is the number of calculated singular values. | N/A |
    | k | Yes | The number of singular values to calculate. The number of singular values that are actually generated may be less than the value of k. | N/A |
    | tol | No | The convergence error. | 1.0e-06 |
    | enableSparse | No | Specifies whether data in the input table is in the sparse format. Valid values: true and false. | false |
    | itemDelimiter | No | The delimiter that is used to separate key-value pairs when data in the input table is in the sparse format. | Space |
    | kvDelimiter | No | The delimiter that is used to separate keys and values when data in the input table is in the sparse format. | Colon (:) |
    | coreNum | No | The number of cores. This parameter must be used with the memSizePerCore parameter. The value of this parameter must be a positive integer. Valid values: [1,9999]. | Determined by the system |
    | memSizePerCore | No | The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024]. | Determined by the system |
    | lifecycle | No | The lifecycle of the output table. The value must be a positive integer. | N/A |
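To make the sparse format concrete, the following sketch parses one sparse row using the default itemDelimiter (space) and kvDelimiter (colon). It is only an illustration of the text format. The function names `parse_sparse_row` and `to_dense` are hypothetical, and the sketch assumes that keys are zero-based integer column indices and that missing keys stand for 0. The component's internal parsing is not documented here and may differ.

```python
def parse_sparse_row(row, item_delimiter=" ", kv_delimiter=":"):
    """Parse one sparse row such as '0:3.9 2:0.5' into {column_index: value}."""
    entries = {}
    for item in row.split(item_delimiter):
        if not item:  # skip empty pieces left by leading or trailing delimiters
            continue
        key, value = item.split(kv_delimiter, 1)
        entries[int(key)] = float(value)  # assumption: keys are integer indices
    return entries


def to_dense(entries, num_columns):
    """Expand parsed entries into a dense row; missing columns become 0.0."""
    return [entries.get(j, 0.0) for j in range(num_columns)]


# One row taken from the Example section of this page (trailing space kept).
row = "2:0.767 5:0.01891 8:0.25235 "
parsed = parse_sparse_row(row)
dense = to_dense(parsed, num_columns=10)
print(parsed)
print(dense)
```

If a table uses different delimiters, pass them through `item_delimiter` and `kv_delimiter` in the same way that itemDelimiter and kvDelimiter are set on the command.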

Example

  • Generate input data
    drop table if exists svd_test_input;
    create table svd_test_input
    as
    select
        *
    from
    (
        select
            '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' as col0
        from dual
        union all
        select
            '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' as col0
        from dual
        union all
        select
            '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' as col0
        from dual
        union all
        select
            '2:0.767 5:0.01891 8:0.25235 ' as col0
        from dual
        union all
        select
            '0:0.29819 2:0.87598086 6:0.5315568 ' as col0
        from dual
        union all
        select
            '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' as col0
        from dual
    ) a;
  • Run commands
    PAI -name svd
        -project algo_public
        -DinputTableName=svd_test_input
        -DselectedColNames=col0
        -DenableSparse=true
        -Dk=5
        -DoutputUTableName=u_table
        -DoutputVTableName=v_table
        -DoutputSTableName=s_table;
  • Supported data scale: 100,000 columns
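As a rough check on what the example command should produce, the following sketch derives the matrix dimensions m and n from the example input and lists the dimensions of the U, S, and V matrices as described in the parameter table. It assumes that keys are zero-based column indices, so that n is the largest key plus 1, and that sgNum equals k, which is its upper bound. The helper `output_shapes` is hypothetical, and how each output table physically stores its matrix is not specified on this page.

```python
# Rows of svd_test_input, copied from the example above.
rows = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235 ",
    "0:0.29819 2:0.87598086 6:0.5315568 ",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]

# m is the number of input rows.
m = len(rows)
# Assumption: keys are zero-based column indices, so n = largest key + 1.
n = 1 + max(int(item.split(":")[0]) for row in rows for item in row.split())
k = 5  # value of -Dk in the example command


def output_shapes(m, n, sg_num):
    """Matrix dimensions per the parameter table: U is m x sgNum, S is sgNum x sgNum, V is n x sgNum."""
    return {
        "u_table": (m, sg_num),
        "s_table": (sg_num, sg_num),
        "v_table": (n, sg_num),
    }


# sgNum may be smaller than k; k is used here as the upper bound.
shapes = output_shapes(m, n, sg_num=k)
for table, shape in shapes.items():
    print(table, shape)
```

If the component returns fewer than k singular values, the same function applies with the smaller sgNum.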