Platform For AI: Singular-value Decomposition (SVD)

Last Updated: Oct 27, 2023

Singular value decomposition (SVD) is an important matrix decomposition in linear algebra. It generalizes the unitary diagonalization of normal matrices to arbitrary m × n matrices. SVD is widely used in fields such as signal processing and statistics.

Background information

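SVD formula: X = U S V', where V' is the transpose of V. The columns of U and V are orthonormal (the left and right singular vectors), and S is a diagonal matrix whose diagonal entries are the nonnegative singular values in descending order.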
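
For reference, the formula can be written in truncated form so that the dimensions line up with the component's three output tables. In this notation, k is the number of retained singular values (sgNum in the PAI parameter descriptions) and V' is written as V^T. This notation is ours, not the component's. Equality holds when k is at least the rank of X; otherwise U S V^T is the best rank-k approximation of X in the Frobenius norm (Eckart–Young theorem).

```latex
X \approx U S V^{\top}, \qquad
X \in \mathbb{R}^{m \times n},\;
U \in \mathbb{R}^{m \times k},\;
S = \operatorname{diag}(\sigma_1, \dots, \sigma_k) \in \mathbb{R}^{k \times k},\;
V \in \mathbb{R}^{n \times k}

U^{\top} U = V^{\top} V = I_k, \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0, \qquad
\sigma_i = \sqrt{\lambda_i\!\left(X^{\top} X\right)}
```

Here \lambda_i(X^T X) denotes the i-th largest eigenvalue of X^T X.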

Configure the component

You can use one of the following methods to configure the Singular-value Decomposition (SVD) component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Singular-value Decomposition (SVD) component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following list describes the parameters on each tab.

Fields Setting tab

  • Feature Columns: The columns that store the input data as key-value pairs. Keys and values are separated by colons (:), and multiple key-value pairs are separated by commas (,).

Parameters Setting tab

  • Reserved Singular Values: The number N of largest singular values, together with their singular vectors, to retain. By default, all singular values are computed.

  • Precision Error: The allowed error tolerance.

Tuning tab

  • Memory Size per Node (Unit: MB): The memory size of each core, in MB. This parameter is used together with the Cores parameter. The value must be a positive integer. Valid values: [1024, 64 × 1024].

  • Cores: The number of cores. This parameter is used together with the Memory Size per Node parameter. The value must be a positive integer. Valid values: [1, 9999].

  • Lifecycle: The lifecycle of the output table.
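
To make the Reserved Singular Values and Precision Error parameters concrete, the following is a minimal pure-Python sketch that estimates the N largest singular values by power iteration on X'X with deflation. It is an illustration only: the source does not state which algorithm PAI uses, and the function name top_singular_values and its arguments are hypothetical. The tol argument plays a role similar to a convergence tolerance, and the sketch can return fewer values than requested when the matrix has lower rank.

```python
import math
import random


def _matvec(a, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def _gram(x):
    """Return the n x n matrix X'X for an m x n matrix X given as rows."""
    n = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(n)]
            for i in range(n)]


def top_singular_values(x, n_values, tol=1.0e-6, max_iter=10_000, seed=0):
    """Estimate the n_values largest singular values of x (hypothetical helper).

    Power iteration on G = X'X finds its largest eigenvalue lam, and the
    singular value is sqrt(lam). Deflation removes that component so the
    next pass finds the next one. Stops early when no significant
    eigenvalue remains, so fewer than n_values values may be returned.
    """
    rng = random.Random(seed)
    g = _gram(x)
    n = len(g)
    values = []
    for _ in range(min(n_values, n)):
        v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start
        lam = 0.0
        for _ in range(max_iter):
            w = _matvec(g, v)
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:  # G maps v to zero: nothing left to find
                lam = 0.0
                break
            v = [wi / norm for wi in w]
            # Rayleigh quotient v'Gv estimates the current top eigenvalue.
            new_lam = sum(vi * wi for vi, wi in zip(v, _matvec(g, v)))
            converged = abs(new_lam - lam) <= tol * max(1.0, abs(new_lam))
            lam = new_lam
            if converged:
                break
        if lam <= tol:
            break  # no significant singular values remain
        values.append(math.sqrt(lam))
        # Hotelling deflation: G <- G - lam * v v'.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return values


print(top_singular_values([[3.0, 0.0], [0.0, 4.0]], 2))
print(top_singular_values([[1.0, 1.0], [1.0, 1.0]], 2))  # rank 1: one value
```

The second call asks for two values but gets one, which mirrors the note that the number of generated singular values can be smaller than requested.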

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name svd
    -project algo_public
    -DinputTableName=bank_data
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;

The following list describes the parameters.

  • inputTableName (required): The input table that is used for training. Default value: none.

  • selectedColNames (optional): The columns in the input table that are used for training. Separate multiple column names with commas (,). If the input data is in the sparse format, the columns must be of the STRING type. If the input data is a dense data table, the columns must be of the INT or DOUBLE type. Default value: all columns.

  • inputTablePartitions (optional): The partitions in the input table that are used for training, in the Partition_name=value format. For multi-level partitions, use the name1=value1/name2=value2 format. Separate multiple partitions with commas (,). Default value: all partitions.

  • outputUTableName (required): The output table for the U matrix (left singular vectors). The matrix has m × sgNum dimensions, where m is the number of rows in the input table and sgNum is the number of singular values that are actually computed. Default value: none.

  • outputSTableName (required): The output table for the S matrix, the diagonal matrix of singular values. The matrix has sgNum × sgNum dimensions, where sgNum is the number of singular values that are actually computed. Default value: none.

  • outputVTableName (required): The output table for the V matrix (right singular vectors). The matrix has n × sgNum dimensions, where n is the number of columns of the input matrix and sgNum is the number of singular values that are actually computed. Default value: none.

  • k (required): The number of singular values to compute. The number of singular values that are actually computed (sgNum) can be smaller than k. Default value: none.

  • tol (optional): The convergence tolerance. Default value: 1.0e-06.

  • enableSparse (optional): Specifies whether the input data is in the sparse format. Valid values: true and false. Default value: false.

  • itemDelimiter (optional): The delimiter between key-value pairs when the input data is in the sparse format. Default value: a space.

  • kvDelimiter (optional): The delimiter between a key and its value when the input data is in the sparse format. Default value: a colon (:).

  • coreNum (optional): The number of cores. This parameter is used together with the memSizePerCore parameter. The value must be a positive integer. Valid values: [1, 9999]. Default value: determined by the system.

  • memSizePerCore (optional): The memory size of each core, in MB. The value must be a positive integer. Valid values: [1024, 64 × 1024]. Default value: determined by the system.

  • lifecycle (optional): The lifecycle of the output table. The value must be a positive integer. Default value: none.
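
If you generate the PAI command from a script, a small helper can assemble the -D options and catch missing required parameters before the job is submitted. The following is a minimal sketch, assuming the command layout of the examples on this page; the helper name build_svd_command is hypothetical, it checks only the constraints stated in the parameter descriptions (required parameters, and the coreNum and memSizePerCore ranges), and the k > 0 check is our own assumption.

```python
REQUIRED_PARAMS = ("inputTableName", "outputUTableName", "outputSTableName",
                   "outputVTableName", "k")


def build_svd_command(params):
    """Assemble a `PAI -name svd` command from a dict of parameter values.

    Hypothetical helper: it validates only the documented constraints plus
    an assumed k > 0 check, and does not contact PAI.
    """
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    if int(params["k"]) < 1:  # assumption: k must be a positive integer
        raise ValueError("k must be a positive integer")
    if "coreNum" in params and not 1 <= int(params["coreNum"]) <= 9999:
        raise ValueError("coreNum must be in [1, 9999]")
    if ("memSizePerCore" in params
            and not 1024 <= int(params["memSizePerCore"]) <= 64 * 1024):
        raise ValueError("memSizePerCore must be in [1024, 64 * 1024] MB")

    lines = ["PAI -name svd", "    -project algo_public"]
    for name, value in params.items():
        if isinstance(value, bool):  # render Python booleans as true/false
            value = str(value).lower()
        lines.append(f"    -D{name}={value}")
    return "\n".join(lines) + ";"


print(build_svd_command({
    "inputTableName": "svd_test_input",
    "selectedColNames": "col0",
    "enableSparse": True,
    "k": 5,
    "outputUTableName": "u_table",
    "outputVTableName": "v_table",
    "outputSTableName": "s_table",
}))
```

The printed command has the same options as the example PAI command on this page. Options are emitted in the order of the dict, so callers control the layout.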

Example

  • Input data

    drop table if exists svd_test_input;
    create table svd_test_input
    as
    select
        *
    from
    (
        select
            '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' as col0
        from dual
        union all
        select
            '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' as col0
        from dual
        union all
        select
            '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' as col0
        from dual
        union all
        select
            '2:0.767 5:0.01891 8:0.25235 ' as col0
        from dual
        union all
        select
            '0:0.29819 2:0.87598086 6:0.5315568 ' as col0
        from dual
        union all
        select
            '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' as col0
        from dual
    ) a;

  • PAI command

    PAI -name svd
        -project algo_public
        -DinputTableName=svd_test_input
        -DselectedColNames=col0
        -DenableSparse=true
        -Dk=5
        -DoutputUTableName=u_table
        -DoutputVTableName=v_table
        -DoutputSTableName=s_table;

  • Analysis scale: 100,000 columns
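
To see how the sparse strings in the example input correspond to a matrix, the following minimal sketch expands each col0 value into a dense row, using a space between key-value pairs and a colon between key and value, as in the example data. It assumes that keys are 0-based column indices and that the column count is the largest key plus one; the source does not say how PAI determines n, and the helper name parse_sparse_row is hypothetical.

```python
def parse_sparse_row(text, n_cols, item_delimiter=" ", kv_delimiter=":"):
    """Turn one sparse 'index:value' string into a dense row of floats.

    Indices that do not appear in the string are filled with 0.0.
    """
    row = [0.0] * n_cols
    for item in text.split(item_delimiter):
        item = item.strip()
        if not item:  # skip empty pieces, e.g. from a trailing space
            continue
        key, value = item.split(kv_delimiter, 1)
        row[int(key)] = float(value)
    return row


# The six col0 values from the example input table.
EXAMPLE_ROWS = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235 ",
    "0:0.29819 2:0.87598086 6:0.5315568 ",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]

# Assumption: column count = largest key + 1 (keys are 0-based).
n = 1 + max(int(item.split(":")[0])
            for row in EXAMPLE_ROWS for item in row.split())
dense = [parse_sparse_row(row, n) for row in EXAMPLE_ROWS]
m, k = len(dense), 5

# With -Dk=5, U is at most m x k, S at most k x k, and V at most n x k.
print(f"X: {m} x {n}; U: {m} x {k}, S: {k} x {k}, V: {n} x {k}")
```

Under these assumptions the example input is a 6 × 10 matrix, so the output tables have at most 6 × 5 (U), 5 × 5 (S), and 10 × 5 (V) entries.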