Singular value decomposition (SVD) is an important matrix decomposition in linear algebra. SVD is a generalization of the diagonalization of normal matrices in matrix analysis. SVD is widely used in fields such as signal processing and statistics.
Background information
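SVD formula: X = U S V', where X is the input matrix, the columns of U and V are the left and right singular vectors, S is a diagonal matrix that contains the singular values, and V' is the transpose of V.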
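The formula can be checked on a small matrix. The following is a minimal sketch that uses NumPy rather than the PAI component. The matrix values and the choice of k = 2 are arbitrary and only illustrate how the factors relate to X.

```python
import numpy as np

# Small dense example matrix X (4 rows, 3 columns); the values are arbitrary.
X = np.array([
    [3.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 4.0],
    [2.0, 0.0, 1.0],
])

# Thin SVD: U is 4x3, s holds the 3 singular values, Vt is V' (3x3).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)

# Multiplying the factors back together recovers X up to rounding error.
X_rebuilt = U @ S @ Vt
print(np.allclose(X, X_rebuilt))  # True

# Keeping only the top k singular values gives a rank-k approximation of X.
k = 2
X_k = U[:, :k] @ S[:k, :k] @ Vt[:k, :]

# The spectral-norm error of the rank-k approximation equals the first
# discarded singular value, s[k].
print(np.linalg.norm(X - X_k, 2), s[k])
```

The singular values in s are returned in descending order, which is why truncating to the first k entries keeps the largest ones.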
Configure the component
You can use one of the following methods to configure the Singular-value Decomposition (SVD) component.
Method 1: Configure the component on the pipeline page
You can configure the parameters of the Singular-value Decomposition (SVD) component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description |
Fields Setting | Feature Columns | The columns that are used to store key-value pairs. The keys and values are separated by colons (:), and multiple key-value pairs are separated by commas (,). |
Parameters Setting | Reserved Singular Values | The top N singular values and their singular vectors that you want to compute. By default, all of them are computed. |
Parameters Setting | Precision Error | The error precision that is allowed. |
Tuning | Memory Size per Node (Unit: MB) | The memory size of each core. Unit: MB. This parameter is used with the Cores parameter. The value must be a positive integer. Valid values: [1024, 64 × 1024]. |
Tuning | Cores | The number of cores. This parameter is used with the Memory Size per Node parameter. The value must be a positive integer. Valid values: [1, 9999]. |
Tuning | Lifecycle | The lifecycle of the output table. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name svd
-project algo_public
-DinputTableName=bank_data
-DselectedColNames=col0
-DenableSparse=true
-Dk=5
-DoutputUTableName=u_table
-DoutputVTableName=v_table
-DoutputSTableName=s_table;
Parameter | Required | Description | Default value |
inputTableName | Yes | The input table that is used for training. | No default value |
selectedColNames | No | The columns that are selected from the input table for training. Separate the columns with commas (,). If the input data is in the sparse format, columns of the STRING type are supported. If the input data is a dense table, columns of the INT and DOUBLE types are supported. | All columns |
inputTablePartitions | No | The partitions that are selected from the input table for training. If you specify multiple partitions, separate them with commas (,). | All partitions |
outputUTableName | Yes | The output table of the U matrix (left singular vectors). | No default value |
outputSTableName | Yes | The output table of the S matrix (singular values). | No default value |
outputVTableName | Yes | The output table of the V matrix (right singular vectors). | No default value |
k | Yes | The number of expected singular values. The component may return fewer singular values than the value of k. | No default value |
tol | No | The convergence error. | 1.0e-06 |
enableSparse | No | Specifies whether data in the input table is in the sparse format. Valid values: true and false. | false |
itemDelimiter | No | The delimiter that is used to separate key-value pairs when data in the input table is in the sparse format. | Space |
kvDelimiter | No | The delimiter that is used to separate keys and values when data in the input table is in the sparse format. | : |
coreNum | No | The number of cores. This parameter is used with the memSizePerCore parameter. The value must be a positive integer. Valid values: [1,9999]. | Determined by the system |
memSizePerCore | No | The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: [1024,64 × 1024]. | Determined by the system |
lifecycle | No | The lifecycle of the output table. The value must be a positive integer. | No default value |
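The roles of itemDelimiter and kvDelimiter can be illustrated with a short parser. This is a minimal sketch, not the parser that the component uses. The function name parse_sparse_row and its arguments are hypothetical, the keys are assumed to be integer column indexes, and a space is assumed between key-value pairs, as in the example input on this page.

```python
def parse_sparse_row(row, item_delimiter=" ", kv_delimiter=":"):
    """Parse one sparse row such as '0:3.9 2:0.5' into {column_index: value}.

    item_delimiter and kv_delimiter mirror the itemDelimiter and kvDelimiter
    parameters; the defaults here are assumptions for illustration.
    """
    entries = {}
    for item in row.split(item_delimiter):
        if not item:
            # Skip empty pieces caused by trailing or repeated delimiters.
            continue
        key, value = item.split(kv_delimiter, 1)
        entries[int(key)] = float(value)
    return entries


row = "2:0.767 5:0.01891 8:0.25235 "
print(parse_sparse_row(row))  # {2: 0.767, 5: 0.01891, 8: 0.25235}
```

Columns that do not appear in a row, such as 0 and 1 in the row above, are treated as zeros when the matrix is assembled.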
Example
Input data
drop table if exists svd_test_input;
create table svd_test_input
as
select
  *
from
(
  select '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' as col0 from dual
  union all
  select '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' as col0 from dual
  union all
  select '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' as col0 from dual
  union all
  select '2:0.767 5:0.01891 8:0.25235 ' as col0 from dual
  union all
  select '0:0.29819 2:0.87598086 6:0.5315568 ' as col0 from dual
  union all
  select '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' as col0 from dual
) a;
PAI command
PAI -name svd
-project algo_public
-DinputTableName=svd_test_input
-DselectedColNames=col0
-DenableSparse=true
-Dk=5
-DoutputUTableName=u_table
-DoutputVTableName=v_table
-DoutputSTableName=s_table;
Analysis scale: 100,000 columns
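To cross-check the example on a local machine, the same six rows can be decomposed with NumPy. This is a sketch under stated assumptions and does not reproduce the component's algorithm or the layout of its output tables: the keys are assumed to be 0-based column indexes, the matrix is assumed to have 10 columns because the largest key is 9, and the names U_k, s_k, and V_k are illustrative only. Singular vectors are unique only up to sign, so individual entries may differ in sign from other implementations.

```python
import numpy as np

# The six rows of svd_test_input.col0, copied from the input data above.
rows = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235 ",
    "0:0.29819 2:0.87598086 6:0.5315568 ",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]

# Build a dense 6 x 10 matrix: one matrix row per table row,
# with each key used as a 0-based column index (an assumption).
n_cols = 10
X = np.zeros((len(rows), n_cols))
for i, row in enumerate(rows):
    for item in row.split():  # split on whitespace, ignoring trailing spaces
        key, value = item.split(":")
        X[i, int(key)] = float(value)

# Full thin SVD, then keep the top k = 5 singular values and vectors,
# matching -Dk=5 in the PAI command.
k = 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

print(U_k.shape, s_k.shape, V_k.shape)  # (6, 5) (5,) (10, 5)
print(s_k)  # top 5 singular values in descending order
```

For matrices at the scale of 100,000 columns, a dense decomposition like this one is not practical, and the sketch is intended only for small sanity checks.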