Singular value decomposition (SVD) is a matrix factorization technique in linear algebra that generalizes the eigendecomposition of square normal matrices to arbitrary m × n matrices. It decomposes a matrix X into three components: X = U S V', where V' denotes the transpose of V. SVD is widely used in signal processing and statistics.
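In standard matrix notation, with r denoting the number of singular values that are kept and V' written as the transpose V^⊤, the truncated factorization reads as follows. This is the textbook definition, not a description of how the PAI component computes it.

$$
X \approx U S V^{\top}, \qquad
U \in \mathbb{R}^{m \times r},\;
S = \operatorname{diag}(\sigma_1, \ldots, \sigma_r),\;
V \in \mathbb{R}^{n \times r},
$$
$$
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0, \qquad U^{\top} U = V^{\top} V = I_r .
$$

Equality holds when r equals the rank of X; otherwise U S V^⊤ is the best rank-r approximation of X in the Frobenius norm (Eckart–Young theorem).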
How it works
SVD factorizes an input matrix X (m rows × n columns) into three output matrices:
| Output | Dimensions | Description |
|---|---|---|
| U | m × sgNum | Left singular vectors; the columns are orthonormal |
| S | sgNum × sgNum | Diagonal matrix whose diagonal entries are the singular values |
| V | n × sgNum | Right singular vectors; the columns are orthonormal |
Here, sgNum is the number of singular values actually computed (which may be less than the requested k), m is the number of rows in the input table, and n is the number of columns.
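To make the output shapes concrete, here is a minimal NumPy sketch (not the PAI implementation) that truncates a full SVD to at most k singular values. The rule used here to decide sgNum, keeping only singular values larger than `rank_tol` times the largest one, is an assumption for illustration; `truncated_svd` and `rank_tol` are hypothetical names, and the component's own criterion for returning fewer than k values may differ.

```python
import numpy as np


def truncated_svd(X, k, rank_tol=1e-10):
    """Return U (m x sgNum), S (sgNum x sgNum), V (n x sgNum) for the top-k singular values.

    Illustrative only: sgNum is capped by k and by min(m, n), and singular values
    below rank_tol * sigma_max are treated as zero (an assumed rule, not PAI's).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # s is sorted in descending order
    if s.size == 0 or s[0] == 0.0:
        sg_num = 0
    else:
        sg_num = int(np.sum(s[:k] > rank_tol * s[0]))
    return U[:, :sg_num], np.diag(s[:sg_num]), Vt[:sg_num, :].T


# Rank-2 example: column 2 equals column 0 + column 1, so only two singular values are nonzero.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U, S, V = truncated_svd(X, k=3)
print(U.shape, S.shape, V.shape)  # (4, 2) (2, 2) (3, 2): sgNum = 2 < k = 3
print(np.allclose(U @ S @ V.T, X))  # True: two singular values reconstruct X exactly
```

Requesting k = 3 here yields sgNum = 2 because X has rank 2, which mirrors the note that the number of returned singular values can be smaller than k.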
Configure the component
Use the Machine Learning Platform for AI console
| Tab | Parameter | Description |
|---|---|---|
| Fields setting | Feature columns | Columns storing key-value pairs. Separate keys from values with a colon (:), and separate multiple key-value pairs with a comma (,). |
| Parameters setting | Number of reserved singular values | The top N singular values to compute. Computes all singular values by default. |
| Parameters setting | Accuracy error | Error tolerance used to decide when the computation has converged. |
| Tuning | Memory size per node | Memory allocated to each node, in MB. Must be used with Number of nodes. Valid values: 1024–64 × 1024 (positive integer). |
| Tuning | Number of nodes | Number of compute nodes. Must be used with Memory size per node. Valid values: 1–9999 (positive integer). |
| Tuning | Lifetime | Lifecycle of the output table, in days. |
Use commands
Submit the SVD job from the command line:
PAI -name svd
-project algo_public
-DinputTableName=bank_data
-DselectedColNames=col0
-DenableSparse=true
-Dk=5
-DoutputUTableName=u_table
-DoutputVTableName=v_table
-DoutputSTableName=s_table;
Input parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| inputTableName | Yes | — | Input table used for training. |
| selectedColNames | No | All columns | Comma-separated list of columns to include. Use STRING columns for sparse input; use INT or DOUBLE columns for dense input. |
| inputTablePartitions | No | All partitions | Partitions to read from the input table. Format: partition_name=value. For multi-level partitions, use name1=value1/name2=value2. Separate multiple partitions with commas (,). |
| enableSparse | No | false | Set to true if the input data is in sparse key-value format. |
| itemDelimiter | No | Space | Delimiter between key-value pairs in sparse format. |
| kvDelimiter | No | : | Delimiter between a key and its value in sparse format. |
| k | Yes | — | Number of singular values to compute. The actual number returned may be less than k. |
| tol | No | 1.0e-06 | Convergence error threshold. |
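The sparse input format controlled by enableSparse, itemDelimiter, and kvDelimiter can be previewed locally with a small parser. This is a sketch under two assumptions that the table does not state: keys are 0-based integer column indices, and the number of columns n is the largest key plus one. `parse_sparse_rows` is a hypothetical helper, not part of PAI.

```python
def parse_sparse_rows(rows, item_delimiter=" ", kv_delimiter=":"):
    """Expand sparse "key:value" strings into a dense list-of-lists matrix.

    Hypothetical helper: assumes keys are 0-based integer column indices and
    fills missing keys with 0.0.
    """
    parsed = []
    max_key = -1
    for row in rows:
        row_entries = {}
        for item in row.split(item_delimiter):
            if not item:
                continue  # skip empty items caused by repeated delimiters
            key, value = item.split(kv_delimiter, 1)
            row_entries[int(key)] = float(value)
            max_key = max(max_key, int(key))
        parsed.append(row_entries)
    n_cols = max_key + 1  # assumed: n is the largest key plus one
    return [[e.get(j, 0.0) for j in range(n_cols)] for e in parsed]


# Default delimiters: space between pairs, colon between key and value
print(parse_sparse_rows(["0:1.5 2:3", "1:2"]))
# [[1.5, 0.0, 3.0], [0.0, 2.0, 0.0]]

# A comma as itemDelimiter
print(parse_sparse_rows(["0:1,3:4"], item_delimiter=","))
# [[1.0, 0.0, 0.0, 4.0]]
```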
Output parameters
| Parameter | Required | Description |
|---|---|---|
| outputUTableName | Yes | Output table for the U matrix (m × sgNum). |
| outputSTableName | Yes | Output table for the S matrix (sgNum × sgNum). |
| outputVTableName | Yes | Output table for the V matrix (n × sgNum). |
Resource parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| coreNum | No | System default | Number of cores. Must be used with memSizePerCore. Valid values: 1–9999 (positive integer). |
| memSizePerCore | No | System default | Memory per core, in MB. Must be used with coreNum. Valid values: 1024–64 × 1024 (positive integer). |
| lifecycle | No | — | Lifecycle of the output table, in days (positive integer). |
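Because coreNum and memSizePerCore must be set together and each has a documented range, a small pre-submission check can catch mistakes before the job is sent. `resource_flags` is a hypothetical helper that encodes only the constraints listed in the resource parameter table and builds the corresponding -D arguments; it does not call PAI.

```python
def resource_flags(core_num=None, mem_size_per_core=None):
    """Validate coreNum/memSizePerCore and return the matching -D arguments.

    Hypothetical helper: it encodes only the documented constraints
    (both-or-neither, coreNum 1-9999, memSizePerCore 1024-65536 MB).
    """
    if (core_num is None) != (mem_size_per_core is None):
        raise ValueError("coreNum and memSizePerCore must be set together")
    if core_num is None:
        return []  # neither set: the system defaults apply
    for name, value in (("coreNum", core_num), ("memSizePerCore", mem_size_per_core)):
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be a positive integer")
    if not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be between 1 and 9999")
    if not 1024 <= mem_size_per_core <= 64 * 1024:
        raise ValueError("memSizePerCore must be between 1024 and 65536 (MB)")
    return [f"-DcoreNum={core_num}", f"-DmemSizePerCore={mem_size_per_core}"]


print(resource_flags(10, 4096))  # ['-DcoreNum=10', '-DmemSizePerCore=4096']
print(resource_flags())          # []
```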
Example
This example runs SVD on a sparse input table with six rows whose keys (column indices) range from 0 to 9, and computes the top 5 singular values.
Step 1: Create the input table.
DROP TABLE IF EXISTS svd_test_input;
CREATE TABLE svd_test_input
AS
SELECT *
FROM
(
SELECT '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' AS col0
UNION ALL
SELECT '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' AS col0
UNION ALL
SELECT '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' AS col0
UNION ALL
SELECT '2:0.767 5:0.01891 8:0.25235' AS col0
UNION ALL
SELECT '0:0.29819 2:0.87598086 6:0.5315568' AS col0
UNION ALL
SELECT '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' AS col0
) a;
Step 2: Run the SVD job.
PAI -name svd
-project algo_public
-DinputTableName=svd_test_input
-DselectedColNames=col0
-DenableSparse=true
-Dk=5
-DoutputUTableName=u_table
-DoutputVTableName=v_table
-DoutputSTableName=s_table;
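To sanity-check the job, you can compute the SVD of the same six rows locally with NumPy and compare the shapes and magnitudes. This sketch assumes keys are 0-based column indices (giving a 6 × 10 matrix) and uses NumPy's dense SVD rather than the PAI algorithm, so the signs of singular vectors may differ and the PAI output may contain fewer than k values. `to_dense` and `ROWS` are illustrative names.

```python
import numpy as np

# The six sparse rows from svd_test_input (space between pairs, colon inside a pair)
ROWS = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235",
    "0:0.29819 2:0.87598086 6:0.5315568",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]


def to_dense(rows):
    """Expand "key:value" strings into a dense array; assumes 0-based column keys."""
    pairs = [[item.split(":") for item in row.split()] for row in rows]
    n_cols = 1 + max(int(key) for row in pairs for key, _ in row)
    X = np.zeros((len(rows), n_cols))
    for i, row in enumerate(pairs):
        for key, val in row:
            X[i, int(key)] = float(val)
    return X


k = 5  # same value as -Dk=5
X = to_dense(ROWS)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
print(X.shape, U_k.shape, S_k.shape, V_k.shape)  # (6, 10) (6, 5) (5, 5) (10, 5)
print(np.round(s[:k], 4))  # top-5 singular values, largest first
```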