Run SVD to Decompose Matrices with PAI - Platform for AI

Singular value decomposition (SVD) is a matrix factorization technique in linear algebra that generalizes the diagonalization of normal matrices to arbitrary matrices. It decomposes a matrix X into three components, X = U S V', where V' denotes the transpose of V. SVD is widely used in signal processing and statistics.

How it works

SVD factorizes an input matrix X (m rows × n columns) into three output matrices:

Output	Dimensions	Description
U	m × sgNum	Left singular vectors (matrix with orthonormal columns)
S	sgNum × sgNum	Diagonal matrix of singular values
V	n × sgNum	Right singular vectors (matrix with orthonormal columns)

Here, sgNum is the number of singular values actually computed (which may be less than the requested k), m is the number of rows in the input table, and n is the number of columns.
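The relationship between the three outputs and the input can be checked by multiplying them back together. The following is a minimal sketch in plain Python, assuming the factors have already been read into lists of rows. The helper names (`matmul`, `transpose`, `reconstruct`) and the rank-1 sample factors are illustrative assumptions, not PAI output or part of the PAI API.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def reconstruct(u, s, v):
    """Rebuild X = U S V' from the three factors."""
    return matmul(matmul(u, s), transpose(v))


# Hand-picked rank-1 factors (illustrative data, not produced by PAI):
# m = 2 rows, n = 2 columns, and a single non-zero singular value.
U = [[0.6], [0.8]]   # m x sgNum = 2 x 1
S = [[5.0]]          # sgNum x sgNum = 1 x 1
V = [[1.0], [0.0]]   # n x sgNum = 2 x 1

X = reconstruct(U, S, V)
print(X)
```

The product has m rows and n columns, matching the input, and is approximately the matrix with rows (3, 0) and (4, 0). Because this matrix has rank 1, only one singular value is non-zero, which shows how a matrix can have fewer meaningful singular values than a requested k.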

Configure the component

Use the Platform for AI console

Tab	Parameter	Description
Fields setting	Feature columns	Columns storing key-value pairs. Separate keys from values with a colon (`:`), and separate multiple key-value pairs with a comma (`,`).
Parameters setting	Number of reserved singular values	The top N singular values to compute. Computes all singular values by default.
Parameters setting	Accuracy error	The allowed error precision for convergence.
Tuning	Memory size per node	Memory allocated to each node, in MB. Must be used with Number of nodes. Valid values: 1024–64 × 1024 (positive integer).
Tuning	Number of nodes	Number of compute nodes. Must be used with Memory size per node. Valid values: 1–9999 (positive integer).
Tuning	Lifetime	Lifecycle of the output table, in days.

Use commands

Submit the SVD job from the command line:

PAI -name svd
    -project algo_public
    -DinputTableName=bank_data
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;

Input parameters

Parameter	Required	Default	Description
`inputTableName`	Yes	—	Input table used for training.
`selectedColNames`	No	All columns	Comma-separated list of columns to include. Use STRING columns for sparse input; use INT or DOUBLE columns for dense input.
`inputTablePartitions`	No	All partitions	Partitions to read from the input table. Format: `partition_name=value`. For multi-level partitions: `name1=value1/name2=value2`. Separate multiple partitions with commas.
`enableSparse`	No	`false`	Set to `true` if the input data is in sparse key-value format.
`itemDelimiter`	No	Space	Delimiter between key-value pairs in sparse format.
`kvDelimiter`	No	`:`	Delimiter between keys and values in sparse format.
`k`	Yes	—	Number of singular values to compute. The actual number returned may be less than `k`.
`tol`	No	`1.0e-06`	Convergence error threshold.
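To make the sparse format concrete, here is a small sketch of how one sparse row could be parsed locally using the default `itemDelimiter` (space) and `kvDelimiter` (`:`). The functions `parse_sparse_row` and `to_dense` are hypothetical helpers, and the sketch assumes that keys are zero-based integer column indices and that missing keys stand for 0. Neither assumption is confirmed by the parameter table above.

```python
def parse_sparse_row(row, item_delimiter=" ", kv_delimiter=":"):
    """Parse one sparse key-value string into {column_index: value}."""
    entries = {}
    for item in row.split(item_delimiter):
        if not item:
            continue  # tolerate repeated delimiters
        key, value = item.split(kv_delimiter, 1)
        entries[int(key)] = float(value)
    return entries


def to_dense(entries, n_columns):
    """Expand a parsed row to a dense list (assumes missing keys mean 0.0)."""
    return [entries.get(j, 0.0) for j in range(n_columns)]


row = "2:0.767 5:0.01891 8:0.25235"
parsed = parse_sparse_row(row)
dense = to_dense(parsed, 10)
print(parsed)
print(dense)
```

Passing different `item_delimiter` and `kv_delimiter` values mirrors setting `itemDelimiter` and `kvDelimiter` on the command, for example `parse_sparse_row("0=1.5,3=2.0", ",", "=")`.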

Output parameters

Parameter	Required	Description
`outputUTableName`	Yes	Output table for the U matrix (m × sgNum).
`outputSTableName`	Yes	Output table for the S matrix (sgNum × sgNum).
`outputVTableName`	Yes	Output table for the V matrix (n × sgNum).

Resource parameters

Parameter	Required	Default	Description
`coreNum`	No	System default	Number of cores. Must be used with `memSizePerCore`. Valid values: 1–9999 (positive integer).
`memSizePerCore`	No	System default	Memory per core, in MB. Must be used with `coreNum`. Valid values: 1024–64 × 1024 (positive integer).
`lifecycle`	No	—	Lifecycle of the output table, in days (positive integer).
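When commands are generated by a script, the documented constraints on the resource parameters can be checked before submission. The sketch below assembles a command string from the parameters listed above. `build_svd_command` is a hypothetical helper written for illustration, not part of any PAI SDK, and it only produces text. Submitting the job is left to whichever client you use.

```python
def build_svd_command(input_table, k, u_table, s_table, v_table,
                      selected_cols=None, enable_sparse=False,
                      core_num=None, mem_size_per_core=None):
    """Assemble a PAI svd command string (hypothetical helper)."""
    # coreNum and memSizePerCore must be supplied together.
    if (core_num is None) != (mem_size_per_core is None):
        raise ValueError("coreNum and memSizePerCore must be set together")
    if core_num is not None:
        if not 1 <= core_num <= 9999:
            raise ValueError("coreNum must be in the range 1-9999")
        if not 1024 <= mem_size_per_core <= 64 * 1024:
            raise ValueError("memSizePerCore must be in the range 1024-65536 MB")

    args = ["-project algo_public", f"-DinputTableName={input_table}"]
    if selected_cols:
        args.append("-DselectedColNames=" + ",".join(selected_cols))
    if enable_sparse:
        args.append("-DenableSparse=true")
    args.append(f"-Dk={k}")
    args.append(f"-DoutputUTableName={u_table}")
    args.append(f"-DoutputVTableName={v_table}")
    args.append(f"-DoutputSTableName={s_table}")
    if core_num is not None:
        args.append(f"-DcoreNum={core_num}")
        args.append(f"-DmemSizePerCore={mem_size_per_core}")
    return "PAI -name svd\n    " + "\n    ".join(args) + ";"


cmd = build_svd_command("svd_test_input", 5, "u_table", "s_table", "v_table",
                        selected_cols=["col0"], enable_sparse=True,
                        core_num=4, mem_size_per_core=2048)
print(cmd)
```

The values `core_num=4` and `mem_size_per_core=2048` are arbitrary examples within the documented ranges, not tuning recommendations.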

Example

This example runs SVD on a sparse input table with six rows, whose keys reference column indices 0 through 9, and requests the top 5 singular values.

Step 1: Create the input table.

DROP TABLE IF EXISTS svd_test_input;
CREATE TABLE svd_test_input
AS
SELECT *
FROM
(
  SELECT '0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330' AS col0
  UNION ALL
  SELECT '0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317' AS col0
  UNION ALL
  SELECT '1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508' AS col0
  UNION ALL
  SELECT '2:0.767 5:0.01891 8:0.25235' AS col0
  UNION ALL
  SELECT '0:0.29819 2:0.87598086 6:0.5315568' AS col0
  UNION ALL
  SELECT '0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88' AS col0
) a;

Step 2: Run the SVD job.

PAI -name svd
    -project algo_public
    -DinputTableName=svd_test_input
    -DselectedColNames=col0
    -DenableSparse=true
    -Dk=5
    -DoutputUTableName=u_table
    -DoutputVTableName=v_table
    -DoutputSTableName=s_table;
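Before inspecting the output tables, it can help to estimate the dimensions to expect. The sketch below derives m and n from the six sample rows and bounds sgNum by min(k, m, n), which follows from the fact that an m × n matrix has at most min(m, n) singular values. It assumes that n equals the largest key plus one (zero-based keys). The component may determine n differently, so treat the numbers as an estimate rather than the guaranteed table shapes.

```python
ROWS = [
    "0:3.9079 2:0.0009 3:0.0416 4:0.17664 6:0.36460 8:0.091330",
    "0:0.09229 2:0.4872172 5:0.5267 8:0.4544 9:0.23317",
    "1:0.8312 3:0.9317 5:0.5680 7:0.5560 9:0.0508",
    "2:0.767 5:0.01891 8:0.25235",
    "0:0.29819 2:0.87598086 6:0.5315568",
    "0:0.920260 2:0.5154311513 4:0.8104 5:0.188420 8:0.88",
]


def column_keys(row):
    """Return the set of integer keys used in one sparse row."""
    return {int(item.split(":", 1)[0]) for item in row.split(" ") if item}


m = len(ROWS)                                    # number of input rows
n = max(max(column_keys(r)) for r in ROWS) + 1   # assumes zero-based keys
k = 5                                            # value passed as -Dk
max_sg = min(k, m, n)                            # upper bound on sgNum

print(f"input: {m} x {n}")
print(f"U: {m} x sgNum, S: sgNum x sgNum, V: {n} x sgNum, sgNum <= {max_sg}")
```

Under these assumptions the input is 6 × 10, so with k=5 the job can return at most five singular values.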