Platform for AI: Vector Assembler

Last Updated: Feb 14, 2025

Vector Assembler merges multiple numeric or vector columns into a single vector column. For each row, the component concatenates the values of the selected columns, in the order in which the columns are listed, into one vector. Downstream components, such as classification or clustering algorithms, can then consume that vector as a single feature column. The component is typically used as a feature-preparation step before model training.
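To make the column-merging behavior concrete, here is a minimal pure-Python sketch of assembling the selected numeric and vector columns of one row into a single vector. It is not the component's implementation: the helper `assemble_row`, the sample row, and the use of Python lists to represent vectors are all assumptions for illustration.

```python
from numbers import Real
from typing import Dict, List


def assemble_row(row: Dict[str, object], selected_cols: List[str]) -> List[float]:
    """Concatenate the selected columns of one row into a single flat vector."""
    vec: List[float] = []
    for col in selected_cols:
        value = row[col]
        if isinstance(value, Real):
            # A numeric column contributes exactly one element.
            vec.append(float(value))
        else:
            # A vector column (here: a list) contributes all of its elements, in order.
            vec.extend(float(x) for x in value)
    return vec


# Hypothetical row: col1 is not selected, col2 is numeric, col3 is a vector.
row = {"col1": "a", "col2": 1.5, "col3": [2.0, 3.0]}
assembled = assemble_row(row, ["col2", "col3"])
print(assembled)  # [1.5, 2.0, 3.0]
```

The order of `selected_cols` determines the order of the elements in the assembled vector, so listing the same columns in a different order yields a different vector.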

Limits

The supported compute engines are MaxCompute and Realtime Compute for Apache Flink.

Configure the component in Machine Learning Designer

Input ports

| Input port (from left to right) | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| data | Structured data stored in MaxCompute or Object Storage Service (OSS) | None | Yes |

Component parameters

| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | selectedCols | The names of the columns that you want to aggregate. You can select numeric columns or vector columns. |
| Field Setting | reservedCols | The names of the generated columns that you want to reserve. |
| Parameter Setting | outputCol | The name of the vector column that is generated. |
| Parameter Setting | handleInvalidMethod | The policy that is used to handle exceptions. Default value: ERROR. Valid values: ERROR (throws an exception) and SKIP (skips an exception and returns NULL). |
| Parameter Setting | numThreads | The number of threads used by the component. Default value: 1. |
| Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value of this parameter must be a positive integer. Valid values: [1,9999]. |
| Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |
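The two handleInvalidMethod policies can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper `assemble_with_policy` is hypothetical, and treating a missing value (`None`) as the condition that triggers the policy is an assumption, because the parameter description does not say which inputs count as exceptions.

```python
from typing import List, Optional


def assemble_with_policy(
    values: List[Optional[float]], handle_invalid: str = "ERROR"
) -> Optional[List[float]]:
    """Assemble one row of numeric values; None stands in for an invalid value (assumption)."""
    if any(v is None for v in values):
        if handle_invalid == "ERROR":
            # ERROR: stop processing by raising an exception.
            raise ValueError("invalid value in selected columns")
        if handle_invalid == "SKIP":
            # SKIP: do not fail; the assembled vector for this row is NULL.
            return None
        raise ValueError(f"unknown policy: {handle_invalid}")
    return [float(v) for v in values]


ok = assemble_with_policy([1, 2.5])             # valid row under the default policy
skipped = assemble_with_policy([1, None], "SKIP")  # invalid row, skipped
print(ok, skipped)
```

With ERROR, a single invalid row fails the whole job, which surfaces data problems early. With SKIP, the job completes, but downstream components need to tolerate NULL values in the output column.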

Output ports

| Output port (from left to right) | Storage location | Recommended downstream component | Model type |
| --- | --- | --- | --- |
| data | N/A | None | None |

Example

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Vector Assembler component.

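from pyalink.alink import *

def main(sources, sinks, parameter):
    # Read the upstream table from the first input port.
    data = sources[0]
    # Columns to merge into a single vector column.
    selectedColNames = ["col2", "col3"]
    trainOp = VectorAssemblerBatchOp()\
               .setSelectedCols(selectedColNames)\
               .setOutputCol("vec")
    result = trainOp.linkFrom(data)
    # Write the result to the first output port, then run the job.
    result.link(sinks[0])
    BatchOperator.execute()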