This topic describes the Semantic Vector Distance component provided by Machine Learning Studio.
The component finds the extension words or sentences of specified words or sentences based on precomputed semantic vectors, such as the word vectors produced by the Word2Vec component. The extension words or sentences of an item are the items whose vectors are closest to that item's vector. For example, given the word vectors returned by the Word2Vec component, you can generate a list of the words that are most similar to a given word.
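The idea of "closest vectors" can be illustrated with a minimal Python sketch. It is not the component's implementation: the function name `nearest_vectors`, the toy vectors, the `(id, distance)` result shape, and the choice to skip the query vector itself are all assumptions made for illustration. It uses Euclidean distance, which is the component's default mode, and mirrors the Number of Closest Vectors to Output and Distance Threshold settings.

```python
import math


def nearest_vectors(vectors, query_id, top_n=5, distance_threshold=math.inf):
    """Return up to top_n (id, distance) pairs closest to vectors[query_id].

    Hypothetical helper for illustration only; not part of PAI.
    """
    query = vectors[query_id]
    scored = []
    for other_id, vec in vectors.items():
        if other_id == query_id:
            continue  # assumption: a vector is not listed as its own neighbor
        dist = math.dist(query, vec)  # Euclidean distance (Python 3.8+)
        if dist < distance_threshold:  # keep only distances below the threshold
            scored.append((other_id, dist))
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:top_n]


# Toy two-dimensional "word vectors" (made-up values).
word_vectors = {
    "king": (0.0, 0.0),
    "queen": (1.0, 0.0),
    "prince": (0.0, 2.0),
    "apple": (3.0, 4.0),
}

neighbors = nearest_vectors(word_vectors, "king", top_n=2)
print(neighbors)  # [('queen', 1.0), ('prince', 2.0)]
```

Lowering `distance_threshold` to 1.5 in this toy example would leave only `queen` in the result, because the other distances (2.0 and 5.0) are not below the threshold.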
Configure the component
- Machine Learning Platform for AI console
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | ID Column | The name of the ID column. |
| Fields Setting | Vector Columns | The names of the columns that contain the vectors. Example: f1 or f2. |
| Parameters Setting | Number of Closest Vectors to Output | Default value: 5. |
| Parameters Setting | Distance Calculation Mode | The method that is used to calculate the distance between vectors. Default value: euclidean. |
| Parameters Setting | Distance Threshold | A distance is output only when the distance between two vectors is less than this value. Default value: +∞. |
| Tuning | Computing Cores | The number of cores used for calculation. The value is automatically allocated. |
| Tuning | Memory Size per Core (Unit: MB) | The size of memory required by each core. The value is automatically allocated. |
- PAI command
PAI -name SemanticVectorDistance -project algo_public -DinputTableName="test_input" -DoutputTableName="test_output" -DidColName="word" -DvectorColNames="f0,f1,f2,f3,f4,f5" -Dlifecycle=30
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | No default value |
| inputTablePartitions | No | The partitions that are selected from the input table for calculation. | All partitions of the input table |
| outputTableName | Yes | The name of the output table. | No default value |
| idTableName | No | The name of the vector ID table for vector calculation. The table contains only one column, and each row stores a vector ID. If this parameter is left empty, all vectors in the input table are used for calculation. | No default value |
| idTablePartitions | No | The partitions that are selected from the ID table for calculation. By default, all partitions are selected. | No default value |
| idColName | Yes | The name of the ID column. | 3 |
| vectorColNames | No | A list of vector column names in the f1,f2 format. | No default value |
| topN | No | The number of closest vectors in the output. Valid values: [1,+∞). | 5 |
| distanceType | No | The method that is used to calculate the distance between vectors. | euclidean |
| distanceThreshold | No | The threshold for the distance between vectors. A distance is output only when the distance between two vectors is less than this value. Valid values: (0,+∞). | +∞ |
| lifecycle | No | The lifecycle of the output table. Valid values: any positive integer. | No default value |
| coreNum | No | The number of cores used for calculation. Valid values: any positive integer. | Automatically calculated |
| memSizePerCore | No | The size of memory required by each core. Unit: MB. Valid values: any positive integer. | Automatically calculated |
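If you generate PAI commands from a script, the parameters above map directly to `-D` options. The following Python sketch shows one way to assemble such a command. `build_pai_command` is a hypothetical helper written for this example, not a PAI or MaxCompute API; it only builds the command text and does not submit anything. The table and column names are the sample values used in this topic.

```python
def build_pai_command(params):
    """Assemble a SemanticVectorDistance PAI command string.

    Hypothetical helper for illustration only; it performs a simple
    check for the parameters documented as required and formats the rest.
    """
    required = ("inputTableName", "outputTableName", "idColName")
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    parts = ["PAI -name SemanticVectorDistance -project algo_public"]
    for name, value in params.items():
        if isinstance(value, int):
            parts.append(f"-D{name}={value}")  # numbers are written unquoted
        else:
            parts.append(f'-D{name}="{value}"')  # strings are double-quoted
    return " ".join(parts)


command = build_pai_command({
    "inputTableName": "test_input",
    "outputTableName": "test_output",
    "idColName": "word",
    "vectorColNames": "f0,f1,f2,f3,f4,f5",
    "topN": 10,        # return the 10 closest vectors instead of the default 5
    "lifecycle": 30,
})
print(command)
```

Optional parameters that are omitted from the dictionary are simply left out of the command, so the component falls back to the defaults listed in the table.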