MaxCompute: Smoke testing on data type correctness

Last Updated: Jun 19, 2023

This topic describes how to run a smoke test against a doc table and a query table to check whether Proxima CE handles each data type as expected.

Test conclusion

Support for data types:
  • Proxima CE supports only the FLOAT, INT8, and BINARY data types. These data types correspond to the FT_FP32, FT_INT8, and FT_BINARY32 data types of aitheta2, respectively.
  • Proxima CE does not support the DOUBLE and INT16 data types. These data types correspond to the FT_FP64 and FT_INT16 data types of aitheta, respectively.

Proxima CE handles the preceding supported data types as expected. Support for the FLOAT16 and 64-bit BINARY data types is planned. These data types correspond to the FT_FP16 and FT_BINARY64 data types of aitheta, respectively.
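The support matrix above can be written down as a small lookup, which is handy when a test harness needs to reject an unsupported data type before a job is submitted. This is a minimal sketch: the dictionary and function names are hypothetical and are not part of Proxima CE. Only the type names and their FT_* counterparts come from the conclusion above.

```python
# Mapping taken from the test conclusion above. The names SUPPORTED,
# UNSUPPORTED, and feature_type are hypothetical helpers, not a Proxima CE API.
SUPPORTED = {"FLOAT": "FT_FP32", "INT8": "FT_INT8", "BINARY": "FT_BINARY32"}
UNSUPPORTED = {"DOUBLE": "FT_FP64", "INT16": "FT_INT16"}


def feature_type(data_type: str) -> str:
    """Return the FT_* feature type for a supported data type."""
    name = data_type.strip().upper()
    if name in SUPPORTED:
        return SUPPORTED[name]
    if name in UNSUPPORTED:
        raise ValueError(
            f"{name} ({UNSUPPORTED[name]}) is not supported by Proxima CE"
        )
    raise ValueError(f"unknown data type: {data_type!r}")
```

For example, `feature_type("float")` returns `"FT_FP32"`, whereas `feature_type("double")` raises `ValueError`.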

Test procedure

  1. Learn the data types that are supported by different distance measure types.
    Whether Proxima2 can use a data type depends on factors such as the distance measure type and the converter. If the selected distance measure type does not support a data type, that data type cannot be used in Proxima2. The following table lists the data types that each distance measure type supports.
    Distance measure type | FT_FP16 | FT_FP32 | FT_FP64 | FT_INT8 | FT_INT16 | FT_INT4 | FT_BINARY32 | FT_BINARY64
    SquaredEuclidean✔️✔️✔️✔️✔️✔️
    Euclidean✔️✔️✔️✔️✔️✔️
    MipsEuclidean✔️✔️✔️✔️
    Geographical✔️✔️
    Canberra✔️✔️✔️
    Manhattan✔️✔️✔️✔️✔️✔️
    Chebyshev✔️✔️✔️
    InnerProduct✔️✔️✔️✔️
    Matching✔️✔️
    RussellRao✔️✔️
    RogersTanimoto✔️✔️
    Hamming✔️✔️
    Note: If the distance measure type is Geographical, the number of dimensions must be 2.
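The dimension constraint in the note can be checked before a job is submitted. The following is a minimal sketch. The function name `check_dimension` is hypothetical, and the only rule taken from this topic is that Geographical requires exactly 2 dimensions.

```python
def check_dimension(measure: str, dimension: int) -> None:
    """Raise ValueError if the dimension is invalid for the measure type.

    Hypothetical pre-flight check, not part of Proxima CE.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    if measure == "Geographical" and dimension != 2:
        raise ValueError("Geographical requires exactly 2 dimensions")
```

Calling `check_dimension("Geographical", 8)` raises `ValueError`, while `check_dimension("SquaredEuclidean", 8)` returns without an error.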
  2. Prepare data.
    The following list describes the data in each table.
    doc table
    • Data types other than BINARY: random values in 8 columns and 10 rows
    • BINARY data type: random values of only 0 and 1 in 32 columns and 9 rows
    query table
    • Data types other than BINARY: random values in 8 columns and 3 rows
    • BINARY data type: random values of only 0 and 1 in 32 columns and 1 row
    Data generation script:
    -- Data types other than BINARY (FLOAT is used as the example)
    CREATE TABLE doc_table_float_0702(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
    CREATE TABLE query_table_float_0702(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
    ALTER TABLE doc_table_float_0702 ADD PARTITION(pt='20210702');
    ALTER TABLE query_table_float_0702 ADD PARTITION(pt='20210702');
    
    INSERT OVERWRITE TABLE doc_table_float_0702 PARTITION (pt='20210702') VALUES
    ('1.nid','1~1~1~1~1~1~1~1'),
    ('2.nid','2~2~2~2~2~2~2~2'),
    ('3.nid','3~3~3~3~3~3~3~3'),
    ('4.nid','4~4~4~4~4~4~4~4'),
    ('5.nid','5~5~5~5~5~5~5~5'),
    ('6.nid','6~6~6~6~6~6~6~6'),
    ('7.nid','7~7~7~7~7~7~7~7'),
    ('8.nid','8~8~8~8~8~8~8~8'),
    ('9.nid','9~9~9~9~9~9~9~9'),
    ('10.nid','10~10~10~10~10~10~10~10');
    SELECT * FROM doc_table_float_0702;
    
    INSERT OVERWRITE TABLE query_table_float_0702 PARTITION (pt='20210702') VALUES
    ('q1.nid','1~1~1~1~2~2~2~2'),
    ('q2.nid','4~4~4~4~3~3~3~3'),
    ('q3.nid','9~9~9~9~5~5~5~5');
    SELECT * FROM query_table_float_0702;
    
    -- BINARY data type
    CREATE TABLE doc_table_binary_0706(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
    ALTER TABLE doc_table_binary_0706 ADD PARTITION(pt='20210706');
    INSERT OVERWRITE TABLE doc_table_binary_0706 PARTITION (pt='20210706') VALUES
    ('1.nid','0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('2.nid','0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('3.nid','0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('4.nid','0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('5.nid','0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('6.nid','0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('7.nid','0~0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('8.nid','0~0~0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
    ('9.nid','0~0~0~0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1');
    SELECT * FROM doc_table_binary_0706;

    CREATE TABLE query_table_binary_0706(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
    ALTER TABLE query_table_binary_0706 ADD PARTITION(pt='20210706');
    INSERT OVERWRITE TABLE query_table_binary_0706 PARTITION (pt='20210706') VALUES
    ('q1.nid','1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1');
    SELECT * FROM query_table_binary_0706;
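The INSERT statements above can also be generated instead of typed by hand, which helps when the smoke test is repeated for other dimensions or row counts. The following is a minimal sketch. The helper names `make_rows` and `insert_statement` are hypothetical, and the value range of 1 to 10 for non-BINARY data is an assumption. The `~` separator, the `N.nid` key format, and the statement shape follow the script above.

```python
import random


def make_rows(prefix, n_rows, dimension, binary=False, seed=0):
    """Return (pk, vector) pairs whose vectors use '~' as the separator."""
    rng = random.Random(seed)  # fixed seed keeps the smoke test repeatable
    rows = []
    for i in range(1, n_rows + 1):
        if binary:
            # BINARY vectors contain only 0 and 1.
            values = [rng.randint(0, 1) for _ in range(dimension)]
        else:
            # Assumed value range for non-BINARY data: integers from 1 to 10.
            values = [rng.randint(1, 10) for _ in range(dimension)]
        rows.append((f"{prefix}{i}.nid", "~".join(str(v) for v in values)))
    return rows


def insert_statement(table, partition, rows):
    """Build an INSERT OVERWRITE statement in the same shape as the script."""
    values = ",\n".join(f"('{pk}','{vector}')" for pk, vector in rows)
    return (
        f"INSERT OVERWRITE TABLE {table} PARTITION (pt='{partition}') VALUES\n"
        f"{values};"
    )
```

For example, `insert_statement("doc_table_float_0702", "20210702", make_rows("", 10, 8))` produces a 10-row statement for the doc table, and `make_rows("q", 1, 32, binary=True)` produces one 32-column BINARY query row.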
  3. Run the script.
    The following script uses the FLOAT data type as an example. The script for other data types is similar.
    jar -Dorg.bytedeco.javacpp.logger.debug=true -resources proxima2_ce_linux.jar
    -classpath http://schedule@{env}inside.cheetah.alibaba-inc.com/scheduler/res?id=179763045
    com.alibaba.proxima2.ce.CentauriRunner
    -doc_table doc_table_float_0702
    -doc_table_partition 20210702
    -query_table query_table_float_0702
    -query_table_partition 20210702
    -output_table output_table_float_0702
    -output_table_partition 20210702
    -data_type float
    -dimension 8
    -app_id 201220
    -topk 3;
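To judge whether the job output is plausible, the expected nearest neighbors for the FLOAT sample data can be computed by brute force and compared with the output table. The following is a minimal sketch. It assumes that the run uses squared Euclidean distance, which this topic does not state, and the function names are hypothetical. The sample data contains ties, so rows with equal distances may appear in a different order in the actual output.

```python
def parse(vector):
    """Split a '~'-separated vector string into floats."""
    return [float(x) for x in vector.split("~")]


def squared_euclidean(a, b):
    """Sum of squared per-dimension differences (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def topk(query, docs, k):
    """Return the k (pk, distance) pairs closest to the query."""
    q = parse(query)
    scored = [(pk, squared_euclidean(q, parse(vec))) for pk, vec in docs]
    return sorted(scored, key=lambda item: item[1])[:k]


# Same values as doc_table_float_0702: row i holds the value i in all 8 columns.
docs = [(f"{i}.nid", "~".join([str(i)] * 8)) for i in range(1, 11)]
queries = {
    "q1.nid": "1~1~1~1~2~2~2~2",
    "q2.nid": "4~4~4~4~3~3~3~3",
    "q3.nid": "9~9~9~9~5~5~5~5",
}

for pk, vector in queries.items():
    print(pk, topk(vector, docs, 3))
```

Under this assumption, the closest doc row for q3.nid is 7.nid, at a squared distance of 32.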