This topic describes how to perform smoke testing on the doc table and query table to check whether the system runs as expected.
Test conclusion
Support for data types:
- Proxima CE supports only the FLOAT, INT8, and BINARY data types. These data types correspond to the FT_FP32, FT_INT8, and FT_BINARY32 data types of aitheta2, respectively.
- Proxima CE does not support the DOUBLE and INT16 data types. These data types correspond to the FT_FP64 and FT_INT16 data types of aitheta, respectively.
Proxima CE can use the preceding supported data types as expected. Proxima CE is planned to support the FLOAT16 and BINARY data types in the future. These data types correspond to the FT_FP16 and FT_BINARY64 data types of aitheta, respectively.
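The mapping in this conclusion can be written down as a small lookup, which may be handy for validating a data type before a job is submitted. The following is a minimal sketch, not part of Proxima CE: the names `SUPPORTED`, `UNSUPPORTED`, and `feature_type` are hypothetical, and only the type names listed in this section are covered.

```python
# Hypothetical helper: maps Proxima CE data types to the feature type names
# listed in this section. It is not an API of Proxima CE or aitheta2.
SUPPORTED = {
    "FLOAT": "FT_FP32",
    "INT8": "FT_INT8",
    "BINARY": "FT_BINARY32",
}
UNSUPPORTED = {
    "DOUBLE": "FT_FP64",
    "INT16": "FT_INT16",
}


def feature_type(data_type: str) -> str:
    """Return the feature type name for a supported data type.

    Raises ValueError for data types that this section lists as
    unsupported, and for names that do not appear in this section.
    """
    name = data_type.strip().upper()
    if name in SUPPORTED:
        return SUPPORTED[name]
    if name in UNSUPPORTED:
        raise ValueError(
            f"{name} ({UNSUPPORTED[name]}) is not supported by Proxima CE"
        )
    raise ValueError(f"unknown data type: {data_type!r}")


if __name__ == "__main__":
    for candidate in ("float", "int8", "binary", "double"):
        try:
            print(candidate, "->", feature_type(candidate))
        except ValueError as err:
            print(candidate, "->", err)
```

The lookup is case-insensitive because the run command in this topic passes the data type in lowercase (`-data_type float`).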
Test procedure
- Learn the data types that are supported by different distance measure types. The implementation of data types of Proxima2 is determined by specific factors, such as the distance measure type and converter. If a data type is not supported by the related distance measure type, the data type cannot be used in Proxima2. The following table lists the data types that are supported or not supported by different distance measure types.
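| Distance measure type | FT_FP16 | FT_FP32 | FT_FP64 | FT_INT8 | FT_INT16 | FT_INT4 | FT_BINARY32 | FT_BINARY64 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SquaredEuclidean | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| Euclidean | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| MipsEuclidean | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
| Geographical | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Canberra | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| Manhattan | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| Chebyshev | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| InnerProduct | ✔️ | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
| Matching | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✔️ | ✔️ |
| RussellRao | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✔️ | ✔️ |
| RogersTanimoto | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✔️ | ✔️ |
| Hamming | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✔️ | ✔️ |

Note: If the distance measure type is Geographical, the number of dimensions must be 2.
- Prepare data.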
| Data table | Data types other than BINARY | BINARY data type |
| --- | --- | --- |
| doc table | 10 rows of 8-dimensional vectors with arbitrary values | 9 rows of 32-dimensional vectors that contain only 0 and 1 |
| query table | 3 rows of 8-dimensional vectors with arbitrary values | 1 row with a 32-dimensional vector that contains only 0 and 1 |

Data generation script:
-- Values (FLOAT)
CREATE TABLE doc_table_float_0702(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
CREATE TABLE query_table_float_0702(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
ALTER TABLE doc_table_float_0702 ADD PARTITION(pt='20210702');
ALTER TABLE query_table_float_0702 ADD PARTITION(pt='20210702');
INSERT OVERWRITE TABLE doc_table_float_0702 PARTITION (pt='20210702') VALUES
('1.nid','1~1~1~1~1~1~1~1'),
('2.nid','2~2~2~2~2~2~2~2'),
('3.nid','3~3~3~3~3~3~3~3'),
('4.nid','4~4~4~4~4~4~4~4'),
('5.nid','5~5~5~5~5~5~5~5'),
('6.nid','6~6~6~6~6~6~6~6'),
('7.nid','7~7~7~7~7~7~7~7'),
('8.nid','8~8~8~8~8~8~8~8'),
('9.nid','9~9~9~9~9~9~9~9'),
('10.nid','10~10~10~10~10~10~10~10');
SELECT * FROM doc_table_float_0702;
INSERT OVERWRITE TABLE query_table_float_0702 PARTITION (pt='20210702') VALUES
('q1.nid','1~1~1~1~2~2~2~2'),
('q2.nid','4~4~4~4~3~3~3~3'),
('q3.nid','9~9~9~9~5~5~5~5');
SELECT * FROM query_table_float_0702;

-- Binary
CREATE TABLE doc_table_binary_0706(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
ALTER TABLE doc_table_binary_0706 ADD PARTITION(pt='20210706');
INSERT OVERWRITE TABLE doc_table_binary_0706 PARTITION (pt='20210706') VALUES
('1.nid','0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('2.nid','0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('3.nid','0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('4.nid','0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('5.nid','0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('6.nid','0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('7.nid','0~0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('8.nid','0~0~0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1'),
('9.nid','0~0~0~0~0~0~0~0~0~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1');
SELECT * FROM doc_table_binary_0706;
CREATE TABLE query_table_binary_0706(pk STRING, vector STRING) PARTITIONED BY (pt STRING);
ALTER TABLE query_table_binary_0706 ADD PARTITION(pt='20210706');
INSERT OVERWRITE TABLE query_table_binary_0706 PARTITION (pt='20210706') VALUES
('q1.nid','1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1~1');
SELECT * FROM query_table_binary_0706;
- Run the script. The following script uses the FLOAT data type as an example. The script for other data types is similar.
jar -Dorg.bytedeco.javacpp.logger.debug=true
    -resources proxima2_ce_linux.jar
    -classpath http://schedule@{env}inside.cheetah.alibaba-inc.com/scheduler/res?id=179763045
    com.alibaba.proxima2.ce.CentauriRunner
    -doc_table doc_table_float_0702
    -doc_table_partition 20210702
    -query_table query_table_float_0702
    -query_table_partition 20210702
    -output_table output_table_float_0702
    -output_table_partition 20210702
    -data_type float
    -dimension 8
    -app_id 201220
    -topk 3;
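To judge whether the FLOAT run returns sensible neighbors, the expected top 3 results for the sample tables can be computed by brute force and compared with the output table by hand. The following is a minimal sketch under stated assumptions: it assumes the SquaredEuclidean distance measure (this topic does not state which measure the run uses), it makes no assumption about the schema of the output table, and the helper names `parse_vector`, `squared_euclidean`, and `expected_topk` are hypothetical.

```python
# Brute-force reference results for the FLOAT sample data in this topic.
# Assumption: squared Euclidean distance. All helper names are illustrative
# and are not part of Proxima CE.

def parse_vector(text, sep="~"):
    """Parse a '~'-separated vector string such as '1~1~2~2'."""
    return [float(value) for value in text.split(sep)]


def squared_euclidean(a, b):
    """Sum of squared per-dimension differences between two vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expected_topk(docs, query, k):
    """Return the k nearest (pk, distance) pairs for one query string.

    docs is a list of (pk, vector_string) pairs. Ties keep the order of
    docs because sorted() is stable; a real index may break ties differently.
    """
    q = parse_vector(query)
    scored = [(pk, squared_euclidean(parse_vector(vec), q)) for pk, vec in docs]
    return sorted(scored, key=lambda item: item[1])[:k]


# Same rows as doc_table_float_0702 and query_table_float_0702.
DOCS = [(f"{i}.nid", "~".join([str(i)] * 8)) for i in range(1, 11)]
QUERIES = [
    ("q1.nid", "1~1~1~1~2~2~2~2"),
    ("q2.nid", "4~4~4~4~3~3~3~3"),
    ("q3.nid", "9~9~9~9~5~5~5~5"),
]

if __name__ == "__main__":
    for query_pk, query_vec in QUERIES:
        print(query_pk, expected_topk(DOCS, query_vec, k=3))
```

Several sample rows are equidistant from a query (for example, `1.nid` and `2.nid` for `q1.nid`), so compare distances rather than the exact order of primary keys when checking tied results.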