Preparations

Read the Quick start topic, download the latest version of HBase Shell, and then configure HBase Shell.

Create mappings between the HBase table and the Search index

You can use a JavaScript Object Notation (JSON) file to create mappings between the HBase table and the Search index. The following example shows a sample configuration.

{
  "sourceNamespace": "default",
  "sourceTable": "testTable",
  "targetIndexName": "democollection",
  "indexType": "SOLR",
  "rowkeyFormatterType": "STRING",
  "fields": [
    {
      "source": "f:name",
      "targetField": "name_s",
      "type": "STRING"
    },
    {
      "source": "f:age",
      "targetField": "age_i",
      "type": "INT"
    }
  ]
}

In the preceding example, the data in the testTable table is synchronized to the democollection index. The f:name column, in which the column family and column name are separated with a colon (:), is mapped to the name_s column in the index. The f:age column is mapped to the age_i column in the index. The following table describes the parameters and the valid values.

Parameter | Description
sourceNamespace | The namespace of the HBase table. If the table does not belong to a namespace, leave this parameter empty or set it to default.
sourceTable | The name of the HBase table, without the namespace.
targetIndexName | The name of the Search index.
indexType | The type of the index. Set the value to SOLR.
rowkeyFormatterType | The format of the rowkey in the HBase table. Valid values: STRING and HEX. For more information, see the rowkeyFormatterType section.
fields | The source columns to map, their data types, and the destination columns in the Search index. The value is a JSON array. Separate multiple field mappings with commas (,), as shown in the preceding example. For more information, see the fields section.

rowkeyFormatterType

rowkeyFormatterType indicates how the rowkey in the HBase table is mapped to the ID of the index document. The ID is a string. Valid values:

  • STRING: If the rowkey in the HBase table is a string, such as row1, order0001, or 12345, use this value. Note: In this example, 12345 is a string, not a number. In this case, the Bytes.toString(byte[]) function is used to convert the rowkey to the ID of the index document. After you find the destination document in the Search index, you can use the Bytes.toBytes(String) function to convert the ID of the index document back to byte[], which is used as the rowkey to query the original data in the HBase table.
  • HEX: If the rowkey in the HBase table is not a string, use this value. For example, if the rowkey is a numerical value such as 12345 or a combination of multiple fields that include non-string data, set the parameter to HEX. In this case, the Hex.encodeHexString(byte[]) function of the org.apache.commons.codec.binary.Hex class is used to convert the rowkey to the ID of the index document. After you find the destination document in the Search index, you can use the Hex.decodeHex(String.toCharArray()) function to convert the ID of the index document back to byte[], which is used as the rowkey to query the original data in the HBase table.

Note: If the rowkey is not written to the HBase table by using the Bytes.toBytes(String) function, the rowkey is not considered a string. In this case, set this parameter to HEX. Otherwise, after you convert the ID of the document in the index back to bytes, the result may be different from the original rowkey.
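The difference between the two formatter types can be sketched with the JDK alone. The following is a minimal illustration, not the service's implementation: the class RowkeyIds and its methods are hypothetical names, the UTF-8 and big-endian conversions stand in for Bytes.toString(byte[]), Bytes.toBytes(String), and Bytes.toBytes(int), and the hex helpers stand in for the commons-codec Hex functions. It shows that a binary rowkey survives a hex round trip but not a string round trip.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative JDK-only stand-ins; HBase's Bytes and commons-codec's Hex are not used here.
final class RowkeyIds {
    // STRING formatter: the rowkey bytes are read as UTF-8 text.
    static String toStringId(byte[] rowkey) {
        return new String(rowkey, StandardCharsets.UTF_8);
    }

    static byte[] fromStringId(String id) {
        return id.getBytes(StandardCharsets.UTF_8);
    }

    // HEX formatter: each byte becomes two lowercase hex digits.
    static String toHexId(byte[] rowkey) {
        StringBuilder sb = new StringBuilder(rowkey.length * 2);
        for (byte b : rowkey) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static byte[] fromHexId(String id) {
        byte[] out = new byte[id.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(id.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Big-endian 4-byte encoding of an int, the layout a numeric rowkey would typically have.
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        byte[] textKey = fromStringId("order0001");
        byte[] binaryKey = intBytes(-1); // ff ff ff ff is not valid UTF-8

        System.out.println(Arrays.equals(textKey, fromStringId(toStringId(textKey))));     // true
        System.out.println(Arrays.equals(binaryKey, fromStringId(toStringId(binaryKey)))); // false
        System.out.println(toHexId(intBytes(12345)));                                      // 00003039
        System.out.println(Arrays.equals(binaryKey, fromHexId(toHexId(binaryKey))));       // true
    }
}
```

The second line of output is the failure mode that the note describes: bytes that are not valid text are replaced during decoding, so the recovered rowkey no longer matches the original.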

fields

The following table describes the parameters for a field mapping.

Parameter | Description
source | The name of the column to be mapped in the HBase table. The column family and the qualifier are separated with a colon (:). Example: f:name.
targetField | The column name in the index. In the preceding example, the given columns are dynamic columns, such as name_s and age_i. If you use dynamic columns, you do not need to predefine the column name. The Search service automatically identifies dynamic columns. For more information about dynamic columns, see Update the configuration set.
type | The data type that is used when data is written to the column in the HBase table. This data type must be the same as the data type of the source column in the HBase table. Valid values: INT, LONG, STRING, BOOLEAN, FLOAT, DOUBLE, SHORT, and BIGDECIMAL. The values are case-sensitive.
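Before you submit a mapping, it can help to check each field entry against the rules in the table. The following sketch is only an illustration of those rules: the class FieldMappingCheck and its check method are hypothetical and are not part of HBase Shell or the Search service.

```java
import java.util.Set;

final class FieldMappingCheck {
    // Valid values of the type parameter as listed in the table; the comparison is case-sensitive.
    static final Set<String> VALID_TYPES = Set.of(
        "INT", "LONG", "STRING", "BOOLEAN", "FLOAT", "DOUBLE", "SHORT", "BIGDECIMAL");

    // Returns null when the entry looks well-formed, otherwise a short description of the problem.
    static String check(String source, String targetField, String type) {
        int colon = source.indexOf(':');
        if (colon <= 0 || colon == source.length() - 1) {
            return "source must look like family:qualifier, got " + source;
        }
        if (targetField.isEmpty()) {
            return "targetField must not be empty";
        }
        if (!VALID_TYPES.contains(type)) {
            return "unsupported type " + type;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check("f:name", "name_s", "STRING")); // null
        System.out.println(check("f:age", "age_i", "int"));      // unsupported type int
        System.out.println(check("age", "age_i", "INT"));        // source must look like family:qualifier, got age
    }
}
```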

Learn about data type

HBase does not use the concept of data types. Instead, all values are stored as bytes. You can call the Bytes.toBytes(String/long/int/...) method to convert data of any type, such as STRING, LONG, and INT, to bytes before you store the data in a column of the HBase table. The type parameter specifies which data type was used to write the data to the column in the HBase table. Example:

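import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
Put put = new Put(Bytes.toBytes("row1")); // example rowkey
// f:age is written as a 4-byte int, so its type is INT.
int age = 25;
byte[] ageValue = Bytes.toBytes(age);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("age"), ageValue);
// f:name is written as a string, so its type is STRING even though the text is "25".
String name = "25";
byte[] nameValue = Bytes.toBytes(name);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("name"), nameValue);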

In the preceding code, the f:age column is of the INT type, whereas the f:name column is of the STRING type, not the INT type, even though its value looks like a number. To synchronize data to the Search index correctly, set the type parameter to the data type that was used to write the data. During synchronization, the system converts the bytes back to the original value based on the specified type. In the preceding example, if you set the type of the f:name column to INT, the system calls the Bytes.toInt() method on bytes that were written as a string, so the original value cannot be restored correctly.
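The effect of a wrong type setting can be reproduced without an HBase cluster. The following sketch uses JDK-only stand-ins for the Bytes conversions: the class TypeMismatchDemo and its methods are hypothetical, and how the synchronization service itself reports such a mismatch is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TypeMismatchDemo {
    // Stand-in for writing an int value: 4 bytes, big-endian.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Stand-in for writing a string value: UTF-8 bytes.
    static byte[] encodeString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes a stored value as INT; exactly 4 bytes are required.
    static int decodeInt(byte[] value) {
        if (value.length != 4) {
            throw new IllegalArgumentException("INT needs 4 bytes, got " + value.length);
        }
        return ByteBuffer.wrap(value).getInt();
    }

    public static void main(String[] args) {
        System.out.println(decodeInt(encodeInt(25)));  // 25
        System.out.println(encodeString("25").length); // 2
        try {
            decodeInt(encodeString("25"));             // wrong type for this column
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());        // INT needs 4 bytes, got 2
        }
    }
}
```

The integer 25 and the string "25" have different byte layouts, so only the type that matches the write path can restore the value.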

Learn about targetField

The targetField parameter specifies the destination column in the Search index to which the source column in the HBase table is mapped. The Search service uses a strict schema. Therefore, each column in the index must be defined in managed_schema in the configuration set. For more information about how to configure the schema, see Update the configuration set. We recommend that you use the dynamicField feature of the Search service. This feature automatically identifies the type of the column based on the suffix of the column name. For example, name_s indicates that the type of the column in the index is STRING.

The type of the source column in the HBase table does not need to match the data type of the column in the index. For example, you can set the type of the source column f:age to STRING and set the targetField parameter to age_i in the index to specify that the type of the column is INT. When the source column is synchronized to the index, the Search service automatically converts the data type from STRING to INT. If data of the STRING type that cannot be converted to a number is written to the f:age column, an error occurs when the column is synchronized to the index.
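Which STRING values can be synchronized to an integer index column can be illustrated with a small check. This is a sketch only: the conversion is performed by the Search service, and the class StringToIntField below is a hypothetical helper that uses Integer.parseInt as an approximation of that conversion.

```java
import java.util.OptionalInt;

final class StringToIntField {
    // Approximates converting a STRING source value for an integer index column such as age_i.
    // An empty result corresponds to a value that would cause a synchronization error.
    static OptionalInt toIndexInt(String sourceValue) {
        try {
            return OptionalInt.of(Integer.parseInt(sourceValue));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toIndexInt("25").isPresent());          // true
        System.out.println(toIndexInt("twenty-five").isPresent()); // false
    }
}
```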

Manage the schema

Modify the schema of the mapping

You can store the JSON-formatted schema that is described in the preceding section in a file named schema.json. Then, you can run the alter_external_index command in HBase Shell to modify the schema of the HBase mapping. Place the schema.json file in the startup directory of HBase Shell, or specify a relative or absolute path that points to the file.

hbase(main):006:0> alter_external_index 'HBase table name', 'schema.json'
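If the mapping is generated by a program instead of being edited by hand, the schema.json file can be produced as follows. This is a minimal sketch: the class SchemaJson and its render method are hypothetical, each field is passed as a three-element array of source, targetField, and type, and the values are assumed to contain no characters that require JSON escaping.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

final class SchemaJson {
    // Renders the mapping as JSON text. Each field is {source, targetField, type}.
    // Values are assumed to need no escaping (no quotes or backslashes).
    static String render(String namespace, String table, String index,
                         String rowkeyFormatterType, List<String[]> fields) {
        String fieldJson = fields.stream()
            .map(f -> String.format(
                "    {\"source\": \"%s\", \"targetField\": \"%s\", \"type\": \"%s\"}",
                f[0], f[1], f[2]))
            .collect(Collectors.joining(",\n"));
        String fieldsValue = fields.isEmpty() ? "[]" : "[\n" + fieldJson + "\n  ]";
        return "{\n"
            + "  \"sourceNamespace\": \"" + namespace + "\",\n"
            + "  \"sourceTable\": \"" + table + "\",\n"
            + "  \"targetIndexName\": \"" + index + "\",\n"
            + "  \"indexType\": \"SOLR\",\n"
            + "  \"rowkeyFormatterType\": \"" + rowkeyFormatterType + "\",\n"
            + "  \"fields\": " + fieldsValue + "\n"
            + "}\n";
    }

    public static void main(String[] args) throws Exception {
        String json = render("default", "testTable", "democollection", "STRING",
            List.of(new String[] {"f:name", "name_s", "STRING"},
                    new String[] {"f:age", "age_i", "INT"}));
        // Writes the file to the current directory; pass this path to alter_external_index.
        Files.writeString(Path.of("schema.json"), json);
    }
}
```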

You can use the JSON file to add, delete, or modify multiple columns. To delete all mappings of the HBase table, leave the fields array empty. Example:

{
  "sourceNamespace": "default",
  "sourceTable": "testTable",
  "targetIndexName": "democollection",
  "indexType": "SOLR",
  "rowkeyFormatterType": "STRING",
  "fields": []
}

If you want to add one or more columns to the schema of the existing mapping, you can run the add_external_index_field command.

hbase shell> add_external_index_field 'testTable', {FAMILY => 'f', QUALIFIER => 'money', TARGETFIELD => 'money_f', TYPE => 'FLOAT' }

Note: You can run the add_external_index_field command to add columns only for tables whose mapping schema has already been set by running the alter_external_index command. Each time you modify the schema of the mapping, a complete alter operation is performed on the HBase table. If you want to modify a large number of columns, we recommend that you run the alter_external_index command to complete the modification in one operation.

If you want to delete one or more columns from the schema of the existing mapping, run the remove_external_index command.

hbase shell> remove_external_index 'testTable', 'f:name', 'f:age'

Note: Each time you modify the schema of the mapping, a complete alter operation is performed on the HBase table. If you want to modify a large number of columns, we recommend that you run the alter_external_index command to complete the modification in one operation.

View the current schema of the mapping

Run the describe_external_index command in HBase Shell to obtain the complete schema of the mapping in the JSON format for the current table.

hbase(main):005:0> describe_external_index 'testTable'