
PlainBuffer

Last Updated: Mar 23, 2018

Table Store defines the PlainBuffer format to represent row data. Compared with Protocol Buffer, PlainBuffer delivers better performance when serializing data and when parsing large numbers of small objects.

Format definition

    plainbuffer = tag_header row1 [row2] [row3]
    row = ( pk [attr] | [pk] attr | pk attr ) [tag_delete_marker] row_checksum;
    pk = tag_pk cell_1 [cell_2] [cell_3]
    attr = tag_attr cell_1 [cell_2] [cell_3]
    cell = tag_cell cell_name [cell_value] [cell_op] [cell_ts] cell_checksum
    cell_name = tag_cell_name formated_value
    cell_value = tag_cell_value formated_value
    cell_op = tag_cell_op cell_op_value
    cell_ts = tag_cell_ts cell_ts_value
    row_checksum = tag_row_checksum row_crc8
    cell_checksum = tag_cell_checksum row_crc8
    formated_value = value_type value_len value_data
    value_type = int8
    value_len = int32
    cell_op_value = delete_all_version | delete_one_version
    cell_ts_value = int64
    delete_all_version = 0x01 (1 byte)
    delete_one_version = 0x03 (1 byte)
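
As a rough illustration, the grammar above can be modeled with a pair of plain C++ structs. This is only a sketch: the names `Cell`, `Row`, `CellOp`, and `IsWellFormed` are hypothetical and do not come from any Table Store SDK, and `kNone` is an illustrative placeholder for "no cell_op present" rather than a wire value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory model of the grammar; not SDK types.
enum class CellOp : uint8_t {
    kNone = 0x00,               // illustrative only: no cell_op in the cell
    kDeleteAllVersions = 0x01,  // delete_all_version
    kDeleteOneVersion = 0x03    // delete_one_version
};

struct Cell {
    std::string name;            // cell_name, always present
    bool has_value = false;      // cell_value is optional
    uint8_t value_type = 0;      // value_type (int8)
    std::string value_data;      // value_data as raw bytes
    CellOp op = CellOp::kNone;   // cell_op is optional
    bool has_timestamp = false;  // cell_ts is optional
    int64_t timestamp = 0;       // cell_ts_value (int64)
};

struct Row {
    std::vector<Cell> pk;        // cells that follow tag_pk (empty = no pk section)
    std::vector<Cell> attr;      // cells that follow tag_attr (empty = no attr section)
    bool delete_marker = false;  // true if tag_delete_marker is present
};

// The row rule allows pk only, attr only, or both, but never neither.
bool IsWellFormed(const Row& row) {
    return !row.pk.empty() || !row.attr.empty();
}
```

In this model an empty vector stands for an absent section, so a row with neither primary key cells nor attribute cells is rejected.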

Tag value

    tag_header = 0x75 (4 bytes)
    tag_pk = 0x01 (1 byte)
    tag_attr = 0x02 (1 byte)
    tag_cell = 0x03 (1 byte)
    tag_cell_name = 0x04 (1 byte)
    tag_cell_value = 0x05 (1 byte)
    tag_cell_op = 0x06 (1 byte)
    tag_cell_ts = 0x07 (1 byte)
    tag_delete_marker = 0x08 (1 byte)
    tag_row_checksum = 0x09 (1 byte)
    tag_cell_checksum = 0x0A (1 byte)

ValueType value

The values of value_type in formated_value are as follows:

    VT_INTEGER = 0x0
    VT_DOUBLE = 0x1
    VT_BOOLEAN = 0x2
    VT_STRING = 0x3
    VT_NULL = 0x6
    VT_BLOB = 0x7
    VT_INF_MIN = 0x9
    VT_INF_MAX = 0xa
    VT_AUTO_INCREMENT = 0xb
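
The tag and value_type constants map naturally onto C++ enumerations. The sketch below also serializes a formated_value exactly as the grammar spells it (value_type, value_len, value_data). The helper names `AppendInt32LE` and `AppendFormattedValue` are hypothetical, and the little-endian byte order for value_len is an assumption, because this page does not state the byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// tag_header is 4 bytes wide; every other tag is a single byte.
const uint32_t kTagHeader = 0x75;

enum Tag : uint8_t {
    TAG_PK = 0x01,
    TAG_ATTR = 0x02,
    TAG_CELL = 0x03,
    TAG_CELL_NAME = 0x04,
    TAG_CELL_VALUE = 0x05,
    TAG_CELL_OP = 0x06,
    TAG_CELL_TS = 0x07,
    TAG_DELETE_MARKER = 0x08,
    TAG_ROW_CHECKSUM = 0x09,
    TAG_CELL_CHECKSUM = 0x0A
};

enum ValueType : uint8_t {
    VT_INTEGER = 0x0,
    VT_DOUBLE = 0x1,
    VT_BOOLEAN = 0x2,
    VT_STRING = 0x3,
    VT_NULL = 0x6,
    VT_BLOB = 0x7,
    VT_INF_MIN = 0x9,
    VT_INF_MAX = 0xa,
    VT_AUTO_INCREMENT = 0xb
};

// Appends a 32-bit integer, least significant byte first.
// ASSUMPTION: little-endian byte order; the page does not specify it.
void AppendInt32LE(std::vector<uint8_t>& out, uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
    }
}

// Appends formated_value = value_type value_len value_data.
void AppendFormattedValue(std::vector<uint8_t>& out, ValueType type,
                          const std::string& data) {
    out.push_back(static_cast<uint8_t>(type));
    AppendInt32LE(out, static_cast<uint32_t>(data.size()));
    for (char ch : data) {
        out.push_back(static_cast<uint8_t>(ch));
    }
}
```

Under these assumptions, calling `AppendFormattedValue(out, VT_STRING, "bad")` appends eight bytes: one type byte, four length bytes, and the three characters of the string.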

Calculate the checksum

The basic logic for calculating the checksum is as follows:

  • Compute the checksum of each cell from its name, value, type, and timestamp.

  • Account for the delete marker of the row: if the row has a delete marker, append a single byte 1; otherwise, append a single byte 0.

  • Because a checksum is already calculated for each cell, the row checksum is computed from the cell checksums. That is, the CRC operation runs only over the checksums of the cells in the row, not over the row data itself.

C++ implementation:

    // Folds one cell into crc: name, value, timestamp, then operation type.
    void GetChecksum(uint8_t* crc, const InplaceCell& cell)
    {
        Crc8(crc, cell.GetName());
        Crc8(crc, cell.GetValue().GetInternalSlice());
        Crc8(crc, cell.GetTimestamp());
        Crc8(crc, cell.GetOpType());
    }

    // Folds a row into crc using only the cell checksums and the delete marker.
    void GetChecksum(uint8_t* crc, const InplaceRow& row)
    {
        const std::deque<InplaceCell>& pk = row.GetPrimaryKey();
        for (size_t i = 0; i < pk.size(); i++) {
            uint8_t cellcrc = 0;  // each cell checksum starts from 0
            GetChecksum(&cellcrc, pk[i]);
            Crc8(crc, cellcrc);
        }
        for (int i = 0; i < row.GetCellCount(); i++) {
            uint8_t cellcrc = 0;
            GetChecksum(&cellcrc, row.GetCell(i));
            Crc8(crc, cellcrc);
        }
        // The delete marker contributes one byte: 1 if present, 0 otherwise.
        uint8_t del = 0;
        if (row.HasDeleteMarker()) {
            del = 1;
        }
        Crc8(crc, del);
    }
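
The listing above depends on types and a `Crc8` helper that are not shown on this page. The following self-contained sketch reproduces the same two-level scheme so that it can be compiled and experimented with. Several details are assumptions rather than documented facts: the CRC-8 polynomial (0x07, initial value 0, no reflection), feeding the timestamp least significant byte first, and skipping fields that are absent. The `SketchCell` and `SketchRow` types are hypothetical stand-ins for `InplaceCell` and `InplaceRow`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One CRC-8 step, most significant bit first.
// ASSUMPTION: polynomial 0x07; the page does not name the polynomial.
uint8_t Crc8Byte(uint8_t crc, uint8_t in) {
    crc ^= in;
    for (int bit = 0; bit < 8; ++bit) {
        crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                           : static_cast<uint8_t>(crc << 1);
    }
    return crc;
}

uint8_t Crc8Bytes(uint8_t crc, const std::string& bytes) {
    for (unsigned char b : bytes) {
        crc = Crc8Byte(crc, b);
    }
    return crc;
}

// ASSUMPTION: 64-bit integers are fed least significant byte first.
uint8_t Crc8Int64(uint8_t crc, int64_t v) {
    uint64_t u = static_cast<uint64_t>(v);
    for (int i = 0; i < 8; ++i) {
        crc = Crc8Byte(crc, static_cast<uint8_t>(u & 0xFF));
        u >>= 8;
    }
    return crc;
}

// Hypothetical stand-ins for InplaceCell and InplaceRow.
struct SketchCell {
    std::string name;
    std::string value;    // already-serialized value bytes
    bool has_ts = false;
    int64_t ts = 0;
    bool has_op = false;
    uint8_t op = 0;
};

struct SketchRow {
    std::vector<SketchCell> pk;
    std::vector<SketchCell> attr;
    bool delete_marker = false;
};

// Cell checksum: name, value, timestamp, operation type, in that order.
uint8_t CellChecksum(const SketchCell& c) {
    uint8_t crc = 0;
    crc = Crc8Bytes(crc, c.name);
    crc = Crc8Bytes(crc, c.value);
    if (c.has_ts) crc = Crc8Int64(crc, c.ts);
    if (c.has_op) crc = Crc8Byte(crc, c.op);
    return crc;
}

// Row checksum: CRC over the cell checksums only, then the delete byte.
uint8_t RowChecksum(const SketchRow& row) {
    uint8_t crc = 0;
    for (const SketchCell& c : row.pk) crc = Crc8Byte(crc, CellChecksum(c));
    for (const SketchCell& c : row.attr) crc = Crc8Byte(crc, CellChecksum(c));
    return Crc8Byte(crc, row.delete_marker ? 1 : 0);
}
```

Returning the updated checksum by value, instead of writing through a `uint8_t*` as the listing does, is a design choice of this sketch: it keeps every step a pure function and leaves no pointer to forget to initialize.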

Example

A data row contains two primary key columns and four attribute columns. The primary key columns are of the string and integer types. Three of the attribute columns hold string, integer, and double values, with versions 1001, 1002, and 1003, respectively. The fourth attribute column deletes all versions.

  • Primary key column:
    • [pk1:string:iampk]
    • [pk2:integer:100]
  • Attribute column:
    • [column1:string:bad:1001]
    • [column2:integer:128:1002]
    • [column3:double:34.2:1003]
    • [column4:del_all_versions]

Encoding:

    <Header starting>[0x75]
    <Primary key column starting>[0x1]
    <Cell1>[0x3][0x4][3][pk1][0x5][3][5][iampk]
    <Cell2>[0x3][0x4][3][pk2][0x5][0][100]
    <Attribute column starting>[0x2]
    <Cell1>[0x3][0x4][7][column1][0x5][0x3][3][bad][0x7][1001]
    <Cell2>[0x3][0x4][7][column2][0x5][0x0][128][0x7][1002]
    <Cell3>[0x3][0x4][7][column3][0x5][0x1][34.2][0x7][1003]
    <Cell4>[0x3][0x4][7][column4][0x5][0x6][1]