EmbeddingVariable saves memory resources in ultra-large-scale training while keeping model features lossless.

Background information

Embedding has become an effective way to handle word and ID features in deep learning. An embedding is a function mapping: it maps high-dimensional sparse features to low-dimensional dense vectors that can be trained end to end. In TensorFlow, variables are used to hold model or node state, and variables are implemented on top of the tensor data structure. A tensor is an abstract data type in TensorFlow that covers scalars, vectors, matrices, and higher-dimensional structures. Tensors are the data carriers exchanged among operators: any operator that accepts and produces tensors can take part in graph computing. Tensors use contiguous storage. Therefore, when you define a variable, you must specify its type and shape, and the shape cannot be modified afterward.

TensorFlow uses variables to implement the embedding mechanism, and [vocabulary_size, embedding_dimension] specifies the shape of an embedding variable. In scenarios with large-scale sparse features, this approach has the following disadvantages (a sketch of the conventional approach follows the list):
  • vocabulary_size is determined by the number of IDs. In online learning scenarios, the number of IDs keeps growing, so vocabulary_size is difficult to estimate.
  • In most cases, an ID is a string, and the number of IDs is large. Before embedding, you must hash each ID into the vocabulary_size range:
    • If vocabulary_size is excessively small, the probability of hash collisions increases, and different features may be mapped to the same embedding, which causes feature loss.
    • If vocabulary_size is excessively large, the variable stores embeddings that are never used, which wastes memory.
  • Large embedding variables increase the model size. Even if regularization is used to reduce the impact of the embeddings of some features on the whole model, the embeddings themselves cannot be removed from the model.
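
The following sketch illustrates the conventional static-variable approach that the preceding list describes. The vocabulary_size value and feature strings are hypothetical; with a fixed-shape variable, every string ID must first be hashed into the [0, vocabulary_size) range, so hash collisions and memory redundancy are unavoidable trade-offs.
import tensorflow as tf

# Conventional approach: a variable with the fixed shape [vocabulary_size, embedding_dim].
vocabulary_size = 1000   # hypothetical and hard to estimate in online learning
embedding_dim = 8
var = tf.get_variable("static_emb", shape=[vocabulary_size, embedding_dim],
                      initializer=tf.ones_initializer(tf.float32))

# String IDs must be hashed into [0, vocabulary_size); different IDs may collide.
ids = tf.constant(["aaaa", "bbbbb", "ccc"])
hashed = tf.string_to_hash_bucket_fast(ids, vocabulary_size)
emb = tf.nn.embedding_lookup(var, hashed)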

To address these issues, PAI-TensorFlow provides EmbeddingVariable. While keeping feature training lossless, EmbeddingVariable uses memory resources cost-effectively, which enables offline training on ultra-large-scale features and online serving of the resulting model. PAI-TensorFlow provides the EmbeddingVariable V3.1 and Feature_Column V3.3 APIs. We recommend Feature_Column because it automatically accelerates the feature identification process for strings.

EmbeddingVariable features

  • Dynamic embedding

    You only need to specify the embedding_dim parameter. PAI-TensorFlow then dynamically increases or decreases the dictionary size during training. This method is suitable for online learning and frees you from preprocessing data for the model.

  • Group lasso

    In most cases, an embedding variable produced by deep learning is large. If you deploy such a variable as an online service, it may overload the server. Group lasso-based embedding variables reduce the cost of model deployment.

  • EmbeddingVariable allows you to pass original feature values directly to the embedding lookup. This frees you from ID-mapping operations, such as hashing, so feature-lossless training can be achieved.
  • EmbeddingVariable supports graph inference, backpropagation, and the import and export of variables. During model training, the optimizer automatically updates embedding variables.

tf.get_embedding_variable

tf.get_embedding_variable returns an existing embedding variable or creates a new one (a minimal usage sketch follows the parameter list below). Definition:
get_embedding_variable(
    name,
    embedding_dim,
    key_dtype=dtypes.int64,
    value_dtype=None,
    initializer=None,
    regularizer=None,
    trainable=True,
    collections=None,
    caching_device=None,
    partitioner=None,
    validate_shape=True,
    custom_getter=None,
    constraint=None,
    steps_to_live=None
)
  • name: the name of the embedding variable.
  • embedding_dim: the embedding dimension. Example: 8 or 64.
  • key_dtype: the type of the key in embedding lookup. Default value: int64.
  • value_dtype: the type of the embedding vector. Only the FLOAT type is supported.
  • initializer: the initial value of the embedding vector.
  • trainable: specifies whether the variable is added to the collection of GraphKeys.TRAINABLE_VARIABLES.
  • partitioner: the partition function.
  • steps_to_live: the feature expiration threshold, in global steps. The system removes a feature if it has not been updated for more than this number of global steps.
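
A minimal usage sketch based on the signature above. The variable name, dimension, and steps_to_live value are illustrative only:
import tensorflow as tf

# Create (or reuse) an embedding variable with 8-dimensional embeddings.
# Features that are not updated within 4000 global steps are removed (steps_to_live).
var = tf.get_embedding_variable("user_emb",
                                embedding_dim=8,
                                key_dtype=tf.int64,
                                initializer=tf.truncated_normal_initializer(stddev=0.01),
                                steps_to_live=4000)

# Look up embeddings for int64 feature IDs; unseen IDs are initialized on the fly.
emb = tf.nn.embedding_lookup(var, tf.cast([10, 20, 30], tf.int64))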

EmbeddingVariable

Structure of EmbeddingVariable:
class EmbeddingVariable(ResourceVariable):

  def total_count():
    # Returns the dynamic shape [rowCount, embedding_dim] of the embedding variable.
  def read_value():
    raise NotImplementedError("...")
  def assign():
    raise NotImplementedError("...")
  def assign_add():
    raise NotImplementedError("...")
  def assign_sub():
    raise NotImplementedError("...")
  • The sparse_read() method, which reads sparse data, is supported. If a queried key does not exist, the initial value generated by the initializer for that key is returned (see the sketch after this list).
  • The total_count() method, which counts the total number of entries in the embedding variable, is supported. This method returns the dynamic shape of the variable.
  • The read_value() method, which reads the full value of a variable, is not supported.
  • The methods that assign values to embedding variables, including assign(), assign_add(), and assign_sub(), are not supported.
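
A short sketch of the supported read methods, assuming that var is an unpartitioned embedding variable created with tf.get_embedding_variable (the variable name is hypothetical):
import tensorflow as tf

var = tf.get_embedding_variable("var_read",
                                embedding_dim=3,
                                initializer=tf.ones_initializer(tf.float32))

# sparse_read() returns embeddings for the given keys; unseen keys receive initial values.
emb = var.sparse_read(tf.cast([0, 1, 2], tf.int64))
# total_count() returns the dynamic shape [rowCount, embedding_dim].
shape = var.total_count()

# read_value(), assign(), assign_add(), and assign_sub() raise NotImplementedError
# because full reads and direct assignment are not supported.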

Use feature_column to build an embedding variable

tf.contrib.layers.sparse_column_with_embedding(column_name,
                                               dtype=tf.string,
                                               partition_num=None,
                                               steps_to_live=None,
                                               # The following parameters are supported only by the 140 Lite version of TensorFlow.
                                               steps_to_live_l2reg=None,
                                               l2reg_theta=None)

  • column_name: the name of the column.
  • dtype: the data type of the feature. Default value: tf.string.

Examples

  • Use the low-level tf.get_embedding_variable API to build a TensorFlow graph that contains an embedding variable
    #! /usr/bin/python
    import tensorflow as tf
    
    var = tf.get_embedding_variable("var_0",
                                    embedding_dim=3,
                                    initializer=tf.ones_initializer(tf.float32),
                                    partitioner=tf.fixed_size_partitioner(num_shards=4))
    
    shape = [var1.total_count() for var1 in var]
    
    emb = tf.nn.embedding_lookup(var, tf.cast([0,1,2,5,6,7], tf.int64))
    fun = tf.multiply(emb, 2.0, name='multiply')
    loss = tf.reduce_sum(fun, name='reduce_sum')
    opt = tf.train.FtrlOptimizer(0.1,
                                 l1_regularization_strength=2.0,
                                 l2_regularization_strength=0.00001)
    
    g_v = opt.compute_gradients(loss)
    train_op = opt.apply_gradients(g_v)
    
    init = tf.global_variables_initializer()
    
    sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    with tf.Session(config=sess_config) as sess:
      sess.run([init])
      print(sess.run([emb, train_op, loss]))
      print(sess.run([emb, train_op, loss]))
      print(sess.run([emb, train_op, loss]))
      print(sess.run([shape]))
  • Save an embedding variable as a checkpoint
    #! /usr/bin/python
    import tensorflow as tf
    
    var = tf.get_embedding_variable("var_0",
                                    embedding_dim=3,
                                    initializer=tf.ones_initializer(tf.float32),
                                    partitioner=tf.fixed_size_partitioner(num_shards=4))
    
    emb = tf.nn.embedding_lookup(var, tf.cast([0,1,2,5,6,7], tf.int64))
    
    init = tf.global_variables_initializer()
    saver = tf.train.Saver(sharded=True)
    print("GLOBAL_VARIABLES: ", tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES))
    print("SAVEABLE_OBJECTS: ", tf.get_collection(tf.GraphKeys.SAVEABLE_OBJECTS))
    
    checkpointDir = "/tmp/model_dir"
    sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    with tf.Session(config=sess_config) as sess:
      sess.run([init])
      print(sess.run([emb]))
    
      save_path = saver.save(sess, checkpointDir + "/model.ckpt", global_step=666)
      tf.train.write_graph(sess.graph_def, checkpointDir, 'train.pbtxt')
      print("save_path", save_path)
      print("list_variables", tf.contrib.framework.list_variables(checkpointDir))
  • Restore an embedding variable from a checkpoint
    #! /usr/bin/python
    import tensorflow as tf
    
    var = tf.get_embedding_variable("var_0",
                                    embedding_dim=3,
                                    initializer=tf.ones_initializer(tf.float32),
                                    partitioner=tf.fixed_size_partitioner(num_shards=4))
    
    emb = tf.nn.embedding_lookup(var, tf.cast([0,1,2,5,6,7], tf.int64))
    
    init = tf.global_variables_initializer()
    saver = tf.train.Saver(sharded=True)
    print("GLOBAL_VARIABLES: ", tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES))
    print("SAVEABLE_OBJECTS: ", tf.get_collection(tf.GraphKeys.SAVEABLE_OBJECTS))
    
    checkpointDir = "/tmp/model_dir"
    sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    with tf.Session(config=sess_config) as sess:
      print("list_variables", tf.contrib.framework.list_variables(checkpointDir))
      saver.restore(sess, checkpointDir + "/model.ckpt-666")
      print(sess.run([emb]))
  • Use feature_column to build a TensorFlow graph that contains an embedding variable
    import tensorflow as tf
    import os
    
    columns_list=[]
    columns_list.append(tf.contrib.layers.sparse_column_with_embedding(column_name="col_emb", dtype=tf.string))
    W = tf.contrib.layers.shared_embedding_columns(sparse_id_columns=columns_list,
            dimension=3,
            initializer=tf.ones_initializer(tf.float32),
            shared_embedding_name="xxxxx_shared")
    
    ids={}
    ids["col_emb"] = tf.SparseTensor(indices=[[0,0],[1,0],[2,0],[3,0],[4,0]], values=["aaaa","bbbbb","ccc","4nn","5b"], dense_shape=[5, 5])
    
    emb = tf.contrib.layers.input_from_feature_columns(columns_to_tensors=ids, feature_columns=W)
    
    fun = tf.multiply(emb, 2.0, name='multiply')
    loss = tf.reduce_sum(fun, name='reduce_sum')
    opt = tf.train.FtrlOptimizer(0.1, l1_regularization_strength=2.0, l2_regularization_strength=0.00001)
    g_v = opt.compute_gradients(loss)
    train_op = opt.apply_gradients(g_v)
    init = tf.global_variables_initializer()
    init_local = tf.local_variables_initializer()
    sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    with tf.Session(config=sess_config) as sess:
      sess.run(init)
      print("init global done")
      sess.run(init_local)
      print("init local done")
      print(sess.run([emb, train_op,loss]))
      print(sess.run([emb, train_op,loss]))
      print(sess.run([emb, train_op,loss]))
      print(sess.run([emb]))
    Note: EmbeddingVariable supports only the FTRL, Adagrad, Adam, and AdagradDecay optimizers.
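
    For example, to train the same graph with the Adagrad optimizer instead of FTRL, only the optimizer line changes; the learning rate below is illustrative:
    # Any of the supported optimizers updates embedding variables automatically.
    opt = tf.train.AdagradOptimizer(learning_rate=0.1)
    g_v = opt.compute_gradients(loss)
    train_op = opt.apply_gradients(g_v)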