Custom feature operators are plugins that the framework can dynamically load and execute. The Feature Generation (FG) framework is designed to be lightweight and includes only a few common feature operators to reduce compile time, minimize service resource usage, and accelerate service startup.
Configuration
{
"feature_name": "my_custom_fg_op",
"feature_type": "custom_feature",
"operator_name": "EditDistance",
"operator_lib_file": "libedit_distance.so",
"expression": [
"user:query",
"item:title"
],
"value_type": "string",
"separator": ",",
"default_value": "-1",
"value_dimension": 1,
"normalizer": "method=expression,expr=x>16?16:x",
"num_buckets": 10000,
"stub_type": false,
"is_sequence": false,
"is_op_thread_safe": true,
...
}
In addition to the listed configuration items, you can add other items as needed. The entire JSON configuration string is passed to the custom operator.
Configuration item | Description |
feature_type | Set this to |
operator_name | The name under which the feature operator is registered. We recommend that you keep this name consistent with the implemented class name. The same |
operator_lib_file | The name of the feature operator's dynamic-link library file. The name must end with |
expression | The input expression. Multiple inputs are supported. |
value_type | The output type of the feature transformation. It can only be a basic type, such as |
default_value | The default value of the feature. Configure this as a string. The code converts it to the required type. |
separator | The separator for multiple values. It is used to split the configured |
stub_type | Indicates whether the current feature operator can only be used as an intermediate result of a feature transformation. If you set this to |
is_sequence | Marks whether the feature is a sequence feature. |
sequence_length | The maximum length of the sequence. If the length exceeds this value, the sequence is truncated. |
sequence_delim | The separator between sequence elements. Set this only when the input is of the string type. |
split_sequence | If the input sequence feature is of the string type, this parameter specifies whether the framework needs to perform a split operation on the sequence. The default value is |
value_dimension | The dimension of the output feature. This can be used to truncate the output of an offline task and affects the schema of the output table. If the feature has multiple values and the output dimension is uncertain, you can omit this configuration. |
Discretization operation | Six types of discretization operations are supported. You do not need to implement these operations yourself. For more information, see Feature discretization (binning). |
normalizer | For numerical features, you can add this configuration to further process the transformation result, such as calculating the value of an expression. For supported operators and functions, see Built-in feature operators. Four methods are supported: minmax, zscore, log10, and expression. The configurations and calculation methods are as follows: |
placeholder | In a sequence feature, if each element of the sequence has multiple values ( |
disable_string_view | Specifies whether to disable feature values of the |
is_op_thread_safe | Indicates whether the current feature operator is thread-safe. Set this to |
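As a rough illustration of what the four normalizer methods compute, the sketch below uses the textbook definitions of min-max scaling, z-score, and base-10 logarithm. The function names and the parameters (`min`, `max`, `mean`, `stddev`) are hypothetical; they are not FG configuration keys, and FG's own implementation may treat edge cases differently. Only the last function mirrors the `expr=x>16?16:x` expression from the configuration shown earlier.

```cpp
#include <cmath>

// Textbook min-max scaling; assumes max > min (hypothetical parameters).
double MinMaxNormalize(double x, double min, double max) {
  return (x - min) / (max - min);
}

// Textbook z-score; assumes stddev > 0 (hypothetical parameters).
double ZScoreNormalize(double x, double mean, double stddev) {
  return (x - mean) / stddev;
}

// Base-10 logarithm; assumes x > 0.
double Log10Normalize(double x) { return std::log10(x); }

// Equivalent of the expression "x>16?16:x": caps the value at 16.
double CapAt16(double x) { return x > 16 ? 16 : x; }
```

With the configuration shown earlier, an edit distance of 20 would be capped to 16 before any discretization is applied.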
Additional notes:
User-defined configuration items must not have the same names as the configuration items that are used by the framework.
Custom operators can read and use configuration items defined by the framework. However, attempting to change their semantics will cause undefined behavior.
Configuration items that depend on external resource files must end with _file. This marker is used to sync resource files when you use FG in offline tasks.
Configuration examples
{
"feature_name": "time_diff_seq",
"feature_type": "custom_feature",
"operator_name": "SeqExpr",
"expression": ["user:cur_time", "user:clk_time_seq"],
"formula": "cur_time - clk_time_seq",
"default_value": "0",
"value_type": "int32",
"is_sequence": true,
"num_buckets": 1000,
"is_op_thread_safe": false
},
{
"feature_name": "spherical_distance",
"feature_type": "custom_feature",
"operator_name": "SeqExpr",
"expression": ["item:click_id_lng", "item:click_id_lat", "user:j_lng", "user:j_lat"],
"formula": "spherical_distance",
"default_value": "0",
"value_type": "double",
"is_sequence": true,
"is_op_thread_safe": true,
"value_dimension": 1,
"normalizer": "method=expression,expr=sqrt(x)"
}
formula: An expression. For more information about supported expressions, see expr_feature.
spherical_distance: Calculates the distance between two latitude and longitude coordinates. The parameters are [lng1_seq, lat1_seq, lng2, lat2]. The first two parameters are sequences, and the last two are scalar values.
This is an example of a custom Sequence feature in a tiled format. For an example of a custom Sequence feature in a nested format, see sequence_feature.
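To make the two SeqExpr examples more concrete, the following sketch shows one way the element-wise semantics could look: a scalar broadcast against a sequence for `cur_time - clk_time_seq`, and a per-element distance for `[lng1_seq, lat1_seq, lng2, lat2]`. This is not the SeqExpr source code. The function names are hypothetical, and the haversine formula, the Earth radius of 6371 km, and the kilometer unit are assumptions; the operator's actual formula and unit are not stated in this document.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Broadcasts a scalar over a sequence: out[i] = cur_time - clk_time_seq[i].
std::vector<int32_t> TimeDiffSeq(int32_t cur_time,
                                 const std::vector<int32_t>& clk_time_seq) {
  std::vector<int32_t> out;
  out.reserve(clk_time_seq.size());
  for (int32_t t : clk_time_seq) out.push_back(cur_time - t);
  return out;
}

// Haversine great-circle distance in kilometers (assumed formula and unit).
double HaversineKm(double lng1, double lat1, double lng2, double lat2) {
  constexpr double kPi = 3.14159265358979323846;
  constexpr double kEarthRadiusKm = 6371.0;  // mean Earth radius (assumption)
  const double to_rad = kPi / 180.0;
  const double dlat = (lat2 - lat1) * to_rad;
  const double dlng = (lng2 - lng1) * to_rad;
  const double a = std::sin(dlat / 2) * std::sin(dlat / 2) +
                   std::cos(lat1 * to_rad) * std::cos(lat2 * to_rad) *
                       std::sin(dlng / 2) * std::sin(dlng / 2);
  return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(a));
}

// Two coordinate sequences against one scalar coordinate, matching the
// parameter order [lng1_seq, lat1_seq, lng2, lat2].
std::vector<double> SphericalDistanceSeq(const std::vector<double>& lng1_seq,
                                         const std::vector<double>& lat1_seq,
                                         double lng2, double lat2) {
  const std::size_t n = std::min(lng1_seq.size(), lat1_seq.size());
  std::vector<double> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(HaversineKm(lng1_seq[i], lat1_seq[i], lng2, lat2));
  }
  return out;
}
```

In both cases the output sequence has the same length as the input sequence, which is why `value_dimension` is 1 in the second configuration.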
C++ interface
#pragma once
#ifndef FEATURE_GENERATOR_PLUGIN_BASE_H
#define FEATURE_GENERATOR_PLUGIN_BASE_H
#include <absl/container/flat_hash_map.h>
#include <absl/strings/string_view.h>
#include <absl/types/optional.h>
#include <absl/types/variant.h>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include "fsmap.h"
#include "integral_types.h"
namespace fg {
using absl::optional;
using std::string;
using std::vector;
template <typename T>
using List = std::vector<T>;
template <typename K, typename V>
using Map = absl::flat_hash_map<K, V>;
template <typename K, typename V>
using MapArray = std::vector<std::pair<K, V>>;
using Matrix = std::vector<std::vector<float>>;
using MatrixL = std::vector<std::vector<int64>>;
using MatrixS = std::vector<std::vector<string>>;
template <typename K, typename V>
using FSMap = featurestore::type::fs_map<K, V>;
using FieldPtr = absl::variant<
const optional<string>*, const optional<int32>*, const optional<int64>*,
const optional<float>*, const optional<double>*,
const optional<absl::string_view>*,
const List<string>*, const List<int32>*, const List<int64>*,
const List<float>*, const List<double>*, const List<absl::string_view>*,
const Map<string, string>*, const Map<string, int32>*,
const Map<string, int64>*, const Map<string, float>*,
const Map<string, double>*, const Map<string, absl::string_view>*,
const Map<absl::string_view, absl::string_view>*,
const Map<absl::string_view, int32>*, const Map<absl::string_view, int64>*,
const Map<absl::string_view, float>*, const Map<absl::string_view, double>*,
const Map<absl::string_view, string>*,
const Map<int32, string>*, const Map<int32, int32>*,
const Map<int32, int64>*, const Map<int32, float>*,
const Map<int32, double>*, const Map<int32, absl::string_view>*,
const Map<int64, string>*, const Map<int64, float>*,
const Map<int64, double>*, const Map<int64, int32>*,
const Map<int64, int64>*, const Map<int64, absl::string_view>*,
const FSMap<absl::string_view, absl::string_view>*,
const FSMap<absl::string_view, int32>*,
const FSMap<absl::string_view, int64>*,
const FSMap<absl::string_view, float>*,
const FSMap<absl::string_view, double>*,
const FSMap<int32, int32>*, const FSMap<int32, int64>*,
const FSMap<int32, float>*, const FSMap<int32, double>*,
const FSMap<int32, absl::string_view>*,
const FSMap<int64, float>*, const FSMap<int64, double>*,
const FSMap<int64, int32>*, const FSMap<int64, int64>*,
const FSMap<int64, absl::string_view>*,
const MapArray<string, string>*, const MapArray<string, int32>*,
const MapArray<string, int64>*, const MapArray<string, float>*,
const MapArray<string, double>*,
const MapArray<int32, string>*, const MapArray<int32, float>*,
const MapArray<int32, double>*, const MapArray<int32, int32>*,
const MapArray<int32, int64>*,
const MapArray<int64, string>*, const MapArray<int64, float>*,
const MapArray<int64, double>*, const MapArray<int64, int32>*,
const MapArray<int64, int64>*, const Matrix*, const MatrixL*,
const MatrixS*>;
// represents a COLUMN of the feature table
using VariantVector = absl::variant<
vector<optional<string>>, vector<optional<int32>>, vector<optional<int64>>,
vector<optional<float>>, vector<optional<double>>,
vector<optional<absl::string_view>>,
vector<List<string>>, vector<List<int32>>, vector<List<int64>>,
vector<List<float>>, vector<List<double>>, vector<List<absl::string_view>>,
vector<Map<string, string>>, vector<Map<string, int32>>,
vector<Map<string, int64>>, vector<Map<string, float>>,
vector<Map<string, double>>, vector<Map<string, absl::string_view>>,
vector<Map<absl::string_view, absl::string_view>>,
vector<Map<absl::string_view, int32>>,
vector<Map<absl::string_view, int64>>,
vector<Map<absl::string_view, float>>,
vector<Map<absl::string_view, double>>,
vector<Map<int32, string>>, vector<Map<int32, int32>>,
vector<Map<int32, int64>>, vector<Map<int32, float>>,
vector<Map<int32, double>>, vector<Map<int32, absl::string_view>>,
vector<Map<int64, string>>, vector<Map<int64, float>>,
vector<Map<int64, double>>, vector<Map<int64, int32>>,
vector<Map<int64, int64>>, vector<Map<int64, absl::string_view>>,
vector<FSMap<absl::string_view, absl::string_view>>,
vector<FSMap<absl::string_view, int32>>,
vector<FSMap<absl::string_view, int64>>,
vector<FSMap<absl::string_view, float>>,
vector<FSMap<absl::string_view, double>>,
vector<FSMap<int32, int32>>, vector<FSMap<int32, int64>>,
vector<FSMap<int32, float>>, vector<FSMap<int32, double>>,
vector<FSMap<int32, absl::string_view>>,
vector<FSMap<int64, float>>, vector<FSMap<int64, double>>,
vector<FSMap<int64, int32>>, vector<FSMap<int64, int64>>,
vector<FSMap<int64, absl::string_view>>,
vector<MapArray<string, string>>, vector<MapArray<string, int32>>,
vector<MapArray<string, int64>>, vector<MapArray<string, float>>,
vector<MapArray<string, double>>,
vector<MapArray<int32, string>>, vector<MapArray<int32, float>>,
vector<MapArray<int32, double>>, vector<MapArray<int32, int32>>,
vector<MapArray<int32, int64>>,
vector<MapArray<int64, string>>, vector<MapArray<int64, float>>,
vector<MapArray<int64, double>>, vector<MapArray<int64, int32>>,
vector<MapArray<int64, int64>>, vector<Matrix>, vector<MatrixL>,
vector<MatrixS>>;
/**
* @brief The public base class for custom feature operators.
*
* The framework checks if a subclass overrides the `BatchProcess` method. If it is overridden,
* the framework calls this method to perform the feature transformation.
* Otherwise, the framework selects one of the `ProcessWith*` methods to execute based on the `value_type` configuration.
* You must implement the method that corresponds to the required output type.
*/
class IFeatureOP {
public:
class NotOverriddenException : public std::exception {
public:
explicit NotOverriddenException(std::string msg) : msg_(std::move(msg)) {}
const char* what() const noexcept override {
if (msg_.empty()) {
return "unimplemented method called";
}
// Cache the message to a member variable to ensure that the returned pointer remains valid.
cached_ = "unimplemented method called: " + msg_;
return cached_.c_str();
}
private:
std::string msg_;
mutable std::string cached_;
};
virtual ~IFeatureOP() = default;
/**
* @brief The initialization method.
* @param feature_config The feature configuration as a JSON string.
* @return 0 if initialization succeeds. Any other value indicates that initialization failed.
*/
virtual int Initialize(const string& feature_config) = 0;
/**
* @brief Performs feature transformation and outputs the results as the string type.
* @param inputs A record that can contain multiple fields.
* @param outputs The outputs of the feature transformation.
* @return A status code. A value of 0 indicates successful execution.
*/
virtual int ProcessWithStrOutputs(const vector<FieldPtr>& inputs,
vector<string>& outputs) {
throw NotOverriddenException("ProcessWithStrOutputs(FieldPtr)");
}
/**
* @brief Performs feature transformation and outputs the results as the int32 type.
* @param inputs A record that can contain multiple fields.
* @param outputs The outputs of the feature transformation.
* @return A status code. A value of 0 indicates successful execution.
*/
virtual int ProcessWithInt32Outputs(const vector<FieldPtr>& inputs,
vector<int32>& outputs) {
throw NotOverriddenException("ProcessWithInt32Outputs(FieldPtr)");
}
/**
* @brief Performs feature transformation and outputs the results as the int64 type.
* @param inputs A record that can contain multiple fields.
* @param outputs The outputs of the feature transformation.
* @return A status code. A value of 0 indicates successful execution.
*/
virtual int ProcessWithInt64Outputs(const vector<FieldPtr>& inputs,
vector<int64>& outputs) {
throw NotOverriddenException("ProcessWithInt64Outputs(FieldPtr)");
}
/**
* @brief Performs feature transformation and outputs the results as the float type.
* @param inputs A record that can contain multiple fields.
* @param outputs The outputs of the feature transformation.
* @return A status code. A value of 0 indicates successful execution.
*/
virtual int ProcessWithFloatOutputs(const vector<FieldPtr>& inputs,
vector<float>& outputs) {
throw NotOverriddenException("ProcessWithFloatOutputs(FieldPtr)");
}
/**
* @brief Performs feature transformation and outputs the results as the double type.
* @param inputs A record that can contain multiple fields.
* @param outputs The outputs of the feature transformation.
* @return A status code. A value of 0 indicates successful execution.
*/
virtual int ProcessWithDoubleOutputs(const vector<FieldPtr>& inputs,
vector<double>& outputs) {
throw NotOverriddenException("ProcessWithDoubleOutputs(FieldPtr)");
}
/**
* @brief An optional batch interface for processing multiple records.
*
* @param inputs A vector of input columns. `VariantVector` represents a feature column.
* @param outputs
* The transformed features. This method supports complex output types that can be used as inputs for other feature transformations.
* @return A status code. A value of 0 indicates successful execution.
*/
virtual int BatchProcess(const vector<VariantVector>& inputs,
VariantVector& outputs) {
throw NotOverriddenException("BatchProcess");
}
/**
* @brief Explicitly declares whether the subclass implements the BatchProcess method.
*
* The framework preferentially calls this method to check if `BatchProcess` has been overridden.
* If your subclass implements `BatchProcess`, you must override this method to return true.
* By default, it returns false to indicate that `BatchProcess` is not implemented.
*
* Note: This method is used to avoid exception propagation issues across dynamic library boundaries.
* When a custom operator (a .so file) and the main program use different C++ ABIs or compilation options,
* attempting to detect the implementation by calling `BatchProcess` and catching an exception may fail.
*
* @return true if the subclass implements BatchProcess.
* @return false if the subclass does not implement BatchProcess (default).
*/
virtual bool HasBatchProcessImpl() const { return false; }
};
using CreateOperatorFunc = IFeatureOP* (*)();
inline FieldPtr GetFieldPtr(const VariantVector& input, size_t i) {
return absl::visit(
[&](const auto& vec) -> FieldPtr {
if (i >= vec.size()) {
throw std::out_of_range("GetFieldPtr: index " + std::to_string(i) +
" out of range [0, " +
std::to_string(vec.size()) + ")");
}
return &vec.at(i);
},
input);
}
} // namespace fg
#if defined(__GNUC__)
#define PLUGIN_API_HIDDEN \
__attribute__((visibility("hidden"))) __attribute__((used))
#define PLUGIN_API_EXPORT \
__attribute__((visibility("default"))) __attribute__((used))
#else
#define PLUGIN_API_HIDDEN
#define PLUGIN_API_EXPORT
#endif
std::vector<std::string>& getLocalNames();
std::vector<std::pair<std::string, void*>>& getLocalRegs();
#define REGISTER_PLUGIN(OpName, OpClass) \
extern "C" PLUGIN_API_EXPORT fg::IFeatureOP* create##OpClass() { \
return new fg::OpClass(); \
} \
namespace { \
struct _Reg_##OpClass { \
_Reg_##OpClass() { \
getLocalNames().push_back(OpName); \
getLocalRegs().emplace_back(OpName, (void*)&create##OpClass); \
} \
}; \
static _Reg_##OpClass _dummy_##OpClass __attribute__((used)); \
}
#endif // FEATURE_GENERATOR_PLUGIN_BASE_H
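The `REGISTER_PLUGIN` macro in the header above relies on a common self-registration idiom: a file-scope object whose constructor runs when the library is loaded and records a factory function under a name. The sketch below reproduces that idiom in a self-contained form. `Op`, `Registry`, `REGISTER_OP`, `Answer`, and `Create` are hypothetical stand-ins, not part of the FG API, and the way the framework actually looks up factories is not shown in this document.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for fg::IFeatureOP.
struct Op {
  virtual ~Op() = default;
  virtual int Run() = 0;
};
using CreateFunc = Op* (*)();

// A function-local static avoids static initialization order problems.
std::vector<std::pair<std::string, CreateFunc>>& Registry() {
  static std::vector<std::pair<std::string, CreateFunc>> regs;
  return regs;
}

// Defines a factory function and a file-scope object that registers it.
#define REGISTER_OP(OpName, OpClass)                     \
  Op* create##OpClass() { return new OpClass(); }        \
  namespace {                                            \
  struct Reg_##OpClass {                                 \
    Reg_##OpClass() {                                    \
      Registry().emplace_back(OpName, &create##OpClass); \
    }                                                    \
  };                                                     \
  static Reg_##OpClass dummy_##OpClass;                  \
  }

struct Answer : Op {
  int Run() override { return 42; }
};
REGISTER_OP("Answer", Answer)

// Creates an operator by its registered name; returns nullptr if absent.
Op* Create(const std::string& name) {
  for (auto& entry : Registry()) {
    if (entry.first == name) return entry.second();
  }
  return nullptr;
}
```

Because the registration object is defined where the macro is expanded, expanding it in a header included by several source files would define the factory more than once, which is one reason to register in the implementation file.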
Developer guide
Download the API dependency file fg-api.tar.gz. This file contains the necessary header files.
Inherit the IFeatureOP base class, implement the Initialize method, and implement at least one ProcessWith* method.
Your implementation class must include a parameterless constructor.
The framework passes the JSON configuration string to the Initialize method. You can then parse the required configuration items.
The framework calls the corresponding ProcessWith* method based on the value_type configuration item. If you do not implement the method for the corresponding type, a runtime exception is thrown.
The ProcessWith* method processes a single record. It can have multiple input fields and multi-dimensional outputs, such as a multi-value feature.
FieldPtr defines all feature field types that the framework can process.
Your code should support as many types as possible by implementing the corresponding feature transformation operation for each possible input type. If you are certain that specific types are not required, you can throw an exception directly.
FSMap is a type that needs to be supported when you use featurestore. It can significantly improve processor performance.
You only need to implement the feature transformation operation before the discretization operation. If a discretization operation is configured, the framework automatically performs it.
Use the REGISTER_PLUGIN macro to register the new feature operator. Otherwise, the framework cannot use it.
REGISTER_PLUGIN("OperatorName", OperatorClass): Replace the two macro parameters as required. We recommend that you use the same name for both parameters.
The value for operator_name in the configuration item must be 'OperatorName'.
Register the operator in the implementation file, not the header file.
The framework scans all dynamic-link libraries in a specified directory and attempts to load the required feature operators when necessary.
Use the FEATURE_OPERATOR_DIR environment variable to specify the directory where the dynamic-link library files are located.
Each dynamic-link library can contain implementations of multiple feature operators.
The BatchProcess interface is used for batch processing and processes one batch of data at a time.
This interface is optional. If you implement it, the FG framework no longer calls sample-granularity interfaces such as ProcessWith*.
After implementing this interface, override the bool HasBatchProcessImpl() const function and return true to instruct the main program to use this interface.
Implementing this interface can improve performance. For example, if a user-side feature contains only one sample per request, you can use the broadcast mechanism to avoid repeatedly parsing the user-side feature for cross features.
When stub_type=true is configured and no binning operation is set, this interface can return any valid type, such as Map.
The type of the VariantVector returned by the BatchProcess function depends on the values of is_sequence, value_dimension, and value_type. For more information, see the description of the value_dimension configuration item.
For an example of a batch processing interface, you can download and review RegexReplace.
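The broadcast idea mentioned for BatchProcess can be sketched as follows: when a user-side column holds a single row and the item-side column holds one row per sample, the single row is reused for every sample. The sketch uses `std::optional` and a plain `Column` alias as simplified stand-ins for one alternative of `VariantVector`; `RowAt` and `QueryInTitleBatch` are hypothetical helpers, not framework functions, and the sketch assumes each column has either one row or one row per sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for one alternative of fg::VariantVector.
using Column = std::vector<std::optional<std::string>>;

// Returns row i, or row 0 when the column holds a single broadcast value.
const std::optional<std::string>& RowAt(const Column& col, std::size_t i) {
  return col.size() == 1 ? col[0] : col.at(i);
}

// For each sample, outputs 1 if the title contains the query, else 0.
// Missing (null) values on either side yield 0.
std::vector<int32_t> QueryInTitleBatch(const Column& queries,
                                       const Column& titles) {
  const std::size_t batch = std::max(queries.size(), titles.size());
  std::vector<int32_t> out;
  out.reserve(batch);
  for (std::size_t i = 0; i < batch; ++i) {
    const auto& q = RowAt(queries, i);  // same row for all i when broadcast
    const auto& t = RowAt(titles, i);
    const bool hit = q && t && t->find(*q) != std::string::npos;
    out.push_back(hit ? 1 : 0);
  }
  return out;
}
```

In a real operator, any expensive preprocessing of the broadcast row (for example, tokenizing the query) could be done once before the loop, which is where the performance gain over per-sample calls would come from.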
Third-party dependencies
abseil-cpp (We recommend that you use the same version as the FG framework.)
Third-party libraries that the custom operator depends on must be compiled by embedding the source code or using static linking. Do not depend on any dynamic-link libraries, because this can cause the operator to fail to load.
Sequence features
If the is_sequence configuration item is set to true, note the following items:
Sparse feature sequences
If the operator generates a sparse feature sequence, such as a sequence of previously visited item_ids, and each element of the sequence is a single value, you can output any type.
If the operator generates a sparse feature sequence and each element of the sequence can have multiple values, you can only output the string type. You must set value_type to string and use the separator chr(29) to separate multiple values.
Dense feature sequences
When an operator generates a dense feature sequence, such as the embedding vectors of historically accessed items, you must set value_dimension to the dimension of each element in the sequence.
If the elements of the sequence are scalars, set value_dimension to 1.
If the elements of the sequence are vectors, set value_dimension to the length of the vector.
The number of feature values output by the operator must be an integer multiple of value_dimension.
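The two sequence rules above can be checked with small helpers such as the following. This is only a sketch: `JoinMultiValue` and `IsValidDenseOutput` are hypothetical names, and the framework performs its own handling of these outputs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// chr(29), the ASCII group separator, joins the values of one sequence element.
constexpr char kMultiValueSep = '\x1D';

// Builds the string output for one multi-value element of a sparse sequence.
std::string JoinMultiValue(const std::vector<std::string>& values) {
  std::string out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out.push_back(kMultiValueSep);
    out += values[i];
  }
  return out;
}

// True if a flattened dense output can be split into whole elements of
// value_dimension values each.
bool IsValidDenseOutput(std::size_t num_values, std::size_t value_dimension) {
  return value_dimension > 0 && num_values % value_dimension == 0;
}
```

For example, a sequence of three 4-dimensional embeddings yields 12 values, which passes the check with `value_dimension` set to 4, while 10 values would not.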
Custom operator list
Operator name | Operator function | Source code download link | Binary package download link |
EditDistance | Edit distance | ||
SeqExpr | Sequence expression | ||
BPETokenize | BPE tokenization | Included in the built-in tokenize_feature. |
Configuration items
EditDistance
encoding: The encoding of the input text. Options: utf-8, latin. The default value is latin.
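The choice of encoding changes the result for non-ASCII text, because a byte-wise comparison counts each byte of a multi-byte character separately. The sketch below illustrates this with a standalone Levenshtein function and a simple UTF-8 decoder. It is not the operator's implementation: `Levenshtein` and `DecodeUtf8` are hypothetical helpers, and the decoder assumes well-formed input apart from a truncated tail.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Two-row Levenshtein distance over any indexable sequence.
template <class T>
int Levenshtein(const T& a, const T& b) {
  const int n = static_cast<int>(a.size());
  const int m = static_cast<int>(b.size());
  std::vector<int> prev(m + 1), curr(m + 1);
  std::iota(prev.begin(), prev.end(), 0);  // row 0: distance from empty prefix
  for (int i = 1; i <= n; ++i) {
    curr[0] = i;
    for (int j = 1; j <= m; ++j) {
      const int sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      curr[j] = std::min({sub, prev[j] + 1, curr[j - 1] + 1});
    }
    prev.swap(curr);
  }
  return prev[m];
}

// Decodes UTF-8 bytes into code points; stops at a truncated sequence.
std::u32string DecodeUtf8(const std::string& s) {
  std::u32string out;
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 1;
    char32_t cp = c;
    if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    if (i + len > s.size()) break;
    for (std::size_t j = 1; j < len; ++j) {
      cp = (cp << 6) | (static_cast<unsigned char>(s[i + j]) & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}
```

Comparing "e" with "é" (two bytes in UTF-8) gives a distance of 2 on raw bytes but 1 on decoded code points, which is the kind of difference the encoding option controls.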
Developer examples
The following example shows how to calculate the edit distance between two input texts. The header file is edit_distance.h.
#pragma once
#include <memory>
#include <string>
#include <vector>
#include "api/base_op.h"
namespace fg {
namespace functor {
class EditDistanceFunctor;
}
using std::string;
using std::vector;
/**
* @brief Edit distance: Inputs two strings and outputs their text edit distance.
*/
class EditDistance : public IFeatureOP {
public:
int Initialize(const string& feature_config) override;
/// @return A status code. 0 indicates successful execution.
int ProcessWithStrOutputs(const vector<FieldPtr>& inputs,
vector<string>& outputs) override;
/// @return A status code. 0 indicates successful execution.
int ProcessWithInt32Outputs(const vector<FieldPtr>& inputs,
vector<int32>& outputs) override;
/// @return A status code. 0 indicates successful execution.
int ProcessWithInt64Outputs(const vector<FieldPtr>& inputs,
vector<int64>& outputs) override;
/// @return A status code. 0 indicates successful execution.
int ProcessWithFloatOutputs(const vector<FieldPtr>& inputs,
vector<float>& outputs) override;
/// @return A status code. 0 indicates successful execution.
int ProcessWithDoubleOutputs(const vector<FieldPtr>& inputs,
vector<double>& outputs) override;
private:
string feature_name_;
std::unique_ptr<functor::EditDistanceFunctor> functor_p_;
};
} // end of namespace fg
The implementation file is edit_distance.cc.
#include "edit_distance.h"
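#include <absl/strings/ascii.h>
#include <absl/strings/str_join.h>
#include <algorithm> // Includes std::min and std::transform
#include <cassert>
#include <iterator> // Includes std::back_inserter
#include <memory> // Includes std::make_unique
#include <nlohmann/json.hpp>
#include <numeric> // Includes std::iota
#include <stdexcept>
#include <string>
#include <typeinfo>
#include "api/log.h"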
namespace fg {
using absl::optional;
namespace functor {
template <class T>
int edit_distance(const T& s1, const T& s2) {
int l1 = s1.size();
int l2 = s2.size();
if (l1 * l2 == 0) {
return l1 + l2;
}
vector<int> prev(l2 + 1);
vector<int> curr(l2 + 1);
std::iota(prev.begin(), prev.end(), 0);
// prev holds row 0 (distances from the empty prefix of s1), so iteration starts at row 1.
for (int i = 1; i <= l1; ++i) {
curr[0] = i;
for (int j = 1; j <= l2; ++j) {
int d = prev[j - 1];
if (s1[i - 1] == s2[j - 1]) {
curr[j] = d;
} else {
int d2 = std::min(prev[j], curr[j - 1]);
curr[j] = 1 + std::min(d, d2);
}
}
prev.swap(curr);
}
return prev[l2];
}
enum class Encoding : unsigned int { Latin = 0, UTF8 = 1 };
class EditDistanceFunctor {
public:
EditDistanceFunctor(const string& encoding) {
string enc = absl::AsciiStrToLower(encoding);
if (enc == "utf-8" || enc == "utf8") {
encoding_ = Encoding::UTF8;
} else {
encoding_ = Encoding::Latin;
}
}
int operator()(absl::string_view s1, absl::string_view s2) {
if (encoding_ == Encoding::Latin) {
return edit_distance(s1, s2);
}
if (encoding_ == Encoding::UTF8) {
return edit_distance(from_bytes(s1), from_bytes(s2));
}
LOG(ERROR) << "EditDistanceFunctor found unsupported text encoding";
assert(false);
return 0;
}
const Encoding TextEncoding() const { return encoding_; }
private:
Encoding encoding_;
std::wstring from_bytes(absl::string_view str) {
std::wstring result;
int i = 0;
int len = (int)str.length();
while (i < len) {
int char_size = 0;
int unicode = 0;
if ((str[i] & 0x80) == 0) {
unicode = str[i];
char_size = 1;
} else if ((str[i] & 0xE0) == 0xC0) {
unicode = str[i] & 0x1F;
char_size = 2;
} else if ((str[i] & 0xF0) == 0xE0) {
unicode = str[i] & 0x0F;
char_size = 3;
} else if ((str[i] & 0xF8) == 0xF0) {
unicode = str[i] & 0x07;
char_size = 4;
} else {
// Invalid UTF-8 sequence
++i;
continue;
}
if (i + char_size > len) {
// Truncated multi-byte sequence at the end of the input
break;
}
for (int j = 1; j < char_size; ++j) {
unicode = (unicode << 6) | (str[i + j] & 0x3F);
}
if (unicode <= 0xFFFF) {
result += static_cast<wchar_t>(unicode);
} else {
// Handle surrogate pairs for characters outside the BMP
unicode -= 0x10000;
result += static_cast<wchar_t>((unicode >> 10) + 0xD800);
result += static_cast<wchar_t>((unicode & 0x3FF) + 0xDC00);
}
i += char_size;
}
return result;
}
};
} // namespace functor
// Defines the overloaded class.
template <class... Ts>
struct overloaded : Ts... {
using Ts::operator()...;
};
// Class template argument deduction guide (C++17).
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;
int EditDistance::Initialize(const string& feature_config) {
nlohmann::json cfg;
try {
cfg = nlohmann::json::parse(feature_config);
} catch (nlohmann::json::parse_error& ex) {
LOG(ERROR) << "parse error at byte " << ex.byte;
LOG(ERROR) << "config: " << feature_config;
throw std::runtime_error("parse EditDistance config failed");
}
feature_name_ = cfg.at("feature_name");
string encoding = cfg.value("encoding", "latin");
functor_p_ = std::make_unique<functor::EditDistanceFunctor>(encoding);
functor::Encoding enc = functor_p_->TextEncoding();
encoding = (enc == functor::Encoding::UTF8) ? "UTF-8" : "Latin";
LOG(INFO) << "feature <" << feature_name_ << "> with text encoding: " << encoding;
return 0;
}
int EditDistance::ProcessWithInt32Outputs(const vector<FieldPtr>& inputs,
vector<int32>& outputs) {
outputs.clear();
if (inputs.size() < 2) {
outputs.push_back(0);
return -1; // invalid inputs
}
int d = absl::visit(
overloaded{
[this](const optional<string>* s1, const optional<string>* s2) {
absl::string_view empty_view;
return functor_p_->operator()(*s1 ? **s1 : empty_view, *s2 ? **s2 : empty_view);
},
[this](const optional<absl::string_view>* s1,
const optional<absl::string_view>* s2) {
absl::string_view empty_view;
return functor_p_->operator()(*s1 ? **s1 : empty_view, *s2 ? **s2 : empty_view);
},
[this](const optional<absl::string_view>* s1,
const optional<string>* s2) {
absl::string_view empty_view;
return functor_p_->operator()(*s1 ? **s1 : empty_view, *s2 ? **s2 : empty_view);
},
[this](const optional<string>* s1,
const optional<absl::string_view>* s2) {
absl::string_view empty_view;
return functor_p_->operator()(*s1 ? **s1 : empty_view, *s2 ? **s2 : empty_view);
},
[this](const List<string>* s1, const List<string>* s2) {
string str1 = absl::StrJoin(*s1, "");
string str2 = absl::StrJoin(*s2, "");
return functor_p_->operator()(str1, str2);
},
[this](const List<absl::string_view>* s1,
const List<absl::string_view>* s2) {
string str1 = absl::StrJoin(*s1, "");
string str2 = absl::StrJoin(*s2, "");
return functor_p_->operator()(str1, str2);
},
[this](const auto* x, const auto* y) {
ERROR_EXIT(feature_name_,
"unsupported input type: ", typeid(*x).name(), " vs ",
typeid(*y).name());
return 0;
}},
inputs.at(0), inputs.at(1));
outputs.push_back(d);
return 0;
}
int EditDistance::ProcessWithInt64Outputs(const vector<FieldPtr>& inputs,
vector<int64>& outputs) {
vector<int32> distances;
int status = ProcessWithInt32Outputs(inputs, distances);
if (0 != status) {
return status;
}
outputs.clear();
outputs.insert(outputs.end(), distances.begin(), distances.end());
return 0;
}
int EditDistance::ProcessWithFloatOutputs(const vector<FieldPtr>& inputs,
vector<float>& outputs) {
vector<int32> distances;
int status = ProcessWithInt32Outputs(inputs, distances);
if (0 != status) {
return status;
}
outputs.clear();
outputs.insert(outputs.end(), distances.begin(), distances.end());
return 0;
}
int EditDistance::ProcessWithDoubleOutputs(const vector<FieldPtr>& inputs,
vector<double>& outputs) {
vector<int32> distances;
int status = ProcessWithInt32Outputs(inputs, distances);
if (0 != status) {
return status;
}
outputs.clear();
outputs.insert(outputs.end(), distances.begin(), distances.end());
return 0;
}
int EditDistance::ProcessWithStrOutputs(const vector<FieldPtr>& inputs,
vector<string>& outputs) {
vector<int32> distances;
int status = ProcessWithInt32Outputs(inputs, distances);
if (0 != status) {
return status;
}
outputs.clear();
outputs.reserve(distances.size());
std::transform(distances.begin(), distances.end(),
std::back_inserter(outputs),
[](int32& x) { return std::to_string(x); });
return 0;
}
} // end of namespace fg
REGISTER_PLUGIN("EditDistance", EditDistance);
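The visitor in `ProcessWithInt32Outputs` dispatches on the runtime types of two variant inputs at once, using the `overloaded` idiom with a generic fallback lambda. The sketch below isolates that pattern with `std::variant` as a stand-in for `absl::variant`. `Field` is a reduced, hypothetical version of `FieldPtr` with only three alternatives, and `CombinedLength` is an illustrative function, not part of the operator.

```cpp
#include <string>
#include <variant>
#include <vector>

template <class... Ts>
struct Overloaded : Ts... {
  using Ts::operator()...;
};
template <class... Ts>
Overloaded(Ts...) -> Overloaded<Ts...>;

// Reduced stand-in for fg::FieldPtr with three alternatives.
using Field = std::variant<const std::string*,
                           const std::vector<std::string>*, const int*>;

// Returns the combined text length of two fields, or -1 for type pairs
// that have no dedicated overload.
int CombinedLength(const Field& a, const Field& b) {
  return std::visit(
      Overloaded{
          [](const std::string* x, const std::string* y) {
            return static_cast<int>(x->size() + y->size());
          },
          [](const std::vector<std::string>* x,
             const std::vector<std::string>* y) {
            int n = 0;
            for (const auto& s : *x) n += static_cast<int>(s.size());
            for (const auto& s : *y) n += static_cast<int>(s.size());
            return n;
          },
          // Generic fallback: selected only when no exact overload matches,
          // because non-template overloads are preferred.
          [](const auto*, const auto*) { return -1; }},
      a, b);
}
```

Every lambda must return the same type, and the fallback is what lets the visitor compile for all combinations of alternatives without listing each one.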
Download the source code from the table above and run the build.sh script to compile and generate the FG operator.
Compile a custom operator
You must use the same compilation environment as the FG framework, including the language standard (C++17) and compilation options. We recommend that you use the official compiler image. The image details are available in the build.sh script.
Compiler environment image (CentOS 7): mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/feature_generator:centos7-0.1.1
Compiler environment image (rockylinux:8, compatible with CentOS 8): mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/feature_generator:0.1.1
By default, the C++11 ABI is not used. To use the new ABI, set _GLIBCXX_USE_CXX11_ABI=1. In this case, you can only use the second image (tag: 0.1.1), which is based on rockylinux:8.
Make sure that the custom operator does not use dynamic linking for third-party libraries. You can use static linking or copy the source code into the project to compile.
For more information, see the CMakeLists.txt file in the developer example.
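When an operator fails to load, an ABI or language-standard mismatch is one thing worth ruling out. The following sketch reports the settings a translation unit was compiled with, based on the `_GLIBCXX_USE_CXX11_ABI` macro that libstdc++ defines and the standard `__cplusplus` macro. `LibstdcxxAbi` and `IsCxx17OrLater` are hypothetical helper names and are not part of the FG API.

```cpp
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Reports which libstdc++ ABI this translation unit was built with.
const char* LibstdcxxAbi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
  return "cxx11";
#elif defined(_GLIBCXX_USE_CXX11_ABI)
  return "pre-cxx11";
#else
  return "unknown";  // the standard library is not libstdc++
#endif
}

// True if the translation unit is compiled as C++17 or later.
constexpr bool IsCxx17OrLater() { return __cplusplus >= 201703L; }
```

Logging these two values from `Initialize` is one low-cost way to confirm that the operator was built with the intended settings.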