Feature Generator
RecIS’s Feature Generator module provides configurable feature and embedding settings.
FG
- class recis.fg.feature_generator.FG(fg_parser: FGParser, shape_manager: ShapeManager, use_coalesce=True, grad_reduce_by='worker', initializer='uniform', init_kwargs=None, emb_default_class='hash_table', emb_default_device='cuda', emb_default_type=torch.float32)
Feature Generator for managing feature configurations and embeddings.
The FG class serves as the main interface for feature generation in the RecIS system. It manages feature parsing, shape inference, embedding configurations, and provides utilities for building feature pipelines with proper initialization and device management.
- Key Features:
Feature configuration parsing and validation
Automatic shape inference for features and blocks
Embedding configuration management with multiple initializers
Hash table embeddings by default; “bucket_emb” is accepted as an embedding class name but currently raises NotImplementedError
Multi-hash feature support for advanced embedding strategies
Integration with dataset I/O operations
- shape_manager
Manager for feature and block shapes.
- Type:
ShapeManager
- embedding_initializer
Initializer class for embedding parameters.
- emb_default_type
Default data type for embeddings.
- Type:
torch.dtype
- __init__(fg_parser: FGParser, shape_manager: ShapeManager, use_coalesce=True, grad_reduce_by='worker', initializer='uniform', init_kwargs=None, emb_default_class='hash_table', emb_default_device='cuda', emb_default_type=torch.float32)
Initialize the Feature Generator.
- Parameters:
fg_parser (FGParser) – Parser for feature configuration files.
shape_manager (ShapeManager) – Manager for feature and block shapes.
use_coalesce (bool, optional) – Whether to use coalesced operations. Defaults to True.
grad_reduce_by (str, optional) – Gradient reduction strategy. Defaults to “worker”.
initializer (str, optional) – Embedding initializer type. Must be one of “constant”, “uniform”, “normal”, “xavier_normal”, “xavier_uniform”. Defaults to “uniform”.
init_kwargs (dict, optional) – Custom initialization parameters. If None, uses default parameters for the specified initializer.
emb_default_class (str, optional) – Default embedding class. Must be “hash_table” or “bucket_emb”. Defaults to “hash_table”.
emb_default_device (str, optional) – Default device for embeddings. Must be “cpu” or “cuda”. Defaults to “cuda”.
emb_default_type (torch.dtype, optional) – Default data type for embeddings. Defaults to torch.float32.
- Raises:
ValueError – If emb_default_class is not “hash_table” or “bucket_emb”.
ValueError – If emb_default_device is not “cpu” or “cuda”.
NotImplementedError – If bucket embedding is selected (not yet implemented).
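The argument checks listed under Raises can be pictured with the sketch below. It is not the RecIS implementation: `check_emb_defaults` is a hypothetical helper, and the order of the checks is an assumption. Only the accepted values and the exception types are taken from the reference above.

```python
VALID_EMB_CLASSES = ("hash_table", "bucket_emb")
VALID_EMB_DEVICES = ("cpu", "cuda")


def check_emb_defaults(emb_default_class="hash_table", emb_default_device="cuda"):
    """Hypothetical helper mirroring the documented checks (not RecIS code)."""
    if emb_default_class not in VALID_EMB_CLASSES:
        raise ValueError(
            f"emb_default_class must be one of {VALID_EMB_CLASSES}, "
            f"got {emb_default_class!r}"
        )
    if emb_default_device not in VALID_EMB_DEVICES:
        raise ValueError(
            f"emb_default_device must be one of {VALID_EMB_DEVICES}, "
            f"got {emb_default_device!r}"
        )
    if emb_default_class == "bucket_emb":
        # The name is accepted, but the reference marks it as not yet implemented.
        raise NotImplementedError("bucket_emb is not implemented yet")
    return emb_default_class, emb_default_device
```

In practice this means "hash_table" is the only embedding class that can be used today, on either "cpu" or "cuda".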
- add_id(id_name)
Add an ID feature name.
- Parameters:
id_name (str) – Name of the ID feature.
- add_io_features(dataset: DatasetBase)
Add I/O features to a dataset based on parser configurations.
This method configures the dataset with features from the parser’s I/O configurations, adds label features with their dimensions and default values, and adds variable-length ID features.
- Parameters:
dataset (DatasetBase) – Dataset to configure with features.
- property block_shapes
Get block shapes from the shape manager.
- Returns:
Dictionary mapping block names to their shapes.
- Return type:
dict
- property feature_blocks
Get feature blocks from the parser.
- Returns:
Dictionary mapping block names to feature lists.
- Return type:
dict
- property feature_shapes
Get feature shapes from the shape manager.
- Returns:
Dictionary mapping feature names to their shapes.
- Return type:
dict
- get_emb_confs()
Generate embedding configurations for all features.
This method processes all embedding configurations from the parser and creates EmbeddingOption objects with appropriate settings for device, data type, initializer, and hooks.
- Returns:
Dictionary mapping embedding names to EmbeddingOption objects.
- Return type:
OrderedDict
- Raises:
RuntimeError – If an unsupported transform configuration is encountered.
- get_feature_confs()
Generate feature configurations for all features.
This method processes all embedding configurations from the parser and creates Feature objects with appropriate operations based on the transformation types (bucketize, hash, mod, etc.).
- Returns:
List of Feature objects with configured operations.
- Return type:
list
- Raises:
RuntimeError – If an unsupported ID transform type is encountered.
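To make the transformation types named above (bucketize, hash, mod) concrete, here is a minimal, library-free sketch of what each one typically does to a raw value. Everything in it is an assumption for illustration: the function names, the parameter names, the bucket-edge convention, and the hash function are hypothetical and are not taken from RecIS.

```python
import bisect
import hashlib


def bucketize(value, boundaries):
    """Return the index of the bucket `value` falls into.

    `boundaries` must be sorted. A value equal to a boundary goes to the
    bucket on its right (an assumed convention).
    """
    return bisect.bisect_right(boundaries, value)


def mod_id(raw_id, num_buckets):
    """Fold an integer ID into the range [0, num_buckets)."""
    return raw_id % num_buckets


def hash_id(raw_value, num_buckets):
    """Map an arbitrary value to a stable bucket via a deterministic hash."""
    digest = hashlib.blake2b(str(raw_value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def apply_id_transform(kind, value, **params):
    """Dispatch on the transform type; unknown types raise RuntimeError."""
    if kind == "bucketize":
        return bucketize(value, params["boundaries"])
    if kind == "mod":
        return mod_id(value, params["num_buckets"])
    if kind == "hash":
        return hash_id(value, params["num_buckets"])
    raise RuntimeError(f"unsupported ID transform type: {kind!r}")
```

The dispatcher mirrors the documented failure mode: a transform type that is not recognised stops configuration with a RuntimeError instead of being silently skipped.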
- is_seq_block(block_name)
Check if a block is a sequence block.
- Parameters:
block_name (str) – Name of the block to check.
- Returns:
True if the block is a sequence block, False otherwise.
- Return type:
bool
- Raises:
RuntimeError – If the block name is not found in feature blocks.
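The documented behaviour of is_seq_block (a boolean for known blocks, RuntimeError for unknown ones) can be sketched with plain dictionaries. The block and feature names below are invented, and tracking sequence blocks as a set of names is an assumption; this is not the RecIS source.

```python
from collections import OrderedDict

# Toy stand-ins for FG.feature_blocks and the parser's sequence block names.
feature_blocks = OrderedDict([
    ("user", ["age", "gender"]),
    ("click_seq", ["item_id", "cate_id"]),
])
seq_block_names = {"click_seq"}


def is_seq_block(block_name):
    """Hypothetical re-creation of the documented lookup."""
    if block_name not in feature_blocks:
        raise RuntimeError(f"block {block_name!r} not found in feature blocks")
    return block_name in seq_block_names
```

Callers that iterate over feature_blocks never hit the error path; it only matters when a block name comes from outside the parsed configuration.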
- property sample_ids
Get list of sample ID feature names.
- Returns:
List of ID feature names.
- Return type:
list
- recis.fg.feature_generator.build_fg(fg_conf_path, mc_conf_path=None, mc_config=None, fg_parser_class=<class 'recis.fg.fg_parser.FGParser'>, mc_parser_class=<class 'recis.fg.mc_parser.MCParser'>, fg_class=<class 'recis.fg.feature_generator.FG'>, shape_manager_class=<class 'recis.fg.shape_manager.ShapeManager'>, uses_columns=None, lower_case=False, with_seq_prefix=False, already_hashed=False, hash_in_io=False, devel_mode=False, **kwargs)
Build a complete Feature Generator with all necessary components.
This factory function creates and initializes all components needed for feature generation: MC parser, FG parser, shape manager, and the main FG instance. It provides a convenient way to set up the entire feature generation pipeline with proper configuration.
- Parameters:
fg_conf_path (str) – Path to the feature generation configuration file.
mc_conf_path (str, optional) – Path to the MC configuration file. Either this or mc_config must be provided.
mc_config (dict, optional) – MC configuration dictionary. Either this or mc_conf_path must be provided.
fg_parser_class (type, optional) – FGParser class to use. Defaults to FGParser.
mc_parser_class (type, optional) – MCParser class to use. Defaults to MCParser.
fg_class (type, optional) – FG class to use. Defaults to FG.
shape_manager_class (type, optional) – ShapeManager class to use. Defaults to ShapeManager.
uses_columns (list, optional) – List of column names to use. If None, uses all columns.
lower_case (bool, optional) – Whether to convert configuration keys to lowercase. Defaults to False.
with_seq_prefix (bool, optional) – Whether the feature name already has sequence block name as prefix. Defaults to False.
already_hashed (bool, optional) – Whether features are already hashed. Defaults to False.
hash_in_io (bool, optional) – Whether to perform hashing in I/O layer. Defaults to False.
devel_mode (bool, optional) – Whether to enable development mode. Defaults to False.
**kwargs – Additional keyword arguments passed to the FG constructor.
- Returns:
Configured Feature Generator instance ready for use.
- Return type:
FG
Example
from recis.fg.feature_generator import build_fg

# Build FG with file paths
fg = build_fg(
    fg_conf_path="features.json",
    mc_conf_path="model_config.json",
    initializer="xavier_uniform",
    emb_default_device="cuda",
)

# Build FG with configuration dictionary
fg = build_fg(
    fg_conf_path="features.json",
    mc_config={"block1": ["feature1", "feature2"]},
    uses_columns=["block1"],
)
FGParser
- class recis.fg.fg_parser.FGParser(conf_file_path, mc_parser, already_hashed=False, hash_in_io=False, lower_case=False, devel_mode=False)
Feature Generation configuration parser and processor.
The FGParser class is responsible for parsing feature generation configuration files, processing feature definitions, and creating structured configurations for the feature generation pipeline. It handles both regular and sequence features, applies various transformations, and manages feature filtering based on model configuration.
- Key Features:
Parse JSON configuration files for feature definitions
Filter features based on model configuration requirements
Handle sequence features with proper length and structure
Support feature copying and inheritance
Generate I/O and embedding configurations
Validate and transform feature parameters
- mc_parser
Model configuration parser instance.
- __init__(conf_file_path, mc_parser, already_hashed=False, hash_in_io=False, lower_case=False, devel_mode=False)
Initialize the FG Parser.
- Parameters:
conf_file_path (str) – Path to the feature generation configuration file.
mc_parser (MCParser) – Model configuration parser instance.
already_hashed (bool, optional) – Whether features are already hashed. Defaults to False.
hash_in_io (bool, optional) – Whether to hash in I/O layer. Defaults to False.
lower_case (bool, optional) – Whether to convert keys to lowercase. Defaults to False.
devel_mode (bool, optional) – Whether to enable development mode. Defaults to False.
- property emb_configs
Get embedding configurations for all features.
- Returns:
Dictionary mapping feature names to embedding configurations.
- Return type:
dict
- property feature_blocks
Get feature blocks from the model configuration parser.
- Returns:
Dictionary mapping block names to feature lists.
- Return type:
dict
- get_seq_len(fea_name)
Get sequence length for a sequence feature.
- Parameters:
fea_name (str) – Name of the sequence feature.
- Returns:
Sequence length of the feature.
- Return type:
int
- Raises:
RuntimeError – If the feature is not a sequence feature.
- property io_configs
Get I/O configurations for all features.
- Returns:
Dictionary mapping feature names to I/O configurations.
- Return type:
dict
MCParser
- class recis.fg.mc_parser.MCParser(mc_config_path=None, mc_config=None, uses_columns=None, lower_case=False, with_seq_prefix=False)
Model Configuration parser for managing feature blocks and sequences.
The MCParser class is responsible for parsing model configuration files and managing the organization of features into blocks. It handles both regular feature blocks and sequence blocks, providing utilities to check feature availability and manage feature groupings for model training.
- Key Features:
Parse JSON model configuration files
Manage feature blocks and sequence blocks
Filter features based on column usage requirements
Provide feature availability checking utilities
Support both file-based and dictionary-based configuration
- seq_blocks
Dictionary mapping sequence block names to feature names.
- Type:
OrderedDict
- blocks
Dictionary of all usable feature names.
- Type:
OrderedDict
- fea_blocks
Dictionary mapping block names to feature lists for concatenation.
- Type:
OrderedDict
- __init__(mc_config_path=None, mc_config=None, uses_columns=None, lower_case=False, with_seq_prefix=False)
Initialize the MC Parser.
- Parameters:
mc_config_path (str, optional) – Path to the model configuration file. Either this or mc_config must be provided.
mc_config (dict, optional) – Model configuration dictionary. Either this or mc_config_path must be provided.
uses_columns (list, optional) – List of column names to use. If None, uses all columns from the configuration.
lower_case (bool, optional) – Whether to convert configuration keys to lowercase. Defaults to False.
with_seq_prefix (bool, optional) – Whether the feature name already has sequence block name as prefix. Defaults to False.
- Raises:
AssertionError – If neither mc_config_path nor mc_config is provided.
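The sketch below shows one way the "path or dictionary" contract and the column filter described above could work. It is an illustration under assumptions, not the MCParser source: `load_mc_config` is a hypothetical function, the configuration is assumed to be a flat mapping from block name to feature names (as in the build_fg example), `uses_columns` is assumed to match block names, and preferring the dictionary when both inputs are given is a guess.

```python
import json
from collections import OrderedDict


def load_mc_config(mc_config_path=None, mc_config=None,
                   uses_columns=None, lower_case=False):
    """Hypothetical loader following the documented MCParser arguments."""
    # At least one configuration source is required.
    assert mc_config_path is not None or mc_config is not None, (
        "either mc_config_path or mc_config must be provided"
    )
    if mc_config is None:
        with open(mc_config_path, encoding="utf-8") as f:
            mc_config = json.load(f)

    blocks = OrderedDict()
    for block_name, features in mc_config.items():
        if lower_case:
            # Only keys are lowercased, per the parameter description.
            block_name = block_name.lower()
        if uses_columns is not None and block_name not in uses_columns:
            continue  # drop blocks that were not requested
        blocks[block_name] = list(features)
    return blocks
```

For example, `load_mc_config(mc_config={"block1": ["feature1", "feature2"]}, uses_columns=["block1"])` keeps the single requested block, while calling the function with no arguments fails the assertion.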
- property feature_blocks
Get feature blocks dictionary.
- Returns:
Dictionary mapping block names to feature lists.
- Return type:
OrderedDict
- has_seq_fea(seq_block, fea_name)
Check if a sequence feature is available in a sequence block.
- property seq_block_names
Get sequence block names.
- Returns:
Keys of sequence blocks dictionary.
- Return type:
dict_keys
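Because seq_block_names returns a keys view and not a list, it cannot be indexed, and it reflects later changes to the underlying dictionary. The plain-Python sketch below shows both effects; the block and feature names are invented, and `seq_blocks` stands in for the parser attribute of the same name.

```python
from collections import OrderedDict

# Toy stand-in for MCParser.seq_blocks.
seq_blocks = OrderedDict([
    ("click_seq", ["item_id", "cate_id"]),
    ("cart_seq", ["item_id"]),
])

names = seq_blocks.keys()      # a live view, like seq_block_names
ordered_names = list(names)    # snapshot that supports indexing
first = ordered_names[0]

seq_blocks["buy_seq"] = ["item_id"]
live_count = len(names)        # the view sees the new block
snapshot_count = len(ordered_names)  # the snapshot does not
```

Wrapping the result in `list(...)` is the safer choice when the names need to be indexed, stored, or compared across configuration changes.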