Neural Network Module
RecIS’s neural network module provides components optimized for sparse computation, including dynamic embedding, sparse merging, feature filtering, and other core functionality.
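Before the individual components, here is a minimal usage sketch of the high-level DynamicEmbedding wrapper, pieced together from the examples on this page; the recis.nn import path and the use of a plain integer id tensor as input are assumptions.
import torch
from recis.nn import DynamicEmbedding, EmbeddingOption  # import path assumed

# Describe the table; the key space grows dynamically as new ids arrive,
# so no vocabulary size needs to be declared.
emb_opt = EmbeddingOption(
    embedding_dim=64,
    shared_name="user_id_emb",
)
embedding = DynamicEmbedding(emb_opt)

# Look up a hypothetical batch of sparse ids
ids = torch.LongTensor([1001, 2002, 3003])
vectors = embedding(ids)  # expected shape: [3, 64]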
Components
Component Overview
The module includes the following components:
Core implementation of dynamic embedding tables
Feature admission and feature filtering strategies
High-level wrapper (DynamicEmbedding) for dynamically expandable embedding tables, supporting distributed training and sparse-merging performance optimizations
Initializers
Other functional interfaces
Best Practices
Performance Optimization
Choose a reasonable block_size:
# Adjust block_size based on available memory
emb_opt = EmbeddingOption(
    embedding_dim=64,
    block_size=8192,  # a smaller block_size suits memory-constrained environments
    shared_name="embedding"
)
Use an appropriate initializer:
# For deep networks, use a smaller initialization standard deviation
from recis.nn.initializers import TruncNormalInitializer

emb_opt = EmbeddingOption(
    embedding_dim=128,
    initializer=TruncNormalInitializer(std=0.001)
)
Enable the coalesced optimization:
# For multi-table lookups, enabling coalesced can improve performance
emb_opt = EmbeddingOption(
    embedding_dim=64,
    coalesced=True,
    shared_name="coalesced_emb"
)
Distributed Training
import torch.distributed as dist
# Initialize distributed environment
dist.init_process_group()
# Create distributed embedding
emb_opt = EmbeddingOption(
    embedding_dim=64,
    pg=dist.group.WORLD,  # process group used for the distributed embedding
    grad_reduce_by="worker"  # gradient reduction strategy
)
embedding = DynamicEmbedding(emb_opt)
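Once the distributed table is created, lookups are written the same way as in the single-worker case. The sketch below reuses the RaggedTensor input shown in the questions further down; the id values are hypothetical, and it is assumed that RecIS handles the cross-worker exchange of ids and gradients internally.
import torch
from recis.ragged.tensor import RaggedTensor

# Hypothetical batch of two id sequences: [11, 12] and [13]
values = torch.LongTensor([11, 12, 13])
offsets = torch.LongTensor([0, 2, 3])
ids = RaggedTensor(values, offsets)

# Same call as in the non-distributed case
user_vec = embedding(ids)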
Memory Management
import torch

# For large-scale embeddings, use half precision to reduce memory usage
emb_opt = EmbeddingOption(
    embedding_dim=256,
    dtype=torch.float16,
    device=torch.device("cuda")
)
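As a follow-up, the sketch below shows the half-precision option actually being used; it assumes the lookup output inherits the table dtype and that a plain integer id tensor is accepted as input (both are assumptions).
embedding = DynamicEmbedding(emb_opt)

ids = torch.LongTensor([7, 8, 9]).cuda()  # hypothetical id batch on the same device as the table
vec = embedding(ids)  # expected dtype: torch.float16
vec = vec.float()  # optional: cast back to float32 for the dense part of the model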
Common Questions
Q: How do I handle variable-length sequence embeddings?
A: Use RaggedTensor as input:
import torch
from recis.ragged.tensor import RaggedTensor

# Create a RaggedTensor
values = torch.LongTensor([1, 2, 3, 4, 5])
offsets = torch.LongTensor([0, 2, 5])  # two sequences: [1, 2] and [3, 4, 5]
ragged_input = RaggedTensor(values, offsets)

# Look up embeddings (embedding is a DynamicEmbedding instance)
output = embedding(ragged_input)
Q: How do I share embedding parameters?
A: Set the same shared_name on both tables:
# Two embeddings share the same parameters
user_emb = DynamicEmbedding(EmbeddingOption(
    embedding_dim=64,
    shared_name="shared_emb"
))
item_emb = DynamicEmbedding(EmbeddingOption(
    embedding_dim=64,
    shared_name="shared_emb"  # same name
))
Q: How do I optimize embedding lookup performance?
A: You can optimize in the following ways (combined in the sketch below):
Enable coalesced=True for batched multi-table query optimization
Use an appropriate grad_reduce_by strategy
Set the correct device when computing on GPU
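A minimal sketch combining these three settings, using only EmbeddingOption fields that appear earlier on this page (the recis.nn import path is an assumption):
import torch
from recis.nn import DynamicEmbedding, EmbeddingOption  # import path assumed

emb_opt = EmbeddingOption(
    embedding_dim=64,
    coalesced=True,  # batch lookups across tables
    grad_reduce_by="worker",  # gradient reduction strategy (see Distributed Training)
    device=torch.device("cuda"),  # keep the table where the computation runs
    shared_name="fast_emb",  # hypothetical table name
)
embedding = DynamicEmbedding(emb_opt)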