Neural Network Module

RecIS's neural network module provides components optimized for sparse computation, including dynamic embeddings, sparse merging, feature filtering, and other core functionality.

Components

Component Overview

Component                                  Description
HashTable Module                           Core implementation of dynamic embedding tables
Feature Admission and Feature Filtering    Feature admission and feature filtering strategies
Dynamic Embedding Tables                   High-level wrapper for dynamically expandable embedding tables, supporting distributed training and sparse-merge performance optimizations
Initializers                               Embedding parameter initializers
Other Functional Interfaces                Other functional interfaces

Best Practices

Performance Optimization

  1. Set a Reasonable block_size:

    # Adjust block_size based on memory size
    emb_opt = EmbeddingOption(
        embedding_dim=64,
        block_size=8192,  # Smaller block_size suitable for memory-constrained environments
        shared_name="embedding"
    )
    
  2. Use Appropriate Initializers:

    # For deep networks, use a smaller initialization standard deviation
    from recis.nn.initializers import TruncNormalInitializer
    
    emb_opt = EmbeddingOption(
        embedding_dim=128,
        initializer=TruncNormalInitializer(std=0.001)
    )
    
  3. Enable coalesced Optimization:

    # For multi-table queries, enabling coalesced can improve performance
    emb_opt = EmbeddingOption(
        embedding_dim=64,
        coalesced=True,
        shared_name="coalesced_emb"
    )
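
The three settings above can be combined in a single option. The following is a minimal sketch, assuming EmbeddingOption and DynamicEmbedding are importable from recis.nn (adjust the import if your RecIS version exports them elsewhere); the shared_name is illustrative:

import torch
from recis.nn import DynamicEmbedding, EmbeddingOption  # assumed import path
from recis.nn.initializers import TruncNormalInitializer

# Combine the block_size, initializer, and coalesced settings in one option
emb_opt = EmbeddingOption(
    embedding_dim=64,
    block_size=8192,
    initializer=TruncNormalInitializer(std=0.001),
    coalesced=True,
    shared_name="user_emb",
)
embedding = DynamicEmbedding(emb_opt)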
    

Distributed Training

import torch.distributed as dist

# Initialize distributed environment
dist.init_process_group()  # rank and world size come from the launcher's environment variables

# Create distributed embedding
emb_opt = EmbeddingOption(
    embedding_dim=64,
    pg=dist.group.WORLD,
    grad_reduce_by="worker"
)
embedding = DynamicEmbedding(emb_opt)
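
After construction, each rank looks up the ids of its own local batch. A minimal sketch follows; the ids are illustrative, and it assumes a plain id tensor is accepted as input alongside RaggedTensor (see the FAQ below):

import torch

# With pg=dist.group.WORLD the embedding table is shared across the process
# group, so every rank can issue lookups for any id
ids = torch.LongTensor([1001, 1002, 1003])
vectors = embedding(ids)  # expected shape: [3, 64]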

Memory Management

import torch

# For large-scale embeddings, use half precision to reduce memory usage
emb_opt = EmbeddingOption(
    embedding_dim=256,
    dtype=torch.float16,
    device=torch.device("cuda")
)
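
A short usage sketch for the option above; assuming lookup results follow the table dtype, the returned vectors are float16 (the ids are illustrative):

embedding = DynamicEmbedding(emb_opt)

ids = torch.LongTensor([7, 42, 2024]).cuda()
vectors = embedding(ids)
print(vectors.dtype)  # expected: torch.float16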

Common Questions

Q: How do I handle variable-length sequence embeddings?

A: Use a RaggedTensor as input:

import torch
from recis.ragged.tensor import RaggedTensor

# Create RaggedTensor
values = torch.LongTensor([1, 2, 3, 4, 5])
offsets = torch.LongTensor([0, 2, 5])  # Two sequences: [1,2] and [3,4,5]
ragged_input = RaggedTensor(values, offsets)

# Use embedding
output = embedding(ragged_input)

Q: How do I share embedding parameters?

A: Give both embeddings the same shared_name:

# Two embeddings share parameters
user_emb = DynamicEmbedding(EmbeddingOption(
    embedding_dim=64,
    shared_name="shared_emb"
))

item_emb = DynamicEmbedding(EmbeddingOption(
    embedding_dim=64,
    shared_name="shared_emb"  # Same name
))
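
Because both handles point at the same underlying parameters, looking up the same id through either one should return the same row. A quick sanity-check sketch (plain id tensors assumed to be accepted as input):

import torch

ids = torch.LongTensor([12345])
assert torch.equal(user_emb(ids), item_emb(ids))  # same shared parameters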

Q: How do I optimize embedding lookup performance?

A: You can optimize in the following ways (a combined sketch follows the list):

  1. Enable coalesced=True for batched query optimization

  2. Use an appropriate grad_reduce_by strategy

  3. Set the correct device when computing on GPU
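
A minimal sketch combining the three points above; the import path is assumed, and the grad_reduce_by value follows the distributed example earlier on this page, so treat it as illustrative rather than a definitive configuration:

import torch
from recis.nn import DynamicEmbedding, EmbeddingOption  # assumed import path

emb_opt = EmbeddingOption(
    embedding_dim=64,
    coalesced=True,               # 1. batch multi-table lookups
    grad_reduce_by="worker",      # 2. gradient reduction strategy (see the distributed example)
    device=torch.device("cuda"),  # 3. keep the table and lookups on the GPU
    shared_name="fast_emb",
)
embedding = DynamicEmbedding(emb_opt)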