Neural Network Module

RecIS's neural network module provides components optimized for sparse computation, including dynamic embeddings, sparse merging, feature filtering, and other core functionality.

Components

Component Overview

Component                                  Description
HashTable Module                           Core implementation of dynamic embedding tables
Feature Admission and Feature Filtering    Feature admission and feature filtering strategies
Dynamic Embedding Tables                   High-level wrapper for dynamically expandable embedding tables, supporting distributed training and sparse-merge performance optimizations
Initializers                               Embedding parameter initializers
Other Functional Interfaces                Other functional interfaces

Best Practices

Performance Optimization

  1. Set a Reasonable block_size:

    # Adjust block_size based on memory size
    emb_opt = EmbeddingOption(
        embedding_dim=64,
        block_size=8192,  # Smaller block_size suitable for memory-constrained environments
        shared_name="embedding"
    )
    
  2. Use Appropriate Initializers:

    # For deep networks, use a smaller initialization standard deviation
    from recis.nn.initializers import TruncNormalInitializer
    
    emb_opt = EmbeddingOption(
        embedding_dim=128,
        initializer=TruncNormalInitializer(std=0.001)
    )
    
  3. Enable coalesced Optimization:

    # For multi-table queries, enabling coalesced can improve performance
    emb_opt = EmbeddingOption(
        embedding_dim=64,
        coalesced=True,
        shared_name="coalesced_emb"
    )
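
The three settings above can be combined in a single option. The following is a minimal sketch, assuming EmbeddingOption and DynamicEmbedding are importable from recis.nn (adjust the import if your RecIS version exports them elsewhere); the shared_name is illustrative:

import torch
from recis.nn import DynamicEmbedding, EmbeddingOption  # assumed import path
from recis.nn.initializers import TruncNormalInitializer

# Combine the block_size, initializer, and coalesced settings in one option
emb_opt = EmbeddingOption(
    embedding_dim=64,
    block_size=8192,
    initializer=TruncNormalInitializer(std=0.001),
    coalesced=True,
    shared_name="user_emb",
)
embedding = DynamicEmbedding(emb_opt)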
    

Distributed Training

import torch.distributed as dist

# Initialize distributed environment
dist.init_process_group()  # rank and world size come from the launcher's environment variables

# Create distributed embedding
emb_opt = EmbeddingOption(
    embedding_dim=64,
    pg=dist.group.WORLD,
    grad_reduce_by="worker"
)
embedding = DynamicEmbedding(emb_opt)
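
After construction, each rank looks up the ids of its own local batch. A minimal sketch follows; the ids are illustrative, and it assumes a plain id tensor is accepted as input alongside RaggedTensor (see the FAQ below):

import torch

# With pg=dist.group.WORLD the embedding table is shared across the process
# group, so every rank can issue lookups for any id
ids = torch.LongTensor([1001, 1002, 1003])
vectors = embedding(ids)  # expected shape: [3, 64]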

Memory Management

import torch

# For large-scale embeddings, use half precision to reduce memory usage
emb_opt = EmbeddingOption(
    embedding_dim=256,
    dtype=torch.float16,
    device=torch.device("cuda")
)
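
A short usage sketch for the option above; assuming lookup results follow the table dtype, the returned vectors are float16 (the ids are illustrative):

embedding = DynamicEmbedding(emb_opt)

ids = torch.LongTensor([7, 42, 2024]).cuda()
vectors = embedding(ids)
print(vectors.dtype)  # expected: torch.float16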

Common Questions

Q: How do I handle variable-length sequence embeddings?

A: Use a RaggedTensor as input:

import torch
from recis.ragged.tensor import RaggedTensor

# Create RaggedTensor
values = torch.LongTensor([1, 2, 3, 4, 5])
offsets = torch.LongTensor([0, 2, 5])  # Two sequences: [1,2] and [3,4,5]
ragged_input = RaggedTensor(values, offsets)

# Use embedding
output = embedding(ragged_input)

Q: How do I share embedding parameters?

A: Give both embeddings the same shared_name:

# Two embeddings share parameters
user_emb = DynamicEmbedding(EmbeddingOption(
    embedding_dim=64,
    shared_name="shared_emb"
))

item_emb = DynamicEmbedding(EmbeddingOption(
    embedding_dim=64,
    shared_name="shared_emb"  # Same name
))
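
Because both handles point at the same underlying parameters, looking up the same id through either one should return the same row. A quick sanity-check sketch (plain id tensors assumed to be accepted as input):

import torch

ids = torch.LongTensor([12345])
assert torch.equal(user_emb(ids), item_emb(ids))  # same shared parameters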

Q: How do I optimize embedding lookup performance?

A: You can optimize in the following ways (a combined sketch follows the list):

  1. Enable coalesced=True for batched query optimization

  2. Use an appropriate grad_reduce_by strategy

  3. Set the correct device when computing on GPU
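
A minimal sketch combining the three points above; the import path is assumed, and the grad_reduce_by value follows the distributed example earlier on this page, so treat it as illustrative rather than a definitive configuration:

import torch
from recis.nn import DynamicEmbedding, EmbeddingOption  # assumed import path

emb_opt = EmbeddingOption(
    embedding_dim=64,
    coalesced=True,               # 1. batch multi-table lookups
    grad_reduce_by="worker",      # 2. gradient reduction strategy (see the distributed example)
    device=torch.device("cuda"),  # 3. keep the table and lookups on the GPU
    shared_name="fast_emb",
)
embedding = DynamicEmbedding(emb_opt)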