Hook System

Basic Hooks

RecIS provides a rich Hook system to extend the training process:

Hook

class recis.hooks.hook.Hook[source]

Base class for all hooks in the RecIS training system.

Hooks provide a way to extend the training process by defining callback methods that are called at specific points during training, evaluation, and execution. All custom hooks should inherit from this base class and override the relevant callback methods.

The hook system supports the following callback points:
  • Training lifecycle: before_train, after_train

  • Evaluation lifecycle: before_evaluate, after_evaluate

  • Epoch lifecycle: before_epoch, after_epoch

  • Step lifecycle: before_step, after_step

  • Cleanup: end

after_data(is_train=True, *args, **kwargs)[source]

Called after each data batch.

after_epoch(is_train=True, *args, **kwargs)[source]

Called after each training epoch completes.

This method is invoked at the end of each training epoch, after all steps in that epoch have been executed.

after_step(is_train=True, *args, **kwargs)[source]

Called after each training step completes.

This method is invoked after each individual training step has been executed. Use this for per-step processing logic, such as logging metrics or updating statistics.

after_window(is_train=True, *args, **kwargs)[source]

Called after each IO window completes.

before_epoch(is_train=True, *args, **kwargs)[source]

Called before each training epoch starts.

This method is invoked at the beginning of each training epoch, before any steps in that epoch are executed.

before_step(is_train=True, *args, **kwargs)[source]

Called before each training step.

This method is invoked before each individual training step is executed. Use this for per-step setup logic.

before_window(is_train=True, *args, **kwargs)[source]

Called before each IO window starts.

end(is_train=True, *args, **kwargs)[source]

Called at the very end of the training process.

This method is invoked for final cleanup operations, such as closing files, finalizing logs, or releasing resources. It is called after all other hook methods have completed.

out_off_data(*args, **kwargs)[source]

Called when the data iterator is exhausted.

start(is_train=True, *args, **kwargs)[source]

Called at the very start of the training process.

window_mode(*args, **kwargs)[source]

Called when window IO mode is used.

Changes arguments for the window IO run mode.
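
The window-related callbacks (before_window, after_data, after_window, out_off_data) complement the step and epoch callbacks above. Below is a minimal sketch of a custom hook that overrides them, assuming only the base-class signatures documented here; the counter and print output are illustrative and not part of RecIS:

from recis.hooks import Hook

class WindowStatsHook(Hook):
    """Illustrative hook that counts data batches per IO window."""

    def __init__(self):
        self.batches_in_window = 0

    def before_window(self, is_train=True, *args, **kwargs):
        # Reset the counter when a new window begins
        self.batches_in_window = 0

    def after_data(self, is_train=True, *args, **kwargs):
        # Count each data batch delivered inside the current window
        self.batches_in_window += 1

    def after_window(self, is_train=True, *args, **kwargs):
        print(f"Window finished with {self.batches_in_window} batches")

    def out_off_data(self, *args, **kwargs):
        print("Data iterator exhausted")

# Register like any other hook
trainer.add_hook(WindowStatsHook())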

LoggerHook

recis.framework.metrics.add_metric(name, metric)[source]

Add or update a metric in the global metrics registry.

Parameters:
  • name (str) – The name of the metric to add or update.

  • metric – The metric value to store. Can be any type (float, int, tensor, etc.).

Example

>>> add_metric("accuracy", 0.95)
>>> add_metric("loss", 0.05)
>>> add_metric("learning_rate", 0.001)

class recis.hooks.logger_hook.LoggerHook(log_step=10)[source]

Hook for logging training metrics and progress.

The LoggerHook logs training metrics at regular intervals and provides performance statistics including queries per second (QPS). This hook is automatically added by the Trainer, so manual addition is typically not required.

Parameters:

log_step (int) – Logging interval in steps. Defaults to 10.

Example

>>> from recis.hooks import LoggerHook
>>> # Create logger hook with custom interval
>>> from recis.framework.metrics import add_metric
>>> add_metric("loss", 0.123)
>>> add_metric("accuracy", 0.95)
>>> logger_hook = LoggerHook(log_step=50)
>>> trainer.add_hook(logger_hook)
>>> # The hook will automatically log metrics every 50 steps
>>> # Output format: <gstep=100> <lstep=50> <qps=12.34> <loss=0.123> <accuracy=0.95>

Note

The Trainer automatically adds a LoggerHook, so manual addition is usually not necessary unless you need custom logging intervals or multiple loggers.

ProfilerHook

class recis.hooks.profiler_hook.ProfilerHook(wait=1, warmup=48, active=1, repeat=4, output_dir='./')[source]

Hook for performance profiling during training.

The ProfilerHook uses PyTorch’s profiler to collect detailed performance metrics during training. It captures CPU and GPU activities, memory usage, operation shapes, and FLOP counts. The profiling results are saved as Chrome trace files for visualization in Chrome’s tracing tool.

Parameters:
  • wait (int) – Number of steps to wait before starting profiling. Defaults to 1.

  • warmup (int) – Number of warmup steps before active profiling. Defaults to 48.

  • active (int) – Number of active profiling steps. Defaults to 1.

  • repeat (int) – Number of profiling cycles to repeat. Defaults to 4.

  • output_dir (str) – Directory to save profiling results. Defaults to “./”.

prof

PyTorch profiler instance.

Type:

torch.profiler.profile

logger

Logger instance for outputting messages.

Type:

Logger

output_dir

Output directory for profiling results.

Type:

str

Example

>>> from recis.hooks import ProfilerHook
>>> # Create profiler hook with custom settings
>>> profiler_hook = ProfilerHook(
...     wait=1, warmup=28, active=2, repeat=1, output_dir="./timeline/"
... )
>>> trainer.add_hook(profiler_hook)
>>> # The hook will automatically profile training and save results
>>> # Results will be saved as Chrome trace files (.json)

Note

The profiling results can be visualized by opening the generated .json files in Chrome’s tracing tool (chrome://tracing/).

__init__(wait=1, warmup=48, active=1, repeat=4, output_dir='./')[source]

MLTrackerHook

recis.hooks.ml_tracker_hook.add_to_ml_tracker(name: str, data)[source]

Adds data to the ML tracker trace map.

This function adds metrics or other data to the global trace map that will be logged to the ML tracking system. Tensor data is automatically converted to numpy arrays for compatibility.

Parameters:
  • name (str) – Name/key for the data being tracked.

  • data – Data to be tracked. Can be torch.Tensor or any other type. Tensors are automatically converted to CPU numpy arrays.

Example

>>> import torch
>>> # Add scalar metrics
>>> add_to_ml_tracker("loss", 0.123)
>>> add_to_ml_tracker("learning_rate", 0.001)
>>> # Add tensor data (automatically converted)
>>> predictions = torch.tensor([0.1, 0.9, 0.3])
>>> add_to_ml_tracker("predictions", predictions)
>>> # The tensor will be stored as numpy array in trace map

Note

Tensor data is detached from the computation graph and moved to CPU before conversion to numpy to ensure compatibility with the ML tracker.

class recis.hooks.ml_tracker_hook.MLTrackerHook(project: str, name: str, config: Dict, id=None)[source]

Hook for experiment tracking with ML tracking systems.

The MLTrackerHook integrates with ML tracking platforms to automatically log training metrics, hyperparameters, and other experiment data. It initializes an ML tracker session and logs accumulated data after each training step.

Parameters:
  • project (str) – Name of the project for experiment tracking.

  • name (str) – Name of the experiment run.

  • config (Dict) – Configuration dictionary containing hyperparameters and other experiment settings.

  • id (optional) – Unique identifier for the experiment run. If None, a new ID will be generated automatically.

tracker

ML tracker instance for logging experiment data.

Example

>>> from recis.hooks import MLTrackerHook, add_to_ml_tracker
>>> # Create ML tracker hook
>>> config = {
...     "learning_rate": 0.001,
...     "batch_size": 32,
...     "model_type": "transformer",
... }
>>> ml_hook = MLTrackerHook(
...     project="recommendation_model", name="experiment_v1", config=config
... )
>>> trainer.add_hook(ml_hook)
>>> # During training, add metrics to be tracked
>>> add_to_ml_tracker("train_loss", loss.item())
>>> add_to_ml_tracker("train_accuracy", accuracy)
>>> # The hook will automatically log these metrics after each step

Note

This hook is only available in internal environments where the ml_tracker library is accessible. Use add_to_ml_tracker() to add data that should be logged to the tracking system.

__init__(project: str, name: str, config: Dict, id=None) → None[source]

TraceToOdpsHook

recis.hooks.trace_to_odps_hook.add_to_trace(name: str, tensor: Tensor | ndarray | list = None)[source]

Adds data to the trace map for ODPS logging.

This function adds training data to the global trace map that will be uploaded to ODPS tables. Supports tensors, numpy arrays, and lists.

Parameters:
  • name (str) – Name/key for the data being traced.

  • tensor (Union[torch.Tensor, np.ndarray, list]) – Data to be traced. Must be one of the supported types.

Raises:

ValueError – If the tensor type is not supported.

Example

>>> import torch
>>> import numpy as np
>>> # Add tensor data
>>> embeddings = torch.randn(100, 64)
>>> add_to_trace("user_embeddings", embeddings)
>>> # Add numpy array
>>> features = np.random.rand(100, 32)
>>> add_to_trace("item_features", features)
>>> # Add list data
>>> user_ids = [1, 2, 3, 4, 5]
>>> add_to_trace("user_ids", user_ids)

Note

Tensor data is automatically converted to numpy arrays for compatibility with ODPS. A warning is logged if data with the same name already exists.

class recis.hooks.trace_to_odps_hook.TraceToOdpsHook(config: Dict, fields: List[str], types: List[str], worker_num: int = 1, size_threshold: int = 52428800)[source]

Hook for tracing training data to ODPS tables.

The TraceToOdpsHook provides high-performance data collection and upload capabilities for training traces. It uses multiprocessing to avoid blocking the main training process and supports configurable batching and buffering.

Parameters:
  • config (Dict) – ODPS configuration dictionary containing connection details. Required keys: access_id, access_key, project, end_point, table_name. Optional keys: partition.

  • fields (List[str]) – List of field names for the ODPS table schema.

  • types (List[str]) – List of field types corresponding to the fields.

  • worker_num (int) – Number of worker processes for parallel uploads. Defaults to 1.

  • size_threshold (int) – Buffer size threshold in bytes for triggering flushes. Defaults to 50 MiB.

queue

Multiprocessing queue for data transfer.

Type:

Queue

writer_num

Number of writer processes.

Type:

int

writers

List of writer process instances.

Type:

List[TraceWriter]

Example

>>> from recis.hooks import TraceToOdpsHook, add_to_trace
>>> # Configure ODPS connection
>>> config = {
...     "access_id": "your_access_id",
...     "access_key": "your_access_key",
...     "project": "your_project",
...     "end_point": "your_endpoint",
...     "table_name": "training_traces",
...     "partition": "dt=20231201",
... }
>>> # Define table schema
>>> fields = ["user_id", "item_id", "embedding", "score"]
>>> types = ["bigint", "bigint", "string", "double"]
>>> # Create hook
>>> odps_hook = TraceToOdpsHook(
...     config=config, fields=fields, types=types, worker_num=2
... )
>>> trainer.add_hook(odps_hook)
>>> # During training, add data to be traced
>>> add_to_trace("user_embeddings", user_embeddings)
>>> add_to_trace("item_scores", item_scores)
>>> # The hook will automatically upload data after each step

Note

This hook is only available in internal environments where ODPS access is configured. Use add_to_trace() to add data that should be uploaded to ODPS tables.

__init__(config: Dict, fields: List[str], types: List[str], worker_num: int = 1, size_threshold: int = 52428800) → None[source]

MetricReportHook

class recis.hooks.metric_report_hook.MetricReportHook(model, report_args: ReportArguments | None = None)[source]

__init__(model, report_args: ReportArguments | None = None)[source]
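
A minimal usage sketch, assuming MetricReportHook is importable from recis.hooks like the other hooks and that the default report_args=None is acceptable:

>>> from recis.hooks import MetricReportHook
>>> # Report metrics for the given model using the default report arguments
>>> metric_hook = MetricReportHook(model)
>>> trainer.add_hook(metric_hook)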

HashTableFilterHook

class recis.hooks.filter_hook.HashTableFilterHook(filter_interval: int = 100)[source]

Hook for automatic hash table feature filtering during training.

This hook manages the lifecycle of features in hash tables by coordinating filtering operations across multiple hash table instances. It automatically updates step counters and triggers filtering operations at configurable intervals to remove stale or inactive features.

The hook integrates with the hash table filter system to:
  • Track global training steps for each hash table filter

  • Execute filtering operations at specified intervals

  • Provide comprehensive logging of filter activities

  • Support dynamic adjustment of filtering frequency

Parameters:

filter_interval (int, optional) – Number of training steps between filter operations. If None, filtering is disabled. Defaults to 100.

Examples:

Please refer to the Feature Admission and Feature Filtering documentation.

# Create and configure filter hook
filter_hook = HashTableFilterHook(filter_interval=200)

# Training loop integration
global_step = 0
for epoch in range(num_epochs):
    for step, batch in enumerate(dataloader):
        # ... training logic ...

        # Hook automatically manages filtering
        filter_hook.after_step(global_step=global_step)

        global_step += 1

__init__(filter_interval: int = 100)[source]

Initialize the hash table filter hook.

Parameters:

filter_interval (int, optional) – Number of training steps between filter operations. Must be positive. If None, filtering is disabled. Defaults to 100.

Example:

# Standard filtering every 100 steps
hook = HashTableFilterHook(filter_interval=100)
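
When training with the Trainer, the hook can also simply be registered with add_hook like the other hooks on this page, rather than calling after_step() manually (a sketch following the add_hook pattern used above):

# Register with the Trainer so its callbacks run automatically during training
filter_hook = HashTableFilterHook(filter_interval=100)
trainer.add_hook(filter_hook)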

Custom Hooks

from recis.hooks import Hook

class CustomHook(Hook):
    def __init__(self, custom_param):
        self.custom_param = custom_param

    def before_train(self, trainer):
        print(f"Training started with {self.custom_param}")

    def after_step(self, trainer):
        if trainer.state.global_step % 1000 == 0:
            # Execute custom logic every 1000 steps
            self.custom_logic(trainer)

    def custom_logic(self, trainer):
        # Custom logic implementation
        pass

# Use custom hook
custom_hook = CustomHook("my_parameter")
trainer.add_hook(custom_hook)