Each benchmark sample is an operator directory containing five files that follow a strict contract:

    data/<source>/<operator_name>/
    ├── reference.py    # class Model(nn.Module)
    ├── input.py        # _make_inputs(**kwargs) -> dict[str, Tensor]
    ├── shapes.json     # shape spec (dict keyed by id)
    ├── metadata.json   # id / dtype / origin (not visible to agent)
    └── roofline.json   # W / Q / SOL_time (not visible to agent)

| File | Role | Agent Visible | In Release |
|---|---|---|---|
| reference.py | PyTorch reference (class Model) | ✓ | ✓ |
| input.py | Input constructor (_make_inputs) | ✓ | ✓ |
| shapes.json | Shape specifications (init + input kwargs) | ✓ | ✓ |
| metadata.json | Operator identity + upstream provenance | ✗ | ✓ |
| roofline.json | Roofline estimates (W / Q / SOL_time) | ✗ | ✓ |
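
The contract in the table can be checked mechanically. The following is a minimal sketch, not part of the benchmark tooling: the names `CONTRACT`, `missing_files`, and `agent_view` are hypothetical, and the visibility flags are copied from the "Agent Visible" column above. It only checks that the files exist and does not validate their contents.

```python
from pathlib import Path

# File name -> is the file visible to the agent?
# Values mirror the "Agent Visible" column of the table; the dict itself
# is an illustrative assumption, not an API shipped with the benchmark.
CONTRACT = {
    "reference.py": True,
    "input.py": True,
    "shapes.json": True,
    "metadata.json": False,
    "roofline.json": False,
}


def missing_files(op_dir):
    """Return the contract files absent from an operator directory, sorted by name."""
    op_dir = Path(op_dir)
    return sorted(name for name in CONTRACT if not (op_dir / name).is_file())


def agent_view(op_dir):
    """Return the paths of the files an agent is allowed to see."""
    op_dir = Path(op_dir)
    return [op_dir / name for name, visible in CONTRACT.items() if visible]
```

A directory satisfies the layout when `missing_files(op_dir)` returns an empty list. `agent_view(op_dir)` could then be used to copy only `reference.py`, `input.py`, and `shapes.json` into an agent's workspace, leaving `metadata.json` and `roofline.json` behind.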