Manifest Cache
Overview
paimon-cpp caches raw manifest file bytes at the ObjectsFile<T>::Read()
layer. The cache uses the public Cache abstraction and is injected through
ScanContextBuilder or ReadContextBuilder. The cache covers data
manifests, manifest lists, and index manifests because all three are read
through ObjectsFile<T>.
When one process serves repeated get, scan, or batch get/scan -f requests,
reads of the same snapshot often load the same manifest files again. On a
cache hit, the read path skips the remote filesystem open and read and builds
an in-memory input stream from the cached bytes; the format reader, Arrow
decoding, and object deserialization still run. This design primarily reduces
remote IO latency and bandwidth. Because entries hold raw bytes rather than
decoded objects, the cache weight stays aligned with the actual cached bytes.
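The hit and miss behavior described above can be sketched with a small byte-weighted LRU. This is an illustrative sketch only, not the paimon-cpp implementation: `RawBytesCache`, `ReadStats`, and `ReadManifest` are hypothetical names, the "remote read" is simulated by a lambda that returns 16 bytes, and the class is not thread-safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are paimon-cpp types.
using Bytes = std::vector<uint8_t>;

// LRU cache whose weight is the total number of raw bytes it holds.
class RawBytesCache {
 public:
    explicit RawBytesCache(size_t max_bytes) : max_bytes_(max_bytes) {}

    // Returns the cached bytes for `path`; `loader` runs only on a miss.
    std::shared_ptr<const Bytes> GetOrLoad(const std::string& path,
                                           const std::function<Bytes()>& loader) {
        auto it = index_.find(path);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->second;
        }
        std::shared_ptr<const Bytes> bytes = std::make_shared<Bytes>(loader());
        lru_.emplace_front(path, bytes);
        index_[path] = lru_.begin();
        weight_ += bytes->size();
        // Evict least recently used entries until the byte budget is met,
        // always keeping the entry that was just inserted.
        while (weight_ > max_bytes_ && lru_.size() > 1) {
            weight_ -= lru_.back().second->size();
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        return bytes;
    }

    size_t WeightBytes() const { return weight_; }
    size_t EntryCount() const { return lru_.size(); }

 private:
    using Entry = std::pair<std::string, std::shared_ptr<const Bytes>>;
    size_t max_bytes_;
    size_t weight_ = 0;
    std::list<Entry> lru_;
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

struct ReadStats {
    int remote_reads = 0;  // simulated remote open/read calls
    int decodes = 0;       // simulated format-reader/deserialization passes
};

// Simulated read path: remote IO happens only on a miss, decoding on every read.
size_t ReadManifest(RawBytesCache& cache, const std::string& path, ReadStats& stats) {
    auto bytes = cache.GetOrLoad(path, [&stats] {
        ++stats.remote_reads;
        return Bytes(16, 0x2a);  // pretend the manifest file is 16 bytes long
    });
    ++stats.decodes;  // decoding from the in-memory bytes still runs on a hit
    return bytes->size();
}
```

Reading the same path twice through `ReadManifest` performs one simulated remote read and two decode passes, which is the tradeoff the overview describes: IO is saved, CPU work for decoding is not.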
Configuration
Manifest caching is disabled by default. Embedding applications that need it can
provide a custom Cache implementation and inject it through WithCache.
Manifest reads create cache keys with CacheKind::MANIFEST internally, so
callers do not need to pass the cache kind through scan or read contexts. The
same cache instance can be reused across multiple scan or read contexts when
process-local sharing is desired.
Example:
// Routes manifest keys to a dedicated cache and all other keys to a default cache.
class RoutingCache : public paimon::Cache {
 public:
    RoutingCache(std::shared_ptr<paimon::Cache> default_cache,
                 std::shared_ptr<paimon::Cache> manifest_cache)
        : default_cache_(std::move(default_cache)),
          manifest_cache_(std::move(manifest_cache)) {}

    paimon::Result<std::shared_ptr<paimon::CacheValue>> Get(
        const std::shared_ptr<paimon::CacheKey>& key,
        std::function<paimon::Result<std::shared_ptr<paimon::CacheValue>>(
            const std::shared_ptr<paimon::CacheKey>&)> supplier) override {
        return Select(key)->Get(key, std::move(supplier));
    }

    // Put(), Invalidate(), InvalidateAll(), and Size() route in the same way.

 private:
    // Picks the backing cache from the key's kind; null keys use the default cache.
    std::shared_ptr<paimon::Cache> Select(
        const std::shared_ptr<paimon::CacheKey>& key) const {
        return key && key->GetKind() == paimon::CacheKind::MANIFEST
                   ? manifest_cache_
                   : default_cache_;
    }

    std::shared_ptr<paimon::Cache> default_cache_;
    std::shared_ptr<paimon::Cache> manifest_cache_;
};

// MyDefaultCache and MyManifestCache stand for application-provided Cache implementations.
auto cache = std::make_shared<RoutingCache>(
    std::make_shared<MyDefaultCache>(),
    std::make_shared<MyManifestCache>());

// The same cache instance is shared by the scan and read contexts.
paimon::ScanContextBuilder scan_builder(table_path);
scan_builder.WithCache(cache);

paimon::ReadContextBuilder read_builder(table_path);
read_builder.WithCache(cache);
Passing nullptr or omitting WithCache() leaves manifest caching disabled.
Future Optimizations
Expose hit, miss, bypass, and eviction counts through the read trace or metrics.
Add single-flight loading for high-concurrency misses on the same manifest path.
Evaluate a decoded-records second-level cache, configurable as a CPU-vs-memory tradeoff.
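To make the single-flight item concrete, here is one possible shape for it, offered as a sketch under stated assumptions and not as a planned paimon-cpp design. `SingleFlightLoader` and `JoinedCount()` are hypothetical names. The first caller for a path becomes the leader and runs the loader; callers that arrive while that load is in flight wait on the same result instead of issuing their own remote read.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for illustration; not a paimon-cpp type.
using Bytes = std::vector<uint8_t>;

class SingleFlightLoader {
 public:
    using Value = std::shared_ptr<const Bytes>;

    // Runs `loader` at most once per path among concurrent callers.
    Value Load(const std::string& path, const std::function<Bytes()>& loader) {
        std::promise<Value> promise;  // used only if this caller becomes the leader
        std::shared_future<Value> future;
        bool leader = false;
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = in_flight_.find(path);
            if (it != in_flight_.end()) {
                future = it->second;  // join the load that is already running
                ++joined_;
            } else {
                future = promise.get_future().share();
                in_flight_.emplace(path, future);
                leader = true;
            }
        }
        if (leader) {
            try {
                Value value = std::make_shared<Bytes>(loader());
                promise.set_value(std::move(value));
            } catch (...) {
                // Waiters observe the same failure as the leader.
                promise.set_exception(std::current_exception());
            }
            std::lock_guard<std::mutex> lock(mu_);
            in_flight_.erase(path);  // later callers start a fresh load
        }
        return future.get();
    }

    // Number of calls that waited on another caller's load.
    size_t JoinedCount() {
        std::lock_guard<std::mutex> lock(mu_);
        return joined_;
    }

 private:
    std::mutex mu_;
    size_t joined_ = 0;
    std::unordered_map<std::string, std::shared_future<Value>> in_flight_;
};
```

This sketch only deduplicates loads that overlap in time and does not retain results afterwards. In a real integration the leader would presumably also populate the injected cache so that later reads hit it directly.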