Global Index#

Interface#

class GlobalIndexFileReader#

Abstract interface for reading global index files from storage.

Public Functions

virtual ~GlobalIndexFileReader() = default#
virtual Result<std::unique_ptr<InputStream>> GetInputStream(const std::string &file_name) const = 0#

Opens an input stream for reading the specified global index file.

class GlobalIndexFileWriter#

Abstract interface for writing global index files to storage.

Public Functions

virtual ~GlobalIndexFileWriter() = default#
virtual Result<std::string> NewFileName(const std::string &prefix) const = 0#

Generates a unique file name for a new index file using the given prefix.

Note

This function may be called multiple times if the index consists of multiple files.

virtual Result<std::unique_ptr<OutputStream>> NewOutputStream(const std::string &file_name) const = 0#

Opens a new output stream for writing index data to the specified file.

virtual Result<int64_t> GetFileSize(const std::string &file_name) const = 0#

Returns the size of the file with the given name.

class GlobalIndexReader : public paimon::FunctionVisitor<std::shared_ptr<GlobalIndexResult>>#

Reads and evaluates filter predicates against a global file index.

Derived classes are expected to implement the visitor methods (e.g., VisitEqual, VisitIsNull, etc.) to return index-based results that indicate which rows satisfy the given predicate.

Note

All GlobalIndexResult objects returned by implementations of this class use local row ids that start from 0 — not global row ids in the entire table. The GlobalIndexResult can be converted to global row ids by calling AddOffset().
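The relationship between local and global row ids can be illustrated with a minimal, self-contained sketch. It does not use the paimon API: AddOffsetToIds is a hypothetical helper that mimics what AddOffset() is described as doing, a plain std::vector<int64_t> stands in for a GlobalIndexResult, and the offset is assumed to be the first global row id of the indexed range.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for GlobalIndexResult::AddOffset(): shifts every
// local row id (0-based within one indexed range) by a fixed offset.
std::vector<int64_t> AddOffsetToIds(const std::vector<int64_t>& local_ids,
                                    int64_t range_start) {
    assert(range_start >= 0);  // assumption: row ids are non-negative
    std::vector<int64_t> global_ids;
    global_ids.reserve(local_ids.size());
    for (int64_t id : local_ids) {
        global_ids.push_back(id + range_start);
    }
    return global_ids;
}
```

Under these assumptions, local ids {0, 3, 7} reported for a range that starts at row 1000 would map to global ids {1000, 1003, 1007}.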

Public Functions

virtual Result<std::shared_ptr<VectorSearchGlobalIndexResult>> VisitVectorSearch(const std::shared_ptr<VectorSearch> &vector_search) = 0#

VisitVectorSearch performs approximate vector similarity search.

Note

VisitVectorSearch is thread-safe (but not coroutine-safe), while the other VisitXXX methods are not thread-safe.

Warning

VisitVectorSearch may return an error status when it is invoked on a reader that does not support vector search (e.g., when called on a BitmapGlobalIndexReader).

struct GlobalIndexIOMeta#

Metadata describing a single file entry in a global index.

Public Functions

inline GlobalIndexIOMeta(const std::string &_file_name, int64_t _file_size, int64_t _range_end, const std::shared_ptr<Bytes> &_metadata)#

Public Members

std::string file_name#
int64_t file_size#
int64_t range_end#

The inclusive range end covered by this file (i.e., the last local row id).

std::shared_ptr<Bytes> metadata#

Optional binary metadata associated with the file, such as serialized secondary index structures or inline index bytes.

May be null if no additional metadata is available.

class GlobalIndexResult : public std::enable_shared_from_this<GlobalIndexResult>#

Result of a global index lookup, from which the selected global row ids can be retrieved.

Subclassed by paimon::BitmapGlobalIndexResult, paimon::VectorSearchGlobalIndexResult

Public Functions

virtual ~GlobalIndexResult() = default#
virtual Result<bool> IsEmpty() const = 0#

Checks whether the global index result contains no matching row ids.

Returns:

A Result<bool> where:

  • true indicates the result is empty (no matching rows),

  • false indicates at least one matching row exists,

  • An error is returned only if internal state is corrupted or I/O fails (e.g., during lazy loading of index data).

virtual Result<std::unique_ptr<Iterator>> CreateIterator() const = 0#

Creates a new iterator over the selected global row ids.

Result<std::vector<Range>> ToRanges() const#

Returns non-overlapping, sorted ranges covering all row ids in GlobalIndexResult.
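One way to picture the output of ToRanges() is the following sketch, which coalesces sorted row ids into inclusive ranges. RowRange and CoalesceToRanges are hypothetical names introduced for illustration; the real paimon::Range type and the actual merging behavior (for example, whether adjacent ids are always merged) may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive range type; not the real paimon::Range.
struct RowRange {
    int64_t from;
    int64_t to;  // inclusive
    bool operator==(const RowRange& o) const { return from == o.from && to == o.to; }
};

// Coalesces strictly increasing row ids into sorted, non-overlapping
// inclusive ranges, merging runs of consecutive ids.
std::vector<RowRange> CoalesceToRanges(const std::vector<int64_t>& sorted_ids) {
    std::vector<RowRange> ranges;
    for (int64_t id : sorted_ids) {
        if (!ranges.empty() && id == ranges.back().to + 1) {
            ranges.back().to = id;  // extend the current run
        } else {
            // Precondition: the input is strictly increasing.
            assert(ranges.empty() || id > ranges.back().to);
            ranges.push_back({id, id});  // start a new run
        }
    }
    return ranges;
}
```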

virtual Result<std::shared_ptr<GlobalIndexResult>> And(const std::shared_ptr<GlobalIndexResult> &other)#

Computes the logical AND (intersection) between this result and another.

virtual Result<std::shared_ptr<GlobalIndexResult>> Or(const std::shared_ptr<GlobalIndexResult> &other)#

Computes the logical OR (union) between this result and another.
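The set semantics of And() and Or() can be sketched on plain sorted id lists. This is only an illustration of intersection and union: IntersectIds and UnionIds are hypothetical helpers, and the concrete result classes may combine their contents differently (for example, on bitmaps).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Intersection of two sorted id lists (the "And" semantics).
std::vector<int64_t> IntersectIds(const std::vector<int64_t>& a,
                                  const std::vector<int64_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int64_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Union of two sorted id lists (the "Or" semantics).
std::vector<int64_t> UnionIds(const std::vector<int64_t>& a,
                              const std::vector<int64_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int64_t> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```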

virtual Result<std::shared_ptr<GlobalIndexResult>> AddOffset(int64_t offset) = 0#

Adds the given offset to each row id in the current result and returns the new global index result.

virtual std::string ToString() const = 0#

Public Static Functions

static Result<PAIMON_UNIQUE_PTR<Bytes>> Serialize(const std::shared_ptr<GlobalIndexResult> &global_index_result, const std::shared_ptr<MemoryPool> &pool)#

Serializes a GlobalIndexResult object into a byte array.

Note

This method only supports the following concrete implementations:

  • BitmapVectorSearchGlobalIndexResult

  • BitmapGlobalIndexResult

Parameters:
  • global_index_result – The GlobalIndexResult instance to serialize (must not be null).

  • pool – Memory pool used to allocate the output byte buffer.

Returns:

A Result containing a unique pointer to the serialized Bytes on success, or an error status on failure.

static Result<std::shared_ptr<GlobalIndexResult>> Deserialize(const char *buffer, size_t length, const std::shared_ptr<MemoryPool> &pool)#

Deserializes a GlobalIndexResult object from a raw byte buffer.

Note

The concrete type of the deserialized object is determined by metadata embedded in the buffer. Currently, only the following types are supported:

  • BitmapVectorSearchGlobalIndexResult

  • BitmapGlobalIndexResult

Parameters:
  • buffer – Pointer to the serialized byte data (must not be null).

  • length – Size of the buffer in bytes.

  • pool – Memory pool used to allocate internal objects during deserialization.

Returns:

A Result containing a shared pointer to the reconstructed GlobalIndexResult on success, or an error status on failure.

class Iterator#

Iterator interface for traversing selected global row ids.

Subclassed by paimon::BitmapGlobalIndexResult::Iterator

Public Functions

virtual ~Iterator() = default#
virtual bool HasNext() const = 0#

Checks whether more row ids are available.

virtual int64_t Next() = 0#
Returns:

The next global row id. Calling this method also advances the iterator.
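The HasNext()/Next() protocol can be exercised with a small stand-in. RowIdIterator mirrors only the two documented methods, while VectorRowIdIterator and Drain are hypothetical names introduced for this sketch; none of them are part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical mirror of the documented iterator interface.
class RowIdIterator {
 public:
    virtual ~RowIdIterator() = default;
    virtual bool HasNext() const = 0;
    virtual int64_t Next() = 0;
};

// Hypothetical implementation backed by an in-memory vector.
class VectorRowIdIterator : public RowIdIterator {
 public:
    explicit VectorRowIdIterator(std::vector<int64_t> ids) : ids_(std::move(ids)) {}
    bool HasNext() const override { return pos_ < ids_.size(); }
    int64_t Next() override {
        assert(HasNext());    // callers are expected to check HasNext() first
        return ids_[pos_++];  // return the current id, then advance
    }

 private:
    std::vector<int64_t> ids_;
    std::size_t pos_ = 0;
};

// Typical consumption loop: collect every remaining row id.
std::vector<int64_t> Drain(RowIdIterator& it) {
    std::vector<int64_t> out;
    while (it.HasNext()) {
        out.push_back(it.Next());
    }
    return out;
}
```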

class GlobalIndexScan#

Represents a logical scan over a global index for a table.

Public Functions

virtual ~GlobalIndexScan() = default#
virtual Result<std::shared_ptr<RowRangeGlobalIndexScanner>> CreateRangeScan(const Range &range) = 0#

Creates a scanner for the global index over the specified row id range.

This method instantiates a low-level scanner that can evaluate predicates and retrieve matching row ids from the global index data corresponding to the given row id range.

Parameters:

range – The inclusive row id range [start, end] for which to create the scanner. The range must be fully covered by existing global index data (from GetRowRangeList()).

Returns:

A Result containing a range-level scanner, or an error if parsing the index metadata fails.

virtual Result<std::vector<Range>> GetRowRangeList() = 0#

Returns row id ranges covered by this global index (sorted and non-overlapping ranges).

Each Range represents a contiguous segment of row ids for which global index data exists. This allows the query engine to parallelize scanning and be aware of ranges that are not covered by any global index.

Returns:

A Result containing sorted and non-overlapping Range objects.
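Because the returned ranges are sorted and non-overlapping, a caller can work out which parts of a query range lack index coverage. The sketch below assumes inclusive ranges, as stated for CreateRangeScan(); RowRange and UncoveredGaps are hypothetical names and not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive range type; not the real paimon::Range.
struct RowRange {
    int64_t from;
    int64_t to;  // inclusive
    bool operator==(const RowRange& o) const { return from == o.from && to == o.to; }
};

// Given sorted, non-overlapping covered ranges, returns the parts of
// `query` that are not covered by any of them.
std::vector<RowRange> UncoveredGaps(const std::vector<RowRange>& covered,
                                    RowRange query) {
    assert(query.from <= query.to);
    std::vector<RowRange> gaps;
    int64_t cursor = query.from;  // first row id not yet accounted for
    for (const RowRange& r : covered) {
        if (r.to < cursor) continue;   // lies entirely before the cursor
        if (r.from > query.to) break;  // lies entirely after the query
        if (r.from > cursor) gaps.push_back({cursor, r.from - 1});
        cursor = r.to + 1;
        if (cursor > query.to) break;  // query fully accounted for
    }
    if (cursor <= query.to) gaps.push_back({cursor, query.to});
    return gaps;
}
```

With covered ranges [0, 99] and [200, 299], a query over [50, 350] leaves [100, 199] and [300, 350] without index data.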

Public Static Functions

static Result<std::unique_ptr<GlobalIndexScan>> Create(const std::string &table_path, const std::optional<int64_t> &snapshot_id, const std::optional<std::vector<std::map<std::string, std::string>>> &partitions, const std::map<std::string, std::string> &options, const std::shared_ptr<FileSystem> &file_system, const std::shared_ptr<MemoryPool> &pool)#

Creates a GlobalIndexScan instance for the specified table and context.

Parameters:
  • table_path – Root directory of the table.

  • snapshot_id – Optional snapshot id to read from; if not provided, uses the latest.

  • partitions – Optional list of specific partitions to restrict the scan scope. Each map represents one partition (e.g., {“dt”: “2024-06-01”}). If omitted, scans all partitions.

  • options – Index-specific configuration.

  • file_system – File system for accessing index files. If not provided (nullptr), it is inferred from the FILE_SYSTEM key in the options parameter.

  • pool – Memory pool for temporary allocations; if nullptr, uses default.

Returns:

A Result containing a unique pointer to the created scanner, or an error if initialization fails (e.g., I/O error).

static Result<std::unique_ptr<GlobalIndexScan>> Create(const std::string &root_path, const std::optional<int64_t> &snapshot_id, const std::shared_ptr<Predicate> &partition_filters, const std::map<std::string, std::string> &options, const std::shared_ptr<FileSystem> &file_system, const std::shared_ptr<MemoryPool> &memory_pool)#

Creates a GlobalIndexScan instance for the specified table and context.

Parameters:

partition_filters – Optional predicate that restricts the scan to matching partitions.

class GlobalIndexWriter#

Abstract interface for building a global index from Arrow data batches.

Public Functions

virtual ~GlobalIndexWriter() = default#
virtual Status AddBatch(::ArrowArray *arrow_array) = 0#

Builds index structures from a batch of columnar data.

Parameters:

arrow_array – A valid C ArrowArray pointer representing a struct array. Must not be nullptr, and must conform to the expected schema.

Returns:

Status::OK() on success; otherwise, an error indicating, for example, malformed input, an I/O failure, or an unsupported type.

virtual Result<std::vector<GlobalIndexIOMeta>> Finish() = 0#

Finalizes the index build process and returns metadata for the persisted index files.

class GlobalIndexerFactory : public paimon::Factory#

Factory for creating GlobalIndexer instances based on index type identifiers.

Public Functions

~GlobalIndexerFactory() override = default#
virtual Result<std::unique_ptr<GlobalIndexer>> Create(const std::map<std::string, std::string> &options) const = 0#

Creates a GlobalIndexer using the current factory’s implementation and the given options.

Public Static Functions

static Result<std::unique_ptr<GlobalIndexer>> Get(const std::string &identifier, const std::map<std::string, std::string> &options)#

Creates a GlobalIndexer instance by looking up a registered factory using an identifier.

GLOBAL_INDEX_IDENTIFIER_SUFFIX (e.g., “-global”) is automatically appended to the provided identifier to form the full key used for factory lookup. This keeps file index types and global index types in separate namespaces.

Parameters:
  • identifier – The base name of the index type (e.g., “bitmap”).

  • options – Configuration parameters for the indexer.

Returns:

A Result containing a unique pointer to the created GlobalIndexer, or an error if creation fails. The pointer is nullptr if no matching factory is registered.

Public Static Attributes

static const char GLOBAL_INDEX_IDENTIFIER_SUFFIX[]#

Suffix used to distinguish global index identifiers (e.g., “bitmap-global”).
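The lookup-key construction can be sketched as follows. The suffix value "-global" is taken from the example above and is an assumption about the actual constant; FactoryKey, FindFactory, and the string-valued registry are hypothetical stand-ins for the real factory registry.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed value, based on the "-global" example in the documentation.
const char kGlobalIndexIdentifierSuffix[] = "-global";

// Builds the full lookup key from a base identifier such as "bitmap".
std::string FactoryKey(const std::string& identifier) {
    assert(!identifier.empty());
    return identifier + kGlobalIndexIdentifierSuffix;
}

// Hypothetical registry lookup: returns nullptr when no factory is
// registered under the suffixed key.
const std::string* FindFactory(const std::map<std::string, std::string>& registry,
                               const std::string& identifier) {
    auto it = registry.find(FactoryKey(identifier));
    return it == registry.end() ? nullptr : &it->second;
}
```

In this sketch, a file index registered as "bitmap" and a global index registered as "bitmap-global" can coexist, and a lookup for "bitmap" resolves only to the latter.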

class GlobalIndexer#

Interface for creating global index readers and writers.

Public Functions

virtual ~GlobalIndexer() = default#
virtual Result<std::shared_ptr<GlobalIndexWriter>> CreateWriter(const std::string &field_name, ::ArrowSchema *arrow_schema, const std::shared_ptr<GlobalIndexFileWriter> &file_writer, const std::shared_ptr<MemoryPool> &pool) const = 0#

Creates a writer for building a global index on a specific field.

Parameters:
  • field_name – Name of the field to be indexed.

  • arrow_schema – Schema of the input Arrow struct array. It must contain the field specified by field_name and may include additional associated fields used during index construction.

  • file_writer – I/O handler for persisting index data to storage.

  • pool – Memory pool for temporary allocations; if nullptr, uses default.

Returns:

A Result containing a shared pointer to the created GlobalIndexWriter, or an error if, for example, the field is not found, the field type is unsupported, or initialization fails.

virtual Result<std::shared_ptr<GlobalIndexReader>> CreateReader(::ArrowSchema *arrow_schema, const std::shared_ptr<GlobalIndexFileReader> &file_reader, const std::vector<GlobalIndexIOMeta> &files, const std::shared_ptr<MemoryPool> &pool) const = 0#

Creates a reader for querying a pre-built global index.

Parameters:
  • arrow_schema – Schema of the indexed data; used to interpret predicate literals.

  • file_reader – I/O handler for reading index artifacts from storage.

  • files – List of index file metadata entries produced during writing.

  • pool – Memory pool for temporary allocations; if nullptr, uses default.

Returns:

A Result containing a shared pointer to the created GlobalIndexReader, or an error if, for example, the index cannot be loaded or is incompatible.

class RowRangeGlobalIndexScanner#

Interface for scanning global index data at the range level.

Public Functions

virtual ~RowRangeGlobalIndexScanner() = default#
virtual Result<std::shared_ptr<GlobalIndexReader>> CreateReader(const std::string &field_name, const std::string &index_type) const = 0#

Creates a GlobalIndexReader for a specific field and index type within this range.

This reader provides low-level access to the serialized index data for the given column (field_name) and index kind (index_type, such as “bitmap”).

Note

All GlobalIndexResult objects returned by GlobalIndexReader use local row ids that start from 0 — not global row ids in the entire table.

Parameters:
  • field_name – Name of the indexed column.

  • index_type – Type of the global index (e.g., “bitmap”, “lumina”).

Returns:

A Result that is:

  • Successful with a non-null reader if the index exists and loads correctly;

  • Successful with a null pointer if no index was built for the given field and type;

  • An error only if loading fails (e.g., file corruption, I/O error, unsupported format).
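The three outcomes listed above suggest a three-way branch at the call site. The sketch below uses hypothetical stand-ins (FakeReader, FakeReaderResult, Plan, ChoosePlan) rather than the real Result and GlobalIndexReader types, and treating a missing index as a reason to scan without it is an assumption about how a caller might react, not documented behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct FakeReader {};  // hypothetical stand-in for GlobalIndexReader

// Hypothetical minimal stand-in for Result<std::shared_ptr<GlobalIndexReader>>.
struct FakeReaderResult {
    bool ok;                            // false means an error status
    std::string error;                  // error message when !ok
    std::shared_ptr<FakeReader> value;  // may be null even when ok
};

enum class Plan { kUseIndex, kScanWithoutIndex, kFail };

// Maps the three documented outcomes to a caller-side decision.
Plan ChoosePlan(const FakeReaderResult& r) {
    assert(r.ok || r.value == nullptr);  // a failed result carries no reader
    if (!r.ok) return Plan::kFail;       // loading failed
    if (r.value == nullptr) return Plan::kScanWithoutIndex;  // no index was built
    return Plan::kUseIndex;              // index exists and loaded
}
```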

virtual Result<std::vector<std::shared_ptr<GlobalIndexReader>>> CreateReaders(const std::string &field_name) const = 0#

Creates GlobalIndexReaders for a specific field within this range.

Parameters:

field_name – Name of the indexed column.

Returns:

A Result that is:

  • Successful with one or more readers if the indexes exist and load correctly;

  • Successful with an empty vector if no index was built for the given field;

  • An error if loading fails (e.g., file corruption, I/O error, unsupported format).
