Global Index#
Interface#
-
class GlobalIndexFileReader#
Abstract interface for reading global index files from storage.
Public Functions
-
virtual ~GlobalIndexFileReader() = default#
-
virtual Result<std::unique_ptr<InputStream>> GetInputStream(const std::string &file_name) const = 0#
Opens an input stream for reading the specified global index file.
-
class GlobalIndexFileWriter#
Abstract interface for writing global index files to storage.
Public Functions
-
virtual ~GlobalIndexFileWriter() = default#
-
virtual Result<std::string> NewFileName(const std::string &prefix) const = 0#
Generates a unique file name for a new index file using the given prefix.
Note
This function may be called multiple times if the index consists of multiple files.
-
virtual Result<std::unique_ptr<OutputStream>> NewOutputStream(const std::string &file_name) const = 0#
Opens a new output stream for writing index data to the specified file.
-
virtual Result<int64_t> GetFileSize(const std::string &file_name) const = 0#
Returns the size of the file with the given name.
-
class GlobalIndexReader : public paimon::FunctionVisitor<std::shared_ptr<GlobalIndexResult>>#
Reads and evaluates filter predicates against a global file index.
Derived classes are expected to implement the visitor methods (e.g., VisitEqual, VisitIsNull, etc.) to return index-based results that indicate which rows satisfy the given predicate.
Note
All GlobalIndexResult objects returned by implementations of this class use local row ids that start from 0, not global row ids in the entire table. The GlobalIndexResult can be converted to global row ids by calling AddOffset().
Public Functions
VisitVectorSearch performs approximate vector similarity search.
Note
VisitVectorSearch is thread-safe (not coroutine-safe), while the other VisitXXX methods are not thread-safe.
Warning
VisitVectorSearch may return an error status when it is invoked incorrectly (e.g., when a BitmapGlobalIndexReader calls VisitVectorSearch).
-
struct GlobalIndexIOMeta#
Metadata describing a single file entry in a global index.
Public Functions
-
class GlobalIndexResult : public std::enable_shared_from_this<GlobalIndexResult>#
Global index result to get selected global row ids.
Subclassed by paimon::BitmapGlobalIndexResult, paimon::VectorSearchGlobalIndexResult
Public Functions
-
virtual ~GlobalIndexResult() = default#
-
virtual Result<bool> IsEmpty() const = 0#
Checks whether the global index result contains no matching row ids.
- Returns:
A Result<bool> where:
true indicates the result is empty (no matching rows);
false indicates at least one matching row exists;
An error is returned only if internal state is corrupted or I/O fails (e.g., during lazy loading of index data).
-
virtual Result<std::unique_ptr<Iterator>> CreateIterator() const = 0#
Creates a new iterator over the selected global row ids.
-
Result<std::vector<Range>> ToRanges() const#
Returns non-overlapping, sorted ranges covering all row ids in GlobalIndexResult.
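The shape of such a range list can be illustrated with a minimal, self-contained sketch. It does not use the library: `InclusiveRange` and `ToInclusiveRanges` are hypothetical stand-ins, and both the inclusive bounds and the merging of adjacent ids into one range are assumptions about how the real `ToRanges()` behaves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for paimon::Range: an inclusive [first, second] pair.
using InclusiveRange = std::pair<int64_t, int64_t>;

// Collapses sorted, de-duplicated row ids into sorted, non-overlapping
// inclusive ranges. Consecutive ids are merged into a single range.
std::vector<InclusiveRange> ToInclusiveRanges(const std::vector<int64_t>& sorted_ids) {
    // Precondition of this sketch: the ids are already sorted.
    assert(std::is_sorted(sorted_ids.begin(), sorted_ids.end()));
    std::vector<InclusiveRange> ranges;
    for (int64_t id : sorted_ids) {
        if (!ranges.empty() && ranges.back().second + 1 == id) {
            ranges.back().second = id;  // extend the current range
        } else {
            ranges.emplace_back(id, id);  // start a new single-id range
        }
    }
    return ranges;
}
```

With this sketch, the ids 0, 1, 2, 5, 7, 8 collapse into the ranges [0, 2], [5, 5], and [7, 8].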
Computes the logical AND (intersection) between current result and another.
Computes the logical OR (union) between this result and another.
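As a rough illustration of the AND and OR semantics described above, the following sketch applies them to plain sorted vectors of row ids. `IntersectRowIds` and `UnionRowIds` are hypothetical helpers, not library functions; the real implementations presumably operate on bitmaps or other internal structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Logical AND: row ids present in both sorted inputs.
std::vector<int64_t> IntersectRowIds(const std::vector<int64_t>& a,
                                     const std::vector<int64_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int64_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Logical OR: row ids present in either sorted input, without duplicates.
std::vector<int64_t> UnionRowIds(const std::vector<int64_t>& a,
                                 const std::vector<int64_t>& b) {
    assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
    std::vector<int64_t> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```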
-
virtual Result<std::shared_ptr<GlobalIndexResult>> AddOffset(int64_t offset) = 0#
Adds the given offset to each row id in current result and returns the new global index result.
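The effect of adding an offset can be pictured with a small stand-alone sketch. `AddOffsetToRowIds` is a hypothetical helper over a plain vector, shown only to illustrate how local row ids (starting from 0) could map to global ones; choosing the first row id of the scanned range as the offset is an assumption made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Returns a new vector in which every local row id is shifted by `offset`.
// The input is left unchanged, mirroring the "returns the new result" wording.
std::vector<int64_t> AddOffsetToRowIds(const std::vector<int64_t>& local_ids,
                                       int64_t offset) {
    std::vector<int64_t> global_ids;
    global_ids.reserve(local_ids.size());
    for (int64_t id : local_ids) {
        global_ids.push_back(id + offset);
    }
    return global_ids;
}
```

For example, local ids 0, 2, 5 read from a range that starts at row 1000 would become 1000, 1002, 1005.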
-
virtual std::string ToString() const = 0#
Public Static Functions
Serializes a GlobalIndexResult object into a byte array.
Note
This method only supports the following concrete implementations:
BitmapVectorSearchGlobalIndexResult
BitmapGlobalIndexResult
- Parameters:
global_index_result – The GlobalIndexResult instance to serialize (must not be null).
pool – Memory pool used to allocate the output byte buffer.
- Returns:
A Result containing a unique pointer to the serialized Bytes on success, or an error status on failure.
Deserializes a GlobalIndexResult object from a raw byte buffer.
Note
The concrete type of the deserialized object is determined by metadata embedded in the buffer. Currently, only the following types are supported:
BitmapVectorSearchGlobalIndexResult
BitmapGlobalIndexResult
- Parameters:
buffer – Pointer to the serialized byte data (must not be null).
length – Size of the buffer in bytes.
pool – Memory pool used to allocate internal objects during deserialization.
- Returns:
A Result containing a shared pointer to the reconstructed GlobalIndexResult on success, or an error status on failure.
-
class GlobalIndexScan#
Represents a logical scan over a global index for a table.
Public Functions
-
virtual ~GlobalIndexScan() = default#
-
virtual Result<std::shared_ptr<RowRangeGlobalIndexScanner>> CreateRangeScan(const Range &range) = 0#
Creates a scanner for the global index over the specified row id range.
This method instantiates a low-level scanner that can evaluate predicates and retrieve matching row ids from the global index data corresponding to the given row id range.
- Parameters:
range – The inclusive row id range [start, end] for which to create the scanner. The range must be fully covered by existing global index data (from GetRowRangeList()).
- Returns:
A Result containing a range-level scanner, or an error if parsing the index metadata fails.
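A caller might want to check the coverage requirement before creating a scanner. The sketch below is one possible check, written against hypothetical stand-in types rather than the library. It assumes that "fully covered" means the requested range lies within the union of the sorted, non-overlapping ranges, which the source does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for paimon::Range: an inclusive [first, second] pair.
using InclusiveRange = std::pair<int64_t, int64_t>;

// Returns true if every row id in `query` falls inside some range of `covered`.
// `covered` must be sorted and non-overlapping.
bool IsFullyCovered(const InclusiveRange& query,
                    const std::vector<InclusiveRange>& covered) {
    assert(query.first <= query.second);
    int64_t next = query.first;  // first row id not yet known to be covered
    for (const auto& r : covered) {
        if (r.second < next) continue;     // range ends before the open part
        if (r.first > next) return false;  // gap before this range starts
        next = r.second + 1;
        if (next > query.second) return true;
    }
    return false;
}
```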
-
virtual Result<std::vector<Range>> GetRowRangeList() = 0#
Returns row id ranges covered by this global index (sorted and non-overlapping ranges).
Each Range represents a contiguous segment of row ids for which global index data exists. This allows the query engine to parallelize scanning and be aware of ranges that are not covered by any global index.
- Returns:
A Result containing sorted and non-overlapping Range objects.
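To illustrate how a query engine could detect row ids that no global index covers, here is a minimal sketch over stand-in types. `InclusiveRange`, `UncoveredRanges`, and the `total_rows` parameter are hypothetical; the library does not necessarily expose a total row count in this form.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for paimon::Range: an inclusive [first, second] pair.
using InclusiveRange = std::pair<int64_t, int64_t>;

// Given sorted, non-overlapping covered ranges and the total number of rows,
// returns the inclusive ranges of row ids in [0, total_rows) with no index data.
std::vector<InclusiveRange> UncoveredRanges(const std::vector<InclusiveRange>& covered,
                                            int64_t total_rows) {
    assert(total_rows >= 0);
    std::vector<InclusiveRange> gaps;
    int64_t next = 0;  // first row id not yet accounted for
    for (const auto& r : covered) {
        if (r.first > next) {
            gaps.emplace_back(next, r.first - 1);
        }
        next = r.second + 1;
    }
    if (next < total_rows) {
        gaps.emplace_back(next, total_rows - 1);
    }
    return gaps;
}
```

Rows in the returned gaps would have to be evaluated without the index, while each covered range can be scanned independently.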
Public Static Functions
Creates a GlobalIndexScan instance for the specified table and context.
- Parameters:
table_path – Root directory of the table.
snapshot_id – Optional snapshot id to read from; if not provided, uses the latest.
partitions – Optional list of specific partitions to restrict the scan scope. Each map represents one partition (e.g., {“dt”: “2024-06-01”}). If omitted, scans all partitions.
options – Index-specific configuration.
file_system – File system for accessing index files. If not provided (nullptr), it is inferred from the FILE_SYSTEM key in the options parameter.
pool – Memory pool for temporary allocations; if nullptr, uses default.
- Returns:
A Result containing a unique pointer to the created scanner, or an error if initialization fails (e.g., I/O error).
Creates a GlobalIndexScan instance for the specified table and context.
- Parameters:
partition_filters – Optional partition predicates.
-
class GlobalIndexWriter#
Abstract interface for building a global index from Arrow data batches.
Public Functions
-
virtual ~GlobalIndexWriter() = default#
-
virtual Status AddBatch(::ArrowArray *arrow_array) = 0#
Builds index structures from a batch of columnar data.
- Parameters:
arrow_array – A valid C ArrowArray pointer representing a struct array. Must not be nullptr, and must conform to the expected schema.
- Returns:
Status::OK()on success; otherwise, an error indicating malformed input, I/O failure, or unsupported type, etc.
-
virtual Result<std::vector<GlobalIndexIOMeta>> Finish() = 0#
Finalizes the index build process and returns metadata for persisted index.
-
class GlobalIndexerFactory : public paimon::Factory#
Factory for creating GlobalIndexer instances based on index type identifiers.
Public Functions
-
~GlobalIndexerFactory() override = default#
-
virtual Result<std::unique_ptr<GlobalIndexer>> Create(const std::map<std::string, std::string> &options) const = 0#
Creates a GlobalIndexer using the current factory’s implementation and the given options.
Public Static Functions
-
static Result<std::unique_ptr<GlobalIndexer>> Get(const std::string &identifier, const std::map<std::string, std::string> &options)#
Creates a GlobalIndexer instance by looking up a registered factory using an identifier.
GLOBAL_INDEX_IDENTIFIER_SUFFIX (e.g., “-global”) is automatically appended to the provided identifier to form the full key used for factory lookup. This ensures namespace separation between file and global index types.
- Parameters:
identifier – The base name of the index type (e.g., “bitmap”).
options – Configuration parameters for the indexer.
- Returns:
A Result containing a unique pointer to the created GlobalIndexer, or an error if creation fails; nullptr if no matching factory is found.
Public Static Attributes
-
static const char GLOBAL_INDEX_IDENTIFIER_SUFFIX[]#
Suffix used to distinguish global index identifiers (e.g., “bitmap-global”).
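The key construction can be sketched as follows. This is not the library's registry: `kGlobalIndexIdentifierSuffix`, `MakeFactoryKey`, and `FindGlobalFactory` are hypothetical names, the registry is a plain map of strings, and the suffix value "-global" is taken from the example in the text rather than from the actual constant.

```cpp
#include <map>
#include <string>

// Assumed value; the documentation only gives "-global" as an example.
const char kGlobalIndexIdentifierSuffix[] = "-global";

// Builds the full lookup key from a base identifier such as "bitmap".
std::string MakeFactoryKey(const std::string& identifier) {
    return identifier + kGlobalIndexIdentifierSuffix;
}

// Looks up the entry registered under the suffixed key.
// Returns nullptr when nothing is registered for that identifier.
const std::string* FindGlobalFactory(const std::map<std::string, std::string>& registry,
                                     const std::string& identifier) {
    auto it = registry.find(MakeFactoryKey(identifier));
    return it == registry.end() ? nullptr : &it->second;
}
```

Because the key is suffixed, a file index registered as "bitmap" and a global index registered as "bitmap-global" can coexist in one registry without colliding.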
-
class GlobalIndexer#
Interface for creating global index readers and writers.
Public Functions
-
virtual ~GlobalIndexer() = default#
Creates a writer for building a global index on a specific field.
- Parameters:
field_name – Name of the field to be indexed.
arrow_schema – Schema of the input Arrow struct array. It must contain the field specified by field_name and may include additional associated fields used during index construction.
file_writer – I/O handler for persisting index data to storage.
pool – Memory pool for temporary allocations; if nullptr, uses default.
- Returns:
A Result containing a shared pointer to the created GlobalIndexWriter, or an error if the field is not found, unsupported, or initialization fails, etc.
Creates a reader for querying a pre-built global index.
- Parameters:
arrow_schema – Schema of the indexed data; used to interpret predicate literals.
file_reader – I/O handler for reading index artifacts from storage.
files – List of index file metadata entries produced during writing.
pool – Memory pool for temporary allocations; if nullptr, uses default.
- Returns:
A Result containing a shared pointer to the created GlobalIndexReader, or an error if the index cannot be loaded or is incompatible, etc.
-
class RowRangeGlobalIndexScanner#
Interface for scanning global index data at the range level.
Public Functions
-
virtual ~RowRangeGlobalIndexScanner() = default#
-
virtual Result<std::shared_ptr<GlobalIndexReader>> CreateReader(const std::string &field_name, const std::string &index_type) const = 0#
Creates a GlobalIndexReader for a specific field and index type within this range.
This reader provides low-level access to the serialized index data for the given column (field_name) and index kind (index_type, such as “bitmap”).
Note
All GlobalIndexResult objects returned by GlobalIndexReader use local row ids that start from 0, not global row ids in the entire table.
- Parameters:
field_name – Name of the indexed column.
index_type – Type of the global index (e.g., “bitmap”, “lumina”).
- Returns:
A Result that is:
Successful with a non-null reader if the index exists and loads correctly;
Successful with a null pointer if no index was built for the given field and type;
An error only if loading fails (e.g., file corruption, I/O error, unsupported format).
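The three outcomes listed above call for three different reactions from the caller. The sketch below models them with hypothetical stand-in types (`Reader`, `ReaderResult`, `Plan`), since the real `Result` and `GlobalIndexReader` are not reproduced here. Treating a null reader as "evaluate the predicate without the index" is an assumption about typical caller behavior, not something the source specifies.

```cpp
#include <memory>
#include <string>

struct Reader {};  // hypothetical stand-in for GlobalIndexReader

// Hypothetical stand-in for Result<std::shared_ptr<GlobalIndexReader>>.
struct ReaderResult {
    bool ok;                        // false when loading failed
    std::string error;              // error message when !ok
    std::shared_ptr<Reader> value;  // may be null even when ok
};

enum class Plan { kUseIndex, kSkipIndex, kPropagateError };

// Maps the three documented outcomes to a caller-side decision.
Plan ChoosePlan(const ReaderResult& result) {
    if (!result.ok) {
        return Plan::kPropagateError;  // file corruption, I/O error, ...
    }
    if (result.value == nullptr) {
        return Plan::kSkipIndex;  // no index built for this field and type
    }
    return Plan::kUseIndex;  // index exists and loaded correctly
}
```

The point of the sketch is that a null reader is a successful result and should not be handled as a failure.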
-
virtual Result<std::vector<std::shared_ptr<GlobalIndexReader>>> CreateReaders(const std::string &field_name) const = 0#
Creates several GlobalIndexReaders for a specific field within this range.
- Parameters:
field_name – Name of the indexed column.
- Returns:
A Result that is:
Successful with several readers if the indexes exist and load correctly;
Successful with an empty vector if no index was built for the given field;
An error if loading fails (e.g., file corruption, I/O error, unsupported format).