Scan

Interface

class TableScan

A scanner interface for reading a table's metadata and creating a scan plan.

Public Functions

virtual ~TableScan() = default
virtual Result<std::shared_ptr<Plan>> CreatePlan() = 0

Create a scan plan.

Returns:

A Result containing a shared pointer to the created Plan or an error status.

Public Static Functions

static Result<std::unique_ptr<TableScan>> Create(std::unique_ptr<ScanContext> context)

Create an instance of TableScan.

Parameters:

context – A unique pointer to the ScanContext used for scan operations.

Returns:

A Result containing a unique pointer to the TableScan instance.

class ScanContextBuilder

ScanContextBuilder constructs a ScanContext and validates its input.

Public Functions

explicit ScanContextBuilder(const std::string &path)

Constructs a ScanContextBuilder with required parameters.

Parameters:

path – The root path of the table.

~ScanContextBuilder()
ScanContextBuilder &SetLimit(int32_t limit)

Set a limit for the scan. If the limit is not set, the scan is unlimited.

ScanContextBuilder &SetBucketFilter(int32_t bucket_filter)

Set a bucket filter so that only the specified bucket is scanned.

ScanContextBuilder &SetPartitionFilter(const std::vector<std::map<std::string, std::string>> &partition_filters)

Set partition filters. The maps in partition_filters are combined with OR, and the key/value entries within each map are combined with AND. For example, partition_filters = {{k1=1, k2=10}, {k1=2, k2=20}} means OR(AND(k1=1, k2=10), AND(k1=2, k2=20)).
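The following is a minimal sketch of how these OR-of-AND semantics could be evaluated against a single partition. It is an illustration only: PartitionSpec and MatchesPartitionFilters are hypothetical names, not part of the paimon API, and treating an empty filter vector as "no filter" is an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical alias for illustration: one partition's key/value pairs.
using PartitionSpec = std::map<std::string, std::string>;

// Hypothetical helper: returns true if `partition` satisfies
// OR(AND(group 1), AND(group 2), ...), where each map in `filters` is one
// AND-group. An empty `filters` vector is assumed to mean "no filter".
bool MatchesPartitionFilters(const PartitionSpec& partition,
                             const std::vector<PartitionSpec>& filters) {
  if (filters.empty()) return true;
  for (const auto& and_group : filters) {
    bool all_match = true;
    for (const auto& [key, value] : and_group) {
      auto it = partition.find(key);
      if (it == partition.end() || it->second != value) {
        all_match = false;  // one failed entry breaks this AND-group
        break;
      }
    }
    if (all_match) return true;  // one satisfied AND-group is enough for OR
  }
  return false;
}
```

With the filters from the example above, partition {k1=1, k2=10} matches, while {k1=1, k2=20} matches neither AND-group and is rejected.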

ScanContextBuilder &SetPredicate(const std::shared_ptr<Predicate> &predicate)

Set a predicate for filtering data.

ScanContextBuilder &SetRowRanges(const std::vector<Range> &row_ranges)

Specify the row ID ranges to scan.

This is typically used to read specific rows in data-evolution mode. Files whose row ranges do not intersect any range in row_ranges are filtered out. If not set, all rows are returned.
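A minimal sketch of the intersection rule described above. RowRange and FilterFiles are hypothetical stand-ins (the real Range type may differ), and both ends of a range are assumed to be inclusive.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a row ID range; both ends assumed inclusive.
struct RowRange {
  int64_t from;
  int64_t to;
};

// Two inclusive ranges intersect when each starts no later than the other ends.
bool Intersects(const RowRange& a, const RowRange& b) {
  return a.from <= b.to && b.from <= a.to;
}

// Hypothetical helper: keeps only the files whose row ID range overlaps at
// least one requested range. An empty `row_ranges` means "not set", so every
// file is kept.
std::vector<RowRange> FilterFiles(const std::vector<RowRange>& file_ranges,
                                  const std::vector<RowRange>& row_ranges) {
  if (row_ranges.empty()) return file_ranges;
  std::vector<RowRange> kept;
  for (const auto& file : file_ranges) {
    for (const auto& wanted : row_ranges) {
      if (Intersects(file, wanted)) {
        kept.push_back(file);
        break;  // one overlapping range is enough to keep the file
      }
    }
  }
  return kept;
}
```

For files covering rows [0, 99], [100, 199] and [200, 299], a requested range of [150, 210] keeps only the second and third files.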

ScanContextBuilder &AddOption(const std::string &key, const std::string &value)

Add a single option entry. Options added or set in ScanContextBuilder take precedence over, and are merged with, the options in the table schema.

ScanContextBuilder &SetOptions(const std::map<std::string, std::string> &options)

Set a map of configuration options containing entries that are not defined in the table schema or whose schema values you want to override.

Note

Calling SetOptions() discards any options previously added with AddOption().

Parameters:

options – The configuration options map.

Returns:

Reference to this builder for method chaining.
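The precedence rules for AddOption() and SetOptions() can be sketched with a small mock. OptionsSketch is a hypothetical class, not ScanContextBuilder itself, and the option keys used in the usage note are only examples.

```cpp
#include <map>
#include <string>

using Options = std::map<std::string, std::string>;

// Hypothetical mock of the documented option semantics; not paimon's builder.
class OptionsSketch {
 public:
  // Adds or overwrites a single entry.
  OptionsSketch& AddOption(const std::string& key, const std::string& value) {
    options_[key] = value;
    return *this;
  }

  // Replaces every entry previously added with AddOption().
  OptionsSketch& SetOptions(const Options& options) {
    options_ = options;
    return *this;
  }

  // Builder options override table-schema options with the same key;
  // schema options without a builder counterpart are kept.
  Options MergeWith(const Options& schema_options) const {
    Options merged = schema_options;
    for (const auto& [key, value] : options_) merged[key] = value;
    return merged;
  }

 private:
  Options options_;
};
```

For example, with schema options {bucket=4, file.format=orc}, calling AddOption("file.format", "parquet") yields a merged file.format of parquet; a later SetOptions({{"b", "2"}}) drops that override, so the merged file.format falls back to orc.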

ScanContextBuilder &WithStreamingMode(bool is_streaming_mode)

Set whether the scan is in streaming mode.

Note

If not set, is_streaming_mode defaults to false.

ScanContextBuilder &WithMemoryPool(const std::shared_ptr<MemoryPool> &memory_pool)

Set a custom memory pool for memory management.

Note

If not set, the default memory pool is used.

Parameters:

memory_pool – The memory pool to use.

Returns:

Reference to this builder for method chaining.

ScanContextBuilder &WithExecutor(const std::shared_ptr<Executor> &executor)

Set a custom executor for task execution.

Note

If not set, the default executor is used.

Parameters:

executor – The executor to use.

Returns:

Reference to this builder for method chaining.

Result<std::unique_ptr<ScanContext>> Finish()

Build and return a ScanContext instance with input validation.

Returns:

Result containing the constructed ScanContext or an error status.
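A hypothetical sketch of the fluent builder pattern and the documented defaults (not streaming unless WithStreamingMode(true) is called, unlimited unless SetLimit() is called). ScanConfigBuilderSketch is not part of the paimon API: it returns std::optional instead of Result, and its validation checks (non-empty path, non-negative limit) are assumptions, since the checks Finish() actually performs are not documented here.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical configuration mirroring the documented builder defaults.
struct ScanConfigSketch {
  std::string path;
  bool is_streaming_mode = false;        // default: not streaming
  std::optional<int32_t> limit;          // std::nullopt: unlimited
  std::optional<int32_t> bucket_filter;  // std::nullopt: assumed all buckets
};

// Hypothetical fluent builder; each setter returns *this for chaining.
class ScanConfigBuilderSketch {
 public:
  explicit ScanConfigBuilderSketch(std::string path) {
    config_.path = std::move(path);
  }
  ScanConfigBuilderSketch& SetLimit(int32_t limit) {
    config_.limit = limit;
    return *this;
  }
  ScanConfigBuilderSketch& SetBucketFilter(int32_t bucket) {
    config_.bucket_filter = bucket;
    return *this;
  }
  ScanConfigBuilderSketch& WithStreamingMode(bool is_streaming_mode) {
    config_.is_streaming_mode = is_streaming_mode;
    return *this;
  }
  // Returns a validated copy of the config, or std::nullopt on invalid input.
  // Both checks are illustrative assumptions.
  std::optional<ScanConfigSketch> Finish() const {
    if (config_.path.empty()) return std::nullopt;
    if (config_.limit && *config_.limit < 0) return std::nullopt;
    return config_;
  }

 private:
  ScanConfigSketch config_;
};
```

Keeping validation in a single Finish() call lets the setters stay trivial and chainable, which matches the documented split between ScanContextBuilder and ScanContext.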

class ScanContext

ScanContext holds the configuration for table scan operations.

Do not construct this class directly. Use ScanContextBuilder instead, which validates its input.

Public Functions

ScanContext(const std::string &path, bool is_streaming_mode, std::optional<int32_t> limit, const std::shared_ptr<ScanFilter> &scan_filter, const std::shared_ptr<MemoryPool> &memory_pool, const std::shared_ptr<Executor> &executor, const std::map<std::string, std::string> &options)
~ScanContext()
inline const std::string &GetPath() const
inline bool IsStreamingMode() const
inline std::optional<int32_t> GetLimit() const
inline std::shared_ptr<ScanFilter> GetScanFilters() const
inline const std::map<std::string, std::string> &GetOptions() const
inline std::shared_ptr<MemoryPool> GetMemoryPool() const
inline std::shared_ptr<Executor> GetExecutor() const

class Plan

The plan created by TableScan::CreatePlan().

Public Functions

virtual ~Plan() = default
virtual const std::vector<std::shared_ptr<Split>> &Splits() const = 0

Returns the splits of this plan.

virtual std::optional<int64_t> SnapshotId() const = 0

Returns the snapshot ID of this plan, or std::nullopt if the table is empty.
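A minimal sketch of consuming a Plan, using an in-memory mock that mirrors the pure virtual interface above. Split, FakePlan and DescribePlan are hypothetical stand-ins for illustration; in real code the Plan would come from TableScan::CreatePlan().

```cpp
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for paimon::Split.
struct Split {
  virtual ~Split() = default;
};

// Mirrors the documented Plan interface.
class Plan {
 public:
  virtual ~Plan() = default;
  virtual const std::vector<std::shared_ptr<Split>>& Splits() const = 0;
  virtual std::optional<int64_t> SnapshotId() const = 0;
};

// Hypothetical in-memory Plan, for illustration only.
class FakePlan : public Plan {
 public:
  FakePlan(std::vector<std::shared_ptr<Split>> splits,
           std::optional<int64_t> snapshot_id)
      : splits_(std::move(splits)), snapshot_id_(snapshot_id) {}
  const std::vector<std::shared_ptr<Split>>& Splits() const override {
    return splits_;
  }
  std::optional<int64_t> SnapshotId() const override { return snapshot_id_; }

 private:
  std::vector<std::shared_ptr<Split>> splits_;
  std::optional<int64_t> snapshot_id_;
};

// Checks SnapshotId() first: std::nullopt signals an empty table.
std::string DescribePlan(const Plan& plan) {
  std::optional<int64_t> snapshot = plan.SnapshotId();
  if (!snapshot) return "empty table";
  return "snapshot " + std::to_string(*snapshot) + ", " +
         std::to_string(plan.Splits().size()) + " split(s)";
}
```

A plan with two splits at snapshot 7 is described as "snapshot 7, 2 split(s)", while a plan without a snapshot is reported as an empty table.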

class DataSplit : public paimon::Split

Input data split for read operations, as required by most batch computation engines.

Public Functions

virtual std::vector<SimpleDataFileMeta> GetFileList() const = 0

Get the list of metadata for all data files in this split.

Note

This method is only used for append tables and will be removed in a future version.

struct SimpleDataFileMeta

Metadata structure for simple data files.

Contains essential information about a data file including its location, size, row count, sequence numbers, schema information, and timestamps. This structure is used to track file metadata without loading the actual file content.

Public Functions

inline SimpleDataFileMeta(const std::string &_file_path, int64_t _file_size, int64_t _row_count, int64_t _min_sequence_number, int64_t _max_sequence_number, int64_t _schema_id, int32_t _level, const Timestamp &_creation_time, const std::optional<int64_t> &_delete_row_count)
bool operator==(const SimpleDataFileMeta &other) const
std::string ToString() const

Public Members

std::string file_path

Absolute path of the data file.

If an external path is enabled, file_path is the actual location in the external storage system.

int64_t file_size
int64_t row_count
int64_t min_sequence_number
int64_t max_sequence_number
int64_t schema_id
int32_t level
Timestamp creation_time
std::optional<int64_t> delete_row_count
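A hypothetical sketch of aggregating the metadata returned by DataSplit::GetFileList(). FileMetaSketch mirrors only a subset of the SimpleDataFileMeta members, and file_size is assumed to be measured in bytes.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical subset of the SimpleDataFileMeta public members.
struct FileMetaSketch {
  std::string file_path;
  int64_t file_size;  // assumed to be bytes
  int64_t row_count;
  int64_t schema_id;
};

// Hypothetical per-split summary.
struct SplitSummary {
  int64_t total_bytes = 0;
  int64_t total_rows = 0;
  int64_t max_schema_id = -1;  // stays -1 when the file list is empty
};

// Sums sizes and row counts, and tracks the newest schema ID seen.
SplitSummary Summarize(const std::vector<FileMetaSketch>& files) {
  SplitSummary summary;
  for (const auto& file : files) {
    summary.total_bytes += file.file_size;
    summary.total_rows += file.row_count;
    summary.max_schema_id = std::max(summary.max_schema_id, file.schema_id);
  }
  return summary;
}
```

Two files of 1024 and 2048 bytes holding 100 and 250 rows under schemas 0 and 1 summarize to 3072 bytes, 350 rows and a maximum schema ID of 1.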