Scan#
Interface#
-
class TableScan#
A scanner interface for reading a table's metadata and creating a plan.
Public Functions
-
virtual ~TableScan() = default#
Public Static Functions
-
static Result<std::unique_ptr<TableScan>> Create(std::unique_ptr<ScanContext> context)#
Create an instance of TableScan.
- Parameters:
context – A unique pointer to the ScanContext used for scan operations.
- Returns:
A Result containing a unique pointer to the TableScan instance.
-
class ScanContextBuilder#
ScanContextBuilder is used to build a ScanContext and validates its input.
Public Functions
-
explicit ScanContextBuilder(const std::string &path)#
Constructs a ScanContextBuilder with required parameters.
- Parameters:
path – The root path of the table.
-
~ScanContextBuilder()#
-
ScanContextBuilder &SetLimit(int32_t limit)#
If limit is not set, it defaults to unlimited.
-
ScanContextBuilder &SetBucketFilter(int32_t bucket_filter)#
Set a bucket filter to scan only a specific bucket.
-
ScanContextBuilder &SetPartitionFilter(const std::vector<std::map<std::string, std::string>> &partition_filters)#
The maps in the partition_filters vector are combined with OR, and the key-value pairs within each map are combined with AND. For example, partition_filters = {{k1=1, k2=10}, {k1=2, k2=20}} is evaluated as OR(AND(k1=1, k2=10), AND(k1=2, k2=20)).
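The OR-of-ANDs semantics described above can be illustrated with a small standalone sketch. `MatchesPartitionFilters` and `PartitionSpec` are hypothetical names introduced only for this illustration and are not part of the library; how the library treats an empty filter list is not stated here, so the sketch simply matches nothing in that case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using PartitionSpec = std::map<std::string, std::string>;

// Hypothetical helper (not a library function): returns true when `partition`
// satisfies at least one filter map (OR). A filter map is satisfied only if
// every key=value pair in it is present in `partition` (AND).
bool MatchesPartitionFilters(const std::vector<PartitionSpec>& partition_filters,
                             const PartitionSpec& partition) {
    for (const auto& filter : partition_filters) {
        bool all_match = true;
        for (const auto& entry : filter) {
            auto it = partition.find(entry.first);
            if (it == partition.end() || it->second != entry.second) {
                all_match = false;
                break;
            }
        }
        if (all_match) {
            return true;
        }
    }
    // Assumption of this sketch: an empty filter list matches nothing.
    return false;
}

// Usage sketch with the filters from the example in the text.
void ExamplePartitionFilters() {
    std::vector<PartitionSpec> filters = {{{"k1", "1"}, {"k2", "10"}},
                                          {{"k1", "2"}, {"k2", "20"}}};
    // Matches the second map: k1=2 AND k2=20.
    assert(MatchesPartitionFilters(filters, PartitionSpec{{"k1", "2"}, {"k2", "20"}}));
    // Mixes values from both maps, so neither AND group is satisfied.
    assert(!MatchesPartitionFilters(filters, PartitionSpec{{"k1", "1"}, {"k2", "20"}}));
}
```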
Set a predicate for filtering data.
-
ScanContextBuilder &SetRowRanges(const std::vector<Range> &row_ranges)#
Specify the row id ranges for scan.
This is usually used to read specific rows in data-evolution mode. Files whose row id ranges do not intersect any of the given ranges are filtered out. If not set, all rows are returned.
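The file-pruning rule above can be sketched as a plain interval-overlap check. `RowRange`, `Intersects`, and `KeepFile` are hypothetical stand-ins written for this illustration; in particular, the inclusive [from, to] convention and the treatment of an empty request as "not set" are assumptions, since the library's Range type is not described in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the library's Range type. The inclusive
// [from, to] convention is an assumption made for this sketch.
struct RowRange {
    std::int64_t from;
    std::int64_t to;
};

// Two inclusive ranges overlap when each one starts before the other ends.
bool Intersects(const RowRange& a, const RowRange& b) {
    return a.from <= b.to && b.from <= a.to;
}

// Keeps a file only if its row id range overlaps at least one requested range.
// Assumption: an empty request stands for "not set", so every file is kept.
bool KeepFile(const RowRange& file_range, const std::vector<RowRange>& requested) {
    if (requested.empty()) {
        return true;
    }
    for (const auto& range : requested) {
        if (Intersects(file_range, range)) {
            return true;
        }
    }
    return false;
}

// Usage sketch: two requested ranges and three candidate files.
void ExampleRowRanges() {
    std::vector<RowRange> requested = {{0, 9}, {100, 199}};
    assert(KeepFile(RowRange{5, 20}, requested));     // overlaps [0, 9]
    assert(!KeepFile(RowRange{10, 99}, requested));   // falls in the gap
    assert(KeepFile(RowRange{199, 300}, requested));  // touches [100, 199] at 199
}
```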
-
ScanContextBuilder &AddOption(const std::string &key, const std::string &value)#
Options added or set in ScanContextBuilder take priority and are merged with the options in the table schema.
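One way to picture the merge is a map overlay in which builder entries win on key conflicts. This is only a sketch of the stated priority rule, not the library's implementation; `MergeOptions` and the `option.*` keys are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

using Options = std::map<std::string, std::string>;

// Hypothetical helper (not a library function): starts from the table schema
// options and lets builder options overwrite entries with the same key.
Options MergeOptions(const Options& schema_options, const Options& builder_options) {
    Options merged = schema_options;
    for (const auto& entry : builder_options) {
        merged[entry.first] = entry.second;
    }
    return merged;
}

// Usage sketch with placeholder keys.
void ExampleMergeOptions() {
    Options schema = {{"option.a", "1"}, {"option.b", "2"}};
    Options builder = {{"option.b", "20"}, {"option.c", "30"}};
    Options merged = MergeOptions(schema, builder);
    assert(merged.size() == 3);
    assert(merged.at("option.a") == "1");   // only in the schema, kept
    assert(merged.at("option.b") == "20");  // builder value wins
    assert(merged.at("option.c") == "30");  // only in the builder, added
}
```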
-
ScanContextBuilder &SetOptions(const std::map<std::string, std::string> &options)#
Set a map of configuration options, for entries that are not defined in the table schema or whose values you want to override.
Note
Calling this method clears any options previously added by AddOption().
- Parameters:
options – The configuration options map.
- Returns:
Reference to this builder for method chaining.
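The interaction between AddOption() and SetOptions() noted above can be sketched with a minimal stand-in class. `OptionsBuilderSketch` is hypothetical and only mimics the two documented behaviors (replacement on SetOptions, and returning a reference for chaining); overwriting a repeated key in AddOption is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in that models only the option-handling behavior
// described in the text. It is not the library's ScanContextBuilder.
class OptionsBuilderSketch {
 public:
    // Adds one entry. Overwriting an existing key is an assumption here.
    OptionsBuilderSketch& AddOption(const std::string& key, const std::string& value) {
        options_[key] = value;
        return *this;
    }

    // Replaces the whole map, which drops entries added earlier by AddOption().
    OptionsBuilderSketch& SetOptions(const std::map<std::string, std::string>& options) {
        options_ = options;
        return *this;
    }

    const std::map<std::string, std::string>& options() const { return options_; }

 private:
    std::map<std::string, std::string> options_;
};

// Usage sketch: the entry added before SetOptions() is gone, the one added
// after it is kept.
void ExampleSetOptions() {
    std::map<std::string, std::string> replacement = {{"option.b", "2"}};
    OptionsBuilderSketch builder;
    builder.AddOption("option.a", "1").SetOptions(replacement).AddOption("option.c", "3");
    assert(builder.options().count("option.a") == 0);
    assert(builder.options().at("option.b") == "2");
    assert(builder.options().at("option.c") == "3");
}
```

If both per-entry and bulk options are needed, calling SetOptions() first and AddOption() afterwards avoids losing entries.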
-
ScanContextBuilder &WithStreamingMode(bool is_streaming_mode)#
Set whether the scan is in streaming mode.
Note
If not set, is_streaming_mode defaults to false.
Set custom memory pool for memory management.
Note
If not set, the default memory pool is used.
- Parameters:
memory_pool – The memory pool to use.
- Returns:
Reference to this builder for method chaining.
Set custom executor for task execution.
Note
If not set, the default executor is used.
- Parameters:
executor – The executor to use.
- Returns:
Reference to this builder for method chaining.
-
Result<std::unique_ptr<ScanContext>> Finish()#
Build and return a ScanContext instance with input validation.
- Returns:
A Result containing the constructed ScanContext or an error status.
-
class ScanContext#
ScanContext holds the configuration for table scan operations.
Do not construct this class directly; use ScanContextBuilder to build a ScanContext, which performs input validation.
Public Functions
-
~ScanContext()#
-
inline const std::string &GetPath() const#
-
inline bool IsStreamingMode() const#
-
inline std::optional<int32_t> GetLimit() const#
-
inline std::shared_ptr<ScanFilter> GetScanFilters() const#
-
inline const std::map<std::string, std::string> &GetOptions() const#
-
inline std::shared_ptr<MemoryPool> GetMemoryPool() const#
-
class DataSplit : public paimon::Split#
Input data split for read operations. Needed by most batch computation engines.
Public Functions
-
virtual std::vector<SimpleDataFileMeta> GetFileList() const = 0#
Get the list of metadata for all data files in this split.
Note
This method will be removed in future versions and is only used for append tables.
-
struct SimpleDataFileMeta#
Metadata structure for simple data files.
Contains essential information about a data file including its location, size, row count, sequence numbers, schema information, and timestamps. This structure is used to track file metadata without loading the actual file content.
Public Functions
-
inline SimpleDataFileMeta(const std::string &_file_path, int64_t _file_size, int64_t _row_count, int64_t _min_sequence_number, int64_t _max_sequence_number, int64_t _schema_id, int32_t _level, const Timestamp &_creation_time, const std::optional<int64_t> &_delete_row_count)#
-
bool operator==(const SimpleDataFileMeta &other) const#
-
std::string ToString() const#
Public Members
-
std::string file_path#
Absolute path of the data file.
If external path is enabled, file_path indicates the actual location in the external storage system.
-
int64_t file_size#
-
int64_t row_count#
-
int64_t min_sequence_number#
-
int64_t max_sequence_number#
-
int64_t schema_id#
-
int32_t level#
-
std::optional<int64_t> delete_row_count#