Write#
Interface#
-
class FileStoreWrite#
Interface for write operations in a file store.
Public Functions
-
virtual ~FileStoreWrite() = default#
-
virtual Status Write(std::unique_ptr<RecordBatch> &&batch) = 0#
Support write an input
RecordBatchto internal buffer or file.
-
virtual Result<std::vector<std::shared_ptr<CommitMessage>>> PrepareCommit(bool wait_compaction = true, int64_t commit_identifier = BATCH_WRITE_COMMIT_IDENTIFIER) = 0#
Generate a list of commit messages with the latest generated data file meta information of the current snapshot.
When we need commit, call PrepareCommit to get the current
CommitMessages with the latest generated data file meta information of the current snapshot.This function is designed to be called when a commit is required. Depending on the writing scenario, the behavior will differ:
For batch write, simply call
PrepareCommit()without any parameters.For streaming write, you need to provide both parameters:
PrepareCommit(bool wait_compaction, int64_t commit_identifier).
- Parameters:
wait_compaction – Indicates whether to wait for any ongoing compaction process to complete.
commit_identifier – A unique identifier for the commit operation. This parameter is only relevant in streaming write scenarios.
- Returns:
A Result containing
std::vector<std::shared_ptr<CommitMessage>>objects, representing the generated commit messages.
-
virtual std::shared_ptr<Metrics> GetMetrics() const = 0#
-
virtual Status Close() = 0#
Public Static Functions
-
static Result<std::unique_ptr<FileStoreWrite>> Create(std::unique_ptr<WriteContext> context)#
Create an instance of
FileStoreWrite.- Parameters:
context – A unique pointer to the
WriteContextused for write operations.- Returns:
A Result containing a unique pointer to the
FileStoreWriteinstance.
-
virtual ~FileStoreWrite() = default#
-
class WriteContextBuilder#
WriteContextBuilderused to build aWriteContext, has input validation.Public Functions
-
WriteContextBuilder(const std::string &root_path, const std::string &commit_user)#
Constructs a
WriteContextBuilderwith required parameters.- Parameters:
root_path – The root path of the table.
commit_user – The user identifier for commit operations.
-
~WriteContextBuilder()#
-
WriteContextBuilder &SetOptions(const std::map<std::string, std::string> &options)#
Set a configuration options map to set some option entries which are not defined in the table schema or whose values you want to overwrite.
Note
The options map will clear the options added by
AddOption()before.- Parameters:
options – The configuration options map.
- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &AddOption(const std::string &key, const std::string &value)#
Add a single configuration option which is not defined in the table schema or whose value you want to overwrite.
If you want to add multiple options, call
AddOption()multiple times or useSetOptions()instead.- Parameters:
key – The option key.
value – The option value.
- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &WithStreamingMode(bool is_streaming_mode)#
Set whether to enable streaming mode (default is false)
- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &WithIgnorePreviousFiles(bool ignore_previous_files)#
Set whether the write operation should ignore previously stored files.
(default is false)
- Returns:
Reference to this builder for method chaining.
Set custom memory pool for memory management.
- Parameters:
memory_pool – The memory pool to use.
- Returns:
Reference to this builder for method chaining.
Set custom executor for task execution.
- Parameters:
executor – The executor to use.
- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &WithWriteId(int32_t write_id)#
For postpone bucket mode in pk table,
WithWriteId()supposed to be used.Each worker must have its own unique
write_idwithin a task, which is used as the prefix for its data files. This ensures that files from the same worker share the same prefix and can be consumed by the same compaction reader to preserve input order.- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &WithBranch(const std::string &branch)#
Write to specific branch, default is main.
- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &WithWriteSchema(const std::vector<std::string> &write_schema)#
For data evolution, user can write partial specific fields from table schema.
If not set, write all fields in table.
- Returns:
Reference to this builder for method chaining.
-
WriteContextBuilder &WithFileSystemSchemeToIdentifierMap(const std::map<std::string, std::string> &fs_scheme_to_identifier_map)#
Set the file system scheme to identifier mapping for custom file system configurations.
This allows using different file system implementations for different URI schemes.
Note
If not set, use default file system (configured in
Options::FILE_SYSTEM).- Parameters:
fs_scheme_to_identifier_map – Map from URI scheme to file system identifier.
- Returns:
Reference to this builder for method chaining.
-
Result<std::unique_ptr<WriteContext>> Finish()#
Build and return a
WriteContextinstance with input validation.- Returns:
Result containing the constructed
WriteContextor an error status.
-
WriteContextBuilder(const std::string &root_path, const std::string &commit_user)#
-
class WriteContext#
WriteContextis some configuration for write operations.Please do not use this class directly, use
WriteContextBuilderto build aWriteContextwhich has input validation.See also
Public Functions
-
~WriteContext()#
-
inline const std::string &GetRootPath() const#
-
inline const std::string &GetCommitUser() const#
-
inline const std::map<std::string, std::string> &GetFileSystemSchemeToIdentifierMap() const#
-
inline const std::map<std::string, std::string> &GetOptions() const#
-
inline bool IsStreamingMode() const#
-
inline bool IgnoreNumBucketCheck() const#
-
inline bool IgnorePreviousFiles() const#
-
inline const std::optional<int32_t> GetWriteId() const#
-
inline const std::string &GetBranch() const#
-
inline const std::vector<std::string> &GetWriteSchema() const#
-
inline std::shared_ptr<MemoryPool> GetMemoryPool() const#
-
~WriteContext()#
-
class RecordBatch#
RecordBatchencapsulates a batch of data with the same schema, supporting different types such asINSERT,UPDATE_BEFORE,UPDATE_AFTER, andDELETE.It is typically used in streaming write or batch processing scenarios, with underlying data stored in the Apache Arrow format.
Note
Do not use this class directly, use
RecordBatchBuilderto build aRecordBatchwhich has input validation.Public Types
Public Functions
-
RecordBatch(const std::map<std::string, std::string> &partition, int32_t bucket, const std::vector<RowKind> &row_kinds, ArrowArray *data)#
Note
1. Data cannot be reused, as it will be released after Write. 2. If a partition field’s value is null, it should be represented as “__DEFAULT_PARTITION__”(or a user-defined default value) in the partition map. However, in the Arrow array, partition column values MUST NOT be set to “__DEFAULT_PARTITION__” (or a user-defined default value). Instead, they should be properly set as actual nulls. If used, it may lead to behavioral inconsistencies between C++ Paimon and Java Paimon.
-
~RecordBatch()#
-
RecordBatch(RecordBatch&&)#
-
RecordBatch &operator=(RecordBatch&&)#
-
RecordBatch(const RecordBatch&) = delete#
-
RecordBatch &operator=(const RecordBatch&) = delete#
-
inline const std::map<std::string, std::string> &GetPartition() const#
-
inline int32_t GetBucket() const#
-
inline ArrowArray *GetData() const#
-
inline const std::vector<RecordBatch::RowKind> &GetRowKind() const#
-
inline void SetBucket(int32_t bucket)#
-
bool HasSpecifiedBucket() const#
-
RecordBatch(const std::map<std::string, std::string> &partition, int32_t bucket, const std::vector<RowKind> &row_kinds, ArrowArray *data)#
-
class RecordBatchBuilder#
Builder for constructing
RecordBatchinstances.This class provides a convenient way to build
RecordBatchobjects by setting various properties such as data, row kinds, partition information, and bucket id.Public Functions
-
explicit RecordBatchBuilder(::ArrowArray *data)#
Constructs a
RecordBatchBuilderwith Arrow data.- Parameters:
data – Arrow array containing the record data.
-
~RecordBatchBuilder()#
-
RecordBatchBuilder &MoveData(::ArrowArray *data)#
Move new Arrow data into the builder, replacing existing data.
- Parameters:
data – New Arrow array data.
-
RecordBatchBuilder &SetRowKinds(const std::vector<RecordBatch::RowKind> &row_kinds)#
Set the row kinds for each record in the batch.
Note
row_kindsmust have the same length as the number of records in the data.- Parameters:
row_kinds – A vector of row kinds, including INSERT, UPDATE_BEFORE, UPDATE_AFTER and DELETE. If not set, default value is
INSERT.
-
RecordBatchBuilder &SetPartition(const std::map<std::string, std::string> &data)#
Set the partition information for this record batch.
- Parameters:
data – Map of partition column names to their string values.
-
RecordBatchBuilder &SetBucket(int32_t bucket)#
Set the bucket id for this record batch.
If not set, default value is
-1.- Parameters:
bucket – The bucket id for data distribution.
-
Result<std::unique_ptr<RecordBatch>> Finish()#
Build and return the final
RecordBatchinstance.This method validates the configuration and creates
RecordBatchwith all the specified properties.
-
explicit RecordBatchBuilder(::ArrowArray *data)#