Commit

Interface

class FileStoreCommit

Interface for commit operations in a file store.

The FileStoreCommit class provides interfaces for committing changes, expiring old snapshots, dropping partitions, and retrieving commit metrics.

Public Functions

virtual ~FileStoreCommit() = default
virtual Status Commit(const std::vector<std::shared_ptr<CommitMessage>> &commit_messages, int64_t commit_identifier = BATCH_WRITE_COMMIT_IDENTIFIER, std::optional<int64_t> watermark = std::nullopt) = 0

Commit changes to the file store.

Parameters:
  • commit_messages – A vector of commit messages to be committed.

  • commit_identifier – An optional identifier for the commit operation. Default is BATCH_WRITE_COMMIT_IDENTIFIER.

  • watermark – An optional event-time watermark used to indicate the progress of data processing. Default is std::nullopt.

Returns:

Status indicating the success or failure of the commit operation.

virtual Result<int32_t> FilterAndCommit(const std::map<int64_t, std::vector<std::shared_ptr<CommitMessage>>> &commit_identifier_and_messages, std::optional<int64_t> watermark = std::nullopt) = 0

Filter out every std::vector<CommitMessage> that has already been committed and commit the remaining ones.

Unlike Commit(), this method first checks whether each commit_identifier has already been committed, so it may be slower. A common use is to retry the commit process after a failure.

Parameters:
  • commit_identifier_and_messages – A map containing all CommitMessages in question. The key is the commit_identifier.

  • watermark – An optional event-time watermark used to indicate the progress of data processing. Default is std::nullopt.

Returns:

Number of std::vector<CommitMessage> committed.
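
The retry behaviour described above can be illustrated with a small, self-contained model. This is only a sketch: CommitMessage, MessageGroup, ToyCommitter and RetryAfterPartialFailure are hypothetical stand-ins written for this example, and the in-memory set of identifiers is an assumption about how "already committed" could be tracked, not a description of how the library does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's CommitMessage; not the real type.
struct CommitMessage {
    std::string payload;
};

using MessageGroup = std::vector<std::shared_ptr<CommitMessage>>;

// Toy in-memory committer that remembers which commit identifiers it has seen.
// It models only the filtering idea, not the real FileStoreCommit.
class ToyCommitter {
public:
    // Commits one group unconditionally and records its identifier.
    void Commit(const MessageGroup& messages, int64_t commit_identifier) {
        committed_ids_.insert(commit_identifier);
        committed_message_count_ += messages.size();
    }

    // Skips groups whose identifier was already committed, commits the rest,
    // and returns how many groups were newly committed.
    int32_t FilterAndCommit(const std::map<int64_t, MessageGroup>& groups) {
        int32_t newly_committed = 0;
        for (const auto& [identifier, messages] : groups) {
            if (committed_ids_.count(identifier) > 0) {
                continue;  // this identifier was committed before the failure
            }
            Commit(messages, identifier);
            ++newly_committed;
        }
        return newly_committed;
    }

    std::size_t committed_message_count() const { return committed_message_count_; }

private:
    std::set<int64_t> committed_ids_;
    std::size_t committed_message_count_ = 0;
};

// Replays three groups after identifier 1 has already been committed.
inline int32_t RetryAfterPartialFailure() {
    ToyCommitter committer;
    MessageGroup group;
    group.push_back(std::make_shared<CommitMessage>(CommitMessage{"data"}));
    committer.Commit(group, 1);  // succeeded before the simulated failure

    std::map<int64_t, MessageGroup> pending;
    pending[1] = group;
    pending[2] = group;
    pending[3] = group;
    return committer.FilterAndCommit(pending);
}
```

In this model, replaying identifiers 1, 2 and 3 after identifier 1 has succeeded commits only the two remaining groups, which mirrors the "number of groups committed" return value documented above.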

virtual Status Overwrite(const std::vector<std::map<std::string, std::string>> &partitions, const std::vector<std::shared_ptr<CommitMessage>> &commit_messages, int64_t commit_identifier, std::optional<int64_t> watermark = std::nullopt) = 0

Overwrite the specified partitions with the data described by the commit messages.

Parameters:
  • partitions – The partitions to be cleaned up. Each map is a single partition that maps partition keys to partition values. Depending on the user-defined statement, a partition might not include all partition keys. Note that these partitions are not necessarily the same as the partitions of the newly added key-values; they are only the partitions to be cleaned up.

  • commit_messages – The commit messages to be committed.

  • commit_identifier – A unique identifier for the commit operation.

  • watermark – An optional event-time watermark used to indicate the progress of data processing. Default is std::nullopt.

Returns:

Result of the operation.
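
The shape of the partitions argument, and the note that a partition might not include all partition keys, can be made concrete with a short sketch. The helper names PartitionSpec, MatchesSpec and MatchesAny are hypothetical, and treating a missing key as "match any value" is an assumption made for illustration; the source does not state the library's exact matching rule.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One partition specification: partition key -> partition value.
using PartitionSpec = std::map<std::string, std::string>;

// Returns true when every key/value pair in `spec` also appears in `partition`.
// Keys absent from `spec` act as wildcards (an assumption of this sketch).
inline bool MatchesSpec(const PartitionSpec& partition, const PartitionSpec& spec) {
    for (const auto& [key, value] : spec) {
        auto it = partition.find(key);
        if (it == partition.end() || it->second != value) {
            return false;
        }
    }
    return true;
}

// Returns true when `partition` matches at least one specification in `specs`.
inline bool MatchesAny(const PartitionSpec& partition,
                       const std::vector<PartitionSpec>& specs) {
    for (const auto& spec : specs) {
        if (MatchesSpec(partition, spec)) {
            return true;
        }
    }
    return false;
}
```

Under this assumption, a specification that names only `dt` would cover every `hour` partition of that day, which is one way a partial specification could select more than one physical partition.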

virtual Result<int32_t> FilterAndOverwrite(const std::vector<std::map<std::string, std::string>> &partitions, const std::vector<std::shared_ptr<CommitMessage>> &commit_messages, int64_t commit_identifier, std::optional<int64_t> watermark = std::nullopt) = 0

This is a temporary interface for internal use.

It will be removed in a future version. Please do not rely on it for long-term use.

Parameters:
  • partitions – The partitions to overwrite. Each map pairs partition keys with partition values.

  • commit_messages – The commit messages to be committed.

  • commit_identifier – A unique identifier for the commit operation.

  • watermark – An optional event-time watermark used to indicate the progress of data processing. Default is std::nullopt.

Returns:

Result of the operation.

virtual Result<std::string> GetLastCommitTableRequest() = 0

To use REST catalog commit, enable it with CommitContextBuilder::UseRESTCatalogCommit(), call Commit() (or FilterAndCommit()) as usual, and then call this method to get the last commit table request. The request is a JSON string that can be sent to the REST catalog server.

Note

This is a temporary interface for internal use. It will be removed in the future.

Returns:

A Result containing a JSON string that includes the snapshot and statistics but excludes the tableId.

virtual Result<int32_t> Expire() = 0

Expire old snapshots in the file store.

Returns:

Result<int32_t> indicating the number of expired items or an error status.

virtual Status DropPartition(const std::vector<std::map<std::string, std::string>> &partitions, int64_t commit_identifier) = 0

Drop specified partitions from the file store.

Parameters:
  • partitions – A vector of partitions to be dropped.

  • commit_identifier – An identifier for the commit operation.

Returns:

Status indicating the success or failure of the drop partition operation.

virtual std::shared_ptr<Metrics> GetCommitMetrics() const = 0

Retrieve metrics related to commit operations.

Returns:

A shared pointer to a Metrics object containing commit metrics.

Public Static Functions

static Result<std::unique_ptr<FileStoreCommit>> Create(std::unique_ptr<CommitContext> context)

Create an instance of FileStoreCommit.

Parameters:

context – A unique pointer to the CommitContext used for commit operations.

Returns:

A Result containing a unique pointer to the FileStoreCommit instance.

class CommitContextBuilder

Builder for CommitContext that validates its input.

Public Functions

CommitContextBuilder(const std::string &root_path, const std::string &commit_user)

Constructs a CommitContextBuilder with required parameters.

Parameters:
  • root_path – The root path of the Paimon table.

  • commit_user – The user identifier for the commit operation.

~CommitContextBuilder()
CommitContextBuilder &SetOptions(const std::map<std::string, std::string> &options)

Set the configuration options map. Use it for option entries that are not defined in the table schema or whose values you want to overwrite.

Note

Calling SetOptions() clears any options previously added by AddOption().

Parameters:

options – The configuration options map.

Returns:

Reference to this builder for method chaining.

CommitContextBuilder &AddOption(const std::string &key, const std::string &value)

Add a single configuration option which is not defined in the table schema or whose value you want to overwrite.

If you want to add multiple options, call AddOption() multiple times or use SetOptions() instead.

Parameters:
  • key – The option key.

  • value – The option value.

Returns:

Reference to this builder for method chaining.
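
The interaction between SetOptions() and AddOption() depends on call order, which the following sketch models. ToyOptionsBuilder and BuildOptionsInOrder are hypothetical stand-ins that reproduce only the option handling described above, and the option keys are made up for the example. That AddOption() replaces the value of a key that is already present is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in that models only the documented option handling;
// it is not the real CommitContextBuilder.
class ToyOptionsBuilder {
public:
    // Replaces the whole option map, discarding earlier AddOption() entries.
    ToyOptionsBuilder& SetOptions(const std::map<std::string, std::string>& options) {
        options_ = options;
        return *this;
    }

    // Adds a single entry and keeps the others. Replacing the value of an
    // existing key is an assumption of this sketch.
    ToyOptionsBuilder& AddOption(const std::string& key, const std::string& value) {
        options_[key] = value;
        return *this;
    }

    const std::map<std::string, std::string>& options() const { return options_; }

private:
    std::map<std::string, std::string> options_;
};

// Shows why call order matters: the entry added first is dropped by SetOptions().
inline std::map<std::string, std::string> BuildOptionsInOrder() {
    std::map<std::string, std::string> replacement;
    replacement["example.key.b"] = "2";  // made-up option key

    ToyOptionsBuilder builder;
    builder.AddOption("example.key.a", "1")  // lost when SetOptions() runs
        .SetOptions(replacement)
        .AddOption("example.key.c", "3");    // survives, added after SetOptions()
    return builder.options();
}
```

In this model the final map holds only `example.key.b` and `example.key.c`, so a caller who mixes both methods would call SetOptions() first and AddOption() afterwards.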

CommitContextBuilder &IgnoreEmptyCommit(bool ignore_empty_commit)

Sets whether to ignore empty commits (default is true).

When set to true, commits that don’t contain any actual data changes will be ignored.

Parameters:

ignore_empty_commit – True to ignore empty commits, false otherwise.

Returns:

Reference to this builder for method chaining.

CommitContextBuilder &UseRESTCatalogCommit(bool use_rest_catalog_commit)

Sets whether to use REST catalog commit (default is false).

Note

Temporary interface, will be removed in the future.

Parameters:

use_rest_catalog_commit – True to use REST catalog commit, false otherwise.

Returns:

Reference to this builder for method chaining.

CommitContextBuilder &WithMemoryPool(const std::shared_ptr<MemoryPool> &memory_pool)

Sets the memory pool to be used for memory allocation during commit operations.

Parameters:

memory_pool – Shared pointer to the memory pool instance.

Returns:

Reference to this builder for method chaining.

CommitContextBuilder &WithExecutor(const std::shared_ptr<Executor> &executor)

Sets the executor to be used for asynchronous operations during commit.

Parameters:

executor – Shared pointer to the executor instance.

Returns:

Reference to this builder for method chaining.

Result<std::unique_ptr<CommitContext>> Finish()

Build and return a CommitContext instance with input validation.

Returns:

Result containing the constructed CommitContext or an error status.

class CommitContext

CommitContext holds the configuration for commit operations.

Do not construct this class directly. Use CommitContextBuilder, which validates its input, to build a CommitContext.

Public Functions

CommitContext(const std::string &root_path, const std::string &commit_user, bool ignore_empty_commit, bool use_rest_catalog_commit, const std::shared_ptr<MemoryPool> &memory_pool, const std::shared_ptr<Executor> &executor, const std::map<std::string, std::string> &options)
~CommitContext()
inline const std::string &GetRootPath() const
inline const std::string &GetCommitUser() const
inline bool IgnoreEmptyCommit() const
inline bool UseRESTCatalogCommit() const
inline std::shared_ptr<MemoryPool> GetMemoryPool() const
inline std::shared_ptr<Executor> GetExecutor() const
inline const std::map<std::string, std::string> &GetOptions() const