Orphan Files Clean#

Interface#

class OrphanFilesCleaner#

Removes data files and metadata files that are not used by the table (so-called “orphan files”).

Exceptions raised while listing files are ignored, because it is harmless to leave files that could not be listed undeleted.

To avoid deleting newly written files, it only deletes orphan files older than the older_than_ms threshold (by default, one day before the current time).

To avoid mistakenly deleting files that are in use but could not be read, it aborts the removal process if it fails to read any file that is in use.

To avoid deleting files that were newly added to the Paimon Java protocol but are not yet recognized by Paimon C++, we implemented strict pattern-matching validation: only files whose names match patterns we recognize are deleted.
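The pattern-matching safeguard can be pictured as an allowlist check that runs before any deletion. The following is a minimal sketch, not the library's implementation: the helper names `MatchesKnownPattern` and `IsDeletionCandidate` and the three regular expressions are hypothetical and only illustrate the idea that an unrecognized file name is never deleted.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical allowlist: these patterns are illustrative guesses, not the
// patterns that Paimon C++ actually uses.
bool MatchesKnownPattern(const std::string& file_name) {
    static const std::vector<std::regex> kKnownPatterns = {
        std::regex(R"(data-[-0-9a-f]+-\d+\.(orc|parquet|avro))"),
        std::regex(R"(manifest-[-0-9a-f]+-\d+)"),
        std::regex(R"(manifest-list-[-0-9a-f]+-\d+)"),
    };
    for (const auto& pattern : kKnownPatterns) {
        // regex_match requires the whole name to match, not just a prefix.
        if (std::regex_match(file_name, pattern)) {
            return true;
        }
    }
    return false;
}

// A file is a deletion candidate only if it is unused AND its name is recognized.
bool IsDeletionCandidate(const std::string& file_name, bool is_used) {
    return !is_used && MatchesKnownPattern(file_name);
}
```

With this design, a file written by a newer protocol version falls through the allowlist and is left in place, which errs on the side of keeping data.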

Note

OrphanFilesCleaner in Paimon C++ only supports cleaning append tables. It does not support tables with tags, external paths, branches, indexes, or changelogs, nor does it support primary key tables.

Public Functions

virtual ~OrphanFilesCleaner() = default#
virtual Result<std::set<std::string>> Clean() = 0#

Cleans orphan files.

Returns:

A Result object containing a set of strings representing the paths of the cleaned files.

Public Static Functions

static Result<std::unique_ptr<OrphanFilesCleaner>> Create(std::unique_ptr<CleanContext> &&context)#

Create an instance of OrphanFilesCleaner.

Parameters:

context – A unique pointer to the CleanContext used for cleanup tasks.

Returns:

A Result containing a unique pointer to the OrphanFilesCleaner instance.

class CleanContextBuilder#

CleanContextBuilder builds a CleanContext and validates its input.

Public Functions

explicit CleanContextBuilder(const std::string &root_path)#

Constructs a CleanContextBuilder with required parameters.

Parameters:

root_path – The root path of the table.

~CleanContextBuilder()#
CleanContextBuilder &SetOptions(const std::map<std::string, std::string> &options)#

Sets the configuration options map. Use it to supply option entries that are not defined in the table schema, or to overwrite the values of entries that are.

Note

Calling SetOptions() clears any options previously added with AddOption().

Parameters:

options – The configuration options map.

Returns:

Reference to this builder for method chaining.

CleanContextBuilder &AddOption(const std::string &key, const std::string &value)#

Adds a single configuration option that is not defined in the table schema, or whose value you want to overwrite.

If you want to add multiple options, call AddOption() multiple times or use SetOptions() instead.

Parameters:
  • key – The option key.

  • value – The option value.

Returns:

Reference to this builder for method chaining.
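The interaction between SetOptions() and AddOption() is easy to get wrong, so here is a minimal sketch of the documented semantics. `OptionsSketch` is a hypothetical stand-in, not the real CleanContextBuilder, and it assumes that AddOption() overwrites an existing entry with the same key.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in that mimics only the documented option semantics.
class OptionsSketch {
 public:
    // Replaces the whole map, discarding entries added earlier via AddOption().
    OptionsSketch& SetOptions(const std::map<std::string, std::string>& options) {
        options_ = options;
        return *this;
    }

    // Adds one entry (assumed here to overwrite an existing key).
    OptionsSketch& AddOption(const std::string& key, const std::string& value) {
        options_[key] = value;
        return *this;
    }

    const std::map<std::string, std::string>& options() const { return options_; }

 private:
    std::map<std::string, std::string> options_;
};
```

In practice this means SetOptions() should come first in a call chain, followed by any AddOption() calls, otherwise the individually added entries are lost.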

CleanContextBuilder &WithOlderThanMs(int64_t older_than_ms)#

Sets an optional time threshold, in milliseconds, used to filter orphan files by age.

If not provided, the threshold defaults to the current time minus one day.
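Because the default is described as "the current time minus one day", the threshold is presumably an absolute timestamp rather than a duration. The sketch below rests on that assumption (epoch milliseconds, with "older than" meaning strictly less than); `EpochMillisBefore` and `IsOlderThan` are hypothetical helpers, not part of the library.

```cpp
#include <chrono>
#include <cstdint>

// Assumption: the threshold is an epoch timestamp in milliseconds.
// Returns the timestamp that lies `age` before `now`.
std::int64_t EpochMillisBefore(std::chrono::system_clock::time_point now,
                               std::chrono::milliseconds age) {
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;
    return duration_cast<milliseconds>((now - age).time_since_epoch()).count();
}

// Assumption: a file qualifies when its modification time is strictly
// earlier than the threshold.
bool IsOlderThan(std::int64_t modified_ms, std::int64_t older_than_ms) {
    return modified_ms < older_than_ms;
}
```

Under these assumptions, a three-day retention window could be requested with `builder.WithOlderThanMs(EpochMillisBefore(std::chrono::system_clock::now(), std::chrono::hours(72)))`.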

CleanContextBuilder &WithFileRetainCondition(std::function<bool(const std::string&)> should_be_retained)#

Specifies a custom condition to determine which files should be retained.

Parameters:

should_be_retained – A callable object that takes a filename and returns true if the file should be kept, or false if it can be deleted.

Returns:

Reference to this builder for method chaining.
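A retain condition is any callable matching `std::function<bool(const std::string&)>`. As a minimal sketch, the hypothetical factory `RetainBySuffix` below builds one that keeps every file whose name ends with a given suffix; the suffix used in the usage note is only an example.

```cpp
#include <functional>
#include <string>
#include <utility>

// Builds a predicate that returns true (retain) for names ending with `suffix`.
// Note that an empty suffix retains every file.
std::function<bool(const std::string&)> RetainBySuffix(std::string suffix) {
    return [suffix = std::move(suffix)](const std::string& file_name) {
        return file_name.size() >= suffix.size() &&
               file_name.compare(file_name.size() - suffix.size(),
                                 suffix.size(), suffix) == 0;
    };
}
```

The result can be passed straight to the builder, for example `builder.WithFileRetainCondition(RetainBySuffix(".keep"))`.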

CleanContextBuilder &WithMemoryPool(const std::shared_ptr<MemoryPool> &pool)#

Sets a custom memory pool for memory management.

Parameters:

pool – The memory pool to use.

Returns:

Reference to this builder for method chaining.

CleanContextBuilder &WithExecutor(const std::shared_ptr<Executor> &executor)#

Sets a custom executor for task execution.

Parameters:

executor – The executor to use.

Returns:

Reference to this builder for method chaining.

Result<std::unique_ptr<CleanContext>> Finish()#

Validates the input, then builds and returns a CleanContext instance.

Returns:

Result containing the constructed CleanContext or an error status.
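The builder methods chain, so a context is typically configured in one expression that ends with Finish(). The sketch below uses only the method names documented on this page. It is written as a template over the builder type so that it compiles without the Paimon headers, whose include path is not given here; `BuildCleanContext` and the `.tmp` retain rule are hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Configures any builder exposing WithOlderThanMs(), WithFileRetainCondition()
// and Finish(), and returns whatever Finish() returns (for the real
// CleanContextBuilder, a Result<std::unique_ptr<CleanContext>>).
template <typename Builder>
auto BuildCleanContext(Builder& builder, std::int64_t older_than_ms) {
    // Illustrative rule: retain any file whose name contains ".tmp".
    std::function<bool(const std::string&)> keep_tmp =
        [](const std::string& file_name) {
            return file_name.find(".tmp") != std::string::npos;
        };
    return builder.WithOlderThanMs(older_than_ms)
        .WithFileRetainCondition(keep_tmp)
        .Finish();
}
```

With the real library, the builder would be constructed as `CleanContextBuilder builder(root_path)`. After checking the returned Result, the context would be handed to `OrphanFilesCleaner::Create()` and `Clean()` called on the resulting cleaner. How a Result is inspected is not covered on this page, so that step is left out.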

class CleanContext#

CleanContext holds the configuration for orphan file cleaning operations.

Do not construct this class directly. Use CleanContextBuilder instead, which validates its input before building a CleanContext.

Public Functions

CleanContext(const std::string &root_path, const std::map<std::string, std::string> &options, int64_t older_than_ms, const std::shared_ptr<MemoryPool> &pool, const std::shared_ptr<Executor> &executor, std::function<bool(const std::string&)> should_be_retained)#
~CleanContext()#
inline const std::string &GetRootPath() const#
inline const std::map<std::string, std::string> &GetOptions() const#
inline int64_t GetOlderThanMs() const#
inline std::shared_ptr<MemoryPool> GetMemoryPool() const#
inline std::shared_ptr<Executor> GetExecutor() const#
inline std::function<bool(const std::string&)> GetFileRetainCondition() const#