Orphan Files Clean#
Interface#
-
class OrphanFilesCleaner#
Removes data files and metadata files that are no longer used by a table (so-called “orphan files”).
Exceptions raised while listing files are ignored, because it is harmless to leave an unlisted file undeleted.
To avoid deleting newly written files, only orphan files older than olderThanMillis (1 day by default) are deleted.
To avoid mistakenly deleting files that are in use but could not be read, the removal process stops as soon as reading the used files fails.
To avoid deleting files that were newly added to the Paimon Java protocol but are not recognized by Paimon C++, a strict pattern-matching validation is applied: only files that match a recognized pattern are deleted.
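The three safeguards described above (used-file check, age threshold, pattern validation) can be illustrated with a minimal sketch. This is not the Paimon C++ implementation: `SelectOrphans`, its parameters, and the idea of passing a single regular expression are hypothetical, chosen only to make the filtering order concrete.

```cpp
#include <cstdint>
#include <map>
#include <regex>
#include <set>
#include <string>

// Hypothetical helper, not part of Paimon C++.
// listed_files maps a file path to its last-modified time (epoch milliseconds).
std::set<std::string> SelectOrphans(const std::map<std::string, int64_t>& listed_files,
                                    const std::set<std::string>& used_files,
                                    int64_t older_than_ms,
                                    const std::regex& recognized_pattern) {
    std::set<std::string> orphans;
    for (const auto& [path, modified_ms] : listed_files) {
        if (used_files.count(path) > 0) continue;    // still referenced by the table
        if (modified_ms >= older_than_ms) continue;  // too new, may still be in flight
        if (!std::regex_match(path, recognized_pattern)) continue;  // unrecognized file kind
        orphans.insert(path);
    }
    return orphans;
}
```

A file is selected only if it passes all three checks, so an unknown file kind is never deleted even when it is old and unreferenced.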
Note
OrphanFilesCleaner in Paimon C++ only supports cleaning append tables. It does not support primary key tables, or tables with tags, external paths, branches, indexes, or changelogs.
Public Functions
-
virtual ~OrphanFilesCleaner() = default#
-
virtual Result<std::set<std::string>> Clean() = 0#
Cleans orphan files.
- Returns:
A Result object containing a set of strings representing the paths of the cleaned files.
Public Static Functions
-
static Result<std::unique_ptr<OrphanFilesCleaner>> Create(std::unique_ptr<CleanContext> &&context)#
Create an instance of OrphanFilesCleaner.
- Parameters:
context – A unique pointer to the CleanContext used for cleanup tasks.
- Returns:
A Result containing a unique pointer to the OrphanFilesCleaner instance.
-
class CleanContextBuilder#
CleanContextBuilder is used to build a CleanContext with input validation.
Public Functions
-
explicit CleanContextBuilder(const std::string &root_path)#
Constructs a CleanContextBuilder with required parameters.
- Parameters:
root_path – The root path of the table.
-
~CleanContextBuilder()#
-
CleanContextBuilder &SetOptions(const std::map<std::string, std::string> &options)#
Sets the configuration options map. Use it for option entries that are not defined in the table schema or whose values you want to overwrite.
Note
Calling this method clears any options previously added by AddOption().
- Parameters:
options – The configuration options map.
- Returns:
Reference to this builder for method chaining.
-
CleanContextBuilder &AddOption(const std::string &key, const std::string &value)#
Adds a single configuration option that is not defined in the table schema or whose value you want to overwrite.
To add multiple options, call AddOption() multiple times or use SetOptions() instead.
- Parameters:
key – The option key.
value – The option value.
- Returns:
Reference to this builder for method chaining.
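The interaction between SetOptions() and AddOption() can be shown with a toy stand-in. `ToyOptionsBuilder` below is hypothetical and is not the Paimon class; it mirrors only the documented behaviour that SetOptions() clears entries added earlier, and it assumes that adding an existing key overwrites its value.

```cpp
#include <map>
#include <string>

// Toy stand-in that mirrors only the documented option semantics.
// It is NOT the Paimon CleanContextBuilder.
class ToyOptionsBuilder {
 public:
    ToyOptionsBuilder& SetOptions(const std::map<std::string, std::string>& options) {
        options_ = options;  // replaces everything, including earlier AddOption() entries
        return *this;
    }
    ToyOptionsBuilder& AddOption(const std::string& key, const std::string& value) {
        options_[key] = value;  // assumption: an existing key is overwritten
        return *this;
    }
    const std::map<std::string, std::string>& options() const { return options_; }

 private:
    std::map<std::string, std::string> options_;
};
```

With this model, calling AddOption("k1", "v1"), then SetOptions({{"k2", "v2"}}), then AddOption("k3", "v3") leaves only k2 and k3, so the order of calls in a chain matters.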
-
CleanContextBuilder &WithOlderThanMs(int64_t older_than_ms)#
Sets an optional time threshold, in milliseconds, used to filter files: only orphan files older than this threshold are deleted.
If not provided, it defaults to the current time minus one day.
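As an illustration of the default, the sketch below computes "current time minus one day" in milliseconds. It assumes the threshold is an absolute timestamp measured from the Unix epoch, which the text suggests but does not state explicitly; `DefaultOlderThanMs` is a hypothetical helper, not a Paimon function.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: epoch milliseconds for "now minus one day".
// Assumes the threshold is an absolute epoch-millisecond timestamp.
int64_t DefaultOlderThanMs(std::chrono::system_clock::time_point now) {
    using namespace std::chrono;
    return duration_cast<milliseconds>((now - hours(24)).time_since_epoch()).count();
}
```

Under that assumption, a value such as `DefaultOlderThanMs(std::chrono::system_clock::now())` could be passed to WithOlderThanMs() to spell out the default, and a smaller value makes the cleaner more conservative.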
-
CleanContextBuilder &WithFileRetainCondition(std::function<bool(const std::string&)> should_be_retained)#
Specifies a custom condition to determine which files should be retained.
- Parameters:
should_be_retained – A callable object that takes a filename and returns true if the file should be kept, or false if it can be deleted.
- Returns:
Reference to this builder for method chaining.
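A retain condition is any callable matching `std::function<bool(const std::string&)>`. The sketch below builds one that keeps files with a given suffix; `RetainBySuffix` and the `.tmp` suffix in the usage note are illustrative only and do not come from Paimon.

```cpp
#include <functional>
#include <string>

// Hypothetical predicate factory: retain any file whose name ends with `suffix`.
std::function<bool(const std::string&)> RetainBySuffix(const std::string& suffix) {
    return [suffix](const std::string& file_name) {
        return file_name.size() >= suffix.size() &&
               file_name.compare(file_name.size() - suffix.size(), suffix.size(), suffix) == 0;
    };
}
```

For example, `RetainBySuffix(".tmp")` returns true for `part-0.tmp` and false for `data-1.orc`, so a predicate like this could be passed to WithFileRetainCondition() to protect files that another process manages.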
Set custom memory pool for memory management.
- Parameters:
pool – The memory pool to use.
- Returns:
Reference to this builder for method chaining.
Set custom executor for task execution.
- Parameters:
executor – The executor to use.
- Returns:
Reference to this builder for method chaining.
-
Result<std::unique_ptr<CleanContext>> Finish()#
Build and return a CleanContext instance with input validation.
- Returns:
Result containing the constructed CleanContext or an error status.
-
class CleanContext#
CleanContext holds the configuration for orphan file clean operations.
Please do not use this class directly; use CleanContextBuilder to build a CleanContext, which has input validation.
See also
Public Functions
-
~CleanContext()#
-
inline const std::string &GetRootPath() const#
-
inline const std::map<std::string, std::string> &GetOptions() const#
-
inline int64_t GetOlderThanMs() const#
-
inline std::shared_ptr<MemoryPool> GetMemoryPool() const#
-
inline std::function<bool(const std::string&)> GetFileRetainCondition() const#