Write And Prepare Commit#

Batch writing requires the compute engine to assign each row to a bucket in advance, using the same bucketing strategy as Paimon so that scans behave correctly, and to specify the target partition. Data should be accumulated into Arrow RecordBatches and then written to Paimon.

Paimon C++ uses Apache Arrow as its in-memory columnar format. This makes writing to on-disk columnar formats such as ORC and Parquet more efficient and improves write throughput.

Note

Currently supported table types:

- Append table
- Primary Key table

Not supported in the current scope:

- Changelog
- Indexes

Bucketing Modes#

Append tables:
- Support bucket = -1 (dynamic bucket mode)
- Support bucket > 0 (fixed bucket mode)

PK tables:
- Support bucket > 0 (fixed bucket mode)

Note

PK tables do not support dynamic bucketing (bucket = -1).
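
The supported combinations above can be expressed as a small check. This is a minimal sketch: `TableType`, `BucketMode`, and `ResolveBucketMode` are hypothetical names chosen for illustration and are not part of the Paimon C++ API.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not part of the Paimon C++ API.
enum class TableType { kAppend, kPrimaryKey };
enum class BucketMode { kDynamic, kFixed, kUnsupported };

// Maps a table type and its `bucket` setting to the modes listed above.
inline BucketMode ResolveBucketMode(TableType type, int32_t bucket) {
    if (bucket > 0) {
        return BucketMode::kFixed;  // Fixed mode: both table types.
    }
    if (bucket == -1 && type == TableType::kAppend) {
        return BucketMode::kDynamic;  // Dynamic mode: append tables only.
    }
    // Everything else, including a PK table with bucket = -1.
    return BucketMode::kUnsupported;
}
```

An engine could run such a check before creating writers, so that an unsupported configuration is rejected before any data is written.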

RecordBatch Construction#

The compute engine must:
- Apply the Paimon-consistent bucketing function to each row prior to batching.
- Assign the correct partition for each row.
- Group rows into one Arrow RecordBatch per partition-bucket combination to minimize writer state changes and I/O overhead.

Recommended practices:
- Use schema-aligned Arrow arrays with explicit validity bitmaps and offsets.
- Prefer batch sizes tuned for I/O throughput (e.g., tens to hundreds of MB per flush, depending on filesystem and cluster configuration).
- Maintain stable sort orders within a batch only if required by downstream merge or compaction logic; otherwise avoid unnecessary ordering costs.
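
The grouping step can be sketched without any Arrow dependency. Everything here is an assumption made for illustration: the `Row` struct, the helper names, and in particular `PlaceholderBucket`, which uses `std::hash` and is not compatible with Paimon's bucketing function. A real engine must reproduce Paimon's bucket computation exactly.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type for illustration only.
struct Row {
    std::string partition;   // e.g. "dt=2024-01-01"
    std::string bucket_key;  // value the bucket is derived from
    int64_t value;
};

// Placeholder only: std::hash does NOT match Paimon's bucketing function.
inline int32_t PlaceholderBucket(const std::string& key, int32_t num_buckets) {
    return static_cast<int32_t>(std::hash<std::string>{}(key) %
                                static_cast<std::size_t>(num_buckets));
}

using PartitionBucket = std::pair<std::string, int32_t>;

// Groups rows so that each group can be turned into one RecordBatch
// destined for a single (partition, bucket) writer.
inline std::map<PartitionBucket, std::vector<Row>> GroupByPartitionBucket(
    const std::vector<Row>& rows, int32_t num_buckets) {
    std::map<PartitionBucket, std::vector<Row>> groups;
    for (const Row& row : rows) {
        const int32_t bucket = PlaceholderBucket(row.bucket_key, num_buckets);
        groups[{row.partition, bucket}].push_back(row);
    }
    return groups;
}
```

Each resulting group would then be converted into Arrow arrays matching the table schema before being handed to the writer.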

Prepare Commit#

The compute engine is responsible for triggering PrepareCommit on the writer nodes. When to trigger depends on the engine's business needs and usually follows one of two modes:

- Streaming mode: time-based or periodic triggers (e.g., every N seconds).
- Batch mode: trigger after all data in the batch has been written.

Once the compute engine collects CommitMessages from all writer nodes, it can issue a Commit request to the control plane (management path) to create a new Snapshot.
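
The two trigger modes and the collect-then-commit condition might look like the following on the engine side. This is a sketch under stated assumptions: `TriggerState`, `ShouldPrepareCommit`, and `ReadyToCommit` are hypothetical engine-side helpers, not Paimon C++ interfaces.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical engine-side types; not part of the Paimon C++ API.
enum class TriggerMode { kStreaming, kBatch };

struct TriggerState {
    TriggerMode mode;
    std::chrono::seconds interval;    // streaming: desired trigger period
    std::chrono::seconds since_last;  // streaming: time since last trigger
    bool all_input_written;           // batch: true once the input is drained
};

// Decides whether the engine should ask writers to PrepareCommit now.
inline bool ShouldPrepareCommit(const TriggerState& state) {
    if (state.mode == TriggerMode::kStreaming) {
        return state.since_last >= state.interval;
    }
    return state.all_input_written;
}

// The Commit request is only issued once every writer has reported.
inline bool ReadyToCommit(std::size_t writer_count,
                          std::size_t messages_received) {
    return messages_received == writer_count;
}
```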

Compatibility Goals#

To ensure interoperability, the PrepareCommit result produced by Paimon C++ must be consumable by Paimon Java. Therefore:

- The structure and semantics of CommitMessage must remain consistent with Java Paimon.
- Any evolution of the Java-side CommitMessage schema must be tracked and validated on the C++ side to maintain cross-language compatibility.

Interface Design in Paimon C++#

Unlike Java Paimon, Paimon C++ does not expose BinaryRow-like types in its public interfaces. To preserve compatibility without leaking internal row representations, Paimon C++ provides CommitMessage only through:

- Serialization: convert the internal commit state into a well-defined binary representation that matches Java Paimon’s expectations.
- Deserialization: parse the Java-compatible binary representation back into C++ commit structures for validation, replay, or tooling needs.

This design ensures that:

- Public APIs are independent of Java-specific row abstractions.
- Cross-language commit payloads remain stable and versionable.
- Internal data layouts can evolve without breaking external consumers.
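
One way to picture this design is an opaque type whose public surface deals only in bytes. The class below is a hypothetical sketch: its name mirrors the concept, but the method names and signatures are assumptions rather than the actual Paimon C++ API, and the stored byte vector is merely a stand-in for the real internal state.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch; signatures are assumptions, not the real API.
class CommitMessage {
 public:
    // Bytes in: parses a payload into an opaque message.
    // Returns nullptr for an empty payload.
    static std::unique_ptr<CommitMessage> Deserialize(
        const std::vector<uint8_t>& bytes) {
        if (bytes.empty()) {
            return nullptr;
        }
        return std::unique_ptr<CommitMessage>(new CommitMessage(bytes));
    }

    // Bytes out: no row types appear in the public interface.
    std::vector<uint8_t> Serialize() const { return payload_; }

 private:
    explicit CommitMessage(std::vector<uint8_t> payload)
        : payload_(std::move(payload)) {}

    // Stand-in for internal commit state, free to change layout.
    std::vector<uint8_t> payload_;
};
```

Because callers only ever see bytes, the private members can be reorganized without affecting code that transports or stores commit payloads.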

CommitMessage Contract#

The CommitMessage must encode all information required by the coordinator to produce a correct Snapshot, which commonly includes (but is not limited to):

- Partition and bucket identifiers associated with written data.
- New data files, delete files, or changelog artifacts (as applicable to the table type).
- File-level metadata required for manifest and index updates (e.g., row counts, min/max statistics where applicable).
- Transactional markers and sequence numbers as required by table semantics.
- Any per-writer state necessary for deduplication or idempotent commits.

Note

Current C++ scope supports Append and PK tables. Changelog and index artifacts are out of scope and should not be emitted in CommitMessage until explicitly supported.

Serialization and Deserialization#

- Binary Format:
  - The binary payload must strictly conform to Java Paimon’s CommitMessage encoding.
  - Version tags or schema identifiers should be included to enable forwards/backwards compatibility and safe upgrades.
- Serialization API:
  - Provide a function to serialize the writer’s commit state into a byte buffer (or stream) consumable by Java Paimon.
- Deserialization API:
  - Provide a function to parse a Java-produced CommitMessage binary payload back into C++ commit structures for verification, replay, and testing.
- Validation:
  - Include conformance tests to assert that C++ serialized payloads are accepted by Java Paimon.
  - Include round-trip tests to ensure C++ can parse Java-produced payloads and vice versa for supported message versions.
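
To illustrate the role of a version tag and a round-trip check, the sketch below uses a toy layout: a 4-byte version, a 4-byte bucket, and an 8-byte row count, all big-endian. This layout is an assumption invented for the example and is not Java Paimon's CommitMessage encoding; `ToyCommitState`, `SerializeToy`, and `DeserializeToy` are likewise hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t kToyVersion = 1;  // hypothetical version tag

// Toy stand-in for commit state; the real message carries far more.
struct ToyCommitState {
    int32_t bucket;
    int64_t row_count;
};

// Appends the low `bytes` bytes of `v`, most significant byte first.
inline void PutBigEndian(std::vector<uint8_t>& out, uint64_t v, int bytes) {
    for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>((v >> shift) & 0xFF));
    }
}

// Reads `bytes` bytes starting at `pos` as a big-endian unsigned value.
inline uint64_t GetBigEndian(const std::vector<uint8_t>& in, std::size_t pos,
                             int bytes) {
    uint64_t v = 0;
    for (int i = 0; i < bytes; ++i) {
        v = (v << 8) | in[pos + static_cast<std::size_t>(i)];
    }
    return v;
}

inline std::vector<uint8_t> SerializeToy(const ToyCommitState& state) {
    std::vector<uint8_t> out;
    PutBigEndian(out, static_cast<uint32_t>(kToyVersion), 4);
    PutBigEndian(out, static_cast<uint32_t>(state.bucket), 4);
    PutBigEndian(out, static_cast<uint64_t>(state.row_count), 8);
    return out;
}

// Returns false on a wrong length or an unknown version tag.
inline bool DeserializeToy(const std::vector<uint8_t>& in,
                           ToyCommitState* out) {
    if (in.size() != 16) {
        return false;
    }
    const uint32_t version = static_cast<uint32_t>(GetBigEndian(in, 0, 4));
    if (static_cast<int32_t>(version) != kToyVersion) {
        return false;
    }
    out->bucket =
        static_cast<int32_t>(static_cast<uint32_t>(GetBigEndian(in, 4, 4)));
    out->row_count = static_cast<int64_t>(GetBigEndian(in, 8, 8));
    return true;
}
```

A round-trip test would serialize a state, parse it back, and compare fields. A conformance test against Java Paimon would additionally feed the real payload to the Java reader, which this toy format cannot stand in for.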

Operational Flow#

1. Writer nodes perform data ingestion and produce Arrow RecordBatch organized by partition and bucket.
2. Writers flush batches into ORC/Parquet files via registered file.format and file-system backends, producing file-level metadata and per-batch commit state.
3. Each writer invokes PrepareCommit, which:
   - Aggregates per-writer state into a CommitMessage.
   - Serializes the message into a Java-compatible binary payload.
4. The compute engine gathers CommitMessages from all writers.
5. The compute engine issues a Commit request to the control plane with the collected messages, resulting in a new Snapshot.
6. The coordinator validates the messages, updates manifests/metadata, and finalizes the snapshot atomically.