Read and Data Evolution#
Functionally, Paimon can be divided into two layers:
Control Plane: Responsible for accessing and managing metadata (snapshot, manifest, etc.), including:
- Catalog / Database access
- Table retrieval
- Collection and resolution of data files
Data Plane: Responsible for accessing actual data files, including:
- Readers for various file formats
- Coordinated reading of file collections
The control plane and data plane interact primarily via DataSplit (the query plan). Java currently supports a standard DataSplit protocol which includes the necessary meta information to access data files. With DataSplit, a high-performance data access path can be integrated.
At compute time, the execution engine (reader) does not need to be aware of the concrete table type or its metadata details. It only needs to follow the instructions within the DataSplit (query plan) to perform data reading operations.
With the layered abstraction of the control plane and data plane, and the use of DataSplit as a stable protocol interface, the two layers can evolve their functionality and optimize code relatively independently. This design also enables cross-language task scheduling and interaction (e.g., Java and C++), substantially reducing engineering maintenance costs across the two language ecosystems.
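The separation described above can be pictured with a deliberately small sketch. The `DataFileMeta` and `DataSplit` structs and the `CountRows` function below are hypothetical stand-ins invented for illustration, not the real Paimon types; the point is only that the data-plane function receives everything it needs through the split and performs no catalog or table lookups.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the real Paimon types.
struct DataFileMeta {
    std::string path;   // location of the data file
    int64_t schema_id;  // schema the file was written with
    int64_t row_count;  // number of rows in the file
};

struct DataSplit {
    std::vector<DataFileMeta> files;  // files the reader must scan
};

// Data plane: consumes only the split. "Reading" is reduced here to
// summing row counts so the sketch stays self-contained.
int64_t CountRows(const DataSplit& split) {
    int64_t total = 0;
    for (const auto& file : split.files) {
        total += file.row_count;
    }
    return total;
}
```

In this shape, a control plane written in another language would only need to produce an equivalent serialized split for the reader to consume.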
Schema Evolution#
Scope and Compatibility#
C++ Paimon supports all evolution kinds available in Java Paimon for non-nested types:
- Add column
- Drop column
- Reorder columns
- Rename column
- Change column type
Note
- Only non-nested type evolution is supported. Nested columns (struct, array, map) are not supported.
- Partition keys: Only column reordering is supported; other operations are not supported (consistent with Java Paimon).
- Primary keys:
  - Adding or dropping columns is not supported.
  - Other operations are supported (consistent with Java Paimon).
Per-File Schema via Field IDs#
In DataSplit, each file may have a completely different data schema. Paimon uses field IDs to uniquely identify fields.
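As a minimal sketch of ID-based alignment, the function below maps each field of the schema being read to a column position in one file's schema. The `Field` struct and `MapByFieldId` name are assumptions made for this example and are not part of the Paimon API; the use of `-1` to mark a field the file does not contain is likewise an illustrative choice.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical field descriptor: a stable ID plus the current name.
struct Field {
    int id;
    std::string name;
};

// For each field of the read (table) schema, return the column index in the
// file schema that has the same field ID, or -1 if the file lacks that field
// (for example, a column added after the file was written).
std::vector<int> MapByFieldId(const std::vector<Field>& read_schema,
                              const std::vector<Field>& file_schema) {
    std::unordered_map<int, int> id_to_index;
    for (std::size_t i = 0; i < file_schema.size(); ++i) {
        id_to_index[file_schema[i].id] = static_cast<int>(i);
    }
    std::vector<int> mapping;
    mapping.reserve(read_schema.size());
    for (const auto& field : read_schema) {
        auto it = id_to_index.find(field.id);
        mapping.push_back(it == id_to_index.end() ? -1 : it->second);
    }
    return mapping;
}
```

Because the lookup uses IDs only, a renamed or reordered column still resolves to the right file column, and a dropped column is simply never requested.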
Overflow Behavior Disclaimer#
Overflow behavior is undefined in both C++ and Java Paimon. In overflow scenarios, a conversion may:
- produce an incorrect value,
- return an error status, or
- return null.
C++ Paimon does not guarantee identical results to Java Paimon in overflow scenarios. Users should not rely on identical return values between implementations.
Type Change Support Matrix#
The table below indicates support for changing a column type from source to target. Refer to the numbered notes below the table for caveats.
src \ target |
tinyint |
smallint |
int |
bigint |
float |
double |
bool |
string |
binary |
date |
timestamp (without tz) |
decimal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
tinyint |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
❌ |
❌ |
❌ |
✅ |
smallint |
✅ 1️⃣ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
❌ |
❌ |
❌ |
✅ |
int |
✅ 1️⃣ |
✅ 1️⃣ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
❌ |
❌ |
✅ 1️⃣ |
✅ |
bigint |
✅ 1️⃣ |
✅ 1️⃣ |
✅ 1️⃣ |
✅ |
✅ |
✅ |
✅ |
✅ |
❌ |
❌ |
✅ 6️⃣ |
✅ |
float |
✅ 2️⃣ |
✅ 2️⃣ |
✅ 2️⃣ |
✅ 2️⃣ |
✅ |
✅ |
✅ |
✅ 3️⃣ 4️⃣ |
❌ |
❌ |
❌ |
✅ |
double |
✅ 2️⃣ |
✅ 2️⃣ |
✅ 2️⃣ |
✅ 2️⃣ |
✅ 2️⃣ |
✅ |
✅ |
✅ 3️⃣ 4️⃣ |
❌ |
❌ |
❌ |
✅ |
bool |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
❌ |
❌ |
❌ |
✅ |
string |
✅ |
✅ |
✅ |
✅ |
✅ 3️⃣ |
✅ 3️⃣ |
✅ |
✅ |
✅ |
✅ |
✅ 5️⃣ |
✅ 7️⃣ |
binary |
❌ |
❌ |
❌ |
❌ |
❌ |
❌ |
❌ |
✅ |
✅ |
❌ |
❌ |
❌ |
date |
❌ |
❌ |
❌ |
❌ |
❌ |
❌ |
❌ |
✅ |
❌ |
✅ |
✅ 5️⃣ |
❌ |
timestamp (without tz) |
❌ |
❌ |
✅ 1️⃣ |
✅ |
❌ |
❌ |
❌ |
✅ |
❌ |
✅ |
✅ |
❌ |
decimal |
✅ 1️⃣ |
✅ 1️⃣ |
✅ 1️⃣ |
✅ 1️⃣ |
✅ |
✅ |
❌ |
✅ |
❌ |
❌ |
❌ |
✅ |
Overflow Behavior Notes
- 1️⃣ Integer downcast overflow behavior matches Java in specific cases.
Example: smallint -> tinyint, 32767 becomes -1; int -> smallint, -2147483648 becomes 0.
- 2️⃣ Floating-point overflow behavior is partially consistent with Java and partially different.
- Example: float -> tinyint
Java: MAX_FLOAT -> -1, INFINITY -> -1
C++: MAX_FLOAT -> 0, INFINITY -> 0
- 3️⃣ Keyword differences for special float/double values:
Java: Infinity, -Infinity, NaN
C++: inf, -inf, nan
- 4️⃣ Printing difference:
C++ Paimon prints 1.0 as `1`
Java Paimon prints 1.0 as `1.0`
- 5️⃣ Timestamp precision and range differences:
Java Paimon: 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999
C++ Paimon: 1677-09-21 00:12:43.145224192 to 2262-04-11 23:47:16.854775807
C++ Paimon supports only nanosecond precision, and its representable range is smaller.
- 6️⃣ bigint -> timestamp range differences:
Java Paimon (ms): `[MIN_INT64/1000, MAX_INT64/1000]` seconds
C++ Paimon (ns): `[MIN_INT64/1e9, MAX_INT64/1e9]` seconds
- 7️⃣ string -> decimal with precision > 38:
C++ returns `null` if parsing would overflow 128-bit arithmetic.
Java may rescale and return a value based on the rescaled precision.
Example input: `1111111111111111111111111111111111111.15`. Java returns `1111111111111111111111111111111111111.2`; C++ returns `null`.
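The two integer examples in note 1️⃣ are what you get by keeping only the low-order bits of the source value and reinterpreting them as a signed integer. The sketch below reproduces that arithmetic in portable C++; the helper names `WrapToInt8` and `WrapToInt16` are made up for this example, and the source does not say that either implementation performs the cast this way.

```cpp
#include <cstdint>
#include <cstring>

// Keep only the low 8 bits of a 16-bit value and reinterpret them as signed.
int8_t WrapToInt8(int16_t value) {
    uint8_t low = static_cast<uint8_t>(value);   // modulo 2^8, well defined
    int8_t result;
    std::memcpy(&result, &low, sizeof(result));  // reinterpret the bit pattern
    return result;
}

// Keep only the low 16 bits of a 32-bit value and reinterpret them as signed.
int16_t WrapToInt16(int32_t value) {
    uint16_t low = static_cast<uint16_t>(value);  // modulo 2^16, well defined
    int16_t result;
    std::memcpy(&result, &low, sizeof(result));
    return result;
}
```

With these helpers, `WrapToInt8(32767)` yields `-1` (low byte `0xFF`) and `WrapToInt16(INT32_MIN)` yields `0` (low 16 bits all zero), matching the values listed in the note.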
Implementation Guidance#
- Use DataSplit as the sole interface between control and data planes. Treat it as the canonical query plan contract.
- Resolve field types and IDs per file; prefer inline data file metadata, and fall back to table schema files when necessary.
- Expect per-file schema variability; design readers to align by field IDs rather than positional indices.
- Do not assume identical overflow semantics across C++ and Java; tests should validate acceptable ranges and nullability.
- For timestamp handling, consider precision/range constraints in C++ when interoperating with Java-produced data splits.
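One way to act on the timestamp guidance is to check the range explicitly before converting a millisecond value to nanoseconds. The C++ range listed in note 5️⃣ is exactly what a signed 64-bit count of nanoseconds since the Unix epoch can represent. The sketch below assumes that representation and C++17 (`std::optional`); `MillisToNanos` is a hypothetical helper, not a Paimon function, and returning an empty optional is only one possible way to signal an out-of-range value.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Convert epoch milliseconds to epoch nanoseconds, or return std::nullopt
// when the result would not fit in a signed 64-bit integer.
std::optional<int64_t> MillisToNanos(int64_t millis) {
    constexpr int64_t kNanosPerMilli = 1000000;
    constexpr int64_t kMax =
        std::numeric_limits<int64_t>::max() / kNanosPerMilli;
    constexpr int64_t kMin =
        std::numeric_limits<int64_t>::min() / kNanosPerMilli;
    if (millis > kMax || millis < kMin) {
        return std::nullopt;  // outside the nanosecond-representable range
    }
    return millis * kNanosPerMilli;  // cannot overflow after the check
}
```

For example, a millisecond value for a timestamp in the year 9999, which is inside the Java range, falls outside the nanosecond range and produces an empty result here instead of a silently wrapped value.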