struct_pack Introduction
struct_pack is a serialization library featuring zero-cost abstraction as well as usability. In struct_pack, the serialization or deserialization of one aggregated structure could easily be done in a single line of code, without any DSL, macro, or template to be defined. struct_pack also support use marco for custom reflection of non-aggregated struct. struct_pack supports the serialization of C++ structures through compile-time reflection and its performance is significantly better than protobuf and msgpack (see the benchmark section for details).
Below, we show the basic usage of struct_pack with a simple object as an example.
struct person {
int64_t id;
std::string name;
int age;
double salary;
};
person person1{.id = 1, .name = "hello struct pack", .age = 20, .salary = 1024.42};
Serialization
In below we demonstrate serval ways of serialize one object with struct_pack APIs.
Basic Usage
// serialization in one line
std::vector<char> result = struct_pack::serialize(person1);
Explicit data container
auto result = struct_pack::serialize<std::string>(person1);
// explicitly use std::string instead of std::vector<char> to hold the result
Append the result at the end of existing data container
std::string result="The next line is struct_pack serialize result.\n";
auto result = struct_pack::serialize_to(result,person1);
//
Save the results to memory location indicated by pointer
auto sz=struct_pack::get_needed_siarray(person1);
std::unique_ptr array=std::make_unique<char[]>(sz);
auto result = struct_pack::serialize_to(array.get(),sz,person1);
// save the result at given memory location by pointer
Multi-parameter serialization
auto result=struct_pack::serialize(person1.id, person1.name, person1.age, person1.salary);
//serialize as std::tuple<int64_t, std::string, int, double>
Save the results to output stream
std::ofstream writer("struct_pack_demo.data",
std::ofstream::out | std::ofstream::binary);
struct_pack::serialize_to(writer, person1);
Deserialization
In below we demonstrate serval ways of deserialize one object with struct_pack APIs.
Basic Usage
// deserialize in one line
auto person2 = deserialize<person>(buffer);
assert(person2); // person2.has_value() == true
assert(person2.value()==person1);
deserialize from pointers
// deserialize from memory location indicated by pointers
auto person2 = deserialize<person>(buffer.data(),buffer.size());
assert(person2); //person2.has_value() == true
assert(person2.value()==person1);
deserialize to an existing object
// deserialize to an existing object
person person2;
auto ec = deserialize_to(person2, buffer);
assert(!ec); // no error
assert(person2==person1);
Multi-parameter deserialization
auto person2 = deserialize<int64_t,std::string,int,double>(buffer);
assert(person2); // person2.has_value() == true
auto &&[id,name,age,salary]=person2.value();
assert(person1.id==id);
assert(person1.name==name);
assert(person1.age==age);
assert(person1.salary==salary);
deserialize to input stream
std::ifstream ifs("struct_pack_demo.data",
std::ofstream::in | std::ofstream::binary);
auto person2 = struct_pack::deserialize<person>(ifs);
assert(person2 == person1);
Partial deserialization
Sometimes users only need to deserialize specific fields of an object instead of all of them, and that's when the partial deserialization feature can be used. This can avoid full deserialization and improve efficiency significantly, eg.:
// Just deserialize the 2nd field of person
auto name = get_field<person, 1>(buffer.data(), buffer.size());
assert(name); // name.has_value() == true
assert(name.value() == "hello struct pack");
support std containers, std::optional and custom containers
For example, the library supports the following complicated objects with std containers and std::optional fields:
enum class Color { red, black, white };
struct complicated_object {
Color color;
int a;
std::string b;
std::vector<person> c;
std::list<std::string> d;
std::deque<int> e;
std::map<int, person> f;
std::multimap<int, person> g;
std::set<std::string> h;
std::multiset<int> i;
std::unordered_map<int, person> j;
std::unordered_multimap<int, int> k;
std::array<person, 2> m;
person n[2];
std::pair<std::string, person> o;
std::optional<int> p;
std::unique_ptr<int> q;
};
struct nested_object {
int id;
std::string name;
person p;
complicated_object o;
};
nested_object nested{.id = 2, .name = "tom", .p = {20, "tom"}, .o = {}};
auto buffer = serialize(nested);
auto nested2 = deserialize(buffer.data(), buffer.size());
assert(nested2)
assert(nested2==nested1);
custom support
custom reflection
Sometimes user need support non-aggregated type, or adjust the order of each field, which can be supported by macro function STRUCT_PACK_REFL(typename, fieldname1, fieldname2 ...)
.
namespace test {
class person : std::vector<int> {
private:
std::string mess;
public:
int age;
std::string name;
auto operator==(const person& rhs) const {
return age == rhs.age && name == rhs.name;
}
person() = default;
person(int age, const std::string& name) : age(age), name(name) {}
};
STRUCT_PACK_REFL(person, name, age);
}
The first argument of `STRUCT_PACK_REFL(typename, fieldname1, fieldname2 ...)` is the name of the type reflected, the others are the field names.
The macro must be defined in the same namespace of the reflected type.
The macro allow struct_pack support non-aggregated type, allow user define constructor, derived from other type or add some field which don't serialize.
Sometimes, user want to serialize/deserialize some private fields, which can be supported by macro function STRUCT_PACK_FRIEND_DECL(typename)
.
namespace example2 {
class person {
private:
int age;
std::string name;
public:
auto operator==(const person& rhs) const {
return age == rhs.age && name == rhs.name;
}
person() = default;
person(int age, const std::string& name) : age(age), name(name) {}
STRUCT_PACK_FRIEND_DECL(person);
};
STRUCT_PACK_REFL(person, age, name);
} // namespace example2
This macro must declared inside the struct, it regist struct_pack reflection function as friend function.
User can even register member function in macro function STRUCT_PACK_REFL
, which greatly expands the flexibility of struct_pack.
namespace example3 {
class person {
private:
int age_;
std::string name_;
public:
auto operator==(const person& rhs) const {
return age_ == rhs.age_ && name_ == rhs.name_;
}
person() = default;
person(int age, const std::string& name) : age_(age), name_(name) {}
int& age() { return age_; };
const int& age() const { return age_; };
std::string& name() { return name_; };
const std::string& name() const { return name_; };
};
STRUCT_PACK_REFL(person, age(), name());
} // namespace example3
The member function registed must return a reference, and this function must have a const version overload & non-const overload.
custom type
The type cannot be abstracted to a type already in the type system
For example, let's say we need to serialise a map type from a third-party library (absl, boost...): we just need to make sure that it conforms to the constraints of the struct_pack type system for maps:
// We should not inherit from stl container, this case just for testing.
template <typename Key, typename Value>
struct my_map : public std::map<Key, Value> {};
my_map<int, std::string> map1;
map1.emplace(1, "tom");
map1.emplace(2, "jerry");
absl::flat_hash_map<int, std::string> map2 =
{{1, "huey"}, {2, "dewey"}, {3, "louie"},};
auto buffer1 = serialize(map1);
auto buffer2 = serialize(map2);
For more detail, See struct_pack type system
The type cannot be abstracted to a type already in the type system
At this time, we also support custom serialisation, the user only needs to customise the following three functions:
- sp_get_needed_size
- sp_serialize_to
- sp_deserialize_to
For example, the following is an example of support for serialisation/deserialisation of a custom 2D array type.
struct array2D {
unsigned int x;
unsigned int y;
float* p;
array2D(unsigned int x, unsigned int y) : x(x), y(y) {
p = (float*)calloc(1ull * x * y, sizeof(float));
}
array2D(const array2D&) = delete;
array2D(array2D&& o) : x(o.x), y(o.y), p(o.p) { o.p = nullptr; };
array2D& operator=(const array2D&) = delete;
array2D& operator=(array2D&& o) {
x = o.x;
y = o.y;
p = o.p;
o.p = nullptr;
return *this;
}
float& operator()(std::size_t i, std::size_t j) { return p[i * y + j]; }
bool operator==(const array2D& o) const {
return x == o.x && y == o.y &&
memcmp(p, o.p, 1ull * x * y * sizeof(float)) == 0;
}
array2D() : x(0), y(0), p(nullptr) {}
~array2D() { free(p); }
};
// You need add those functions:
// 1. sp_get_needed_size: calculate length of serialization
std::size_t sp_get_needed_size(const array2D& ar) {
return 2 * struct_pack::get_write_size(ar.x) +
struct_pack::get_write_size(ar.p, 1ull * ar.x * ar.y);
}
// 2. sp_serialize_to: serilize object to writer
template </*struct_pack::writer_t*/ typename Writer>
void sp_serialize_to(Writer& writer, const array2D& ar) {
struct_pack::write(writer, ar.x);
struct_pack::write(writer, ar.y);
struct_pack::write(writer, ar.p, 1ull * ar.x * ar.y);
}
// 3. sp_deserialize_to: deserilize object from reader
template </*struct_pack::reader_t*/ typename Reader>
struct_pack::err_code sp_deserialize_to(Reader& reader, array2D& ar) {
if (auto ec = struct_pack::read(reader, ar.x); ec) {
return ec;
}
if (auto ec = struct_pack::read(reader, ar.y); ec) {
return ec;
}
auto length = 1ull * ar.x * ar.y * sizeof(float);
if constexpr (struct_pack::checkable_reader_t<Reader>) {
if (!reader.check(length)) { // some input(such as memory) allow us check
// length before read data.
return struct_pack::errc::no_buffer_space;
}
}
ar.p = (float*)malloc(length);
auto ec = struct_pack::read(reader, ar.p, 1ull * ar.x * ar.y);
if (ec) {
free(ar.p);
}
return ec;
// 4. The default name for type checking is it's literal type name. You can also
// config it by:
// constexpr std::string_view sp_set_type_name(test*) { return "myarray2D"; }
// 5. If you want to use struct_pack::get_field/struct_pack::get_field_to, you
// need also define this function to skip deserialization of your type:
// template <typename Reader>
// struct_pack::err_code sp_deserialize_to_with_skip(Reader& reader, array2D& ar);
}
void user_defined_serialization() {
std::vector<my_name_space::array2D> ar;
ar.emplace_back(11, 22);
ar.emplace_back(114, 514);
ar[0](1, 6) = 3.14;
ar[1](87, 111) = 2.71;
auto buffer = struct_pack::serialize(ar);
auto result = struct_pack::deserialize<decltype(ar)>(buffer);
assert(result.has_value());
assert(ar == result);
}
### custom output stream
Except std::ostream/std::sstream struct_pack also support serialize to custom output stream.
The custom stream should satisfy those conditions:
```cpp
template <typename T>
concept writer_t = requires(T t) {
t.write((const char *)nullptr, std::size_t{}); // Output a piece of data. The return value can be implicit conversion to bool. Return false in any error.
};
For example:
// A simple output stream for fwrite.
struct fwrite_stream {
FILE* file;
bool write(const char* data, std::size_t sz) {
return fwrite(data, sz, 1, file) == 1;
}
fwrite_stream(const char* file_name) : file(fopen(file_name, "wb")) {}
~fwrite_stream() { fclose(file); }
};
// ...
fwrite_stream writer("struct_pack_demo.data");
struct_pack::serialize_to(writer, person);
custom input stream
Except std::istream/std::sstream struct_pack also support serialize to custom input stream.
The custom stream should satisfy those conditions:
template <typename T>
concept reader_t = requires(T t) {
t.read((char *)nullptr, std::size_t{}); // Input a piece of data. The return value can be implicit conversion to bool. Return false in any error.
t.ignore(std::size_t{}); // Skip a piece of data. The return value can be implicit conversion to bool. Return false in any error.
t.tellg(); // Return an unsigned integer as the absolute position of stream.
};
In addition, if the stream support read_view
, then we enable the support of zero-copy for string_view.
template <typename T>
concept view_reader_t = reader_t<T> && requires(T t) {
{ t.read_view(std::size_t{}) } -> std::convertible_to<const char *>;
// Read a view from stream. The return value is the begin pointer to view. Return nullptr in any error.
};
varint support
struct_pack also supports varint code for integer.
{
std::vector<struct_pack::var_int32_t> vec={-1,0,1,2,3,4,5,6,7};
auto buffer = std::serialize(vec); //zigzag+varint code
}
{
std::vector<struct_pack::var_uint64_t> vec={1,2,3,4,5,6,7,UINT64_MAX};
auto buffer = std::serialize(vec); //varint code
}
struct rect {
int a,b,c,d;
constexpr static auto struct_pack_config = struct_pack::ENCODING_WITH_VARINT| struct_pack::USE_FAST_VARINT;
// enable fast varint encode.
};
derived class support
struct_pack supports serialize/deserialize derived class to the pointer of base class. But We need additional macro to mark the relationship to generate factory function automatically.
// base
// / |
// obj1 obj2
// |
// obj3
struct base {
uint64_t ID;
virtual uint32_t get_struct_pack_id()
const = 0; // user must declare this virtual function in base
// class
virtual ~base(){};
};
struct obj1 : public base {
std::string name;
virtual uint32_t get_struct_pack_id()
const override; // user must declare this virtual function in derived
// class
};
STRUCT_PACK_REFL(obj1, ID, name);
struct obj2 : public base {
std::array<float, 5> data;
virtual uint32_t get_struct_pack_id() const override;
};
STRUCT_PACK_REFL(obj2, ID, data);
struct obj3 : public obj1 {
int age;
virtual uint32_t get_struct_pack_id() const override;
};
STRUCT_PACK_REFL(obj3, ID, name, age);
STRUCT_PACK_DERIVED_DECL(base, obj1, obj2, obj3);
// declare the relationship bewteen base class and derived class.
STRUCT_PACK_DERIVED_IMPL(base, obj1, obj2, obj3);
// implement of get_struct_pack_id();
Then user can serialize & deserialize the type std::unique<base>
:
std::vector<std::unique_ptr<base>> data;
data.emplace_back(std::make_unique<obj2>());
data.emplace_back(std::make_unique<obj1>());
data.emplace_back(std::make_unique<obj3>());
auto ret = struct_pack::serialize(data);
auto result =
struct_pack::deserialize<std::vector<std::unique_ptr<base>>>(ret);
assert(result.has_value()); // check deserialize ok
assert(result->size() == 3); // check vector size
user can also serialize type base
then deserialize it to the std::unique<base>
:
auto ret = struct_pack::serialize(obj3{});
auto result =
struct_pack::deserialize_derived_class<base, obj1, obj2, obj3>(ret);
assert(result.has_value()); // check deserialize ok
std::unique_ptr<base> ptr = std::move(result.value());
assert(ptr != nullptr);
ID collision
If two derived class has same field, it will cause ID collision because struct_pack generate the id by type hash. Two identical IDs will cause struct_pack fail to properly deserialise the derived class. struct_pack will check for such an error at compile time.
In addition, there is also a $2^-32$ probability of a hash conflict for any two different types, causing struct_pack to fail to use the hash information to check for type errors. In Debug mode, struct_pack comes with the full type string by default to mitigate hash conflicts.
The user can fix this by manually tagging the type. That is, add the member constexpr static std::size_t struct_pack_id
to the type and assign a unique initial value. If user does not want to intrusively modify the class declaration, we can also choose to add the function constexpr std::size_t struct_pack_id(Type*)
under the same namespace.
struct obj1 {
std::string name;
virtual uint32_t get_struct_pack_id() const;
};
STRUCT_PACK_REFL(obj1, name);
struct obj2 : public obj1 {
virtual uint32_t get_struct_pack_id() const override;
constexpr static std::size_t struct_pack_id = 114514;
};
STRUCT_PACK_REFL(obj2, name);
When this field/function is added, struct_pack adds this ID to the type string, thus ensuring that the two types have different hash values, thus resolving ID conflicts/hash conflicts.
benchmark
Test case
The object to be serialized is pre-initialized and the memory to store the serialization result is pre-allocated. For each test case, we run one million serializations/deserialization and take the average.
Test objects
- A simple object
person
with 4 scaler types
struct person {
int64_t id;
std::string name;
int age;
double salary;
};
- A complicated object
monster
with nested types
enum Color : uint8_t { Red, Green, Blue };
struct Vec3 {
float x;
float y;
float z;
};
struct Weapon {
std::string name;
int16_t damage;
};
struct Monster {
Vec3 pos;
int16_t mana;
int16_t hp;
std::string name;
std::vector<uint8_t> inventory;
Color color;
std::vector<Weapon> weapons;
Weapon equipped;
std::vector<Vec3> path;
};
- A
rect
object with forint32_t
struct rect {
int32_t x;
int32_t y;
int32_t width;
int32_t height;
};
Test Environment
Compiler: Alibaba Clang 13
Processor: (Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz)
Test results
Forward/backward compatibility
If current message type no longer meets all you needs - say, you'd like the object to have an extra field, the compatibility should not be broken so that the old object could be correctly parsed with the new type definition. In struct_pack, any new fields you added must be of type struct_pack::compatible<T, version_number>
.
And the version number should be incremental each time you update the struct. Let's take struct person
as an example:
struct person {
int age;
std::string name;
};
struct person_v0 {
int age;
std::string name;
struct_pack::compatible<bool> maybe; //default version number is 0
};
struct person_v1 {
int age;
std::string name;
struct_pack::compatible<int32_t,20230101> id; // version number is 20230101
struct_pack::compatible<bool> maybe;
struct_pack::compatible<std::string,20230101> password; // version number is 20230101
};
struct person_v2 {
int age;
std::string name;
struct_pack::compatible<int32_t,20230101> id;
struct_pack::compatible<bool> maybe;
struct_pack::compatible<std::string,20230101> password;
struct_pack::compatible<double,20230402> salary; //the version number should be incremental. So 20230402 > 20230101
};
struct_pack ensures that those version of struct person can be safely converted to each other by serialization and deserialization, thus achieving forward/backward compatibility.
Remember,if the version number is not incremental, struct_pack can't achieve the compatibility. Thus struct_pack skip the type check for compatible field, the convert is undefined behavior!
Why is struct_pack faster?
- Streamlined type information and efficient type-checking. MD5 computation is done at compile time so at runtime we only needs to compare 32bit hash values to check sameness.
- struct_pack is a template library that encourages the compiler to do a better job at inlining.
- Zero-cost abstraction, no runtime cost for features that are not used.
- struct_pack's memory layout is much closer to the native C++ object, reducing the workload of (de)serializations.
- Compile-time type calculation allows struct_pack to generate different codes according to different types. So we can do optimization upon types. For example, we could use
memcpy
on contiguous containers and zero-copy optimization could be used onstring_view
Appendix
struct_pack type system
struct_pack layout
Test code
see benchmark.cpp