
Serialization

November 22, 2023
March 27, 2020

TODO: merge Dropbox/caravan/caravan/interchange-format

Comparison of data-serialization formats - Wikiwand
RFC 8949 - Comparison of Other Binary Formats to CBOR's Design Objectives
deserialization - Performant Entity Serialization: BSON vs MessagePack (vs JSON) - Stack Overflow

Comparing speed and size of to_csv(), np.save(), to_hdf(), to_pickle() | Towards Data Science
The Best Format to Save Pandas Data | by Ilia Zaitsev | Towards Data Science
devforfu/pandas-formats-benchmark: A little benchmark comparing Pandas data frames serialization formats

Graphtage Documentation
trailofbits/graphtage: A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.

IDL

Interface description language - Wikiwand

JSON schema

JSON Schema | The home of JSON Schema
Understanding JSON Schema — Understanding JSON Schema 7.0 documentation
Structuring a complex schema — Understanding JSON Schema 7.0 documentation defining types
Combining schemas — Understanding JSON Schema 7.0 documentation
Applying subschemas conditionally — Understanding JSON Schema 7.0 documentation
jsonschema - Json Schema file extension - Stack Overflow .json with mime type application/schema+json

fastify/fluent-json-schema: A fluent API to generate JSON schemas ❗!important
sinclairzx81/typebox: JSON Schema Type Builder with Static Type Resolution for TypeScript ❗!important, sharing definition between JSON Schema and TypeScript

JSON Schema Tool playground, infer schema from JSON
JSON Schema Lint :: JSON Schema Validator
Fake your JSON-Schemas!

How to do inheritance? · Issue #348 · json-schema-org/json-schema-spec
Explain why inheritance isn't the right model · Issue #148 · json-schema-org/json-schema-org.github.io

jsonSchema attribute conditionally required - Stack Overflow
jsonschema - JSON schema: conditional dependency - Stack Overflow

Validators

JSON Schema Validation: A Vocabulary for Structural Validation of JSON
draft-bhutton-json-schema-validation-00 - JSON Schema Validation: A Vocabulary for Structural Validation of JSON

Ajv JSON schema validator
ajv-validator/ajv: The fastest JSON schema Validator. Supports JSON Schema draft-04/06/07/2019-09/2020-12 and JSON Type Definition (RFC8927)

jsonschema — jsonschema 3.2.0 documentation

keleshev/schema: Schema validation just got Pythonic

cypress-io/schema-tools: Validate, sanitize and document JSON schemas

JavaScript Validators

samchon/typescript-json: Super-fast Runtime type checkers (validators) and JSON.stringify() function TSON, zod is slow

Comparing schema validation libraries: Zod vs. Yup - LogRocket Blog Yup is pre-TypeScript

jquense/yup: Dead simple Object schema validation

colinhacks/zod: TypeScript-first schema validation with static type inference
Zod Tutorial | Total TypeScript
Learn "Zod" In 5 Minutes - DEV Community

mattkingshott/iodine: A micro JavaScript validation library.

Introduction - Superstruct
ianstormtaylor/superstruct: A simple and composable way to validate data in JavaScript (and TypeScript).

oussamahamdaoui/forgJs: ForgJs is a javascript lightweight object validator. Go check the Quick start section and start coding with love

Vest - Declarative Validations validate like writing test

flowstudio/datalize: Parameter, query, form data validation and filtering for NodeJS.
Node.js Form Validation Using Datalize | Toptal

philipnilsson/bueno: Composable validators for forms, API:s in TypeScript

Joi

joi.dev
sideway/joi: The most powerful data validation library for JS
v16.0.0 Release Notes · Issue #2037 · sideway/joi joi 16 is a rewrite

joi.dev - API Reference
RunKit + npm: joi
joi.dev - Schema Tester
tlivings/enjoi: Converts a JSON schema to a Joi schema.

joi/test at master · sideway/joi
What I’ve Learned Validating with Joi – ITNEXT
What I've Learned Validating with Joi — Futurice
Node API Schema Validation with Joi ― Scotch
Joi for Node: Exploring Javascript Object Schema Validation
Joi — awesome code validation for Node.js and Express - DEV Community 👩‍💻👨‍💻

Handling Joi validation errors in Hapi 17 – Piotr Karpala – Medium ❗!important, return validation error to client

Expressing complex logic in when() · Issue #1663 · sideway/joi use when() on schema

Customize error message

joi/API.md#list-of-errors at master · sideway/joi
Node.js + Joi how to display a custom error messages? - Stack Overflow

Joi validation error does not provide detailed information in response · Issue #3706 · hapijs/hapi in Hapi, failAction() is the best

JSON

shell-tools#JSON manipulation

RFC 8259 - The JavaScript Object Notation (JSON) Data Interchange Format
A beginner's guide to JSON, the data format for the internet - Stack Overflow Blog

8259 JSON
6901 JSON Pointer
6902 JSON Patch
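
As a rough illustration of what RFC 6901 specifies, here is a minimal sketch of JSON Pointer resolution using only the Python standard library. The function name `resolve_pointer` and the sample document are made up for this note and are not from any of the linked libraries. JSON Patch (RFC 6902) uses the same pointer syntax in the `path` member of each operation.

```python
import json


def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON (hypothetical helper)."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = doc
    for raw in pointer[1:].split("/"):
        # Unescape ~1 before ~0 so that "~01" decodes to "~1", not "/".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            # Array indices are decimal digits without leading zeros.
            if not token.isdigit() or (token != "0" and token.startswith("0")):
                raise ValueError(f"invalid array index: {token!r}")
            current = current[int(token)]
        else:
            current = current[token]
    return current


doc = json.loads('{"foo": ["bar", "baz"], "a/b": 1, "m~n": 8}')
first_foo = resolve_pointer(doc, "/foo/0")   # "bar"
slash_key = resolve_pointer(doc, "/a~1b")    # 1
tilde_key = resolve_pointer(doc, "/m~0n")    # 8
```

The sketch does not handle the `-` index (the position after the last array element), which mainly matters for JSON Patch `add` operations.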

JSON ABC - Sort JSON Alphabetically
JSON Sorter - Sort JSON keys online allows comments

fastify/fast-json-stringify: 2x faster than JSON.stringify()

JSON-LD - JSON for Linking Data
digitalbazaar/jsonld.js: A JSON-LD Processor and API implementation in JavaScript

Creating semantic sites with Web Components and JSON-LD - Chrome for Developers

ijl/orjson: Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy

msgspec
Faster, more memory-efficient Python JSON parsing with msgspec

ICRAR/ijson: Iterative JSON parser with Pythonic interfaces
Processing large JSON files in Python without running out of memory

JSON streaming

JSON streaming - Wikiwand
JSON Lines

ndjson/ndjson.js: Streaming line delimited json parser + serializer
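
A minimal sketch of the JSON Lines idea (one JSON value per line) with the Python standard library. The helper names `dump_jsonl` and `iter_jsonl` and the sample records are assumptions for illustration, not part of ndjson.js or any other linked project.

```python
import io
import json


def dump_jsonl(records, fp):
    """Write each record as one compact JSON document per line."""
    for record in records:
        # json.dumps without indent never emits a raw newline,
        # so every record stays on a single line.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_jsonl(fp):
    """Yield records one at a time without loading the whole stream."""
    for line in fp:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)


records = [{"id": 1, "name": "a"}, {"id": 2, "tags": ["x", "y"]}]
buffer = io.StringIO()
dump_jsonl(records, buffer)
buffer.seek(0)
decoded = list(iter_jsonl(buffer))
```

Because each line is parsed on its own, a reader can process records as they arrive and keep only one record in memory at a time.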

Binary Serialization

Binary Formats - JSON for Modern C++

BSON (Binary JSON) Serialization MongoDB, in-place update, designed for storage and lookup
JSON and BSON | MongoDB
BSON Types — MongoDB Manual
mongodb/js-bson: BSON Parser for node and browser
bson package - go.mongodb.org/mongo-driver/bson - Go Packages

CBOR — Concise Binary Object Representation | Overview Web Assembly, based on MessagePack, supports partial decode, designed for network communication
RFC 8949 - Concise Binary Object Representation (CBOR)
RFC 8610 - Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures
Base58 Encoder / Decoder Online - AppDevTools

TOML

TOML: English v1.0.0

tomllib — Parse TOML files — Python 3 documentation 3.11+, essentially tomli
hukkin/tomli: A lil' TOML parser
sdispater/tomlkit: Style-preserving TOML library for Python

Python and TOML: New Best Friends – Real Python

Taplo | A versatile TOML toolkit.

YAML

YAML™ Specification Index

PyYAML Documentation
YAML: The Missing Battery in Python – Real Python

CUE

CUE
Introduction | CUE
Documentation | CUE
CUE Playground

Cuetorials
CUE is an exciting configuration language — Bitfield Consulting
Configuring Kubernetes with CUE · garethr.dev

CUE: a data constraint language and shoo-in for Go. Marcel van Lohuizen, Google. - YouTube
GopherCon Europe 2020: Marcel van Lohuizen - Better APIs with Shareable Validation Logic - YouTube

Protocol Buffers

Protocol Buffers Documentation
Protocol Buffers Version 3 Language Specification | Protocol Buffers Documentation
Protocol Buffers - Wikiwand

Protocol Buffers Crash Course - YouTube

Protobuf - How Google Changed Data Serialization FOREVER - YouTube
Don't Use REST APIs in your Backend, Use gRPC - YouTube

Protocol Buffers, Part 1 — Serialization Library for Microservices
Protocol Buffers, Part 2 — The Untold Parts Of Using “Any”

Buf | Home The only Protobuf developer platform

MessagePack

MessagePack: It's like JSON. but fast and small.
MessagePack

supports partial decode, designed for network communication

neuecc/MessagePack-CSharp: Extremely Fast MessagePack Serializer for C#(.NET, .NET Core, Unity, Xamarin). / msgpack.org[C#]

msgpack/msgpack-python: MessagePack serializer implementation for Python msgpack.org[Python]

Node

mcollina/msgpack5: A msgpack v5 implementation for node.js, with extension points / msgpack.org[Node]

keywords:messagepack - npm search
mattheworiordan/nodejs-encoding-benchmarks: Simple repo to benchmark performance of Node.js encoding libraries
msgpack/msgpack-javascript: @msgpack/msgpack - MessagePack for JavaScript/TypeScript/ECMA-262 / msgpack.org[JavaScript]
kawanet/msgpack-lite: Fast Pure JavaScript MessagePack Encoder and Decoder / msgpack.org[JavaScript]

Apache Arrow/Feather

apache/arrow: Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

PyArrow - Apache Arrow Python bindings — Apache Arrow v6.0.0
arrow package - github.com/apache/arrow/go/arrow - pkg.go.dev
Apache Arrow - v6.0.0
arrow/js at master · apache/arrow

Feather is now part of Apache Arrow

wesm/feather: Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
Feather: A Fast On-Disk Format for Data Frames for R and Python, powered by Apache Arrow - RStudio

Apache Thrift

Apache Thrift - Home
Reconciling GraphQL and Thrift at Airbnb - Airbnb Engineering & Data Science - Medium

Apache Parquet

Apache Parquet compatible to Pandas DataFrame
apache/parquet-format: Apache Parquet

Reading and Writing the Apache Parquet Format — Apache Arrow v6.0.1

xitongsys/parquet-go: pure golang library for reading/writing parquet file
Processing parquet files in Golang - DEV Community

fastparquet documentation

Inspect Parquet from command line - Stack Overflow
parquet-tools · PyPI
parquet-cli · PyPI

pyarrow also loads Parquet
Development update: High speed Apache Parquet in Python with Apache Arrow - Wes McKinney

Apache ORC

Apache ORC • High-Performance Columnar Storage for Hadoop

HDF

The HDF5® Library & File Format - The HDF Group

HDFGroup Documentation
Learning HDF5
Using the HDF5 Command-line Tools
Introduction to HDF5 | Quincey Koziol, The HDF Group - YouTube
Parallel HDF5 | Quincey Koziol, The HDF Group - YouTube
A Brief Introduction to HDF5

HDF Group - HDF5 old portal
https://support.hdfgroup.org/HDF5/docNewFeatures/SWMR/Design-HDF5-FileLocking.pdf

Parallel I/O – Why, How, and Where to? - The HDF Group

Cyrille Rossant - Moving away from HDF5
Cyrille Rossant - Should you use HDF5?
On HDF5 and the future of data management

HDF5 for Python — h5py documentation
Save Pandas objects to HDF5 - DEV Community

gonum/hdf5: hdf5 is a wrapper for the HDF5 library
hdf5 package - gonum.org/v1/hdf5 - pkg.go.dev

CDF

CDF Home Page

Unidata | NetCDF
NetCDF Why and How: Creating Publication Quality NetCDF Datasets - YouTube
Tutorial - Introduction to the NetCDF format - YouTube
Visualising data in NetCDF format - YouTube

ASDF

ASDF Standard — ASDF Standard documentation

FlatBuffers

FlatBuffers: FlatBuffers very similar to Protobuf

What's the difference between Protocol Buffers and Flatbuffers? - Stack Overflow
JSON vs Protocol Buffers vs FlatBuffers | by Kartik Khare | codeburst

Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary representation before you can access data, a step that is often coupled with per-object memory allocation. The Protocol Buffers code is an order of magnitude bigger, too. Protocol Buffers has no optional text import/export.
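
To make the "no parsing/unpacking step" point concrete, here is a small sketch using Python's `struct` module. It is not the FlatBuffers wire format or API. The fixed record layout, the offsets, and the function names are all hypothetical, chosen only to contrast decoding a whole record into a secondary object with reading one field in place from the buffer.

```python
import struct

# Hypothetical fixed layout (little-endian, no padding):
# uint32 id at offset 0, float64 score at offset 4, uint16 level at offset 12.
RECORD = struct.Struct("<IdH")
ID_OFFSET, SCORE_OFFSET, LEVEL_OFFSET = 0, 4, 12

buf = RECORD.pack(42, 3.5, 7)


def parse(buf):
    """Unpack-first style: decode every field into a new dict."""
    id_, score, level = RECORD.unpack(buf)
    return {"id": id_, "score": score, "level": level}


def read_level(buf):
    """In-place style: read a single field straight from the buffer."""
    (level,) = struct.unpack_from("<H", buf, LEVEL_OFFSET)
    return level


parsed = parse(buf)       # allocates a dict holding all three fields
level = read_level(buf)   # touches only the two bytes at offset 12
```

Real FlatBuffers adds offset tables so that fields can be optional and schemas can evolve, but access follows the same read-at-offset pattern.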

FlatBuffers: Use in C#
FlatBuffers: Use in JavaScript
FlatBuffers: Use in Python

Cap'n Proto

Cap'n Proto: Introduction zero copy
Cap'n Proto: Cap'n Proto, FlatBuffers, and SBE

Simple Binary Encoding

Mechanical Sympathy: Simple Binary Encoding
real-logic.github.io/simple-binary-encoding
real-logic/simple-binary-encoding: Simple Binary Encoding (SBE) - High Performance Message Codec

BaseN encoding

multiformats/multibase: Self identifying base encodings
RFC 4648 - The Base16, Base32, and Base64 Data Encodings
draft-msporny-base58-03 base58btc
multibase/rfcs at master · multiformats/multibase
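
A short sketch of the encodings above in Python. Base16, Base32, and Base64 come from the standard `base64` module. Base58 is not in the standard library, so `b58encode` and `b58decode` below are hand-written helpers (hypothetical names) that assume the Bitcoin alphabet used by base58btc.

```python
import base64

# RFC 4648 encodings from the standard library.
b16 = base64.b16encode(b"foobar")   # b"666F6F626172"
b32 = base64.b32encode(b"foobar")   # b"MZXW6YTBOI======"
b64 = base64.b64encode(b"foobar")   # b"Zm9vYmFy"

# Bitcoin alphabet: no 0, O, I, or l.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc; each leading zero byte becomes a '1'."""
    zeros = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, 58)
        digits.append(B58_ALPHABET[rem])
    return "1" * zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Decode base58btc text back to bytes, restoring leading zero bytes."""
    zeros = len(text) - len(text.lstrip("1"))
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on bad characters
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * zeros + body


encoded = b58encode(bytes.fromhex("0000287fb4cd"))   # "11233QC4"
```

Unlike the RFC 4648 encodings, Base58 treats the input as one big integer, so encoding cost grows faster than linearly with input size. That is acceptable for short identifiers and keys, less so for large payloads.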

C++

What's the most mature JSON library for C++? Support for JSON Schema is a plus. - Quora

miloyip/nativejson-benchmark: C/C++ JSON parser/generator benchmark

cereal Docs - Main
USCiLab/cereal: A C++11 library for serialization

RapidJSON: Main Page
Tencent/rapidjson: A fast JSON parser/generator for C++ with both SAX/DOM style API
Martchus/reflective-rapidjson: Code generator for serializing/deserializing C++ objects to/from JSON using Clang and RapidJSON

JSON for Modern C++: JSON for Modern C++
nlohmann/json: JSON for Modern C++
pboettch/json-schema-validator: JSON schema validator for JSON for Modern C++

Jansson — C library for working with JSON data
Jansson Documentation — Jansson documentation
akheron/jansson: C library for encoding, decoding and manipulating JSON data

C#

Serializing JSON Data into Binary Form | DotNetCurry

Rust

Serde Serialization framework for Rust GitHub

sharksforarms/deku: Declarative binary reading and writing: bit-level, symmetric, serialization/deserialization

TimelyDataflow/abomonation: A mortifying serialization library for Rust works even with pointers

Go

fatih/gomodifytags: Go tool to modify struct field tags for JSON serialization

Several ways of serialization and deserialization of golang | Develop Paper
Ellerbach/Golang-Json-serialize-deserialize: Go (Golang) Json serialization and deserialization practices

smallnest/gosercomp: Golang Serializer Benchmark Comparison

gob package - encoding/gob - pkg.go.dev native codec for Go

glycerine/zebrapack: ZebraPack format is like gobs version 2: serialization in Go, but extremely fast and friendly to other languages. Use Go as your schema. Strong typing. Well documented (and msgpack2 compatible) format so other languages can be readily supported. See also https://github.com/glycerine/greenpack for a more recent alternative. Docs:
glycerine/greenpack: Cross-language serialization for Golang: greenpack adds versioning, stronger typing, and optional schema atop msgpack2. greenpack -msgpack2 produces classic msgpack2, and handles nils. Cousin to ZebraPack (https://github.com/glycerine/zebrapack), greenpack's advantage is fully self-describing data. Oh, and faster than protobufs.

microhq/go-bson: A copy of youtube/vitess/go/bson

Java

Java Object Serialization Specification: Contents
java.io A node implementation
jdeserialize

The Java serialization algorithm revealed | JavaWorld
5 things you didn't know about ... Java Object Serialization
Serialization and Deserialization in Java example using Serializable Interface | CodinGeek transient field will not be serialized

Kotlinx

An Extensive Kotlinx Serializer Library For Serialization | Android | Kotlin

Python

12.1. pickle — Python object serialization — Python documentation
Pickle’s nine flaws | Ned Batchelder

Serialization and Deserialization of Python Objects: Part 1
Serialization and Deserialization of Python Objects: Part 2
Object serialization in Python ~ The Python Corner

construct/construct: Construct: Declarative data structures for python that allow symmetric parsing and building

Serialize · PyPI

serpent · PyPI

marshmallow: simplified object serialization — marshmallow documentation
marshmallow-code/marshmallow: A lightweight library for converting complex objects to and from simple Python datatypes.

serpy: ridiculously fast object serialization — serpy documentation
JSON Serialization in Python using serpy – Twilio Cloud Communications Blog
Working With JSON Data in Python – Real Python
Reading and Writing JSON in Python - The Python Guru

Better Python Object Serialization · Homepage of Hynek Schlawack

Efficiently Store Pandas DataFrames

Python Validators

keleshev/schema: Schema validation just got Pythonic
Introduction to Schema: A Python Libary to Validate your Data | by Khuyen Tran | Towards Data Science
Data-science/schema.ipynb at master · khuyentran1401/Data-science

Welcome to Cerberus — Cerberus is a lightweight and extensible data validation library for Python
Do Not Use If-Else For Validating Data Objects In Python Anymore | by Christopher Tao | May, 2022 | Towards Data Science