The Duper logo, with a confident spectacled mole wearing a flailing blue cape.

Duper: The format that's super!

Duper aims to be a human-friendly extension of JSON with quality-of-life improvements, extra types, and semantic identifiers.

Preliminaries

Duper is case-sensitive, and files must be a valid UTF-8 encoded Unicode document.
"Whitespace" refers to tab (U+0009), space (U+0020), line feed (U+000A), and carriage return (U+000D).
"Newline" refers to line feed (U+000A) or carriage return (U+000D).
"Control characters other than line feeds" refers to the codepoints U+0000 through U+0009, U+000B through U+001F, and U+007F.
Files must have only one root value. Parsers must always accept objects, arrays, and tuples as the root value. Implementations may allow other values as the root value.
JSON values are valid Duper values.

Comments

Two forward slashes // mark the rest of the line, including the newline, as a comment, except when inside of a string.

duper

// This is a full-line comment
{
  key: "value",  // This is a comment at the end of a line
  another: "// This is not a comment"
}

The area delimited by a forward slash immediately followed by an asterisk /*, and an asterisk immediately followed by a forward slash */, is a comment that may span multiple lines, except when inside of a string.

duper

/* This is a fancier comment */
{
  /* Notice here that we can also span
     multiple lines in a single comment. Neat!
     */
  key: "value",
  another: "/* This is not a comment,
               despite spanning multiple lines */"
}

Comments should be used to communicate between the human readers of a file. Parsers must not modify keys or values based on the presence (or contents) of a comment.

Objects

Objects are composed of zero or more key-value pairs.

Keys are on the left of the colon :, and values are on the right. Whitespace is ignored around key names, colon, and values. The key, colon, and value may be on the same line or different ones.

duper

{
  key: "value",
  anotherKey
    :
      42
}

There must be a comma , between key-value pairs.

duper

{
  key: "value"  // INVALID: Missing comma
  foo: "bar"
}

Conversely, a trailing comma after the last key-value pair is allowed.

duper

{
  key: "value",
  foo: "bar",  // Comma here is valid but not required
}

Values must have one of the following types:

Object
Array
Tuple
String
Byte string
Temporal value
Integer
Float
Boolean
Null

Keys

A key may be either plain, quoted, or raw.

Plain keys may only contain ASCII letters, ASCII digits, underscores _, and hyphens -. They must start with an ASCII letter, or an underscore followed by a letter or digit. Sequences of underscores and hyphens are not allowed, and plain keys must not end with them.

duper

{
  // Allowed
  key: "value",
  plain_key: "value",
  pla1n-k3y: "value",
  _1234: "value",

  // Allowed but discouraged
  Capitalized: "value", 

  // Not allowed
  _: "value",               // INVALID
  ütf8: "value",            // INVALID
  : "value",                // INVALID
  kebabest--case: "value",  // INVALID
}

Quoted keys follow the exact same rules as quoted strings.

duper

{
  "127.0.0.1": "value",
  "with space": "value",
  "maçã": "value",
  "_": "value",
  "": "value",
}

Raw keys follow the exact same rules as raw strings.

duper

{
  r"key2": "value",
  r#"quoted "value""#: "value",
}

Whitespace around keys is ignored.

Defining a key multiple times is invalid. Note that plain keys, quoted keys, and raw keys are equivalent.

duper

{
  name: "Eric",
  "n\x61me": "Erik",  // INVALID
  r"name": "Erick",   // INVALID
}

Strings

A string may be either quoted or raw.

Quoted strings are surrounded by quotation marks ". Any Unicode character may be used, except those that must be escaped: quotation mark ", backslash \, and control characters other than line feeds.

duper

{
  str1: "I'm a string.",
  str2: "\"You can quote me\"",
  str3: "Name\tJos\xE9\nLocation\tBR",
  str4: "  padded  ",
  str5: "𝓓𝓾𝓹𝓮𝓻",
}

For convenience, some characters have a compact escape sequence. Other escape sequences are not permitted.

duper

[
  r#"| Sequence   | Character       | Value      |"#,
  r#"| ---------- | --------------- | ---------- |"#,
  r#"| \0         | null            | U+0000     |"#,
  r#"| \b         | backspace       | U+0008     |"#,
  r#"| \t         | tab             | U+0009     |"#,
  r#"| \n         | line feed       | U+000A     |"#,
  r#"| \f         | form feed       | U+000C     |"#,
  r#"| \r         | carriage return | U+000D     |"#,
  r#"| \"         | quote           | U+0022     |"#,
  r#"| \\         | backslash       | U+005C     |"#,
  r#"| \xHH       | arbitrary byte  | U+00HH     |"#,
  r#"| \uHHHH     | unicode         | U+HHHH     |"#,
  r#"| \UHHHHHHHH | unicode         | U+HHHHHHHH |"#,
]

Any Unicode character may be escaped with \uHHHH, \UHHHHHHHH, or a sequence of one or more \xHH, where H is a hexadecimal digit. The escape codes must be valid Unicode scalar values.

Keep in mind that Duper strings are sequences of Unicode characters, not byte sequences. Parsers should raise an error if a string decodes into invalid Unicode (i.e. via an invalid sequence of \xHH).

TIP

For binary data, use byte strings.

Raw strings start with the lowercase letter r, immediately followed by zero or more hash symbols #, immediately followed by a quotation mark ". They end with a quotation mark ", followed by the same number of starting hash symbols # (for example: r"...", r#"..."#, r##"..."##, and so on.). They allow line feeds and have no escaping whatsoever.

duper

{
  winpath: r"C:\Users\nodejs\templates",
  regex: r"<\i\c*\s*>",
  quoted: r#"Hello, "world"!"#,
  excessive_hashtags: r####"Just to be safe..."####,
  lines: r"
The first line feed is not trimmed.
  All whitespace is
    preserved in here.   ",
}

The hashtags are required to disambiguate quotes (", or "#, or "##, etc.) which are part of the value from the raw string terminator.

duper

{
  inner_quotes: r"Well, "that" just happened.",      // INVALID
  too_few_ending_hashes: r#"",                       // INVALID"#
  too_many_ending_hashes: r#""##,                    // INVALID
  not_enough_hashes: r#"will "# close the string"#,  // INVALID
}

Control characters other than line feeds are not permitted in a raw string.

Byte strings

Byte strings are similar to strings, but represent binary data. Like strings, they come in quoted or raw variants.

Quoted byte strings start with the lowercase letter b immediately followed by a quotation mark ", and end with a quotation mark ". The escape sequences are the same as in quoted strings, although they are not required to form valid UTF-8 codepoints.

duper

{
  png_signature: b"\x89PNG\r\n\x1a\n",
  ascii: b"Hello, World!",
  ansi_reset: b"\x1b[0m",
}

Raw byte strings are similar to raw strings, using the br (all lowercase) prefix instead.

duper

{
  path: br"C:\Windows\System32",
  shrug: br#""Whatever." ¯\_(ツ)_/¯"#,
  rust_block: br##"{ let str = r#"meta string"#; }"##,
}

Base64 byte strings use the b64" (all lowercase) prefix, and end with a quotation mark ". They must contain only valid Base64 characters (ASCII lowercase, ASCII uppercase, ASCII digits, plus sign +, forward slash /), followed by the appropriate padding composed of zero or more equals signs =, as per RFC 4648 section 4. Whitespace inside of the Base64 byte string is allowed and ignored. Parsers should allow for missing pad characters, while encoders must emit valid padding.

duper

{
  regular: b64"ZHVwZXI=",
  no_padding: b64"ZHVwZXI",
  with_whitespace: b64" +bo UO5 X/b YI= ",

  too_much_padding: b64"ZHVwZXI==",   // INVALID
  invalid_characters: b64"QUFB-Q==",  // INVALID
}

Temporal values

Temporal values are a set of value types, representing either a point in time or the difference between two points in time. They are surrounded by single quotes ' and must follow the Temporal proposal, which uses a strict version of the format specified in RFC 9557 (itself based off of ISO 8601 / RFC 3339).

Whitespace between the Temporal value and the single quotes are allowed and ignored, but not allowed inside the value itself (except for space as a date-time separator). Parsers should validate that the value between single quotes is a valid Temporal value.

duper

{
  // Allowed
  instant: '2022-02-28T03:06:00.092121729Z',
  duration: '  P7DT5.000001S  ',
  calendar_system: '2020-05-22[u-ca=hebrew]',  // 28 Iyar 5780
  christmas_eve: '--12-24',
  large_extensions: '2020-05-22T07:19:35.123456789-04:00[America/Indiana/Indianapolis][u-ca=islamic-umalqura]',

  // Not allowed
  not_temporal: 'hello world',         // INVALID
  "date doesn't exist": '2025-02-29',  // INVALID
}

These values may or may not contain an identifier. In the case where the identifier is one of the following, parsers must validate that the contained Temporal value matches its type:

duper

{
  // Allowed
  precise_identifier: PlainDateTime('2007-03-31T10:35:10'),
  subset: PlainYearMonth('1994-11-06T19:45:27-03:00'),  // PlainYearMonth is a subset of Instant

  // Allowed but discouraged
  string_in_disguise: PlainDate("not Temporal"),      // Uses double-quotes
  confusing_identifier: PlainTimeDate('2025-11-03'),  // Unlike `PlainDateTime`, this doesn't
                                                      // validate the input, other than that
                                                      // it's a Temporal value.

  // Not allowed
  wrong_type: Duration('2025-10-31T19:39:02'),  // INVALID
}

Integers

Integers are whole numbers. Positive numbers may be prefixed with a plus sign +. Negative numbers are prefixed with a minus sign -.

duper

{
  int1: +99,
  int2: 42,
  int3: 0,
  int4: -17,
}

For large numbers, you may use underscores between digits to enhance readability. Each underscore must be surrounded by at least one digit on each side.

duper

{
  // Allowed
  int5: 1_000,
  int6: 5_349_221,
  int7: 53_49_221,
  int8: 1_2_3_4_5,

  // Not allowed
  wrong1: 1__2,  // INVALID
  wrong2: _12,   // INVALID
  wrong3: 12_,   // INVALID
}

Leading zeros are not allowed. Integer values -0 and +0 are valid and identical to an unprefixed zero.

Non-negative integer values may also be expressed in hexadecimal (0x...), octal (0o...), or binary (0b...). In these formats, plus or minus signs + or - are not allowed, but leading zeros (after the prefix) are allowed. Hexadecimal values are case-insensitive. Underscores are allowed between digits (but not between the prefix and the value).

duper

{
  // Hexadecimal with prefix `0x`
  hex1: 0xDEADBEEF,
  hex2: 0x2001_0db1,

  // Octal with prefix `0o`
  oct1: 0o755,
  oct2: 0o01_234_567,

  // Binary with prefix `0b`
  bin1: 0b1101,
  bin2: 0b0101_0101,

  // Not allowed
  invalid_hex: -0x1234,  // INVALID
  invalid_oct: +0o7263,  // INVALID
  invalid_bin: 00b1001,  // INVALID
}

Implementations are free to support any integer size. It's recommended that at least 64-bit signed integers (i.e. long integers, from −2^63 to 2^63−1) are accepted and handled losslessly. If an integer cannot be represented in the chosen integer size, implementations may raise an error or convert it losslessly into a float or a string, using an appropriate identifier for its original type in both cases.

Floats

A float consists of an integer part (which follows the same rules as decimal integer values) followed by a fractional part and/or an exponent part. If both a fractional part and exponent part are present, the fractional part must precede the exponent part.

duper

{
  // Fractional
  float1: +1.0,
  float2: 3.1415,
  float3: -0.01,

  // Exponent
  float4: 5e+22,
  float5: 1e06,
  float6: -2E-2,

  // Both
  float7: 6.626e-34,
}

A fractional part is a decimal point followed by one or more digits.

An exponent part is an e (upper or lower case) followed by an integer part (which follows the same rules as decimal integer values).

The decimal point, if used, must be surrounded by at least one digit on each side.

duper

{
  invalid_float_1: .7,      // INVALID
  invalid_float_2: 7.,      // INVALID
  invalid_float_3: 3.e+20,  // INVALID
}

Similar to integers, you may use underscores to enhance readability. Each underscore must be surrounded by digits.

duper

{
  float8: 224_617.445_991_228,
  float9: 1e2_00,
}

Float values -0.0 and +0.0 are valid and should map according to IEEE 754. Infinity and NaN are not allowed.

Implementations are free to support any precision level. It's recommended that at least IEEE 754 64-bit floating point values (i.e. doubles) are supported.

Booleans

Booleans are one of true or false.

duper

{
  ja: true,
  nein: false,
}

Null

Null is always null.

duper

{
  nuclear_launch_code: null,
}

Arrays

Arrays are ordered values surrounded by square brackets [ and ]. Whitespace is ignored. Elements are separated by commas. Empty arrays may either include a single comma or nothing. Arrays can contain values of the same data types as allowed in key-value pairs. Values of different types may be mixed.

duper

{
  empty_array: [],
  another_empty_array: [,],
  integers: [1, 2, 3],
  colors: ["red", "yellow", r"green"],
  unflattened_ints: [[1, 2], [3, 4, 5]],

  // Mixed-type arrays are allowed
  numbers: [0.1, 0.2, 0.5, 1, 2, 5],
  nested_mixed_array: [[1, "a"], [2, "b", {}]],
  contributors: [
    "Foo Bar <foo@example.com>",
    {
      name: "Baz Qux",
      email: "bazqux@example.com",
      url: "https://example.com/bazqux",
    },
  ],

  // Not allowed
  commas: [,,],  // INVALID
  sep: [1,,2],   // INVALID
}

Arrays can span multiple lines. A trailing comma is permitted after the last value of the array. Any number of newlines and comments may precede values, commas, and the closing bracket. Indentation between array values and commas is treated as whitespace and ignored.

duper

[
  1,
  2,
]

Tuples

Tuples are similar to arrays, but are surrounded by parenthesis ( and ) instead.

duper

{
  empty_tuple: (),
  another_empty_tuple: (,),
  single_element: (1),
  another_single_element: (1,),
  tuple_of_arrays: ([true, 1.0], ["x", "y", "z"]),
  array_of_tuples: [(1, null), (3, 4.0, 5)],
  nested: (((), ("hi"))),
  multiline_tuple: (
    "Vec",
    "Cow",
    "Arc",
  ),

  // Not allowed
  commas: (,,),  // INVALID
  sep: (1,,2),   // INVALID
}

Any parenthesized expression must be interpreted as a tuple by parsers.

Parsers may choose to handle them differently from arrays. For example, in most programming languages, tuples might be treated as fixed-size values, while arrays equivalents are considered growable/shrinkable elements. On the other hand, it might make sense to index into an array freely, while tuples may get treated as a unit.

Identifiers

Identifiers are type-like annotations that wrap any kind of value, providing semantic meaning or hinting at special handling during parsing/validation. Identified values are composed of the identifier name, followed by the value wrapped in parenthesis ( and ). Whitespace and comments around the identifier name or its parenthesis is ignored.

The first character must be an ASCII uppercase letter, followed by zero or more ASCII letters, ASCII digits, underscores _, and hyphens -. Sequences of underscores and hyphens are not allowed in the identifier, and identifiers may not start or end with either of them.

duper

{
  user_id: Uuid("550e8400-e29b-41d4-a716-446655440000"),
  created: DateTime("2024-01-15T10:30:00Z"),
  birthday: ISO-8601("2025-10-20"),
  price: Decimal("19.99"),
  weight: Kilograms(2.5),
  color: RGB((255, 0, 128)),  // Two sets of parenthesis for tuples
  address: IPV4("192.168.1.1"),
  nested: Metadata({
    version: Version("1.2.3"),
    hash: SHA_256(b"\xde\xad\xbe\xef"),
  }),
  minimal: A(null),

  // Not allowed
  lowercase: aB(1),           // INVALID
  underscore: _Test(2),       // INVALID
  ends_with_hyphen: Foo-(3),  // INVALID
  sequence: X_-Y(4),          // INVALID
}

The root value may also contain an identifier.

duper

Items([
  "item1",
  "item2",
])

Values may not contain more than one identifier.

duper

{
  too_many: IpAddress(Ipv4Address("192.168.0.1"))  // INVALID
}

Identifiers are not allowed in object keys.

duper

{
  Wrong(use): null,         // INVALID
  Of("identifiers"): null,  // INVALID
}

Identifiers are optional and may be ignored by parsers, except when specifying one of the expected types for Temporal values.

Parsers should preserve identifier information on a best-effort basis. Deserializers may ignore identifiers, or use them for validation. Serializers may choose to output or omit identifiers, per the user's request.

Implementations are free to define their own identifiers with specific semantics. For example, in strongly-typed or OOP languages, serializers may use them as annotations for the underlying types.

Filename extension

Duper files should use the extension .duper.

MIME type

When transferring Duper files over the internet, the appropriate MIME type is application/duper.

Webservers should also accept the application/x-duper and application/json MIME types.

W3C-EBNF grammar

Also available as a railroad diagram.

ebnf

// Root rules, depending on the parser

DuperTrunk ::=
	 Identifier '(' ( Object | Array | Tuple ) ')'
	| Object
	| Array
	| Tuple
DuperValue ::=
	 IdentifiedValue
	| _Value

// Values

IdentifiedValue ::=
	 Identifier '(' _Value ')'
_Value ::=
	 Object
	| Array
	| Tuple
	| String
	| Bytes
	| Temporal
	| Float
	| Integer
	| Boolean
	| Null

Identifier ::=
	 [A-Z] ( [_-]? [a-zA-Z0-9] )*
	 /* ws: explicit */

Object ::=
	 '{' ( ObjectEntry ( ',' ObjectEntry )* ','? )? '}'
Array ::=
	 '[' ( DuperValue ( ',' DuperValue )* )? ','? ']'
Tuple ::=
	 '(' ( DuperValue ( ',' DuperValue )* )? ','? ')'
String ::=
	 QuotedString
	| RawString
Bytes ::=
	 QuotedBytes
	| RawBytes
	| Base64Bytes
Temporal ::=
	 "'"  TemporalContent "'"
	 /* ws: explicit */
Integer ::=
	 DecimalInteger
	| HexInteger
	| OctalInteger
	| BinaryInteger
Float ::=
	 DecimalInteger
	  ( ( '.' [0-9] ( '_'? [0-9] )* )? [eE] DecimalInteger
	   | '.' [0-9] ( '_'? [0-9] )* )
	 /* ws: explicit */
Boolean ::=
	 'true'
	| 'false'
Null ::=
	 'null'

// Objects

ObjectEntry ::=
	 ObjectKey ':' DuperValue
ObjectKey ::=
	 PlainKey
	| QuotedString
	| RawString
PlainKey ::=
	 ( '_' [a-zA-Z0-9] | [a-zA-Z] ) ( [_-]? [a-zA-Z0-9] )*
	 /* ws: explicit */

// Strings

QuotedString ::=
	 '"'  QuotedContent '"'
	 /* ws: explicit */
RawString ::=
	 'r' _RawInner
	 /* ws: explicit */

// Bytes

QuotedBytes ::=
	 'b"' QuotedContent '"'
	 /* ws: explicit */
RawBytes ::=
	 'br' _RawInner
	 /* ws: explicit */
Base64Bytes ::=
	 'b64"' Base64Content '"'
	 /* ws: explicit */

Base64Content ::=
	 ( [a-zA-Z0-9+/] | Whitespace )*
	 ( _Base64Padding _Base64Padding? )?
	 Whitespace*
	 /* ws: explicit */
_Base64Padding ::=
	 '=' Whitespace*
	 /* ws: explicit */

// Shared string / bytes

_RawInner ::=
	 ( '#' _RawInner '#' )
	 | ( '"' RawContent '"' )
RawContent ::=
	 [^#x00-#x09#x0B-#x1F#x7F]*
	 /* ws: explicit */

QuotedContent ::=
	 ( '\'
	  ( [0btnfr"\]
	   | 'x' _HexDigit _HexDigit
	   | 'u' _HexDigit _HexDigit _HexDigit _HexDigit
	   | 'U' _HexDigit _HexDigit _HexDigit _HexDigit
	         _HexDigit _HexDigit _HexDigit _HexDigit )
	  | [^"\] )*
	 /* ws: explicit */

// Temporal

TemporalContent ::=
	 Whitespace*
	 ( [^'] | [^#x09#x0A#x0D#x20] )
	 [^']+
	 ( [^'] | [^#x09#x0A#x0D#x20] )
	 Whitespace*
	 /* ws: explicit */

// Integer

DecimalInteger ::=
	 [+-]? ( [0-9] | [1-9] ( '_'? [0-9] )+ )
	 /* ws: explicit */
HexInteger ::=
	 '0x' _HexDigit ( '_'? _HexDigit )*
	 /* ws: explicit */
OctalInteger ::=
	 '0o' [0-7] ( '_'? [0-7] )*
	 /* ws: explicit */
BinaryInteger ::=
	 '0b' [01] ( '_'? [01] )*
	 /* ws: explicit */

_HexDigit ::=
	 [0-9a-fA-F]

// Comments and whitespace

LineComment ::=
	 ( '//' [^#x0A#x0D]* )
	 /* ws: explicit */
BlockComment ::=
	 ( '/*' [^*]* '*'+ ([^/*] [^*]* '*'+)* '/' )
	 /* ws: explicit */

Whitespace ::=
	 [#x09#x0A#x0D#x20]
Comment ::=
	 LineComment
	| BlockComment

Duper: The format that's super!

Preliminaries ​

Comments ​

Objects ​

Keys ​

Strings ​

Byte strings ​

Temporal values ​

Integers ​

Floats ​

Booleans ​

Null ​

Arrays ​

Tuples ​

Identifiers ​

Filename extension ​

MIME type ​

W3C-EBNF grammar ​

Preliminaries

Comments

Objects

Keys

Strings

Byte strings

Temporal values

Integers

Floats

Booleans

Null

Arrays

Tuples

Identifiers

Filename extension

MIME type

W3C-EBNF grammar