RELOAD

From OHRRPGCE-Wiki
Jump to: navigation, search

The format of many older lumps is, in many cases, a kludge. RELOAD is new generic binary serialisation format to replace existing data lumps with marked up data, enabling backwards compatibility without the need for a data upgrade procedure, and allowing semantic access to data. It is used for most new RPG lumps, with old lump file formats being gradually replaced with RELOAD-based ones.

Several formats were considered, and I (Mike C.) decided that none were satisfactory for this purpose. So, I came up with the idea of RELOAD.

Really Efficient List Of Arbitrary Data (an alternative acronym was VLOAD, or Variable List Of Arbitrary Data)

The format is that of a hierarchical tree. The file format is essentially a root node that contains everything in the file. Children can be nested to an arbitrary degree, similar to other formats such as XML or JSON, without some of the overhead of a text-based encoding.

To be sure, there is overhead, but the benefit of being able to ask for, say, the "text box's text" instead of "the data at offset 32" makes this solution superior to a flat data file.

RELOAD file format[edit]

A RELOAD file consists of three parts: the header, the body, and the string table.

File Header[edit]

About Formal Specs

Data Meaning
BYTE * 4 Magic word "RELD"
BYTE Version number (currently 1)
LONG Header size (altogether, should be 13)
LONG String table position (from beginning of file)

Body[edit]

The body is composed of a single element, known as the "root" element. The root contains all the data in the file, as children. Theoretically, the root could be a string element, with a simple payload, but in practise, it will always be a "container" element.

An Element[edit]

Data Meaning
LONG Total size of content, not including this LONG
VLI Tag name, stored in the string table
BYTE Type of element. See below for a list.
??? Data. This data is wholly dependent on the Type LONG. See below for more details.
VLI Number of children
??? each child element, back to back (If number of children > 0)

Children are whole elements said to be "contained" by this one, and have the same structure (just, nested).

Note that the OHRRPGCE does not preserve the difference between different integer data types when loading a document,

NOTE: The tag name should not start with the '@' character. This is reserved for "attributes" in documents converted from XML.

There are no specifications for what is a valid tag name; however the empty string is definitely valid. Currently, all RELOAD documents use tag names matched by [_a-z]*

The data chunk can be any of the following types.

Element Type 0 (null)[edit]

A null element has no data. It may be used as a flag or a container.

Element Type 1 (8-bit integer)[edit]

Data Meaning
BYTE The integer to be stored.

Element Type 2 (16-bit integer)[edit]

Data Meaning
INT The integer to be stored.

Element Type 3 (32-bit integer)[edit]

Data Meaning
LONG The integer to be stored.

Element Type 4 (64-bit integer)[edit]

Data Meaning
INT64 The integer to be stored.

Element Type 5 (double precision float)[edit]

Data Meaning
DOUBLE The float to be stored.

Element Type 6 (string, inline)[edit]

This can be used to store a string, or a blob of data (possibly containing NUL bytes). It doesn't make much of a difference to RELOAD.

Data Meaning
VLI Size of string
BYTE * size The string data

String Table[edit]

Element names are stored in a string table, to streamline the data itself. The table is, generally, appended after the body. Index zero of the string table is always the empty string, however it is not written to file. The first string in the on-disk table is string 1. The table has this format:

Data Meaning
VLI number of strings written to file (== size of the string table - 1)
??? string data, for table index 1 and above

The string data consists of a number of records with this format:

Data Meaning
VLI Size of string
BYTE * size string data

VLI (Variable Length Integer)[edit]

VLI (variable length integer) is a special format for storing signed integers which, obviously, can vary in length. By breaking the integer to be stored into 6 or 7 bit chunks, and using the remaining bits as control flags, we can effectively save bytes for integers that are typically small, but have potential to be quite large.

First, if the value is negative, take the bitwise complement, so that -1 becomes 0, -2 becomes 1, etc. The complement is then encoded. NOTE: RELOAD version 1 documents do not contain negative VLI numbers. They might be used in future.

The first byte in a stored VLI value looks like this:

 abxxxxxx

The low 6 bits are the low 6 bits of the integer. 'a' is a flag, indicating whether there are more bytes. a==0 means this is the last byte. 'b' is a flag indicating whether the VLI is negative. 0 means that it is positive.

Subsequent bytes are similar, but don't have negative flags, so they look like this:

 axxxxxxx

'a', again, is the continuation flag, while 'x' are the next-higher seven bits.

Stop adding continuation bytes once all 1 bits in the value have been output.

Example:

Let's read this value:

 0x83 0x01

The first byte looks like this:

 10000011

The first bit is set, so there's another byte. The low 6 bits are the number 3. Let's remember that.

The next byte looks like:

00000001

The first bit is clear, so obviously this is the last byte. So, if we shift the rest of the bits up by 6 (to make room for the first 6 bits we read), we get 64. Adding in those other bits, we get our final total, 67.

NOTE: The current implementation of this algorithm is limited to an unencoded 64-bits (which are encoded into 10 bytes). This is a platform limitation (the compiler does not support any integer data-types greater than 64-bit).

Plus, the overhead of the continuation flag has already chewed up 10 bits at this point, so it would be more efficient to use another encoding for these hypothetical really-big integers.

More examples[edit]

10000000 00000001 (128 1) is 64
01111111 (127) is -64
11000000 00000001 (192 1) is -65
11111101 11111011 00000001 (253 251 1) is -16126
11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 00000001 is -2^63

Implementations[edit]

  • The original implementation in FreeBasic (reload.bas and reload.bi) is the most full-featured.
    • RELOADBasic is an extension of FreeBASIC which adds some DSL (domain specific language) features for simpler parsing of RELOAD-based data formats. It is translated to FB by the reloadbasic/reloadbasic.py preprocessor. Documentation is kept in reloadbasic/RELOADbasic.txt
  • nohrio has a complete implementation in Python.
  • HamsterWrench has an implementation in Java, with a viewer available for download.

Utilities[edit]

reload2xml and xml2reload convert a RELOAD document to an XML format and back, using base64 for binary blobs, converting @-name-prefixed node to/from attributes, and preserving leading and trailing whitespace on strings. Null-named nodes are also supported. However, exact types of nodes may be lost unless the --debug option is passed to reload2xml.

reloadutil can compare two documents and print differences.

OHRRPGCE Implementation details[edit]

See RPG Format or Special:WhatLinksHere/RELOAD for RELOAD-based file formats.

All integer node types are converted to the single 64-bit integer data type on loading a document, and the smallest possible integer type is used on saving.

Most of the RELOAD API ignores node types, converting as necessary. For example, calling GetString on an integer causes a string representation of the integer to be returned, while GetInteger on a string causes it to be parsed as a decimal number. While currently we always read and write nodes as the same data type, this might not be true in future, and converting a document to XML and back will cause strings and floating point numbers that look like integers to be converted into integers.

To be done[edit]

A TODO list for the RELOAD related code, in order of priority (row-major order)

  • Implement attributes in xml2reload
    • attributes are ordinary nodes with names that start with "@"
    • enumerate attributes when parsing an element? (for real this time)
    • need to include attributes when going back to XML
    • consensus seemed to be against needing support for type hints
  • Finish API
    • need content getting functions (GetString, GetInteger, etc)
    • need functions for moving branches around (or, do we?)
    • need navigation functions (GetChildByName, GetChildrenByName (see below))
  • Implement NodeList
    • which is like "search results", in that it stores a list of nodes not directly related to each other
    • will be returned from functions like GetChildrenByName and the RPath functions.
  • Implement RPath (RELOAD query syntax)
    • it will bear a surprising resemblance to XPath ("/root/mynode[@id=1]/foo")
    • the first "/" is the root. if omitted, navigate relative to here.
    • selectors ("[@id=1]") match based on the contents of child nodes.
      • since attributes already start with "@", then we can treat everything as ordinary nodes
    • other important features: ".." - up a level, "*" - everything
  • Import RELOAD documents to RPGs/Plotscripting interface
    • one should be able to store as many RELOAD documents as they want, and ideally, they would be combined into one single document internally.
    • RELOAD document(), RELOAD get child by name(), RELOAD get string(), RELOAD get integer(), etc etc.
      • Plotscripting should have as much access as the native API
    • allow documents to be saved in the save file?


RPath Syntax[edit]

Note: Work on this is not quite complete?

All RPath searches are performed on what is referred to as the "Context node". The Context node has two purposes:

  1. The document specified by the Context node is the one that will be searched, and
  2. If the RPath query is relative, the search will be performed with the Context node as the root (i.e., all returned nodes will be children of the Context node). If the query is absolute, the Context node is not used for this purpose.

The Context node cannot be null, as a Document must be specified.

Relative path[edit]

node1

not done

This query will select all nodes under the Context node named "node1" without any other constraints.

not done

node1/node2

This query will select all nodes under the Context node named "node2" which directly descend from a node named "node1"

not done

node1//node2

This query will select all nodes under the Context node named "node2" which descend, directly or indirectly, from a node named "node1"

not done

node1[foo=1]

This query will select all nodes under the Context node named "node1" which have a child named "foo" which has a value equivalent to 1.

not done

node1[bar=2]/node2[foo=1]

This query will select all nodes under the Context node named "node2", which meet the following conditions:

  • They descend from a node named "node1"
  • The "node1" which they descend from must also have another child named "bar", which has a value equivalent to 2
  • They also have a child of their own named "foo", which has a value equivalent to 1.

That's enough for now