RDF serialization to s-expressions from Evan Prodromou

This document proposes a serialization of RDF graphs as S-expressions.

Design goals

Nodes

A node is represented by a symbol containing a URI.

 |http://www.example.org/bob|

A namespaced node is a pair of a local part and a namespace abbreviation, in that order. Both are symbols:

 (subject . dc)
 (name . foaf)

If the namespace abbreviation is null, it stands for the default namespace:

 (bob)

Blank nodes

Blank nodes are represented by a list of one or more predicate-object lists.

 (((creator . dc) "Evan Prodromou")
  ((title . dc) "S-expression RDF serialization"))

Note that the parts of a blank node are proper lists, while namespaced nodes are improper lists.
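
One way to read this form is that each predicate-object list contributes a statement about the same anonymous subject. The Python sketch below illustrates that reading. The tuple model of a blank node, the helper name flatten_blank_node, and the "_:b0" style labels (borrowed from N-Triples) are assumptions made for illustration, not part of the proposal.

```python
import itertools

# Hypothetical label generator; the proposal itself does not name blank nodes.
_labels = itertools.count()

def flatten_blank_node(pairs):
    """Turn predicate-object pairs into statements about one fresh subject.

    Statements are returned in the proposal's (predicate, subject, object) order.
    """
    label = f"_:b{next(_labels)}"
    return label, [(predicate, label, obj) for predicate, obj in pairs]

DC = "http://purl.org/dc/elements/1.1/"
label, statements = flatten_blank_node([
    (DC + "creator", "Evan Prodromou"),
    (DC + "title", "S-expression RDF serialization"),
])
for statement in statements:
    print(statement)
```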

Literals

Literals are strings.

 "foo"
 "bar"
 "bletch"

Typed literals

Typed literals are pairs of a string plus a type. The type is a symbol.

 ("1999-08-16" . |http://www.w3.org/2001/XMLSchema#date|)

Types can be abbreviated like node URIs:

 ("1999-08-16" . (date . xsd))

Statements

Statements are lists with three elements, representing the predicate, subject, and object respectively.

 (|http://www.example.org/loves| |http://www.example.org/bob| |http://www.example.org/fishing|)

If there's a default namespace, this could be said more compactly as:

 ((loves) (bob) (fishing))
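
Because the predicate comes first, converting a statement to a subject-first format means reordering its elements. The Python sketch below renders a statement as an N-Triples line. It assumes all three elements are full URIs held as plain strings, and to_ntriples is a hypothetical helper name.

```python
def to_ntriples(statement):
    """Render a (predicate, subject, object) statement of URIs as N-Triples."""
    predicate, subject, obj = statement
    # N-Triples puts the subject first, so the first two elements swap places.
    return f"<{subject}> <{predicate}> <{obj}> ."

statement = (
    "http://www.example.org/loves",
    "http://www.example.org/bob",
    "http://www.example.org/fishing",
)
line = to_ntriples(statement)
print(line)
```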

Graphs

A graph is a list of statements:

 (((loves . ex) (bob . ex) (fishing . ex))
  ((creator . dc) |http://www.example.com/| "Bob Reynolds"))

Graphs can be abbreviated in three ways. First, multiple statements with the same predicate and subject can be combined into a single list. So this graph:

 (((loves) (bob) (fishing))
  ((loves) (bob) (databases))
  ((loves) (bob) (ice-skating)))

could be abbreviated as:

 (((loves) (bob) (fishing) (databases) (ice-skating)))

Second, multiple statements with the same predicate can be combined into a single list of subject-object pairs, so that:

 (((met) (harry) (sally))
  ((met) (randolph) (his-doom)))

becomes:

 (((met) ((harry) (sally))
         ((randolph) (his-doom))))
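
Both abbreviations can be undone mechanically. The Python sketch below expands either form back into plain three-element statements. It assumes nodes are modelled as plain strings, so that a tuple in second position can only be a subject-object pair. The name expand_statement is hypothetical.

```python
def expand_statement(statement):
    """Expand one possibly abbreviated statement into canonical triples.

    Assumes nodes are plain strings, so a tuple in second position can only
    be a (subject, object) pair from the shared-predicate form.
    """
    predicate, *rest = statement
    if rest and isinstance(rest[0], tuple):
        # Shared predicate: (predicate (subject object) (subject object) ...)
        return [(predicate, subject, obj) for subject, obj in rest]
    # Shared predicate and subject: (predicate subject object1 object2 ...)
    subject, *objects = rest
    return [(predicate, subject, obj) for obj in objects]

shared_subject = ("loves", "bob", "fishing", "databases", "ice-skating")
shared_predicate = ("met", ("harry", "sally"), ("randolph", "his-doom"))

triples_a = expand_statement(shared_subject)
triples_b = expand_statement(shared_predicate)
print(triples_a)
print(triples_b)
```

In the actual syntax, nodes such as (harry) are themselves lists, so telling the two forms apart would take more care than the type check used here.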

Finally, a graph can use namespaces to abbreviate nodes. For each namespace, the graph must contain a special statement, "@prefix". With two arguments, the statement maps a namespace abbreviation to a URI prefix. With one argument, it makes that URI prefix the default namespace.

 ((@prefix "dc" "http://purl.org/dc/elements/1.1/")
  (@prefix "http://evan.prodromou.name/")
  ((creator . dc) (Rdf_serialization_to_s-expressions) "Evan Prodromou"))
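
The Python sketch below shows how a reader might apply "@prefix" statements to the example graph above. The QName tuple standing in for a dotted pair, the function name resolve_graph, and the rule that a node expands by concatenating prefix and local part are all assumptions; the proposal does not spell out the expansion rule.

```python
from collections import namedtuple

# Hypothetical stand-in for the dotted pair (local . abbreviation);
# ns=None models the null abbreviation, i.e. the default namespace.
QName = namedtuple("QName", ["local", "ns"])

def resolve_graph(graph):
    """Apply @prefix statements and return the statements with full URIs."""
    prefixes, default, statements = {}, None, []
    for item in graph:
        if item[0] == "@prefix":
            if len(item) == 3:
                prefixes[item[1]] = item[2]  # abbreviation -> URI prefix
            else:
                default = item[1]            # one argument: default namespace
        else:
            statements.append(item)

    def expand(term):
        if not isinstance(term, QName):
            return term                      # literal or full URI: unchanged
        base = default if term.ns is None else prefixes[term.ns]
        if base is None:
            raise ValueError("no default namespace declared")
        # Assumption: expansion is plain concatenation of prefix and local part.
        return base + term.local

    return [tuple(expand(term) for term in st) for st in statements]

graph = [
    ("@prefix", "dc", "http://purl.org/dc/elements/1.1/"),
    ("@prefix", "http://evan.prodromou.name/"),
    (QName("creator", "dc"),
     QName("Rdf_serialization_to_s-expressions", None),
     "Evan Prodromou"),
]
resolved = resolve_graph(graph)
print(resolved)
```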

Canonical version

The canonical version of a serialized graph uses none of the above abbreviations.

Open questions

  1. There would be some utility in representing numeric typed literals directly as s-expression integers and floats.
  2. Deal with RDF collections, especially lists!
  3. Make sure things are unambiguous. Blank node syntax is especially scary.
  4. Prefix notation is lovely and Lisp-y, but it makes lists of properties of a single subject kind of a hassle.
  5. Postfix notation for namespace abbreviations makes for easy-to-read versions of stuff in the default namespace, but it is the opposite of "normal" namespacing order.
  6. Compare with KIF.
