
Time Streams Day Archive Format

Complying with the Time Streams protocol is sufficient to implement a compatible server. In addition, there is a Time Streams Day Archive file format that is useful for file-based servers, or as a standard for importing and exporting archives.

The node.js implementation uses this as its on-disk storage format, and converters exist for Instagram, Pinboard, and Notabli.

Directory structure

The directory for a stream has a .timestream suffix. For example, a stream called statuses would exist at statuses.timestream.

Posts are organized in directories following the YYYY/MM/DD format based on the UTC day of the post.

File names

A post file name consists of an optional time prefix and a name, joined by a - character, followed by a file extension representing the content type. The time prefix is an ISO 8601 time fragment in UTC, formatted HHmmssZ without punctuation (the "basic" form).

For example, a text post generated at Wed, 24 Jun 2020 05:12:53 GMT with the name hi would be stored at 2020/06/24/051253Z-hi.txt.
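The path rule above can be sketched in a few lines of Python. This is an illustration rather than the reference implementation; the helper name post_path and its parameters are hypothetical, and it assumes the caller passes a timezone-aware datetime.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def post_path(created: datetime, name: str, ext: str) -> PurePosixPath:
    """Return the stream-relative path for a post (hypothetical helper)."""
    # Convert to UTC first so the day directory and the time prefix agree.
    # Assumes `created` is timezone-aware; a naive value would be read as local time.
    utc = created.astimezone(timezone.utc)
    day_dir = utc.strftime("%Y/%m/%d")       # YYYY/MM/DD
    time_prefix = utc.strftime("%H%M%SZ")    # HHmmssZ, ISO 8601 basic form
    return PurePosixPath(day_dir) / f"{time_prefix}-{name}.{ext}"


example = post_path(datetime(2020, 6, 24, 5, 12, 53, tzinfo=timezone.utc), "hi", "txt")
print(example)  # 2020/06/24/051253Z-hi.txt
```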

If no time is present in the filename, the time fragment is assumed to be 000000Z. The store should sort posts first by time, and then by name in lexicographic order. Time Stream data stores should generate a post name if none is provided.
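One way to express this ordering is as a sort key, sketched below. The function name sort_key is hypothetical, and the sketch assumes that the "name" used for tie-breaking is the remainder of the file name, extension included. Because the time fragment is fixed-width, comparing it as a string gives chronological order.

```python
import re

# Optional HHmmssZ prefix followed by "-", then the rest of the file name.
_TIME_PREFIX = re.compile(r"^(\d{6}Z)-(.+)$")


def sort_key(filename: str) -> tuple[str, str]:
    """Return (time fragment, name) for ordering posts within one day directory."""
    match = _TIME_PREFIX.match(filename)
    if match:
        return match.group(1), match.group(2)
    # No time prefix: treat the post as created at 000000Z.
    return "000000Z", filename


files = ["051253Z-hi.txt", "02-second.md", "01-first.md", "000000Z-zzz.txt"]
print(sorted(files, key=sort_key))
# ['01-first.md', '02-second.md', '000000Z-zzz.txt', '051253Z-hi.txt']
```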

Related files

Related files are stored with the format [main post id]$[relation].[ext]. For example, a metadata file describing 01-first-post.md could be stored at 01-first-post.md$describedby.json.

Related files should not be returned in regular previous-style traversal or by before date queries. They should be surfaced via Link headers on the primary post, and can be fetched directly by id.
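A minimal sketch of how a file-based store might tell primary posts from related files follows. The names RelatedFile, parse_related, and primary_posts are hypothetical, and the sketch assumes that post ids never contain a $ character, so the first $ always marks the start of the relation.

```python
from typing import NamedTuple, Optional


class RelatedFile(NamedTuple):
    post_id: str   # file name of the primary post, e.g. "01-first-post.md"
    relation: str  # link relation, e.g. "describedby"
    ext: str       # everything after the relation, e.g. "json"


def parse_related(filename: str) -> Optional[RelatedFile]:
    """Split "[main post id]$[relation].[ext]"; return None for a primary post."""
    post_id, sep, rest = filename.partition("$")
    if not sep:
        return None
    relation, _, ext = rest.partition(".")
    return RelatedFile(post_id, relation, ext)


def primary_posts(filenames: list[str]) -> list[str]:
    """Keep only the files that belong in regular traversal and date queries."""
    return [name for name in filenames if parse_related(name) is None]


listing = ["01-first-post.md", "01-first-post.md$describedby.json"]
print(primary_posts(listing))  # ['01-first-post.md']
```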

Attributes

Additional attributes for any given post or related file are stored in a .attributes file, with a string matching the attributes portion of a Link header entry. For the post itself, the suffix $self.attributes is used.

For example, an image file stored at 2020/07/20/my-photo.jpg could have a corresponding 2020/07/20/my-photo.jpg$self.attributes file with content title="San Francisco cityscape with a view of Sutro Tower". When fetched via HTTP, the response would have a Link header containing <post-id>; rel=self; title="San Francisco cityscape with a view of Sutro Tower".
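The sketch below shows how the content of an .attributes file might be spliced into a Link header entry. The function name link_header_value is hypothetical. It joins parameters with semicolons, which is the separator the Web Linking RFC defines for parameters within a single link, and it copies the attributes string verbatim rather than parsing or validating it.

```python
def link_header_value(target: str, rel: str, attributes: str = "") -> str:
    """Format one Link header entry, appending raw .attributes content if present."""
    parts = [f"<{target}>", f"rel={rel}"]
    # Files edited by hand often end with a newline; strip surrounding whitespace.
    attributes = attributes.strip()
    if attributes:
        parts.append(attributes)
    return "; ".join(parts)


attrs = 'title="San Francisco cityscape with a view of Sutro Tower"\n'
print(link_header_value("post-id", "self", attrs))
# <post-id>; rel=self; title="San Francisco cityscape with a view of Sutro Tower"
```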

Among other things, this gives us the option to specify a post title without having to load a separate URL. It can also be used to give standard relations (like describedby) more user-friendly labels for display, like description. The Web Linking RFC (RFC 8288) includes a method for specifying different character sets and languages.

Example

In combination, these rules lend themselves well to managing day-resolution streams by hand. For instance, if you have a blog stored as a time stream, you could structure it like this:

blog.timestream/
  2020/
    06/
      23/
        01-first-post.md
        01-first-post.md$self.attributes
        01-first-post.md$describedby.json
        01-first-post.md$describedby.json.attributes
        02-second-post-on-the-first-day.md
      24/
        01-third-post.md

On the other end of the spectrum, if it's possible that multiple files are generated per second—and if the ordering within a single second is meaningful—then the store should generate names that sort properly. (ulid is a nice example of a name generation technique with the desired properties.)

In that case, the directory structure might look something like this:

posts.timestream/
  2020/
    06/
      24/
        051121Z-01EBJBSPBA3BAMAHS9RQDDK83G.txt
        051121Z-01EBJBSPBA3BAMAHS9RQDDK83H.txt
        051121Z-01EBJBSPBA3BAMAHS9RQDDK83I.txt
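To illustrate the property that matters here (names whose lexicographic order matches creation order), the following is a deliberately simple sketch. It is not a ULID implementation: it emits a fixed-width millisecond counter that never moves backwards. The class name SortableNameGenerator is hypothetical, and the sketch assumes a single process and a single thread.

```python
import time


class SortableNameGenerator:
    """Generate unique names that sort lexicographically in creation order."""

    def __init__(self) -> None:
        self._last = 0

    def next_name(self) -> str:
        now_ms = time.time_ns() // 1_000_000
        # Never repeat or go backwards, even if the wall clock does or if
        # several names are requested within the same millisecond.
        self._last = max(now_ms, self._last + 1)
        # Fixed width keeps string order identical to numeric order.
        return f"{self._last:013d}"


generator = SortableNameGenerator()
names = [generator.next_name() for _ in range(1000)]
print(names == sorted(names))  # True
```

A store that needs ordering across processes or machines would need a stronger scheme, which is where a library such as ulid becomes attractive.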