A little corner of the Empire on the web.

14 April, 2003

OK, I've just spent the last few days converting jardBRAIN so that it can export an RSS feed. This involved rewriting and generalising some of the internals, which wasn't such a bad thing: I'd been meaning to do it for ages anyway and just needed some external stimulus to give me a good reason (and a boot up the jacksie).

So far all's going well. I settled on RSS version 2.0 in the end, created a simple test template and looked everything up in the spec (RSS v2.0 spec). Anything that wasn't clear there I looked up in the previous specs, even going as far back as 1982's RFC 822 to find the date/time format for various elements.
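RSS 2.0 wants dates such as pubDate and lastBuildDate in RFC 822 format. As a minimal sketch (in Python purely for illustration; this isn't jardBRAIN's own code), the standard library's email.utils.formatdate already produces that format. The timestamp below is an assumed example value for midnight UTC on 14 April 2003:

```python
from email.utils import formatdate

def rfc822_date(timestamp):
    """Return an RFC 822-style date string in GMT for a Unix timestamp."""
    # usegmt=True emits the literal zone name "GMT" instead of "-0000".
    return formatdate(timeval=timestamp, usegmt=True)

print(rfc822_date(1050278400))  # Mon, 14 Apr 2003 00:00:00 GMT
```

This uses a four-digit year; the RSS 2.0 spec allows two or four digits but prefers four.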

Unfortunately, at the end of all this I still have questions about two ambiguous properties:

  1. What content type (MIME type) do I serve my file as?
  2. How do I use the GUID field?

Taking these one at a time:

After a bit of research on the 'net, I found three content types in common use for RSS files: text/html, text/xml and application/rss+xml. Of those, the first is obviously wrong, the second fits in a general sense, and the third seems right (it's recommended and used by Mark Pilgrim). So I think I'll go for application/rss+xml, but that's not quite settled yet.
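Whichever type wins, the choice comes down to the Content-Type header the server sends with the feed. A minimal sketch, assuming a hypothetical Python WSGI setup (jardBRAIN's real serving code isn't shown here) and a placeholder feed body:

```python
# Placeholder feed body; a real channel also needs link and description elements.
FEED_XML = b"""<?xml version="1.0"?>
<rss version="2.0"><channel><title>jardBRAIN</title></channel></rss>"""

def feed_app(environ, start_response):
    """Hypothetical WSGI app that serves the feed as application/rss+xml."""
    headers = [
        ("Content-Type", "application/rss+xml; charset=utf-8"),
        ("Content-Length", str(len(FEED_XML))),
    ]
    start_response("200 OK", headers)
    return [FEED_XML]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, feed_app).serve_forever()` would serve it on port 8000; falling back to text/xml is a one-string change.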

The answer to the second question is less clear, at least to a bear of little brain like myself.

From the RSS v2.0 official specification:

guid stands for globally unique identifier. It's a string that uniquely identifies the item. When present, an aggregator may choose to use this string to determine if an item is new.…
If the guid element has an attribute named "isPermaLink" with a value of true, the reader may assume that it is a permalink to the item, that is, a url that can be opened in a Web browser, that points to the full item described by the element.

and:

A frequently asked question about guids is how do they compare to links. Aren't they the same thing? Yes, in some content systems, and no in others. In some systems, link is a permalink to a weblog item. However, in other systems, each item is a synopsis of a longer article, link points to the article, and guid is the permalink to the weblog entry. In all cases, it's recommended that you provide the guid, and if possible make it a permalink. This enables aggregators to not repeat items, even if there have been editing changes.
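To make the guid/isPermaLink distinction concrete, here is a sketch of building an RSS 2.0 item with Python's standard xml.etree.ElementTree. The URLs and the make_item helper are hypothetical placeholders, not part of any real feed:

```python
import xml.etree.ElementTree as ET

def make_item(title, link, guid, is_permalink=True):
    """Build an RSS 2.0 <item> whose <guid> says whether it is a browsable URL."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    guid_el = ET.SubElement(item, "guid")
    guid_el.text = guid
    # isPermaLink="true" tells readers the guid can be opened in a Web browser.
    guid_el.set("isPermaLink", "true" if is_permalink else "false")
    return item

# Hypothetical case where link and guid are the same permalink.
item = make_item("RSS and GUIDs",
                 "http://example.com/brain/20030414",
                 "http://example.com/brain/20030414")
print(ET.tostring(item, encoding="unicode"))
```

The spec gives isPermaLink a default of true, so a guid that isn't a URL should carry isPermaLink="false" explicitly.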

And from UserLand's backend pages, in "Guids are not just for geeks anymore":

Aggregators and readers can use the guid in one or two ways:

1. To determine if an item is new or not, allowing the authors of weblogs to make minor editing changes without making all their readers figure out if a post is new or not.

2. If it's a permalink, make it easy for the reader to go directly to the item on the Web. This is cool for people who want to quickly include the link in their weblog.

The second feature is a nice convenience, the first, imho should be a feature of all aggregators, readers and content systems.
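The first use in that list, spotting whether an item is new, amounts to an aggregator remembering which guids it has already shown. A minimal sketch, assuming items arrive as Python dicts with a "guid" key (a hypothetical representation, not any particular aggregator's):

```python
def new_items(items, seen_guids):
    """Return only items whose guid is unseen, recording each new guid in seen_guids."""
    fresh = []
    for item in items:
        guid = item["guid"]
        if guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append(item)
    return fresh

seen = set()
first = new_items([{"guid": "a1", "title": "Hello"}], seen)
# Same guid, corrected title: not reported as new, so readers aren't shown it twice.
second = new_items([{"guid": "a1", "title": "Hello (typo fixed)"}], seen)
print(len(first), len(second))  # 1 0
```

A real aggregator would persist the seen set between fetches; an in-memory set keeps the sketch short.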

OK, that's all well and good, and I can see the reasoning behind it: people often take an existing entry, edit it to correct the spelling and then republish it to the same URL. My question is: which way round does this work? Is the GUID there to identify the one article through all of its changes, so that even if other things change the GUID stays the same (like the addeddate ID used in jardBRAIN's permalinks)? Or is it there to say that although this looks like the same article that was here ten minutes ago, it's actually had its wording changed, or some offensive phrase edited out, so the GUID changes (like my full recordid, or possibly something like an MD5 digest)?
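The two readings can be put side by side in a short, hypothetical Python sketch. The permalink base URL and the addeddate string format are assumptions standing in for jardBRAIN's real values, and an MD5 digest over title and body is just one way of fingerprinting content:

```python
import hashlib

# Hypothetical permalink base; jardBRAIN's real URL scheme isn't shown here.
PERMALINK_BASE = "http://example.com/brain/"

def stable_guid(added_date):
    """Option 1: guid fixed at creation time (e.g. from addeddate), unchanged by edits."""
    return PERMALINK_BASE + added_date

def content_guid(title, body):
    """Option 2: guid derived from the content (MD5 digest), so any edit yields a new guid."""
    digest = hashlib.md5((title + "\n" + body).encode("utf-8")).hexdigest()
    return "md5:" + digest

original = ("RSS and GUIDs", "Some text with a speling mistake.")
edited = ("RSS and GUIDs", "Some text with a spelling mistake.")
added = "20030414120000"  # hypothetical addeddate value

print(stable_guid(added))  # identical before and after the edit
print(content_guid(*original) == content_guid(*edited))  # False: the edit changed the guid
```

An aggregator that remembers guids it has already seen would treat the corrected post as old under the first scheme and as brand new under the second.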

Maybe I'm reading too much into all this, but I just can't get my head around this one.