RSS and Atom notes

📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.

RSS

Note that RSS communicates recent items only. The serving side often limits the number of items shown, and RSS clients may add a further limit.


Versions

The history of RSS is a mess. Seriously. The things you should know probably include

  • Most anything is incompatible with everything else in the strict sense.
  • A lot of parsers in the wild are written to be forgiving so that they can easily support most alternatives, as well as some common abuse of each.
  • The more official versions are, roughly, Netscape's 0.90 and RSS-DEV's 1.0
  • Userland's RSS 2.0 sees a bunch of use, probably because it is less work to generate than 1.0.
  • There are variations in what text in the RSS file in general may contain, particularly in terms of:
    • named entities, since strict XML only has five and not the dozens that the varying HTML standards have. Numeric entities are always allowed by XML.
    • whether the description may or may not contain HTML elements, and how to put it in there.
    • A few specific-purpose tags and their prescribed contents.
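The named-entity problem above can be sidestepped by rewriting HTML-only entities as numeric character references before the text goes into a feed. A minimal sketch in Python, assuming the entity names in the standard library's HTML 4 table are enough for your content; the helper name to_numeric_entities is made up for this example:

```python
import re
from html.entities import name2codepoint

# The five entities that XML itself predefines; these can stay as they are.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Replace HTML-only named entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_ENTITIES or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(replace, text)


print(to_numeric_entities("caf&eacute; &amp; cr&egrave;me"))
```

Unknown names are left alone rather than guessed at, so a strict XML parser may still reject them; whether to drop or report those is a choice this sketch does not make.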


More detail (skip this if lazy):

  • Earlier versions use RDF (0.90 only in the header, 1.0 widely), recent versions are simpler XML (though not always strictly valid XML due to people dumping HTML and its entities in the description fields verbatim)
  • There are many unofficial variations, primarily from Dave Winer while at Userland, such as the same-named but incompatible variation of 0.91, his 0.92, 0.93, 0.94, 2.0, two different versions of 2.01, and his scriptingNews format which predates RSS.
  • RSS aggregators tend to be resilient to the incompatibilities between the versions, largely because they have to be to actually work acceptably.
  • Atom is a competing format. Various sites offer it alongside RSS, and most anything will read it, partly because it is better standardized and less bothersome than RSS.
  • There seems to be a recent/upcoming RSS 3
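As a rough illustration of how a reader might tell the families above apart, the following Python sketch looks only at the root element. It assumes the feed is well-formed enough for a strict XML parser, which (as noted) is not a given in the wild, and the function name feed_flavor is invented for this example:

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
ATOM_ROOT = "{http://www.w3.org/2005/Atom}feed"


def feed_flavor(xml_text):
    """Guess the feed family from the document's root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # The 0.91 to 2.0 line: plain XML with a version attribute.
        return "rss " + root.get("version", "unknown")
    if root.tag == RDF_ROOT:
        # The RDF-based versions (0.90 and 1.0) share this root element.
        return "rss (RDF-based)"
    if root.tag == ATOM_ROOT:
        return "atom"
    return "unknown"


print(feed_flavor('<rss version="2.0"><channel/></rss>'))
```

A forgiving aggregator would go further, for example by falling back to a lenient parser when this step raises a parse error.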


See also:

RSS2 Formatting

Basics

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A channel regularly has:

  • a title
  • a link to the browsable content it's serializing
  • a description

Items regularly have:

  • a title
  • an author - usually the email address of the item's author
  • a description
  • a pubDate

Items sometimes have:

  • a guid - the identity of the item; often used to determine whether a post is new. A URL is one easy way to create globally unique identifiers, but it doesn't have to be a URL (see below)
  • a link to a comments page (as for blogs; may be redundant with link and guid permalink in some systems)
  • category - taxonomic. May have a specified domain
  • source (source RSS channel)


pubDate format

Note that aggregators may choose not to show a post while its date lies in the future. The format is RFC 822/1123 style (see also common date formats):

Tue, 10 Jun 2003 04:00:00 GMT

In strftime terms:

%a, %d %b %Y %H:%M:%S GMT

...after you adjust the time to GMT (RSS clients should be able to deal with other timezones, or with timezones at all, though not all do).
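In Python, one way to produce this format is the standard library's email.utils.format_datetime, which always writes English day and month names (strftime's %a and %b follow the current locale, so they can come out translated). A small sketch; the helper name to_pubdate and the choice to treat naive datetimes as UTC are assumptions made for this example:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def to_pubdate(dt):
    """Format a datetime as an RSS pubDate string in GMT."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert to UTC first; usegmt=True writes the zone as "GMT".
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


local = datetime(2003, 6, 10, 6, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_pubdate(local))
```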

On guid and link

The GUID is optional, but useful as a strict way of keeping track of items, particularly when entries can be edited.

It should be seen as a string. When you set isPermaLink="false" it can be entirely freeform and shouldn't be assumed to be a URI. When you set isPermaLink="true", or omit the attribute (the RSS 2.0 specification gives its default as true), it can be assumed to be both an identifier and a permalink URI to the item.

There are a few situations in which it makes sense to have both a link and a permalink guid at the same time.

However, readers will assume one of the two is the 'main' link when linking to the item -- probably overriding guid with link (verify).
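A reader-side sketch of these rules in Python, under two assumptions that are this example's own: link wins over a permalink guid when both are present, and link doubles as the identity when there is no guid. The function name identity_and_url is hypothetical.

```python
import xml.etree.ElementTree as ET


def identity_and_url(item):
    """Return (identity, url) for an RSS 2.0 <item> element."""
    guid = item.find("guid")
    guid_text = (guid.text or "").strip() if guid is not None else ""
    link = (item.findtext("link") or "").strip()

    # A guid counts as a URL unless it is explicitly marked isPermaLink="false".
    guid_is_url = bool(guid_text) and guid.get("isPermaLink", "true") != "false"

    # Assumption: fall back to link as the identity when there is no guid.
    identity = guid_text or link
    # Assumption: prefer link as the 'main' URL, then a permalink guid.
    url = link or (guid_text if guid_is_url else "")
    return identity, url


item = ET.fromstring(
    '<item><guid isPermaLink="false">post-146</guid>'
    '<link>http://example.com/blog?id=146</link></item>'
)
print(identity_and_url(item))
```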


HTML in description, title

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

HTML is not necessarily well-formed XML, so it cannot simply be pasted into a feed. If you want images or styling, pick an RSS variant that supports markup in these fields and follow its rules for embedding it.

RSS2.0:
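One common approach for RSS 2.0 is to entity-encode the HTML, so that the description holds the markup as escaped text and the feed itself stays well-formed. A minimal sketch in Python, relying on ElementTree escaping text when it serializes; the helper name description_xml is made up for this example:

```python
import xml.etree.ElementTree as ET


def description_xml(html_fragment):
    """Serialize an HTML fragment as an entity-encoded <description>."""
    element = ET.Element("description")
    # ElementTree escapes &, < and > when writing text, so the HTML
    # ends up in the feed as plain escaped text, not as XML elements.
    element.text = html_fragment
    return ET.tostring(element, encoding="unicode")


print(description_xml('<p>Tom &amp; Jerry <img src="a.png"></p>'))
```

Parsing the result with an XML parser gives back the original HTML string, which a reader can then hand to its HTML renderer.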

Minimal example

This example uses RSS 2.0.

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
 
    <title>Examples!</title>
    <link>http://example.com/What-this-RSS-is-publishing</link>
    <description>Standing on the shoulders of examples.</description>
  
    <item>
      <guid>http://example.com/blog?id=146</guid>
      <pubDate>Fri, 09 Mar 2007 10:52:50 GMT</pubDate>
      <title>test entry</title>
      <description></description>
    </item>
 
  </channel>
</rss>
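Generating such a feed from code avoids hand-escaping mistakes and mismatched tags. A sketch using Python's ElementTree; the function name build_feed and the dict-based item format are assumptions of this example, not part of any RSS library:

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a minimal RSS 2.0 document; items is a list of dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in items:
        item = ET.SubElement(channel, "item")
        # Missing fields become empty elements in this sketch.
        for tag in ("guid", "pubDate", "title", "description"):
            ET.SubElement(item, tag).text = entry.get(tag, "")
    return '<?xml version="1.0"?>\n' + ET.tostring(rss, encoding="unicode")


feed = build_feed(
    "Examples!",
    "http://example.com/What-this-RSS-is-publishing",
    "Standing on the shoulders of examples.",
    [{
        "guid": "http://example.com/blog?id=146",
        "pubDate": "Fri, 09 Mar 2007 10:52:50 GMT",
        "title": "test entry",
    }],
)
print(feed)
```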

RSS in browsers

Browser smartness

You can add one or more entries to the HEAD of your HTML files:

<link rel='alternate' type='application/rss+xml' title='Name' href='/link.rss'/>

Firefox, among others, will show an RSS icon in the location bar, which allows you to quickly subscribe to these feeds. IE (at least up to 6) doesn't do this, which is why many people provide in-page links to RSS feeds.
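The same link elements can be used by a script to discover a page's feeds. A rough Python sketch with the standard library's HTML parser; the names FeedLinkFinder and find_feeds are invented here, and accepting the Atom MIME type alongside the RSS one is a choice of this example:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements that point at feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # list of (title, absolute URL)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page's own URL.
            url = urljoin(self.base_url, attr["href"])
            self.feeds.append((attr.get("title", ""), url))


def find_feeds(html_text, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.feeds


page = ("<html><head><link rel='alternate' type='application/rss+xml' "
        "title='Name' href='/link.rss'/></head></html>")
print(find_feeds(page, "http://example.com/blog/"))
```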

MIME type

RSS readers/aggregators/whateverers generally do not care about the MIME type; they only care that the feed contains RSS code. MIME types usually do matter to browsers, since the type controls what most of them actually do with the RSS feeds they meet.

  • text/xml means you can see the code in your browser, which is useful for debugging.
  • in the olden days application/rss+xml would confuse browsers, but it seems everything understands it now(verify), and may handle it with internal RSS handlers.


Browsers tend to also look at the contents; it seems Firefox detects (RDF-less) RSS regardless of MIME type, while IE (at least up to 6) is not very smart and seems to ignore the MIME type altogether. For IE you may want to add a CSS stylesheet to make it display nicely; see below.
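To experiment with how a browser reacts to each MIME type, a tiny WSGI application is enough. This is only a sketch: the debug=1 query parameter that switches to text/xml is invented for this example, and the feed body is a stand-in.

```python
from urllib.parse import parse_qs

FEED_XML = (
    '<?xml version="1.0"?>\n'
    '<rss version="2.0"><channel>'
    '<title>Examples!</title>'
    '<link>http://example.com/</link>'
    '<description>Standing on the shoulders of examples.</description>'
    '</channel></rss>'
)


def feed_app(environ, start_response):
    """Serve the feed, as text/xml when called with ?debug=1."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    debug = query.get("debug") == ["1"]  # hypothetical switch for this sketch
    content_type = "text/xml" if debug else "application/rss+xml"
    body = FEED_XML.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", content_type + "; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be run locally with the standard library, for example wsgiref.simple_server.make_server("", 8000, feed_app).serve_forever(), and then opened in the browsers you care about.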

Very basic styling

If you serve as text/xml, you can add styling to make it look, well, user-readable.

Note that some browsers apply their own formatted preview (sometimes with a 'subscribe to this?' prompt), at least when using a suitable MIME type.


After the <?xml?> declaration and before the root tag, add a reference to a CSS file:

<?xml-stylesheet type="text/css" href="rss.css"?>

A simple example for this rss.css file:

channel description,
channel link,
channel language,
item guid,
item link,
item author {
  /* this CSS is meant as a quick overview; we're not interested in seeing these */
  display: none;
  visibility: collapse; 
}

title {
  font-size: 120%;
}
channel title {
  padding: .3em;
  font-size: 200%;
}

item, description, pubDate {
  display: block;
  margin: .2em;
  padding: .3em;
  background: #eee;
}
item {
  border: 1px solid #777;
}

Extensions

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Including media:


See also (RSS)

Atom

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Atom is an alternative to RSS, and is a clearer standard; it is generally stricter and has more specific support for features that are somewhat fuzzier in RSS.

Other differences include the fact that Atom entries must have an id (the counterpart of the RSS guid), which makes for easier and more robust update checking in Atom clients.
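Because every Atom entry carries an identifier (the id element), a client's update check can be as simple as remembering which ones it has seen. A minimal Python sketch; the function name new_entries and the urn:example identifiers are made up, and it deliberately ignores the updated element, so edits to an already-seen entry are not detected:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def new_entries(feed_xml, seen_ids):
    """Return (id, title) for entries not seen before; updates seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        entry_id = (entry.findtext(ATOM + "id") or "").strip()
        if entry_id and entry_id not in seen_ids:
            seen_ids.add(entry_id)
            fresh.append((entry_id, entry.findtext(ATOM + "title") or ""))
    return fresh


sample = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<title>Examples!</title><id>urn:example:feed</id>'
    '<entry><id>urn:example:post-146</id><title>test entry</title></entry>'
    '</feed>'
)
seen = set()
print(new_entries(sample, seen))
print(new_entries(sample, seen))
```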

It also allows easier delivery of various types of payload data.


See also (Atom)