Site/app icons; robots and sitemap; and related

`favicon.ico`

The favicon is the icon beside the URL in the address bar, in tabs, and (seemingly the original intent) in bookmarks.

There are only ad-hoc conventions for the format and the way it should be referenced.

All browsers now ought to support at least

ICO

(static) GIF

PNG (except Opera Mini[1](verify))

Some allow

JPEG

animated GIF

APNG

SVG[2]

...but don't count on it. See also this table.

Size

Historically 16x16 pixels

Most browsers will scale down larger images as necessary

so you see 16x16, 32x32, 48x48, 64x64, and 128x128 with some frequency

Note that a specifically made 16x16 pixel image will tend to look less blurry.

Some browsers may show a larger-than-16x16 image in places other than the URL bar.

When it gets fetched

The default behaviour for most browsers is to fetch favicon.ico unprompted, typically only from the host root, i.e. always /favicon.ico

Browsers may only do this when fetching HTML content.

Explicitly referencing a favicon from an HTML document overrides this default behaviour.

You can try to change the favicon with JavaScript (basically by altering the link element in the head), but behaviour varies between browsers, and you shouldn't assume it always works(verify)

Explicitly including (HTML)

You can add one via HTML (this example for PNG) using something like:

<link rel="icon" href="/img/app-icon.png" type="image/png">

Further notes:

The value for rel wasn't standardized for a long time (the HTML standard now defines rel="icon", and allows "shortcut icon" only for historical reasons). Microsoft's suggestion ignored standards: they used "shortcut icon" without considering that rel is interpreted as a space-separated set of tokens rather than a single string. Some browsers were arbitrarily picky, so in the wild you see:
- rel="icon" (fairly usual)(verify)
- rel="shortcut icon"
- rel="SHORTCUT ICON" (some older browsers would take only this)(verify)
- rel="ICON"

ICO's MIME type is image/vnd.microsoft.icon. Before that was registered there was no standard, though image/x-icon was the conventional (if less meaningful) value.
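The fetch behaviour and the "rel is a token set" point can be sketched in Python with only the standard library. This is a simplification under stated assumptions: it takes the first `<link>` whose rel tokens include `icon`, and otherwise falls back to `/favicon.ico` at the host root. Real browsers differ in which icon they pick when several are declared (e.g. by `sizes` or `type`). `IconLinkFinder` and `favicon_url` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class IconLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel tokens include "icon"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":  # HTMLParser already lowercases tag and attribute names
            return
        attributes = dict(attrs)
        # rel is a space-separated, case-insensitive token set, so "icon",
        # "shortcut icon" and "SHORTCUT ICON" all contain the token "icon"
        # (while "apple-touch-icon" is a different token and does not match)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def favicon_url(page_url: str, html: str) -> str:
    """Guess which favicon URL a browser would request for this page (simplified)."""
    finder = IconLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.hrefs:
        # An explicit reference wins; relative hrefs resolve against the page URL
        return urljoin(page_url, finder.hrefs[0])
    # Otherwise fall back to the conventional location at the host root
    scheme, netloc = urlsplit(page_url)[:2]
    return urlunsplit((scheme, netloc, "/favicon.ico", "", ""))
```

For example, a page at `https://example.com/a/b.html` with no icon link maps to `https://example.com/favicon.ico`, while `<link rel="SHORTCUT ICON" href="img/i.png">` on the same page maps to `https://example.com/a/img/i.png`.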

Creating

Saving as PNG has been a good option for a long while.

If you insist on the ICO format: various image editors (such as GIMP) can save to ICO, there are utilities like png2ico and such, and there is a whole load of websites that do the work for you (e.g. html-kit.com/favicon)
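If you'd rather not depend on a tool, packing existing PNG files into an ICO container takes little more than a header and a directory. The sketch below assumes the consumer accepts PNG-compressed ICO entries (Windows Vista and later do, and as far as I know current browsers do too; much older readers expect BMP data instead). `pngs_to_ico` and `png_size` are hypothetical names, and `solid_png` exists only to produce test input.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(png: bytes) -> tuple[int, int]:
    """Width and height from the IHDR chunk, which always comes first in a PNG."""
    if png[:8] != PNG_SIGNATURE or png[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", png[16:24])


def pngs_to_ico(pngs: list[bytes]) -> bytes:
    """Pack PNG images into one ICO file, storing each entry as PNG data."""
    header = struct.pack("<HHH", 0, 1, len(pngs))  # reserved, type (1 = icon), image count
    offset = 6 + 16 * len(pngs)  # image data starts after header + 16-byte directory entries
    directory = b""
    for png in pngs:
        width, height = png_size(png)
        if width > 256 or height > 256:
            raise ValueError("ICO entries can be at most 256x256")
        directory += struct.pack(
            "<BBBBHHII",
            width % 256, height % 256,  # a stored 0 means 256
            0, 0,                       # palette size (0 = no palette), reserved
            1, 32,                      # color planes, bits per pixel
            len(png), offset,           # size of the image data, where it starts in the file
        )
        offset += len(png)
    return header + directory + b"".join(pngs)


def solid_png(size: int, rgba=(0, 0, 0, 255)) -> bytes:
    """Tiny single-color RGBA PNG, only here so the example has input to pack."""
    def chunk(tag: bytes, data: bytes) -> bytes:
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    row = b"\x00" + bytes(rgba) * size  # filter type 0 (None), then the pixels
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 6, 0, 0, 0)  # 8-bit RGBA, no interlace
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(row * size)) + chunk(b"IEND", b""))
```

With real files you would read each PNG's bytes and write the result, e.g. `open("favicon.ico", "wb").write(pngs_to_ico([png16, png32, png48]))`. Including a hand-made 16x16 entry keeps the smallest size crisp instead of relying on downscaling.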

Arguably the most complex part is ensuring nice contrast when scaled down that much.

specific platform icons

apple-touch-icon

iOS (iPhone, iPad, iPod Touch) will use these typically larger images, as may some Android devices and the odd browser feature

These are nicer for icons on a home screen or such.

Much the same story as favicon, in that

it will look in pre-set places

/apple-touch-icon.png and

/apple-touch-icon-precomposed.png

...unless you tell it where to go using:

<link rel="apple-touch-icon" href="/iphone.png"/>

or

<link rel="apple-touch-icon-precomposed" href="/iphone_precomposed.png"/>

Notes:

using apple-touch-icon-precomposed means iOS won't add effects like a glossy shine (iOS 7 and later no longer add such effects anyway)

android-chrome

Chrome on Android targeted 192×192 and 128×128 icons, preferring 192×192; 128×128 was apparently mentioned for overlap with other platforms.

If you declare many sizes, it seems to pick the largest it can find (≤192).

Declared like:

<link rel="icon" type="image/png" href="/fav192.png" sizes="192x192"/>

(earlier versions also used the apple touch icon, and had 196x196 icons)
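The "largest declared size up to 192" behaviour can be modelled roughly as below. This is an approximation of the observed behaviour described above, not Chrome's actual selection code. `parse_sizes` and `pick_icon` are hypothetical names, and the input is assumed to be (href, sizes attribute) pairs already extracted from the page's icon links.

```python
import re


def parse_sizes(value: str) -> list[tuple[int, int]]:
    """Parse a sizes attribute like "16x16 32X32" into (w, h) pairs; "any" is skipped."""
    pairs = []
    for token in value.split():
        match = re.fullmatch(r"(\d+)[xX](\d+)", token)
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


def pick_icon(links: list[tuple[str, str]], limit: int = 192):
    """Return the href of the largest declared icon whose longer side is <= limit, else None."""
    best_href, best_side = None, 0
    for href, sizes in links:
        for width, height in parse_sizes(sizes):
            side = max(width, height)
            if best_side < side <= limit:
                best_href, best_side = href, side
    return best_href


links = [("/fav32.png", "32x32"), ("/fav128.png", "128x128"),
         ("/fav192.png", "192x192"), ("/fav512.png", "512x512")]
print(pick_icon(links))  # /fav192.png
```

Declaring a 512x512 icon on top of this does not change the pick under this model, since it is over the limit.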

JSON manifest

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Included like

<link rel="manifest" href="mymanifest">

(the filename can be anything, because you explicitly link to it anyway, but people may conventionally use something indicative like manifest.json, while site.webmanifest seems to come from the 'HTML5 Boilerplate' template)

A JSON file that lets you control how your app appears on the home screen, while launching, and whether to hide the browser UI once launched.

Mostly for websites that try their hardest to actually be apps in the mobile-device-app sense, a.k.a. progressive web apps. It also concerns phones helping that illusion when their browser is given a manifest.

Includes parameters like:

icons
background color - (...to show before its stylesheet is loaded, and used in a possible splash screen)
theme color - (affects task switching)

display - preferred display mode e.g.

browser (default), or showing less browser/navigation with one of:

standalone

minimal-ui

fullscreen

orientation - (preferred orientation)

shortcuts - (common actions that mobile devices can offer in a context menu)

...and things mostly meant for app stores:

author name
app name, short name, version, description
categories (e.g. "books", "education", "medical")
app rating, screenshots

whether to prefer/suggest a phone-installable app (instead of the website-like thing we're talking about)
whether to register a URL protocol handler

Notes:

devices may show a splash screen based on parameters like name, background color, icon
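A minimal sketch that writes such a manifest with Python's json module. The member names used (`name`, `short_name`, `start_url`, `display`, `background_color`, `theme_color`, and `icons` entries with `src`/`sizes`/`type`) are from the Web App Manifest spec; the function name, the default colours and the icon paths are made-up examples.

```python
import json

DISPLAY_MODES = ("fullscreen", "standalone", "minimal-ui", "browser")


def make_manifest(name, short_name, icons, display="standalone",
                  background_color="#ffffff", theme_color="#336699", start_url="/"):
    """Build manifest JSON text; icons is a list of (path, pixel_size) for square PNGs."""
    if display not in DISPLAY_MODES:
        raise ValueError(f"unknown display mode: {display!r}")
    manifest = {
        "name": name,
        "short_name": short_name,  # used where space is tight, e.g. under a home-screen icon
        "start_url": start_url,
        "display": display,
        "background_color": background_color,  # shown before the stylesheet loads, on splash screens
        "theme_color": theme_color,
        "icons": [
            {"src": src, "sizes": f"{size}x{size}", "type": "image/png"}
            for src, size in icons
        ],
    }
    return json.dumps(manifest, indent=2)


print(make_manifest("Example App", "Example", [("/fav192.png", 192), ("/fav512.png", 512)]))
```

Save the output under whatever name you like (e.g. manifest.json) and point the `<link rel="manifest">` element at it.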

`robots.txt`

/robots.txt lets you ask crawlers (a.k.a. robots, spiders) not to visit certain URLs or directories, i.e. to opt out of a robot's basic find-everything behaviour.

Assuming they look at this file.

Assuming they respect it.

Why robots.txt?

Practical uses include:

keeping unfinished work out of web searches

not a guarantee, but can be easier than e.g. making sure you have no links to it, password protecting it until it's done, etc.

prevent crawlers from wasting bandwidth on things like:
- temporary directories
- short-term caches
- very large files (e.g. where you may want to put raw originals, or downloads, assuming you have a page describing them that will get indexed), and such

selectively disallowing crawlers

asking some crawlers to be gentler than their default

You can expect crawlers to take a few days to notice a change in robots.txt and propagate it throughout their distributed setup. The delay isn't well controlled or predictable, which makes robots.txt a poor choice for temporary blocks.

Besides, there are also crawlers that just ignore robots.txt

And there are classes of fetchers that ignore robots.txt, like most (possibly even all) link preview bots

The contents and logic

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

There are only a few directives that you can expect the majority to parse, and while there are more, you should assume most will not understand them.

User-agent:

User-agent names should be interpreted case-insensitively

You can use * meaning 'everything'.

Disallow:

basically lets you specify a starting path
Wildcards are not supported, but:
- Disallow: / means disallow all
- Disallow: (no value) means allow all
- Strings act as 'starts with' strings, so /index would block /index/, /index.html, and more.
You get to specify one path per Disallow; use multiple Disallow lines if you want to disallow a list of things
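The prefix semantics can be checked with Python's standard `urllib.robotparser`, which implements the classic rules (prefix matching, with the `*` group as the fallback). The robots.txt content, the bot name and the `allowed` helper are made-up examples; `parse()` is fed the text directly, so nothing is fetched.

```python
from urllib.robotparser import RobotFileParser

BASIC = """\
User-agent: *
Disallow: /tmp/
Disallow: /index
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text directly (no fetching) and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for path in ["/tmp/cache.bin", "/index", "/index.html", "/index/page", "/about"]:
    # "/index" acts as a prefix, so everything starting with it is blocked
    print(path, allowed(BASIC, "ExampleBot", "https://example.com" + path))
# prints False for every path except /about

print(allowed("User-agent: *\nDisallow: /\n", "ExampleBot", "https://example.com/"))  # False: disallow all
print(allowed("User-agent: *\nDisallow:\n", "ExampleBot", "https://example.com/"))   # True: empty value allows all
```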

Further notes:

A spider that checks whether robots.txt has something to say about a given URL will use the first applicable (User-agent, Disallow) pair and stop processing.

The default if-no-rules-match policy is to allow, but a catch-all disallow at the end is possible.

...which together mean that order matters, and allow slightly more complex constructions when you also use Allow lines: you can combine an agent whitelist with a blacklist.

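Googlebot has some extensions, including wildcards and an Allow directive, which historically weren't supported by many other crawlers. RFC 9309 (2022) has since standardized Allow and the * and $ wildcards, and specifies that the most specific (longest) matching rule wins rather than the first, so order matters less for crawlers that follow it.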
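A sketch of how groups and rule order interact, again using Python's `urllib.robotparser`. It follows the first-match behaviour (and it does accept `Allow` lines), so crawlers that pick the most specific rule instead may disagree with it on the order example. Note it also matches agent names by substring. The bot names and paths are made up.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Whitelist: one named crawler may fetch everything, all others nothing
WHITELIST = """\
User-agent: FriendlyBot
Disallow:

User-agent: *
Disallow: /
"""

# Within a group the first matching line wins, so the narrower Allow
# has to come before the broader Disallow for it to have any effect
ALLOW_FIRST = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""

DISALLOW_FIRST = """\
User-agent: *
Disallow: /private/
Allow: /private/press/
"""

press = "https://example.com/private/press/release.html"
print(allowed(WHITELIST, "FriendlyBot/1.0", press))  # True
print(allowed(WHITELIST, "OtherBot/1.0", press))     # False
print(allowed(ALLOW_FIRST, "OtherBot", press))       # True
print(allowed(DISALLOW_FIRST, "OtherBot", press))    # False
print(allowed(ALLOW_FIRST, "OtherBot", "https://example.com/private/notes"))  # False
```

A blacklist is the mirror image: a named group with `Disallow: /`, and a `*` group that allows everything.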

Don't list secrets. Yes, you can keep it out of searches, but people with bad intentions can trivially look at your robots.txt just to find interesting things

`ads.txt`

`humans.txt`

sitemaps

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A sitemap is a list of pages on a domain.

The word 'sitemap' comes from the days when webmasters would have one page that linked to all pages on the site, that being the easiest way to be sure that all of your public-facing pages would get indexed by crawlers.

So this is just a more formal version of "yeah just make some HTML with lots of links", which also lets you give more specific information about how (e.g. how often) to crawl, which can have some minor positive side effects, e.g. on your resource use.

Sitemaps allow specification of

what parts are available for harvesting
when a page was last updated
how often each item will change
the relative priority of pages within the site
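The sitemaps.org XML format expresses these as `loc`, `lastmod`, `changefreq` and `priority` (0.0 to 1.0, relative to your other pages) elements inside `url` entries. A minimal generator using Python's `xml.etree.ElementTree` is sketched below; `Page` and `build_sitemap` are hypothetical names and the URLs are placeholders. ElementTree takes care of escaping characters like `&` in URLs, which is easy to get wrong when concatenating strings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    loc: str                          # absolute URL
    lastmod: Optional[str] = None     # W3C datetime, e.g. "2024-01-31"
    changefreq: Optional[str] = None  # always, hourly, daily, weekly, monthly, yearly, never
    priority: Optional[float] = None  # 0.0 to 1.0, relative to your other pages


def build_sitemap(pages) -> str:
    """Return sitemap XML text; optional fields are left out when not set."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page.loc
        if page.lastmod:
            ET.SubElement(url, "lastmod").text = page.lastmod
        if page.changefreq:
            ET.SubElement(url, "changefreq").text = page.changefreq
        if page.priority is not None:
            ET.SubElement(url, "priority").text = f"{page.priority:.1f}"
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    Page("https://example.com/", changefreq="daily", priority=1.0),
    Page("https://example.com/news?page=1&sort=new", changefreq="hourly", priority=0.8),
    Page("https://example.com/about", lastmod="2024-01-31", changefreq="yearly"),
]))
```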

Sitemaps are useful when

Things are not well linked yet, from the site itself and/or from elsewhere
you want to hint to search engines that, say, your news page and some dynamic content updates quite often, while some stuff is almost static
You are using JavaScript drop-down menus or AJAX-loaded content in a way that means crawlers won't find your links/content.

They have little added value when all your content is well-harvested already.

Sitemaps could be said to complement robots.txt, in that robots.txt can only ask crawlers not to visit something, while a sitemap points them at what to visit.

Why sitemaps?

XML or plain text

Getting it referenced and used

Sitemap indexes

trafficbasedsspsitemap.xml

These files are generated by the Bing Sitemap Plugin for IIS and Apache.

Requests for this will only come from msnbot, so implementing this is useless for other search engines.

You probably want to ignore these requests.
