Site/app icons; robots and sitemap; and related
favicon.ico
The favicon is the icon beside the url in the address bar, in tabs, and (as seems the original intent) in bookmarks.
There are only ad-hoc conventions for the format and the way it should be referenced.
All browsers now ought to support at least ICO and PNG.
Some also allow:
- JPEG
- animated GIF
- APNG
- SVG[2]
...but don't count on it. See also this table.
Size
- historically 16x16 pixel
- Most browsers will scale down larger images as necessary
- so you see 16x16, 32x32, 48x48, 64x64, and 128x128 with some frequency
- Note that a specifically made 16x16 pixel image will tend to look less blurry.
- Some browsers may show a larger-than-16x16 image in places other than the URL bar.
When favico gets fetched
The default behaviour for most browsers is to fetch favicon.ico, unprompted, typically only under the host root i.e. always /favicon.ico
Browsers may only do this when fetching HTML content.
Explicitly referencing a favico from an HTML document will override the default behaviour.
You can try to change the favico with JS (basically altering the link element in the head), but behaviour varies between browsers,
and you shouldn't assume it always works (verify)
Explicitly including favico (HTML)
You can add one via HTML (this example for PNG) using something like:
<link rel="icon" href="/img/app-icon.png" type="image/png">
Further notes:
- The value for rel wasn't standardized at first. Microsoft's suggestion did not consider standards (they used "shortcut icon" but did not consider that rel is seen as a space-separated token set instead of a string). HTML5 later defined icon as the keyword. Some browsers were randomly picky, so in the wild you see both rel="icon" and rel="shortcut icon".
- ICO's mime type is image/vnd.microsoft.icon. Before its definition, there was no standard, though image/x-icon was the conventional value (with a little less meaning).
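To make the lookup order concrete, here is a minimal Python sketch of how a client might decide which icon URL to fetch: an explicit link element wins, otherwise it falls back to /favicon.ico at the host root. The names IconLinkFinder and favicon_url are made up for this illustration, and taking the first matching link is a simplification; real browsers also weigh things like sizes and type.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel contains 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated, case-insensitive token set,
        # so both "icon" and "shortcut icon" match here.
        tokens = (attr.get("rel") or "").lower().split()
        if "icon" in tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def favicon_url(page_url, html):
    """Return the first explicitly linked icon, else the /favicon.ico default."""
    finder = IconLinkFinder()
    finder.feed(html)
    if finder.hrefs:
        # Simplification: take the first match, resolved against the page URL.
        return urljoin(page_url, finder.hrefs[0])
    return urljoin(page_url, "/favicon.ico")
```

Note that rel="apple-touch-icon" does not match, because the token comparison is on whole tokens, not substrings.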
Creating favicos
Saving as PNG has been a good option for a long while.
If you insist on the ICO format: various image editing programs (such as GIMP) can save to ICO, there are utilities like png2ico, and there is a whole load of websites that do the work for you (e.g. html-kit.com/favicon).
Arguably the most complex part is ensuring nice contrast when scaled down that much.
specific platform icons
apple-touch-icon
iOS (iPhone, iPad, iPod Touch) will use these typically larger images, as may some Android devices and the odd browser feature
These are nicer for icons on a home screen or such.
Much the same story as favicon, in that
- it will look in pre-set places
- /apple-touch-icon.png and
- /apple-touch-icon-precomposed.png
...unless you tell it where to go using:
<link rel="apple-touch-icon" href="/iphone.png"/>
or
<link rel="apple-touch-icon-precomposed" href="/iphone_precomposed.png"/>
Notes:
- using apple-touch-icon-precomposed means iOS won't add effects like a shine
android-chrome
Targeted 192×192 and 128×128 icons, preferring the 192, apparently mentioning 128 for overlap with some others.
If you declare many sizes, it seems to pick the largest it can find (≤192).
Declared like:
<link rel="icon" type="image/png" href="/fav192.png" sizes="192x192"/>
(earlier versions also used the apple touch icon, and had 196x196 icons)
sitemaps
What?
A sitemap is a list of pages on a domain.
The word 'sitemap' comes from the days when webmasters would have one page that linked to all pages on the site, as the easiest way to be sure that all public-facing pages would get indexed by crawlers.
The somewhat more modern sitemap also allows you to give more specific information, e.g.
- what parts are available for harvesting
- when a page was last updated
- how often each item will change
- the relative importance of each item, on the site (see note below)
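As an illustration of the XML flavour, here is a minimal Python sketch that writes such a file. The element names (loc, lastmod, changefreq, priority) and the namespace are those of the sitemaps.org protocol; the function name build_sitemap and the example.com URLs are made up for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML text from a list of dicts.

    Each dict needs 'loc'; 'lastmod', 'changefreq' and 'priority' are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Emit the fields in a fixed order, skipping any that are absent.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages: a frequently changing news page and a near-static one.
example = build_sitemap([
    {"loc": "https://example.com/news", "changefreq": "daily", "priority": 0.8},
    {"loc": "https://example.com/about", "lastmod": "2024-01-15", "changefreq": "yearly"},
])
```

The optional fields are hints only; crawlers are free to ignore them.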
Why?
Sitemaps are useful when
- Things are not well linked yet, from the site itself and/or from elsewhere
- you want to hint to search engines that, say,
- your news page and some dynamic content updates quite often,
- while some other stuff is almost static
- You are using dynamically generated links (JS drop-down menus, AJAXed content), in a way that means crawlers won't find your links/content.
They have little added value when all your content is well-harvested already.
Sitemaps could be said to complement robots.txt, in that robots.txt can only ask crawlers not to index something.
XML or plain text
Getting a sitemap referenced and used
Sitemap indexes
trafficbasedsspsitemap.xml
These files are generated by Bing Sitemap Plugin for IIS and Apache.
Requests for this will only come from msnbot, so implementing this is useless for other search engines.
You probably want to ignore these requests.
See also
- http://www.bing.com/webmaster/help/bing-sitemap-plugin-beta-f50bebf5
- http://www.bing.com/blogs/site_blogs/b/webmaster/archive/2013/02/20/building-sitemaps-manually-stop-until-you-read-this.aspx
robots.txt
/robots.txt lets you ask crawlers (a.k.a. robots, spiders) not to visit URLs or directories, to opt out of robots' basic find-everything behaviour.
Assuming they look for this file.
Assuming they respect it.
Why robots.txt?
Practical uses include:
- reducing the amount of unfinished work that appears in web searches
- not a guarantee, but can be easier than e.g. making sure you have no links to it, password protecting it until it's done, etc.
- prevent crawlers from wasting bandwidth on things like:
- temporary directories
- short-term caches
- very large files (e.g. raw originals and downloads, assuming you have a page describing them that will get indexed)
- selectively disallowing crawlers
- asking some crawlers to be gentler than their default
You can expect crawlers to take a few days to notice a change in robots.txt and make it current throughout their distributed setup.
The delay isn't very controlled or predictable, which makes robots.txt a poor choice for temporary blocks.
Besides, there are also crawlers that just ignore robots.txt.
And there are classes of use that ignore robots.txt, like most (possibly even all) link preview bots.
The contents and logic
There are only a few directives that you can expect the majority to parse, and while there are more, you should assume most will not understand them.
User-agent:
- User agent names should be interpreted case-insensitively
- You can use * meaning 'everything'.
Disallow:
- basically lets you specify a starting path
- Wildcards are not supported, but:
- Disallow: / means disallow all
- Disallow: (no value) means allow all
- Strings act as 'starts with' strings, so /index would block /index/, /index.html, and more.
- You get to specify one path per Disallow; use multiple Disallows if you want to disallow a list of things
Further notes:
- A spider that checks whether robots.txt has something to say about a given URL will use the first (applicable_user-agent, applicable_disallow) pair and stop processing.
- The default if-no-rules-match policy is to allow, but a catch-all disallow at the end is possible.
- ...which in combination means that order matters, and allows slightly more complex constructions when you use allows - you can do both agent whitelist and blacklist.
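- Googlebot introduced some extensions, including wildcards and an Allow. These were later written into RFC 9309 (which also says the longest matching rule wins, rather than the first), but simpler or older crawlers may still not support them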
- Don't list secrets. Yes, you can keep it out of searches, but people with bad intentions can trivially look at your robots.txt just to find interesting things
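The matching logic described above can be tried out with Python's standard urllib.robotparser, which as far as I can tell applies the first matching rule, like the older convention. The robots.txt content and the bot names below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler entirely, and keep everyone
# else out of /tmp/ (except /tmp/public/) and anything starting with /index.
ROBOTS_TXT = """\
User-agent: badbot
Disallow: /

User-agent: *
Allow: /tmp/public/
Disallow: /tmp/
Disallow: /index
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if this parser thinks the agent may fetch the path."""
    return parser.can_fetch(agent, path)


# Agent names match case-insensitively.
blocked_bot = allowed("BadBot", "/anything")
# Disallow values act as 'starts with' strings.
index_page = allowed("somebot", "/index.html")
# Order matters: the Allow line comes before the broader Disallow.
public_tmp = allowed("somebot", "/tmp/public/a.txt")
private_tmp = allowed("somebot", "/tmp/cache/a.txt")
# No rule matches, so the default is to allow.
about_page = allowed("somebot", "/about")
```

Swapping the Allow and Disallow: /tmp/ lines would make this parser refuse /tmp/public/ as well, which is the order sensitivity mentioned above.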
JSON manifest
Included like
<link rel="manifest" href="mymanifest">
(the filename can be anything, because you explicitly link to it anyway, but people may conventionally use something indicative like manifest.json, while site.webmanifest seems to come from the 'HTML5 Boilerplate' template)
A JSON file that lets you control how your app appears on your home screen and while launching, and whether to hide the browser UI once launched.
Mostly for websites that are trying their hardest to look like actual apps in the mobile-device-app sense, a.k.a. progressive web apps. The term also covers phones helping that illusion when their browser is given a manifest.
Includes parameters like:
- icons
- background color - (...to show before its stylesheet is loaded, and used in a possible splash screen)
- theme color - (affects task switching)
- display - preferred display mode e.g.
- browser (default), or showing less browser/navigation with one of:
- standalone
- minimal-ui
- fullscreen
- orientation - (preferred orientation)
- shortcuts - (mobile devices can opt to offer common actions in a context menu)
..and things mostly meant for app stores:
- author name
- app name, short name, version, description
- categories (e.g. "books", "education", "medical")
- app rating, screenshots
- whether to prefer/suggest a phone-installable app (instead of the website-like thing we're talking about)
- whether to register a URL protocol handler
Notes:
- devices may show a splash screen based on parameters like name, background color, icon
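For a feel of what such a file contains, here is a small Python sketch that writes one out. The member names (name, short_name, display, orientation, background_color, theme_color, icons) are from the web app manifest format; the function name build_manifest and all values are hypothetical.

```python
import json

# The display modes listed above; "browser" is the default.
DISPLAY_MODES = {"browser", "standalone", "minimal-ui", "fullscreen"}


def build_manifest(name, short_name, icon_src, display="standalone"):
    """Return manifest JSON text; rejects display modes not listed above."""
    if display not in DISPLAY_MODES:
        raise ValueError(f"unknown display mode: {display!r}")
    manifest = {
        "name": name,
        "short_name": short_name,
        "display": display,
        "orientation": "portrait",
        # Colors are made-up example values.
        "background_color": "#ffffff",
        "theme_color": "#336699",
        "icons": [
            {"src": icon_src, "sizes": "192x192", "type": "image/png"},
        ],
    }
    return json.dumps(manifest, indent=2)


manifest_text = build_manifest("Example App", "Example", "/fav192.png")
```

The result would be saved under whatever filename the link element points at.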
ads.txt
humans.txt
well-known
RFC 8615 (previously RFC 5785) is the idea that,
when collecting a bunch of these extra bits of information,
you may as well put them in a well-known place, apart from other content.
That would be at /.well-known/.
Wikipedia has a list of such things:
https://en.wikipedia.org/wiki/Well-known_URI#List_of_well-known_URIs
But there seem to be a bunch of ad hoc things on top of that,
and not all are known. For example, the one I see most is .well-known/traffic-advice,
which seems to be a Google proposal -- see https://buettner.github.io/private-prefetch-proxy/traffic-advice.html
security.txt
A text file that should say where to report vulnerabilities found on this server.
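As a rough sketch of what that file looks like: RFC 9116 requires a Contact and an Expires field, and expects the file at /.well-known/security.txt. The Python below builds such a minimal file; the function name build_security_txt, the address, and the 180-day validity are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_security_txt(contact, valid_days=180, now=None):
    """Return minimal security.txt content with Contact and Expires fields."""
    now = now or datetime.now(timezone.utc)
    # Expires is a UTC timestamp after which the file should not be trusted.
    expires = (now + timedelta(days=valid_days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"Contact: {contact}\nExpires: {expires}\n"


# Hypothetical contact address and a fixed date so the output is repeatable.
security_txt = build_security_txt(
    "mailto:security@example.com",
    valid_days=30,
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Because of the Expires field, the file needs regenerating now and then.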