Site/app icons; robots and sitemap; and related

From Helpful
Related to web development, lower level hosting, and such: (See also the webdev category)

Lower levels


Server stuff:


Higher levels



favicon.ico

The favicon is the icon shown beside the URL in the address bar, in tabs, and (as seems to have been the original intent) in bookmarks.


There are only ad-hoc conventions for the format and the way it should be referenced.


All browsers now ought to support at least

ICO
(static) GIF
PNG (except opera mini[1](verify))


Some allow

JPEG
animated GIF
APNG
SVG[2]

...but don't count on it. See also this table.


Size

historically 16x16 pixels
Most browsers will scale down larger images as necessary,
so you see 16x16, 32x32, 48x48, 64x64, and 128x128 with some frequency
Note that a specifically made 16x16 pixel image will tend to look less blurry.
Some browsers may show a larger-than-16x16 image in places other than the URL bar.


When it gets fetched

The default behaviour for most browsers is to fetch favicon.ico, unprompted, typically only under the host root i.e. always /favicon.ico

Browsers may only do this when fetching HTML content.
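
As a rough sketch of the default lookup described above: assuming the browser keeps the scheme and host of the page and swaps everything else for /favicon.ico, the URL it requests could be computed like this. The function name is made up for illustration, and real browsers may apply further rules.

```python
from urllib.parse import urlsplit, urlunsplit


def default_favicon_url(page_url: str) -> str:
    """Return the host-root /favicon.ico URL for the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/favicon.ico", "", ""))


print(default_favicon_url("https://example.com/blog/post.html?id=3"))
```

Note that the path of the page plays no role, which is why a site hosted under a subdirectory usually needs an explicit reference instead.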


Explicitly referencing a favicon from an HTML document will override the default behaviour.


You can try to change the favicon with JavaScript (basically by altering the link element in the head), but behaviour varies between browsers, so you shouldn't assume it always works(verify)



Explicitly including (HTML)

You can add one via HTML (this example for PNG) using something like:

<link rel="icon" href="/img/app-icon.png" type="image/png">


Further notes:

  • The value for rel was not standardized for a long time (HTML5 later settled on icon). Microsoft's suggestion, "shortcut icon", did not take into account that rel is treated as a space-separated set of tokens rather than as a single string. Some browsers were randomly picky, so in the wild you see:
    • rel="icon" (fairly usual)(verify)
    • rel="shortcut icon"
    • rel="SHORTCUT ICON" (some older browsers would take only this)(verify)
    • rel="ICON"
  • ICO's MIME type is image/vnd.microsoft.icon. Before that was defined there was no standard, though image/x-icon was the conventional value (with a little less meaning).
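
If you generate pages from code, the notes above could be folded into a small helper. This is only a sketch: the function name and the extension table are assumptions made for illustration, and the table covers only formats mentioned in this section.

```python
from html import escape
from pathlib import PurePosixPath

# Extension -> MIME type, limited to formats mentioned in this section.
ICON_TYPES = {
    ".ico": "image/vnd.microsoft.icon",
    ".png": "image/png",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".svg": "image/svg+xml",
}


def icon_link(href: str, rel: str = "icon") -> str:
    """Build a <link> element for an icon; 'type' is added only when known."""
    suffix = PurePosixPath(href).suffix.lower()
    attrs = [f'rel="{escape(rel)}"', f'href="{escape(href)}"']
    mime = ICON_TYPES.get(suffix)
    if mime:
        attrs.append(f'type="{mime}"')
    return "<link " + " ".join(attrs) + ">"


print(icon_link("/img/app-icon.png"))
print(icon_link("/favicon.ico", rel="shortcut icon"))
```

Leaving out the type attribute for unknown extensions is a deliberate choice here: a missing type seems safer than a wrong one.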

Creating

Saving as PNG has been a good option for a long while.

If you insist on the ICO format: various image editing programs (such as GIMP) can save to ICO, there are utilities like png2ico, and there is a whole load of websites that do the work for you (e.g. html-kit.com/favicon).


Arguably the most complex part is ensuring nice contrast when scaled down that much.

specific platform icons

apple-touch-icon

iOS (iPhone, iPad, iPod Touch) will use these typically larger images, as may some Android devices and the odd browser feature

These are nicer for icons on a home screen or such.


Much the same story as favicon, in that

it will look in pre-set places
(/apple-touch-icon.png and
/apple-touch-icon-precomposed.png)

...unless you tell it where to go using:

<link rel="apple-touch-icon" href="/iphone.png"/>

or

<link rel="apple-touch-icon-precomposed" href="/iphone_precomposed.png"/>

Notes:

  • using apple-touch-icon-precomposed means iOS won't add effects like a shine



android-chrome

Chrome on Android targets 192×192 and 128×128 icons, preferring the 192; the 128 is apparently mentioned for overlap with some other platforms.

If you declare many sizes, it seems to pick the largest it can find (≤192).

Declared like:

<link rel="icon" type="image/png" href="/fav192.png" sizes="192x192"/>

(earlier versions also used the apple touch icon, and had 196x196 icons)
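
The "largest it can find (≤192)" guess can be written down as a small selection rule. This models only the behaviour as observed above, not Chrome's actual implementation; the function name and most of the file names are invented for the example.

```python
def pick_icon(declared, limit=192):
    """Pick the largest declared square icon whose size does not exceed limit.

    declared: list of (size_in_pixels, href) pairs.
    Returns the chosen pair, or None if every icon is larger than limit.
    """
    candidates = [icon for icon in declared if icon[0] <= limit]
    if not candidates:
        return None
    return max(candidates, key=lambda icon: icon[0])


icons = [
    (16, "/fav16.png"),
    (128, "/fav128.png"),
    (192, "/fav192.png"),
    (512, "/fav512.png"),
]
print(pick_icon(icons))
```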





JSON manifest

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Included like

<link rel="manifest" href="manifest.json">


Mostly for websites that are trying their hardest to look like apps, a.k.a. progressive web apps, and for the assistance on phones that helps that illusion.

A JSON file that lets you control how your app appears on the home screen and while launching, and whether to hide the browser UI once launched.


Includes:

  • icons
  • background color - (...to show before its stylesheet is loaded, and used in a possible splash screen), theme color (affects task switching)
  • display - preferred display mode e.g. browser (default), or showing less browser/navigation with one of fullscreen, minimal-ui, standalone
  • orientation - (preferred orientation)
  • shortcuts - mobile devices can use these to show common actions in a context menu


Mostly meant for app stores:

  • author name
  • app rating, screenshots
  • categories (e.g. "books", "education", "medical")
  • app name, and a short name
  • app version
  • app description
  • whether to prefer/suggest a phone-installable app
  • whether to register url protocol handler


Notes:

  • some things may show a splash screen based on e.g. name, icon, background color
  • the filename can be anything, and you may see different conventions, e.g. site.webmanifest, which seems to come from the 'HTML5 Boilerplate' template
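
To make the list of members more concrete, here is a minimal sketch that produces such a file from Python. The member names are the standard Web App Manifest ones; all values (app name, colours, icon path) are placeholders chosen for the example.

```python
import json

# Placeholder values; only the member names are meaningful.
manifest = {
    "name": "Example App",
    "short_name": "Example",
    "icons": [
        {"src": "/fav192.png", "sizes": "192x192", "type": "image/png"},
    ],
    "background_color": "#ffffff",
    "theme_color": "#336699",
    "display": "standalone",   # or browser, minimal-ui, fullscreen
    "orientation": "portrait",
}

manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```

The resulting text is what you would serve as manifest.json (or whatever filename you reference from the link element).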



robots.txt

/robots.txt lets you ask crawlers (a.k.a. robots, spiders) not to visit certain URLs or directories, to opt out of a robot's basic find-everything behaviour.

Assuming they look for this file.

Assuming they respect it.


Why robots.txt?

Practical uses include:

  • reducing the amount of unfinished work that appears in web searches
not a guarantee, but it can be easier than e.g. making sure you have no links to it, password protecting it until it's done, etc.
  • prevent crawlers from wasting bandwidth on things like:
    • temporary directories
    • short-term caches
    • very large files, e.g. raw originals, downloads, and such (assuming you have a page describing them that will get indexed)
  • selectively disallowing crawlers
  • asking some crawlers to be gentler than their default


You can expect crawlers to take a few days to notice a change in robots.txt and propagate it throughout their distributed setup. The delay isn't well controlled or predictable, which makes robots.txt a poor choice for temporary blocks.

Besides, there are also crawlers that just ignore robots.txt



The contents and logic

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

There are only a few directives that you can expect the majority of crawlers to parse. There are more, but you should assume most crawlers will not understand them.


User-agent:

  • User agent names should be interpreted case-insensitively
  • You can use * to mean 'every user agent'


Disallow:

  • basically lets you specify a starting path
  • Wildcards are not supported, but:
    • Disallow: /
      means disallow all
    • Disallow:
      (no value) means allow all
    • Strings act as 'starts with' strings, so /index would block /index/, /index.html, and more.
  • You get to specify one path per Disallow; use multiple Disallows if you want to disallow a list of things
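
The User-agent and Disallow behaviour described above can be tried out with Python's standard urllib.robotparser. The robots.txt content and the bot names below are made up for the example, and other parsers may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Example file: one crawler is shut out entirely, everyone else is
# kept away from two path prefixes.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /tmp/
Disallow: /index
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("BadBot", "/about.html"),     # named agent, everything disallowed
    ("badbot", "/about.html"),     # agent names are matched case-insensitively
    ("OtherBot", "/tmp/cache.dat"),
    ("OtherBot", "/index.html"),   # '/index' acts as a 'starts with' match
    ("OtherBot", "/about.html"),   # no rule matches, so allowed
]
results = [parser.can_fetch(agent, path) for agent, path in checks]
for (agent, path), ok in zip(checks, results):
    print(agent, path, "allowed" if ok else "disallowed")
```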




Further notes:

  • A spider that checks whether robots.txt has something to say about a given URL will use the first (applicable_user-agent, applicable_disallow) pair and stop processing.
  • The default if-no-rules-match policy is to allow, but a catch-all disallow at the end is possible.
  • ...which in combination means that order matters, and allows slightly more complex constructions when you use allows - you can do both agent whitelist and blacklist.
  • Googlebot has some extensions, including wildcards and an Allow, but these aren't supported by many other things
  • Don't list secrets. Yes, you can keep it out of searches, but people with bad intentions can trivially look at your robots.txt just to find interesting things
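
The point about order can be checked with Python's standard urllib.robotparser, which happens to apply the first matching rule within a group. Other crawlers may resolve conflicts between Allow and Disallow differently, so treat this as an illustration of first-match logic rather than of what every crawler does. The paths, the bot name and the helper function are invented for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Same two rules, in opposite order.
ALLOW_FIRST = """\
User-agent: *
Allow: /private/report.html
Disallow: /private/
"""

DISALLOW_FIRST = """\
User-agent: *
Disallow: /private/
Allow: /private/report.html
"""

print(allowed(ALLOW_FIRST, "/private/report.html"))     # Allow is seen first
print(allowed(DISALLOW_FIRST, "/private/report.html"))  # Disallow is seen first
print(allowed(ALLOW_FIRST, "/public.html"))             # nothing matches
```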





sitemaps

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


A sitemap is a list of pages on a domain.

The word 'sitemap' comes from the days where webmasters would have one page that linked to all pages on the site, that being the easiest way to be sure that all of your public-facing pages would get indexed by crawlers.

So this is just a more formal version of "yeah, just make some HTML with lots of links", which also allows you to give more specific information on how (e.g. how often) to crawl, which can have some minor positive side effects, e.g. on your resource use.


Sitemaps allow specification of

  • what parts are available for harvesting
  • when a page was last updated
  • how often each item will change
  • the relative priority of a page on the site (see note below)
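
As a sketch of what such a file looks like, the following builds a small XML sitemap with Python's standard library. The element names (loc, lastmod, changefreq, priority) and the namespace are those of the sitemaps.org protocol; the URLs, dates and the build_sitemap helper are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Serialize a list of page dicts into sitemap XML.

    Each dict needs 'loc'; 'lastmod', 'changefreq' and 'priority' are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Made-up example data.
pages = [
    {"loc": "https://example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "https://example.com/about", "changefreq": "yearly",
     "priority": 0.3},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```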


Sitemaps are useful when

  • Things are not well linked yet, from the site itself and/or from elsewhere
  • you want to hint to search engines that, say, your news page and some dynamic content updates quite often, while some stuff is almost static
  • You are using JavaScript drop-down menus or AJAXed content in a way that means crawlers won't find your links/content.

They have little added value when all your content is well-harvested already.


Sitemaps could be said to complement robots.txt, in that robots.txt can only ask crawlers not to index something.



Why sitemaps?

XML or plain text

Getting it referenced and used

Sitemap indexes

trafficbasedsspsitemap.xml

These files are generated by Bing Sitemap Plugin for IIS and Apache.

Requests for this will only come from msnbot, so implementing this is useless for other search engines.

You probably want to ignore these requests.
