Site/app icons; robots and sitemap; and related


favicon.ico

The favicon is the icon beside the URL in the address bar, in tabs, and (as seems to have been the original intent) in bookmarks.


There are only ad-hoc conventions for the format and the way it should be referenced.


All browsers now ought to support at least

ICO
(static) GIF
PNG (except Opera Mini[1](verify))


Some allow

JPEG
animated GIF
APNG
SVG[2]

...but don't count on it. See also this table.


Size

historically 16x16 pixels
Most browsers will scale down larger images as necessary
so you see 16x16, 32x32, 48x48, 64x64, and 128x128 with some frequency
Note that a specifically made 16x16 pixel image will tend to look less blurry.
Some browsers may show a larger-than-16x16 image in places other than the URL bar.


When it gets fetched

The default behaviour for most browsers is to fetch favicon.ico unprompted, and only from the host root, i.e. always /favicon.ico

Browsers may only do this when fetching HTML content.


Explicitly referencing a favicon from an HTML document will override this default behaviour.


You can try to change the favicon with JS (basically by altering the link element in the head, as in the sketch below), but behaviour varies between browsers, and you shouldn't assume it always works(verify)
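
A minimal sketch of that approach (the path here is just an example):

function setFavicon(url) {
  // reuse an existing icon link if there is one, otherwise create one
  var link = document.querySelector("link[rel~='icon']");
  if (!link) {
    link = document.createElement('link');
    link.rel = 'icon';
    document.head.appendChild(link);
  }
  link.href = url;
}

setFavicon('/img/other-icon.png');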



Explicitly including (HTML)

You can add one via HTML (this example is for a PNG) using something like:

<link rel="icon" href="/img/app-icon.png" type="image/png">


Further notes:

  • The value for rel took a while to standardize. Microsoft's suggestion, "shortcut icon", did not consider that rel is treated as a space-separated set of tokens rather than a single string. Some browsers were also randomly picky, so in the wild you see:
    • rel="icon" (fairly usual)(verify)
    • rel="shortcut icon"
    • rel="SHORTCUT ICON" (some older browsers would take only this)(verify)
    • rel="ICON"
  • ICO's MIME type is image/vnd.microsoft.icon. Before its definition there was no standard, though image/x-icon was the conventional value (with a little less meaning).
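
For example, pointing at an ICO explicitly might look like (the path is a placeholder):

<link rel="icon" href="/favicon.ico" type="image/vnd.microsoft.icon">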

Creating

Saving as PNG has been a good option for a long while.

If you insist on the ICO format: Various image editing programs (such as GIMP) can save to the ICO format, there are utilities like png2ico and such out there, and a whole load of websites that do the work for you (e.g. html-kit.com/favicon)
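
For example, if you have ImageMagick around, something like the following should turn a large PNG into a multi-size ICO (filenames are placeholders):

convert icon-256.png -define icon:auto-resize=64,48,32,16 favicon.ico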


Arguably the most complex part is ensuring nice contrast when scaled down that much.

specific platform icons

apple-touch-icon

iOS (iPhone, iPad, iPod Touch) will use these typically larger images, as may some Android devices and the odd browser feature

These are nicer for icons on a home screen or such.


Much the same story as favicon, in that

it will look in pre-set places
/apple-touch-icon.png and
/apple-touch-icon-precomposed.png

...unless you tell it where to go using:

<link rel="apple-touch-icon" href="/iphone.png"/>

or

<link rel="apple-touch-icon-precomposed" href="/iphone_precomposed.png"/>

Notes:

  • using apple-touch-icon-precomposed means iOS won't add effects like a shine



android-chrome

Android Chrome targets 192×192 and 128×128 icons, preferring the 192; the 128 is apparently mentioned mainly for overlap with some other platforms.

If you declare multiple sizes, it seems to pick the largest it can find (≤192).

Declared like:

<link rel="icon" type="image/png" href="/fav192.png" sizes="192x192"/>

(earlier versions also used the apple touch icon, and had 196x196 icons)





JSON manifest

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Included like

<link rel="manifest" href="mymanifest">

(the filename can be anything, because you explicitly link to it anyway, but people may conventionally use something indicative like manifest.json, while site.webmanifest seems to come from the 'HTML5 Boilerplate' template)


A JSON file that lets you control how your app appears on the home screen, how it looks while launching, and whether to hide the browser UI once launched.

Mostly for websites that are trying their hardest to look like apps without actually being apps in the mobile-device-app sense, a.k.a. progressive web apps. Also refers to phones helping that illusion along when their browser is handed a manifest.


Includes parameters like (see the example sketch after these lists):

  • icons
  • background color - (to show before its stylesheet is loaded, and used in a possible splash screen)
  • theme color - (affects things like task switching)
  • display - preferred display mode e.g.
browser (default), or showing less browser/navigation with one of:
standalone
minimal-ui
fullscreen
  • orientation - (preferred orientation)
  • shortcuts - (mobile devices can opt to choose common actions in a context menu)


..and things mostly meant for app stores:

  • author name
  • app name, short name, version, description
  • categories (e.g. "books", "education", "medical")
  • app rating, screenshots
  • whether to prefer/suggest a phone-installable app (instead of the website-like thing we're talking about)
  • whether to register a URL protocol handler


Notes:

  • devices may show a splash screen based on parameters like name, background color, icon
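
Putting a few of those parameters together, a minimal manifest might look something like this (the names, colors, and paths are placeholder examples):

{
  "name": "My Example App",
  "short_name": "Example",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#ffffff",
  "theme_color": "#3366cc",
  "icons": [
    { "src": "/fav192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/fav512.png", "sizes": "512x512", "type": "image/png" }
  ]
}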



robots.txt

/robots.txt lets you ask crawlers (a.k.a. robots or spiders) not to visit certain URLs or directories, to opt out of robots' basic find-everything behaviour.

Assuming they look at this file.

Assuming they respect it.


Why robots.txt?

Practical uses include:

  • lessening the amount of unfinished work that shows up in web searches
not a guarantee, but can be easier than e.g. making sure you have no links to it, password protecting it until it's done, etc.
  • prevent crawlers from wasting bandwidth on things like:
    • temporary directories
    • short-term caches
    • very large files, e.g. directories where you put raw originals, downloads, and such (assuming you have a page describing them that will get indexed)
  • selectively disallowing crawlers
  • asking some crawlers to be gentler than their default


You can expect crawlers to take a few days to notice a change in robots.txt and make it current throughout their distributed setup. The delay isn't very controlled or predictable, which makes robots.txt a poor choice for temporary blocks.

Besides, there are also crawlers that just ignore robots.txt

And there are classes of use that ignore robots.txt, like most (possibly even all) link preview bots




The contents and logic

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

There are only a few directives that you can expect the majority of crawlers to parse; more exist, but you should assume most will not understand them.


User-agent:

  • User agent names should be interpreted case-insensitively
You can use * meaning 'everything'.


Disallow:

  • basically lets you specify a starting path
  • Wildcards are not supported, but:
    • Disallow: / means disallow all
    • Disallow: (no value) means allow all
    • Strings act as 'starts with' strings, so /index would block /index/, /index.html, and more.
  • You get to specify one path per Disallow; use multiple Disallows if you want to disallow a list of things
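
For example, asking all crawlers to stay out of a temp directory and a directory of large originals might look like (the paths are just illustrative):

User-agent: *
Disallow: /tmp/
Disallow: /originals/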




Further notes:

  • A spider that checks whether robots.txt has something to say about a given URL will use the first (applicable_user-agent, applicable_disallow) pair and stop processing.
  • The default if-no-rules-match policy is to allow, but a catch-all disallow at the end is possible.
  • ...which in combination means that order matters, and allows slightly more complex constructions: you can do both an agent whitelist and a blacklist (see the sketch below)
  • Googlebot has some extensions, including wildcards and an Allow, but these aren't supported by many other things
  • Don't list secrets. Yes, you can keep it out of searches, but people with bad intentions can trivially look at your robots.txt just to find interesting things
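
For example, a sketch of an agent whitelist using only the basic directives, relying on the first-applicable-record behaviour described above (the crawler name is just an example):

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /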



See also

  • ads.txt

sitemaps

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


A sitemap is a list of pages on a domain.

The word 'sitemap' comes from the days when webmasters would have one page that linked to all pages on the site, that being the easiest way to be sure that all of your public-facing pages would get indexed by crawlers.

So this is just a more formal version of "yeah, just make some HTML with lots of links", which also lets you give more specific information about how (e.g. how often) to crawl; that can have some minor positive side effects, e.g. on your resource use.


Sitemaps allow specification of

  • what parts are available for harvesting
  • when a page was last updated
  • how often each item will change
  • the relative priority of pages within the site (see note below)


Sitemaps are useful when

  • Things are not well linked yet, from the site itself and/or from elsewhere
  • you want to hint to search engines that, say, your news page and some dynamic content update quite often, while some stuff is almost static
  • You are using JavaScript drop-down menus, AJAXed content, or similar, in a way that means crawlers won't find your links/content.

They have little added value when all your content is well-harvested already.


Sitemaps could be said to complement robots.txt, in that robots.txt can only ask not to index something.



Why sitemaps?

XML or plain text
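
A minimal XML sitemap might look something like this (the URL and values are placeholders); the plain-text variant is simply one URL per line:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/news/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>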

Getting it referenced and used
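
You can submit a sitemap to search engines directly, and/or point crawlers at it from robots.txt with a line like (the URL is a placeholder):

Sitemap: https://example.com/sitemap.xml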

Sitemap indexes

trafficbasedsspsitemap.xml

These files are generated by the Bing Sitemap Plugin for IIS and Apache.

Requests for this will only come from msnbot, so implementing this is useless for other search engines.

You probably want to ignore these requests.
