Graph notes

From Helpful
Jump to: navigation, search

A graph (in the data structure sense rather than the plotting sense) is a node-link structure.


From a distance, it is just that, and any other details are semantics applied on top.

They are used whenever it is convenient to see data this way.


See also Information visualisation.


Types and properties

Algorithms

Visualization

Graph visualization can be static (as in e.g. graphviz), or some degree of dynamic (e.g. TouchGraph (java), JSViz (javascript), SpringGraph (Flex) and more).

The dynamic part of dynamic visualization can refer to the ability to allow new nodes to be added at any time, filtering or some other use of views, refocusing (e.g. making another node the center in a radial layout), and more.


Visual criteria

Static/general

Minimize:

  • edge crossings (strictly or heuristically)

Maximize:

  • node separation
  • angular spacing

Where possible:

  • minimize (unnecessary) edge angles,
  • minimize graph area
  • show symmetry
  • show clusters
  • make cognitive-semantic sense (show tree as tree)
  • have a information density that is nice to humans

Various of these are NP-Hard even individually, fuzzily defined, and some cases incompatible, calling for heuristically solutions.

When graphs are trees, you can do more dividing-and-conquering. For dags or even graphs in general, you can still do so by pre-clustering, although there are some pathological cases (though you can question whether they have a good solution anyway), and some cases where clustering may not pick out the data that we want it to.


Dynamic

When the graph can change (in terms of data or the visualization itself), there are a few more critera.

This is mostly just the intuitive "if there are interests to follow, make relayout as consistent with each other as possible," since creating an independent new layout and tweening to it is likely to only have superficial relations to the last and cognitively be equivalent to looking at a new graph.

The criteria here are mostly based on maintaining sibling/neighbour ordering. Solutions usually do not scale, though.

Visualization models

Models

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Springs

Some model of springs is commonly employed. The layouts roughly correlate to graph-theorhetic distance, show clusters (of connections), and serves node spacing to a decent degree.

Theoretical springs have an ideal length, and push and pull with a force linear to the distance they are from their center point, and to a spring strength.

Simple models do not deal with a quite-connected graphs. Whenever there is a cluster with many connections this makes for a lot of all-equal stress, which can force nodes in connected areas close together. To a degree this is inherent with densely connected graphs, but there are models and tweaks that work better visually.

When a graph has edge values that indicate connection importance, you can use this to control the spring strength. It somewhat helps to use edge values (if they differ) as the strength of the spring; this will usually center the stronger ones.

Electrical particles / magnets / gravity

The sprint-mass model isn't really complete since edges that are not connected do not interact at all. While they will be pulled along, they might overlap without reason to separate, and you won't get things like angular separation of nodes.

When going for aesthetics one usually adds some way of making particles repel each other, usually with a force that grows weaker over distance, often based on the inverse, based on a physical model of some sort.

This approach visually separates quite well, but has the downside of needing to work on all pairs of nodes and therefore not scaling well with the amount of nodes.

Spring-Mass

A simple spring model assumes that nodes affect each other equally, implicitly assuming equal masses. In reality you can use some node value as a mass, which means it affects other nodes more and itself moves less.


Oscillation, damping, and some attempted smartness

One common problem is oscillation, partly because some energy is conserved, or because of inaccuracies in discrete simulations. This usually makes graphs twitch.

Usually, movement/energy is removed from the system in some way, e.g. in proportion to how settled you think something is.


Both too little and too much damping is possible, or more specifically, will cause and avoid different problems. Since graphs usually start off without any positional information they are often started off with random positions and may be decided to be settled to soon.

Independently of that and possibly more important is that stress is bad because it tends to indicate entangled forces of nodes that want to separate but can't because of local pseudo-optimums created by overly strong forces.


A good guess is possible, but using rules based on information like the degree of the node to increase the damping, which adds sluggishness similar to the way mass does, but now based on the graph structure.

Other things can also be chosen differently or smarter. For example, repulsion is scale-dependent in that putting more nodes in the same area linearly increases stress, which is probably unnecessary.

Also, nodes with high degrees will strongly center, which sounds good at first, but with enough nodes will just force everything into a stressful mess. It is not insane to actually lessen the effect of springs connected to nodes with high degrees.

It's also handy to keep repulsion on the same sort of scale as the spring forces, so that both have their say.

Hyperbolic views

A rubbersheeting type of model that focuses on a specific part of the data. Distances are not judgable, but local relations are easier, and it will work for larger datasets than basic spring-mass does.

Example: http://hypergraph.sourceforge.net/


See also (models)

See also


Tangential


Storage