Locking, data versioning, concurrency, and larger-scale computing notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Glossary

Glossary of (concurrent) computation

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)





"Thread"

Thread of execution
Threads in a processor
Multithreading

Concurrency and parallelism

Coroutines and cooperative multitasking

async and await

async and await are syntactic sugar in a few languages, and often mean you're doing cooperative multitasking (/coroutine) style programming.


See e.g.

Glossary of exclusiveness

Atomic operations

Critical section

Races

Thread-(un)safe

Deadlock, livelock

Deadlock and livelock are both situations where the locking in two or more threads will block things from getting done.


A deadlock is when threads end up indefinitely waiting for the other(s).

The classic, and perhaps most typical, example comes from the order in which multiple locks are taken.

Consider

two threads, call them 1 and 2
two locks (protecting two resources), call them A and B
the code in both threads is "lock one, lock the other, do work, unlock both"

Because the timing of what the threads do is unrelated, a lot of the possible orders can happen. You could write them all out if you want a sense of completeness, but perhaps most interesting is one of the problem cases:

1 locks A, nothing wrong with that
2 locks B, nothing wrong with that
2 locks A - so starts waiting for 1 to release it
1 locks B - so starts waiting for 2 to release it

In this particular variant, both are now waiting indefinitely on the other, before they can start work. Also, neither will release locks until they're done with that work. This can never be resolved, so both will wait indefinitely - this is a deadlock.


There are various possible ways to detect it ahead of time, and/or ways to resolve it.


One is coarser-grained locking: consider a single lock that protects both mentioned resources at the same time.

Yes, this is often less efficient, because anything that uses that lock will do so whether it needs one resource, the other, or both, and there are cases where other threads could have run in parallel but now won't. Depending on how long the work-while-locked takes, it may partly defeat the parallelism. But unless it defeats the parallelism entirely, it's better than hanging forever, for something that is trivial to implement.


Other deadlocks are resolved for you. Say, relational databases will typically contain a lot of logic that basically means that if two transactions happen to deadlock, one will be rolled back.


Specific deadlock situations may be easy enough to detect ahead of time, but this requires knowledge of the situation, its locks, and of how to write such checks without introducing race conditions.


Sometimes there's a cheat in giving locks timeouts and making your code retry. ...but this doesn't scale well at all, and even at smaller scale can cause other problems, like livelocks instead.


When you get to design the locking (e.g. in your own programs, rather than in a given database), consider different approaches to locking.


There has been plenty of research into lock-free data structures, cleverer lock relief, and such.



A livelock refers to situations where there is regular locking and unlocking going on, but the system is trapped in a limited subset of its states, and may not get out of this subset within acceptable time, or possibly at all.

Often these states are functional/bookkeeping states between actual work, so it's not doing anything useful either. Meaning you can see it as a type of starvation, and arguably practically equivalent to a deadlock.

It can happen when the concurrency model is loose and/or complex, but exclusive locking is still an important part of it.

Livelocks may be harder to predict and detect, and may depend on secondary effects like timing and amount of load.


Starvation

(Lock) contention, false contention, lock convoys

Thundering herd

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Locks and lock-like implementations

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
Lock hierarchies

Re-entrant code

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Re-entrant code may be defined most by the idea that code can be called when a call is already underway (the name literally refers to entering that code again).


...which is a wide notion and not hammered down very well, and you usually hear slightly more specific, varied, though in the end closely related definitions, like:

  • is safe to pause and continue execution later
  • can be safely called before a previous invocation has completed
  • code that can be called by multiple threads of execution
  • execution that keeps only local, static, non-shared state

...most of which don't so much define what re-entrancy is, as think about specific ways to (not) violate it.

And yes, in most cases, most of what ensures re-entrancy is about not sharing state with other execution (e.g. keeping your state to parameters and local variables). But

the case-specific details are also important for a more complete grasp.
you sometimes have to share state (a lot of heap allocation is shared, for example), and it is your code that should then think about ensuring re-entrancy


Recursion may be the easiest introduction, because it illustrates that the state of each call into a function can (and should) be kept entirely distinct from every other (each with its own scope based on its own stack frame), and that you don't need concurrency (that you don't control) to worry about re-entrancy.


Another example is the C library function strtok, which lets you split a string by a delimiter. It was written so that repeated calls give you just the next token (until there are no more), and the original version would itself remember where it was in the string, making repeated calls easier. That memory is however global state, which means that you cannot start tokenizing a new string until you're done with the last. The variant strtok_r basically makes you remember the state (it has an extra parameter). It's a little more typing, but this makes it re-entrant.


But beyond that, re-entrancy is also a quality you care about in a system that does concurrency (multitasking, threading, but also concurrency that is purely sequential and cooperative), and is also relevant around hardware interrupts, callbacks that may fire quickly (e.g. on a timer) and may overlap, and some other things.

Each of those are their own topic, because what makes code unsafe in each context is specific, as are the solutions (e.g. hardware interrupts have some unique worries).


Around concurrency, your largest worries are often

  • not using shared or global state
  • atomic changes, because races are an easy way to make something violate re-entrancy





Re-entrant code is intuitively related to thread safety, because both are about doing things safely with probably-shared state.

However,

  • code can be re-entrant but not thread-safe
in part because re-entrant can be about safety within just one thread.
  • code can be thread-safe without being re-entrant
e.g. when the violation doesn't matter - e.g. use of random_r() is important for fully reproducible results, but in a bunch of cases random() will be acceptable

...or both, or neither.

There are some good examples here: https://stackoverflow.com/questions/856823/threadsafe-vs-re-entrant


That said, a lot of code either

doesn't have to worry about either, or
makes it easy, or hard-but-important, to do both




Glossary of scaling

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Types of scaling (note that this has become a bit more of a continuum in the multi-core, multi-CPU, and have-GPU world)

  • horizontal scaling, a.k.a. scaling out:
more boxes: adding more computers to do parts of a larger job
...either all with the same information, and a knowledge of who works on what,
...or horizontal partitioning , a.k.a. sharding, to distribute distinct, self-contained jobs


  • vertical scaling, a.k.a. scaling up:
bigger boxes: adding more processing power, e.g. better/more CPUs or GPUs, to the same computer
In comparison to horizontal scaling, this tends to be
faster up to some point (due to local interconnects being faster than most networking - but this is also why specialized networking blurs this line)
often pricier for the same compute, so only worth it for some types of jobs (e.g. those that require a lot of computation and near-zero latency between parts)
In practice, you'd do a cost-benefit estimation of how far to scale up each node before you start scaling out
sometimes a little more stable, in that a single node or single network connection can't trip up large jobs
At some point, speed of interconnection becomes the main issue.
particularly when you need unusually-high-speed/low-latency networking.



  • cluster computing versus grid computing is not a clear distinction, though in general the associations are:
a cluster is more homogeneous, i.e. mostly identical hardware
a grid is more heterogeneous, "whatever you have"
a cluster is often a tightly coupled, predetermined set of computers, and typically all in the same rack(s)
a grid is often more loosely coupled and more widely spread
a cluster's jobs are often centralized and synchronized,
a grid's jobs may be more loose/flexible/dynamic/automatic (...by necessity)
a cluster is more likely to have a central node give jobs to the workers
a grid is more likely to have each client ask for work from the central node



Glossary - further concepts

Coroutines

Continuations

Informally, continuations are the abstract concept of 'the rest of a calculation'.


Sometimes as just a thought exercise, sometimes in a very duct-taped way, but when we give it the name continuation we often have a more generic language/library/interface framework that somehow allows a chunk of nontrivial code to be chunked/paused, and continued in some way or other.

Ideally in a way that is not clunky to write or understand.


Examples will typically also tie in practical execution details, like async, multithreading, and multiprocessing.


For those familiar with generators, those are effectively continuation-style iterables.

Generators demonstrate one of the things continuations can be useful for: you can get work on demand - just the work necessary for the next yield - and you get such state without threading.


Continuations can also be useful for concurrency-geared control structures.

Scaling, parallelizing and such

Some inherent limitations and choices

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Common fallacies of distributed computing (suggested by Deutsch and various others):

  • bandwidth is infinite
  • latency is zero
  • transport cost is zero
  • the network is reliable
  • the network is secure
  • the network is homogeneous
  • topology doesn't change
  • there is one administrator

If you assume any of these, that may return as a fundamental limitation.

Most have some fairly easy, acceptable solutions - but often at some cost, possibly in one of the other listed aspects.


Mobility gap

What some have called the mobility gap is based on the observations that:

Moore's law describes processing power growing exponentially (currently it's often the one we have to think about least)
Kryder's law describes storage growing exponentially (scaling takes some worry in relation to errors and failures, but we've started dealing with that),
Nielsen's law describes network speed growing more or less linearly.


One implication is that over time, we spend more time sending data around than processing it.

Which implies it is preferable to process it where it is currently stored.

Among many other implications, this is why approaches like map-reduce became useful, even necessary: It is becoming significantly cheaper to move programs to the data than to move data to programs. Given the above patterns, this will continue to change how we solve problems.

Brewer's CAP theorem

See Relational_and_non-relational_database_notes#CAP

Scope of consistency

Consensus

Offline mode

Amdahl's law

See also:

Storage

Flavours

Further considerations

On locks

Re-entrant lock

Critical section, mutex (mutual exclusion)

Futex

'Fast Userspace Mutex' (appeared in linux 2.6 kernels)

When there is no contention on the futex, things can be resolved in userspace library code, so in that case it is faster than a mutex, which always incurs a system call.


In the case there is contention on the futex, arbitration is assisted by a kernel-space wait queue like any mutex.

In other words, for low-contention locking it is often more efficient.

Spinlock, busy waiting

Semaphore

higher-level constructs, communication models, and such

Granularity of locking

Opportunistic locking

A Microsoft name, mostly for how it does client locking in SMB/CIFS context.


This seems to be a server-mediated lock that counts on there being one main user for a file, and signalling when there is another.

This lets the first user cache contents locally (and buffer changes) as if it had a read/write lock on the server. Further details only happen when further users come in on the same file.

Those semantics are roughly that the second client requests a lock break, which means it asks the server to ask the first client to write its changes. (what's the state of the locks after that, though?)


https://msdn.microsoft.com/en-us/library/windows/desktop/aa365433(v=vs.85).aspx

https://www.samba.org/samba/docs/old/Samba3-HOWTO/locking.html

Optimistic locking, pessimistic locking

Optimistic locking means you start working under the assumption that most of the time, there is no conflict.


For example, you might fetch a record (plus something that lets you check correctness later, like a version or timestamp), figure out what the new record would be, and send in the change.

If that checkable thing checks out (e.g. that record did not change its version or timestamp), you can just replace it.


If not, you decide what you should do instead -- and this may be as complex as any rollback or merge logic.


The point is that

  • nothing up to that error handling involved any locking.
  • if the problem cases are rare, you can avoid a lot of locking


This can be particularly handy in cases where just locking on all alterations might easily create lock convoys, e.g. when operations are quick and small (...comparable to the locking operation).


That said, in uses where you typically do run into conflicts, this will be no better (still creating lock convoys), and potentially worse e.g. when there's repeated redoing.


Optimistic locking is a good idea if you know the access patterns well. For example, "update our search index every hour" is likely to have few or no such collisions because it's essentially a sequential batch job.




Pessimistic locking is, then, roughly the assumption that collisions are so troublesome (consider also races, dirty reads) that always locking is the better choice, leading to behaviour that may not be the fastest, but may be a lot more consistent.

It's the right thing to do for strict consistency management (personal information, financial transactions), but can kill performance for sometimes no good reason.

Advisory and mandatory locks

Advisory locks are those that are not enforced.

Each relevant application chooses to observe them correctly - or may not.

Whereas mandatory locks are enforced for you.


Advisory locks are useful when

  • there is no easy way to implement proper locking, e.g. you are a random program wanting to exclude another copy of yourself (and you can't count on kernel or services features being there)
...but you do want to try your best regardless
  • you want pessimistic locking in a context that allows optimistic locking


An advisory lock can be as simple as "a file existing here", though because this implies no-transaction context, this allows races and other edge cases.

Lock files and file locks

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Lock files are a mechanism of advisory locks, often using files with predictable names as markers for what is in use.

For example, a program can create an empty file foo.lock to signal that foo is currently in use. Any other programs that wish to observe this convention can delay opening that file until that lock is gone.

This is not always free of race conditions, and since it's polled it's slower than it could be, but for a number of applications it's a lot better than no locking at all, when you just want to prevent multiple processes clobbering each other's work, editing the same file, or such.


File locks usually refer to mandatory locks that you can make on filenames, which make certain file operations block (for other processes) until released.

Local file systems often can do this, managed by the system.

Doing this on networked filesystems requires distributed lock management, which is easily too inefficient or too fragile and doesn't scale well. There are cases in which it works, there are cases in which advisory locking is handier, and for applications you may want to build something cleverer.



  • nix file locking
  • POSIX file locks
fcntl(F_SETLK)
upsides
portable
can work across NFS
byte-range locking
downsides
tied to processes, not file descriptors - meaning it doesn't solve much around threading
not secure, e.g. released on close() which can mean embedded code could break your locking logic
  • BSD flock
locks tied to file descriptors, not processes
upsides
safer around threads, useful around forks
downsides
not secure
not useful across NFS
less portable than POSIX
on linux may actually be implemented using POSIX(verify)

inter-process locks

Advisory locking is typically done via lock files, which are not very strict but may be good enough for any purpose where apps want to cooperate.


Mandatory locks are possible but tend to be less portable (...fcntl and/or mount options).


Process-shared mutex


named mutex/semaphores


IPC semaphores


SHM mechanisms

flock notes
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The flock command wraps flock() syscalls around commands that it runs, which means such commands on the same lockpath will run in series.

Example:

flock /tmp/sleepdemo sh -c 'echo Start;sleep 4;echo Done' &
flock /tmp/sleepdemo sh -c 'echo Start;sleep 4;echo Done' &
flock /tmp/sleepdemo sh -c 'echo Start;sleep 4;echo Done' &
flock /tmp/sleepdemo sh -c 'echo Start;sleep 4;echo Done' &

A more practical example might be to make cron jobs that are sometimes long-running not overlap (though note that if they always are, they would seriously queue instead).



If you want to test for a lock being present (rather than unconditionally queue a command), then you want either:

  • -n means nonblocking - exit immediately, instead of waiting indefinitely
  • -w n is a timeout in seconds - after which to give up and exit
# in the background to have a thing to test against
flock      foo -c 'sleep 1h' 
 
# give up immediately
flock -n   foo -c true || echo nope
 
# give up after 5 seconds
flock -w 5 foo -c true || echo nope


Notes:

  • will create nonexisting lock files
  • Without a command, you'll get a "flock: bad file descriptor" - why? (verify)
  • many but not all filesystems support this
  • You can see what's holding a lock on a file like
    fuser -v foo

Non-blocking, lock-free, wait-free

On data

Communicating

With any sort of parallel communication, atomic communication of data can be a problem, particularly whenever threads are involved.

A now fairly common solution is to model demand/supply in some way, commonly via queues and/or generators. Such a setup forces the programmer to read and write packages of information at a time, and can allow lazy creation.

This may imply that you should always communicate copies of data, and not heap references.

When handling large amounts of data, and you wish to avoid unnecessary copies (databases would, for example), you could probably get away with an MVCC-style setup, but it does mean a lot of management even if it avoids locking.


See how things are handled e.g. in

Versioning

lock-edit-commit-unlock

A resource is locked as a whole.


The easiest to code.


Also tends to scale badly, because things are limited by the speed the lock can be locked and unlocked.


Latency reduces the maximum rate, and things will sit around waiting, leading to lock contention.

(The way they wait can also be relevant, see e.g. polling/spinlocks versus event-driven things)


Typically the same actor that locked it must unlock it.

...so without timeouts, this can leave the resource unusable.


There are also potential deadlocks if actors want to lock multiple things.


Note that instead of an exclusive lock, you could use an exclusive write lock: others can read, but none should/can write without a lock. This is a little harder to code in that you may now have to deal with atomicity of the changes.

branch-and-merge, copy-edit-replace

Lockless, so people/processes don't lock each other while they work.

A copy(/branch; the difference is mostly in terminology and housekeeping) is edited, and the changes are re-committed.


Can be useful when different alterations are likely to be independent somehow - different files, different regions in text files, or such.

Often relies on (mostly-)automatic conflict resolution (...partly because manually resolving such problems is probably a time loss comparable to waiting for a lock).

'Just use the last version' may be good enough in some situations -- but that would mostly just keep a single version and throw away all others, which is only a satisfying solution, well, when that makes sense in the particular application.


MVCC

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


tl;dr:

  • an alternative to aggressive locking - but cannot completely remove the need for locks
  • tends to avoid reading clients blocking writing clients, and the other way around
so e.g. read-heavy workloads tend to wait less
  • cannot prevent the occurrence of deadlocks
though you can informedly make them less likely


Multiversion Concurrency Control (MVCC) steps away from handling data both monolithically and purely sequentially.

Its focus is on isolating distinct versions of data, and often works out as each client seeing/handling a consistent snapshot of a certain unit of data (where that unit may be a database row, a file, or such) within a certain transaction (e.g. a database transaction, a file operation, etc.).

Conceptually, one client may be working on creating a new version while everything else still sees the current/old version, and only at commit time do we need some detailed handling, not during that transaction.


Details vary with use case (e.g. databases, files, etc), and even with implementation within a use case (e.g. there are a few different MVCC systems in RDBMS systems)


In some applications, it can work out as a variant of copy-edit-replace which is aware of transactions and basic data modeling.


In some situations, for example where commits on the same database tables aren't on the same rows (e.g. different people updating their own profile text) or not conflicting in other ways (say, someone clicking 'set name' on their user profile twice), it can be a perfectly useful middle way between the slowness of locking the entire table for each such operation, and the possible inconsistency of dirty reads and such.


MVCC works well in mostly-read, occasional-write use, particularly when there is no strict necessity to see the very latest data at the earliest possible time, as you can serve the latest committed version without delay, without it being potentially inconsistent.

From that perspective it provides lockless versioning. If properly set up, it need not break (m)any other guarantees.


When there are multiple writing clients at the same time, things depend on decisions. You can say that the one that committed last becomes current, and the rest is old - it is usually fairly trivial to decide which of these is old, though you can make this cleverer and more complex if necessary.

This may sound like a bad idea, but consider that sequentially handling commits that were isolated alterations on the same basic data amounts to the same idea, but with the latest commit time being decisive. In some situations you really want these operations to be serialized, in others it really doesn't matter.


MVCC is not a very good idea when there are dependencies you can break - but that's arguably a misapplication, or a fault in modelling or code: not locking where locking is necessary under almost any underlying model.

Note that it certainly matters what your unit of data encompasses (and doesn't), and also whether there is transactioning around it (which may make overall management and cleaning of old versions a little harder - see e.g. the guts of Postgresql's MVCC).


APIs

Larger-scale computing

The Seven Fallacies of Distributed Computing:  Reliable delivery;  Zero latency; Infinite bandwidth;  Secure transmissions;  Stable topology;  Single administrator;  Zero cost. --Peter Deutsch


There has recently been a general move to get more processing power by interconnecting many machines, either in your own cluster or in a cloud service - as opposed to putting more power into a single monolithic, and often disproportionately pricey, workhorse.


Which of the specific options is best depends on many things. You can think and read and write about this one for years, but a little common sense thinking about effective bottlenecks goes a long way.


Threading versus message passing

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Broadly speaking, there are two approaches to communication in parallel execution: threading + shared-memory techniques, and message passing.


Multiprocessing models:

  • UMA (Uniform Memory Access), commonly as SMP:
processor owns no memory (other than its cache), and all view the same memory
e.g. a single bus from CPU to memory controller
speed very predictable, no special planning necessary
...but doesn't scale well beyond a handful of cores
latency ~10ns(verify)
  • NUMA (Non-Uniform Memory Access): processor owns its own memory, and can view that of others more slowly
e.g. HT and QPI mean each CPU talks directly to its own physical memory (yes, the memory controller is in the CPU now), and interchanges internally when necessary.
NUMA allows further (vertical) scaling than strict UMA
but varying distance/time to different memory and dealing with ownership means access speeds vary more. The averages are higher than in UMA/SMP (though in theory scales somewhat better)
cache coherency is more complicated (and slower), which stops the scalability at some point
latency ~100ns(verify)
  • Cluster: nodes owns their own memory, cannot address that of others
(at opcode level; software does it instead)
latency ~1000ns or a bunch of multiples more(verify)

Horizontal scaling inevitably slides down this scale.


To quickly contrast:

When the same memory is mapped by multiple threads of execution (which here includes processes sharing memory), it

  • allows no-copy sharing
  • requires explicit synchronization

On a single workstation/server (UMA/SMP, or small NUMA via HT/QPI), transfer costs are low and predictable, which makes this more efficient than cluster.

It can be error-prone to have a basis that never synchronizes/locks until you realize you need it (though there are now languages with direct awareness of threading safeness, and better threading libraries for those that are not).


Note that UMA and NUMA can be mixed, e.g. on double-CPU motherboards.


Some larger systems still manage to be NUMA fairly efficiently (e.g. being aware of its own topology).

...but at some point you have to give up and break from it, and effectively go cluster instead.


At this point, message passing becomes preferable.

Data is always shared via a send/copy - which usually is the case anyway (and can sometimes be upgraded to no-copy within the same host under certain conditions)

Message passing is cooperative (a send has to be received), synchronization is implicit with complete sends (avoiding many race conditions).

Since data serialization is often part of the protocol, this is easily more portable.

...and many of these details can make the overhead higher. How much higher depends on how much you work with it.

Single-host computing

Distributed, parallel, cluster, grid

Memory issues

Commoditized computing (a.k.a. cloud)

See Cloudy notes

Parallel processing libraries

OpenMP

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

OpenMP targets shared-memory systems (only) - so in some ways is just a portable alternative, or wrapper, to pthreads / windows threads.

A compiler-level (C, C++, FORTRAN) thing that decently abstracts, so makes it fairly easy to safely access data structures, make syncing easier, do parallel sections, parallel loops, and such.


Not to be confused with (Open)MPI.

MPI

MPI is a spec, aimed at coordinating a bunch of processes, and optionally any transfers between them.

More technically, MPI is just one of many message passing specifications/implementations.


Implementations include OpenMPI, MPICH, HP-MPI, and others - see [1].

MPI is a little more language/environment-agnostic than some alternatives - if a little overdesigned (to the point that sometimes makes it a bit hairy to use, though usually just in terms of config that are for your sysadmin to worry about).


When/why MPI?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Some notes on MPI structure and jargon

Examples

Hello world

A minimal hello world is:

#include <stdio.h>
#include <mpi.h>
 
int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);   // returns an error code; here we assume it won't have trouble
  printf("hello world\n");
  MPI_Finalize();           // ditto on the error code
  return 0;
}
As part of the init (which gets some details via the fact that you'll start it via
mpirun
with something like -np 4), one process takes care of starting the others, which means that all of them print hello world.


When you want to tell the processes apart, or communicate, you'll probably want to fetch and report the rank (and often size)

Distributing jobs

MPI config trouble

A requested component was not found, or was unable to be opened

At different levels, the lines

ompi_rte_init failed --> Returned "Not found" (-13) 
ORTE_ERROR_LOG: Not found in file runtime/orte_init.c

seem to mean the same thing.


ompi_info
should list at least a few dozen MCA lines.

If it shows only a few, and the following error, then you've probably got a mix of openmpi versions installed, and paths (binary/library) effectively mixing them.

A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      myhostname
Framework: crs
Component: none

However, this is not always a good test when specific software may do things in its own environment.

Keyval parser: error 1 reading file mpicc-wrapper-data.txt

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Seems to not be fatal (compilation works), but presumably it won't use the configuration.


Seems to typically be a bad openmpi config (possibly because of updates, mixing versions, etc). I have no idea in which way it's bad, and it seems that since the problem is rare no one has really figured that out.

A fairly thorough way to fix it is to uninstall MPI, clean any leftovers (e.g. anything that
locate -e openmpi
mentions that looks like config files), and reinstall(/rebuild).


Runtime MPI Errors

For help debugging programs using MPI, see http://www.open-mpi.org/faq/?category=debugging and similar, and some of the below.


mca_btl_tcp_endpoint_complete_connect] connect() to ipaddress failed: No route to host

Boils down to a generic TCP "No route to host" error (which is what errno 113 refers to).

Check NIC status and link lights (e.g. ethtool).

Check routing rules (e.g. netstat -r).


Error 137

Seems to be a generic out-of-resource error.

In our case it came from a process that allocated more than it should have, and was oom_killed.


MPI_ABORT was invoked on rank 0 in communicator mpi_comm_world

Regularly


See also

Notes on threading implementations (lower levels)

tl;dr

windows and posix threads

On hardware

APIs and implementations

On threads and cores

Notes on thinking about threads and other parallelism (higher level)

Unsorted