Content and code versioning

Why?

File versioning

Versioning any content is already useful. Consider that almost all shared-document apps (Dropbox, online documents, OneDrive, Seafile, etc.) have some form of this.

Say, you like having an archive of recent versions (or potentially all)

so that you can go back to before recent mistakes

which matters more when more than one person can access the thing

...though not much more. It's arguably just a backup system that has been polished to let you poke at it on a per-file basis.

Code versioning

As a widely used type of versioning, this came first.

It also goes further, by working at the level of distinct changes within a file (not just the file as a whole), though mainly for text files.

the history of all versions is more explicitly easy to reverse

which is useful even on single-contributor personal projects

so that you can try restructuring things, and if that turns into an unworkable mess you can revert those files a few versions and it's all good

also useful as a log of how bugs were fixed, and how they might regress

deals more specifically with distinct changes in the same file (easier to handle than 'make archive of source directory every day')

useful to merge in changes, and deal with conflicts when people work on the same code

often making it easy to look at and merge the differences between conflicting files

...rather than leaving it entirely to you to stare at two nearly-identical files

lets you have multiple working copies of the code

e.g. a development version, and production version (and maybe more)

useful to keep such variants organized, particularly since they will share most code

...rather than you making version-1.0-final-reallyfinal2a.zip (which you probably won't do anyway)
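As a rough illustration of working with distinct changes rather than whole files, here is a minimal Python sketch using the standard library's difflib. The file name area.py and the revision labels r1 and r2 are made up for the example; real versioning systems store and present changes in their own ways.

```python
import difflib

# Two versions of the same (hypothetical) file, as lists of lines.
old_version = [
    "def area(width, height):\n",
    "    return width + height\n",
]
new_version = [
    "def area(width, height):\n",
    "    return width * height\n",
]

# A unified diff records only what changed, plus a little context.
diff_lines = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="area.py@r1",  # made-up revision labels
        tofile="area.py@r2",
    )
)

print("".join(diff_lines))
```

The lines starting with - and + are the change itself. Keeping history as a series of such changes is what makes individual changes easy to inspect, and in principle to reverse.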

There are a lot of more minor details that can matter in practice, including but not limited to:

you like having an archive of all releases (release basically meaning 'collection of files of a specific version') in a shared pool

...rather than many '-v2.1.reallylast' on a lot of different workstations

makes it easier to go back and review at what stage a particular bug was introduced and fixed

can help you learn to avoid that next time

automated builds for testing and such (see e.g. CI/CD) are probably simpler to tie to a versioning system than to do manually
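Finding the revision that introduced a bug can be done with a binary search over history, which is roughly the idea behind commands like git bisect. A minimal sketch, assuming revisions are ordered oldest to newest and that the bug stays once it has appeared; first_bad_revision and is_bad are hypothetical names, and is_bad stands in for building and testing a checked-out revision.

```python
def first_bad_revision(revisions, is_bad):
    """Return the oldest revision for which is_bad(revision) is true.

    Assumes revisions are ordered oldest to newest, and that every
    revision after the first bad one is also bad.
    Returns None if even the newest revision is fine.
    """
    if not revisions or not is_bad(revisions[-1]):
        return None

    low, high = 0, len(revisions) - 1  # revisions[high] is known to be bad
    while low < high:
        middle = (low + high) // 2
        if is_bad(revisions[middle]):
            high = middle       # the bug was introduced here or earlier
        else:
            low = middle + 1    # the bug was introduced later
    return revisions[low]


# Hypothetical history of 100 revisions where the bug appeared in revision 37.
history = list(range(1, 101))
tested = []


def is_bad(revision):
    tested.append(revision)  # keep track of how many revisions we had to test
    return revision >= 37


culprit = first_bad_revision(history, is_bad)
print(culprit, "found after testing", len(tested), "revisions")
```

The point is that the number of revisions you need to build and test grows with the logarithm of the history length, which is only practical when every revision can be checked out on demand.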

Code revision glossary

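Beyond revision, there are some terms you will typically see.

Particularly branches, tags, and trunk. In systems like Subversion these are just directories, and none of them have special status beyond being conventions that everyone shares; others (e.g. git) treat branches and tags as built-in concepts.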

Branches are working areas separated from the main line (trunk) of development, often for doing larger changes without interrupting everyone else until you're done, or for experiments.

A merge takes branched code and merges it back into whatever it came from.

Branches tend to make cooperative development simpler. There are many possible strategies, some with names, each with its upsides and downsides, but they at least make it clearer for everyone what to expect and how to communicate about it. They can also make things like backports and bugfixes easier: taking out the set of changes that make up a bugfix should be simpler, and applying that to an older version may or may not be hairy, but it is still a well-contained job.

Trunk is a term regularly used to refer to the main line of development - in contrast with branches (...get it?). It may also be called main. The bulk of your project is typically in trunk, or possibly in branches, if you have both stable and development releases, with things like backports going on.

Tags refer to snapshots of specific source code (which under the hood may e.g. be a reference to a revision, or to specific revisions per file, depending on how the specific revision system models things), which is a great way to keep track of release candidates, specific released versions, or such.

While you are working alone and/or without branches, a revision can take the role of being a particular release, but in the face of any branching whatsoever, or even just multiple projects in the same repository, tags are a nice organization tool. And they take almost no extra storage. (Word of warning: You probably want to settle on conventions for tag naming ahead of time. Organisation doesn't stay useful if you can't tell what you meant some time later.)
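To illustrate why tags take almost no storage, here is a minimal sketch that models a tag as just a name pointing at a revision. The Repository class, its methods, and the v-prefixed naming pattern are invented for the example, and are not how any particular system does it.

```python
import re

# Hypothetical naming convention: v<major>.<minor>, optionally -rc<N>.
TAG_PATTERN = re.compile(r"^v\d+\.\d+(-rc\d+)?$")


class Repository:
    """Toy model: numbered revisions, plus tags that name some of them."""

    def __init__(self):
        self.revisions = []  # index i holds the snapshot for revision i + 1
        self.tags = {}       # tag name -> revision number

    def commit(self, snapshot):
        """Store a snapshot and return its revision number."""
        self.revisions.append(snapshot)
        return len(self.revisions)

    def tag(self, name, revision):
        """Attach a name to an existing revision; nothing is copied."""
        if not TAG_PATTERN.match(name):
            raise ValueError(f"tag {name!r} does not follow the naming convention")
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        if not 1 <= revision <= len(self.revisions):
            raise ValueError(f"no such revision: {revision}")
        self.tags[name] = revision

    def checkout(self, ref):
        """Look up a snapshot by tag name (str) or by revision number (int)."""
        revision = self.tags[ref] if isinstance(ref, str) else ref
        return self.revisions[revision - 1]


repo = Repository()
repo.commit({"main.py": "print('hello')"})
release = repo.commit({"main.py": "print('hello, world')"})
repo.commit({"main.py": "print('work in progress')"})
repo.tag("v1.0", release)

print(repo.checkout("v1.0"))
```

Rejecting names that do not match the agreed pattern is one cheap way to enforce a naming convention from the start, rather than cleaning up later.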

In smaller groups:

you may even get away without any branching, when it's workable to polish one shared version until it works

and can ignore the trunk distinction completely

you may not need tags, for similar reasons

working alone I can usually tell by the commit message

In larger groups:

you may find that you only ever want finished branches to update the main development line.

you usually want some conventions about how to branch, mostly about how you communicate that you are branching and what it means for other development, to keep everyone sane (and avoid double work, pointless work, etc.).

tags are great because it's always clear what specific releases were

Historical options, modern options


Historically, one can mention SCCS (and its GNU clone CSSC), RCS, and CVS, and a bunch more, but you probably wouldn't use them today.

There are now quite a few systems out there, with mostly overlapping functionality. Some more current ones:

Subversion - modeled in part on CVS (it was designed as its successor), but with a focus on ease of use.
- you may want to look at svk, TortoiseSVN

git (inspired by the proprietary BitKeeper, and by Monotone. Written to be flexible and fast enough to deal with the Linux kernel source project. Note that this speed-at-size requirement is fairly unusual) [1].

web wrappers: GitLab, GitHub, Bitbucket, and more

GNU Bazaar [2]

Mercurial

GNU arch

Perforce - non-free for real-sized teams (>2?), but not expensive

see also http://en.wikipedia.org/wiki/List_of_revision_control_software

Criticism of various systems includes that you need to learn each system's terms, behaviour and quirks, because all of these are different in some details. So people tend to stick with the first one they tried unless it becomes unworkable.

Before you choose one based on "heard good things about", think about a few requirements you may actually have such as:

is it simple to work with? Fancier features often come with more to think about.

ability to work with eventually very large repositories, or a centralized repository server that stores a lot of data (say, gigabyte scale on disk), without some operations slowing to the order of seconds

centralized or distributed? (depends on cooperation model, though many probably want centralized)

do you want offline editing? (That is, do you like to work on a laptop?)
- do you need a (possibly offline-style) edit-merge-commit style, or can you deal with a (possibly online-only) checkout-edit-checkin style?

if the system is one that needs occasional administration, is there someone to do that within reasonable time?

do you want a system easy enough to understand that you could do it by hand, or is relying on IDE integration enough?
