Content and code versioning

From Helpful

Why?

File versioning

Versioning any content is already useful. Consider that almost all of the world's shared-document apps (Dropbox, online document editors, OneDrive, Seafile, etc.) have some form of it.

It helps when, say,

  • you like having an archive of recent versions (or potentially all of them),
so that you can go back to before recent mistakes
  • more than one person can access the thing

...though it does not do much more than that. It is arguably just a backup system that has been polished to let you poke at it on a per-file basis.
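To make the "backup system, per file" view concrete: a minimal sketch that keeps whole-file copies with an optional "keep only recent versions" policy. The class and method names (`FileHistory`, `save`, `available`, `restore`) are hypothetical and not any product's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FileHistory:
    """Whole-file snapshots of a single file (hypothetical sketch, not a real product's API)."""
    keep: Optional[int] = None  # None: keep all versions; N: keep only the N most recent
    _versions: List[Tuple[int, str]] = field(default_factory=list, init=False)  # oldest first
    _next: int = field(default=1, init=False)

    def save(self, content: str) -> int:
        """Store a full copy of the new content and return its version number."""
        number = self._next
        self._next += 1
        self._versions.append((number, content))
        if self.keep is not None and len(self._versions) > self.keep:
            # Drop the oldest copies, like a 'keep recent versions' retention policy.
            del self._versions[: len(self._versions) - self.keep]
        return number

    def available(self) -> List[int]:
        """Version numbers that can still be restored."""
        return [number for number, _ in self._versions]

    def restore(self, number: int) -> str:
        """Return the content of an earlier version, e.g. to go back before a mistake."""
        for n, content in self._versions:
            if n == number:
                return content
        raise KeyError(f"version {number} is no longer kept")


history = FileHistory(keep=3)
for text in ["draft", "draft, edited", "oops, deleted half", "oops, more damage"]:
    history.save(text)
print(history.available())  # [2, 3, 4]
print(history.restore(2))   # draft, edited
```

Note that it knows nothing about what changed inside the file; each version is just another opaque copy.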


Code versioning

  • of the kinds of versioning that came into wide use, this one came first.
It also goes further, by working at the level of distinct changes within a file (not just the file as a whole), though mainly for text files.
  • the history of all versions makes changes more explicitly 'easy to reverse'
which is useful even on single-contributor personal projects
so that you can try restructuring things, and if that turns into an unworkable mess, you can revert those files a few versions and it's all good
it is also useful as a log of how bugs were fixed, and how they might regress
  • deals more specifically with distinct changes in the same file (easier to handle than 'make an archive of the source directory every day')
useful to merge in changes, and to deal with conflicts when people work on the same code
often making it easy to look at and merge the differences between conflicting files
...rather than leaving it entirely to you to stare at two nearly-identical files
  • lets you have multiple working copies of the code
e.g. a development version and a production version (and maybe more)
useful to keep such variants organized, particularly since they will share most code
...rather than you making version-1.0-final-reallyfinal2a.zip (which you probably won't do consistently anyway)
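To show what "distinct changes within a file" look like: a small sketch using Python's standard `difflib`, which prints the same unified-diff form most code versioning tools use. The file contents and the `shapes.py@v1` labels are made up for illustration.

```python
import difflib

# Two made-up versions of a small file, as lists of lines (keeping the newlines).
old = ["def area(r):\n", "    return 3.14 * r * r\n", "\n", "print(area(2))\n"]
new = ["import math\n", "\n", "def area(r):\n", "    return math.pi * r * r\n", "\n", "print(area(2))\n"]

# Unified diff: '-' lines were removed, '+' lines were added, ' ' lines are context.
diff = list(difflib.unified_diff(old, new, fromfile="shapes.py@v1", tofile="shapes.py@v2"))
print("".join(diff), end="")

# An ndiff delta keeps both sides, so either version can be rebuilt from it:
# the 'easy to reverse' property in miniature.
delta = list(difflib.ndiff(old, new))
reverted = list(difflib.restore(delta, 1))
print(reverted == old)  # True
```

Storing and showing changes this way is what lets a tool present "these two lines changed" instead of two nearly identical files.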


There are many more minor details that can matter in practice, including but not limited to:

  • you like having an archive of all releases (a release basically meaning 'the collection of files at a specific version') in a shared pool
...rather than many '-v2.1.reallylast' copies on a lot of different workstations
  • it becomes easier to go back and review at what stage a particular bug was introduced and fixed
which can help you learn to avoid that next time
  • automated builds for testing and such (see e.g. CI/CD) are probably simpler to tie to a versioning system than to run manually
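Finding the stage at which a bug was introduced is often done as a binary search over the revision history (git offers this as `git bisect`). A minimal sketch of the idea, assuming a linear history and a hypothetical `is_bad(revision)` check that you can run against any revision:

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is True.

    Assumes a linear history where revisions[0] is known good, revisions[-1] is
    known bad, and once the bug appears it stays. Needs about log2(n) checks.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # bug already present: it was introduced at or before mid
        else:
            good = mid  # still fine here: it was introduced after mid
    return revisions[bad]


# Hypothetical history of 20 revisions in which the bug appeared in r7.
history = [f"r{i}" for i in range(1, 21)]
checked = []

def is_bad(rev: str) -> bool:
    checked.append(rev)  # record which revisions we had to build and test
    return int(rev[1:]) >= 7

print(first_bad_revision(history, is_bad))  # r7
print(len(checked), "checks instead of", len(history))
```

The sketch does not handle merges; tools that bisect non-linear histories have to pick midpoints in a graph rather than a list.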



Code revision glossary

Figure: An example history of some branch-heavy development, with branching, merging, and some tags.


Beyond revisions, there are some terms you will typically see.

Particularly branches, tags, and trunk directories. In some systems (Subversion being the classic example), these are ordinary directories with no special status beyond being conventions that everyone shares.


Branches are working areas separated from the main line (trunk) of development, often used for experiments, or for larger changes that you don't want to interrupt everyone else with until you're done.

A merge takes the changes made on a branch and brings them back into whatever the branch came from.
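Merging is usually a three-way comparison: look at both sides relative to their common ancestor (the base), take whichever side changed, and flag a conflict when both changed the same thing differently. A deliberately simplified sketch, assuming all three versions have the same number of lines; `merge3` is a hypothetical name, and the conflict markers only imitate the style many tools use.

```python
from typing import List, Tuple


def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Line-by-line three-way merge of equal-length versions (simplified sketch).

    Returns (merged_lines, conflicts); conflicting lines are written out with
    conflict markers, and conflicts lists their 0-based line indices.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles versions with equal line counts")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:      # both sides agree (unchanged, or changed identically)
            merged.append(o)
        elif o == b:    # only 'theirs' changed this line
            merged.append(t)
        elif t == b:    # only 'ours' changed this line
            merged.append(o)
        else:           # both changed it differently: leave it for a human
            conflicts.append(i)
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


base = ["timeout = 30", "retries = 3", "log_level = warn"]
ours = ["timeout = 60", "retries = 3", "log_level = info"]
theirs = ["timeout = 30", "retries = 5", "log_level = debug"]
merged, conflicts = merge3(base, ours, theirs)
print("\n".join(merged))
print("conflicting lines:", conflicts)  # conflicting lines: [2]
```

Because it only compares lines at the same position, this sketch cannot cope with either side inserting or deleting lines; real tools first align the versions with a diff, then apply the same three-way rule per region.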

Branches tend to make cooperative development simpler. There are many possible branching strategies, some with names, each with its own upsides and downsides, but any of them at least makes it clearer to everyone what to expect and how to communicate about it. Branches can also make things like backports and bugfixes easier: taking out the set of changes that make up a bugfix should be simpler, and while applying it to an older version may or may not be hairy, it is still a well-contained job.



Trunk is a term regularly used to refer to the main line of development, in contrast with branches (...get it?). The bulk of your project is typically in trunk. It may also be called main, or there may be several main lines (branches), if you have both stable and development releases, with things like backports going on.


Tags refer to snapshots of specific source code (which under the hood may e.g. be a reference to a revision, or to specific revisions per file, depending on how the specific revision system models things), which is a great way to keep track of release candidates, specific released versions, and such.

While you are working alone and/or without branches, a revision number can take the role of a particular release, but in the face of any branching whatsoever, or even just multiple projects in the same repository, tags are a nice organization tool. And they take almost no extra storage. (Word of warning: you probably want to settle on a tag naming convention ahead of time. Organisation doesn't stay useful if you can't tell what you meant some time later.)
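Why tags are so cheap is easiest to see in a model where, as in git, branches and tags are just names pointing at a revision: a branch name moves forward when you commit on it, while a tag stays where it was put. A toy sketch with hypothetical names (`Repo`, `commit`, `branch`, `tag`), not any real tool's API:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Revision:
    number: int
    message: str
    parents: List[int]  # the revision(s) this one was built on


@dataclass
class Repo:
    revisions: Dict[int, Revision] = field(default_factory=dict)
    branches: Dict[str, Optional[int]] = field(default_factory=lambda: {"trunk": None})
    tags: Dict[str, int] = field(default_factory=dict)

    def commit(self, branch: str, message: str) -> int:
        """Add a revision on top of a branch and move the branch name to it."""
        parent = self.branches[branch]
        number = len(self.revisions) + 1
        self.revisions[number] = Revision(number, message, [] if parent is None else [parent])
        self.branches[branch] = number
        return number

    def branch(self, name: str, from_branch: str = "trunk") -> None:
        """A new branch is only a new name pointing at an existing revision."""
        self.branches[name] = self.branches[from_branch]

    def tag(self, name: str, branch: str = "trunk") -> None:
        """Point a tag at a branch's current revision; existing tags are never moved."""
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        rev = self.branches[branch]
        if rev is None:
            raise ValueError(f"branch {branch!r} has no revisions yet")
        self.tags[name] = rev  # storing a tag costs one name and one number


repo = Repo()
repo.commit("trunk", "initial import")
repo.commit("trunk", "prepare release")
repo.tag("v1.0")
repo.branch("feature-x")
repo.commit("feature-x", "experiment")
repo.commit("trunk", "bugfix")
print(repo.branches)  # {'trunk': 4, 'feature-x': 3}
print(repo.tags)      # {'v1.0': 2}
```

After more work on both trunk and the branch, `v1.0` still names exactly what was released.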


In smaller groups:

  • you may even get away without any branching, when it's workable to polish one shared version until it works,
in which case you can ignore the trunk distinction completely
  • you may not need tags, for similar reasons
when working alone, I can usually tell releases apart by the commit message


In larger groups:

  • you may find that you only ever want finished branches to update the main development line.
  • you usually want some conventions about how to branch, mostly about how you communicate that you are branching and what that means for other development, to keep everyone sane (and avoid double work, pointless work, etc).
  • tags are great because it's always clear what exactly went into a specific release




See also:

Historical options, modern options

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Historically, one can mention SCCS (and its free clone CSSC), RCS, and CVS, and a bunch more, but you probably wouldn't use them today.

There are now quite a few systems out there, with mostly overlapping functionality. Some more current ones:

  • git (inspired by the proprietary BitKeeper, and by Monotone. Written to be flexible and fast enough to deal with the Linux kernel source. Note that this speed-at-that-size requirement is fairly unusual) [1].
web front-ends and hosting: GitLab, GitHub, Bitbucket, more (verify)


  • Perforce - non-free for real-sized teams (>2?), but not expensive


A common criticism is that you need to learn each system's terms, behaviour, and quirks, because they all differ in some details. So people tend to stick with the first one they tried unless it becomes unworkable.

Before you choose one based on "heard good things about it", think about a few requirements you may actually have, such as:

  • is it simple to work with? Fancier features often mean more to think about.
  • can it handle repositories that eventually get very large, or a centralized repository server that stores a lot of data (say, gigabytes on disk), without some operations slowing to the order of seconds?
  • centralized or distributed? (depends on your cooperation model, though many probably want centralized)
  • do you want offline editing? (That is, do you like to work on a laptop?)
    • do you need (possibly offline) edit-merge-commit style, or can you deal with (possibly online-only) checkout-edit-checkin style?
  • if the system is one that needs occasional administration, is there someone to do that within reasonable time?
  • do you want a system easy enough to understand that you could use it by hand, or is relying on IDE integration enough?
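The two working styles in the list above differ mainly in when conflicting edits are dealt with. A toy model with hypothetical class names, not how any specific system implements it: the locking store prevents concurrent edits up front (so it needs the server reachable when you start editing), while the merging store lets anyone edit a copy and only refuses a commit that was based on a stale revision.

```python
from typing import Dict, Tuple


class LockingStore:
    """checkout-edit-checkin: you must hold a file's lock to change it."""

    def __init__(self) -> None:
        self.content: Dict[str, str] = {}
        self.locks: Dict[str, str] = {}  # path -> user currently holding the lock

    def checkout(self, path: str, user: str) -> str:
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user
        return self.content.get(path, "")

    def checkin(self, path: str, user: str, text: str) -> None:
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        self.content[path] = text
        del self.locks[path]  # release the lock so others can edit


class MergingStore:
    """edit-merge-commit: anyone edits a copy; stale commits must merge first."""

    def __init__(self) -> None:
        self.content: Dict[str, str] = {}
        self.revision: Dict[str, int] = {}

    def get(self, path: str) -> Tuple[str, int]:
        """Return the current text and the revision it corresponds to."""
        return self.content.get(path, ""), self.revision.get(path, 0)

    def commit(self, path: str, text: str, based_on: int) -> int:
        current = self.revision.get(path, 0)
        if based_on != current:
            # Someone committed in between: fetch their version, merge, then retry.
            raise RuntimeError(f"{path} changed since revision {based_on}; merge first")
        self.content[path] = text
        self.revision[path] = current + 1
        return current + 1


locking = LockingStore()
locking.checkout("README", "alice")
try:
    locking.checkout("README", "bob")
except PermissionError as err:
    print(err)  # README is locked by alice

merging = MergingStore()
_, rev = merging.get("README")  # alice and bob both start from revision 0
merging.commit("README", "alice's text", based_on=rev)
try:
    merging.commit("README", "bob's text", based_on=rev)
except RuntimeError as err:
    print(err)  # README changed since revision 0; merge first
```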


See also:

Criticism (of parts and approaches)

On criticism and pragmatism

On conventions to numeric versions, and relations to code versioning

See also