Git notes
Waffling about mental models, and "For those coming from other versioning systems..."
Asking what and why
From a distance:
- git is a content/code versioning system
- one of many
- more flexible than others
- more complex than others as a result
- currently popular
What do you even use it for?
If you have a bunch of programmers on the same team, then you need some sort of code versioning to keep sane while you make (hopefully distinct) changes in the same shared set of files.
Whether you're in a team or alone, it is also a nice way to keep a history of changes.
- It's not quite the same as backup, but it's a similar safety net.
For the most part, any source management system covers these bases, and all of them broadly work very similarly to each other.
So why choose git specifically?
In the real world you may also have the concept of people sending in unsolicited code.
Bare git is arguably no better than the classical "here's an email with a diff to some recent version, good luck", but git as used via github/gitlab is sort of nice for this, if only because it forces the people sending in changes to do more prep work to send it in a form you can deal with more easily.
Why git and not something else? How to choose?
Comparing purposes
A little more on the waffling side
Relevant snark
- "If the power of git is sophisticated branching and merging, then its weakness is the complexity of simple tasks."[1]
- 90% of people don't need 70% of git
- and may specifically want to avoid it, because why invite extra edge cases? Who wants to learn them if they will never use them?
- ...also, even that basic 30% ends up a little more complex than other systems
- the CLI documentation is very opaque, unless you already understand everything.
- There are no abstractions, and there sure are a lot of technicalities.
- The documentation is also not geared towards specific tasks, so you will probably not find the thing you want by searching for it
- turns out the CLI should absolutely not be how you learn git's backing model
- because the CLI is already a layer of pragmatism on top of it, so it will be misleading
- For example, git reset does around six entirely distinct things in terms of the underlying model, which is not only confusing but also potentially dangerous
- (note that this isn't the CLI's fault - it grew organically, and new operations got tacked onto the closest matching command, which frankly makes sense)
- the whole thing where every copy is independent is... true but almost irrelevant - no one uses it wild west style.
- it's hard to teach git far beyond the barest basics, because most demo code will not have the problems you have in the real world
- more power and flexibility means more edge cases, that you will have to learn sooner or later
- ...in crunch time, git will throw yet another error you've never seen before
- git can be smarter at handling conflicts than some alternatives - which is great
- but when it hits something it doesn't deal with, you suddenly need a deeper understanding than of those alternatives.
- Good luck! (hint: many of us are pretending, haven't found such a case yet, or did the "wipe copy, start over" thing)
- ...so sometimes the best way to resolve errors is to wipe the project and upload a new copy
- unless you're one of the people who actually uses it distributed, in which case -- good luck!
- there is some not-great naming going on.
- Heck, git can't always even agree with itself. Depending on context, it uses 'index', 'stage', or 'cache' which are the exact same thing.
- it might have been easier to understand intents if (stealing from here), (and a little from here):
- "commits" could have been called something like "snapshot of reachable content", even though...
- most operations on commits only care about the difference that that makes - which is derived from what is actually stored
- 'snapshot' has its own problems
- 'reachable' has a specific definition in git
- this doesn't explain how these references actually work
- "repository" could have been called the "snapshot store".
- or perhaps 'store of all the objects referred to in snapshots'
- "branches" could have been called "snapshot lineage"
- ...implied via just labels that aren't even part of the repo
- "index" could have been called the "staging snapshot"
- "stash" could have been called the "content drafts"
- in that it is put aside from everything else
- Git becomes easier to use once you use third party wrappers instead -- be it github/gitlab/similar, and/or client-side tools, and/or IDE integration
- ...though each one will have their own workflow, none of which are quite the same
- ...some of them add convenience that is so useful that it becomes hard to live without
- github/gitlab has a bunch of actually-quite-nice tooling
- ...to deal with things that only really happen when you have zero communication with upstream before a lot of code is changed (...that the upstream maintainers will typically reject the first version of anyway)
- also, that tooling isn't a part of core git
- github != git
- to the point that some of the most important github/gitlab knowledge has no direct CLI equivalent. Such as the pull/merge request.
- Also, I guess we're fine with microsoft feature-controlling even more of our go-to open-source dev environment, then?
- Github introduces its own behaviour, and edge cases that bare git introductions can't even warn you about
- for example, ask express.js or node.js how many accidentally spammy edit pull requests they get on their README.md -- because yes, a website button is also a pull request, by anyone, towards any public repo.
- why does that two-line merge take a minute?
- Who can tell?
- Probably your fault though.
- (cue long discussion about packfiles, how purges would fix this but at the same time are a bad idea, etc.)
- don't use git as backup, because there are several ways you can wipe out contents permanently
- so have your own backup
- I don't even mean with open-heart surgery like repocleaner, I mean with base git
- incidentally, backing up git is hard to make both correct and efficient/fast at the same time (especially if you like repack)
- git is a classic case of "when these dozen previously vague things snap into place, you will suddenly get a lot of it".
- Until that point you will be actively confused by the fact that you will not be tutorialized, and by how some concepts are explained poorly, and even conflated.
- ...seemingly conflated for the sake of those who already get it to make more concise sentences, at the cost of all learners. (I personally think that was a bit of a mistake, but hey)
- "Git becomes a lot easier once you understand that [x]"
- ...almost invariably means "I stared at it long enough to internalize enough of it to sort of get it"
- and almost never means "I now produce a paragraph that will help you learn the whole thing more easily"
- git's documentation seems so averse to actual explanation that it is nigh impossible to understand unless you already understand git well
- every page uses terms that it doesn't explain, and it's uncertain where they're defined. By the time you've found definitions you've read most of everything and wasted at least one workday.
- Say, git-rebase says it "forward-ports local commits to the updated upstream head".
- Uhhuh, uhhuh.
- That sure didn't say that
- content-wise, it takes the changes on one branch/copy, figures out what commits would make the same changes on top of another branch/copy, and creates new commits based on those changes
- most people's intent is to send a simpler commit to another clone/copy (in a pull/merge request), or to keep iterative commits on their own branches and collapse them once each is done, or to do the non-linear work elsewhere so that the main branch stays clean and linear
- and that, more technically, your commits were made against an earlier version/commit, and rebase lets you ask "git, please take this later version/commit and figure out the diff/commits against that"
- All that may be obvious once you already know all that, but, um... what is documentation for again?
- This is about as legible
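To make the rebase description above a little more concrete, here is a small sketch in a throwaway repository. The branch names (main, feature), file names and commit messages are made up for illustration, and this is only the simplest case, with no conflicts.

```shell
# Throwaway repository; all names and messages are illustrative only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch 'main' regardless of local defaults

echo "base" > base.txt
git add base.txt
git commit -q -m "A: base"

# A feature branch that starts at A and gets its own commit
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "C: feature work"

# Meanwhile, main moves on
git checkout -q main
echo "upstream work" > upstream.txt
git add upstream.txt
git commit -q -m "B: upstream work"

# Re-create feature's commit on top of the current main
git checkout -q feature
git rebase -q main

# feature is now A - B - C' (C' is a new commit with the same change as C)
git log --oneline --graph --all
```

Before the rebase, feature's commit sat on top of A; afterwards a new commit with the same change sits on top of B, which is roughly what "forward-ports local commits" is trying to say.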
Team contributions versus unsolicited contributions
A little more practical
On confusing concepts and terminology
What is a commit, really?
What is a branch?
What is "tree-ish"?
What is the reflog?
History is backwards - on reachability
What is a commit, broadly
Most of the time, it is conceptually most useful to see a commit as:
- a reference to the state of things you want to tack your changes onto (which will be an existing commit) - the parent
- the contents of files you changed in that commit, and
- a snapshot of everything after that commit
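One way to check this picture against what git actually stores is to print a commit object. A minimal sketch in a throwaway repository follows; the file name and messages are made up.

```shell
# Throwaway repository; names and messages are illustrative only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" >> file.txt
git commit -q -a -m "second"

# The raw commit object: a tree (the snapshot), a parent, author/committer, and the message
git cat-file -p HEAD

# The "what changed" view is worked out on demand from the two snapshots
git diff --stat HEAD~1 HEAD

# Keep the parent line around to compare against the previous commit's id
parent_line="$(git cat-file -p HEAD | grep '^parent ')"
echo "$parent_line"
```

The parent line names the commit you tacked your changes onto, and the tree line names the snapshot; the difference itself is not in the object.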
Model for your local state
You can think of your own directory you're git-versioning as
- your local git copy, alongside...
- containing file changes you have not yet added to git
- your own HEAD
- which is the commit you have currently checked out
- which in general is the last on the current branch, but doesn't have to be.
- you can add commits, but they will not belong to any branch - considered detached HEAD - and you only ever do this with a plan to soon resolve that
- your own branches
- you start with one, you can have more
- a staging area, so that a commit can be a specific set of changes
- the index is what you stage to, which you build up interactively
- then commit that to your own copy (in a single transaction)
- Commits
- think of them not as "the new revision that everyone should have" (as in repo/working copy), but of each commit as a specific annotated collection of differences
- a commit is local unless communicated
- each commit has an id
- each will chain onto a previous commit
- which a lot of the time makes a straight line (one parent) but occasionally branches (two things have the same parent), and merges (multiple parents)
- (the structure is a directed acyclic graph)
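The "straight line, occasionally branches and merges" shape can be seen in a throwaway repository. This is only a sketch; the branch names (main, side) and files are made up.

```shell
# Throwaway repository; names and messages are illustrative only.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo "a" > a.txt
git add a.txt
git commit -q -m "start"

# Branching: two commits end up with the same parent
git checkout -q -b side
echo "b" > b.txt
git add b.txt
git commit -q -m "work on side"

git checkout -q main
echo "c" > c.txt
git add c.txt
git commit -q -m "work on main"

# Merging: one commit with two parents
git merge -q --no-edit side

# Each line is a commit id followed by the ids of its parents
git rev-list --parents HEAD
git log --oneline --graph

parent_count="$(git cat-file -p HEAD | grep -c '^parent ')"
echo "parents of the merge commit: $parent_count"
```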
Stuff you should learn better eventually
What a commit is
Branching
Merging
Squashing/rebasing merges
Introduction by example
Client setup (optional)
Note: You may wish to set how you'll be identified elsewhere: (...saved in your ~/.gitconfig)
git config --global user.name "My Name"
git config --global user.email my.name@example.com
There's some other config you may want to play with, like:
git config --global color.ui true
Starting to work with versioned code
Perhaps the simplest way to start is to
- find or create a project on github, and
- clone it
- implicitly sets up that project as the origin, e.g. for later pulls (and, if it's yours, also pushes)
git clone https://github.com/example/test.git
Alternatively, you could create a completely blank repository
- perfectly good for messing around, but you e.g. won't get to test any pushing or pulling until you learn how to set up origins and such.
git init
Also keep in mind that if you make a copy of a git directory (including its .git metadata), you get a separate copy you can play around with.
(This is also the simple-and-stupid way to make backups)
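If you do start from git init and later want something to push to, the "set up origins" part might look roughly like the sketch below. A local bare repository stands in for a project on a git host, and all paths and names are made up.

```shell
# Throwaway directories; a local bare repository plays the role of the git host.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work="$(mktemp -d)"
git init -q --bare "$work/hosted.git"

git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/main
echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

# Tell this repository where 'origin' is, then push and remember the upstream branch
git remote add origin "$work/hosted.git"
git push -q -u origin main

git remote -v
git status -sb
```

With a real host you would put its URL where the bare repository's path is.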
Basic staging and commits
Undoing things
See a few commands in the conflict resolution below
Inspecting local state (staged, committed but not pushed, stashed)
Inspecting some stuff
Communication
Interacting with connected repos
Pull requests basically mean you saying "hey collaborator, I've completed adding this feature to your code, might you want to integrate it?".
Pull requests aren't really a git concept, they're added by git hosters.
And they make more sense to do with such a more centralized place, than with an "everyone has their own copy" variant, if only because of the amount of confusion involved.
Branches and communication
Tags
On using someone's existing branches
Pull requests / Merge requests
Branching for cooperation
- shared branches
Conflict handling
Specifying commits and ranges
Resolving conflicts (also: undoing things)
stashing
"You have not concluded your merge (MERGE_HEAD exists)."
CONFLICT (rename/delete)
e.g. deleted in HEAD, but locally renamed to something else
Your branch and 'origin/main' have diverged, and have 4 and 5 different commits each, respectively
....but the commit history is identical
More regular
"Your local changes to the following files would be overwritten by merge"
You have changed a file.
- which is a difference to the remote copy
Someone else changed that file too
- which they committed and pushed (to what you consider remote HEAD)
You are pulling their changes.
That pull wants to update a file (remember, pull = fetch + merge).
A file you have changed, but not yet committed. Git refuses to merge into a file with uncommitted changes, whether or not both edits touch the same areas.
So git stops there, in that you probably don't want it to overwrite what you have done.
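The situation above can be reproduced with two local clones of a local bare repository, followed by one common way out (stash, pull, re-apply). This is a sketch, not the only fix: committing your change first is the other obvious route. All names and paths are made up, and the two edits deliberately touch different lines so that re-applying the stash goes cleanly; git refuses the pull either way.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main

# "Someone else" seeds the shared repository with a seven-line file
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
printf '%s\n' one two three four five six seven > notes.txt
git add notes.txt
git commit -q -m "add notes"
git push -q origin main

# You clone at this point
git clone -q "$work/origin.git" "$work/mine"

# They change the first line, commit, and push
sed 's/^one$/ONE/' notes.txt > notes.tmp && mv notes.tmp notes.txt
git commit -q -a -m "edit first line"
git push -q origin main

# You change the last line of the same file, but do not commit
cd "$work/mine"
sed 's/^seven$/SEVEN/' notes.txt > notes.tmp && mv notes.tmp notes.txt

# The pull fetches, then refuses to touch your modified file
git pull --ff-only || echo "pull refused"

# One way out: put your change aside, pull, then re-apply it on top
git stash -q
git pull -q --ff-only
git stash pop -q

cat notes.txt
```

If both edits had touched the same lines, the stash pop step is where you would get the actual conflict to resolve.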
"Updates were rejected because the remote contains work that you do not have locally"
Your branch is behind 'origin/master' by 8 commits, and can be fast-forwarded
error: You have not concluded your merge (MERGE_HEAD exists)
The previous pull needed to merge, tried to merge, and failed to do so.
There are multiple reasons you can get into that situation, and the best fix will vary accordingly.
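A sketch of getting into that half-merged state and backing out of it, in a throwaway repository with made-up names. It uses a plain merge rather than a pull, and aborting is only one of the possible fixes.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "original" > file.txt
git add file.txt
git commit -q -m "start"

# Two branches change the same line in different ways
git checkout -q -b other
echo "their version" > file.txt
git commit -q -a -m "change on other"

git checkout -q main
echo "my version" > file.txt
git commit -q -a -m "change on main"

# This merge stops halfway on the conflict and leaves .git/MERGE_HEAD behind
git merge other || echo "merge stopped on a conflict"
git status --short        # 'UU file.txt' marks the unmerged file

# Option 1: give up on the merge and go back to the state before it
git merge --abort

# Option 2 (instead of aborting): edit file.txt to resolve, then
#   git add file.txt
#   git commit
```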
Your configuration specifies to merge with the ref 'refs/heads/master' from the remote, but no such ref was fetched
Seems to mean that that ref doesn't exist - anymore, or never did.
Chances are this came from a git pull, which you'll remember is effectively a git fetch plus git merge,
and it is the latter that complains.
(not that a git merge gives the same error, but hey...)
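To see whether the ref really is missing, you can compare what your branch is configured to merge with against what the remote currently offers. A sketch with a local bare repository standing in for the remote; names and paths are made up.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main
git clone -q "$work/origin.git" "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/main
echo "x" > x.txt
git add x.txt
git commit -q -m "start"
git push -q -u origin main

# What the current branch is configured to merge with on pull
branch="$(git symbolic-ref --short HEAD)"
wanted="$(git config --get "branch.$branch.merge")"
echo "pull will merge: $wanted"

# Which branches the remote has right now
git ls-remote --heads origin

# The error appears when the first is missing from the second
if git ls-remote --exit-code --heads origin "$wanted" >/dev/null; then
    echo "remote has $wanted"
else
    echo "remote does not have $wanted"
fi
```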
Other messages and errors
fatal: detected dubious ownership in repository
Some files are owned by other users, e.g. root, which is potentially security-relevant. (That is, if you share storage with untrusted users, them editing your .git/ can be Bad)
Apparently it won't tell you what it saw, though.
Which is probably why the suggested fix is "no just trust it, and ignore this security warning", but it's probably a good idea to actually look at the ownership first.
Options
- change ownership, often something like
chown -R username:groupname /path/to/dir
- say you don't care
git config --global --add safe.directory /path/to/dir
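Before picking either option, it may help to look at who owns what. A small sketch, where a fresh temporary repository stands in for /path/to/dir; the top-level directory and .git are probably the first things to check.

```shell
repo="$(mktemp -d)"        # stand-in for /path/to/dir
git init -q "$repo"

# Who owns the top-level directory and its .git directory?
ls -ld "$repo" "$repo/.git"

# List anything inside that is not owned by the current user
foreign="$(find "$repo" ! -user "$(id -u)")"
if [ -z "$foreign" ]; then
    echo "everything here is owned by uid $(id -u)"
else
    printf '%s\n' "$foreign"
fi
```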
You are in detached HEAD state (and: what is HEAD)
This repository moved. Please use the new location
This is github informally telling you that the repo was probably renamed, it's resolving that for you, but you may want to change what you're referring to.
You probably want to do:
git remote set-url origin 'new_url'
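A quick before-and-after sketch of that, in a throwaway repository. Both URLs are hypothetical placeholders, and nothing is contacted over the network.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical old location
git remote add origin https://github.com/example/old-name.git
git remote -v

# Point origin at the (equally hypothetical) new location
git remote set-url origin https://github.com/example/new-name.git
git remote -v

current_url="$(git config --get remote.origin.url)"
```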
Altering history (and potentially creating bigger problems)
Other notes
Credential stuff
Credential management
github personal access tokens
A few years ago, github stopped allowing passwords in credentials.
It wants you to use access tokens, which are a mix of
- a longer password
- each (yourusername,atoken) pair
- can be associated with its own rights
- can have its own expiry
As https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens explains, generating a token is done at:
Settings → Developer settings → Personal Access Tokens → Fine-grained tokens
Criticism #1: this is coarse grained, mostly allowing "write to all my repos"
Criticism #2: It does not explain how to actually use it.
- sort of makes sense, in that each client does it its own way -- including how it prefers to store the credentials
- but some instructions for common clients would have been nice
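For the plain git CLI over HTTPS, one possible setup is sketched below. It assumes the 'cache' credential helper is available (it ships with git on most Linux/macOS installs, but not on Windows), the one-hour timeout is an arbitrary choice, and 'yourusername' is a placeholder. Other clients and helpers store things differently.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Keep credentials in memory for an hour instead of retyping the token.
# (Repository-local here; add --global to apply it to all your repositories.)
git config credential.helper "cache --timeout=3600"

# Optionally pre-fill the username for github URLs ('yourusername' is a placeholder)
git config credential.https://github.com.username yourusername

git config --get-regexp '^credential\.'

# On the next HTTPS fetch/push, git prompts for a password:
# paste the personal access token there, not your account password.
```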
Semi-sorted
rerere
Objects on disk, packfiles, backup, and cleanup
bundles
git is a great drive corruption detection tool
"Your configuration specifies to merge with the ref 'refs/heads/master' from the remote, but no such ref was fetched."
Lemme guess, github user?
Specifically, you just created a new repo on github, just cloned it, and thought you could fetch/pull from it?
If so, the simplest fix is probably to remove your clone, and clone it again.
"Wait what. Why."
What that error means is that the remote that you call origin does not have a branch with that name.
In general practice, this error often means that someone else removed the branch it names (because the likely reason your copy points at something is that it used to exist).
"Why would there be a branch called 'master'?"
"Why isn't there now?"
- because many public git hosts moved to naming their main branch 'main' and not 'master', for reasons not really relevant right now.
So the repo copy that that clone made was correct at that time. Yet after you created a repo, it instructed you to do something like:
git branch -M main
Where -M is --move --force, i.e. a branch rename, i.e. move your current branch onto a new branch called 'main'.
"But I ignored that instruction, I didn't do that"
Yeah. But you did one of the things on that page, like add a README via the site, figuring that would be useful.
Soooooo it turns out that implied that github did a branch rename without telling you.
Surprise!
Which means your original clone is now pointed at something that doesn't exist, and you need to do some git gymnastics to point it to the right branch.
It's just easier to remove it and clone it again.
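For the curious, the gymnastics might look roughly like the sketch below. A local bare repository stands in for github, the rename is done by hand to imitate what the host did, and all paths are made up.

```shell
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
work="$(mktemp -d)"
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# A clone made while the branch was still called 'master'
git clone -q "$work/origin.git" "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
echo "x" > x.txt
git add x.txt
git commit -q -m "start"
git push -q -u origin master

# The host renames the branch behind your back
git --git-dir="$work/origin.git" branch -m master main

# Now the pull fails with "...but no such ref was fetched"
git pull || echo "pull failed"

# The gymnastics: rename the local branch and point it at the renamed remote branch
git branch -m master main
git fetch -q --prune origin
git branch -u origin/main main
git remote set-head origin -a
git pull -q --ff-only
```

That is four commands you have to know exist, versus one rm and one clone.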
Backup
Git URLs
What's with pull requests?
fast forwarding
degit
Looks to me as if
degit some-user/some-repo
is functionally much like
git clone --depth 1 https://github.com/some-user/some-repo && rm -rf ./some-repo/.git/
It mostly seems used by webdevs who put a template on github, and want to save keystrokes fetching it and leaving no extra mess.
Which is quite useful.
Notes:
- actually does more, e.g. fetches a tgz into a cache in your user dir, which speeds up repeat installs
Unsorted
See also: (TODO: do so myself)
Reference-like:
Introductions:
- http://marklodato.github.com/visual-git-guide/
- http://scottr.org/presentations/git-in-5-minutes/
- http://www.gitcasts.com/
- https://git.wiki.kernel.org/index.php/GitFaq
- http://progit.org/book/
Discussions and other:
git GUIs
In no particular order:
- gitkraken (win+lin+osx)
- git-cola (win+lin+osx)
- smartgit (java)
- gitg (lin)
- comes with git:
- git gui (more for management, not so polished)
- gitk (mostly a viewer)
- qGit (mostly a viewer)
- giggle (mostly a viewer)
...also note that various IDEs have integrated git. Some of them are quite good, even, and potentially more convenient than anything external.
github-specific
Extensions to git
Pull requests
LFS
Hosters don't like you pushing large files.
Nor will you, when
- you realize that changing large files will mean the bulk of space taken by all copies is now versions of that file.
Remember, one of the selling points is that everyone has a full version history (yes, you can actually remove things from that history(verify), but it's not really intended).
- you notice git taking minutes to do anything, and trashing your computer when you try a gc or repack.
So e.g. github warns you above 50MB, and refuses above 100MB, and limits your repository to a few GB.
Other hosters have similar limits.
Git LFS (Large File Storage) is an extension developed and used by some of these git hosting sites.
It comes down roughly to
- your repository stores what amounts to a pointer - to a completely separated storage (that we happen to call LFS)
- specific clients know what to do with that
For that to work, in a regular add/commit/clone/pull workflow, all collaborating clients (and probably the git hoster) need to support this LFS extension.
- A client with LFS support will work transparently, in that it will fetch the content that this pointer points to
- A client without LFS support installed will just see files that happen to contain these pointers
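For a sense of what "a pointer" means here: what ends up in the repository is a few lines of text naming the hash and size of the real content. The sketch below builds text of that shape by hand for a made-up file. In real use git-lfs writes this for you, and the sha256sum tool is an assumption about your system (on macOS, shasum -a 256 would be the equivalent).

```shell
work="$(mktemp -d)"
cd "$work"
printf 'pretend this is a large binary\n' > big.bin

# The two facts a pointer records about the real content
oid="$(sha256sum big.bin | cut -d' ' -f1)"
size="$(wc -c < big.bin | tr -d ' ')"

# Hand-built text in the shape of an LFS pointer file
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' \
    "$oid" "$size" > big.bin.pointer
cat big.bin.pointer
```

With git-lfs installed, something like git lfs track "*.bin" is what tells git which files get this treatment.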
The specific service called LFS has a (rather opaque) set of limits to storage and to bandwidth[3].
- so beware - using this for actively changing data is effectively a paid service
github specific
"Your main branch isn't protected"
gitlab specific
groups
https://docs.gitlab.com/ee/user/group/
Merge requests
Basically the same as pull requests