Continues Git for Philosophers (pt. 1)
Collaborative Writing with Git
Collaborative writing presents issues similar to those of collaborative programming: different people make changes to the same document from different locations. Sending the document back and forth is inefficient: only one person can work on it at a time, and there is a risk of changes being made in parallel which are then either overlooked or hard to reconcile. Collaborative editing tools (e.g., Google Docs for Word documents, or shared LaTeX editors like ShareLaTeX and Overleaf) help if you’re using those formats, but have their own drawbacks (e.g., they can’t be used offline). Simply keeping your shared document in a Dropbox folder is a partial solution, but it doesn’t solve the issue of conflicting changes to your shared document. Dropbox assumes that you know what you’re doing. So if you make a change to a document on your office desktop, and then make a change to the same document on your laptop, the later change will silently overwrite the former. If for some reason you didn’t update the file on your laptop with the changes from your office desktop (say, because you were without an internet connection), these changes will be gone from the document when you save it the second time. The changes will be preserved in a previous version of the document, since Dropbox keeps every intermediate version. But it might take you a while to notice that your change got lost, and it might take you a while to find the most recent version of the file on Dropbox that still has it.
Dropbox is a bit more careful when two different users make changes to the same document in a shared Dropbox folder. If your collaborator has edited the file while you were away, you will see the changes automatically. But if you’ve had your file open in an editor while your collaborator has made changes (perhaps even with your laptop asleep!), Dropbox will realize that there may be a conflict between your version of the document and your collaborator’s. But it won’t know what to do. Rather than overwrite one set of changes, Dropbox will make a copy of the file you’re both working on, and leave you to figure out which version is more up-to-date and how to reconcile the two copies into one. So if author A and author B are both working on a document, and author A adds a sentence to the introduction at the same time as author B adds a sentence to the conclusion, your shared Dropbox folder will suddenly contain a second version of the document: document (author B's conflicted copy).doc, say. It will contain the additional sentence in the conclusion, but not author A’s additional sentence in the introduction, and document.doc will contain author A’s sentence but not author B’s. Dropbox will leave it to you to figure out who has made what change where, and how to integrate the two documents into a single one. This is annoying and complicated, especially if the conflict goes unnoticed for a while.
Using Git helps you avoid these problems to the extent it is possible to avoid them.
When you have set up a repository for your writing project (say, containing a LaTeX document plus a BibTeX file for references), you can edit the files, commit your changes to generate a new revision, and periodically push your revisions to the remote repository on GitHub or GitLab. If you are working on the project with someone else, you can give them access to the repository as well. If they have push access, they can send their own revisions to the shared repository just like you do.
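The edit, commit, push cycle described above can be tried out without a GitHub or GitLab account. The following is only a sketch: it assumes Git is installed, uses a local bare repository in a temporary directory as a stand-in for the shared remote, and the file names (paper.tex, references.bib), author identity, and commit message are made up for illustration.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the demo commits (an assumption, not from the text).
export GIT_AUTHOR_NAME="Author A" GIT_AUTHOR_EMAIL="a@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)

# A local bare repository plays the role of the remote on GitHub/GitLab.
git init -q --bare "$demo/shared.git"

# Set up the writing project and point it at the remote.
git init -q "$demo/paper"
cd "$demo/paper"
git symbolic-ref HEAD refs/heads/master   # use the branch name from the text
git remote add origin "$demo/shared.git"

# Edit the files ...
printf '%s\n' '\documentclass{article}' '\begin{document}' 'Draft.' '\end{document}' > paper.tex
: > references.bib

# ... commit the changes to create a new revision ...
git add paper.tex references.bib
git commit -q -m "First draft"

# ... and push the revision to the shared remote.
git push -q -u origin master

git log --oneline
```

With a real hosting service, only the remote address changes; the add, commit, and push steps stay the same.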
Git keeps track of the state of your own local repository and the remote on GitHub/GitLab. The command
git status
will prompt Git to display the status of your repository: which branch you are on (typically, this is the master branch, but more about branches later), whether your local version of the repository is up to date with the remote or not, which files have changes that are waiting to be committed, and which files are untracked. If you have commits that you have not pushed to the remote yet, Git will report something like “Your branch is ahead of ‘origin/master’ by 1 commit.” The remote repository is usually called origin, and the remote branch that corresponds to your local master branch is then called origin/master. It might happen that your branch is behind the remote: if your collaborator has pushed changes to the shared remote, there will be commits on the remote that you don’t yet have in your local repository. Before you can push your changes, you will have to incorporate your collaborators’ changes into your own local version of your paper.
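One detail worth knowing: git status compares your branch with what your local repository last heard from the remote; it does not contact the remote itself. Running git fetch first brings that information up to date without touching your files. The sketch below assumes Git is installed and simulates you and a co-author with two local clones of a bare repository in a temporary directory; all file names and commit messages are invented for the demonstration.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the demo commits (an assumption).
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
git init -q --bare "$demo/shared.git"     # stand-in for GitHub/GitLab

# Your repository: one commit, pushed to the remote.
git init -q "$demo/you"
cd "$demo/you"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/shared.git"
echo "Introduction." > paper.txt
git add paper.txt
git commit -q -m "Start paper"
git push -q -u origin master

# Your co-author clones, commits, and pushes.
git clone -q -b master "$demo/shared.git" "$demo/coauthor"
cd "$demo/coauthor"
echo "Conclusion." >> paper.txt
git commit -q -a -m "Add conclusion"
git push -q origin master

# Meanwhile you make a commit that you have not pushed yet.
cd "$demo/you"
echo "Things to do." > notes.txt
git add notes.txt
git commit -q -m "Add notes"

git fetch -q origin    # learn about the co-author's commit
git status -sb         # the first line now reports both ahead and behind

# The same information as two numbers: commits only on origin/master
# (behind), then commits only on your branch (ahead).
counts=$(git rev-list --left-right --count origin/master...HEAD)
echo "behind/ahead: $counts"
```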
If your document is under Git control, you have to pull changes from the shared repository before you see what your co-author has done. By the same token, you and they have to remember to push new commits to the repository, or there won’t be any changes to pull. In the crucial case where you have both made changes at the same time, however, Git will handle the discrepancies gracefully.
In the best-case scenario, you and your co-author have edited the same file but not the same part of it: say, they added a footnote to the introduction and you cleaned up a passage in a middle section. If they have committed their changes and pushed to the shared remote, Git won’t let you push your changes. You’ll get an error message like
! [rejected] master -> master (fetch first)
If your changes do not conflict (were not made to the same line of text), Git can fix this automatically. Just say
git pull
Git will then download (“fetch”) the changes from the remote and automatically merge them with your version of the repository, creating a new revision which includes both your changes and the earlier changes by your co-author. Your repository is now ahead of the remote (it has your own commit plus the new revision created by merging your co-author’s changes with your own), and you can push the combined changes to the remote.
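The whole sequence (rejected push, pull, automatic merge, successful push) can be replayed locally. This is a sketch under the same assumptions as before: Git is installed, a bare repository in a temporary directory stands in for the shared remote, and the file contents are invented. Note that recent Git versions may refuse a plain git pull when the two histories have diverged until they are told whether to merge or rebase, so the sketch passes --no-rebase to ask for the merge described here, and --no-edit to accept the default merge message.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the demo commits (an assumption).
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
git init -q --bare "$demo/shared.git"     # stand-in for GitHub/GitLab

# Your repository with a first version of the paper.
git init -q "$demo/you"
cd "$demo/you"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/shared.git"
printf '%s\n' 'Introduction.' '' 'Section 1.' '' \
  'A clumsy passage in the middle.' '' 'Section 3.' '' 'Conclusion.' > paper.txt
git add paper.txt
git commit -q -m "Start paper"
git push -q -u origin master

# Your co-author adds a footnote mark to the introduction and pushes.
git clone -q -b master "$demo/shared.git" "$demo/coauthor"
cd "$demo/coauthor"
sed 's/^Introduction\.$/Introduction.[1]/' paper.txt > paper.tmp && mv paper.tmp paper.txt
git commit -q -a -m "Add footnote to introduction"
git push -q origin master

# You clean up the middle passage and try to push.
cd "$demo/you"
sed 's/clumsy/polished/' paper.txt > paper.tmp && mv paper.tmp paper.txt
git commit -q -a -m "Clean up middle passage"
rejected=no
if ! git push -q origin master 2>/dev/null; then
  rejected=yes
  echo "push rejected: the remote has commits you do not have yet"
fi

# Pull: fetch the co-author's commit and merge it with yours.
git pull -q --no-rebase --no-edit

# The merged revision can now be pushed.
git push -q origin master
cat paper.txt
```

After the pull, paper.txt should contain both the footnote mark and the polished passage, even though neither of you ever saw the other’s edit while working.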
If you did happen to both make changes that cannot be merged automatically, Git will alert you to this fact:
CONFLICT (content): Merge conflict in <filenames>
Automatic merge failed; fix conflicts and then commit the result.
Instead of letting you fend for yourself in figuring out what has changed and where, Git will tell you exactly what you have to fix. The files with editing conflicts will now contain the conflicting lines, indicating your and their changes, e.g.:
<<<<<<< HEAD
what you wrote
=======
what your co-author wrote
>>>>>>> hash code of your co-author's commit
in the relevant place in the document. Fix up just that part of the document, deleting the three marker lines and keeping the text you want. Then say git commit -a. Your local repository now contains a conflict-free version that combines both your changes; it is ahead of the shared repository, and Git will again let you push to the remote. When your co-author returns to work and says git pull, they will have the merged, clean version of the document.
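To see a conflict and its resolution from start to finish, here is a sketch in which you and your co-author rewrite the same sentence. The assumptions are the same as in any local experiment of this kind: Git is installed, a bare repository in a temporary directory plays the shared remote, and the file contents are invented. In real life you would resolve the conflict in your editor; the script simply overwrites the file with an agreed wording.

```shell
#!/bin/sh
set -eu

# Placeholder identity for the demo commits (an assumption).
export GIT_AUTHOR_NAME="Demo Author" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
git init -q --bare "$demo/shared.git"     # stand-in for GitHub/GitLab

git init -q "$demo/you"
cd "$demo/you"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/shared.git"
printf '%s\n' 'First line.' 'A contested sentence.' 'Last line.' > paper.txt
git add paper.txt
git commit -q -m "Start paper"
git push -q -u origin master

# Your co-author rewrites the middle sentence and pushes.
git clone -q -b master "$demo/shared.git" "$demo/coauthor"
cd "$demo/coauthor"
printf '%s\n' 'First line.' 'what your co-author wrote' 'Last line.' > paper.txt
git commit -q -a -m "Co-author rewrites the sentence"
git push -q origin master

# You rewrite the very same sentence.
cd "$demo/you"
printf '%s\n' 'First line.' 'what you wrote' 'Last line.' > paper.txt
git commit -q -a -m "You rewrite the same sentence"

# The pull cannot merge automatically and stops with a conflict.
git pull --no-rebase --no-edit || echo "pull stopped: merge conflict"
markers=$(grep -c '^<<<<<<<' paper.txt)   # number of conflict regions
cat paper.txt                             # shows the <<<<<<< ======= >>>>>>> block

# Resolve: replace the marked region with the text you want to keep.
printf '%s\n' 'First line.' 'what you both agreed on' 'Last line.' > paper.txt

# Commit the resolution (this concludes the merge) and push.
git commit -q -a -m "Merge co-author's rewrite"
git push -q origin master
```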
Note that your intervention is only required if both you and your co-author have made changes to the very same line of text; otherwise Git will merge the changes automatically. This includes the case where one of you adds a paragraph and the other one cuts text somewhere else in the file: after git pull the file will have the new paragraph, and the text that was deleted will still be gone.
(Continued in part 3; Read, fork, or download the full document on GitHub.)