Git for Philosophers (pt. 1)

What is Git?

When software developers work on complex programming projects, they use something called a revision control system. A revision control system allows them to keep track of changes in their code: it stores a history of changes, and allows them to quickly and easily take back (“revert”) changes that turn out to break things. It also makes it easy to collaborate with others: multiple contributors can edit and add code, and the revision control system automatically integrates changes when possible, and alerts contributors to conflicts when it isn’t. Software code is just text, and revision control systems work just as well with LaTeX code, or Markdown text, as they do with Java or Haskell programs.

Git is such a revision control system. It is distributed, which means that the entire history of your code lives not just on a server somewhere, but on your own computer and any other computer that uses the same shared code. Thus, you do not have to be connected to the internet to use Git to make changes, add material, or revert to a previous version. In order to collaborate with others, it is of course necessary to make code maintained by Git available over the internet. There are a number of websites which provide this service; the most popular ones are GitHub and GitLab. Both are free. GitHub is bigger, but GitLab is open source and allows you to have private projects without paying.

You are no doubt familiar with the history function in Word, and with automatic backup services like Dropbox. Git is a little bit like these, and so it might be useful to compare them as we go along. You can use Git as a backup solution, but it is not an automatic tool.

How does Git work?

A collection of files managed by Git is called a repository. A repository may be a single text document, or an entire collection. It is essentially a folder (possibly with subfolders) for which Git keeps track of changes to (some of) the files in it.

A particular state of a repository is called a revision. In contrast to automatic versioning (e.g., track changes in Word, or automatic updates in Dropbox), a Git revision must be explicitly created. If you have a file on your computer that is tracked by Git, you can make any changes you like to it, but you have to tell Git when you want your changes to “stick,” i.e., to count as a new revision. This is called committing. When you commit changes in a Git repository, Git creates a new revision of your repository. A revision may include changes only to a single file, or it may include changes to several files, new files added to the repository index, and files renamed, moved, or deleted. Each revision is assigned a unique 40-digit hexadecimal “hash” code (often abbreviated to its first 7 to 10 digits). When you commit changes to create a new revision, the identity of the committer (i.e., you) as well as a commit message describing the changes is recorded.
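
To make this concrete, here is a minimal sketch of creating one revision and looking at its hash. It runs in a throwaway directory; the folder name `paper`, the file `draft.txt`, and the committer identity are placeholders invented for the illustration, and the `git add` and `-m` steps are ordinary Git usage rather than anything specific to this post.

```shell
set -e
# Work in a throwaway directory so nothing on your disk is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init -q paper
cd paper

# The identity recorded with every commit (placeholder values).
git config user.name "Ada Example"
git config user.email "ada@example.org"

# Create a file, ask Git to track it, and commit it as a new revision.
echo "First draft." > draft.txt
git add draft.txt
git commit -q -m "Add first draft"

# The full hash of the revision just created, and its abbreviated form.
full_hash=$(git rev-parse HEAD)
short_hash=$(git rev-parse --short HEAD)
echo "$full_hash"
echo "$short_hash"

# The log entry shows the hash, the committer, the date, and the message.
git log -1
```

The abbreviated hash is simply a prefix of the full one; Git accepts either form wherever a revision has to be named.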

One of the differences between Git and, say, Dropbox, is that any new Git revision has to be created manually by committing, while Dropbox just checks if a file has changed on your disk, and makes a new revision whenever that happens. Another difference is that Git only tracks those files it has been asked to track, while Dropbox tracks every change in every file in the Dropbox folder. To ask Git to track a file, you add that file to the repository index. A third difference is that Git works locally until you tell it to save or load changes from the cloud, whereas Dropbox not only records your changes automatically, it also saves those changes to the cloud, and any change in the cloud is automatically mirrored on your own computer without you having to do anything, but also without asking you first!
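
The second difference, that Git only tracks what it has been asked to track, can be sketched as follows. The file names and identity are made up for the example; `git ls-files` and `git status --short` are standard Git commands for listing tracked files and summarising the state of the folder.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q notes
cd notes
git config user.name "Ada Example"
git config user.email "ada@example.org"

# Two files exist in the folder...
echo "Chapter one." > chapter1.txt
echo "Scratch ideas." > scratch.txt

# ...but only chapter1.txt is added to the index and committed.
git add chapter1.txt
git commit -q -m "Add chapter one"

# Files Git is tracking:
tracked=$(git ls-files)
echo "$tracked"

# Files Git has not been asked to track are listed with a "??" prefix:
status=$(git status --short)
echo "$status"
```

Here `git ls-files` lists only `chapter1.txt`, and the status line reads `?? scratch.txt`: the scratch file sits in the folder, but no revision will ever include it until it is added.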

To save the revisions in your Git repository over the internet, your Git repository must be linked with a version of the repository on a server somewhere (e.g., on GitHub or GitLab). This server repository is called a remote. Think of it as the cloud copy of your local Dropbox folder. However, the remote repository is not automatically synced with your local version. You have to tell Git to copy the changes in your local repository to the remote. This is called pushing your changes. In the other direction, to incorporate changes to the remote copy into your local repository, you pull changes from the remote. It’s in this case, when the remote repository has been changed (by you, from a different computer, or by a collaborator), that Git is most useful. If a file is changed in your Dropbox folder at the same time you’re editing it on your own computer, Dropbox will make a new copy of the entire file and leave you to figure out if there are conflicting changes and what to do about them. If you pull changes from a remote Git repository, Git will try very hard to compare your local version with the remote version of each file. If it is possible to merge changes automatically, Git will do so. If not (e.g., when both you and your collaborator have changed the same sentence), Git will alert you to a conflict that has to be reconciled manually. In practice, you will find a version of the file with both lines marked; you delete the line you don’t want to keep along with the conflict markers, save the file, and commit the change to create a revision that reflects both your and your collaborator’s change.
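
The conflict scenario can be tried out safely on one machine. The sketch below is only an illustration under a few assumptions: a bare repository on disk stands in for the remote on GitHub or GitLab, “Alice” and “Bob” are invented collaborators, and the pull uses `--no-rebase` to ask explicitly for a merge, since recent versions of Git want to be told how to combine diverging histories.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository on disk stands in for the remote on GitHub/GitLab.
git init -q --bare remote.git

# Alice clones the (still empty) remote, writes a sentence, and pushes it.
git clone -q remote.git alice 2>/dev/null
cd alice
git config user.name "Alice"
git config user.email "alice@example.org"
echo "Knowledge is justified true belief." > thesis.txt
git add thesis.txt
git commit -q -m "State the thesis"
git push -q origin HEAD
cd ..

# Bob clones the remote and now has the same history.
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.org"
cd ..

# Alice revises the sentence and pushes her change.
cd alice
echo "Knowledge is justified true belief, roughly." > thesis.txt
git commit -q -a -m "Hedge the thesis"
git push -q origin HEAD
cd ..

# Bob revises the same sentence locally, then pulls.
cd bob
echo "Knowledge is not merely justified true belief." > thesis.txt
git commit -q -a -m "Reject the thesis"
# The pull stops with a conflict and a non-zero exit status, hence "|| true".
git pull -q --no-rebase || true

# The file now holds both versions between conflict markers.
conflicted=$(cat thesis.txt)
echo "$conflicted"

# Reconcile by hand: write the version to keep, then commit the result.
echo "Knowledge is not merely justified true belief." > thesis.txt
git add thesis.txt
git commit -q -m "Reconcile conflicting edits to the thesis"
git push -q origin HEAD

# Number of revisions in Bob's history after the merge.
revisions=$(git rev-list --count HEAD)
echo "$revisions"
```

The conflicted file shows Bob’s sentence after a line of `<<<<<<<`, Alice’s after a line of `=======`, and a closing `>>>>>>>` line. After the reconciliation Bob’s history contains four revisions: the original sentence, each author’s edit, and the merge.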

How do I use Git?

In order to use Git on your computer, you have to install the Git software. The bare-bones “command line” Git client is available for all operating systems, but there are a fair number of graphical interfaces available as well. The best place to start is on the Git download page. For Windows, there is TortoiseGit, which will make all the Git commands available via the Windows Explorer right-click menu. If your repositories are or will be hosted on GitHub, you can use GitHub’s own graphical tools for Windows and Mac.

Once you have Git and possibly a graphical Git tool installed, you have to set up a repository. There are two ways to do this. The first is using the git init command: in the folder/directory you want to track using Git, just say git init. Most of the time, however, you want your repository to have a matching remote version on a server. Then it will be a lot easier to first create the repository on the server and create a local copy of that repository, which will then be linked to the remote. Creating such a local version of an existing repository is called ‘cloning.’

So go to GitHub or GitLab and create a repository. In both GitHub and GitLab, when you are logged in, there is a little ‘+’ next to your username in the top right corner which lets you add a new repository (GitLab calls them “projects”). Once your repository/project is created, you can clone it to a repository living on your computer using the clone URL at the bottom of the right sidebar in GitHub or at the top of the project page in GitLab. There are HTTPS and SSH versions of those URLs. HTTPS always works, but you will have to give your GitHub/GitLab user ID and password whenever you push or pull. SSH can be set up to avoid that, so it’s preferable, but it is a bit of a hassle to set up on Windows.
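
For orientation, this is roughly what the two kinds of URL look like and how a repository can be pointed at one or the other. It is a sketch, not something from the sites’ documentation: the account `user` and repository `project` are placeholders, and `git remote add` and `git remote set-url` are standard Git commands that only record the address locally, so nothing here contacts a server.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q project
cd project

# Link the repository to a remote using the HTTPS form of the clone URL.
git remote add origin https://github.com/user/project.git
https_url=$(git config --get remote.origin.url)
echo "$https_url"

# Switch the same remote to the SSH form, e.g. once SSH keys are set up.
git remote set-url origin git@github.com:user/project.git
ssh_url=$(git config --get remote.origin.url)
echo "$ssh_url"
```

If you start with HTTPS and later get SSH working, changing the URL this way saves you from cloning the repository again.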

You can of course also use Git to clone repositories not created by you. For instance, the Open Logic Text is a logic textbook project that uses Git, and you could use the clone URL on its GitHub page to download a copy using Git. This very document can be cloned from its GitHub page as well. In the same way, you can clone a repository set up by a collaborator for a document you are planning to work on together. However, only in the latter case will you have permissions to change the repository on the server. Instead of cloning, e.g., the Open Logic repository directly, you can first ask GitHub to make a copy of it, which you then own. This is called a fork. A fork of a repository is a snapshot of the original, which you have complete control over. In particular, you can clone it to your own computer, and push any changes you make back to GitHub or GitLab. Both GitHub and GitLab display a “fork” button in the top right corner of the repository page which allows you to do this.

Once you have a clone URL, you can create your local version of the repository with the command

git clone URL

To track your own work, you’ll start with a new, empty repository and clone it. Suppose you are ‘user’ on GitHub and have started a repository ‘project’. You make a local version of this repository using

git clone https://github.com/user/project.git

and this repository will live in the directory/folder ‘project’ on your local drive. You can add a new file to that folder, but to have it tracked you also have to add it explicitly to the Git index:

git add new-file.tex

Now Git knows that new-file.tex should be tracked. To create a new revision of your ‘project’ repository which records all the changes to your tracked files, say

git commit -a

The switch “-a” is for “all”: Git will ‘stage’ all changes for all tracked files in the repository.
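
A small sketch of what “-a” does and does not cover, using invented file names and a placeholder identity. The `-m` switch, which is not discussed above, supplies the commit message on the command line instead of opening an editor.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q project
cd project
git config user.name "Ada Example"
git config user.email "ada@example.org"

# Start with one tracked file.
echo "Introduction." > paper.tex
git add paper.tex
git commit -q -m "Add paper"

# Change the tracked file and create a new, untracked one.
echo "A second paragraph." >> paper.tex
echo "Reading list." > notes.tex

# -a stages the change to paper.tex; notes.tex is not tracked, so it is skipped.
git commit -q -a -m "Add second paragraph"

# Files touched by the latest revision:
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "$committed"

# notes.tex was never added to the index, so it is still untracked:
leftover=$(git status --short)
echo "$leftover"
```

The latest revision contains only `paper.tex`, while `notes.tex` still shows up as `?? notes.tex`. So “all” means all tracked files; brand-new files still need a `git add` first.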

Then to sync your local changes to GitHub/GitLab, you say

git push
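
The whole sequence (clone, add, commit, push) can be rehearsed without a GitHub or GitLab account. In this sketch a bare repository on disk plays the role of the server, and the directory names and identity are assumptions made for the illustration. The push is written as `git push origin HEAD`, which names the remote and the current branch explicitly; a plain `git push` does the same thing once your branch is linked to the remote, as it is after cloning.

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the empty repository created on GitHub/GitLab.
git init -q --bare project.git

# Clone it, add a file, commit, and push.
git clone -q project.git project 2>/dev/null
cd project
git config user.name "Ada Example"
git config user.email "ada@example.org"
echo "Draft text." > new-file.tex
git add new-file.tex
git commit -q -a -m "Add new-file.tex"
git push -q origin HEAD
local_hash=$(git rev-parse HEAD)
cd ..

# A second clone, as if on another computer, receives the pushed revision.
git clone -q project.git second-copy
remote_hash=$(git -C second-copy rev-parse HEAD)
echo "$local_hash"
echo "$remote_hash"
```

The two hashes printed at the end are identical: the second copy has exactly the revision that was pushed.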

Continued in pt. 2. Read, fork, or download the full document on GitHub.

5 thoughts on “Git for Philosophers (pt. 1)”

  1. It had never occurred to me to use Git for writing. This has a lot of potential. Thanks for the idea, for explaining the terminology, and even explaining the commands! Very helpful!

  2. I’ve found using the message option during the commit step is also really helpful (and generally a good habit). It lets you add a comment to describe what changed during that commit. So you might issue the command: git commit -a -m "Added paragraph to clarify topic ..." and in your git logs this comment (which needs to be in single or double quotes) will be attached to whatever files changed during that commit step. It’s helpful for locating things and remembering what happened during each commit.

  3. To an extent, I am not surprised that Git is the VCS used by the Open Logic Project, and of the few philosophers and logicians who use version control for writing papers etc., Git seems to be the go-to choice. This was my choice when I first explored VCSs for this purpose. It is by far and away the most popular choice in general for version control in software development. Git does, however, have its flaws, and at least for the purposes of writing papers and textbooks, I think there are some much better choices available.

    The particular breed of VCS I have in mind are patch-based systems, as opposed to snapshot-based systems such as Git. Essentially, this means that rather than taking snapshots of your project folder in its various states after a sequence of changes, patch-based systems simply record the changes themselves. This allows greater flexibility in lots of respects, and the command set and concepts are much easier to get a handle on, both as someone who is new to version control, and as someone who might consider migrating to a system other than Git.

    The particular VCS that I use is Darcs. It is the favourite VCS amongst the Haskell-language programming community. Other than the boons I noted above, Darcs has some very nice features that are particularly suited to writing papers.

    First, because of the particularities of the flexibility it gets from being patch-based, what one might call ‘rewriting the history of a file or project’ becomes very easy. For instance, if one wishes to change the order in which some changes are made, one can do that very easily. Or if one wishes to undo a change one made a long time ago, but wishes to keep subsequent changes, then one can do that very easily.

    Second, Darcs has a much more refined notion of change than Git, which is pretty handy. Git thinks of changes only at the file or directory level. This means that, if you make 10 changes in a single file for a paper, say, then Git sees that as one change. On the other hand, Darcs thinks of changes in a much more fine-grained way, which basically means that it recognises changes at something closer to the paragraph level. So, if those 10 changes were in ten different paragraphs, then Darcs will see those as 10 different changes. One can then take a subset of those changes and record them as a patch, i.e. a named set of changes to that file. Of course, it also allows one to name sets of changes spanning multiple files. I personally find this feature very handy. If I like some changes that I made to a file, but I don’t like others, then so long as I clustered those changes together into patches in a convenient way I can undo some of those changes, and not others, no matter what order I made those changes. With Git, I can’t do this. I can only revert the whole file back to a previous state.

    Of course, Darcs and other patch-based systems have their downsides. Notably, they are not as fast, they are not as widely used, and the free hosting services available for projects do not have such flashy websites (although the code for setting up your own hosting service, for your research project or group, is freely available and open-source).

    I am not saying that Darcs (and patch-based VCS) is better than Git (and snapshot-based VCS), but I do think that it has some features that make it a strong contender for use in writing academic papers, and I encourage new and old users of VCS to consider it for version controlling their papers, books, and projects.

    For more information see http://darcs.net/
