Report on OpenProof Day 2009

[On March 27, 2009, Stanford/CSLI hosted a workshop on OpenProof (aka, the software behind Barwise and Etchemendy’s Language, Proof, and Logic textbook). Alexei Angelides was there and provided the following report for LogBlog.]

On March 27th, CSLI and the OpenProof Project hosted a full day of presentation and discussion commemorating the, so to speak, tenth anniversary of the publication of Language, Proof, and Logic (LPL). I say “so to speak” because although the publication date was 1999, the date that appears on the first edition is 2000, a little tidbit of information that Etchemendy revealed was intentional, as both he and Barwise wanted its publication to symbolically coincide with the new millennium. The presentations were divided into six sections. Etchemendy began the day with a description of the history of the project; David Barker-Plummer followed with a description of the LPL package, including some of the new features that are going to be added to the next edition; this was followed by a roundtable discussion with three professors currently using the LPL courseware packages; then Richard Cox, one of the principal OpenProof researchers, presented some findings from data-mining the Grade Grinder corpus; Eric Pacuit, a modal logician currently at Stanford, presented on Kripke’s World, a program intended to supplement work in modal logic; and, finally, Barker-Plummer ended the day with a description of OpenProof’s main current research program, namely, HyperProof and OpenBox. I’ll discuss a bit of the history, and some of the new features being added to LPL in the near future, and then describe just two of the presentations a bit. More info can be found here, including slides from each talk.

In the late ’80s, Jon Barwise and John Etchemendy undertook work that attempted to blend different types of reasoning, primarily diagrammatic and sentential. Their work had two main aims, one theoretical, and the other pedagogical. On the theoretical side, Etch and Barwise were interested in challenging the “hegemony” of first-order predicate logic by arguing that reasoning with diagrams could be as rigorous as reasoning with a standard formal language. To this end, they developed so-called “heterogeneous” systems, formal systems that capture the structure of sentential or predicate reasoning (as in, e.g., sentential logic), and the structure of diagrammatic and visual reasoning (as in, e.g., reasoning with maps or Venn diagrams). On the pedagogical side, they were interested in using heterogeneous systems to enhance logic education, primarily at the college level, and so began to develop software packages that acted as supplements to the textbooks. By the mid-1990’s, their work led to Hyperproof (1994) , a textbook and package of programs that implemented sentential reasoning alongside diagrammatic reasoning, making it possible to reason, say, with maps (or visual information more generally), and first-order sentences in the same contexts. Along with Tarski’s World’s first incarnation (1987), Turing’s World (1986), and their The Language of First-Order Logic (1991), Hyperproof led to the first incarnation of LPL, a book that is nominally a course in first-order logic, but for which one of the programs allows students to evaluate the truth-values of sentences and the consequence relations between them, in a fully interpreted language called the “Blocks Language,” by using the spatial and visual relationships between tetrahedra, cubes, and dodecahedra in the Tarski’s World program.

Tarski’s World is one of three programs designed to supplement a student’s learning. The other two are Boole, a program designed to allow students to construct truth tables and test for truth-functionality and first-order consequence, and Fitch, a formal system for constructing proofs that includes the ability for teachers to allow and disallow certain rules to be used, and which includes a proof checker for students to verify their constructions. Of course, the software package also includes the Grade Grinder, which is like a very programmatic, 24-hour, mini-TA for students and instructors alike, giving some feedback to students who submit incorrect answers to it, and allowing teachers to relegate the more menial, possibly non-conceptual problems to the machine. (It includes, for students, the option of submitting solely to themselves in order to check if their answers are correct before submitting to the instructor.) Now, anyone who has used LPL knows how enormously helpful these programs are, both in terms of saving time for the professor using it, and pedagogically for the student, especially Tarski’s World. Understanding how to abstract away from the meanings of the terms, and regard them as entirely uninterpreted has been, in my experience, the first road block for students. Tarski’s World, though, acts as a kind of happy middle between the uninterpreted languages of first-order logic, and the interpreted kind we all grow up speaking.

However, anyone who has used LPL for anything more than an intro course also knows that software support drops off at chapter fifteen, the chapter in which we are introduced to first-order set theory, and after that, mathematical induction, and some of FOL’s metatheory. In his presentation, Barker-Plummer noted that new editions of LPL rectify this. So, new editions enhance Fitch by adding the possibility of constructing inductive and set-theoretic proofs, the ability to “flag” lemmas, so that students do not have to reprove instances of, say, excluded middle, improved Henkin-Hintikka game play in Tarski’s World, and a feature they’re calling “goggles,” in which the distinctions between tautological consequence, first-order consequence, and logical consequence are given a “visual” component. Thus, for example, if you’re using your TautCon Goggles, only the sentential connectives are visible, and each instance of the same sentential letter gets assigned the same color, allowing the student to visualize the underlying logical structure of the argument. (Footnote: You might worry, as one audience member did, about how this might work for the (partially) colorblind or blind. An effective way to deal with this might be to have “goggled” areas shaded with variations in grey. For the blind, the problem is more general, obviously. For some recent work dealing with it see Jesse Alama, Patrick Girard, and Elizabeth Phillips’s project on “brailling logic” here.) Finally, new editions of LPL will also include an ability for instructors to input their own exercises that are then gradable via the Grade Grinder, an addition, I think, that enhances the interaction between instructor and LPL, allowing instructors a bit more freedom in what they stress in their courses.

One of the most interesting presentations of the day was Cox’s. Along with a few others (including Barker-Plummer), Cox has been data mining the results of the Grade Grinder in order to find the most common mistakes, and then attempting to give some explanation to those mistakes. Not only does this have interest in itself, since it’s possible that such research leads to a better understanding of some of the cognitive processes underlying logical and, in at least one case as I’ll explain in a second, visual reasoning. It also assists with updates to the LPL software, since by isolating the source of common errors in problem-solving, better feedback can be automatically generated by the Grade Grinder. They looked at two different exercises, one dealing with translations from English to FOL, and one dealing with the graphical interface between Tarski’s World and the Blocks Language. For the first, the team selected an exercise of mid-range difficulty by looking at the number of submissions from beginning of data from the Grade Grinder (1999), rank ordering those submissions, and then choosing one with a high error rate, which turned out to be exercise 7.12 from chapter 7. (Footnote: Specifically, of 46,200 submissions, 27,473 were incorrect (59%), taken from a sample size of 4912 students, representing 5.6 incorrect sentences (out of 20).) Consider the following sentence:

if a is a tetrahedron, then it is in front of d.

On this problem (7.12.1), students were in error 43%. But now consider:

c is to the right of a, provided it (i.e., c) is small.

On this problem (7.12.4), students were in error 66%, reflecting the fact that when word order between natural language and first-order languages is not preserved in translation, the error rate rises. One frequent example that logic instructors everywhere will recognize immediately is the “only if” construction in English. As LPL points out, “only if” introduces a necessary condition. Even after much instruction, however, students are apt to translate “S only if P” as:

P → S.

Cox and his team found that, across all twenty sentences in problem 7.12, the frequency of error in translating from “only if” to FOL was 75.43%, a far higher error rate than any other conditional translation error. While many of their results dealing with translations from natural language into FOL are unsurprising, it is nice to see some statistics backing up anecdotal experience.

On the other hand, the third error the data miners canvassed was one dealing with the interface between Tarski’s World and translations from natural language to FOL, this time with an unexpected result. Again choosing a problem of medium difficulty, 7.15, they found that one subproblem in particular, 7.15.7, suggested that students have a harder time with processing visual (sizes, etc.), as opposed to spatial (shapes, etc.) information. In the problem you are asked to start a sentence file in Tarski’s World, and translate:

d is a small dodecahedron unless a is small.

Note again the intuitively problematic conditional. Once the student has translated the sentence into FOL, she is asked to figure out the sizes and shapes of the objects named in all 12 subproblems. Then, the student is asked to build a world in Tarski’s World where all of the sentences are jointly true. The point is that the student is asked to translate from English to FOL, then to determine, based on her translations, the relevant visual and spatial information, and then, based on her classification of that information, actually produce the required visual and spatial arrangement. In other words, the exercise takes the student from linguistic processing to visual and spatial processing. Cox and company found that for the above sentence, 92% of errors involved the size of d. But only 24% of errors involved its shape. (16% involved both.) So, given that such a high error rate was not found in cases involving translations between English and FOL that did not also involve the graphical interface, Cox suggested, based on research by Knauff (2001), that the information about an object’s size may have negative effects on reasoning when it’s not relevant to the problem. (Footnote: Of course, there are other interpretations of the data here, and Cox was careful to go through a few. I found this the most interesting for the purposes of making a distinction between linguistic, visual, and spatial types of reasoning. But see his slides (on the website) for more.)

Pacuit’s presentation was on another new, but currently in progress, feature of the OpenProof project. Kripke’s World, which the developers hope to have ready to accompany an as of yet unwritten text, is a means of evaluating modal formulae that is, in most respects, just like Tarski’s World. Salient differences are that rather than using a propositional modal language, the developers have chosen a first-order modal language over the Blocks Language. Hence, upon opening Kripke’s World, one is able to form sentences such as:

◊∃ x Tet(x).

The states at which modal formulae are evaluated are Blocks Worlds. The interface is essentially the same as the Tarski’s World interface, the difference being that multiple Blocks Worlds are constructible where there is an accessibility relation (reflexive, symmetric, Euclidean, and so on, selected from a useful drop down scroll list) between the various Kripke’s Worlds that one constructs. So, for example, if one starts in a world where an object a is a cube, then in order for:

◻Cube(a)

to be true, the object named by a must be a cube in all Kripke Worlds accessible from the initial world, reflecting the fact that names are rigid designators. Moreover, because the developers use a first-order modal logic, in Kripke’s World objects have haecceity, a fact which is reflected in the software as well, as the “thisness” (represented in Kripke’s World by Greek lettering) for an object in the initial world carries over to all subsequent worlds in which the formula containing it is evaluated. One feature, or problem–depending on who’s looking–of their approach is that Kripke’s World seems to be a nice approach for philosophers who are trying to emphasize the philosophical aspects of modal logic. But for logicians, who commonly emphasize the metatheoretic properties of modal logics, and its relation to (fragments of) first-order logic, it might not be as useful. For example, it doesn’t touch on the myriad applications of modal logic, such as epistemic or deontic logic. Moreover, in my opinion another drawback of Kripke’s World is that it uses the same graphical interface as Tarski’s World. The salient difference is that more than one Tarski World is constructible, given an accessibility relation, but that only seems to visually complicate matters, whereas one of the nice things about Tarski’s World is that, visually, it’s so easy to use. Surely there’s some happy middle, here. In any event, Pacuit brought up some very nice questions about Kripke’s World, including one that has received much airtime at Stanford lately, namely, how best to integrate an introduction to modal logic within a standard course on first-order logic. Suffice it to say that this discussion is ongoing.

One final note on Barker-Plummer’s final presentation, on OpenProof’s OpenBox project. The OpenBox, a direct descendent of Barwise and Etchemendy’s theoretical work that led to Hyperproof, is due to appear soon, and, as emerged from the discussion, is the most extant and up-to-date version of the original Hyperproof platform. As such, the intention is to integrate diagrammatic and sentential reasoning into a single software package. Now, however, the architecture of the program allows users to modify programs by plugging in “components,” new interfaces that are uploaded by the users themselves, making the project entirely interactive. Hence, for example, an architect who wants to plan his next project might upload his design specifications, and then use the reasoning environment given by OpenBox to find his available possibilities, given his specifications. Or, as Etchemendy pointed out, one might use it to modify a picture, for example, in Adobe Photoshop, but use Openbox to save the history of all possibilities that had been available before a given modification to the picture. Of course, more theoretical uses are available, but the point of the components is to allow users to build their own reasoning specifications into different contexts, visual, spatial, sentential, and various combinations of the three. Barker-Plummer intends this to be a generalization of the notion of justification, and to the extent that it captures the reason for which an inference is made, whether in logical, architectural, or other contexts, this is correct. Indeed, it must be emphasized that maximal interaction between interface and user, and the systematic investigation of different types of reasoning, have been theoretical goals of the OpenProof Project since its inception, for it’s led to a book, that led to a software and textbook package, that led to today’s LPL, that led to OpenProof. So I hope, given their stress on interaction and the commemoration’s fruitfulness, that more meetings like this take place.

Richard Zach

Report on OpenProof Day 2009

Leave a Reply Cancel reply

Share this:

Leave a Reply Cancel reply