60
$\begingroup$

After writing many proofs, most of which contained errors in their initial form, I have developed some simple techniques for "debugging" my proofs. Of course, a good way to detect errors in proofs is to send them to a colleague for review. But even friendly and helpful colleagues tend to focus on the more interesting proofs and ignore the more technical ones, which are prone to errors. So it is important to have techniques for self-debugging. Some simple techniques that I use are:

  1. Verify that every assumption made in the theorem is used at least once in the proof. Mathematically this is not strictly required, since if "A" implies "C", then "A and B" implies "C" too. But if an assumption is made and never used, it may indicate an error. (There is an analogous technique in programming: many compilers raise a warning if they detect that a variable is declared but never used.)

  2. Rewrite the proof in the opposite direction. For example, if the proof is by contradiction, rewrite it as a direct proof, and vice versa. Mathematically it should not matter, but the process of rewriting may help to uncover hidden errors.

  3. Read the proof in print. I have no rational explanation for this, but I have found that when I read proofs on paper, I often detect errors that evaded my eyes when I read them on the computer screen.

Question: what other techniques do you find useful for detecting errors in mathematical proofs? I am looking for general techniques that you would recommend to your research students.

$\endgroup$
9
  • 21
    $\begingroup$ Generalize until there is a counterexample, and make sure that the example does not run counter to your original theorem. Also check that your proposed proof applies to examples. $\endgroup$ Apr 29, 2022 at 11:03
  • 6
    $\begingroup$ This older post is tangentially related: Community experiences writing Lamport's structured proofs. Some related questions on Mathematics: Are there any common practices in mathematics to guard against mistakes? and How to make sure a proof is correct $\endgroup$ Apr 29, 2022 at 11:22
  • 3
    $\begingroup$ Understand not only that the arguments yield the claimed result but why they do. $\endgroup$ Apr 29, 2022 at 11:46
  • 8
    $\begingroup$ Followup to D.S. Lipham's comment: Check extreme cases, e.g. zero, constant function, empty set, complete graph, certain event. Once you are happy with those, continue with next-to extreme cases, like complete graph with one edge missing. If your proof branches into cases, try to generate examples for each branch (similar to debugging software). $\endgroup$ Apr 29, 2022 at 12:09
  • 21
    $\begingroup$ @LSpice Here's one technique that may make more sense on MO than on MSE: Write out a long version of the proof for your own private consumption. There is often some pressure to make the published version of the proof concise. Being overly concise is a good way to make mistakes. So it is good practice to write out a fully detailed version of the proof, just to make sure the argument is correct, even if the final version ends up eliding many details. $\endgroup$ Apr 29, 2022 at 22:44

9 Answers

52
$\begingroup$

Several basic suggestions.

First, put your manuscript into a drawer, and forget about it for a couple of months. You will discover a whole lot of exciting new things when you take it out of the drawer and re-read it afresh.

Second, I have found that 80% of all mistakes are conveniently marked by the authors with words like "evidently" and "clearly", and phrases like "it is easy to see". Double-check all occurrences of these words and phrases in your manuscript.

The final suggestion: make your manuscript public. Post it to the arXiv, mail it to a colleague, submit it to a journal. You are guaranteed to find a bunch of mistakes in the following ten minutes.

$\endgroup$
8
  • 7
    $\begingroup$ "You are guaranteed to find a bunch of mistakes in the following ten minutes." Really? For submitting to journals, it seems expected to wait for at least several months. $\endgroup$
    – Z. M
    Apr 29, 2022 at 14:02
  • 37
    $\begingroup$ @Z.M What was meant is that errors become instantly apparent once you submit a manuscript somewhere. $\endgroup$ Apr 29, 2022 at 14:19
  • 5
    $\begingroup$ I agree with Seva's advice. When I set something aside and then try to re-read it, often I find that I don't understand certain things that I wrote. If even I'm confused then it's a sure sign the proof is not written well (even if the argument turns out to be correct). $\endgroup$ Apr 29, 2022 at 14:47
  • 7
    $\begingroup$ Hear, hear to the second. That's a trick I use when refereeing: look for those "clearly"s. $\endgroup$
    – Todd Trimble
    Apr 29, 2022 at 15:22
  • 7
    $\begingroup$ Putting the manuscript in a drawer for a few months is a good idea in principle, but maybe not ideal in a world where people are expected to put out articles at reasonable speed. $\endgroup$ Apr 30, 2022 at 19:51
39
$\begingroup$

I find that there is a surprisingly large class of proofs for which it is possible to write software representations of many ingredients. I usually use Maple, but of course there are other options. The process of writing code forces you to clarify many things, which is useful for debugging. One can then check thousands of random examples, with randomisation procedures tuned to explore all the edge cases. This is not as rigorous as full formalisation in a proof assistant such as Lean, but it is quicker and easier.

I once gave a talk about this sort of thing, titled Semi-formal verification as a routine tool. My main library of Maple code is at https://github.com/NeilStrickland/maple_lib. There is a very large project using these techniques at https://neilstrickland.github.io/genus2/, with an explanatory talk titled An interesting surface of genus 2.
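
To illustrate the flavour of such checks with a toy example (a hypothetical Python sketch, not part of the Maple library or the genus-2 project mentioned above): suppose a step in a proof relies on Vandermonde's identity $\sum_k \binom{m}{k}\binom{n}{j-k} = \binom{m+n}{j}$. One can test it on thousands of randomised inputs, with the randomisation biased towards edge cases such as zero:

    import random
    from math import comb

    # Semi-formal check of a single claimed identity on many random inputs.
    # Illustrative example: Vandermonde's identity
    #     sum_k C(m, k) * C(n, j - k) == C(m + n, j).
    def vandermonde_holds(m, n, j):
        lhs = sum(comb(m, k) * comb(n, j - k) for k in range(j + 1))
        return lhs == comb(m + n, j)

    random.seed(0)
    for _ in range(10_000):
        # Randomisation biased towards edge cases: zeros and small values come up often.
        m = random.choice([0, 1, 2, random.randrange(50)])
        n = random.choice([0, 1, 2, random.randrange(50)])
        j = random.randrange(m + n + 1)
        assert vandermonde_holds(m, n, j), (m, n, j)
    print("no counterexample found in 10000 random trials")

A failing assertion prints the offending triple, which is usually enough to locate the broken step; the same pattern applies to recursions, bijections, and inequalities appearing inside a proof.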

$\endgroup$
15
  • 5
    $\begingroup$ Checking random examples by computer is indeed very useful when it can be done. It already helped me detect several bugs. $\endgroup$ Apr 30, 2022 at 19:42
  • 1
    $\begingroup$ In combinatorics, it is common to describe bijections. These can of course be implemented and verified for small $n$ (a sketch of such a check appears after these comments). $\endgroup$ May 1, 2022 at 6:43
  • $\begingroup$ @neil-strickland - Your slides are very nice. Thank you for posting these. One small comment: the second sentence "$X$ separates $S^3$ into two handlebodies, which are homeomorphic to each other and homotopy equivalent to a wedge of $g$ circles." is true if $X$ is a two-sphere, and is false in all higher genus. For example, consider $X$ arising as the boundary of a small regular neighbourhood of a knot. $\endgroup$
    – Sam Nead
    May 1, 2022 at 8:58
  • $\begingroup$ @SamNead You are right, the claim is true if the embedding of $X$ in $S^3$ is of the usual isotopy class, but not in general. I will think about how to rephrase it. $\endgroup$ May 1, 2022 at 9:08
  • $\begingroup$ @neil-strickland - Looking more carefully at your first slide, you end with "For many of these phenomena, the literature contains no explicit examples." But it seems to me that there are examples of most/all of these in the literature? In particular there is a bit of an industry in finding (numerically) conformal structures on embedded surfaces in three-space. Confused. $\endgroup$
    – Sam Nead
    May 2, 2022 at 16:04
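
A minimal sketch of the kind of small-$n$ check mentioned in the comments above (the particular bijection, between compositions of $n$ and subsets of $\{1,\dots,n-1\}$ via partial sums, is only an illustrative choice):

    # Brute-force verification of a claimed bijection for small n.
    # Map: a composition of n (ordered positive parts summing to n)
    #      -> the set of its proper partial sums, a subset of {1, ..., n-1}.
    def compositions(n):
        """Yield all ordered tuples of positive integers summing to n."""
        if n == 0:
            yield ()
            return
        for first in range(1, n + 1):
            for rest in compositions(n - first):
                yield (first,) + rest

    def to_subset(comp):
        """Map a composition to its set of proper partial sums."""
        partial, total = [], 0
        for part in comp[:-1]:
            total += part
            partial.append(total)
        return frozenset(partial)

    for n in range(1, 10):
        images = [to_subset(c) for c in compositions(n)]
        universe = set(range(1, n))
        assert all(s <= universe for s in images)   # lands in the claimed codomain
        assert len(set(images)) == len(images)      # injective
        assert len(images) == 2 ** (n - 1)          # count matches number of subsets
    print("bijection checked for n = 1, ..., 9")
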
33
$\begingroup$

When I was an undergraduate student, one of my professors used to repeat during classes: “this is obviously true - let’s see if it’s also true”. Recalling those words proved useful many times, so here’s my suggested debugging tool: check that what you left as obvious is actually true. This applies not only to things that you explicitly declared as obvious, but even more to things treated as obvious without even being mentioned.

Concerning the latter category, there is the following piece of advice, given by J.-P. Serre in an online lecture (https://www.youtube.com/watch?v=ECQyFzzBHlo&t): assuming all your individual steps are correct, still check that the things you proved actually imply the stated conclusion.

$\endgroup$
26
$\begingroup$

When writing a long paper (~100 pages) with many long chains of equalities and inequalities a few years ago, I developed a system of LaTeX macros that make a semantic distinction between “definitional” equal signs and “assertional” equal signs, and a further subdistinction between “assertional but unverified” and “assertional and verified” equal signs.

In the writing phase, I made it a habit to enter each definitional equal sign using my “\defeq” macro, and to enter each assertional equal sign with the “\unverifiedeq” macro. Later I would go over various draft sections in “debugging” (i.e., proof-reading) mode and carefully verify each “\unverifiedeq” transition, often entering the formulas into Mathematica and confirming that Mathematica agreed with my paper calculations, or running various sanity checks (specializing some parameter to zero, etc.). Once I became convinced that an equation deserved “verified” status, I changed the macro from \unverifiedeq to \verifiedeq.

With this system I was able to keep track of which of the (several hundred) equalities in the paper had been verified and which still needed to be checked. Gradually during the writing process I eliminated all unverified equalities in this way (occasionally correcting small typos and errors I discovered), until none remained. The system worked well and I believe my paper is essentially error-free.

In this system, the definitional equal signs do not need to be debugged, so there are no “verified” and “unverified” variants of a definitional equal sign, only the single \defeq macro. The premise behind this is that definitions are correct by definition and therefore cannot have a “bug”; if you occasionally do have a typo in the definition of a quantity, so that you are defining it as something other than what you intended, then you will necessarily encounter a bug later on in one of the assertional equal signs that involve that quantity. This worked well in practice, and my debugging process did lead me to correct errors in definitional equal signs as well.

Another couple of notes about this system:

  1. I also had macros for inequalities, again classifying them into “definitional” inequalities (or maybe it’s more correct to say “assumptional”, since they represent assumptions in the statement of a theorem) and “assertional”. So the debugging principle for inequalities is exactly the same.

  2. The macros were defined conditionally on a Boolean flag that specified whether I was compiling the paper in “debug mode” or “release mode”. In release mode, all the different kinds of equal signs were typeset as a normal equal sign, so in the generated PDF you couldn’t tell which kind of equal-sign macro each one came from. In debug mode, verified equal signs had a green check mark above them, and unverified equal signs were numbered (using a LaTeX counter I defined), with the index of the unverified sign appearing in red above the sign. This enabled me to tell at a glance when proof-reading which equal signs still needed to be verified and how many such signs I still had to go over. (And similarly for inequalities.)

If someone is interested, I can share the LaTeX macros here. It’s a fairly basic setup, but perhaps it could be expanded with more work into a full LaTeX package with additional debugging features.
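
As a rough indication of what such a setup might look like (this is only a simplified sketch along the lines described above, not the exact macros used for the paper; it assumes amsmath, amssymb and xcolor are loaded):

    % Simplified sketch only -- not the exact macros used for the paper.
    % Preamble: \usepackage{amsmath,amssymb,xcolor}
    \newif\ifdebugmode
    \debugmodetrue   % switch to \debugmodefalse for "release mode"

    \newcounter{unverifiedeq}

    % Definitional equality: no verified/unverified variants needed.
    \newcommand{\defeq}{%
      \ifdebugmode \mathrel{\overset{\text{def}}{=}}\else =\fi}

    % Assertional equality, not yet verified: numbered in red in debug mode.
    \newcommand{\unverifiedeq}{%
      \ifdebugmode
        \stepcounter{unverifiedeq}%
        \mathrel{\overset{\textcolor{red}{\theunverifiedeq}}{=}}%
      \else =\fi}

    % Assertional equality that has been checked: green check mark in debug mode.
    \newcommand{\verifiedeq}{%
      \ifdebugmode \mathrel{\overset{\textcolor{green!50!black}{\checkmark}}{=}}\else =\fi}

In the body of the paper one would then write, say, $f(x) \defeq x^2 + 1$ and $\int_0^1 f(x)\,dx \unverifiedeq 4/3$, and change \unverifiedeq to \verifiedeq once the step has been checked.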

$\endgroup$
9
  • 6
    $\begingroup$ Could there be an advantage in typesetting the definitional equals signs slightly differently in release mode, to clarify which is which for the reader? Maybe $:=$. $\endgroup$
    – Will Sawin
    May 1, 2022 at 12:42
  • 1
    $\begingroup$ @WillSawin sure, it’s basically a matter of taste. Some people prefer := precisely to eliminate this ambiguity. Actually assertional equal signs also come in two flavors, an assertion that you’re making at the same time as explaining it, and another kind of assertion of the kind “we want to prove that X=Y” where it’s implicit that the proof is given later (or the statement could be a conjecture). For debugging purposes this can be typeset as an equal sign with a question mark above it. It’s a real problem that mathematical writing overloads the equal sign to mean all these different things. $\endgroup$
    – Dan Romik
    May 1, 2022 at 13:43
  • $\begingroup$ @WillSawin FWIW, I was taught that an equals sign with a triangle on top meant equals by definition. This seems to be commonly but not universally recognized. $\endgroup$
    – Bruce
    May 2, 2022 at 7:15
  • $\begingroup$ Wonderful idea, thanks. I agree with Will Sawin, that these symbols should be typeset differently also in the release version. Maybe some day it will become more common. $\endgroup$ May 2, 2022 at 8:58
  • 3
    $\begingroup$ "I was taught equals sign with triangle on to meant equals by definition." It's a shame this symbol for definitional equality looks symmetric. $\endgroup$ May 2, 2022 at 12:27
18
$\begingroup$

There are two important classes of proofs not explicitly mentioned in other answers -

  1. Where the theorem is correct in all cases of interest, but not strictly correct as stated;
  2. Where the theorem is true but the proof fails to prove it.

These are distinct from the case where the theorem is Just Plain Wrong, which is usually easier to detect. I think professional mathematicians are more likely to produce a near-correct argument than a proof of something outright false, compared to people working at a more elementary level.

By the first of these classes, I mean that there may be missteps in how the theorem is stated, such that it holds for the "important" examples, but may fail in edge cases. For example, maybe it is stated for all sets $S$, but actually only holds if $S$ is nonempty. Sharpening up the theorem statement can help to bring out more subtle presentational issues as well. I would tend to go through the conditions used in the theorem and check against extremal cases like "but what if $n=0$?", "does $A$ really have to be finite?", that kind of thing.

This issue can occur more seriously in disguise. I recall a paper (not by me) which proved that all mappings with a certain property had a unique fixed point. A critical step was a lemma that derived a contradiction from the existence of two distinct fixed points, hence concluding uniqueness. But the authors had not noticed that some of their mappings had no fixed points at all. What they had really proved was that there was at most one fixed point. (They were generalizing from a previous result where there always was a fixed point, but their new version of the key property did not guarantee that.) The problem arose from the way that they were focused on the "good" cases, for which the proof worked. They might have noticed it if they had been more explicit about naming the set $S$ of fixed points, and therefore getting a "hook" into considering the various cases of $S$ being empty, a singleton, etc.

My second class of proofs is more tricky, because the theorem is actually correct. It's just that the argument for it is flawed. Often, the proof does work for a subset of the scenarios envisaged in the theorem statement, but fails to go through more generally. So it is allied to my first class of examples. For example, I think it's fairly common for well-ordering or choice issues to sneak their way into the middle of a proof, and for the use of choice to be ultimately unnecessary.

What we are looking for to detect these problems is not a counterexample to the theorem (there isn't one), but a counterexample to the proof. So again, it helps to run through the proof steps "adversarially", trying to pick the most extreme or awkward cases. When you read something like "let $x$ be the minimal such doodad", focus on that "the" and try to break it with an example where there are zero or many possibilities.

I've also found it helpful to restructure big proofs, so that there are more and smaller lemmas leading up to the main result. While this does add length, it has the benefits of:

  1. Shorter lemmas are easier to check in isolation.
  2. Making the intermediate structures clearer is often helpful - even just by giving names to frequently-used concepts.
  3. Having to formalize the lemma statement forces clarity over what it is claiming, vs. having it embedded seamlessly in a longer context.

Another little trick for incorrect proofs of $A \implies B$ is to try to recast them as proofs of $\neg B \implies \neg A$. By focusing on $\neg B$, you can avoid the trap of accidentally relying on the most favourable cases of $A$.

$\endgroup$
2
  • 1
    $\begingroup$ The first of your classes reminds me of A counterexample to a 1961 “theorem” in homological algebra. Actually, several of the answers to that MO question might suggest good techniques for debugging proofs. $\endgroup$ May 1, 2022 at 19:10
  • 2
    $\begingroup$ Refactoring big proofs into small lemmas is a very useful technique that I adapted from my experience as a programmer. $\endgroup$ May 3, 2022 at 10:40
7
$\begingroup$

Explain the proof to your cat. Indeed, it is often difficult to find a willing colleague who will listen for hours to all the details without falling asleep or getting lost (both things happen simultaneously quite often).

Cats are infinitely more patient and are eager listeners. Do not expect them to point a paw at your errors, but simply trying to explain things carefully generally makes you aware of errors yourself.

Cats can of course be replaced by PhD students. Coffee is, however, generally more expensive than cat food.

PS: During my studies I had no cat. I replaced it with my grandmother.

$\endgroup$
5
  • 7
    $\begingroup$ This is a technique known as rubber duck debugging. Rubber ducks are known for having even greater patience than felines. $\endgroup$
    – Wojowu
    May 4, 2022 at 16:06
  • 2
    $\begingroup$ @Wojowu : Many thanks for your clarification. I like 'rubber ducking' even more than 'catsplaining'! $\endgroup$ May 4, 2022 at 16:13
  • $\begingroup$ Regarding the scratching issue with cats, and their arrogant temperament, a duck may be a better candidate. $\endgroup$ May 4, 2022 at 16:29
  • 3
    $\begingroup$ Cats are not arrogant, they are simply aware of their worth! $\endgroup$ May 4, 2022 at 16:42
  • $\begingroup$ That's a point. But surely, they overestimate their value. $\endgroup$ May 6, 2022 at 16:31
2
$\begingroup$

Another technique that no one seems to have pointed out is to explain your article in detail to a colleague or a friend, who will play the role of contradictor and double-checker. This trick is very useful, and it has helped me every time to improve the final version of the article.

$\endgroup$
1
$\begingroup$

CLARITY MATTERS.

From a specification point of view, this nice question lacks a proper goal. The point is: what is a GOOD PROOF? What matters may be correctness or clarity.

A) For correctness, use computerized proofs; this is the only way: hand-written proofs are always buggy! (See the note below.)

B) For clarity, all the techniques mentioned in the question and in the other answers are good. Yet clarity has several subgoals, such as reuse and understanding.

  1. Reuse happens when you write useful lemmas that also improve the clarity of the proof.
  2. Understanding may come from using cases: some easier cases can give you understanding, or place the result as an extension of such a case.

This analysis is of course not exhaustive.

C) As with programs: a lot of techniques used in writing programs apply here.

1. Limiting method length: the same as splitting the proof into lemmas.

2. Each method does one thing: the same as splitting into cases with meaningful sub-cases and sub-sub-cases.

Note: This may seem provocative, yet I believe that five (or more) lines of code are almost always buggy. If you don't agree, consider the binary search bug (a few lines of Java) described at https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html

$\endgroup$
1
$\begingroup$

Check that you name things consistently. If in one chapter/lemma an index is denoted by $n$ and another index by $m$, and somewhere else it is the other way around, it can at best lead to unreadable texts and at worst to real mistakes.

I started to write an ANTLR parser for TeX files (caveat: it makes a lot of assumptions about the precise structure of the TeX file and does not work for TeX files in general) that makes it possible to say "rename variable $x$ to $y$ in this chapter", and then $x$ is replaced by $y$, but only in math mode.

It can also draw a reference graph: all labeled statements (things with a \label{...}) are nodes, and whenever there is a reference to another statement in a proof, we add an edge. Of course cycles could be problematic, but so are multiple connected components. This makes it possible to detect unused lemmas: different connected components indicate that some lemmas are not used in the proof of the main theorem.

Finally, one can also use it to detect common flaws automatically, like a missing \qedhere whenever a proof ends in a math environment or an enumerate.
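
Even without a full parser, a crude approximation of the "unused label" check is easy to script. A rough Python sketch (regular expressions instead of ANTLR, assuming a single self-contained TeX file and the standard referencing commands) could look like this:

    # Rough sketch (not the ANTLR-based tool described above): collect every
    # \label{...} and every \ref-like reference in a .tex file and report labels
    # that are never referenced anywhere -- a coarse "unused lemma" detector.
    import re
    import sys

    LABEL = re.compile(r'\\label\{([^}]+)\}')
    REF = re.compile(r'\\(?:ref|eqref|cref|Cref|autoref)\{([^}]+)\}')

    def unused_labels(path):
        with open(path, encoding='utf-8') as f:
            source = f.read()
        defined = set(LABEL.findall(source))
        referenced = set()
        for group in REF.findall(source):
            # \cref{a,b} may reference several labels at once
            referenced.update(name.strip() for name in group.split(','))
        return sorted(defined - referenced)

    if __name__ == '__main__':
        for name in unused_labels(sys.argv[1]):
            print('label defined but never referenced:', name)

The connected-components analysis additionally needs statement boundaries, which is exactly where a real parser pays off.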

$\endgroup$
