49
$\begingroup$

Gian-Carlo Rota's famous 1991 essay, "The pernicious influence of mathematics upon philosophy" contains the following passage:

Perform the following thought experiment. Suppose that you are given two formal presentations of the same mathematical theory. The definitions of the first presentation are the theorems of the second, and vice versa. This situation frequently occurs in mathematics. Which of the two presentations makes the theory 'true'? Neither, evidently: what we have is two presentations of the same theory.

Rota's claim that "this situation frequently occurs in mathematics" sounds reasonable to me, because I feel that I have frequently encountered authors who, after proving a certain theorem, say something like, "This theorem can be taken to be the definition of X," with the implicit suggestion that the original definition of X would then become a theorem. However, when I tried to come up with explicit examples, I had a lot of trouble. My question is, does this situation described by Rota really arise frequently in the literature?

There is a close connection between this question and another MO question about cryptomorphisms. But I don't think the questions are exactly the same. For instance, different axiomatizations of matroids comprise standard examples of cryptomorphisms. It is true that one can take (say) the circuit axiomatization of a matroid and prove basis exchange as a theorem, or one can take basis exchange as an axiom and prove the circuit "axioms" as theorems. But these equivalences are all pretty easy to prove; in Oxley's book Matroid Theory, they all appear in the introductory chapter. As far as I know, none of the theorems in later chapters have the property that they could be taken as the starting point for matroid theory, with (say) basis exchange becoming a deep theorem. What I'm wondering is whether there are cases in which a significant piece of theory really is developed in two different ways in the literature, with a major theorem of Presentation A being taken as a starting point for Presentation B, and the definitions of Presentation A being major theorems of Presentation B.

Let me also mention that I don't think that reverse mathematics is quite what Rota is referring to. Brouwer's fixed-point theorem can be shown to imply weak Kőnig's lemma over $\mathsf{RCA}_0$, but as far as I know, nobody seriously thinks that it makes sense to take Brouwer's fixed-point theorem as an axiom when developing the basics of analysis or topology.


EDIT: In another MO question, someone quoted Bott as referring to "the old French trick of turning a theorem into a definition". I'm not sure if Bott and Rota had exactly the same concept in mind, but it seems related.

$\endgroup$
16
  • 2
    $\begingroup$ Are you interested in just single examples of theorems and definitions that can be switched (exploiting the logical equivalence of two properties), or do you want examples of two presentations of a large mathematical subject with multiple switches of that type? In Rota's phrasing, it's not clear whether he thought that the former or the latter is common. Of course the first case is much easier to find, so you probably want the second one... but your title makes me think otherwise. $\endgroup$ Sep 26, 2021 at 20:23
  • 4
    $\begingroup$ Definition: a circle is the curve with the largest enclosed area/length ratio. Theorem: (the usual definition). You can go on forever with things like this. Is that really what you want? $\endgroup$ Sep 26, 2021 at 20:40
  • 3
    $\begingroup$ @AlessandroDellaCorte I am thinking of a large mathematical subject, which I think is what Rota had in mind (although of course I am not sure). In the other MO question, maybe Pete Clark's example of the determinant comes closest. Not only can the determinant be defined in different ways that are nontrivial to prove equivalent, but there's a sizable chunk of linear algebra surrounding the determinant that can be developed differently depending on whether the focus is on the abstract theory of linear transformations, or on algorithms for matrices. $\endgroup$ Sep 26, 2021 at 20:48
  • 3
    $\begingroup$ Something different that I don't think really fits your request but has something of a similar feel: the equivalence between holomorphic and complex analytic functions $\endgroup$ Sep 26, 2021 at 23:32
  • 5
    $\begingroup$ Any time we see a theorem that says "the following are equivalent" followed by a list of statements, any one of the statements can be taken as a definition and the rest as theorems. $\endgroup$ Sep 27, 2021 at 7:22

30 Answers

54
$\begingroup$

From a conversation I had with Gian-Carlo Rota when I was an undergraduate, I know that one simple but important example that he specifically had in mind was the calculus of vector fields (whether specifically in three dimensions or more generally). The gradient, divergence, and curl of differentiable fields on ${\mathbb R}^{3}$ can be defined as particular combinations of partial derivatives—in which case it is necessary to prove that they represent geometrical objects (meaning they transform correctly). Alternatively, it is possible to give purely geometrical definitions of all three objects, in which case it is necessary to prove that, when applied to sufficiently smooth functions, they can be calculated entirely in terms of partial derivatives. Whichever way you like to approach the theory, it is possible to find textbooks that take your preferred starting point and do a good job of explaining vector calculus—even though the two approaches are, philosophically, quite different in terms of what they seem to assume about what, say, $\operatorname{grad} f$ "really means." Moreover, there are also plenty of important theorems that can be proven from either starting point, without proving the equivalence first.
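To make the contrast concrete, here are the two starting points for (say) the divergence of a smooth field $F=(F_1,F_2,F_3)$ on ${\mathbb R}^{3}$: the coordinate definition
$$\operatorname{div} F=\frac{\partial F_1}{\partial x}+\frac{\partial F_2}{\partial y}+\frac{\partial F_3}{\partial z},$$
versus the geometric definition as flux density,
$$\operatorname{div} F(p)=\lim_{r\to 0}\frac{1}{\operatorname{vol}\bigl(B_r(p)\bigr)}\int_{\partial B_r(p)} F\cdot n\,dS,$$
with either one taken as the definition and the other proved as a theorem.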

Somebody else, in the course of that conversation, mentioned the logarithm as an even more basic example. There are actually many ways of initially defining the logarithm, and Calculus by James Stewart (or the first edition, at least) actually demonstrates explicitly that you can begin with the logarithm as the inverse of the exponential, or you can define $\ln x=\int_{1}^{x}(1/t)\,dt$ and eventually prove all the same things.
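For instance, starting from $\ln x=\int_{1}^{x}(1/t)\,dt$, the law of logarithms falls out of a substitution ($t=au$ in the last integral):
$$\ln(ab)=\int_1^{ab}\frac{dt}{t}=\int_1^{a}\frac{dt}{t}+\int_a^{ab}\frac{dt}{t}=\ln a+\int_1^{b}\frac{du}{u}=\ln a+\ln b,$$
and the exponential is then defined as the inverse function, inheriting its properties from those of $\ln$.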

$\endgroup$
7
  • 1
    $\begingroup$ Most calculus books we use today are labeled "early transcendentals", meaning that the transcendental functions (sin, cos, exp, ln) are introduced early, as opposed to being defined by power series. One might think they are not defined properly; but (to OP's point) they are just defined differently. E.g., sin is the inverse of arcsin, which is the integral of an algebraic function (over a complex domain); exp has 3 definitions. $\endgroup$
    – liuyao
    Sep 28, 2021 at 3:04
  • 3
    $\begingroup$ @liuyao, re, "early transcendentals" usually, I think, means earlier than that: outside of Spivak, I think the transcendentals usually come even before integration, and certainly before any proof of an existence theorem for solutions to differential equations. Is there a proper definition of the transcendentals that does not require these ingredients? $\endgroup$
    – LSpice
    Sep 28, 2021 at 3:45
  • $\begingroup$ @LSpice What do you mean by "proper definition" and "these ingredients" (integration, differential equations)? $e$ can be uniquely defined without either of those. $\endgroup$
    – Yakk
    Sep 28, 2021 at 20:06
  • 2
    $\begingroup$ @LSpice Sine is just the ratio of sides of a particular right angle triangle whose far corner touches a circle. You don't need integration or differential equations to define it. Many of the properties can be difficult to get a hold of, but geometry can give you a lot well before you have to touch calculus. If you know that orbiting in a circle with constant thrust away from the center gives a circle, you can even get derivatives geometrically... maybe. $\endgroup$
    – Yakk
    Sep 28, 2021 at 21:26
  • 1
    $\begingroup$ Sorry to digress, @LSpice (from the main point of this answer). I didn't mean that it should be introduced as the definition of sin to calculus students, just as an alternative, in the spirit of the original question. That may satisfy the purists more than the "line of x radians intersecting the circle" definition. $\endgroup$
    – liuyao
    Sep 29, 2021 at 6:09
39
$\begingroup$

Many of the standard abstract mathematical structures were first defined and studied "externally" (in terms of some sort of concrete representation) and only later defined "internally" (as abstract spaces obeying some list of axioms), after a foundational theorem had been established demonstrating the equivalence of the two definitions. For instance (oversimplifying the history a little bit to emphasise the point):

  1. Groups were first defined as permutation groups, or as groups that could be represented by permutations. It was only with Cayley's theorem that one could equivalently define such groups in terms of the modern group axioms.
  2. In the early study of manifolds (e.g., by Poincaré), these spaces were often understood to be subspaces of some ambient Euclidean space, as opposed to the modern internal definition using atlases of coordinate charts, etc. With the advent of embedding theorems such as the Whitney embedding theorem (for smooth manifolds) or the Nash embedding theorem (for Riemannian manifolds), one could relate the two types of definitions. In a similar spirit, Gauss's theorema egregium equates the external and internal definitions of what we now call Gauss curvature.
  3. Lie algebras were initially studied as algebras of linear transformations, often finite dimensional in nature. The modern abstract definition came later, with theorems such as Ado's theorem and its relatives (Poincaré-Birkhoff-Witt theorem, Engel's theorem, Lie's theorem, etc.) providing fundamental equivalences between the two viewpoints. I believe the history of von Neumann algebras follows a similar trajectory, though I am less familiar with this story.
  4. Boolean algebras are an interesting case in that (from my understanding of the history), Boole introduced the abstract concept of this algebra first, before the realisation that concrete Boolean algebras of sets obeyed Boole's axioms. Nowadays of course, due to Stone's theorem, the two definitions can be viewed as equivalent.
  5. Probability spaces are still commonly defined today using a concrete representation (a sample space $\Omega$, equipped with a sigma-algebra of events and a probability measure). But these spaces (up to almost sure equivalence) can be equated also with commutative tracial von Neumann algebras, thanks to the classification theory of the latter. This equivalent definition is the most convenient starting point for introducing noncommutative probability spaces, which do not enjoy a classical representation in terms of sample spaces (though, thanks to the GNS construction, one can still interpret them in terms of algebras of bounded operators on a Hilbert space).
$\endgroup$
5
  • 4
    $\begingroup$ IMO, these fundamental theorems don't play as big a role as we often credit them with. It's more a "peace of mind" that the abstract definition that supposedly generalizes the concrete examples doesn't give us more. (A counter-example is Lie groups, which do include non-matrix ones.) The benefit of the abstract definition is that it lets us see more clearly what is essential and what is subject to the particular presentation (which is useful for calculation). $\endgroup$
    – liuyao
    Sep 28, 2021 at 3:16
  • 1
    $\begingroup$ That is the primary benefit to be sure, but there are some cases where knowing that a concrete representation exists can greatly simplify the proof of other useful theorems. Some examples are given at mathoverflow.net/questions/101061/proof-by-universal-receiver $\endgroup$
    – Terry Tao
    Sep 28, 2021 at 3:32
  • 3
    $\begingroup$ Perhaps Ostrowski's theorem is another example of this type. I remember taking a course from Goro Shimura in which he said he didn't think that Ostrowski's theorem was that important because there was no real need to have a rigorous proof that no other examples existed. I would be curious to know if Ostrowski's theorem has any applications. $\endgroup$ Sep 28, 2021 at 14:37
  • 2
    $\begingroup$ To add to your 5th example, also at some point there used to be a distinction between abstract $C^*$-algebras, concrete $C^*$-algebras (of operators), and $B^*$-algebras. Nowadays we know these to be equivalent, and call them all $C^*$-algebras. $\endgroup$ Sep 29, 2021 at 15:28
  • $\begingroup$ I would like to add one more (which I essentially picked up from you :-) ): complex numbers are also a part of this list; they were initially meant to be the algebraic completion of reals, but this is not the usual definition of complex numbers we see these days and we only prove this as the Fundamental Theorem of Algebra. $\endgroup$ Nov 18, 2021 at 15:07
31
$\begingroup$

See the question Geometric interpretation of trace. There are several ways to define the trace:

  • The sum of elements on the main diagonal.

  • The sum of eigenvalues.

  • The derivative of the determinant at the identity.

  • The unique Lie algebra homomorphism onto $\mathbb{R}$, up to scale. (See also here.)

  • "What you get when a linear map eats itself", as one answer put it. More precisely:

The idea of the trace operation is easily seen in string diagram notation: essentially one takes the endomorphism $a \stackrel{f}{\to} a$, "bends it around" using the duality and the symmetry and connects its output to its input.

This comment tells a story about it:

This reminds me of a story recounted by a friend of mine in graduate school. He spent a lot of time in the department, and one evening was approached by an undergraduate taking a fancy class that had introduced the trace of a linear transformation in the slick coordinate-free manner. This undergraduate had been tasked with computing the trace of a certain 2×2 matrix and had no idea how to proceed.

Nontrivial divides between coordinate-free/abstract and coordinate-based/concrete approaches—turning theorems into definitions and vice versa—also arise elsewhere, such as when defining tensors and tensor products.
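Incidentally, the first three characterizations above are easy to check against one another numerically; here is a minimal sketch, assuming NumPy (not part of the linked question):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

diag_sum = np.trace(A)                        # sum of the diagonal entries
eig_sum = np.sum(np.linalg.eigvals(A)).real   # sum of the eigenvalues
# derivative of the determinant at the identity in the direction A,
# i.e. d/dt det(I + tA) at t = 0, approximated by a central difference
h = 1e-6
det_deriv = (np.linalg.det(np.eye(4) + h * A)
             - np.linalg.det(np.eye(4) - h * A)) / (2 * h)

print(diag_sum, eig_sum, det_deriv)  # all three agree up to rounding
```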

$\endgroup$
2
  • 1
    $\begingroup$ I would like this answer more with clearer (more rigorous) statements of the definitions for the last three bullet points. The current form makes it hard to see whether someone might plausibly take them as definitions. $\endgroup$
    – user44143
    Sep 27, 2021 at 8:39
  • 4
    $\begingroup$ I would restate the fourth bullet point as “the unique linear map $f:M_n(\mathbb{R}) \to \mathbb{R}$ with $f(AB)=f(BA)$ and $f(I_n)=n$”. It seems right to me to make the definition elementary; I don’t think you can simultaneously define trace for a broad class of abstract Lie algebras and also illustrate Rota’s point. $\endgroup$
    – user44143
    Sep 27, 2021 at 8:47
16
$\begingroup$

I think the standard example is topological dimension:

  • Dimension can be defined as either covering dimension, inductive dimension, or simplicial dimension. Then it is a theorem that dimension thus defined gives the same number as the other definitions.

The standard reference for this would probably have been Hurewicz and Wallman's Dimension Theory, published fifty years before Rota's article.

I'd guess that Rota was also thinking of homology theory:

  • One can define homology as singular or simplicial homology. Then it is a theorem that homology thus defined has the same properties as with the other definitions, e.g. the Eilenberg-Steenrod axioms.
$\endgroup$
1
  • $\begingroup$ The notion of paracompactness is a modification of the definition of the Lebesgue covering dimension, but there are many different equivalent ways of defining paracompactness. $\endgroup$ Nov 6, 2021 at 20:09
16
$\begingroup$

A Grothendieck topos can be defined either as the category of sheaves on a site, or as a category satisfying Giraud's axioms, or as an elementary topos that is bounded over $\rm Set$. I believe any of these definitions can be taken as basic and a good deal of theory developed before proving the equivalence to the others.

For that matter, a sheaf on a topological space (or locale) $X$ can be defined either as a presheaf satisfying a gluing condition or as a local homeomorphism with codomain $X$.

$\endgroup$
16
$\begingroup$

When I was teaching elementary geometry in high school in Holland 10 years ago, the concept of equivalent definitions was actually part of the curriculum. There the examples were more down to earth: you can either define a parallelogram as a quadrangle in which opposite sides have the same length and prove as a theorem that they are parallel, or as a quadrangle in which opposite sides are parallel and prove as a theorem that they are also of the same length.

Then you can also take the third standpoint of 'pick whichever definition you like best but justify this by proving as a theorem that both definitions are equivalent'.

Actually the third standpoint was kind of like the official school standpoint and we spent a lot of time trying to convey to the students that proving this latter theorem (of both definitions being equivalent) amounts to proving both the theorems mentioned above it.

I remember students found this 'equivalent definitions' stuff exceedingly confusing. This might however partly be due to the fact that this short block of elementary geometry was the only time they had to work with definitions and proofs at all.

$\endgroup$
15
$\begingroup$

Elementary geometry has switched between different approaches many times. From synthetic to analytic, from axiomatic to structural etc.

Euclid took additivity of magnitude, e.g. area, as his starting point. Hilbert builds the theory of area from first principles. Euclid builds the number system from geometric data, whereas nowadays it is customary in Cartesian geometry to start from a number system and build the geometry. Classically, Euclidean transformations are the ones that preserve lengths. Felix Klein showed how to build the Euclidean invariants, such as length, from the group itself. Area, length and angles are the primary concepts in the Elements. Nowadays the scalar product and orientation are taken as the basic definitions of a Euclidean space in university courses.

It is quite interesting that what is an important theorem in one theory is close to being a definition in another. I mentioned additivity of area. In the same fashion, the Pythagorean theorem is the last proposition of the first book of Euclid and its highest point; if one starts from a quadratic form, it is more or less a definition. Less well known, Euclid actually proves that the plane has two dimensions (prop. 7) from the additivity of angles. In modern approaches, a plane has two dimensions by its very definition, and it is then deduced, sometimes painfully in Cartesian coordinates, that the measure of angles is additive.

It is still a heated debate which approach is best with regard to teaching geometry to high school students.

$\endgroup$
1
  • $\begingroup$ For a reference to Klein's approach, see my second response below. $\endgroup$ Oct 14, 2021 at 19:34
14
$\begingroup$

A quite obvious, yet historically important example is represented by classical mechanics, in its Newtonian, Lagrangian and Hamiltonian presentations. In particular, the field was widely reorganized (and a lot was discovered) when it was realized that stationarity of the action can be assumed as a principle.

Of course the example is not perfect, because the level of generality of these theories is not the same, but if one considers only sufficiently regular motions it comes close to what you are looking for.

$\endgroup$
12
$\begingroup$

Rudin's Real and Complex Analysis begins by discussing the exponential function:

This is the most important function in mathematics. It is defined, for every complex number z, by the formula $$\exp(z)=\sum_{n=0}^{\infty}\frac{z^n}{n!}$$

$\endgroup$
11
  • 9
    $\begingroup$ This feels like the start of a good answer... Can we get some other possible definitions of $\exp$ which are equivalent? $\endgroup$
    – Alex Jones
    Sep 27, 2021 at 14:49
  • 3
    $\begingroup$ I guess the other standard definition would be $e^x=\lim (1+x/n)^n$. $\endgroup$ Sep 27, 2021 at 16:51
  • 9
    $\begingroup$ Solution of the initial value problem $y’=y$, $y(0)=1$. $\endgroup$ Sep 27, 2021 at 17:58
  • 3
    $\begingroup$ Over $\mathbb R$ it can be defined as the inverse of $\ln$ where $\ln(x) := \int_1^x \frac{dt}t$. $\endgroup$
    – wlad
    Sep 28, 2021 at 0:17
  • 6
    $\begingroup$ @TomCopeland it is amusing that you propose describing $e^z$ as an exponential generating function of something when the entire reason for calling something an "exponential generating function" is a formal resemblance of the construction to the identity $e^z = \sum z^n/n!$. $\endgroup$
    – KConrad
    Sep 28, 2021 at 1:26
12
$\begingroup$

A very basic example is the field $\mathbb{C}$ of complex numbers. -- It can be defined as the field one obtains when adjoining the square root of -1 to $\mathbb{R}$, in which case it needs to be proved as a theorem that every non-constant polynomial has a root / can be factored into linear factors. -- Or it can be defined as the algebraic closure of $\mathbb{R}$, in which case by definition every non-constant polynomial has a root / can be factored into linear factors, but it needs to be proved as a theorem that $\mathbb{C}$ can also be obtained from $\mathbb{R}$ by adjoining the square root of -1.

$\endgroup$
10
$\begingroup$

Example 1: In some textbooks, the trigonometric functions are defined via geometry. The advantage is that it is simpler to understand, but the disadvantage is that it is difficult to make completely rigorous without using integrals or at least a notion of area equivalent to the Jordan measure. In contrast, it is relatively easy to define them via their power series in a completely rigorous manner, and then prove their desired properties. Moreover, the first approach is restricted to $ℝ$, and we need to do something later to extend to $ℂ$, whereas in the second approach we can get the complex trigonometric functions directly. Nevertheless, both definitions (if done right) are equivalent.
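For reference, the power-series definitions in question are
$$\sin z=\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n+1}}{(2n+1)!},\qquad \cos z=\sum_{n=0}^{\infty}\frac{(-1)^n z^{2n}}{(2n)!},$$
which converge for every complex $z$, so the complex versions come for free.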

Example 2: The Lebesgue measure has a number of equivalent definitions, as mentioned in Terence Tao's "An Introduction to Measure Theory". One definition is that a set $S⊆ℝ^n$ is measurable iff for every $ε>0$ there is some open set $G⊇S$ such that there is a countable sequence of boxes whose union contains $G∖S$ and whose total volume is at most $ε$. Another definition is the Caratheodory criterion; $S$ is measurable iff for every $A⊆ℝ^n$ we have $m^*(A) = m^*(A∩S)+m^*(A∖S)$, where $m^*$ is the outer measure. Although the Caratheodory criterion is unintuitive (as Tao also says), there are some textbooks that define the Lebesgue measure using that and then prove all the other equivalents.

$\endgroup$
3
  • 5
    $\begingroup$ "In some textbooks"?! Trigonometric functions are often introduced in middle school, at least 5 years before power series. Defining trig functions as their power series would require either teaching power series much earlier, or trigonometric functions much later. Several identities involving trig functions appear to have been known in Ancient Greece already; on the other hand, power series were not introduced before the 17th century, as far as I can tell. So, it's probably a little more than "some textbooks" which define trig functions without power series. $\endgroup$
    – Stef
    Sep 27, 2021 at 9:25
  • $\begingroup$ For me, the most natural way to conceptualize trigonometric functions is in terms of the linear relationships between them, rotation maps, and the differential equations they satisfy. This approach can be introduced intuitively before making it more rigorous, and is particularly natural if exponentials are taken as part of the background. The relationship with circles, complex versions, power series etc. then follow directly as soon as the relevant concepts are present. What's more, exponentials and rotations lead naturally to Euler's formula and Lie theory. $\endgroup$ Sep 27, 2021 at 10:33
  • 3
    $\begingroup$ @Stef textbooks on real analysis often define trigonometric functions by their power series, so indeed it is reasonable from the perspective of mathematicians to have the textbook approaches by elementary geometry and by analysis treated on an equal footing. There are many textbooks on analysis, after all. $\endgroup$
    – KConrad
    Sep 28, 2021 at 1:29
10
$\begingroup$

[A] An example.
Construct the real numbers, for example some complicated thing using cuts.
Define addition, multiplication, ordering for the real numbers.
Prove the real numbers make up a complete ordered field.
Prove: any two complete ordered fields are isomorphic.
From then on, use only "complete ordered field", and never mention cuts again.

[B] Simpler
(1) An "ordered pair" is defined $\langle a,b \rangle = \{\{a\},\{a,b\}\}$.
(2) Prove $\langle a,b \rangle = \langle c,d\rangle$ if and only if $a=c$ and $b=d$.
(3) From now on, use only the "defining property" (2). After all, that is the motivation for (1) in the first place.

[C] Another
Prove the following are equivalent (in ZF): (1) the axiom of choice (2) the well-ordering principle (3) Zorn's lemma (4) Tukey's lemma (5) the Hausdorff maximality principle.
After that, when you say "ZFC" use any one of these.
(From Brendan McKay's comment.)

$\endgroup$
0
7
$\begingroup$

An example which is not quite what is being asked, and which is certainly much less sophisticated than the other answers, but which comes up in my calculus teaching around this week each year: continuity and the intermediate value theorem. High school students (at least in my country) are generally not given a limit definition of continuity; “the graph doesn't jump”, i.e. the conclusion of the IVT, is their definition of continuity. The history of this conception of continuity, in relation to the epsilon–delta/preimage-of-open-set definitions, is explored by Barany in a Notices article Stuck in the middle. Every year, one of my most interesting pedagogical challenges is to help students internalize the limit definition of continuity by seeing how it makes the IVT less obvious.

$\endgroup$
4
  • 1
    $\begingroup$ I don't think "the graph doesn't jump" is equivalent to the conclusion of IVT. Would a high school student really think of the base 13 function as continuous because the graph doesn't jump? $\endgroup$
    – Will Sawin
    Sep 28, 2021 at 14:19
  • 3
    $\begingroup$ You're right, it's not equivalent! However ... a high school student who was prepared to grapple with the base 13 function would already be way beyond the level of an intro lesson on the IVT! This is what I meant by "not quite what is being asked" -- mathematically, it's not equivalent, but pedagogically and historically, the shift in perspective feels similar to me. (The historical narrative is complicated by the work Bolzano, and the pedagogical one by [insert your favourite pedagogical complication].) $\endgroup$ Sep 28, 2021 at 15:28
  • 2
    $\begingroup$ I don't disagree with what you say, but one thing I would say is that the intuitive "the graph doesn't jump" definition doesn't directly correspond to any formal definition - certainly I don't think typical high school students would be able to give a definition that we would recognize as formal if asked. So I wouldn't see this as a switch from one definition to another but rather a shift from an informal description to a formal definition. $\endgroup$
    – Will Sawin
    Sep 28, 2021 at 16:13
  • 2
    $\begingroup$ Certainly! But I think that it's not just the change in level of formality that matters -- the content of the informal description and of the formal definition matter as well. If the informal description was "if you use enough decimal places for $x$, your calculator will give pretty much the correct value of $f(x)$", then I would see the change to the $\varepsilon$-$\delta$ formulation as entirely an issue of precision and rigour. (I don't think you were saying that the only meaningful change was in the level of rigour, I'm just enjoying the conversation and thought I'd add this thought.) $\endgroup$ Sep 29, 2021 at 5:34
5
$\begingroup$

I recall multiple different definitions of holomorphic functions in several Complex Analysis textbooks, one by Ahlfors, one by Conway, and a couple more in Russian language textbooks.

One definition of a holomorphic function was based on the Cauchy–Riemann equations, one starts from being locally analytic, one from angle-preserving maps, and one from vanishing integrals over closed curves.

And, of course, each textbook derived all other definitions as theorems.

$\endgroup$
2
  • 2
    $\begingroup$ This example was also suggested by Sam Hopkins. $\endgroup$ Sep 27, 2021 at 16:03
  • 1
    $\begingroup$ Also harmonic functions, i.e., conjugate solutions to Laplace's equation, and their associated mean value and max/min principles can be used to give equivalent definitions of analytic complex functions. This is closely related to electrostatics and to Brownian motion, so physics and probabilistic approaches are possible. Other approaches not mentioned already? $\endgroup$ Sep 30, 2021 at 23:17
4
$\begingroup$

Brendan McKay commented, "Any time we see a theorem that says 'the following are equivalent' followed by a list of statements, any one of the statements can be taken as a definition and the rest as theorems."

As a simple example, consider several equivalent definitions of Appell polynomial sequences (e.g., the Hermite and Bernoulli polynomials) $A_n(x)$ with the moment e.g.f. $A(t)=e^{a.t}$ with $a_0=1$, via the following characterizations (a quick symbolic check of two of them is sketched after the list):

1) e.g.f. (using the umbral maneuver $(A.(x))^n = A_n(x)$)

$$e^{A.(x) t} = A(t) \; e^{xt}$$

2) binomial convolution

$$A_n(x) = (a.+x)^n = \sum_{k=0}^n \; \binom{n}{k} \; a_k \; x^{n-k} $$

3) shift op

$$A_n(x) = A(\frac{d}{dx}) \; x^n = e^{a. \frac{d}{dx}} \; x^n$$

4) translation property

$$A_{n}(x+y) = (A.(x)+y)^n$$

5) recursion relation (via the formal cumulants $(c.)^k =c_k$ )

\begin{multline*} e^{c.t} = \ln[A(t)]\quad\text{and} \\ A_{n+1}(x) = (x+c_1) \; A_n(x) + \sum_{k=1}^n \; \binom{n}{k} \; c_{k+1} \; A_{n-k}(x) \end{multline*}

6) lowering op, given any Appell polynomial of order $n$, the lower order polynomials are defined by

$$\frac{d}{dx} \; A_n(x) = n \; A_{n-1}(x)$$

7) raising op

$$ R \; A_n(x) = A_{n+1}(x),$$

$$ R = x + \frac{d}{dt} \ln[A(t)] \; |_{t= \frac{d}{dx}}$$

8) a diagonally multiplied Pascal matrix, the first few rows being

$$ [A] = \begin{pmatrix} 1 & 0 & 0 & 0 \\ a_1 & 1 & 0 & 0\\ a_2 & 2a_1 & 1 & 0\\ a_3 & 3a_2 & 3a_1 & 1 \end{pmatrix}$$

9) a 'probability density function' $pdf(u)$, if it exists, that generates the moments $(a.)^n = a_n$,

$$A_n(x) = \int (u +x)^n \; pdf(u) \; du $$

$$= \sum_{k=0}^n \binom{n}{k} \; x^{n-k} \; \int \; u^k \; pdf(u) \; du = (x +a.)^n$$
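As a quick symbolic sanity check (my own sketch, assuming SymPy, and taking the Bernoulli polynomials as the Appell sequence), characterizations 4 and 6 can be verified for small $n$:

```python
from sympy import symbols, diff, expand, binomial, bernoulli, simplify

x, y = symbols('x y')

for n in range(1, 6):
    # (6) lowering operator: d/dx B_n(x) = n B_{n-1}(x)
    assert simplify(diff(bernoulli(n, x), x) - n * bernoulli(n - 1, x)) == 0
    # (4) translation property: B_n(x+y) = sum_k C(n,k) B_k(x) y^(n-k)
    rhs = sum(binomial(n, k) * bernoulli(k, x) * y**(n - k) for k in range(n + 1))
    assert expand(bernoulli(n, x + y) - rhs) == 0

print("lowering-operator and translation properties hold for n = 1..5")
```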

Having many equivalent definitions/reps could be taken as a definition of a 'rich' mathematical construct. The historical development of determinants/matrices from Cayley and Sylvester in invariant theory to their use in symmetric function theory, in definition of special functions, in the theory of characteristic classes, and in quantum mechanics bears this out. Recognition of their utility in diverse areas of mathematics and physics evolved gradually hand-in-hand with equivalent definitions/perspectives.

$\endgroup$
2
  • 1
    $\begingroup$ To 'understand' an elephant to any real depth as opposed to simply identifying it, you need to understand its anatomy, physiology, psychology/behavior, and even evolution within a natural context. Mathematical creatures require similar large investments of time and effort to understand to some substantial depth. To present the divergence of a vector field in terms of an expression of partial derivatives is not an explanation, just a starting point in a long dialogue. $\endgroup$ Sep 27, 2021 at 23:58
  • 1
    $\begingroup$ Same with the natural logarithm. A historical perspective on the development of the logarithm first and the 'anti-logarithm" second--the exponential--and their uses in a multitude of contexts provides understanding that a simple, concise definition cannot. $\endgroup$ Sep 27, 2021 at 23:58
3
$\begingroup$

If a module $M$ over a ring $R$ has a finite presentation (i.e. is the quotient of a free module of finite type by a submodule which is also of finite type), then the functor $Hom_R(M,-)$ commutes with filtered colimits. The converse is true: if the functor $Hom_R(M,-)$ preserves filtered colimits, then $M$ has a finite presentation.

Remark: The possibility of determining an object through finitely many generators and relations can very often be characterized by the property that taking maps out of it is compatible with filtered colimits/unions (e.g. for groups, rings, and so on). Taking the property that $Hom(M,-)$ preserves filtered colimits as a definition gives the notion of an object of finite presentation in any category, which makes sense even if there are no generators or relations to discuss. Studying categories which are generated by objects of finite presentation is thus a natural thing to do. Replacing filtered diagrams by $\kappa$-filtered ones for various cardinals $\kappa$, this naturally leads to the theory of presentable categories, which is a major branch of category theory (being in the background of the theory of Grothendieck topoi, for instance), so robust that it has a counterpart in $\infty$-category theory. Developing this kind of idea in a derived/homotopical context has also proved to be very useful (e.g. the notion of perfect complex of quasi-coherent sheaves).

$\endgroup$
3
$\begingroup$

This isn't an answer, so much as a string of hopefully-relevant observations:

Your question feels related to the distinction between analytic and synthetic mathematics. The former defines all objects and properties in terms of an existing theory, while the latter's core objects and properties are introduced as axioms. For example, analytic geometry is associated with Descartes - describing points, lines etc. in terms of co-ordinates and equations - and synthetic geometry with Euclid and Hilbert.

Personally, I associate the analytic approach with exploration: we find some examples of the things we're interested in, work out how to specify them (definitions in terms of an existing theory), study them as the concrete objects they are, and "pseudo-empirically" discover results which we then establish more reliably with proofs. Meanwhile, the synthetic approach takes some things that we already have a good feel for - in Euclid's case from everyday life, but these days more often from playing around analytically - and tries to extract their "essence" in the form of axioms so that we can prove results about them without worrying about concrete details of "implementation".

These associations 1 are conceptual only, and not hard-and-fast rules. Still, they suggest a way to think about your question:

In the "explorative" analytic approach, we might happen upon one specification first, discover that it implies another result, and then when investigating that result in its own right find that it leads back to the original specification. Meanwhile, in the synthetic approach, if two formalizations using the same language are truly equivalent then they will have exactly the same consequences, and in some sense it's the consequences that "matter": we could if we wished take both as primitive, along with the machinery needed to translate between them, and then observe that our choice of primitives contains redundancies (but the choice of which to discard, if any, is arbitrary). I think this is what Rota was getting at.

1 There are other lurking associations too, e.g. between analytic mathematics and (material) set theory or Platonism, and between synthetic mathematics and category theory or formalism. But of course any philosophical stance or background theory is compatible with both approaches.

$\endgroup$
3
$\begingroup$

There are two completely different "representation theorems" (one famous, one not so famous) for functions of bounded variation. They yield the same theory and are ultimately connected (via the below centred equation), though this connection is not obvious at first glance. Here is the short version:

On one hand, the most basic result for functions $f:[0,1]\rightarrow \mathbb{R}$ of bounded variation is the Jordan decomposition theorem (see here), namely that $f=g-h$ where $g, h$ are monotone functions, one of which can be taken to be $\lambda x.V_0^x(f)$, where the latter is the variation of $f$ on $[0,x]$ for $x\in (0,1]$.

On the other hand, Banach proves the following surprising result (for continuous functions, but it can be generalised to any function of bounded variation): $$ V_{a}^{b}(f)=\int_{\mathbb{R}} N(f)(y) dy, \text{ where $N(f)(y)=\# \{x\in [a,b ]: f(x)=y\}$}, $$ where $N(f)$ is the Banach indicatrix. To define $N(f)$ for discontinuous $f$ of bounded variation, one uses Sierpinski's decomposition theorem, implying that for $f$ of bounded variation, there is continuous $g$ and strictly increasing $h$ with $f=g\circ h$. One can show that $N(g)=N(f)$ for such functions.
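As a concrete illustration (my own, entirely standard, example): take $f(x)=\sin(2\pi x)$ on $[0,1]$. The function rises by $1$, falls by $2$, and rises by $1$ again, so $V_0^1(f)=4$; on the other hand $N(f)(y)=2$ for almost every $y\in(-1,1)$ and $N(f)(y)=0$ for $|y|>1$, so
$$\int_{\mathbb{R}} N(f)(y)\,dy = 2\cdot 2 = 4 = V_0^1(f).$$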

$\endgroup$
3
$\begingroup$

One of the other answers briefly mentions tensors, but I think it’s a perfect answer to the question and deserves expansion. Tensors can be defined in several ways that are equivalent, but one’s choice of definition strongly influences how one thinks of tensors.

Tensor products of vector spaces can be defined by:

  1. Universal property,

  2. Span of $(v_1 \otimes \dotsb \otimes v_n)$s, with some multilinearity relations,

  3. Space of multidimensional arrays (i.e., with choices of ordered bases for the factor spaces), or

  4. Gadgets that eat vectors and/or covectors, and spit out some other vectors (or covectors or numbers).

Similar definitions are available for tensor fields (or tensor products of bundles), and in that case there is additionally the "definition" of a tensor field as

  5. An object that transforms in certain ways under changes of coordinates.
$\endgroup$
19
  • 1
    $\begingroup$ 5 is different to the others, since it is talking about sections of tensor powers of a vector bundle and their coordinates with respect to local frames. The others are purely algebraic. So something like 'tensor-valued functions'. No one confuses a real number (Cauchy seq or Dedekind cut or...) with a function. $\endgroup$
    – David Roberts
    Sep 28, 2021 at 0:18
  • 2
    $\begingroup$ @DavidRoberts If you interpret "change of coordinates" as "automorphisms of $V_1,\dots, V_n$", (5) seems to me like a perfectly reasonable definition of a tensor. It's equivalent, I would say, to defining the tensor product as a functor from the $n$-fold product of the category of finite-dimensional vector spaces and isomorphisms, by observing that each object in this category is isomorphic to $F^m$ for some $m$ and defining an explicit representation of $GL_{m_1} (F) \times \dots \times GL_{m_n}(F)$. $\endgroup$
    – Will Sawin
    Sep 28, 2021 at 14:16
  • 1
    $\begingroup$ @LSpice A perhaps slightly different approach would be to define a tensor as a function from the set of bases of the vector space to the set of multidimensional arrays of numbers that satisfies some transformation rules. $\endgroup$
    – Will Sawin
    Sep 28, 2021 at 22:13
  • 1
    $\begingroup$ @LSpice In my original comment I intended to use a theorem like "To produce a functor from a groupoid to another category, it suffices to fix the value of that functor on one object in each equivalence class together with its automorphisms." Since you mentioned based vector spaces i thought it might be helpful to mention the category of based vector spaces with vector space isomorphisms and explain why it is relevant here. (Though I guess, among other things, I should probably have said why the category of based vector spaces with basis-preserving maps is not relevant.) $\endgroup$
    – Will Sawin
    Sep 28, 2021 at 23:53
  • 2
    $\begingroup$ I’m a little taken aback. There are several ways that people think about tensors, I listed a few. $\endgroup$ Sep 29, 2021 at 0:50
3
$\begingroup$

In set theory, measurable cardinals can be defined in two quite different ways. In the original 1930 definition, an uncountable cardinal $\kappa$ is measurable if its powerset $\mathcal{P}(\kappa)$ has a nonprincipal ultrafilter closed under meets of $<\kappa$-many elements.

In the 1960s, enough tools, techniques and concepts were developed to prove a theorem characterizing measurables in what seems to be an intrinsically 2nd-order way that quantifies over proper classes: $\kappa$ is measurable iff there is a (nontrivial) elementary embedding $j\colon V\rightarrow M$ of the universe $V$ into a model $M$ such that $\kappa$ is the critical point of $j$ — the least ordinal $\alpha$ such that $j(\alpha) \neq \alpha$.

For expository purposes the original definition is usually preferred. But it's the latter characterization that has been generalized to define yet-larger cardinals, and it's not uncommon to see authors adopting it as their definition of 'measurable' in contexts where stronger notions are also considered.

$\endgroup$
2
$\begingroup$

Several model-theoretic properties have multiple equivalent definitions. For a basic example, a theory $T$ is stable if

  • there exists $\kappa\ge\|T\|$ such that for any model $M\models T$ and $A\subseteq M$ of size $|A|\le\kappa$, there are at most $\kappa$ complete types over $A$, or equivalently,

  • there do not exist a formula $\phi(\bar x,\bar y)$, a model $M\models T$, and tuples $\{\bar a_n:n\in\omega\}$ in $M$ such that $$M\models\phi(\bar a_i,\bar a_j)\iff i<j$$ for all $i,j\in\omega$.

You can start developing stability theory taking either property as a definition, and eventually deriving the other as a theorem.

$\endgroup$
2
$\begingroup$

The concept of a Sturmian word admits several equivalent definitions which one can work with, and which are not obviously equivalent: for instance, as an infinite binary word with exactly $n+1$ distinct factors of each length $n$, as an aperiodic balanced word, or as a coding of an irrational rotation of the circle (a mechanical word).

$\endgroup$
2
$\begingroup$

Not a specific example, but the fact that this is common in mathematics is evident from the fact that mathematicians use the acronym TFAE, or "the following are equivalent", when stating that multiple propositions can each be derived from each other. I recall there were many theorems of the form below in my lectures as an undergraduate:

TFAE:

  1. (the definition)
  2. (some property)
  3. (some other property) ...

The first item in the list is usually a statement of the definition of a concept, and the remaining items in the list would be theorems which can be proved from the definition; but "TFAE" means the proofs can be done in any direction, so any individual item in the list would be usable as a definition.

In cases where there isn't a longer list of equivalent formulations, a theorem might be labelled as an "alternative definition".

$\endgroup$
1
2
$\begingroup$

In ergodic theory, the conclusion of the Shannon–McMillan–Breiman theorem about the pointwise growth rate of measures of cylinder sets, $h(\mu)=-\lim_{n\to\infty}\frac1n\log \mu([x_0\ldots x_{n-1}])$ almost everywhere, which is valid for ergodic shift-invariant measures, is often taken as a definition of measure-theoretic entropy.
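For instance, for the Bernoulli($p$) shift one has $\mu([x_0\ldots x_{n-1}])=p^{k}(1-p)^{n-k}$, where $k$ is the number of ones among $x_0,\ldots,x_{n-1}$, so the strong law of large numbers gives
$$-\lim_{n\to\infty}\frac1n\log \mu([x_0\ldots x_{n-1}])=-p\log p-(1-p)\log(1-p)\qquad\text{almost everywhere},$$
which is the familiar entropy of the Bernoulli measure.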

$\endgroup$
1
$\begingroup$

There seems to be plenty of such examples in the theory of Sobolev spaces and their refinements, such as Besov and Triebel-Lizorkin spaces, which have several definitions equivalent in non-obvious ways. One famous documentation of this is the H=W paper of Meyers and Serrin.

$\endgroup$
1
$\begingroup$

A couple of elementary examples of this:

Greatest Common Divisor.

Consider the following two statements:

S1: The greatest common divisor $d=\gcd(a,b)$ satisfies: (i) $d \mid a$ and $d\mid b$ and (ii) if $c \mid a$ and $c \mid b$ then $c \mid d$.

S2: The greatest common divisor $d=\gcd(a,b)$ is the least positive element of the set $\{ax+by \mathrel\vert x,y \in \mathbb{Z}\}$.

I have seen textbooks define gcd with S1 and then prove S2, and I have seen textbooks define gcd with S2 and then prove S1.
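Here is a small computational illustration (my own sketch, assuming Python; the standard library's `math.gcd` computes the gcd characterized by S1):

```python
from math import gcd  # standard library gcd; satisfies the divisibility description S1

def least_positive_combination(a, b, bound=50):
    """Least positive value of a*x + b*y with |x|, |y| <= bound (the S2 description)."""
    return min(a * x + b * y
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1)
               if a * x + b * y > 0)

for a, b in [(12, 18), (35, 21), (17, 5), (100, 64)]:
    assert gcd(a, b) == least_positive_combination(a, b)
print("S1 and S2 agree on the sample pairs")
```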

Induction vs. Well-Ordering Principles

Likewise, axiomatic constructions of the natural numbers can choose to take the Principle of Induction: $$\bigl((a \in S) \land (\forall n \ge a)(n \in S \Rightarrow n+1 \in S)\bigr)\Rightarrow \{a,a+1,a+2,a+3,\dotsc\} \subseteq S$$ or the Well-Ordering Principle: $$(\forall S \in \mathcal{P}(\mathbb{N}), S \neq \emptyset)(\exists a \in S)(\forall s \in S)(a \le s)$$ as an axiom and then prove the other as a theorem.

$\endgroup$
7
  • $\begingroup$ As far as I remember, well-ordering is not able to prove the full strength of induction, as witnessed by $\omega+\omega$. The crucial problem is that "every natural number has a unique predecessor" requires more than the usual Peano Axioms with Induction replaced by Well-Ordering. $\endgroup$
    – mrtaurho
    Sep 28, 2021 at 21:00
  • 1
    $\begingroup$ @mrtaurho, re, I'm not sure what are the usual Peano axioms, but, in the formulation with which I'm familiar (and that Wikipedia uses), injectivity of the successor map is taken as one of the axioms. $\endgroup$
    – LSpice
    Sep 28, 2021 at 21:14
  • $\begingroup$ @LSpice This version only guarantees the existence of a unique successor, not a unique predecessor (or I'm confusing things right now). The example $\omega+\omega$ stands regardless, as it satisfies well-ordering but not induction. $\endgroup$
    – mrtaurho
    Sep 28, 2021 at 21:45
  • $\begingroup$ @mrtaurho, re, certainly I agree that induction for $\omega + \omega$ is stronger than well ordering for $\omega$ (though not than well ordering for $\omega + \omega$, I think?). But I think injectivity of the successor function (subsingleton preimages) is exactly about unique predecessors. Unique successors is just the statement that the successor function is a function. $\endgroup$
    – LSpice
    Sep 28, 2021 at 23:14
  • 1
    $\begingroup$ @LSpice Ah, I see. Then I most likely misremembered something here. Thanks for clearing that up! $\endgroup$
    – mrtaurho
    Sep 29, 2021 at 6:04
1
$\begingroup$

In probability theory, the notion of independence of random variables is usually introduced as follows: we say that two random variables $X, Y$ are independent if for any two real numbers $(x,y)$ one has: $\mathbf{P} \{ X \le x , Y \le y \} = \mathbf{P} \{ X \le x \} \cdot \mathbf{P} \{ Y \le y \}$ (i.e, the CDF of the pair $(X,Y)$ is just the product of the CDFs of $X$ and $Y$).

However, if we go back to the formal definition of random variables as measurable functions, then independence basically means that the sigma algebras $\sigma (X)$ and $\sigma (Y)$ generated by $X$ and $Y$ (where $\sigma (A)$ is the smallest sigma algebra with respect to which $A$ is measurable) are independent. This definition allows us to speak about independence of a more general class of random variables than just $\mathbf{R}^n$-valued ones.

Since the Borel sets can be formed from sets of the form $(-\infty , x]$ by taking complements, countable unions, and countable intersections, the above property in terms of CDFs is equivalent to the definition of independence via sigma algebras.

$\endgroup$
1
$\begingroup$

Another example would be a Tychonoff (or Tikhonov, if you prefer) space. It can be defined, equivalently, as a completely regular $T_0$-space, as a space homeomorphic to a subspace of some generalized cube (that is, the product space $[0, 1]^S$ for some set $S$), or as a space homeomorphic to a subspace of a compact Hausdorff space.

$\endgroup$
1
$\begingroup$

"There are good reasons why the theorems should all be easy and the definitions hard." --Michael Spivak, Preface to "Calculus on Manifolds", Addison-Wesley: 1965.

This may be a little off theme for the original question as it involves proving a theorem in different frameworks, rather than it being used as a definition in one of them.

I like the example of Stokes' Theorem. In 3 dimensions Stokes' can be laboriously proved in terms of vector calculus - coordinates and partial derivatives. Or one can build up the definitions of differential forms and chains, whence Stokes follows as a special case of the higher dimensional Fundamental Theorem of Calculus.
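For reference, the coordinate-free statement is simply
$$\int_M d\omega=\int_{\partial M}\omega$$
for a compactly supported smooth $(n-1)$-form $\omega$ on an oriented $n$-manifold with boundary $M$; the classical theorem is the case where $M$ is a surface in $\mathbb{R}^3$ and $\omega$ is the $1$-form corresponding to the vector field.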

$\endgroup$
2
  • 1
    $\begingroup$ Stokes' Theorem was essentially mentioned in Buzz's answer. $\endgroup$ Oct 4, 2021 at 16:47
  • 1
    $\begingroup$ Sorry, I just wanted to put the Michael Spivak quote out there. Unfortunately his full paragraph is too long. $\endgroup$ Oct 5, 2021 at 1:11
1
$\begingroup$

From "An Elementary Treatise on Cross-ratio Geometry: With Historical Notes" by J. J. Milne, 1911, (pg. 10), a definition of the cross-ratio of four lines:

If we consider the four concurrent straight lines OA, OB, OC, OD, we define the compound ratio $\frac{\sin(A,C)}{\sin(A,D)}: \frac{\sin(B,C)}{\sin(B,D)}$ ... as the cross-ratio of the pencil O(ABCD).

The CR is preserved by linear fractional/Möbius transformations, supplying a characterization of projective space starting with a definition of the CR in terms of angles.

Conversely, from "On Klein’s So-called Non-Euclidean geometry" by Norbert A’Campo and Athanase Papadopoulos:

The cross ratio of four points is a projective invariant, and in some sense it is a complete projective invariant, since a transformation of projective space which preserves the cross ratio of quadruples of aligned points is a projective transformation. Therefore, it is natural to try to define distances and angles using the cross ratio. This is what Klein did.

The authors assert that this forms the basis of Klein's characterization of hyperbolic, spherical, and Euclidean geometries--the three geometries with constant curvature (Euclidean and non-Euclidean)--as encompassed by projective space:

A much less known fact is that Klein, besides giving a formula for the distance function in hyperbolic geometry, gave formulae for the distance in spherical and in Euclidean geometry using the cross ratio, taking instead of the ellipse (or ellipsoid) other kinds of conics. In the case of Euclidean geometry, the conic is degenerate. In this way, the formulae that define the three geometries of constant curvature are of the same type, and the constructions of the three geometries are hereby done in a unified way in the realm of projective geometry.

$\endgroup$
2
  • $\begingroup$ This example is subsumed by coudy's more general answer above. $\endgroup$ Oct 15, 2021 at 17:18
  • $\begingroup$ For some value of "above". $\endgroup$ Nov 7, 2021 at 1:34
