An Intro to Sets I
Set theory has some very powerful concepts and results that make proving results in other branches of math easier. It also has plenty of results that are fascinating in and of themselves. It can be used as a basis for the rest of math, and is one of the few branches of math that can generate results about all of math. For these reasons, we want to dive into some of the basics of set theory.
We’ve been using sets without defining them. There are different ways to define them, and we’ll use the ZFC Axioms (there are several other good references to them online as well)
Even though the axioms are available on other sites, we’ll list them and go over them here as well. Feel free to skip this part if you’re already familiar with them.
First, though, we’ll have to discuss the language of set theory. One of the axioms makes reference to properties, and as the Wikipedia entry says, that can be “formulated as a well-formed formula in a first-order logic whose atomic formulas were limited to set membership and identity.”
What this means is that we only need the relations, “\(=\)” (identity, or equality) and “\(\in\)” (membership) to generate the “properties”. This means that there are two kinds of atomic formulas, if \(s\) and \(t\) are variables, we have
- \( s = t \)
- \( s \in t \)
as the two possible atomic formulas.
Well-formed formulas in first order logic are built up from smaller formulas, starting with atomic formulas, through the logical connectives of predicate calculus, i.e. “\( \wedge \)” (“and”), and “\( \urcorner \)” (“not”), as well as quantification, i.e. “\( \forall \)” (“for all”) so that if \(\sigma\) and \( \tau \) are well-formed formulas then so are
- \( \sigma \wedge \tau \) ("\( \sigma \) and \( \tau \)")
- \( \urcorner\sigma \) (“not \( \sigma \)”)
- \( \forall s,\)\( \sigma \) (“for all \(s\), \( \sigma \) holds”)
There are other connectives and a quantification that can be built up from these:
- \( \sigma \vee \tau \) means “\( \urcorner (\urcorner\sigma \wedge \urcorner\tau) \)”
- \( \sigma \rightarrow \tau \) means “\( \urcorner \sigma \vee \tau \)”
- \( \sigma \leftrightarrow \tau \) means “\( (\sigma \rightarrow \tau) \)\(\wedge (\tau \rightarrow \sigma) \)”
- \( \exists s, \sigma(s) \) means “\( \urcorner (\forall s, \urcorner \sigma(s)) \)”
We’ll also sometimes use shorthand like “\( s \not\in t \)” for “\( \urcorner (s \in t) \)”.
We have just a couple more definitions to get out of the way before we can start digging into some axioms.
Definition: Bound and Free variables, sentences
Let \(\forall s, \sigma(s,t)\) be a formula, then, since \(s\) occurs inside a quantifier (in this case the “\(\forall\)”), it is considered to be a bound variable. On the other hand, since \(t\) does not occur inside any quantifier, it is considered to be a free variable.
A formula with no free variables is called a sentence.
We needed to explain what well-formed formulas were, because the axiom schema of separation involves them (they create the “properties”). It is a schema because there is actually one axiom for each well-formed formula. This is necessary because in first-order logic we aren’t allowed to quantify over formulas.
Axiom: Schema of Separation
Let \( \sigma(s,t) \) be a well-formed formula, and \(X\) be an arbitrary set, then there is another set \(Y\), such that for all \(t\),
\[t \in Y \leftrightarrow t \in X \wedge \sigma(s,t)\]
\( s \) is allowed to be an arbitrary variable. In fact, \(s\) can stand in for several variables, \( s_1, s_2, …, s_k \).
\( \sigma(s,t) \) is the “property”.
What we’re getting here is that a property and a set will determine another set. The use of variables allows us to explicitly define finite subsets. For example, we could use the formula
\[ \sigma(s_1, s_2, s_3) := (t = s_1) \vee (t = s_2) \vee (t = s_3) \]
to define a subset of a given set that contains only the elements \( s_1 \), \( s_2 \), and \( s_3 \), provided they are in the original set.
The next axiom schema references functions. A function is nothing more nor less than a well-formed formula that says that for each \(x\), there is a unique \(y\), (i.e. \( y = f(x) \)). That formula will look like
$$ \begin{align} \forall x \forall y_1 \forall y_2, &\sigma(s_1, s_2, …, s_k, x, y_1) \\ \wedge &\sigma(s_1, s_2, …, s_k, x, y_2) \\ \rightarrow & y_1 = y_2 \\ \end{align} $$
Axiom: Schema of Replacement
Let \( f(x) = y \) be a function, and \(X\) be a set. Then there is a set, \(Y\), where $$ \begin{align} &\forall y, y \in Y \leftrightarrow \\ &\exists x \in X \wedge y = f(x) \\ \end{align} $$
We’ve seen this before, \(Y\) is just the range (or image) of \(f\), where the domain of \(f\) is \(X\).
Keep in mind that this can be expressed in the language of set theory more formally as
$$ \begin{align} \forall &s_1 \forall s_2 … \forall s_k [\forall x \forall y_1 \forall y_2 \\ [ &\sigma(s_1, s_2, …, s_k, x, y_1) \\ \wedge &\sigma(s_1, s_2, …, s_k, x, y_2) \\ \rightarrow &y_1 = y_2 ] \\ \\ &\rightarrow \\ \\ [& \forall X \exists Y \forall u, y \in Y \\ &\leftrightarrow \exists v, v \in X \wedge \\ &\sigma(s_1, s_2, …, s_k, v, u) ]] \\ \end{align} $$
and even this could be broken down further so that it only uses “\( \urcorner \)”, “\( \forall \)”, “\( \in \)”, “\( = \)”, and “\( \wedge \)” and variable names.
That’s the last of the axiom schemas. The rest won’t make use of formulas.
Axiom: Extensionality
Let \( X \) and \(Y\) be sets then we have $$ \begin{align} \forall s, s \in &X \leftrightarrow s \in Y \\ \rightarrow &X = Y \\ \end{align} $$
The converse, i.e. \[ X = Y \rightarrow \forall s, s \in X \leftrightarrow s \in Y \]
is a consequence of the definition of “\( = \)”.
You’ve probably heard of the power set before, so this should be straight-forward.
Axiom: Power Set
Let \(X\) be a set, then there exists another set, \(Y\), such that
$$ \begin{align} \forall S, S &\in Y \\ \leftrightarrow [\forall s, s &\in S \\ \rightarrow s &\in X] \\ \end{align} $$
\(Y\) is typically written, “\(P(X)\).”
Axiom: Pairing
Let \(s\) and \(t\) be sets, then \( \{ s, t \} \) is also a set.
Pairing is so simple that you might think we don’t need it, because it’s covered by the schema of separation, but remember, separation only works to define a subset of a given set. Pairing, on the other hand, doesn’t require the existence of a set containing \(s\) and \(t\), it guarantees the existence of that set.
The next axiom might not mean what you’re expecting. If you’re familiar with the union of two sets, (e.g. \( A \cup B \)), that’s not what this axiom gives us. It’s quite a bit more powerful than that, and can be used to generate the union of two sets (we’ll show that after we present it).
Axiom: Union
Let \( S \) be a set, then there exists a set, \(U\), such that
$$ \begin{align} \forall u, [u &\in U \leftrightarrow \\ \exists s, s &\in S \wedge u \in s] \\ \end{align} $$
This is also written
\[ U = \bigcup S \]
or sometimes even
\[ U = \mathcal{U}S \]
Written another way, with an indexing set, this means that
\[ U = \bigcup_{s \in S} s \]
We can use pairing to generate the “union of two sets”, i.e.
\[ A \cup B = \bigcup \{ A, B \} \]
For the next axiom, we will need to use the notion of a non-empty set. Notice that we can define the empty set with \[ \forall x, x \not\in \emptyset \]
By the axiom of extensionality, the empty set is unique.
Axiom: Regularity
Let \(S\) be a set, \(S \neq \emptyset\), then there exists \(s \in S\), such that
\[ \forall t \in S, t \not\in s \]
We can now use the axiom of pairing, on the set \(S\), to get \(\{S, S\} = \{S\}\), then applying the axiom of regularity to that set we get that \( S \not\in S \). This holds for all sets.
For the next axiom, we have to define what “infinite” means. It turns out that this is surprisingly simple. A set, \(S\), is infinite if and only if there is a one-to-one function, \(f\), from \(S\) into a proper subset of \(S\).
The axiom schema of replacement involved functions which were definable by formulas. The last two axioms, however, involve more arbitrary functions. What we mean is that some functions don’t have to be thought of as being defined by formulas. They, instead, can be through of as subsets of cartesian products of two sets.
We’ve defined one-to-one functions in previous entries. Notice that they can be described formally by a well-formed-formula, starting with a formula that describes a function, and then adding this
$$ \begin{align} f(x_1) &= y \wedge f(x_2) = y \\ \rightarrow x_1 &= x_2 \end{align} $$
Axiom: Infinity
There exists a set \(S\), such that \(S\) is infinite.
We’ve saved the most “controversial” axiom for last. It’s controversial, because it’s so powerful that it allows mathematicians to prove results that some people doubt.
In order to introduce it, we must first describe what a “choice function” is. A choice function is a function whose domain is a collection of sets, and whose range is elements of each set: one element per set.
The collection doesn’t necessarily have to be a set.
Mini-lemma: Collection of all sets is not a set
The collection of all sets is not a set.
Proof
Suppose, for a contradiction, that it is a set. Call it \( \mathcal{S} \). Then, since \( \mathcal{S} \) is a set, we have \( \mathcal{S} \in \mathcal{S} \), but we showed earlier that that can’t happen for any set (it violates the axiom of reguliarity when combined with the pairing axiom).
QED
A collection of sets which is not a set is called a class. A class can be thought of in mostly the same way as a set, except that a class may not belong to another class. That is, every element of a class is a set.
Axiom: Choice
Let \(\mathcal{S}\) be a collection of sets (it could be either a set, or a class). Then there is a function, \( F \), such that for any set, \(S \in \mathcal{S} \), we have \( F(S) = s \), where \(s \in S\).
We should explain what it means for a function to be defined on a class. What it really means is that “\(F \upharpoonright S \)”, which means “\( F \) with its domain restricted to the set \(S\)” is a function.
We also need to clarify what form that function might take. In the axiom schema of replacement each function was defined by a formula. There are other ways of defining functions. A function,
\[ f : X \rightarrow Y \]
from \(X\) to \(Y\) can also be constructed as a subset of the Cartesian Product, \( X \times Y \), which we define just below.
Definition: Cartesian Product
Let \(X\) and \(Y\) be sets, then an ordered pair, \( (x,y) \), where \( x \in X \) and \(y \in Y\) is, as the name suggests, a collection of exactly two elements, where the order matters. I.e. \( (x,y) \neq (y,x) \), in general (although, it could happen that they are equal, if \( x = y \)).
The ordered pair \( (x,y) \) can be represented as a set by
\[ (x,y) := \{ \{x\}, \{x,y\} \} \]
The Cartesian Product of \(X\) and \(Y\) is the set of all ordered pairs, where the first member is an element of \(X\) and the second an element of \(Y\). I.e.
$$ \begin{align} (x,y) &\in X \times Y \leftrightarrow \\ x & \in X \wedge y \in Y \\ \end{align} $$
As mentioned just before the above definition, a function can be viewed as a special subset of a cartesian product. Let \( S_{F} \subseteq X \times Y \), then if we have
$$ \begin{align} \forall (&x_1,y_1) \forall (x_2, y_2) \in S_{F} \\ &x_1 = x_2 \rightarrow y_1 = y_2 \\ \end{align} $$
then \( S_F \) defines a function. So, we’ve seen that a special type of well-formed formula can define a function, but so can a special type of set. It should also be noted that a function defined on a class will be a special type of subclass, analogous to the special type of subset of a cartesian product. This is the set (or class) whose existence is postulated by the Axiom of Choice.
We’ve used order relations in earlier entries. They can also be viewed as subsets of Cartesian Products. Depending on the relation, they will satisfy different formulas.
Definitions: Linear and Well-order
We defined partial order here.
Let \(S\) be a set and \( <_{w} \) be a partial order on \(S\) such that
- \( <_w \) is a linear order. That is, if we have \( s <_w t \) and \( t <_w u \), then we must also have \( s <_w u \).
- For every subset \(X \subseteq S\), there exists an element, \( \lambda \in X \), such that for all \(x \in X \), either:
- \( \lambda = x \), or
- \( \lambda <_w x \)
but not both. That is, \( s \not<_w s \), for all \(s \in S\). We write “\( s \leq_w t \)” for “\( s <_w t \) or \( s = t \).”
Then \( <_w \) is said to be a well-order on \(S\).
Definitions: Order-preserving function, increasing function
Let \( f : S \rightarrow T \) be a function from a set \(S\) to itself, where \(S\) is partially-ordered by \( <_S \), and \( T \) is partially-ordered by \( <_T \), such that for all \(s, t \in S\), we have
$$ s <_S t \implies f(s) <_T f(t) $$
then \(f\) is said to be an order-preserving function.
If \( <_S \) and \( <_T\) are linear orders, then \(f\) is sometimes said to be an increasing functions.
Increasing functions should be familiar from calculus.
Note that since well-orderings are a special type of linear orderings that an order-preserving function between two well-ordered sets could be called an increasing function.
Definition: Ordinal Number
A set, \( \alpha \), is called an ordinal number if:
- it is well-ordered by “\( \in \)”
- it is transitive, which means that \( \bigcup \alpha \subseteq \alpha \)
There are many results to prove about ordinal numbers (or “ordinals”). We will only cover the very basics in the remainder of this entry.
The next few results are quick and basic, so we won’t be giving each its own section. (This is a practice we’ll probably start doing more often on this site.)
Mini-lemmas
- \( \emptyset \) is an ordinal number.
- If \( \alpha \) is an ordinal, and \( T \subseteq \alpha \), and \( T \) is transitive, then one of the following must hold:
- \( T = \alpha\)
- \( T = \{ a \in \alpha | a \in \tau \} \), for some \( \tau \in \alpha\).
- If \( \alpha \) is an ordinal, and \( \beta \in \alpha \), then \( \beta \) is also an ordinal.
Proof
-
\( \emptyset \) is trivially both transitive and well-ordered by “\( \in \)”.
-
Suppose \( T \neq \alpha \), and let \( \lambda \) be the \( \in \)-least element of \( \alpha \) not in \(T\), then there are no elements, \( t \in T\) such that \( \lambda \in t \), otherwise, since \(T\) is transitive, we would have \( \lambda \in T \), a contradiction. This means that \(T\) is an initial segment of \( \alpha \), i.e. \( T = \{ a \in \alpha | a \in \lambda \} \).
-
Since \(\alpha\) is transitive, we have \( \beta \subseteq \alpha \), and since \( \alpha \) is well-ordered by \( \in \), \( \beta \) must be as well (since it is a subset). Thus we only have to show that \( \beta \) is transitive to show that it is also an ordinal. Let \( \delta \in \gamma \) and \( \gamma \in \beta \). Since \( \beta \subseteq \alpha \) and \( \alpha \) is linearly ordered by \( \in\) (since it is well-ordered by \( \in \)), we have to have \( \delta \in \beta \). Thus \( \bigcup \beta \subseteq \beta \), and \( \beta \) is an ordinal.
QED
That finishes this entry. We’ll continue our discussion of sets before returning to our earlier discussion of Rings, and other algebraic structures. Thanks for reading!!