
Exploration 1: Modules and vector spaces

This entire exploration is actually going to be about matrices. Specifically, we’re going to talk about RR-matrices, which are matrices with entries in a ring RR. It turns out the study of RR-matrices subsumes the study of modules, which we will see by the end of this exploration.

What are modules? It is easiest to start with free modules RnR^n, which are additive abelian groups of RR-vectors, which are row or column vectors whose nn entries are in RR, where addition and scalar multiplication are defined in the usual way: [a1an]+[b1bn]=[a1+b1an+bn] and r[a1an]=[ra1ran] for rR\left[\begin{matrix}a_1\\\vdots\\a_n\end{matrix}\right]+\left[\begin{matrix}b_1\\\vdots\\b_n\end{matrix}\right]=\left[\begin{matrix}a_1+b_1\\\vdots\\a_n+b_n\end{matrix}\right]\quad\text{ and }\quad r\left[\begin{matrix}a_1\\\vdots\\a_n\end{matrix}\right]=\left[\begin{matrix}ra_1\\\vdots\\ra_n\end{matrix}\right]\text{ for }r\in R

These should be familiar from linear algebra, only that the entries of RR-vectors are elements in the ring RR. In this context, elements of RR are called scalars and are not in RnR^n themselves — only RR-vectors are in RnR^n. Since RnR^n is composed of RR-vectors, it is straightforward to describe n×mn\times m RR-matrices as maps RmRnR^m\to R^n, like in linear algebra: \left[\begin{matrix} r_{11}&r_{12}&\ldots&r_{1m}\\ r_{21}&r_{22}&\ldots&r_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ r_{n1}&r_{n2}&\ldots&r_{nm} \end{matrix}\right]\left[\begin{matrix}v_1\\v_2\\\vdots\\v_m\end{matrix}\right]=\left[\begin{matrix}\sum_{i=1}^m v_ir_{1i}\\\sum_{i=1}^m v_ir_{2i}\\\vdots\\\sum_{i=1}^m v_ir_{ni}\end{matrix}\right] where the LHS vector is in RmR^m and the RHS vector is in RnR^n.
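To make the definitions concrete, here is a minimal Python sketch of R-vector arithmetic and of applying an R-matrix to an R-vector, taking R = Z so that plain integers stand in for ring elements (the function names are just for illustration).

```python
# A minimal sketch with R = Z: R-vectors are lists of ints, and an n-by-m
# R-matrix is a list of n rows of length m. Nothing below uses anything
# beyond +, * and 0 of the ring.

def add_vectors(u, v):
    """Componentwise addition of two R-vectors."""
    return [a + b for a, b in zip(u, v)]

def scale_vector(r, v):
    """Scalar multiplication r*v for r in R."""
    return [r * a for a in v]

def apply_matrix(A, v):
    """Apply an n-by-m R-matrix A to an R-vector v in R^m, giving a vector in R^n."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in A]

A = [[3, 2, -4],
     [-1, 4, 0],
     [1, 1, -1]]
print(apply_matrix(A, [1, 0, 2]))  # [-5, -1, -1]
```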

Theorem: Every ring RR is a free module over itself.

Every element in the ring can be treated as a scalar or a (one-dimensional) RR-vector, so scalar multiplication is given by multiplication in RR.

This theorem introduces some ambiguity: writing RR could either mean the ring RR, or the one-dimensional module over RR. For clarity, here we write RR when we mean the ring, and we always write RR-modules using the letters MM or NN or VV.

In this section, we describe RR-modules beyond free modules.

RR-modules are more general than free modules RnR^n, however. They are additive abelian groups of elements vv (not necessarily RR-vectors!), where elements rRr\in R come into play by defining a scalar multiplication rvr\cdot v governed by the following laws:
  • r(v+w)=rv+rw (scalar multiplication distributes over module addition)
  • (r+s)v=rv+sv (scalar multiplication distributes over ring addition)
  • (rs)v=r(sv) (compatibility with ring multiplication)
  • 1v=v (the ring identity acts as the identity)

Theorem: The Z\ZZ-modules are exactly the abelian groups.
  • (\to) Every RR-module is an additive abelian group by definition.
  • (\from) If you have an additive abelian group, you can define scalar multiplication n\cdot a as the sum of nn copies of aa when n≥0, and as −((−n)·a) when n<0. This satisfies the laws above, so we obtain a Z\ZZ-module.

Like with groups, rings, and fields, we can define a couple of concepts for RR-modules:
  • A submodule of an RR-module MM is a subset of MM that contains 00 and is closed under addition and scalar multiplication — it is an RR-module in its own right.
  • An RR-module homomorphism is a map σ:M→N between RR-modules that preserves addition and scalar multiplication: σ(u+v)=σ(u)+σ(v) and σ(rv)=rσ(v). As usual, a bijective homomorphism is an isomorphism.

In this section, we describe how to construct RR-modules.

Just like how groups and rings can be finitely generated, we can construct an RR-module MM out of a generating set of elements {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\}, known as a spanning set for MM. We can write M=span({v1,v2,,vn})M=\text{span}(\{v_1,v_2,\ldots,v_n\}), and say that MM consists of RR-linear combinations of its spanning set {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\}, i.e. MM is the set M=span({v1,v2,,vn})={irivi}M=\text{span}(\{v_1,v_2,\ldots,v_n\})=\left\{\sum_i r_iv_i\right\} where the coefficients rir_i are elements of RR, and each sum is finite (though the spanning set can be infinite).

No matter the spanning set, it is always possible to generate the zero element of the RR-module — just take the RR-linear combination where all the coefficients rir_i are zero. Of interest is whether this is the only way to generate the zero element. If there is another way to generate the zero element, i.e. using a nonzero coefficient somewhere, then we say that the spanning set is linearly dependent. Otherwise, there is only one way to generate the zero vector, and we have a linearly independent spanning set, also known as a basis.

If we consider a spanning set as a map RnMR^n\to M from nn-tuples of coefficients in RR to elements of the RR-module, then a spanning set that is a basis is special. This map is always surjective (the spanning set generates the module), and if the spanning set is a basis, then there is only one way to generate the zero element — thus the kernel is trivial, thus the map is injective and therefore bijective. This bijection between nn-tuples of coefficients and elements of an RR-module means that elements of the RR-module are essentially nn-tuples of coefficients, also known as RR-vectors. Sound familiar?

Theorem: A free module RnR^n is exactly an RR-module generated by a basis.
  • Every spanning set gives rise to a surjective map RnMR^n\to M from nn-tuples of coefficients to elements of the module.
  • If the spanning set is a basis, then only the zero tuple maps to the zero element, i.e. the map has a trivial kernel and is injective. Therefore a basis represents a bijection between nn-tuples of coefficients, i.e. RR-vectors, and elements of the module: [a1an]representsa1v1+a2v2++anvn\left[\begin{matrix}a_1\\\vdots\\a_n\end{matrix}\right]\quad\text{represents}\quad a_1v_1+a_2v_2+\ldots+a_nv_n
  • Thus the elements of the RR-module can be expressed as RR-vectors, which is the definition of a free module RnR^n.

For a free module RnR^n, the standard basis eie_i is defined as the columns of the n×nn\times n identity matrix InI_n: e1=[10],e2=[01],,en=[01]e_1=\left[\begin{matrix}1\\0\\\vdots\end{matrix}\right], e_2=\left[\begin{matrix}0\\1\\\vdots\end{matrix}\right],\ldots,e_n=\left[\begin{matrix}\vdots\\0\\1\end{matrix}\right]

Linear algebra is exactly the case where RR is a field FF. In fact, FF-modules are exactly FF-vector spaces. We will draw parallels to linear algebra extensively throughout this exploration.

Theorem: Every spanning set of an FF-module VV contains a basis, if FF is a field.
  • Let {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} be an arbitrary spanning set for VV. If this spanning set is linearly independent, it is a basis and we are done. Otherwise, assume that {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} is linearly dependent.
  • If {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} is linearly dependent, then zero can be represented by some FF-linear combination irivi=0\sum_i r_iv_i=0 with a nonzero coefficient. WLOG let r1r_1 be one of the nonzero coefficients.
  • Since we’re working in a field FF, r1r_1 has a multiplicative inverse 1r1\frac{1}{r_1}. Then the corresponding v1v_1 can be written as an FF-linear combination of the other elements: \begin{aligned} 0&=r_1v_1+r_2v_2+\ldots+r_nv_n\\ v_1&=-\frac{r_2}{r_1}v_2-\ldots-\frac{r_n}{r_1}v_n \end{aligned}
  • Since v1v_1 can be represented as an FF-linear combination of the others, its contribution to the spanning set is redundant. We can remove v1v_1 from the spanning set because {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} and {v2,,vn}\{v_2,\ldots,v_n\} generate the same module.
  • If the resulting spanning set {v2,,vn}\{v_2,\ldots,v_n\} is not a basis, we can repeat this process to further decrease the size of the spanning set. This process terminates either at a basis or at the empty set, which is trivially a basis (for the zero module). Thus the original spanning set {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} contains a basis for VV.
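The proof suggests a procedure: keep discarding redundant vectors until the spanning set is linearly independent. Here is a small sketch of that idea over the field Q using SymPy (assumed available); instead of removing one dependent vector at a time it greedily keeps only the vectors that enlarge the span, which likewise yields a basis contained in the original spanning set.

```python
# A sketch over F = Q with SymPy: extract a basis from a spanning set by
# keeping only the vectors that increase the rank of what we've kept so far.
from sympy import Matrix

def extract_basis(vectors):
    basis = []
    for v in vectors:
        candidate = basis + [v]
        # v is kept iff it is not an F-linear combination of the kept vectors
        if Matrix.hstack(*candidate).rank() == len(candidate):
            basis.append(v)
    return basis

spanning_set = [Matrix([1, 0, 1]), Matrix([2, 0, 2]),
                Matrix([0, 1, 1]), Matrix([1, 1, 2])]
print(len(extract_basis(spanning_set)))  # 2: the span is a plane in Q^3
```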

In this section, we start exploring RR-matrices for real.

At the beginning we mentioned that this entire exploration is going to be about studying RR-matrices. One big reason is because RR-matrices are homomorphisms of free RR-modules. For instance: A=[324140111]A=\left[\begin{matrix}3&2&-4\\-1&4&0\\1&1&-1\end{matrix}\right] is a 3×33\times 3 RR-matrix. Left multiplication by AA corresponds to a homomorphism φ:R3R3\varphi:R^3\to R^3. B=[241385881210342]B=\left[\begin{matrix}2&4&1&3&8\\5&8&8&1&2\\1&0&3&4&2\end{matrix}\right] is a 3×53\times 5 RR-matrix. Left multiplication by BB corresponds to a homomorphism φ:R5R3\varphi:R^5\to R^3. Typically we just refer to the RR-matrix itself as the homomorphism, falling back to φ\varphi when we’re talking about homomorphisms in the abstract.

Our first question is, which RR-matrices correspond to isomorphisms?

This translates to the question: when are RR-matrices invertible? This requires the concept of a determinant, which should be familiar from linear algebra. For n×nn\times n (square) RR-matrices, the determinant det(A)\det(A) is the unique function of the nn rows (a map (R^n)^n\to R) that is:
  • multilinear: linear in each row, when all the other rows are held fixed,
  • alternating: swapping two rows flips the sign, so a matrix with two equal rows has determinant 00, and
  • normalized: the identity matrix has determinant 11.

This unique function is det(A)=σSnsgn(σ)i=1nai,σ(i)\det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n a_{i,\sigma(i)}, which:
  • is multilinear, because each term of the sum contains exactly one entry from each row,
  • is alternating, because swapping two rows amounts to composing every σ\sigma with a transposition, which flips every sgn(σ)\sgn(\sigma), and
  • sends the identity matrix to 11, because the identity permutation is the only one that avoids the off-diagonal zero entries.
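As a sanity check, the permutation-sum formula can be transcribed directly into code; the sketch below compares it against SymPy's determinant (SymPy is assumed available only for the cross-check).

```python
# Directly transcribing det(A) = sum over sigma of sgn(sigma) * prod_i a_{i, sigma(i)}.
from itertools import permutations
from math import prod
from sympy import Matrix

def sgn(sigma):
    """Sign of a permutation in one-line notation: (-1)^(number of inversions)."""
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma))
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    n = len(A)
    return sum(sgn(sigma) * prod(A[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

A = [[3, 2, -4], [-1, 4, 0], [1, 1, -1]]
print(leibniz_det(A), Matrix(A).det())  # 6 6
```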

Although the above shows that the given function is a determinant, the proof that it is the unique function with these properties requires some more advanced tools we’ll introduce later. Let’s first determine when a square RR-matrix is invertible.

Therefore: We can write the determinant recursively as det(A)=j=1naijCij\det(A)=\sum_{j=1}^n a_{ij}C_{ij}, where Cij=(1)i+jdet(Mij)C_{ij}=(-1)^{i+j}\det(M_{ij}), for an arbitrary row index ii. This is known as the cofactor expansion of the determinant along the row ii.

Recall the definition of the determinant from above: \det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n a_{i,\sigma(i)} The determinant can be computed recursively. First, fix a row index ii and factor the single entry a_{i,\sigma(i)} out of each product: \det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)a_{i,\sigma(i)}\prod_{k=1,k\ne i}^n a_{k,\sigma(k)} Now note that since the sum goes over every permutation \sigma\in S_n, the value \sigma(i) ranges over every jj from 11 to nn, and for each jj there are exactly (n1)!(n-1)! permutations with \sigma(i)=j (one for each permutation of the remaining n1n-1 indices). Grouping the sum by the value j=\sigma(i) lets us factor a_{i,\sigma(i)} out as a_{ij}: \det(A)=\sum_{j=1}^n\left(a_{ij}\sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)}\right) Note that the inner sum \sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)} is very close to the formula for the determinant of the matrix that has row ii removed (since a_{i,\sigma(i)} is excluded from the product) and column jj removed (since \sigma sends ii to jj, the column jj only ever appears in the excluded entry a_{i,\sigma(i)}=a_{ij}).

Define the minor MijM_{ij} as the matrix AA with the row ii and column jj removed, and write bklb_{kl} for its entries. Applying our formula to this (n1)×(n1)(n-1)\times(n-1) matrix, \det(M_{ij})=\sum_{\tau\in S_{n-1}}\sgn(\tau)\prod_{k=1}^{n-1} b_{k,\tau(k)} Compare this to the inner sum above, \sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)} Each permutation σSn\sigma\in S_n with σ(i)=j\sigma(i)=j corresponds to exactly one τSn1\tau\in S_{n-1}: restrict σ\sigma to the remaining rows and columns and relabel the indices. Under this correspondence the products agree, \prod_{k\ne i} a_{k,\sigma(k)}=\prod_{k=1}^{n-1} b_{k,\tau(k)}, so only the signs can differ. To compare sgn(σ)\sgn(\sigma) with sgn(τ)\sgn(\tau), move the iith row to the top of the matrix, which takes i1i-1 transpositions, and the jjth column to the left, which takes j1j-1 transpositions; this makes all the rows and columns of the minor contiguous, and each transposition flips the sign, so \sgn(\sigma)=(-1)^{i+j-2}\sgn(\tau)=(-1)^{i+j}\sgn(\tau). Therefore \sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)}=(-1)^{i+j}\det(M_{ij}) The value Cij=(1)i+jdet(Mij)C_{ij}=(-1)^{i+j}\det(M_{ij}) is known as the cofactor CijC_{ij} of the matrix AA.

Finally, we can plug this back into our original expression for det(A)\det(A) to get: det(A)=j=1naijCij\det(A)=\sum_{j=1}^n a_{ij}C_{ij} for some fixed row index ii.
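The cofactor expansion translates directly into a recursive determinant. Here is a short Python sketch (fine for small matrices; like the permutation sum, it does factorial amounts of work), expanding along row i = 0.

```python
# Recursive determinant via cofactor expansion along the first row.

def minor(A, i, j):
    """The matrix A with row i and column j removed."""
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    # det(A) = sum_j a_{0j} * C_{0j}, where C_{0j} = (-1)^j * det(M_{0j})
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j)) for j in range(n))

print(det([[3, 2, -4], [-1, 4, 0], [1, 1, -1]]))  # 6
```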

The cofactor expansion can be used to directly define the inverse of an RR-matrix.

Therefore: A1=adj(A)det(A)A^{-1}=\frac{\adj(A)}{\det(A)}, which only exists when det(A)\det(A) is a unit in the ring.

Recall the cofactor expansion: det(A)=j=1naijCij\det(A)=\sum_{j=1}^n a_{ij}C_{ij} Notice how similar it is to matrix multiplication: [AB]ik=j=1naijbjk[A\cdot B]_{ik}=\sum_{j=1}^n a_{ij}b_{jk} In fact, let adj(A)\adj(A) be the adjugate matrix of AA where each entry bijb_{ij} is equal to CjiC_{ji} (note the swapped indices). Then we have exactly [A\cdot\adj(A)]_{ik}=\sum_{j=1}^n a_{ij}b_{jk}=\sum_{j=1}^n a_{ij}C_{kj}=\begin{cases}\det(A)&\text{if }i=k\\0&\text{if }i\ne k\end{cases}=[\det(A)\cdot I]_{ik} The reason that j=1naijCkj\sum_{j=1}^n a_{ij}C_{kj} is zero when iki\ne k is as follows: copy row ii of AA into row kk, obtaining a matrix AA'. The cofactors CkjC_{kj} are unchanged (they are computed from minors with row kk removed), and row kk of AA' equals row ii of AA, so the sum is exactly the cofactor expansion of AA' along row kk, i.e. det(A)\det(A'). But AA' has two identical rows, so by the alternating property of determinants, det(A)=0\det(A')=0. Thus the sum is zero.

So we have proved the following: Aadj(A)=det(A)IA\cdot\adj(A)=\det(A)\cdot I Note that if det(A)\det(A) is a unit in the ring RR, then we have Aadj(A)det(A)=IA\cdot\frac{\adj(A)}{\det(A)}=I implying that adj(A)det(A)\frac{\adj(A)}{\det(A)} is the right inverse of AA. A similar argument shows that adj(A)det(A)\frac{\adj(A)}{\det(A)} is the left inverse of AA as well.
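Here is a quick SymPy check of the identity A·adj(A) = det(A)·I, plus the inverse of a Z-matrix whose determinant is a unit of Z (SymPy is assumed available; adjugate() is its name for the adjugate matrix).

```python
from sympy import Matrix, eye

A = Matrix([[3, 2, -4], [-1, 4, 0], [1, 1, -1]])
print(A * A.adjugate() == A.det() * eye(3))  # True, with det(A) = 6 (not a unit in Z)

B = Matrix([[2, 1], [1, 1]])                 # det(B) = 1, a unit in Z
B_inv = B.adjugate() / B.det()
print(B_inv, B * B_inv == eye(2))            # integer entries, True
```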

Corollary: An RR-matrix is invertible iff its determinant is a unit in RR.

If det(A)\det(A) is a unit, the adjugate formula above constructs an inverse, since you can divide by units in a ring. Conversely, if AA is invertible, then det(A)det(A1)=det(AA1)=det(I)=1\det(A)\det(A^{-1})=\det(AA^{-1})=\det(I)=1 (using the product rule for determinants proved below), so det(A)\det(A) must be a unit.

This should be familiar from linear algebra, where an FF-matrix is invertible iff its determinant is nonzero. Of course, this is exactly because the only nonunit in a field FF is zero. For an example that isn’t a field, note that the units of Z\ZZ are ±1\pm 1. Thus a Z\ZZ-matrix is invertible iff its determinant is ±1\pm 1.

For us, invertible matrices are significant because, being reversible, their corresponding homomorphism is an isomorphism. More importantly, this means multiplying an RR-matrix AA by an invertible matrix PP is the same as composing an isomorphism τ\tau to the RR-module homomorphism σ\sigma corresponding to AA. Right-multiplying by PP is precomposition by τ\tau, and left-multiplying by PP is postcomposition by τ\tau. As we know, isomorphisms are essentially renamings, so composing a homomorphism by isomorphisms gives you essentially the same homomorphism.

So we have proved a formula for the inverse of an RR-matrix, and a criterion for when it exists. To build on this, let’s introduce the row and column operations that we will use to manipulate RR-matrices for the rest of this exploration.

We introduce the three row and column operations.

First, we may swap two rows of a matrix by left-multiplying by a suitable elementary matrix. [100001010][111222333]=[111333222]{\color{blue}\left[\begin{matrix}1&0&0\\0&0&1\\0&1&0\end{matrix}\right]} \left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] =\left[\begin{matrix}1&1&1\\3&3&3\\2&2&2\end{matrix}\right]

Second, we may multiply any row by a unit (here, assume 1010 is a unit) by left-multiplying by another kind of elementary matrix. [1000010001][111222333]=[101010222333]{\color{blue}\left[\begin{matrix}10&0&0\\0&1&0\\0&0&1\end{matrix}\right]} \left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] =\left[\begin{matrix}10&10&10\\2&2&2\\3&3&3\end{matrix}\right]

Third, we may add a multiple of any row to another row, again by left-multiplying by a third kind of elementary matrix. [1000101001][111222333]=[111222131313]{\color{blue}\left[\begin{matrix}1&0&0\\0&1&0\\10&0&1\end{matrix}\right]} \left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] =\left[\begin{matrix}1&1&1\\2&2&2\\13&13&13\end{matrix}\right]

Column operations use the same (transposed) elementary matrices, except you right-multiply instead of left-multiply.

These operations can each be undone by another operation of the same kind – you can un-swap by swapping again, un-scale by scaling by the inverse unit, and un-add the row cricr_i by adding cri-cr_i. This implies that the elementary matrices are invertible. In fact, the inverse of an elementary matrix represents the inverse operation.
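A small SymPy sketch of the three kinds of elementary matrices, reproducing the examples above and checking that the inverse of an elementary matrix performs the inverse operation (the helper names are just for illustration).

```python
from sympy import Matrix, eye

def swap(n, i, j):
    E = eye(n); E[i, i] = E[j, j] = 0; E[i, j] = E[j, i] = 1
    return E

def scale(n, i, u):            # u must be a unit of the ring
    E = eye(n); E[i, i] = u
    return E

def add_multiple(n, i, j, c):  # adds c * (row j) to row i
    E = eye(n); E[i, j] = c
    return E

A = Matrix([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(swap(3, 1, 2) * A)                     # rows 2 and 3 exchanged
print(scale(3, 0, 10) * A)                   # first row scaled by 10
print(add_multiple(3, 2, 0, 10) * A)         # last row becomes [13, 13, 13]
print(add_multiple(3, 2, 0, 10).inv() == add_multiple(3, 2, 0, -10))  # True
```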

When you apply row and column operations to a matrix, you’re essentially doing a clever renaming of the domain (column operations) and codomain (row operations). The resulting matrix represents the same homomorphism but on the renamed domain/codomain.

How do the row and column operations affect the determinant? We can give a short proof of each:

Theorem: Swapping two rows flips the sign of the determinant.

This is exactly the alternating property of determinants.

Theorem: Scaling a row by a unit will scale the determinant by the same unit.

This follows immediately from the multilinear property of determinants.

Theorem: Adding a multiple of a row to another row does not change the determinant.

If row rir_i becomes row ri+crjr_i+cr_j, then by multilinearity, det(matrix where ri=ri+crj)=det(original matrix where ri=ri)+cdet(matrix where ri=rj)\begin{aligned} &\det(\text{matrix where }r_i=r_i+cr_j)\\ &=\det(\text{original matrix where }r_i=r_i)\\ &+c\cdot\det(\text{matrix where }r_i=r_j) \end{aligned} But the last matrix has two identical rows rir_i and rjr_j, so by the alternating property its determinant is zero. Therefore the resulting matrix has determinant equal to that of the original matrix.

Corollary: Row and column operations multiply the determinant by a unit.

Follows immediately from the above three theorems, which show that the three row and column operations multiply the determinant by 1,c,1-1,c,1 respectively (cc a unit).

Theorem: The determinant of a product of matrices det(AB)\det(AB) is the product of their individual determinants det(A)det(B)\det(A)\det(B).

Using the formula for determinant for ABAB, we have det(AB)=σSnsgn(σ)i=1n(AB)i,σ(i)=σSnsgn(σ)i=1nk=1naikbk,σ(i)\begin{aligned} \det(AB)&=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n(AB)_{i,\sigma(i)}\\ &=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n\sum_{k=1}^n a_{ik}b_{k,\sigma(i)} \end{aligned} Summing over the index kk from 11 to nn can be done in any order since addition is commutative. Thus we can add them in an arbitrary order. Since we’re multiplying together a sum ik\prod_i\sum_k, by the distributive property the result is essentially adding together all the ways of picking one term from each sum and multiplying them together, just like how (a+b)(c+d)=ac+ad+bc+bd(a+b)(c+d)=ac+ad+bc+bd. This is the same as adding together all the ways of picking one kk for each ii, so we can express this as the sum of using all permutations taking ii to kk: det(AB)=σSnsgn(σ)τSni=1nai,τ(i)bτ(i),σ(i)=τSnσSnsgn(σ)(i=1nai,τ(i))(i=1nbτ(i),σ(i))=τSn(i=1nai,τ(i))(σSnsgn(σ)i=1nbτ(i),σ(i))\begin{aligned} \det(AB)&=\sum_{\sigma\in S_n}\sgn(\sigma)\sum_{\tau\in S_n}\prod_{i=1}^n a_{i,\tau(i)}b_{\tau(i),\sigma(i)}\\ &=\sum_{\tau\in S_n}\sum_{\sigma\in S_n}\sgn(\sigma)\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\prod_{i=1}^n b_{\tau(i),\sigma(i)}\right)\\ &=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n b_{\tau(i),\sigma(i)}\right) \end{aligned} We can reorder the last product by applying τ1\tau^{-1} to the indices ii: det(AB)=τSn(i=1nai,τ(i))(σSnsgn(σ)i=1nbi,σ(τ1(i)))\begin{aligned} \det(AB)&=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n b_{i,\sigma(\tau^{-1}(i))}\right) \end{aligned} Such a reordering is equivalent to multiplying by the signature of τ\tau: det(AB)=τSn(i=1nai,τ(i))(σSnsgn(σ)sgn(τ)i=1nbi,σ(i))=(τSnsgn(τ)i=1nai,τ(i))(σSnsgn(σ)i=1nbi,σ(i))=det(A)det(B)\begin{aligned} \det(AB)&=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\sgn(\tau)\prod_{i=1}^n b_{i,\sigma(i)}\right)\\ &=\left(\sum_{\tau\in S_n}\sgn(\tau)\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n b_{i,\sigma(i)}\right)\\ &=\det(A)\det(B) \end{aligned}

In this section, we show how row and column operations can be used to simplify a matrix.

Therefore: If RR is a Bézout domain, then any RR-matrix AA can be reduced to Smith normal form, a diagonal matrix where each nonzero entry divides the next.

Using row and column operations, you can always reduce an RR-matrix down into the block form [D000]\left[\begin{matrix}D&0\\0&0\end{matrix}\right], where DD is a diagonal RR-matrix, i.e. all its nonzero entries are on the main diagonal.

There is a way to get DD into a form where the diagonal elements d1,d2,d_1,d_2,\ldots divide each other in order: d1d2d3d_1\mid d_2\mid d_3\mid\ldots. This form is called Smith normal form. Any RR-matrix can be reduced to Smith normal form, provided RR is a Bézout domain, which is an integral domain in which every sum of principal ideals is principal (equivalently, every finitely generated ideal is principal). For instance, in PIDs every ideal is principal, so PIDs are Bézout domains.

Theorem: Bézout domains define a GCD d=gcd(a,b,)d=\gcd(a,b,\ldots) and have Bézout’s identity: for every {a,b,}\{a,b,\ldots\}, there exist {x,y,}\{x,y,\ldots\} where ax+yb+=gcd(a,b,)ax+yb+\ldots=\gcd(a,b,\ldots).
  • The fact that every sum of principal ideals is principal (a)+(b)+=(d)(a)+(b)+\ldots=(d) implies two things:
    • First, since (d)(d) contains each of {(a),(b),}\{(a),(b),\ldots\}, we know that dd is a common divisor of {a,b,}\{a,b,\ldots\}.
    • Second, (a)+(b)+=(d)(a)+(b)+\ldots=(d) represents all the linear combinations of {a,b,}\{a,b,\ldots\}. Since a common divisor of {a,b,}\{a,b,\ldots\} must divide all linear combinations of those elements =(d)=(d), every common divisor must divide dd in particular.
  • From this, we can conclude two things:
    • Since every common divisor divides dd, it is in fact the greatest common divisor gcd(a,b,)\gcd(a,b,\ldots).
    • gcd(a,b,)\gcd(a,b,\ldots) is some linear combination of {a,b,}\{a,b,\ldots\}, i.e. we have Bézout’s identity ax+yb+=gcd(a,b,)ax+yb+\ldots=\gcd(a,b,\ldots) for some {x,y,}\{x,y,\ldots\}.

Method: To put a matrix into Smith normal form, choose d1d_1 to be the GCD of all its entries. Then use row/column operations to isolate d1d_1 in the upper left such that its row and column are zero except for d1d_1, getting a block matrix [d100[B]]\left[\begin{matrix} d_1&\begin{matrix}0&\ldots\end{matrix}\\ \begin{matrix}0\\\vdots\end{matrix}&\left[\begin{matrix}&&\\&B&\\&&\end{matrix}\right] \end{matrix}\right] where BB is the minor M11M_{11} where every entry is divisible by d1d_1 because of how we picked d1d_1. Because of Bézout’s identity, it’s always possible to isolate the GCD using row and column operations this way. Repeating the process for the minor BB gives you Smith normal form. Here’s a short example:

\begin{aligned} &~\left[\begin{matrix}2&-1\\1&2\end{matrix}\right]\\ =&~\left[\begin{matrix}1&-1\\3&2\end{matrix}\right]{\color{red}\left[\begin{matrix}1&0\\1&1\end{matrix}\right]^{-1}}&\text{ add col }2\text{ to col }1\\ =&~\left[\begin{matrix}1&0\\3&5\end{matrix}\right]{\color{red}\left(\left[\begin{matrix}1&0\\1&1\end{matrix}\right]\left[\begin{matrix}1&1\\0&1\end{matrix}\right]\right)^{-1}}&\text{ add col }1\text{ to col }2\\ =&~\left[\begin{matrix}1&0\\3&5\end{matrix}\right]{\color{red}\left[\begin{matrix}1&1\\1&2\end{matrix}\right]^{-1}}\\ =&~{\color{blue}\left[\begin{matrix}1&0\\-3&1\end{matrix}\right]^{-1}}\left[\begin{matrix}1&0\\0&5\end{matrix}\right]{\color{red}\left[\begin{matrix}1&1\\1&2\end{matrix}\right]^{-1}}&\text{ subtract }3(\text{row }1)\text{ from row }2\\ \end{aligned}

Taking the Smith normal form of a matrix AA can be written A=UAVA={\color{blue}U}A'{\color{red}V}, where U{\color{blue}U} and V{\color{red}V} are invertible matrices (products of elementary matrices and their inverses) recording the row and column operations used to reduce AA to the diagonal matrix AA'. The entries did_i along the diagonal of AA' are called the invariant factors of AA, because you will always arrive at these factors in the Smith normal form regardless of how you do the row and column operations.
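For what it's worth, recent versions of SymPy can compute Smith normal forms over Z directly; treating the exact import location and signature as an assumption, the 2×2 example above comes out as expected.

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form  # location may vary by SymPy version

A = Matrix([[2, -1], [1, 2]])
print(smith_normal_form(A, domain=ZZ))  # Matrix([[1, 0], [0, 5]]), matching the example
```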

Theorem: The invariant factors of an RR-matrix AA are unique up to units.

At each step, the invariant factor did_i is chosen as the GCD of the remaining elements, and since GCDs are unique up to units, the invariant factors are unique up to units.

Theorem: The determinant of a diagonal RR-matrix is equal to the product of its entries.

In the definition of determinant, det(A)=σSnsgn(σ)i=1nai,σ(i)\det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n a_{i,\sigma(i)} you can see that unless σ(i)=i\sigma(i)=i for each ii, the product will include a zero term that makes the whole product zero. This is because aija_{ij} for iji\ne j is zero in a diagonal matrix. Thus the only σ\sigma that results in a nonzero term is the identity permutation, and so for diagonal matrices the formula simplifies down to det(A)=i=1naii\det(A)=\prod_{i=1}^n a_{ii}

Corollary: The determinant of an RR-matrix is equal to the product of its invariant factors, up to units.

This follows immediately from the fact that the determinant of a matrix equals (up to units) the determinant of its Smith normal form, which is equal to the product of the diagonal entries, which are the invariant factors.

Theorem: The invariant factors of a matrix AA are all ones iff AA is invertible.

If AA is invertible, then its determinant is a unit. Since the determinant is the product of the invariant factors up to units, every invariant factor must then be a unit, and therefore associate to 11. Conversely, if every invariant factor is a unit (associate to 11), then the determinant is a unit, so AA is invertible.

Corollary: The Smith normal form of an invertible matrix is the identity matrix.

Corollary: Every invertible matrix over a Bézout domain RR factors into elementary matrices.

For matrices over a Bézout domain, the Smith normal form is obtained via row and column operations, a process that only factors out corresponding elementary matrices. By the previous theorem, the resulting Smith normal form of an invertible matrix is the identity matrix, which is also an elementary matrix. Thus the original matrix factors into elementary matrices.

In this section, we abstractly represent relationships between RR-modules via RR-matrices.

We already know that an n×mn\times m RR-matrix MM defines a homomorphism RmRnR^m\to R^n. Let’s bring back our example matrix AA and explore some properties of this homomorphism: A=[324140111]A=\left[\begin{matrix}3&2&-4\\-1&4&0\\1&1&-1\end{matrix}\right] First, the columns of an RR-matrix MM generate its image (also known as column space), which is written im M=MRm=span(columns of M)\im M=MR^m=\span(\text{columns of }M) For instance with AA above, im A=AR3=span([311],[241],[401])\im A=AR^3=\span\left( \left[\begin{matrix}3\\-1\\1\end{matrix}\right], \left[\begin{matrix}2\\4\\1\end{matrix}\right], \left[\begin{matrix}-4\\0\\-1\end{matrix}\right]\right) Second, the solutions to AX=0AX=0 comprise the kernel (also known as null space) of AA. The kernel of a diagonal matrix DD is relatively straightforward. The equation [d1d2][x1x2]=0\left[\begin{matrix}d_1&&\\&d_2&\\&&\ddots\end{matrix}\right] \left[\begin{matrix}x_1\\x_2\\\vdots\end{matrix}\right]=0 is equivalent to the system of equations d1x1=0d2x2=0\begin{aligned} d_1x_1&=0\\ d_2x_2&=0\\ &\ldots \end{aligned} In an integral domain, xix_i is zero when d_i\ne 0, and xix_i can take on any value when di=0d_i=0. This means that the kernel of a diagonal matrix is span(ei,ej,)\span(e_i,e_j,\ldots) where {i,j,}\{i,j,\ldots\} are the indices ii where di=0d_i=0. In particular, if there are no zeroes along the main diagonal of a diagonal matrix, then its kernel is trivial.

Theorem: An RR-matrix defines an injective RR-module homomorphism iff its kernel is trivial.
  • (\to) If A:RmRnA:R^m\to R^n is injective, then AX=AYAX=AY implies X=YX=Y. In particular, elements XX of the kernel, where AX=0AX=0, must be zero because AX=0    AX=A0    X=0AX=0\implies AX=A0\implies X=0.
  • (\from) If A:RmRnA:R^m\to R^n has a trivial kernel, then consider AX=AYAX=AY, which implies AXAY=0AX-AY=0 and A(XY)=0A(X-Y)=0. Since the kernel is trivial, we have XY=0X-Y=0 and thus X=YX=Y, showing that AA is injective.

Theorem: An RR-matrix defines a surjective RR-module homomorphism iff its image is the codomain.

By definition. Since surjective means the whole codomain is mapped to, it’s the same as saying that the image is equal to the codomain.

Corollary: Since the columns of A:RmRnA:R^m\to R^n generate im A\im A by definition, if AA is surjective, then its columns generate RnR^n.

Theorem: The kernel of an RR-matrix AA over a Bézout domain RR can be read off from its Smith normal form: writing A=QAP1A=QA'P^{-1} with QQ and PP invertible, we have ker A=P(ker A)\ker A=P(\ker A'), and in particular ker A\ker A is trivial iff ker A\ker A' is trivial.

For example, let’s take the Smith normal form of AA: A=QAP1=[322112211][100020003][102314111]A=QA'P^{-1}= \left[\begin{matrix}-3&-2&2\\-1&-1&2\\-2&-1&1\end{matrix}\right] \left[\begin{matrix}1&0&0\\0&2&0\\0&0&3\end{matrix}\right] \left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right] and then solve QAP1X=0QA'P^{-1}X=0 with these steps: QAP1X= 0[322112211][100020003][102314111]X= 0[100020003]([102314111]X)= 0 left-multiply by Q1[102314111]X=[000] solve for P1XX=[102314111]1[000] left-multiply by PX=[000]\begin{aligned} QA'P^{-1}X=&~0\\ \left[\begin{matrix}-3&-2&2\\-1&-1&2\\-2&-1&1\end{matrix}\right] \left[\begin{matrix}1&0&0\\0&2&0\\0&0&3\end{matrix}\right] \left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]X=&~0\\ \left[\begin{matrix}1&0&0\\0&2&0\\0&0&3\end{matrix}\right] \left(\left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]X\right)=&~0&\text{ left-multiply by }Q^{-1}\\ \left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]X=&\left[\begin{matrix}0\\0\\0\end{matrix}\right]&\text{ solve for }P^{-1}X\\ X=\left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]^{-1}&\left[\begin{matrix}0\\0\\0\end{matrix}\right]&\text{ left-multiply by }P\\ X=&\left[\begin{matrix}0\\0\\0\end{matrix}\right]\\ \end{aligned} In this case, AA has a trivial kernel {0}\{0\}. You can see that this is precisely because its Smith normal form has no zero entries along the diagonal. We’ll be talking about the kernel in the abstract, defined as the solutions to AX=0AX=0. We can see that the kernel is easily calculable when RR is a Bézout domain, but it may not be as easy for other rings.
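As a quick cross-check with SymPy: since the diagonal entries 1, 2, 3 are all nonzero, the kernel should be trivial, and indeed the rational nullspace of A is empty (a nonzero integer kernel vector would in particular be a nonzero rational one).

```python
from sympy import Matrix

A = Matrix([[3, 2, -4], [-1, 4, 0], [1, 1, -1]])
print(A.nullspace())  # []  -- only X = 0 solves AX = 0
```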

Theorem: The kernel and image of a homomorphism σ:MN\sigma:M\to N is a submodule of the domain MM and codomain NN respectively.
  • Contains 00: σ\sigma must send 00 to 00.
  • Closed under addition: because σ(m)+σ(n)=σ(m+n)\sigma(m)+\sigma(n)=\sigma(m+n), if both mm and nn are in the kernel/image so is m+nm+n.
  • Closed under scalar multiplication: because rσ(m)=σ(rm)r\sigma(m)=\sigma(rm), if mm is in the kernel/image so is rmrm.

Like with groups, an RR-module is simple if it has no proper non-trivial submodules.

Theorem: Any homomorphism σ:MN\sigma:M\to N between simple modules MM and NN is either trivial or an isomorphism.

Since kernel and image must both be submodules of MM and NN respectively, and simple modules have only two possible choices for submodules (trivial submodule and the module itself), the only possible homomorphisms are when the kernel is trivial (implying that the image is NN, therefore an isomorphism) and when the kernel is MM (implying that the image is trivial, therefore the trivial homomorphism).

In this section, we describe properties of RR-modules with only diagrams.

Here we’re going to represent RR-matrices abstractly as homomorphisms σ\sigma on RR-modules.

Consider two homomorphisms σ:AB,τ:BC\sigma:A\to B,\tau:B\to C where the image of σ\sigma is the kernel of τ\tau. In other words, everything σ\sigma maps to will get mapped to 00 by τ\tau. This property is called exactness, and we say this sequence is exact at BB, and any sequence of homomorphisms AσBτCυA\xrightarrow{\sigma}B\xrightarrow{\tau}C\xrightarrow{\upsilon}\ldots with the property that “the image of one homomorphism is the kernel of the next” is called an exact sequence.

In fact, we have discussed the concept of exact sequences before for groups. Exact sequences for groups and exact sequences for RR-modules are precisely the same concept, but since RR-modules have a richer structure, we may use exact sequences in ways that we cannot do for groups.

To recap, we showed the following for exact sequences of groups, and they are true for RR-modules as well:
  • Exactness at AA in {0}AσB\{0\}\to A\xrightarrow{\sigma}B means σ\sigma is injective, and exactness at BB in AσB{0}A\xrightarrow{\sigma}B\to\{0\} means σ\sigma is surjective.
  • A short exact sequence {0}ABC{0}\{0\}\to A\to B\to C\to\{0\} expresses BB as an extension of CC by AA, and by the splitting lemma, if the sequence splits (the surjection has a section, i.e. a right inverse), then BACB\iso A\oplus C.

For an example of how exact sequences are used, we can look at projections. A projection is an idempotent endomorphism, i.e. a homomorphism π:MM\pi:M\to M such that π2=π\pi^2=\pi.

Theorem: A module MM can be decomposed into the direct sum M=ker πim πM=\ker\pi\oplus\im\pi iff a projection π:MM\pi:M\to M exists.
  • (\from) Any direct sum ABA\oplus B has a canonical left and right projection, so the forward direction is trivial.
  • (\to) For any homomorphism like π\pi, we can always write the short exact sequence {0}ker πιMπim π{0}\{0\}\to\ker\pi\xrightarrow{\iota}M\xrightarrow{\pi}\im\pi\to\{0\} because the kernel of π\pi is exactly the image of the injective inclusion map ι:ker πM\iota:\ker\pi\to M, and π\pi is surjective onto its image im π\im\pi.
  • Since π\pi is idempotent, it acts as the identity on its image: for m=π(x)im πm=\pi(x)\in\im\pi we have π(m)=π(π(x))=π(x)=m\pi(m)=\pi(\pi(x))=\pi(x)=m. So the inclusion im πM\im\pi\to M is a section (right inverse) of π:Mim π\pi:M\to\im\pi.
  • By the splitting lemma, since π:Mim π\pi:M\to\im\pi has a section, the sequence splits and therefore M=ker πim πM=\ker\pi\oplus\im\pi.
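A tiny numeric illustration of the theorem over Q with SymPy: an idempotent matrix, whose kernel and image together fill up the whole space.

```python
from sympy import Matrix

pi = Matrix([[1, 1], [0, 0]])
assert pi * pi == pi                  # pi is a projection

kernel = pi.nullspace()               # spanned by (-1, 1)
image = pi.columnspace()              # spanned by (1, 0)
print(Matrix.hstack(*(kernel + image)).rank())  # 2: ker(pi) + im(pi) = Q^2
```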

In this section, we utilize exact sequences in the context of RR-modules specifically.

Take this exact sequence, for instance: RmARnσM{0}R^m\xrightarrow{A}R^n\xrightarrow{\sigma}M\to\{0\} This exact sequence is known as a presentation of the RR-module MM. The idea is that MM is entirely described by the presentation, and the presentation is entirely described by the RR-matrix AA, called the presentation matrix. In other words, it only takes a single RR-matrix, AA, to define all you need to know about an RR-module MM. Let’s see why this is.

Theorem: In the above exact sequence, MM is isomorphic to Rn/im AR^n/\im A — that is, to RnR^n with the columns of AA (and all their RR-linear combinations) sent to 00.
  • By the first isomorphism theorem on σ\sigma, we have Rn/ker σim σR^n/\ker\sigma\iso\im\sigma. However:
    • The exact sequence RnM{0}R^n\to M\to\{0\} implies that the homomorphism σ:RnM\sigma:R^n\to M is surjective, and thus MM is isomorphic to the image of σ\sigma. Thus im σM\im\sigma\iso M.
    • By exactness, im A=ker σ\im A=\ker\sigma.
  • Then we have Rn/im AMR^n/\im A\iso M. This is the same as saying MM is RnR^n, except we send all elements that are linear combinations of the columns of AA to 00 (AX=0AX=0).

In the context of presentations, the images in MM of the basis elements of RnR^n (the codomain of AA) are known as generators of MM, and the columns of AA describe relations on those generators: a column with entries a1,,ana_1,\ldots,a_n says that a1v1++anvn=0a_1v_1+\ldots+a_nv_n=0. So the RR-matrix AA, by virtue of defining both the generators of MM and the relations on those generators, completely determines the RR-module MM. Because every RR-matrix AA defines an RR-module this way, we say that MRn/im AM\iso R^n/\im A is the cokernel of AA. It is so named because there is a duality between the kernel and the cokernel. The kernel is the part of the domain sent to zero by AA, and a trivial kernel implies injectivity of AA; the cokernel is the codomain modulo the image of AA — whatever part of the codomain is not reached by AA — and a trivial cokernel implies surjectivity of AA.

Theorem: An RR-matrix defines a surjective homomorphism iff its cokernel is trivial.

Surjectivity of A:RmRnA:R^m\to R^n means im A=Rn\im A=R^n. But the cokernel Rn/im AR^n/\im A is trivial if and only if im A=Rn\im A=R^n.

Hopefully this makes clear the reason we only study RR-matrices in this exploration — because every RR-module can be defined as the cokernel of a suitable RR-matrix, studying RR-matrices completely subsumes the need to study RR-modules directly!

In this section, we describe how to derive the RR-matrix that presents a given RR-module.

Let’s do the reverse. How do you construct the presentation matrix AA that presents a given (finitely generated) RR-module MM? Here is a step-by-step:
  • Choose a finite set of generators v1,v2,,vnv_1,v_2,\ldots,v_n of MM.
  • Find the relations among the generators, i.e. enough RR-linear combinations r1v1++rnvn=0r_1v_1+\ldots+r_nv_n=0 to imply every relation that holds in MM.
  • Write each relation as a column with entries r1,,rnr_1,\ldots,r_n; these columns side by side form the presentation matrix AA.

(Note that a non-finitely generated RR-module will ask for infinitely many viv_i, so describing that class of RR-modules would require a different process.)

Example:

Let MM be a Z\ZZ-module generated by (v1,v2,v3)(v₁,v₂,v₃) under the relations 3v1+2v2+v3=08v1+4v2+2v3=07v1+6v2+2v3=09v1+6v2+v3=0\begin{aligned} 3v_1+2v_2+v_3&=0\\ 8v_1+4v_2+2v_3&=0\\ 7v_1+6v_2+2v_3&=0\\ 9v_1+6v_2+v_3&=0 \end{aligned} Express each relation as a column of the presentation matrix AA: A=[387924661221]A=\left[\begin{matrix}3&8&7&9\\2&4&6&6\\1&2&2&1\end{matrix}\right] so that the above system of relations can be summarized as [v1v2v3]A=[0000]\left[\begin{matrix}v_1&v_2&v_3\end{matrix}\right]A=\left[\begin{matrix}0&0&0&0\end{matrix}\right] And theoretically we are done, since AA presents MM.

However, we can reduce this matrix down a bit. The following simplifying operations also do not change the isomorphism class of the module presented by AA:
  • Row and column operations (they amount to changing the generating set and rewriting the relations, i.e. multiplying AA by invertible matrices).
  • Removing a column of zeroes (it is the trivial relation 0=00=0).
  • If some column is eie_i, removing that column together with row ii (the relation says the iith generator is zero, so both can be dropped).

Reducing our matrix above:

\begin{aligned} &~\left[\begin{matrix}3&8&7&9\\2&4&6&6\\1&2&2&1\end{matrix}\right]\\ =&~\left[\begin{matrix}0&2&1&6\\0&0&2&4\\1&2&2&1\end{matrix}\right]&\begin{matrix}r_1-3r_3\to r_1\\r_2-2r_3\to r_2\end{matrix}\\ =&~\left[\begin{matrix}2&1&6\\0&2&4\end{matrix}\right]&\text{remove }c_1,r_3\\ =&~\left[\begin{matrix}2&1&6\\-4&0&-8\end{matrix}\right]&r_2-2r_1\to r_2\\ =&~\left[\begin{matrix}-4&-8\end{matrix}\right]&\text{remove }c_2,r_1\\ =&~\left[\begin{matrix}-4&0\end{matrix}\right]&-2c_1+c_2\to c_2\\ =&~\left[\begin{matrix}-4\end{matrix}\right]&\text{remove }c_2\\ =&~\left[\begin{matrix}4\end{matrix}\right]&-1c_1\to c_1\\ \end{aligned}

Since the image of [4]:ZZ\left[\begin{matrix}4\end{matrix}\right]:\ZZ\to\ZZ is 4Z4\ZZ, the big 3×43\times 4 matrix above actually just presents the module Z/4Z\ZZ/4\ZZ! Thus M=Z/4ZM=\ZZ/4\ZZ.
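The same conclusion can be reached by handing the presentation matrix to SymPy's Smith normal form (same caveat as before about the exact API): the cokernel is read off the diagonal as a direct sum of Z/d_i, with Z/1 = 0.

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form  # location may vary by SymPy version

A = Matrix([[3, 8, 7, 9],
            [2, 4, 6, 6],
            [1, 2, 2, 1]])
print(smith_normal_form(A, domain=ZZ))
# diagonal entries 1, 1, 4 (up to units), so the presented module is
# Z/1 + Z/1 + Z/4 = Z/4Z, agreeing with the reduction above.
```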

We’ve seen how it’s done for a specific finitely generated RR-module, and this approach generalizes for all finitely generated RR-modules. Thus we can say that every finitely generated RR-module MM can be presented by an RR-matrix. But how do you tell when MM is finitely generated in the first place?

In this section, we learn the conditions under which an RR-module is finitely generated.

To see whether an RR-module MM is finitely generated (and thus presentable), let’s try building one.

Start with the zero submodule W0={0}W_0=\{0\} of MM. At every step, add an element viMv_i\in M outside of Wi1W_{i-1} to get WiW_i, generated by viv_i and the old Wi1W_{i-1}.

The fact of the matter is, if this process always stops — that is, at some point there are no more elements vv of MM outside of WiW_i to add — then MM (and, as we’ll see, every submodule of MM) is finitely generated. One way to characterize this is known as the ascending chain condition (ACC) on submodules: “There is no infinite strictly increasing chain W1<W2<W_1<W_2<\ldots of submodules of MM.” Satisfying the ACC is the same as saying that the process always stops, no matter how the elements viv_i are chosen.

Theorem: Every submodule WMW\le M is finitely generated iff it satisfies the ACC.
  • (\to) Towards contradiction, say every submodule of MM is finitely generated, yet there is an infinite strictly increasing chain W1<W2<W_1<W_2<\ldots of submodules of MM. Let UU be the union of the chain; UU is a submodule of MM, so by assumption it is generated by finitely many elements. Each generator lies in some WiW_i, so all of them lie in a single WNW_N (take the largest index that appears). But then U=WNU=W_N, so the chain cannot grow past WNW_N: WN=WN+1=W_N=W_{N+1}=\ldots, contradicting strict increase.
  • (\from) Assume the ACC, and let WMW\le M be any submodule. Build a chain inside WW: let W0={0}W_0=\{0\}, and given WiWW_i\le W, pick any element wi+1w_{i+1} of WW outside WiW_i and let Wi+1W_{i+1} be generated by WiW_i together with wi+1w_{i+1}. This chain is strictly increasing, so by the ACC it cannot continue forever: at some point there is no element of WW left to pick, which means Wi=WW_i=W. Then WW is generated by the finitely many elements w1,,wiw_1,\ldots,w_i, so it is finitely generated.

Recall that integral domains that satisfy the ascending chain condition on principal ideals (ACCP) are exactly the factorization domains. We can say something similar here. A Noetherian RR-module is one where every submodule is finitely generated, or equivalently (by the above proof) one that satisfies the ACC. As we will see, every finitely generated module over a Noetherian ring can be described by a presentation matrix.

In this section, we explore the properties of Noetherian rings.

More generally, a Noetherian ring is one where every ideal is finitely generated, i.e. one that satisfies the ACC on ideals. Note that it’s possible for a ring to be finitely generated (every ring is generated by 11 as a module over itself) but have ideals that are not finitely generated, so proving noetherianity requires proving facts about ideals, not the whole ring. We’re going to prove a number of theorems that let us work with such rings:

Theorem: Surjective homomorphisms preserve noetherianity. (If the domain is Noetherian, so is the codomain.)

Given a surjective homomorphism φ:RS\varphi:R\to S and a strictly increasing chain of ideals in SS, taking preimages under φ\varphi gives a strictly increasing chain of ideals in RR (strictness is preserved because φ(φ1(I))=I\varphi(\varphi^{-1}(I))=I when φ\varphi is surjective). So if SS failed the ACC, then RR would too; since RR satisfies the ACC, so does SS.

Theorem: Quotients preserve noetherianity. (If RR is Noetherian, so are its quotients R/IR/I.)

Because the canonical map π:RR/I\pi:R\to R/I is surjective, and surjective homomorphisms preserve noetherianity.

Theorem: Finite direct sums preserve noetherianity. (If R,SR,S are Noetherian, so is the direct sum RSR\oplus S.)
  • Ideals in RSR\oplus S are in the form IRISI_R\oplus I_S, where IRI_R is an ideal in RR and ISI_S is an ideal in SS. This is precisely because there is no way for elements of RR to influence elements of SS and vice versa.
  • Because R,SR,S are Noetherian, IRI_R and ISI_S are finitely generated, from which you can construct a finite set of generators for IRISI_R\oplus I_S.

Corollary: Free modules over Noetherian rings are Noetherian. (If RR is a Noetherian ring, then RnR^n is a Noetherian RR-module.)

You can take the direct sum of nn copies of RR to get RnR^n. Since finite direct sum preserves noetherianity, when RR is Noetherian, RnR^n is Noetherian.

Theorem: Every ideal of a Noetherian ring RR is contained in a maximal ideal, except RR itself.

Let II be a proper ideal of RR. If II is not contained in any maximal ideal, then II is not itself maximal, so there is a strictly larger proper ideal I1I_1; repeating the argument on I1I_1 builds an infinite strictly increasing chain of proper ideals I<I1<I2<I<I_1<I_2<\ldots, contradicting the ACC. So the chain must stop at a proper ideal that cannot be enlarged, i.e. a maximal ideal containing II.

Hilbert Basis Theorem: If RR is Noetherian, so is R[x]R[x].
  • The goal is to prove that an arbitrary ideal II for R[x]R[x] is finitely generated, given the ACC for RR.
  • Lemma: The leading coefficients of degree nn polynomials in any ideal II of R[x]R[x], together with 00, form an ideal CnC_n of RR.

    If ciCnc_i\in C_n is the leading coefficient of fiIf_i\in I and cjc_j is the leading coefficient of fjIf_j\in I, then ci+cjc_i+c_j is either 00 or the leading coefficient of fi+fjf_i+f_j, and rcirc_i (for arbitrary rRr\in R) is either 00 or the leading coefficient of rfirf_i. Since fi+fjf_i+f_j and rfirf_i are both in II (it’s an ideal), ci+cjc_i+c_j and rcirc_i are both in CnC_n, making CnC_n an ideal.

  • We have CnCn+1C_n\subseteq C_{n+1} because for every ciCnc_i\in C_n and corresponding degree nn polynomial fif_i, xfixf_i exists (due to II being an ideal) as a degree n+1n+1 polynomial with the same leading coefficient ciCn+1c_i\in C_{n+1}.
  • Thus the CnC_n form a chain of ideals in RR, which stops strictly increasing at some point due to RR being Noetherian and therefore satisfying the ACC. Let CNC_N be the last strictly increasing ideal in this chain (the one where Cn=CNC_n=C_N for all nNn\ge N).
  • For each degree nNn\le N, take finitely many polynomials of degree nn in II whose leading coefficients generate CnC_n (possible since each CnC_n is finitely generated, due to RR being Noetherian). Let JJ be the ideal generated by this finite collection of polynomials fif_i, each of degree at most NN. Clearly JIJ\subseteq I, as the fif_i are taken from II.
  • We can prove any polynomial fIf\in I can be expressed in terms of these fif_i, i.e. IJI\subseteq J.
    • First, if deg f>N\deg f>N then pick some polynomial gJg\in J with the same leading coefficient as ff, which exists since Cdeg f=CNC_{\deg f}=C_N. Then f=fxdeg fdeg ggf'=f-x^{\deg f-\deg g}g is a lower-degree polynomial in II due to II being an ideal. Repeat this process until you obtain a polynomial whose degree is N\le N, reducing it to the second case:
    • Then if deg fN\deg f\le N, the leading coefficient of ff lies in Cdeg fC_{\deg f}, so some RR-linear combination of the chosen degree-deg f\deg f polynomials has the same leading coefficient. Subtracting it from ff leaves a polynomial of strictly smaller degree in II; repeating this down through the degrees eventually reaches 00, expressing ff as an element of JJ.
  • Since I=JI=J, and JJ is finitely generated, any arbitrary ideal II of R[x]R[x] is finitely generated, thus R[x]R[x] is Noetherian.

Corollary: If RR is a Noetherian ring, so is anything in the form R[x]/(f)R[x]/(f) because quotients preserve noetherianity.

Theorem: The Noetherian Bézout domains are exactly the PIDs.
  • (\to) Every ideal is finitely generated in a Noetherian ring, and every finitely generated ideal is principal in a Bézout domain. Thus every ideal is principal in a Noetherian Bézout domain.
  • (\from) In a PID, every ideal is principal making it trivially Bézout, and every ideal is generated by one element making it trivially Noetherian.

Structure Theorem for Modules over a PID: if MM is a finitely generated RR-module where RR is a PID, then it can be uniquely decomposed as a direct sum of quotients of RR. To be specific, MR/(f1)R/(fn)M\iso R/(f_1)\oplus\ldots\oplus R/(f_n) where fif_i are elements of RR that divide each other: f1fnf_1\mid\ldots\mid f_n.
  • We can always find the RR-presentation matrix AA for any finitely generated Noetherian RR-module MM (so that MRn/im AM\iso R^n/\im A). In this case, MM is given as finitely generated over a PID, and is Noetherian because PIDs are Noetherian.
  • The Smith normal form is defined for any matrix defined over a Bézout domain. PIDs are also Bézout domains, so taking the Smith normal form of AA gives A=PDQ1A=PDQ^{-1}, obtaining a diagonal matrix DD of invariant factors fif_i.
  • In particular, the resulting diagonal matrix D=[f1f2fn]D=\left[\begin{matrix}f_1&&&\\&f_2&&\\&&\ddots&\\&&&f_n\end{matrix}\right] has the property that f1fnf_1\mid\ldots\mid f_n, which is a property of the Smith normal form of any matrix.
  • Then, since PP and QQ are invertible (i.e. isomorphisms), we have im Aim PDQ1im D\im A\iso\im PDQ^{-1}\iso\im D
  • The image of a matrix is the span of its columns. For a diagonal matrix DD, the iith column is fieif_ie_i, a scalar multiple of a single basis vector of RnR^n, and so the image can be expressed as a direct sum im D=ifiR\im D=\bigoplus_i f_iR, with one submodule fiRf_iR for each dimension.
  • Then we have MRn/im ARn/(f1Rf2RfnR)M\iso R^n/\im A\iso R^n/(f_1R\oplus f_2R\oplus\ldots\oplus f_nR), and since a direct sum quotiented by a direct sum of submodules (one in each summand) is the direct sum of the quotients, this gives MR/(f1)R/(f2)R/(fn)M\iso R/(f_1)\oplus R/(f_2)\oplus\ldots\oplus R/(f_n)
  • For the proof of uniqueness, any other decomposition into some R/(g1)R/(g2)R/(gn)R/(g_1)\oplus R/(g_2)\oplus\ldots\oplus R/(g_n) would imply that the gig_i are invariant factors of AA appearing on the diagonal of its Smith normal form, which must be the same invariant factors as fif_i since the Smith normal form is unique.

This theorem is a generalization of the structure theorem for finitely generated abelian groups, which is the specific case where R=ZR=\ZZ, using the fact that the Z\ZZ-modules are isomorphic to the abelian groups.

In this section, we examine the components of finitely generated RR-modules.

In the above theorem, we found that any finitely generated RR-module (over a PID RR) can be expressed as a direct sum of components in the form R/(f)R/(f). RR-modules in the form R/(f)R/(f) are called cyclic RR-modules.

In general, a cyclic RR-module MM is one in which every element of the module can be expressed as scalar multiples of a single element mMm\in M. Because of this, cyclic RR-modules can be written as RmRm. If mRm\in R, then RmRm is isomorphic to the principal ideal (m)(m).

Theorem: R/(f)R/(f) is a cyclic RR-module.

Since every element of R/(f)R/(f) is a coset of the form [r]=r+(f)[r]=r+(f), and [r]=r[1][r]=r\cdot[1], the single coset [1][1] generates all of R/(f)R/(f). Thus R/(f)R/(f) is a cyclic RR-module.

So finitely generated RR-modules are composed of a direct sum of cyclic RR-submodules R/(f)R/(f).

Recall that quotienting a ring/module essentially sends all quotiented elements to zero. In other words, R/(f)R/(f) means “RR, where f=0f=0”. Any relation can be expressed this way – if you want to apply the relation a=ba=b, then you quotient by aba-b to send it to zero.

Cyclic RR-modules RmRm make this even simpler. Since every possible element can be expressed in the form amam for some element aa, every relation on RmRm can be expressed in the form am=0am=0 instead of ab=0a-b=0.

In fact, the elements aa that make am=0am=0 true are given a special name: the annihilator of mm.

Theorem: Given a cyclic RR-module RmRm, the elements aRa\in R such that am=0am=0 form an ideal of RR.
  • It is enough to prove that these elements aa are closed under subtraction and by multiplication with elements rRr\in R.
  • If am=0am=0 and bm=0bm=0, then (ab)m=ambm=00=0(a-b)m=am-bm=0-0=0, thus elements aa are closed under subtraction.
  • If am=0am=0, then for arbitrary rRr\in R we have (ra)m=r(am)=r0=0(ra)m=r(am)=r0=0.
  • Thus these elements aRa\in R where am=0am=0 form an ideal of RR.

So this ideal of RR, containing all elements aRa\in R where am=0am=0, is called the annihilator of mRmm\in Rm, and is written AnnR(m)\Ann_R(m). The idea is that the annihilator consists of all elements that send the generator mm (and therefore all elements) to zero. In fact:

Theorem: Every cyclic module RmRm is isomorphic to R/AnnR(m)R/Ann_R(m).
  • Let σ:RRm\sigma: R\to Rm be the map rrmr\mapsto rm.
  • By definition of cyclic module, the image of σ\sigma is all of RmRm, thus σ\sigma is surjective.
  • The kernel of σ\sigma consists of elements aa where am=0am=0, which is exactly AnnR(m)\Ann_R(m).
  • Then by the First Isomorphism Theorem we have R/ker σim σR/\ker\sigma\iso\im\sigma, which is R/AnnR(m)RmR/\Ann_R(m)\iso Rm.
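A concrete Z example, checked with a few lines of Python: inside M = Z/12Z, the element m = 4 generates the cyclic submodule Zm = {0, 4, 8}, and its annihilator is 3Z, so Zm ≅ Z/3Z.

```python
m, modulus = 4, 12

Zm = sorted({(a * m) % modulus for a in range(modulus)})
annihilator = [a for a in range(modulus) if (a * m) % modulus == 0]  # representatives mod 12

print(Zm)           # [0, 4, 8]     -- three elements, so Zm is isomorphic to Z/3Z
print(annihilator)  # [0, 3, 6, 9]  -- the multiples of 3, i.e. Ann_Z(m) = 3Z
```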

Thus we have that every quotient by a principal ideal R/(f)R/(f) is cyclic (with m=[1]m=[1]), and every cyclic module RmRm is isomorphic to the quotient R/AnnR(m)R/\Ann_R(m). If RR is a PID, then AnnR(m)\Ann_R(m) is a principal ideal as well, and we get an isomorphism between quotients by principal ideals R/(f)R/(f) and cyclic modules RmRm. The important part is: R/(f)R/(f) is either the free module RR (when f=0f=0) or a torsion module R/(f)R/(f) (when f\ne 0).

Recall the structure theorem of modules over a PID. Now that we know more about cyclic modules, we know that any module MM defined over a PID RR is isomorphic to a direct sum of free submodules R/(0)RR/(0)\iso R, and torsion cyclic submodules R/(fi)R/(f_i), which are ordered by divisibility: f1frf_1\mid\ldots\mid f_r for some order of the fif_i.

MRnfreeR/(f1)R/(f2)torsionM\iso\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(f_1)\oplus R/(f_2)\oplus\ldots}_{\text{torsion}}

We can go a bit further:

Theorem: Over a PID RR, every cyclic module RmRm is a direct sum of cyclic modules, each of which is either a free cyclic module R/(0)RR/(0)\iso R, or a torsion cyclic module R/(pk)R/(p^k) where pp is irreducible in RR and k>0k>0.
  • Since RR is a PID, AnnR(m)\Ann_R(m) (being an ideal) is a principal ideal (a)(a).
  • Recall that RmR/AnnR(m)Rm\iso R/\Ann_R(m).
  • If a=0a=0 then RmR/(0)RRm\iso R/(0)\iso R, meaning RmRm is a free cyclic module.
  • Otherwise, if aa is nonzero, note that RR is a PID and therefore a UFD. Then every nonzero aa has a unique factorization into primes p1k1p2k2pnknp_1^{k_1}p_2^{k_2}\ldots p_n^{k_n}.
  • So we have RmR/AnnR(m)R/(p1k1p2k2pnkn)Rm\iso R/\Ann_R(m)\iso R/(p_1^{k_1}p_2^{k_2}\ldots p_n^{k_n}) which factors by the Chinese Remainder Theorem into R/(p1k1)R/(p2k2)R/(pnkn)R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\ldots\oplus R/(p_n^{k_n}).
  • Therefore RmRm is either a free cyclic module, or factors into torsion cyclic modules where the annihilator is a prime power pkp^k.

Thus another way to decompose a finitely generated module over a PID is MRnfreeR/(p1k1)R/(p2k2)torsionM\iso\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\ldots}_{\text{torsion}} where these prime power factors pikip_i^{k_i} are known as elementary divisors.
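Passing between the two forms is just factoring: each invariant factor breaks into the prime powers that are its elementary divisors (and the Chinese Remainder Theorem reassembles them). A short sketch over R = Z with SymPy's factorint, using made-up invariant factors.

```python
from sympy import factorint

invariant_factors = [2, 12, 60]          # 2 | 12 | 60, a hypothetical Z-module
elementary_divisors = [p ** k
                       for f in invariant_factors
                       for p, k in factorint(f).items()]

print(sorted(elementary_divisors))       # [2, 3, 3, 4, 4, 5]
# i.e. Z/2 + Z/12 + Z/60 = Z/2 + (Z/4 + Z/3) + (Z/4 + Z/3 + Z/5)
```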

Thus there are two ways to decompose a module according to the structure theorem:
  • the invariant factor decomposition MRnR/(f1)R/(f2)M\iso R^n\oplus R/(f_1)\oplus R/(f_2)\oplus\ldots, where f1f2f_1\mid f_2\mid\ldots are the invariant factors, and
  • the primary (elementary divisor) decomposition MRnR/(p1k1)R/(p2k2)M\iso R^n\oplus R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\ldots, where the prime powers pikip_i^{k_i} are the elementary divisors.

The product of invariant factors must be equal to the product of the elementary divisors, since these describe the same module.


In this exploration, we discovered how RR-matrices can be used to fully describe all the homomorphisms between free RR-modules. As a bonus, we also learned how presentation matrices can be used to fully describe any RR-module, and further, broke down the structure of finitely generated modules over a PID.

Next time we’ll be applying this structure theorem to characterize completely all the homomorphisms between non-free modules.

