
Exploration 1: Modules and vector spaces

This entire exploration is actually going to be about matrices. Specifically, we’re going to talk about RR-matrices, which are matrices with entries in a ring RR. It turns out the study of RR-matrices subsumes the study of modules, which we will see by the end of this exploration.

What are modules? It is easiest to start with free modules RnR^n, which are additive abelian groups of RR-vectors, which are row or column vectors whose nn entries are in RR, where addition and scalar multiplication are defined in the usual way: [a1an]+[b1bn]=[a1+b1an+bn] and r[a1an]=[ra1ran] for rR\left[\begin{matrix}a_1\\\vdots\\a_n\end{matrix}\right]+\left[\begin{matrix}b_1\\\vdots\\b_n\end{matrix}\right]=\left[\begin{matrix}a_1+b_1\\\vdots\\a_n+b_n\end{matrix}\right]\quad\text{ and }\quad r\left[\begin{matrix}a_1\\\vdots\\a_n\end{matrix}\right]=\left[\begin{matrix}ra_1\\\vdots\\ra_n\end{matrix}\right]\text{ for }r\in R

These should be familiar from linear algebra, only that the entries of RR-vectors are elements in the ring RR. In this context, elements of RR are called scalars and are not in RnR^n themselves — only RR-vectors are in RnR^n. Since RnR^n is composed of RR-vectors, it is straightforward to describe n×mn\times m RR-matrices as maps RmRnR^m\to R^n, like in linear algebra: \left[\begin{matrix} r_{11}&r_{12}&\ldots&r_{1m}\\ r_{21}&r_{22}&\ldots&r_{2m}\\ \vdots&\vdots&\ddots&\vdots\\ r_{n1}&r_{n2}&\ldots&r_{nm} \end{matrix}\right]\left[\begin{matrix}v_1\\v_2\\\vdots\\v_m\end{matrix}\right]=\left[\begin{matrix}\sum_{i=1}^m v_ir_{1i}\\\sum_{i=1}^m v_ir_{2i}\\\vdots\\\sum_{i=1}^m v_ir_{ni}\end{matrix}\right] where the LHS vector is in RmR^m and the RHS vector is in RnR^n.
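To make the definitions concrete, here is a minimal Python sketch of R-vector arithmetic and of applying an R-matrix to an R-vector, taking R = Z so that plain integers stand in for ring elements (the function names are just for illustration).

```python
# A minimal sketch with R = Z: R-vectors are lists of ints, and an n-by-m
# R-matrix is a list of n rows of length m. Nothing below uses anything
# beyond +, * and 0 of the ring.

def add_vectors(u, v):
    """Componentwise addition of two R-vectors."""
    return [a + b for a, b in zip(u, v)]

def scale_vector(r, v):
    """Scalar multiplication r*v for r in R."""
    return [r * a for a in v]

def apply_matrix(A, v):
    """Apply an n-by-m R-matrix A to an R-vector v in R^m, giving a vector in R^n."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in A]

A = [[3, 2, -4],
     [-1, 4, 0],
     [1, 1, -1]]
print(apply_matrix(A, [1, 0, 2]))  # [-5, -1, -1]
```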

Theorem: Every ring RR is a free module over itself.

Every element in the ring can be treated as a scalar or a (one-dimensional) RR-vector, so scalar multiplication is given by multiplication in RR.

This theorem introduces some ambiguity: writing RR could either mean the ring RR, or the one-dimensional module over RR. For clarity, here we write RR when we mean the ring, and we always write RR-modules using the letters MM or NN or VV.

In this section, we describe RR-modules beyond free modules.

RR-modules are more general than free modules RnR^n, however. They are additive abelian groups of elements vv (not necessarily RR-vectors!), where elements rRr\in R come into play by defining a scalar multiplication rvr\cdot v governed by the following laws:
  • r(v+w)=rv+rw (scalar multiplication distributes over module addition)
  • (r+s)v=rv+sv (scalar multiplication distributes over ring addition)
  • (rs)v=r(sv) (compatibility with ring multiplication)
  • 1v=v (the ring identity acts as the identity)

Theorem: The Z\ZZ-modules are exactly the abelian groups.
  • (\to) Every RR-module is an additive abelian group by definition.
  • (\from) If you have an additive abelian group, you can define scalar multiplication n\cdot a as the sum of nn copies of aa when n≥0, and as −((−n)·a) when n<0. This satisfies the laws above, so we obtain a Z\ZZ-module.

Like with groups, rings, and fields, we can define a couple of concepts for RR-modules:
  • A submodule of an RR-module MM is a subset of MM that contains 00 and is closed under addition and scalar multiplication — it is an RR-module in its own right.
  • An RR-module homomorphism is a map σ:M→N between RR-modules that preserves addition and scalar multiplication: σ(u+v)=σ(u)+σ(v) and σ(rv)=rσ(v). As usual, a bijective homomorphism is an isomorphism.

In this section, we describe how to construct RR-modules.

Just like how groups and rings can be finitely generated, we can construct an RR-module MM out of a generating set of elements {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\}, known as a spanning set for MM. We can write M=span({v1,v2,,vn})M=\text{span}(\{v_1,v_2,\ldots,v_n\}), and say that MM consists of RR-linear combinations of its spanning set {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\}, i.e. MM is the set M=span({v1,v2,,vn})={irivi}M=\text{span}(\{v_1,v_2,\ldots,v_n\})=\left\{\sum_i r_iv_i\right\} where the coefficients rir_i are elements of RR, and each sum is finite (though the spanning set can be infinite).

No matter the spanning set, it is always possible to generate the zero element of the RR-module — just take the RR-linear combination where all the coefficients rir_i are zero. Of interest is whether this is the only way to generate the zero element. If there is another way to generate the zero element, i.e. using a nonzero coefficient somewhere, then we say that the spanning set is linearly dependent. Otherwise, there is only one way to generate the zero vector, and we have a linearly independent spanning set, also known as a basis.

If we consider a spanning set as a map RnMR^n\to M from nn-tuples of coefficients in RR to elements of the RR-module, then a spanning set that is a basis is special. This map is always surjective (the spanning set generates the module), and if the spanning set is a basis, then there is only one way to generate the zero element — thus the kernel is trivial, thus the map is injective and therefore bijective. This bijection between nn-tuples of coefficients and elements of an RR-module means that elements of the RR-module are essentially nn-tuples of coefficients, also known as RR-vectors. Sound familiar?

Theorem: A free module RnR^n is exactly an RR-module generated by a basis.
  • Every spanning set gives rise to a surjective map RnMR^n\to M from nn-tuples of coefficients to elements of the module.
  • If the spanning set is a basis, then only the zero tuple maps to the zero element, i.e. the map has a trivial kernel and is injective. Therefore a basis represents a bijection between nn-tuples of coefficients, i.e. RR-vectors, and elements of the module: [a1an]representsa1v1+a2v2++anvn\left[\begin{matrix}a_1\\\vdots\\a_n\end{matrix}\right]\quad\text{represents}\quad a_1v_1+a_2v_2+\ldots+a_nv_n
  • Thus the elements of the RR-module can be expressed as RR-vectors, which is the definition of a free module RnR^n.

For a free module RnR^n, the standard basis eie_i is defined as the columns of the n×nn\times n identity matrix InI_n: e1=[10],e2=[01],,en=[01]e_1=\left[\begin{matrix}1\\0\\\vdots\end{matrix}\right], e_2=\left[\begin{matrix}0\\1\\\vdots\end{matrix}\right],\ldots,e_n=\left[\begin{matrix}\vdots\\0\\1\end{matrix}\right]

Linear algebra is exactly the case where RR is a field FF. In fact, FF-modules are exactly FF-vector spaces. We will draw parallels to linear algebra extensively throughout this exploration.

Theorem: Every spanning set of an FF-module VV contains a basis, if FF is a field.
  • Let {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} be an arbitrary spanning set for VV. If this spanning set is linearly independent, it is a basis and we are done. Otherwise, assume that {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} is linearly dependent.
  • If {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} is linearly dependent, then zero can be represented by some FF-linear combination irivi=0\sum_i r_iv_i=0 with a nonzero coefficient. WLOG let r1r_1 be one of the nonzero coefficients.
  • Since we’re working in a field FF, r1r_1 has a multiplicative inverse 1r1\frac{1}{r_1}. Then the corresponding v1v_1 can be written as an FF-linear combination of the other elements: \begin{aligned} 0&=r_1v_1+r_2v_2+\ldots+r_nv_n\\ v_1&=-\frac{r_2}{r_1}v_2-\ldots-\frac{r_n}{r_1}v_n \end{aligned}
  • Since v1v_1 can be represented as an FF-linear combination of the others, its contribution to the spanning set is redundant. We can remove v1v_1 from the spanning set because {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} and {v2,,vn}\{v_2,\ldots,v_n\} generate the same module.
  • If the resulting spanning set {v2,,vn}\{v_2,\ldots,v_n\} is not a basis, we can repeat this process to further decrease the size of the spanning set. This process terminates either at a basis or at the empty set, which is trivially a basis (for the zero module). Thus the original spanning set {v1,v2,,vn}\{v_1,v_2,\ldots,v_n\} contains a basis for VV.
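The proof suggests a procedure: keep discarding redundant vectors until the spanning set is linearly independent. Here is a small sketch of that idea over the field Q using SymPy (assumed available); instead of removing one dependent vector at a time it greedily keeps only the vectors that enlarge the span, which likewise yields a basis contained in the original spanning set.

```python
# A sketch over F = Q with SymPy: extract a basis from a spanning set by
# keeping only the vectors that increase the rank of what we've kept so far.
from sympy import Matrix

def extract_basis(vectors):
    basis = []
    for v in vectors:
        candidate = basis + [v]
        # v is kept iff it is not an F-linear combination of the kept vectors
        if Matrix.hstack(*candidate).rank() == len(candidate):
            basis.append(v)
    return basis

spanning_set = [Matrix([1, 0, 1]), Matrix([2, 0, 2]),
                Matrix([0, 1, 1]), Matrix([1, 1, 2])]
print(len(extract_basis(spanning_set)))  # 2: the span is a plane in Q^3
```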

In this section, we start exploring RR-matrices for real.

At the beginning we mentioned that this entire exploration is going to be about studying RR-matrices. One big reason is because RR-matrices are homomorphisms of free RR-modules. For instance: A=[324140111]A=\left[\begin{matrix}3&2&-4\\-1&4&0\\1&1&-1\end{matrix}\right] is a 3×33\times 3 RR-matrix. Left multiplication by AA corresponds to a homomorphism φ:R3R3\varphi:R^3\to R^3. B=[241385881210342]B=\left[\begin{matrix}2&4&1&3&8\\5&8&8&1&2\\1&0&3&4&2\end{matrix}\right] is a 3×53\times 5 RR-matrix. Left multiplication by BB corresponds to a homomorphism φ:R5R3\varphi:R^5\to R^3. Typically we just refer to the RR-matrix itself as the homomorphism, falling back to φ\varphi when we’re talking about homomorphisms in the abstract.

Our first question is, which RR-matrices correspond to isomorphisms?

This translates to the question: when are RR-matrices invertible? This requires the concept of a determinant, which should be familiar from linear algebra. For n×nn\times n (square) RR-matrices, the determinant det(A)\det(A) is the unique function of the nn rows (a map (R^n)^n\to R) that is:
  • multilinear: linear in each row, when all the other rows are held fixed,
  • alternating: swapping two rows flips the sign, so a matrix with two equal rows has determinant 00, and
  • normalized: the identity matrix has determinant 11.

This unique function is det(A)=σSnsgn(σ)i=1nai,σ(i)\det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n a_{i,\sigma(i)}, which:
  • is multilinear, because each term of the sum contains exactly one entry from each row,
  • is alternating, because swapping two rows amounts to composing every σ\sigma with a transposition, which flips every sgn(σ)\sgn(\sigma), and
  • sends the identity matrix to 11, because the identity permutation is the only one that avoids the off-diagonal zero entries.
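As a sanity check, the permutation-sum formula can be transcribed directly into code; the sketch below compares it against SymPy's determinant (SymPy is assumed available only for the cross-check).

```python
# Directly transcribing det(A) = sum over sigma of sgn(sigma) * prod_i a_{i, sigma(i)}.
from itertools import permutations
from math import prod
from sympy import Matrix

def sgn(sigma):
    """Sign of a permutation in one-line notation: (-1)^(number of inversions)."""
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma))
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    n = len(A)
    return sum(sgn(sigma) * prod(A[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

A = [[3, 2, -4], [-1, 4, 0], [1, 1, -1]]
print(leibniz_det(A), Matrix(A).det())  # 6 6
```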

Although the above shows that the given function is a determinant, the proof that it is the unique function with these properties requires some more advanced tools we’ll introduce later. Let’s first determine when a square RR-matrix is invertible.

Therefore: We can write the determinant recursively as det(A)=j=1naijCij\det(A)=\sum_{j=1}^n a_{ij}C_{ij}, where Cij=(1)i+jdet(Mij)C_{ij}=(-1)^{i+j}\det(M_{ij}), for an arbitrary row index ii. This is known as the cofactor expansion of the determinant along the row ii.

Recall the definition of the determinant from above: \det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n a_{i,\sigma(i)} The determinant can be computed recursively. First, fix a row index ii and factor the single entry a_{i,\sigma(i)} out of each product: \det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)a_{i,\sigma(i)}\prod_{k=1,k\ne i}^n a_{k,\sigma(k)} Now note that since the sum goes over every permutation \sigma\in S_n, the value \sigma(i) ranges over every jj from 11 to nn, and for each jj there are exactly (n1)!(n-1)! permutations with \sigma(i)=j (one for each permutation of the remaining n1n-1 indices). Grouping the sum by the value j=\sigma(i) lets us factor a_{i,\sigma(i)} out as a_{ij}: \det(A)=\sum_{j=1}^n\left(a_{ij}\sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)}\right) Note that the inner sum \sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)} is very close to the formula for the determinant of the matrix that has row ii removed (since a_{i,\sigma(i)} is excluded from the product) and column jj removed (since \sigma sends ii to jj, the column jj only ever appears in the excluded entry a_{i,\sigma(i)}=a_{ij}).

Define the minor MijM_{ij} as the matrix AA with the row ii and column jj removed, and write bklb_{kl} for its entries. Applying our formula to this (n1)×(n1)(n-1)\times(n-1) matrix, \det(M_{ij})=\sum_{\tau\in S_{n-1}}\sgn(\tau)\prod_{k=1}^{n-1} b_{k,\tau(k)} Compare this to the inner sum above, \sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)} Each permutation σSn\sigma\in S_n with σ(i)=j\sigma(i)=j corresponds to exactly one τSn1\tau\in S_{n-1}: restrict σ\sigma to the remaining rows and columns and relabel the indices. Under this correspondence the products agree, \prod_{k\ne i} a_{k,\sigma(k)}=\prod_{k=1}^{n-1} b_{k,\tau(k)}, so only the signs can differ. To compare sgn(σ)\sgn(\sigma) with sgn(τ)\sgn(\tau), move the iith row to the top of the matrix, which takes i1i-1 transpositions, and the jjth column to the left, which takes j1j-1 transpositions; this makes all the rows and columns of the minor contiguous, and each transposition flips the sign, so \sgn(\sigma)=(-1)^{i+j-2}\sgn(\tau)=(-1)^{i+j}\sgn(\tau). Therefore \sum_{\sigma\in S_n,\sigma(i)=j}\sgn(\sigma)\prod_{k=1,k\ne i}^n a_{k,\sigma(k)}=(-1)^{i+j}\det(M_{ij}) The value Cij=(1)i+jdet(Mij)C_{ij}=(-1)^{i+j}\det(M_{ij}) is known as the cofactor CijC_{ij} of the matrix AA.

Finally, we can plug this back into our original expression for det(A)\det(A) to get: det(A)=j=1naijCij\det(A)=\sum_{j=1}^n a_{ij}C_{ij} for some fixed row index ii.
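The cofactor expansion translates directly into a recursive determinant. Here is a short Python sketch (fine for small matrices; like the permutation sum, it does factorial amounts of work), expanding along row i = 0.

```python
# Recursive determinant via cofactor expansion along the first row.

def minor(A, i, j):
    """The matrix A with row i and column j removed."""
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    # det(A) = sum_j a_{0j} * C_{0j}, where C_{0j} = (-1)^j * det(M_{0j})
    return sum(A[0][j] * (-1) ** j * det(minor(A, 0, j)) for j in range(n))

print(det([[3, 2, -4], [-1, 4, 0], [1, 1, -1]]))  # 6
```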

The cofactor expansion can be used to directly define the inverse of an RR-matrix.

Therefore: A1=adj(A)det(A)A^{-1}=\frac{\adj(A)}{\det(A)}, which only exists when det(A)\det(A) is a unit in the ring.

Recall the cofactor expansion: det(A)=j=1naijCij\det(A)=\sum_{j=1}^n a_{ij}C_{ij} Notice how similar it is to matrix multiplication: [AB]ik=j=1naijbjk[A\cdot B]_{ik}=\sum_{j=1}^n a_{ij}b_{jk} In fact, let adj(A)\adj(A) be the adjugate matrix of AA where each entry bijb_{ij} is equal to CjiC_{ji} (note the swapped indices). Then we have exactly [A\cdot\adj(A)]_{ik}=\sum_{j=1}^n a_{ij}b_{jk}=\sum_{j=1}^n a_{ij}C_{kj}=\begin{cases}\det(A)&\text{if }i=k\\0&\text{if }i\ne k\end{cases}=[\det(A)\cdot I]_{ik} The reason that j=1naijCkj\sum_{j=1}^n a_{ij}C_{kj} is zero when iki\ne k is as follows: copy row ii of AA into row kk, obtaining a matrix AA'. The cofactors CkjC_{kj} are unchanged (they are computed from minors with row kk removed), and row kk of AA' equals row ii of AA, so the sum is exactly the cofactor expansion of AA' along row kk, i.e. det(A)\det(A'). But AA' has two identical rows, so by the alternating property of determinants, det(A)=0\det(A')=0. Thus the sum is zero.

So we have proved the following: Aadj(A)=det(A)IA\cdot\adj(A)=\det(A)\cdot I Note that if det(A)\det(A) is a unit in the ring RR, then we have Aadj(A)det(A)=IA\cdot\frac{\adj(A)}{\det(A)}=I implying that adj(A)det(A)\frac{\adj(A)}{\det(A)} is the right inverse of AA. A similar argument shows that adj(A)det(A)\frac{\adj(A)}{\det(A)} is the left inverse of AA as well.
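Here is a quick SymPy check of the identity A·adj(A) = det(A)·I, plus the inverse of a Z-matrix whose determinant is a unit of Z (SymPy is assumed available; adjugate() is its name for the adjugate matrix).

```python
from sympy import Matrix, eye

A = Matrix([[3, 2, -4], [-1, 4, 0], [1, 1, -1]])
print(A * A.adjugate() == A.det() * eye(3))  # True, with det(A) = 6 (not a unit in Z)

B = Matrix([[2, 1], [1, 1]])                 # det(B) = 1, a unit in Z
B_inv = B.adjugate() / B.det()
print(B_inv, B * B_inv == eye(2))            # integer entries, True
```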

Corollary: An RR-matrix is invertible iff its determinant is a unit in RR.

If det(A)\det(A) is a unit, the adjugate formula above constructs an inverse, since you can divide by units in a ring. Conversely, if AA is invertible, then det(A)det(A1)=det(AA1)=det(I)=1\det(A)\det(A^{-1})=\det(AA^{-1})=\det(I)=1 (using the product rule for determinants proved below), so det(A)\det(A) must be a unit.

This should be familiar from linear algebra, where an FF-matrix is invertible iff its determinant is nonzero. Of course, this is exactly because the only nonunit in a field FF is zero. For an example that isn’t a field, note that the units of Z\ZZ are ±1\pm 1. Thus a Z\ZZ-matrix is invertible iff its determinant is ±1\pm 1.

For us, invertible matrices are significant because, being reversible, their corresponding homomorphism is an isomorphism. More importantly, this means multiplying an RR-matrix AA by an invertible matrix PP is the same as composing an isomorphism τ\tau to the RR-module homomorphism σ\sigma corresponding to AA. Right-multiplying by PP is precomposition by τ\tau, and left-multiplying by PP is postcomposition by τ\tau. As we know, isomorphisms are essentially renamings, so composing a homomorphism by isomorphisms gives you essentially the same homomorphism.

So we have proved a formula for the inverse of an RR-matrix, and a criterion for when it exists. To build on this, let’s introduce the row and column operations that we will use to manipulate RR-matrices for the rest of this exploration.

We introduce the three row and column operations.

First, we may swap two rows of a matrix by left-multiplying by a suitable elementary matrix. [100001010][111222333]=[111333222]{\color{blue}\left[\begin{matrix}1&0&0\\0&0&1\\0&1&0\end{matrix}\right]} \left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] =\left[\begin{matrix}1&1&1\\3&3&3\\2&2&2\end{matrix}\right]

Second, we may multiply any row by a unit (here, assume 1010 is a unit) by left-multiplying by another kind of elementary matrix. [1000010001][111222333]=[101010222333]{\color{blue}\left[\begin{matrix}10&0&0\\0&1&0\\0&0&1\end{matrix}\right]} \left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] =\left[\begin{matrix}10&10&10\\2&2&2\\3&3&3\end{matrix}\right]

Third, we may add a multiple of any row to another row, again by left-multiplying by a third kind of elementary matrix. [1000101001][111222333]=[111222131313]{\color{blue}\left[\begin{matrix}1&0&0\\0&1&0\\10&0&1\end{matrix}\right]} \left[\begin{matrix}1&1&1\\2&2&2\\3&3&3\end{matrix}\right] =\left[\begin{matrix}1&1&1\\2&2&2\\13&13&13\end{matrix}\right]

Column operations use the same (transposed) elementary matrices, except you right-multiply instead of left-multiply.

These operations can each be undone by another operation of the same kind – you can un-swap by swapping again, un-scale by scaling by the inverse unit, and un-add the row cricr_i by adding cri-cr_i. This implies that the elementary matrices are invertible. In fact, the inverse of an elementary matrix represents the inverse operation.
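A small SymPy sketch of the three kinds of elementary matrices, reproducing the examples above and checking that the inverse of an elementary matrix performs the inverse operation (the helper names are just for illustration).

```python
from sympy import Matrix, eye

def swap(n, i, j):
    E = eye(n); E[i, i] = E[j, j] = 0; E[i, j] = E[j, i] = 1
    return E

def scale(n, i, u):            # u must be a unit of the ring
    E = eye(n); E[i, i] = u
    return E

def add_multiple(n, i, j, c):  # adds c * (row j) to row i
    E = eye(n); E[i, j] = c
    return E

A = Matrix([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(swap(3, 1, 2) * A)                     # rows 2 and 3 exchanged
print(scale(3, 0, 10) * A)                   # first row scaled by 10
print(add_multiple(3, 2, 0, 10) * A)         # last row becomes [13, 13, 13]
print(add_multiple(3, 2, 0, 10).inv() == add_multiple(3, 2, 0, -10))  # True
```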

When you apply row and column operations to a matrix, you’re essentially doing a clever renaming of the domain (column operations) and codomain (row operations). The resulting matrix represents the same homomorphism but on the renamed domain/codomain.

How do the row and column operations affect the determinant? We can give a short proof of each:

Theorem: Swapping two rows flips the sign of the determinant.

This is exactly the alternating property of determinants.

Theorem: Scaling a row by a unit will scale the determinant by the same unit.

This follows immediately from the multilinear property of determinants.

Theorem: Adding a multiple of a row to another row does not change the determinant.

If row rir_i becomes row ri+crjr_i+cr_j, then by multilinearity, det(matrix where ri=ri+crj)=det(original matrix where ri=ri)+cdet(matrix where ri=rj)\begin{aligned} &\det(\text{matrix where }r_i=r_i+cr_j)\\ &=\det(\text{original matrix where }r_i=r_i)\\ &+c\cdot\det(\text{matrix where }r_i=r_j) \end{aligned} But the last matrix has two identical rows rir_i and rjr_j, so by the alternating property its determinant is zero. Therefore the resulting matrix has determinant equal to that of the original matrix.

Corollary: Row and column operations multiply the determinant by a unit.

Follows immediately from the above three theorems, which show that the three row and column operations multiply the determinant by 1,c,1-1,c,1 respectively (cc a unit).

Theorem: The determinant of a product of matrices det(AB)\det(AB) is the product of their individual determinants det(A)det(B)\det(A)\det(B).

Using the formula for determinant for ABAB, we have det(AB)=σSnsgn(σ)i=1n(AB)i,σ(i)=σSnsgn(σ)i=1nk=1naikbk,σ(i)\begin{aligned} \det(AB)&=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n(AB)_{i,\sigma(i)}\\ &=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n\sum_{k=1}^n a_{ik}b_{k,\sigma(i)} \end{aligned} Summing over the index kk from 11 to nn can be done in any order since addition is commutative. Thus we can add them in an arbitrary order. Since we’re multiplying together a sum ik\prod_i\sum_k, by the distributive property the result is essentially adding together all the ways of picking one term from each sum and multiplying them together, just like how (a+b)(c+d)=ac+ad+bc+bd(a+b)(c+d)=ac+ad+bc+bd. This is the same as adding together all the ways of picking one kk for each ii, so we can express this as the sum of using all permutations taking ii to kk: det(AB)=σSnsgn(σ)τSni=1nai,τ(i)bτ(i),σ(i)=τSnσSnsgn(σ)(i=1nai,τ(i))(i=1nbτ(i),σ(i))=τSn(i=1nai,τ(i))(σSnsgn(σ)i=1nbτ(i),σ(i))\begin{aligned} \det(AB)&=\sum_{\sigma\in S_n}\sgn(\sigma)\sum_{\tau\in S_n}\prod_{i=1}^n a_{i,\tau(i)}b_{\tau(i),\sigma(i)}\\ &=\sum_{\tau\in S_n}\sum_{\sigma\in S_n}\sgn(\sigma)\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\prod_{i=1}^n b_{\tau(i),\sigma(i)}\right)\\ &=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n b_{\tau(i),\sigma(i)}\right) \end{aligned} We can reorder the last product by applying τ1\tau^{-1} to the indices ii: det(AB)=τSn(i=1nai,τ(i))(σSnsgn(σ)i=1nbi,σ(τ1(i)))\begin{aligned} \det(AB)&=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n b_{i,\sigma(\tau^{-1}(i))}\right) \end{aligned} Such a reordering is equivalent to multiplying by the signature of τ\tau: det(AB)=τSn(i=1nai,τ(i))(σSnsgn(σ)sgn(τ)i=1nbi,σ(i))=(τSnsgn(τ)i=1nai,τ(i))(σSnsgn(σ)i=1nbi,σ(i))=det(A)det(B)\begin{aligned} \det(AB)&=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\sgn(\tau)\prod_{i=1}^n b_{i,\sigma(i)}\right)\\ &=\left(\sum_{\tau\in S_n}\sgn(\tau)\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n b_{i,\sigma(i)}\right)\\ &=\det(A)\det(B) \end{aligned}

In this section, we show how row and column operations can be used to simplify a matrix.

Therefore: If RR is a Bézout domain, then any RR-matrix AA can be reduced to Smith normal form, a diagonal matrix where each nonzero entry divides the next.

Using row and column operations, you can always reduce an RR-matrix down into the block form [D000]\left[\begin{matrix}D&0\\0&0\end{matrix}\right], where DD is a diagonal RR-matrix, i.e. all its nonzero entries are on the main diagonal.

There is a way to get DD into a form where the diagonal elements d1,d2,d_1,d_2,\ldots divide each other in order: d1d2d3d_1\mid d_2\mid d_3\mid\ldots. This form is called Smith normal form. Any RR-matrix can be reduced to Smith normal form, provided RR is a Bézout domain, which is an integral domain in which every sum of principal ideals is principal (equivalently, every finitely generated ideal is principal). For instance, in PIDs every ideal is principal, so PIDs are Bézout domains.

Theorem: Bézout domains define a GCD d=gcd(a,b,)d=\gcd(a,b,\ldots) and have Bézout’s identity: for every {a,b,}\{a,b,\ldots\}, there exist {x,y,}\{x,y,\ldots\} where ax+yb+=gcd(a,b,)ax+yb+\ldots=\gcd(a,b,\ldots).
  • The fact that every sum of principal ideals is principal (a)+(b)+=(d)(a)+(b)+\ldots=(d) implies two things:
    • First, since (d)(d) contains each of {(a),(b),}\{(a),(b),\ldots\}, we know that dd is a common divisor of {a,b,}\{a,b,\ldots\}.
    • Second, (a)+(b)+=(d)(a)+(b)+\ldots=(d) represents all the linear combinations of {a,b,}\{a,b,\ldots\}. Since a common divisor of {a,b,}\{a,b,\ldots\} must divide all linear combinations of those elements =(d)=(d), every common divisor must divide dd in particular.
  • From this, we can conclude two things:
    • Since every common divisor divides dd, it is in fact the greatest common divisor gcd(a,b,)\gcd(a,b,\ldots).
    • gcd(a,b,)\gcd(a,b,\ldots) is some linear combination of {a,b,}\{a,b,\ldots\}, i.e. we have Bézout’s identity ax+yb+=gcd(a,b,)ax+yb+\ldots=\gcd(a,b,\ldots) for some {x,y,}\{x,y,\ldots\}.

Method: To put a matrix into Smith normal form, choose d1d_1 to be the GCD of all its entries. Then use row/column operations to isolate d1d_1 in the upper left such that its row and column are zero except for d1d_1, getting a block matrix [d100[B]]\left[\begin{matrix} d_1&\begin{matrix}0&\ldots\end{matrix}\\ \begin{matrix}0\\\vdots\end{matrix}&\left[\begin{matrix}&&\\&B&\\&&\end{matrix}\right] \end{matrix}\right] where BB is the minor M11M_{11} where every entry is divisible by d1d_1 because of how we picked d1d_1. Because of Bézout’s identity, it’s always possible to isolate the GCD using row and column operations this way. Repeating the process for the minor BB gives you Smith normal form. Here’s a short example:

\begin{aligned} &~\left[\begin{matrix}2&-1\\1&2\end{matrix}\right]\\ =&~\left[\begin{matrix}1&-1\\3&2\end{matrix}\right]{\color{red}\left[\begin{matrix}1&0\\1&1\end{matrix}\right]^{-1}}&\text{ add col }2\text{ to col }1\\ =&~\left[\begin{matrix}1&0\\3&5\end{matrix}\right]{\color{red}\left(\left[\begin{matrix}1&0\\1&1\end{matrix}\right]\left[\begin{matrix}1&1\\0&1\end{matrix}\right]\right)^{-1}}&\text{ add col }1\text{ to col }2\\ =&~\left[\begin{matrix}1&0\\3&5\end{matrix}\right]{\color{red}\left[\begin{matrix}1&1\\1&2\end{matrix}\right]^{-1}}\\ =&~{\color{blue}\left[\begin{matrix}1&0\\-3&1\end{matrix}\right]^{-1}}\left[\begin{matrix}1&0\\0&5\end{matrix}\right]{\color{red}\left[\begin{matrix}1&1\\1&2\end{matrix}\right]^{-1}}&\text{ subtract }3(\text{row }1)\text{ from row }2\\ \end{aligned}

Taking the Smith normal form of a matrix AA can be written A=UAVA={\color{blue}U}A'{\color{red}V}, where U{\color{blue}U} and V{\color{red}V} are invertible matrices (products of elementary matrices and their inverses) recording the row and column operations used to reduce AA to the diagonal matrix AA'. The entries did_i along the diagonal of AA' are called the invariant factors of AA, because you will always arrive at these factors in the Smith normal form regardless of how you do the row and column operations.
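For what it's worth, recent versions of SymPy can compute Smith normal forms over Z directly; treating the exact import location and signature as an assumption, the 2×2 example above comes out as expected.

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form  # location may vary by SymPy version

A = Matrix([[2, -1], [1, 2]])
print(smith_normal_form(A, domain=ZZ))  # Matrix([[1, 0], [0, 5]]), matching the example
```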

Theorem: The invariant factors of an RR-matrix AA are unique up to units.

At each step, the invariant factor did_i is chosen as the GCD of the remaining elements, and since GCDs are unique up to units, the invariant factors are unique up to units.

Theorem: The determinant of a diagonal RR-matrix is equal to the product of its entries.

In the definition of determinant, det(A)=σSnsgn(σ)i=1nai,σ(i)\det(A)=\sum_{\sigma\in S_n}\sgn(\sigma)\prod_{i=1}^n a_{i,\sigma(i)} you can see that unless σ(i)=i\sigma(i)=i for each ii, the product will include a zero term that makes the whole product zero. This is because aija_{ij} for iji\ne j is zero in a diagonal matrix. Thus the only σ\sigma that results in a nonzero term is the identity permutation, and so for diagonal matrices the formula simplifies down to det(A)=i=1naii\det(A)=\prod_{i=1}^n a_{ii}

Corollary: The determinant of an RR-matrix is equal to the product of its invariant factors, up to units.

This follows immediately from the fact that the determinant of a matrix equals (up to units) the determinant of its Smith normal form, which is equal to the product of the diagonal entries, which are the invariant factors.

Theorem: The invariant factors of a matrix AA are all ones iff AA is invertible.

If AA is invertible, then its determinant is a unit. Since the determinant is the product of the invariant factors up to units, every invariant factor must then be a unit, and therefore associate to 11. Conversely, if every invariant factor is a unit (associate to 11), then the determinant is a unit, so AA is invertible.

Corollary: The Smith normal form of an invertible matrix is the identity matrix.

Corollary: Every invertible matrix over a Bézout domain RR factors into elementary matrices.

For matrices over a Bézout domain, the Smith normal form is obtained via row and column operations, a process that only factors out corresponding elementary matrices. By the previous theorem, the resulting Smith normal form of an invertible matrix is the identity matrix, which is also an elementary matrix. Thus the original matrix factors into elementary matrices.

In this section, we abstractly represent relationships between RR-modules via RR-matrices.

We already know that an n×mn\times m RR-matrix MM defines a homomorphism RmRnR^m\to R^n. Let’s bring back our example matrix AA and explore some properties of this homomorphism: A=[324140111]A=\left[\begin{matrix}3&2&-4\\-1&4&0\\1&1&-1\end{matrix}\right] First, the columns of an RR-matrix MM generate its image (also known as column space), which is written im M=MRm=span(columns of M)\im M=MR^m=\span(\text{columns of }M) For instance with AA above, im A=AR3=span([311],[241],[401])\im A=AR^3=\span\left( \left[\begin{matrix}3\\-1\\1\end{matrix}\right], \left[\begin{matrix}2\\4\\1\end{matrix}\right], \left[\begin{matrix}-4\\0\\-1\end{matrix}\right]\right) Second, the solutions to AX=0AX=0 comprise the kernel (also known as null space) of AA. The kernel of a diagonal matrix DD is relatively straightforward. The equation [d1d2][x1x2]=0\left[\begin{matrix}d_1&&\\&d_2&\\&&\ddots\end{matrix}\right] \left[\begin{matrix}x_1\\x_2\\\vdots\end{matrix}\right]=0 is equivalent to the system of equations d1x1=0d2x2=0\begin{aligned} d_1x_1&=0\\ d_2x_2&=0\\ &\ldots \end{aligned} In an integral domain, xix_i is zero when d_i\ne 0, and xix_i can take on any value when di=0d_i=0. This means that the kernel of a diagonal matrix is span(ei,ej,)\span(e_i,e_j,\ldots) where {i,j,}\{i,j,\ldots\} are the indices ii where di=0d_i=0. In particular, if there are no zeroes along the main diagonal of a diagonal matrix, then its kernel is trivial.

Theorem: An RR-matrix defines an injective RR-module homomorphism iff its kernel is trivial.
  • (\to) If A:RmRnA:R^m\to R^n is injective, then AX=AYAX=AY implies X=YX=Y. In particular, elements XX of the kernel, where AX=0AX=0, must be zero because AX=0    AX=A0    X=0AX=0\implies AX=A0\implies X=0.
  • (\from) If A:RmRnA:R^m\to R^n has a trivial kernel, then consider AX=AYAX=AY, which implies AXAY=0AX-AY=0 and A(XY)=0A(X-Y)=0. Since the kernel is trivial, we have XY=0X-Y=0 and thus X=YX=Y, showing that AA is injective.

Theorem: An RR-matrix defines a surjective RR-module homomorphism iff its image is the codomain.

By definition. Since surjective means the whole codomain is mapped to, it’s the same as saying that the image is equal to the codomain.

Corollary: Since the columns of A:RmRnA:R^m\to R^n generate im A\im A by definition, if AA is surjective, then its columns generate RnR^n.

Theorem: The kernel of an RR-matrix AA over a Bézout domain RR can be read off from its Smith normal form: writing A=QAP1A=QA'P^{-1} with QQ and PP invertible, we have ker A=P(ker A)\ker A=P(\ker A'), and in particular ker A\ker A is trivial iff ker A\ker A' is trivial.

For example, let’s take the Smith normal form of AA: A=QAP1=[322112211][100020003][102314111]A=QA'P^{-1}= \left[\begin{matrix}-3&-2&2\\-1&-1&2\\-2&-1&1\end{matrix}\right] \left[\begin{matrix}1&0&0\\0&2&0\\0&0&3\end{matrix}\right] \left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right] and then solve QAP1X=0QA'P^{-1}X=0 with these steps: QAP1X= 0[322112211][100020003][102314111]X= 0[100020003]([102314111]X)= 0 left-multiply by Q1[102314111]X=[000] solve for P1XX=[102314111]1[000] left-multiply by PX=[000]\begin{aligned} QA'P^{-1}X=&~0\\ \left[\begin{matrix}-3&-2&2\\-1&-1&2\\-2&-1&1\end{matrix}\right] \left[\begin{matrix}1&0&0\\0&2&0\\0&0&3\end{matrix}\right] \left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]X=&~0\\ \left[\begin{matrix}1&0&0\\0&2&0\\0&0&3\end{matrix}\right] \left(\left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]X\right)=&~0&\text{ left-multiply by }Q^{-1}\\ \left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]X=&\left[\begin{matrix}0\\0\\0\end{matrix}\right]&\text{ solve for }P^{-1}X\\ X=\left[\begin{matrix}1&0&-2\\-3&1&4\\-1&1&1\end{matrix}\right]^{-1}&\left[\begin{matrix}0\\0\\0\end{matrix}\right]&\text{ left-multiply by }P\\ X=&\left[\begin{matrix}0\\0\\0\end{matrix}\right]\\ \end{aligned} In this case, AA has a trivial kernel {0}\{0\}. You can see that this is precisely because its Smith normal form has no zero entries along the diagonal. We’ll be talking about the kernel in the abstract, defined as the solutions to AX=0AX=0. We can see that the kernel is easily calculable when RR is a Bézout domain, but it may not be as easy for other rings.
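As a quick cross-check with SymPy: since the diagonal entries 1, 2, 3 are all nonzero, the kernel should be trivial, and indeed the rational nullspace of A is empty (a nonzero integer kernel vector would in particular be a nonzero rational one).

```python
from sympy import Matrix

A = Matrix([[3, 2, -4], [-1, 4, 0], [1, 1, -1]])
print(A.nullspace())  # []  -- only X = 0 solves AX = 0
```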

Theorem: The kernel and image of a homomorphism σ:MN\sigma:M\to N is a submodule of the domain MM and codomain NN respectively.
  • Contains 00: σ\sigma must send 00 to 00.
  • Closed under addition: because σ(m)+σ(n)=σ(m+n)\sigma(m)+\sigma(n)=\sigma(m+n), if both mm and nn are in the kernel/image so is m+nm+n.
  • Closed under scalar multiplication: because rσ(m)=σ(rm)r\sigma(m)=\sigma(rm), if mm is in the kernel/image so is rmrm.

Like with groups, an RR-module is simple if it has no proper non-trivial submodules.

Theorem: Any homomorphism σ:MN\sigma:M\to N between simple modules MM and NN is either trivial or an isomorphism.

Since kernel and image must both be submodules of MM and NN respectively, and simple modules have only two possible choices for submodules (trivial submodule and the module itself), the only possible homomorphisms are when the kernel is trivial (implying that the image is NN, therefore an isomorphism) and when the kernel is MM (implying that the image is trivial, therefore the trivial homomorphism).

In this section, we describe properties of RR-modules with only diagrams.

Here we’re going to represent RR-matrices abstractly as homomorphisms σ\sigma on RR-modules.

Consider two homomorphisms σ:AB,τ:BC\sigma:A\to B,\tau:B\to C where the image of σ\sigma is the kernel of τ\tau. In other words, everything σ\sigma maps to will get mapped to 00 by τ\tau. This property is called exactness, and we say this sequence is exact at BB, and any sequence of homomorphisms AσBτCυA\xrightarrow{\sigma}B\xrightarrow{\tau}C\xrightarrow{\upsilon}\ldots with the property that “the image of one homomorphism is the kernel of the next” is called an exact sequence.

In fact, we have discussed the concept of exact sequences before for groups. Exact sequences for groups and exact sequences for RR-modules are precisely the same concept, but since RR-modules have a richer structure, we may use exact sequences in ways that we cannot do for groups.

To recap, we showed the following for exact sequences of groups, and they are true for RR-modules as well:
  • Exactness at AA in {0}AσB\{0\}\to A\xrightarrow{\sigma}B means σ\sigma is injective, and exactness at BB in AσB{0}A\xrightarrow{\sigma}B\to\{0\} means σ\sigma is surjective.
  • A short exact sequence {0}ABC{0}\{0\}\to A\to B\to C\to\{0\} expresses BB as an extension of CC by AA, and by the splitting lemma, if the sequence splits (the surjection has a section, i.e. a right inverse), then BACB\iso A\oplus C.

For an example of how exact sequences are used, we can look at projections. A projection is an idempotent endomorphism, i.e. a homomorphism π:MM\pi:M\to M such that π2=π\pi^2=\pi.

Theorem: A module MM can be decomposed into the direct sum M=ker πim πM=\ker\pi\oplus\im\pi iff a projection π:MM\pi:M\to M exists.
  • (\from) Any direct sum ABA\oplus B has a canonical left and right projection, so the forward direction is trivial.
  • (\to) For any homomorphism like π\pi, we can always write the short exact sequence {0}ker πιMπim π{0}\{0\}\to\ker\pi\xrightarrow{\iota}M\xrightarrow{\pi}\im\pi\to\{0\} because the kernel of π\pi is exactly the image of the injective inclusion map ι:ker πM\iota:\ker\pi\to M, and π\pi is surjective onto its image im π\im\pi.
  • Since π\pi is idempotent, it acts as the identity on its image: for m=π(x)im πm=\pi(x)\in\im\pi we have π(m)=π(π(x))=π(x)=m\pi(m)=\pi(\pi(x))=\pi(x)=m. So the inclusion im πM\im\pi\to M is a section (right inverse) of π:Mim π\pi:M\to\im\pi.
  • By the splitting lemma, since π:Mim π\pi:M\to\im\pi has a section, the sequence splits and therefore M=ker πim πM=\ker\pi\oplus\im\pi.
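A tiny numeric illustration of the theorem over Q with SymPy: an idempotent matrix, whose kernel and image together fill up the whole space.

```python
from sympy import Matrix

pi = Matrix([[1, 1], [0, 0]])
assert pi * pi == pi                  # pi is a projection

kernel = pi.nullspace()               # spanned by (-1, 1)
image = pi.columnspace()              # spanned by (1, 0)
print(Matrix.hstack(*(kernel + image)).rank())  # 2: ker(pi) + im(pi) = Q^2
```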

In this section, we utilize exact sequences in the context of RR-modules specifically.

Take this exact sequence, for instance: RmARnσM{0}R^m\xrightarrow{A}R^n\xrightarrow{\sigma}M\to\{0\} This exact sequence is known as a presentation of the RR-module MM. The idea is that MM is entirely described by the presentation, and the presentation is entirely described by the RR-matrix AA, called the presentation matrix. In other words, it only takes a single RR-matrix, AA, to define all you need to know about an RR-module MM. Let’s see why this is.

Theorem: In the above exact sequence, MM is isomorphic to Rn/im AR^n/\im A — that is, to RnR^n with the columns of AA (and all their RR-linear combinations) sent to 00.
  • By the first isomorphism theorem on σ\sigma, we have Rn/ker σim σR^n/\ker\sigma\iso\im\sigma. However:
    • The exact sequence RnM{0}R^n\to M\to\{0\} implies that the homomorphism σ:RnM\sigma:R^n\to M is surjective, and thus MM is isomorphic to the image of σ\sigma. Thus im σM\im\sigma\iso M.
    • By exactness, im A=ker σ\im A=\ker\sigma.
  • Then we have Rn/im AMR^n/\im A\iso M. This is the same as saying MM is RnR^n, except we send all elements that are linear combinations of the columns of AA to 00 (AX=0AX=0).

In the context of presentations, the images in MM of the basis elements of RnR^n (the codomain of AA) are known as generators of MM, and the columns of AA describe relations on those generators: a column with entries a1,,ana_1,\ldots,a_n says that a1v1++anvn=0a_1v_1+\ldots+a_nv_n=0. So the RR-matrix AA, by virtue of defining both the generators of MM and the relations on those generators, completely determines the RR-module MM. Because every RR-matrix AA defines an RR-module this way, we say that MRn/im AM\iso R^n/\im A is the cokernel of AA. It is so named because there is a duality between the kernel and the cokernel. The kernel is the part of the domain sent to zero by AA, and a trivial kernel implies injectivity of AA; the cokernel is the codomain modulo the image of AA — whatever part of the codomain is not reached by AA — and a trivial cokernel implies surjectivity of AA.

Theorem: An RR-matrix defines a surjective homomorphism iff its cokernel is trivial.

Surjectivity of A:RmRnA:R^m\to R^n means im A=Rn\im A=R^n. But the cokernel Rn/im AR^n/\im A is trivial if and only if im A=Rn\im A=R^n.

Hopefully this makes clear the reason we only study RR-matrices in this exploration — because every RR-module can be defined as the cokernel of a suitable RR-matrix, studying RR-matrices completely subsumes the need to study RR-modules directly!

In this section, we describe how to derive the RR-matrix that presents a given RR-module.

Let’s do the reverse. How do you construct the presentation matrix AA that presents a given (finitely generated) RR-module MM? Here is a step-by-step:
  • Choose a finite set of generators v1,v2,,vnv_1,v_2,\ldots,v_n of MM.
  • Find the relations among the generators, i.e. enough RR-linear combinations r1v1++rnvn=0r_1v_1+\ldots+r_nv_n=0 to imply every relation that holds in MM.
  • Write each relation as a column with entries r1,,rnr_1,\ldots,r_n; these columns side by side form the presentation matrix AA.

(Note that a non-finitely generated RR-module will ask for infinitely many viv_i, so describing that class of RR-modules would require a different process.)

Example:

Let MM be a Z\ZZ-module generated by (v1,v2,v3)(v₁,v₂,v₃) under the relations 3v1+2v2+v3=08v1+4v2+2v3=07v1+6v2+2v3=09v1+6v2+v3=0\begin{aligned} 3v_1+2v_2+v_3&=0\\ 8v_1+4v_2+2v_3&=0\\ 7v_1+6v_2+2v_3&=0\\ 9v_1+6v_2+v_3&=0 \end{aligned} Express each relation as a column of the presentation matrix AA: A=[387924661221]A=\left[\begin{matrix}3&8&7&9\\2&4&6&6\\1&2&2&1\end{matrix}\right] so that the above system of relations can be summarized as [v1v2v3]A=[0000]\left[\begin{matrix}v_1&v_2&v_3\end{matrix}\right]A=\left[\begin{matrix}0&0&0&0\end{matrix}\right] And theoretically we are done, since AA presents MM.

However, we can reduce this matrix down a bit. The following simplifying operations also do not change the isomorphism class of the module presented by AA:
  • Row and column operations (they amount to changing the generating set and rewriting the relations, i.e. multiplying AA by invertible matrices).
  • Removing a column of zeroes (it is the trivial relation 0=00=0).
  • If some column is eie_i, removing that column together with row ii (the relation says the iith generator is zero, so both can be dropped).

Reducing our matrix above:

\begin{aligned} &~\left[\begin{matrix}3&8&7&9\\2&4&6&6\\1&2&2&1\end{matrix}\right]\\ =&~\left[\begin{matrix}0&2&1&6\\0&0&2&4\\1&2&2&1\end{matrix}\right]&\begin{matrix}r_1-3r_3\to r_1\\r_2-2r_3\to r_2\end{matrix}\\ =&~\left[\begin{matrix}2&1&6\\0&2&4\end{matrix}\right]&\text{remove }c_1,r_3\\ =&~\left[\begin{matrix}2&1&6\\-4&0&-8\end{matrix}\right]&r_2-2r_1\to r_2\\ =&~\left[\begin{matrix}-4&-8\end{matrix}\right]&\text{remove }c_2,r_1\\ =&~\left[\begin{matrix}-4&0\end{matrix}\right]&-2c_1+c_2\to c_2\\ =&~\left[\begin{matrix}-4\end{matrix}\right]&\text{remove }c_2\\ =&~\left[\begin{matrix}4\end{matrix}\right]&-1c_1\to c_1\\ \end{aligned}

Since the image of [4]:ZZ\left[\begin{matrix}4\end{matrix}\right]:\ZZ\to\ZZ is 4Z4\ZZ, the big 3×43\times 4 matrix above actually just presents the module Z/4Z\ZZ/4\ZZ! Thus M=Z/4ZM=\ZZ/4\ZZ.
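The same conclusion can be reached by handing the presentation matrix to SymPy's Smith normal form (same caveat as before about the exact API): the cokernel is read off the diagonal as a direct sum of Z/d_i, with Z/1 = 0.

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form  # location may vary by SymPy version

A = Matrix([[3, 8, 7, 9],
            [2, 4, 6, 6],
            [1, 2, 2, 1]])
print(smith_normal_form(A, domain=ZZ))
# diagonal entries 1, 1, 4 (up to units), so the presented module is
# Z/1 + Z/1 + Z/4 = Z/4Z, agreeing with the reduction above.
```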

We’ve seen how it’s done for a specific finitely generated RR-module, and this approach generalizes for all finitely generated RR-modules. Thus we can say that every finitely generated RR-module MM can be presented by an RR-matrix. But how do you tell when MM is finitely generated in the first place?

In this section, we learn the conditions under which an RR-module is finitely generated.

To see whether an RR-module MM is finitely generated (and thus presentable), let’s try building one.

Start with the zero submodule W0={0}W_0=\{0\} of MM. At every step, add an element viMv_i\in M outside of Wi1W_{i-1} to get WiW_i, generated by viv_i and the old Wi1W_{i-1}.

The fact of the matter is, if this process always stops — that is, at some point there are no more elements vv of MM outside of WiW_i to add — then MM (and, as we’ll see, every submodule of MM) is finitely generated. One way to characterize this is known as the ascending chain condition (ACC) on submodules: “There is no infinite strictly increasing chain W1<W2<W_1<W_2<\ldots of submodules of MM.” Satisfying the ACC is the same as saying that the process always stops, no matter how the elements viv_i are chosen.

Theorem: Every submodule WMW\le M is finitely generated iff it satisfies the ACC.
  • (\to) Towards contradiction, say every submodule of MM is finitely generated, yet there is an infinite strictly increasing chain W1<W2<W_1<W_2<\ldots of submodules of MM. Let UU be the union of the chain; UU is a submodule of MM, so by assumption it is generated by finitely many elements. Each generator lies in some WiW_i, so all of them lie in a single WNW_N (take the largest index that appears). But then U=WNU=W_N, so the chain cannot grow past WNW_N: WN=WN+1=W_N=W_{N+1}=\ldots, contradicting strict increase.
  • (\from) Assume the ACC, and let WMW\le M be any submodule. Build a chain inside WW: let W0={0}W_0=\{0\}, and given WiWW_i\le W, pick any element wi+1w_{i+1} of WW outside WiW_i and let Wi+1W_{i+1} be generated by WiW_i together with wi+1w_{i+1}. This chain is strictly increasing, so by the ACC it cannot continue forever: at some point there is no element of WW left to pick, which means Wi=WW_i=W. Then WW is generated by the finitely many elements w1,,wiw_1,\ldots,w_i, so it is finitely generated.

Recall that integral domains that satisfy the ascending chain condition on principal ideals (ACCP) are exactly the factorization domains. We can say something similar here. A Noetherian RR-module is one where every submodule is finitely generated, or equivalently (by the above proof) one that satisfies the ACC. As we will see, every finitely generated module over a Noetherian ring can be described by a presentation matrix.

In this section, we explore the properties of Noetherian rings.

More generally, a Noetherian ring is one where every ideal is finitely generated, i.e. one that satisfies the ACC on ideals. Note that it’s possible for a ring to be finitely generated (every ring is generated by 11 as a module over itself) but have ideals that are not finitely generated, so proving noetherianity requires proving facts about ideals, not the whole ring. We’re going to prove a number of theorems that let us work with such rings:

Theorem: Surjective homomorphisms preserve noetherianity. (If the domain is Noetherian, so is the codomain.)

Given a surjective homomorphism φ:RS\varphi:R\to S and a strictly increasing chain of ideals in SS, taking preimages under φ\varphi gives a strictly increasing chain of ideals in RR (strictness is preserved because φ(φ1(I))=I\varphi(\varphi^{-1}(I))=I when φ\varphi is surjective). So if SS failed the ACC, then RR would too; since RR satisfies the ACC, so does SS.

Theorem: Quotients preserve noetherianity. (If RR is Noetherian, so are its quotients R/IR/I.)

Because the canonical map π:RR/I\pi:R\to R/I is surjective, and surjective homomorphisms preserve noetherianity.

Theorem: Finite direct sums preserve noetherianity. (If R,SR,S are Noetherian, so is the direct sum RSR\oplus S.)
  • Ideals in RSR\oplus S are in the form IRISI_R\oplus I_S, where IRI_R is an ideal in RR and ISI_S is an ideal in SS. This is precisely because there is no way for elements of RR to influence elements of SS and vice versa.
  • Because R,SR,S are Noetherian, IRI_R and ISI_S are finitely generated, from which you can construct a finite set of generators for IRISI_R\oplus I_S.

Corollary: Free modules over Noetherian rings are Noetherian. (If RR is a Noetherian ring, then RnR^n is a Noetherian RR-module.)

You can take the direct sum of nn copies of RR to get RnR^n. Since finite direct sum preserves noetherianity, when RR is Noetherian, RnR^n is Noetherian.

Theorem: Every ideal of a Noetherian ring RR is contained in a maximal ideal, except RR itself.

Let II be a proper ideal of RR. If II is not contained in any maximal ideal, then II is not itself maximal, so there is a strictly larger proper ideal I1I_1; repeating the argument on I1I_1 builds an infinite strictly increasing chain of proper ideals I<I1<I2<I<I_1<I_2<\ldots, contradicting the ACC. So the chain must stop at a proper ideal that cannot be enlarged, i.e. a maximal ideal containing II.

Hilbert Basis Theorem: If RR is Noetherian, so is R[x]R[x].
  • The goal is to prove that an arbitrary ideal II for R[x]R[x] is finitely generated, given the ACC for RR.
  • Lemma: The leading coefficients of degree nn polynomials in any ideal II of R[x]R[x], together with 00, form an ideal CnC_n of RR.

    If ciCnc_i\in C_n is the leading coefficient of fiIf_i\in I and cjc_j is the leading coefficient of fjIf_j\in I, then ci+cjc_i+c_j is either 00 or the leading coefficient of fi+fjf_i+f_j, and rcirc_i (for arbitrary rRr\in R) is either 00 or the leading coefficient of rfirf_i. Since fi+fjf_i+f_j and rfirf_i are both in II (it’s an ideal), ci+cjc_i+c_j and rcirc_i are both in CnC_n, making CnC_n an ideal.

  • We have CnCn+1C_n\subseteq C_{n+1} because for every ciCnc_i\in C_n and corresponding degree nn polynomial fif_i, xfixf_i exists (due to II being an ideal) as a degree n+1n+1 polynomial with the same leading coefficient ciCn+1c_i\in C_{n+1}.
  • Thus the CnC_n form a chain of ideals in RR, which stops strictly increasing at some point due to RR being Noetherian and therefore satisfying the ACC. Let CNC_N be the last strictly increasing ideal in this chain (the one where Cn=CNC_n=C_N for all nNn\ge N).
  • For each degree nNn\le N, take finitely many polynomials of degree nn in II whose leading coefficients generate CnC_n (possible since each CnC_n is finitely generated, due to RR being Noetherian). Let JJ be the ideal generated by this finite collection of polynomials fif_i, each of degree at most NN. Clearly JIJ\subseteq I, as the fif_i are taken from II.
  • We can prove any polynomial fIf\in I can be expressed in terms of these fif_i, i.e. IJI\subseteq J.
    • First, if deg f>N\deg f>N then pick some polynomial gJg\in J with the same leading coefficient as ff, which exists since Cdeg f=CNC_{\deg f}=C_N. Then f=fxdeg fdeg ggf'=f-x^{\deg f-\deg g}g is a lower-degree polynomial in II due to II being an ideal. Repeat this process until you obtain a polynomial whose degree is N\le N, reducing it to the second case:
    • Then if deg fN\deg f\le N, the leading coefficient of ff lies in Cdeg fC_{\deg f}, so some RR-linear combination of the chosen degree-deg f\deg f polynomials has the same leading coefficient. Subtracting it from ff leaves a polynomial of strictly smaller degree in II; repeating this down through the degrees eventually reaches 00, expressing ff as an element of JJ.
  • Since I=JI=J, and JJ is finitely generated, any arbitrary ideal II of R[x]R[x] is finitely generated, thus R[x]R[x] is Noetherian.

Corollary: If RR is a Noetherian ring, so is anything in the form R[x]/(f)R[x]/(f) because quotients preserve noetherianity.

Theorem: The Noetherian Bézout domains are exactly the PIDs.
  • (\to) Every ideal is finitely generated in a Noetherian ring, and every finitely generated ideal is principal in a Bézout domain. Thus every ideal is principal in a Noetherian Bézout domain.
  • (\from) In a PID, every ideal is principal making it trivially Bézout, and every ideal is generated by one element making it trivially Noetherian.

Structure Theorem for Modules over a PID: if MM is a finitely generated RR-module where RR is a PID, then it can be uniquely decomposed as a direct sum of quotients of RR. To be specific, MR/(f1)R/(fn)M\iso R/(f_1)\oplus\ldots\oplus R/(f_n) where fif_i are elements of RR that divide each other: f1fnf_1\mid\ldots\mid f_n.
  • We can always find the RR-presentation matrix AA for any finitely generated Noetherian RR-module MM (so that MRn/im AM\iso R^n/\im A). In this case, MM is given as finitely generated over a PID, and is Noetherian because PIDs are Noetherian.
  • The Smith normal form is defined for any matrix defined over a Bézout domain. PIDs are also Bézout domains, so taking the Smith normal form of AA gives A=PDQ1A=PDQ^{-1}, obtaining a diagonal matrix DD of invariant factors fif_i.
  • In particular, the resulting diagonal matrix D=[f1f2fn]D=\left[\begin{matrix}f_1&&&\\&f_2&&\\&&\ddots&\\&&&f_n\end{matrix}\right] has the property that f1fnf_1\mid\ldots\mid f_n, which is a property of the Smith normal form of any matrix.
  • Then, since PP and QQ are invertible (i.e. isomorphisms), we have im Aim PDQ1im D\im A\iso\im PDQ^{-1}\iso\im D
  • The image of a matrix is the span of its columns. For a diagonal matrix DD, the iith column is fieif_ie_i, a scalar multiple of a single basis vector of RnR^n, and so the image can be expressed as a direct sum im D=ifiR\im D=\bigoplus_i f_iR, with one submodule fiRf_iR for each dimension.
  • Then we have MRn/im ARn/(f1Rf2RfnR)M\iso R^n/\im A\iso R^n/(f_1R\oplus f_2R\oplus\ldots\oplus f_nR), and since a direct sum quotiented by a direct sum of submodules (one in each summand) is the direct sum of the quotients, this gives MR/(f1)R/(f2)R/(fn)M\iso R/(f_1)\oplus R/(f_2)\oplus\ldots\oplus R/(f_n)
  • For the proof of uniqueness, any other decomposition into some R/(g1)R/(g2)R/(gn)R/(g_1)\oplus R/(g_2)\oplus\ldots\oplus R/(g_n) would imply that the gig_i are invariant factors of AA appearing on the diagonal of its Smith normal form, which must be the same invariant factors as fif_i since the Smith normal form is unique.

This theorem is a generalization of the structure theorem for finitely generated abelian groups, which is the specific case where R=ZR=\ZZ, using the fact that the Z\ZZ-modules are isomorphic to the abelian groups.

In this section, we examine the components of finitely generated RR-modules.

In the above theorem, we found that any finitely generated RR-module (over a PID RR) can be expressed as a direct sum of components in the form R/(f)R/(f). RR-modules in the form R/(f)R/(f) are called cyclic RR-modules.

In general, a cyclic RR-module MM is one in which every element of the module can be expressed as scalar multiples of a single element mMm\in M. Because of this, cyclic RR-modules can be written as RmRm. If mRm\in R, then RmRm is isomorphic to the principal ideal (m)(m).

Theorem: R/(f)R/(f) is a cyclic RR-module.

Since every element of R/(f)R/(f) is a coset of the form [r]=r+(f)[r]=r+(f), and [r]=r[1][r]=r\cdot[1], the single coset [1][1] generates all of R/(f)R/(f). Thus R/(f)R/(f) is a cyclic RR-module.

So finitely generated RR-modules are composed of a direct sum of cyclic RR-submodules R/(f)R/(f).

Recall that quotienting a ring/module essentially sends all quotiented elements to zero. In other words, R/(f)R/(f) means “RR, where f=0f=0”. Any relation can be expressed this way – if you want to apply the relation a=ba=b, then you quotient by aba-b to send it to zero.

Cyclic RR-modules RmRm make this even simpler. Since every possible element can be expressed in the form amam for some element aa, every relation on RmRm can be expressed in the form am=0am=0 instead of ab=0a-b=0.

In fact, the elements aa that make am=0am=0 true are given a special name: the annihilator of mm.

Theorem: Given a cyclic RR-module RmRm, the elements aRa\in R such that am=0am=0 form an ideal of RR.
  • It is enough to prove that these elements aa are closed under subtraction and by multiplication with elements rRr\in R.
  • If am=0am=0 and bm=0bm=0, then (ab)m=ambm=00=0(a-b)m=am-bm=0-0=0, thus elements aa are closed under subtraction.
  • If am=0am=0, then for arbitrary rRr\in R we have (ra)m=r(am)=r0=0(ra)m=r(am)=r0=0.
  • Thus these elements aRa\in R where am=0am=0 form an ideal of RR.

So this ideal of RR, containing all elements aRa\in R where am=0am=0, is called the annihilator of mRmm\in Rm, and is written AnnR(m)\Ann_R(m). The idea is that the annihilator consists of all elements that send the generator mm (and therefore all elements) to zero. In fact:

Theorem: Every cyclic module RmRm is isomorphic to R/AnnR(m)R/Ann_R(m).
  • Let σ:RRm\sigma: R\to Rm be the map rrmr\mapsto rm.
  • By definition of cyclic module, the image of σ\sigma is all of RmRm, thus σ\sigma is surjective.
  • The kernel of σ\sigma consists of elements aa where am=0am=0, which is exactly AnnR(m)\Ann_R(m).
  • Then by the First Isomorphism Theorem we have R/ker σim σR/\ker\sigma\iso\im\sigma, which is R/AnnR(m)RmR/\Ann_R(m)\iso Rm.
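A concrete Z example, checked with a few lines of Python: inside M = Z/12Z, the element m = 4 generates the cyclic submodule Zm = {0, 4, 8}, and its annihilator is 3Z, so Zm ≅ Z/3Z.

```python
m, modulus = 4, 12

Zm = sorted({(a * m) % modulus for a in range(modulus)})
annihilator = [a for a in range(modulus) if (a * m) % modulus == 0]  # representatives mod 12

print(Zm)           # [0, 4, 8]     -- three elements, so Zm is isomorphic to Z/3Z
print(annihilator)  # [0, 3, 6, 9]  -- the multiples of 3, i.e. Ann_Z(m) = 3Z
```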

Thus we have that every quotient by a principal ideal R/(f)R/(f) is cyclic (with m=[1]m=[1]), and every cyclic module RmRm is isomorphic to the quotient R/AnnR(m)R/\Ann_R(m). If RR is a PID, then AnnR(m)\Ann_R(m) is a principal ideal as well, and we get an isomorphism between quotients by principal ideals R/(f)R/(f) and cyclic modules RmRm. The important part is: R/(f)R/(f) is either the free module RR (when f=0f=0) or a torsion module R/(f)R/(f) (when f\ne 0).

Recall the structure theorem of modules over a PID. Now that we know more about cyclic modules, we know that any module MM defined over a PID RR is isomorphic to a direct sum of free submodules R/(0)RR/(0)\iso R, and torsion cyclic submodules R/(fi)R/(f_i), which are ordered by divisibility: f1frf_1\mid\ldots\mid f_r for some order of the fif_i.

MRnfreeR/(f1)R/(f2)torsionM\iso\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(f_1)\oplus R/(f_2)\oplus\ldots}_{\text{torsion}}

We can go a bit further:

Theorem: Over a PID RR, every cyclic module RmRm is a direct sum of cyclic modules, each of which is either a free cyclic module R/(0)RR/(0)\iso R, or a torsion cyclic module R/(pk)R/(p^k) where pp is irreducible in RR and k>0k>0.
  • Since RR is a PID, AnnR(m)\Ann_R(m) (being an ideal) is a principal ideal (a)(a).
  • Recall that RmR/AnnR(m)Rm\iso R/\Ann_R(m).
  • If a=0a=0 then RmR/(0)RRm\iso R/(0)\iso R, meaning RmRm is a free cyclic module.
  • Otherwise, if aa is nonzero, note that RR is a PID and therefore a UFD. Then every nonzero aa has a unique factorization into primes p1k1p2k2pnknp_1^{k_1}p_2^{k_2}\ldots p_n^{k_n}.
  • So we have RmR/AnnR(m)R/(p1k1p2k2pnkn)Rm\iso R/\Ann_R(m)\iso R/(p_1^{k_1}p_2^{k_2}\ldots p_n^{k_n}) which factors by the Chinese Remainder Theorem into R/(p1k1)R/(p2k2)R/(pnkn)R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\ldots\oplus R/(p_n^{k_n}).
  • Therefore RmRm is either a free cyclic module, or factors into torsion cyclic modules where the annihilator is a prime power pkp^k.

Thus another way to decompose a finitely generated module over a PID is MRnfreeR/(p1k1)R/(p2k2)torsionM\iso\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\ldots}_{\text{torsion}} where these prime power factors pikip_i^{k_i} are known as elementary divisors.
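Passing between the two forms is just factoring: each invariant factor breaks into the prime powers that are its elementary divisors (and the Chinese Remainder Theorem reassembles them). A short sketch over R = Z with SymPy's factorint, using made-up invariant factors.

```python
from sympy import factorint

invariant_factors = [2, 12, 60]          # 2 | 12 | 60, a hypothetical Z-module
elementary_divisors = [p ** k
                       for f in invariant_factors
                       for p, k in factorint(f).items()]

print(sorted(elementary_divisors))       # [2, 3, 3, 4, 4, 5]
# i.e. Z/2 + Z/12 + Z/60 = Z/2 + (Z/4 + Z/3) + (Z/4 + Z/3 + Z/5)
```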

Thus there are two ways to decompose a module according to the structure theorem:
  • the invariant factor decomposition MRnR/(f1)R/(f2)M\iso R^n\oplus R/(f_1)\oplus R/(f_2)\oplus\ldots, where f1f2f_1\mid f_2\mid\ldots are the invariant factors, and
  • the primary (elementary divisor) decomposition MRnR/(p1k1)R/(p2k2)M\iso R^n\oplus R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\ldots, where the prime powers pikip_i^{k_i} are the elementary divisors.

The product of invariant factors must be equal to the product of the elementary divisors, since these describe the same module.


In this exploration, we discovered how RR-matrices can be used to fully describe all the homomorphisms between free RR-modules. As a bonus, we also learned how presentation matrices can be used to fully describe any RR-module, and further, broke down the structure of finitely generated modules over a PID.

Next time we’ll be applying this structure theorem to characterize completely all the homomorphisms between non-free modules.

