modules
Module theory follows on the heels of ring theory, in the sense that modules are exactly vector spaces defined over rings. Module theory makes up a large portion of representation theory and character theory, and touches a bit of category theory (which we might go over.)
Here’s my notes on module theory.
-
December 27, 2023.
Exploration 1: Modules and vector spaces
Questions:
- How do we use matrices to represent mathematical objects?
This entire exploration is actually going to be about matrices. Specifically, we’re going to talk about R-matrices, which are matrices with entries in a ring R. It turns out the study of R-matrices subsumes the study of modules, which we will see by the end of this exploration.
What are modules? It is easiest to start with free modules Rn, which are additive abelian groups of R-vectors, which are row or column vectors whose n entries are in R, where addition and scalar multiplication are defined in the usual way:
$$\begin{pmatrix}a_1\\\vdots\\a_n\end{pmatrix}+\begin{pmatrix}b_1\\\vdots\\b_n\end{pmatrix}=\begin{pmatrix}a_1+b_1\\\vdots\\a_n+b_n\end{pmatrix}\qquad\text{and}\qquad r\begin{pmatrix}a_1\\\vdots\\a_n\end{pmatrix}=\begin{pmatrix}ra_1\\\vdots\\ra_n\end{pmatrix}\quad\text{for }r\in R$$
These should be familiar from linear algebra, only that the entries of R-vectors are elements in the ring R. In this context, elements of R are called scalars and are not in Rn themselves — only R-vectors are in Rn. Since Rn is composed of R-vectors, it is straightforward to describe n×m R-matrices as maps Rm→Rn, like in linear algebra:
$$\begin{pmatrix}r_{11}&r_{12}&\cdots&r_{1m}\\r_{21}&r_{22}&\cdots&r_{2m}\\\vdots&\vdots&\ddots&\vdots\\r_{n1}&r_{n2}&\cdots&r_{nm}\end{pmatrix}\begin{pmatrix}v_1\\v_2\\\vdots\\v_m\end{pmatrix}=\begin{pmatrix}\sum_{i=1}^m r_{1i}v_i\\\sum_{i=1}^m r_{2i}v_i\\\vdots\\\sum_{i=1}^m r_{ni}v_i\end{pmatrix}$$
where the LHS vector is in Rm and the RHS vector is in Rn.
Theorem: Every ring R is a free module over itself.Every element in the ring can be treated as a scalar or a (one-dimensional) R-vector, so scalar multiplication is given by multiplication in R.
This theorem introduces some ambiguity: writing R could either mean the ring R, or the one-dimensional module over R. For clarity, here we write R when we mean the ring, and we always write R-modules using the letters M or N or V.
In this section, we describe R-modules beyond free modules.
R-modules are more general than free modules Rn, however. They are additive abelian groups of elements v (not necessarily R-vectors!), where elements r∈R come into play by defining a scalar multiplication r⋅v governed by the following laws:
- Scalar distributivity: r⋅(v+w)=rv+rw
- Vector distributivity: (r1+r2)⋅v=r1⋅v+r2⋅v
- Scalar associativity: r1⋅(r2⋅v)=(r1⋅r2)v
- Scalar unit law: 1⋅v=v
Theorem: The Z-modules are exactly the abelian groups.- (→) Every R-module is an additive abelian group by definition.
- (←) If you have an additive abelian group, you can define scalar multiplication n⋅a as the sum of n copies of a. This satisfies the laws above, so we obtain a Z-module.
Like with groups, rings, and fields, we can define a couple concepts for R-modules:
- A submodule W of an R-module M is the same as subgroup/subring/subfield – it’s just an R-module that is a subset of another R-module. The zero module {0} is always a submodule of M, and so is M itself.
- The quotient module M/N is always defined – there is no need for N to be a special kind of module the same way you need ideals for quotient rings and normal subgroups for quotient groups. This is because any submodule N is closed under addition and scalar multiplication, so the coset operations (m+N)+(m′+N)=(m+m′)+N and r⋅(m+N)=rm+N are always well-defined.
- The direct product of two R-modules is called the direct sum N⊕M. It works identically to the direct product (elements are pairs (n,m) with n∈N and m∈M) except we write ⊕ instead of ×, referring to the additive nature of R-modules.
In this section, we describe how to construct R-modules.
Just like how groups and rings can be finitely generated, we can construct an R-module M out of a generating set of elements {v1,v2,…,vn}, known as a spanning set for M. We can write M=span({v1,v2,…,vn}), and say that M consists of R-linear combinations of its spanning set {v1,v2,…,vn}, i.e. M is the set $$M=\operatorname{span}(\{v_1,v_2,\dots,v_n\})=\left\{\textstyle\sum_i r_iv_i\right\}$$ where the coefficients ri are elements of R, and each sum is finite (though the spanning set can be infinite).
No matter the spanning set, it is always possible to generate the zero element of the R-module — just take the R-linear combination where all the coefficients ri are zero. Of interest is whether this is the only way to generate the zero element. If there is another way to generate the zero element, i.e. using a nonzero coefficient somewhere, then we say that the spanning set is linearly dependent. Otherwise, there is only one way to generate the zero vector, and we have a linearly independent spanning set, also known as a basis.
If we consider spanning sets as a map Rn→M from a n-tuple of coefficients in R to elements of the R-module, then a spanning set that is a basis is interesting. This is because all spanning sets are trivially surjective (the spanning set generates the module), and if the spanning set is a basis, then there is only one way to generate the zero vector — thus the kernel is trivial, thus the map is injective and bijective. This bijection between n-tuples of coefficients and elements of an R-module means that elements of the R-module are essentially n-tuples of coefficients, also known as R-vectors. Sound familiar?
Theorem: A free module Rn is exactly an R-module generated by a basis.- Every spanning set gives rise to a surjective map Rn→M from n-tuples of coefficients to elements of the module.
- If the spanning set is a basis, then only the zero tuple maps to the zero element, i.e. the map has a trivial kernel and is injective. Therefore a basis represents a bijection between n-tuples of coefficients, i.e. R-vectors, and elements of the module: $$\begin{pmatrix}a_1\\\vdots\\a_n\end{pmatrix}\ \text{represents}\ a_1v_1+a_2v_2+\dots+a_nv_n$$
- Thus the elements of the R-module can be expressed as R-vectors, which is the definition of a free module Rn.
For a free module Rn, the standard basis ei is defined as the columns of the n×n identity matrix In: $$e_1=\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix},\quad e_2=\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix},\quad\dots,\quad e_n=\begin{pmatrix}0\\\vdots\\0\\1\end{pmatrix}$$
Linear algebra is exactly when R is a field F. In fact, F-modules are exactly F-vector spaces. We will draw parallels to linear algebra extensively throughout this exploration.
Theorem: Every spanning set of a F-module V contains a basis, if F is a field.- Let {v1,v2,…,vn} be an arbitrary spanning set for V. If this spanning set is linearly independent, it is a basis and we are done. Otherwise, assume that {v1,v2,…,vn} is linearly dependent.
- If {v1,v2,…,vn} is linearly dependent, then zero can be represented by some F-linear combination ∑irivi=0 with a nonzero coefficient. WLOG let r1 be one of the nonzero coefficients.
- Since we're working in a field F, r1 has a multiplicative inverse r1⁻¹. Then the corresponding v1 can be written as a F-linear combination of the other elements: $$0=r_1v_1+r_2v_2+\dots+r_nv_n\implies v_1=-r_1^{-1}r_2v_2-\dots-r_1^{-1}r_nv_n$$
- Since v1 can be represented as a F-linear combination of the others, its contribution to the spanning set is redundant. We can remove v1 from the spanning set because {v1,v2,…,vn} and {v2,…,vn} generate the same module.
- If the resulting spanning set {v2,…,vn} is not a basis, we can repeat this process to further decrease the size of the spanning set. This process terminates either at a basis or at the empty set, which is trivially a basis (for the zero module). Thus the original spanning set {v1,v2,…,vn} contains a basis for V.
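As a quick sanity check (my own added example, not part of the original argument), note that the field hypothesis really is needed. Over the ring Z:
$$\{2,3\}\subseteq\mathbb{Z}\ \text{spans}\ \mathbb{Z}\ \text{(since }3-2=1\text{)},\qquad\text{but}\qquad 3\cdot 2+(-2)\cdot 3=0$$
so {2,3} is a linearly dependent spanning set of the Z-module Z, yet it contains no basis: {2} only spans 2Z, {3} only spans 3Z, and {2,3} itself is dependent. So over a general ring, a spanning set need not contain a basis.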
In this section, we start exploring R-matrices for real.
At the beginning we mentioned that this entire exploration is going to be about studying R-matrices. One big reason is because R-matrices are homomorphisms of free R-modules. For instance:
$$A=\begin{pmatrix}3&2&-4\\-1&4&0\\1&1&-1\end{pmatrix}$$
is a 3×3 R-matrix. Left multiplication by A corresponds to a homomorphism φ:R3→R3.
$$B=\begin{pmatrix}2&4&1&3&8\\5&8&8&1&2\\1&0&3&4&2\end{pmatrix}$$
is a 3×5 R-matrix. Left multiplication by B corresponds to a homomorphism φ:R5→R3. Typically we just refer to the R-matrix itself as the homomorphism, falling back to φ when we're talking about homomorphisms in the abstract.
Our first question is, which R-matrices correspond to isomorphisms?
This translates to the question: when are R-matrices invertible? This requires the concept of a determinant, which should be familiar from linear algebra. For n×n (square) R-matrices, the determinant det(A) is the unique function from n×n R-matrices to R that is:
- Multilinear: linear in each row and column.
- Alternating: Swapping two rows or columns swaps the sign, which also implies that two identical rows/columns makes the function zero (since only 0 is invariant under swapping those rows/columns).
- Equal to 1 for the identity matrix I.
This unique function is $$\det(A)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n a_{i,\sigma(i)}$$ which:
- Is multilinear: each term of the sum is a product $\prod_{i=1}^n a_{i,\sigma(i)}$ containing exactly one entry from each row i and each column σ(i), so the sum is linear in each row and column.
- Is alternating: since a property of sgn(σ) is that it switches sign when you swap two elements in the permutation (i.e. you swap two rows or columns).
- Is 1 for identity: because the summand ∏i=1nai,σ(i) is only nonzero for the identity permutation σ=1 where sgn(σ)=1.
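To make the formula concrete, here is a minimal Python sketch (my own illustration, with illustrative names like `perm_sign` and `matrix_A`) that evaluates the permutation-sum formula directly; `matrix_A` is the 3×3 example matrix A introduced above.

```python
from itertools import permutations
from math import prod

def perm_sign(p):
    """Sign of a permutation p given as a tuple of 0-indexed images: (-1)^(number of inversions)."""
    return (-1) ** sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

def det(A):
    """Leibniz formula: det(A) = sum over sigma in S_n of sgn(sigma) * prod_i A[i][sigma(i)]."""
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

matrix_A = [[3, 2, -4],
            [-1, 4, 0],
            [1, 1, -1]]
print(det(matrix_A))  # 6
```

The same code works for entries in any ring whose arithmetic Python can model (here, integers).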
Although the above shows that the given function is a determinant, the proof that it is the unique such function requires some more advanced tools we'll introduce later. Let's first determine when a square R-matrix is invertible.
Therefore: We can write the determinant recursively as $\det(A)=\sum_{j=1}^n a_{ij}C_{ij}$, where $C_{ij}=(-1)^{i+j}\det(M_{ij})$, for an arbitrary row index i. This is known as the cofactor expansion of the determinant along the row i.
Recall the definition of the determinant from above:
$$\det(A)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n a_{i,\sigma(i)}$$
The determinant det(A) can be defined recursively. First, fix a row index i and factor the single entry $a_{i,\sigma(i)}$ out of each product:
$$\det(A)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\,a_{i,\sigma(i)}\prod_{k=1,\,k\neq i}^n a_{k,\sigma(k)}$$
Now note that since the sum goes over every permutation σ∈Sn, the index i gets sent to every value j from 1 to n, exactly (n−1)! times each (there are (n−1)! permutations of the remaining n−1 values, hence (n−1)! permutations in Sn with σ(i)=j). Grouping the sum by the value j=σ(i) lets us pull $a_{i,\sigma(i)}$ out as $a_{ij}$:
$$\det(A)=\sum_{j=1}^n a_{ij}\sum_{\substack{\sigma\in S_n\\\sigma(i)=j}}\operatorname{sgn}(\sigma)\prod_{k=1,\,k\neq i}^n a_{k,\sigma(k)}$$
Note that the inner sum
$$\sum_{\substack{\sigma\in S_n\\\sigma(i)=j}}\operatorname{sgn}(\sigma)\prod_{k=1,\,k\neq i}^n a_{k,\sigma(k)}$$
is very close to the formula for the determinant of the matrix with row i removed (since $a_{i,\sigma(i)}$ is excluded from the product) and column j removed (since σ sends i to j, the column j only appears as $a_{i,\sigma(i)}=a_{ij}$, which is excluded along with row i).
Define the minor $M_{ij}$ as the matrix A with row i and column j removed. The inner sum is almost $\det(M_{ij})$ as defined above, except it ranges over $S_n$ instead of $S_{n-1}$, the actual size of the minor. To correct this, take permutations τ∈Sn−1 acting on the entries $b_{kl}$ of the minor, where each τ corresponds to the σ∈Sn with σ(i)=j that permutes the remaining rows and columns the same way. To convert sgn(σ) to sgn(τ), move row i to the top of the matrix, which takes i−1 transpositions and multiplies the determinant by (−1)^(i−1) by the alternating property; similarly, moving column j to the left multiplies it by (−1)^(j−1). This makes the rows and columns of the minor contiguous, so the permutations in Sn−1 use the correct indices. Overall, sgn(σ) and sgn(τ) differ by a factor of (−1)^(i+j−2)=(−1)^(i+j), and thus:
$$\sum_{\substack{\sigma\in S_n\\\sigma(i)=j}}\operatorname{sgn}(\sigma)\prod_{k=1,\,k\neq i}^n a_{k,\sigma(k)}=(-1)^{i+j}\sum_{\tau\in S_{n-1}}\operatorname{sgn}(\tau)\prod_{k=1}^{n-1} b_{k,\tau(k)}=(-1)^{i+j}\det(M_{ij})$$
The value $(-1)^{i+j}\det(M_{ij})$ is known as the cofactor $C_{ij}$ of the matrix A.
Finally, we can plug this back into our original expression for det(A) to get: $$\det(A)=\sum_{j=1}^n a_{ij}C_{ij}$$ for some fixed row index i.
The cofactor expansion can be used to directly define the inverse of an R-matrix.
Therefore: $A^{-1}=\frac{\operatorname{adj}(A)}{\det(A)}$, which only exists when det(A) is a unit in the ring.
Recall the cofactor expansion:
$$\det(A)=\sum_{j=1}^n a_{ij}C_{ij}$$
Notice how similar it is to matrix multiplication:
$$[A\cdot B]_{ik}=\sum_{j=1}^n a_{ij}b_{jk}$$
In fact, let adj(A) be the adjugate matrix of A, where each entry $b_{ij}$ is equal to $C_{ji}$ (note the swapped indices). Then we have exactly
$$[A\cdot\operatorname{adj}(A)]_{ik}=\sum_{j=1}^n a_{ij}b_{jk}=\sum_{j=1}^n a_{ij}C_{kj}=\begin{cases}\det(A)&\text{if }i=k\\0&\text{if }i\neq k\end{cases}=[\det(A)\cdot I]_{ik}$$
The reason that $\sum_{j=1}^n a_{ij}C_{kj}$ is zero when i≠k is because if you copy the values in row i to row k, resulting in a matrix A′, the sum becomes
$$\sum_{j=1}^n a'_{kj}C_{kj}=\det(A')$$
where each $C_{kj}$ is unchanged since it is calculated on a minor with the row k removed. But since A′ has two identical rows, then by the alternating property of determinants, det(A′)=0. Thus the sum is zero.
So we have proved the following:
$$A\cdot\operatorname{adj}(A)=\det(A)\cdot I$$
Note that if det(A) is a unit in the ring R, then we have
$$A\cdot\frac{\operatorname{adj}(A)}{\det(A)}=I$$
implying that $\frac{\operatorname{adj}(A)}{\det(A)}$ is a right inverse of A. A similar argument shows that $\frac{\operatorname{adj}(A)}{\det(A)}$ is a left inverse of A as well.
Corollary: An R-matrix is invertible iff its determinant is a unit in R.If det(A) is a unit, the formula above produces the inverse matrix (the adjugate divided by the determinant), and you can only divide by units in a ring. Conversely, if A is invertible, then det(A)det(A⁻¹)=det(I)=1 (using the product rule for determinants proved below), so det(A) must be a unit.
This should be familiar from linear algebra, where an F-matrix is invertible iff its determinant is nonzero. Of course, this is exactly because the only nonunit in a field F is zero. For an example that isn't a field, note that the units of Z are ±1. Thus a Z-matrix is invertible iff its determinant is ±1.
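For a quick concrete check over Z (a small example of my own):
$$\det\begin{pmatrix}2&1\\1&1\end{pmatrix}=1,\qquad\begin{pmatrix}2&1\\1&1\end{pmatrix}^{-1}=\operatorname{adj}\begin{pmatrix}2&1\\1&1\end{pmatrix}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}$$
which is again a Z-matrix, while $\begin{pmatrix}2&0\\0&1\end{pmatrix}$ has determinant 2, not a unit in Z, and its would-be inverse $\begin{pmatrix}1/2&0\\0&1\end{pmatrix}$ fails to have entries in Z.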
For us, invertible matrices are significant because, being reversible, their corresponding homomorphism is an isomorphism. More importantly, this means multiplying an R-matrix A by an invertible matrix P is the same as composing an isomorphism τ to the R-module homomorphism σ corresponding to A. Right-multiplying by P is precomposition by τ, and left-multiplying by P is postcomposition by τ. As we know, isomorphisms are essentially renamings, so composing a homomorphism by isomorphisms gives you essentially the same homomorphism.
So we have proved the existence of, and a formula for, the inverse of an R-matrix. To build on this, let's see how invertible matrices let us manipulate R-matrices for the rest of this exploration.
We introduce the three row and column operations.
First, we may swap two rows of a matrix by left-multiplying by a suitable elementary matrix. $$\begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix}\begin{pmatrix}1&1&1\\2&2&2\\3&3&3\end{pmatrix}=\begin{pmatrix}1&1&1\\3&3&3\\2&2&2\end{pmatrix}$$
Second, we may multiply any row by a unit (here, assume 10 is a unit) by left-multiplying by another kind of elementary matrix. $$\begin{pmatrix}10&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}1&1&1\\2&2&2\\3&3&3\end{pmatrix}=\begin{pmatrix}10&10&10\\2&2&2\\3&3&3\end{pmatrix}$$
Third, we may add a multiple of any row to another row (here, adding 10 times row 1 to row 3), again by left-multiplying by a third kind of elementary matrix. $$\begin{pmatrix}1&0&0\\0&1&0\\10&0&1\end{pmatrix}\begin{pmatrix}1&1&1\\2&2&2\\3&3&3\end{pmatrix}=\begin{pmatrix}1&1&1\\2&2&2\\13&13&13\end{pmatrix}$$
Column operations use the same (transposed) elementary matrices, except you right-multiply instead of left-multiply.
Obviously these operations can be undone by an operation of the same kind – you can un-swap by swapping again, un-scale by scaling by the inverse unit, and un-add the row cri by adding −cri. This implies that the elementary matrices are invertible. In fact, the inverse of an elementary matrix represents the inverse operation.
When you apply row and column operations to a matrix, you’re essentially doing a clever renaming of the domain (column operations) and codomain (row operations). The resulting matrix represents the same homomorphism but on the renamed domain/codomain.
How do the row and column operations affect the determinant? We can give a short proof of each:
Theorem: Swapping two rows flips the sign of the determinant.This is exactly the alternating property of determinants.
Theorem: Scaling a row by a unit will scale the determinant by the same unit.This follows immediately from the multilinear property of determinants.
Theorem: Adding a multiple of a row to another row does not change the determinant.If row ri becomes row ri+crj, then by multilinearity, det(matrix where ri=ri+crj)=det(original matrix where ri=ri)+c⋅det(matrix where ri=rj) But the last matrix has two identical rows ri and rj, so by the alternating property its determinant is zero. Therefore the resulting matrix has determinant equal to that of the original matrix.
Corollary: Row and column operations multiply the determinant by a unit.Follows immediately from the above three theorems, which show that the three row and column operations multiply the determinant by −1,c,1 respectively (c a unit).
Theorem: The determinant of a product of matrices det(AB) is the product of their individual determinants det(A)det(B).Using the formula for the determinant of AB, we have
$$\det(AB)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n (AB)_{i,\sigma(i)}=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n\sum_{k=1}^n a_{ik}b_{k,\sigma(i)}$$
Since we're multiplying together a product of sums $\prod_i\sum_k$, by the distributive property the result is the sum over all ways of picking one term from each factor and multiplying them together, just like how (a+b)(c+d)=ac+ad+bc+bd. Each way of picking corresponds to a choice of k for each i, i.e. a function κ:i↦k:
$$\det(AB)=\sum_{\kappa}\left(\prod_{i=1}^n a_{i,\kappa(i)}\right)\left(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{\kappa(i),\sigma(i)}\right)$$
If κ is not injective, the inner sum is the Leibniz formula applied to a matrix with two identical rows, which is zero by the alternating property; so only the bijective choices κ=τ∈Sn survive:
$$\det(AB)=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{\tau(i),\sigma(i)}\right)$$
We can reorder the last product by applying τ⁻¹ to the index i:
$$\det(AB)=\sum_{\tau\in S_n}\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(\tau^{-1}(i))}\right)$$
Such a reindexing (replacing σ by σ∘τ⁻¹) multiplies each sgn(σ) by sgn(τ):
$$\det(AB)=\sum_{\tau\in S_n}\operatorname{sgn}(\tau)\left(\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(i)}\right)=\left(\sum_{\tau\in S_n}\operatorname{sgn}(\tau)\prod_{i=1}^n a_{i,\tau(i)}\right)\left(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(i)}\right)=\det(A)\det(B)$$
In this section, we show how row and column operations can be used to simplify a matrix.
Therefore: If R is a Bézout domain, then any R-matrix A can be reduced to Smith normal form, a diagonal matrix where each nonzero entry divides the next.Using row and column operations, you can always reduce an R-matrix down into the block form $\begin{pmatrix}D&0\\0&0\end{pmatrix}$, where D is a diagonal R-matrix, i.e. all its nonzero entries are on the main diagonal.
There is a way to get D into the form such that the diagonal elements d1,d2,… divide each other: d1∣d2∣d3∣…. This form is called Smith normal form. Any R-matrix can be reduced to Smith normal form, provided R is a Bézout domain, which is an integral domain in which every sum of principal ideals is principal. For instance, in PIDs every ideal is principal, so PIDs are Bézout domains.
Theorem: Bézout domains define a GCD d=gcd(a,b,…) and have Bézout’s identity: for every {a,b,…}, there exist {x,y,…} where ax+by+…=gcd(a,b,…).- The fact that every sum of principal ideals is principal
(a)+(b)+…=(d)
implies two things:
- First, since (d) contains each of {(a),(b),…}, we know that d is a common divisor of {a,b,…}.
- Second, (a)+(b)+…=(d) consists of all the linear combinations of {a,b,…}. Since a common divisor of {a,b,…} must divide every linear combination of those elements, i.e. every element of (d), every common divisor must divide d in particular.
- From this, we can conclude two things:
- Since every common divisor divides d, it is in fact the greatest common divisor gcd(a,b,…).
- gcd(a,b,…) is some linear combination of {a,b,…}, i.e. we have Bézout’s identity ax+by+…=gcd(a,b,…) for some {x,y,…}.
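For a concrete instance in Z (a quick example of my own): the sum of principal ideals (6)+(10) is the principal ideal (2), and correspondingly
$$\gcd(6,10)=2=6\cdot 2+10\cdot(-1)$$
is a Z-linear combination of 6 and 10, exactly as Bézout's identity promises.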
Method: To put a matrix into Smith normal form, choose d1 to be the GCD of all its entries. Then use row/column operations to isolate d1 in the upper left such that its row and column are zero except for d1, getting a block matrix
$$\begin{pmatrix}d_1&0&\cdots&0\\0&&&\\\vdots&&B&\\0&&&\end{pmatrix}$$
where B is the minor M11, every entry of which is divisible by d1 because of how we picked d1. Because of Bézout’s identity, it’s always possible to isolate the GCD using row and column operations this way. Repeating the process for the minor B gives you Smith normal form. Here’s a short example:
$$\begin{aligned}
\begin{pmatrix}2&-1\\1&2\end{pmatrix}
&=\begin{pmatrix}1&-1\\3&2\end{pmatrix}\begin{pmatrix}1&0\\1&1\end{pmatrix}^{-1}
&&\text{add col 2 to col 1}\\
&=\begin{pmatrix}1&0\\3&5\end{pmatrix}\left(\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix}\right)^{-1}
&&\text{add col 1 to col 2}\\
&=\begin{pmatrix}1&0\\3&5\end{pmatrix}\begin{pmatrix}1&1\\1&2\end{pmatrix}^{-1}\\
&=\begin{pmatrix}1&0\\-3&1\end{pmatrix}^{-1}\begin{pmatrix}1&0\\0&5\end{pmatrix}\begin{pmatrix}1&1\\1&2\end{pmatrix}^{-1}
&&\text{subtract 3(row 1) from row 2}
\end{aligned}$$
Taking the Smith normal form of a matrix A can be written A=UA′V, where U,V are the elementary matrices used to reduce A to the diagonal matrix A′. The entries di along the diagonal of A′ are called the invariant factors of A, because you will always arrive at these factors in the Smith normal form regardless of how you do the row and column operations.
Theorem: The invariant factors of an R-matrix A are unique up to units.At each step, the invariant factor di is chosen as the GCD of the remaining elements, and since GCDs are unique up to units, the invariant factors are unique up to units.
Theorem: The determinant of a diagonal R-matrix is equal to the product of its entries.In the definition of the determinant,
$$\det(A)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n a_{i,\sigma(i)}$$
you can see that unless σ(i)=i for each i, the product will include a zero factor that makes the whole product zero, because aij=0 for i≠j in a diagonal matrix. Thus the only σ that results in a nonzero term is the identity permutation, and so for diagonal matrices the formula simplifies down to
$$\det(A)=\prod_{i=1}^n a_{ii}$$
Corollary: The determinant of an R-matrix is equal to the product of its invariant factors, up to units.This follows immediately from the fact that the determinant of a matrix equals (up to units) the determinant of its Smith normal form, which is equal to the product of the diagonal entries, which are the invariant factors.
Theorem: The invariant factors of a matrix A are all ones iff A is invertible.If A is invertible, then its determinant is a unit, which can only be a product of units. But since the determinant is the product of invariant factors up to units, that’s the same as saying every invariant factor is a unit, and therefore associate to 1.
Corollary: The Smith normal form of an invertible matrix is the identity matrix.
Corollary: Every invertible matrix over a Bézout domain R factors into elementary matrices.For matrices over a Bézout domain, the Smith normal form is obtained via row and column operations, a process that only factors out corresponding elementary matrices. By the previous theorem, the resulting Smith normal form of an invertible matrix is the identity matrix, which is also an elementary matrix. Thus the original matrix factors into elementary matrices.
In this section, we abstractly represent relationships between R-modules via R-matrices.
We already know that an n×m R-matrix M defines a homomorphism Rm→Rn. Let’s bring back our example matrix A and explore some properties of this homomorphism:
$$A=\begin{pmatrix}3&2&-4\\-1&4&0\\1&1&-1\end{pmatrix}$$
First, the columns of an R-matrix M generate its image (also known as column space), which is written
$$\operatorname{im}M=MR^m=\operatorname{span}(\text{columns of }M)$$
For instance with A above,
$$\operatorname{im}A=AR^3=\operatorname{span}\left(\begin{pmatrix}3\\-1\\1\end{pmatrix},\begin{pmatrix}2\\4\\1\end{pmatrix},\begin{pmatrix}-4\\0\\-1\end{pmatrix}\right)$$
Second, the solutions to AX=0 comprise the kernel (also known as null space) of A. The kernel of a diagonal matrix D is relatively straightforward. The equation
$$\begin{pmatrix}d_1&&\\&d_2&\\&&\ddots\end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\end{pmatrix}=0$$
is equivalent to the system of equations
$$d_1x_1=0,\quad d_2x_2=0,\quad\dots$$
In an integral domain, xi must be zero when di≠0, and xi can take on any value when di=0. This means that the kernel of a diagonal matrix is span(ei,ej,…) where {i,j,…} are the indices i where di=0. In particular, if there are no zeroes along the main diagonal of a diagonal matrix, then its kernel is trivial.
Theorem: An R-matrix defines an injective R-module homomorphism iff its kernel is trivial.- (→) If A:Rm→Rn is injective, then AX=AY implies X=Y. In particular, elements X of the kernel, where AX=0, must be zero because AX=0⟹AX=A0⟹X=0.
- (←) If A:Rm→Rn has a trivial kernel, then consider AX=AY, which implies AX−AY=0 and A(X−Y)=0. Since the kernel is trivial, we have X−Y=0 and thus X=Y, showing that A is injective.
Theorem: An R-matrix defines a surjective R-module homomorphism iff its image is the codomain.By definition. Since surjective means the whole codomain is mapped to, it’s the same as saying that the image is equal to the codomain.
Corollary: Since the columns of A:Rm→Rn generate im A by definition, if A is surjective, then its columns generate Rn.
Theorem: The kernel of an R-matrix A over a Bézout domain R is the same as the kernel of its Smith normal form QA′P−1.For example, let’s take the Smith normal form of A:
$$A=QA'P^{-1}=\begin{pmatrix}-3&-2&2\\-1&-1&2\\-2&-1&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&2&0\\0&0&3\end{pmatrix}\begin{pmatrix}1&0&-2\\-3&1&4\\-1&1&1\end{pmatrix}$$
and then solve QA′P−1X=0 with these steps:
$$\begin{aligned}
\begin{pmatrix}-3&-2&2\\-1&-1&2\\-2&-1&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&2&0\\0&0&3\end{pmatrix}\begin{pmatrix}1&0&-2\\-3&1&4\\-1&1&1\end{pmatrix}X&=\begin{pmatrix}0\\0\\0\end{pmatrix}\\
\begin{pmatrix}1&0&0\\0&2&0\\0&0&3\end{pmatrix}\begin{pmatrix}1&0&-2\\-3&1&4\\-1&1&1\end{pmatrix}X&=\begin{pmatrix}0\\0\\0\end{pmatrix}&&\text{left-multiply by }Q^{-1}\\
\begin{pmatrix}1&0&-2\\-3&1&4\\-1&1&1\end{pmatrix}X&=\begin{pmatrix}0\\0\\0\end{pmatrix}&&\text{solve for }P^{-1}X\text{ (the diagonal matrix has trivial kernel)}\\
X&=\begin{pmatrix}0\\0\\0\end{pmatrix}&&\text{left-multiply by }P
\end{aligned}$$
In this case, A has a trivial kernel {0}. You can see that this is precisely because its Smith normal form has no zero entries along the diagonal. We’ll be talking about the kernel in the abstract, defined as the solutions to AX=0. We can see that the kernel is easily calculable when R is a Bézout domain, but it may not be as easy for other rings.
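As a quick numeric sanity check of the factorization written out above (a minimal Python sketch of my own; `matmul`, `Q`, `D`, `P_inv` are just illustrative names):

```python
def matmul(X, Y):
    """Multiply two matrices given as nested lists of ring elements (here, integers)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

Q     = [[-3, -2, 2], [-1, -1, 2], [-2, -1, 1]]   # invertible over Z (determinant 1)
D     = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]         # Smith normal form: 1 | 2 | 3
P_inv = [[1, 0, -2], [-3, 1, 4], [-1, 1, 1]]      # invertible over Z (determinant 1)

print(matmul(matmul(Q, D), P_inv))
# [[3, 2, -4], [-1, 4, 0], [1, 1, -1]]  -- recovers the example matrix A
```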
Theorem: The kernel and image of a homomorphism σ:M→N is a submodule of the domain M and codomain N respectively.- Contains 0: σ must send 0 to 0.
- Closed under addition: because σ(m)+σ(n)=σ(m+n), if both m and n are in the kernel/image so is m+n.
- Closed under scalar multiplication: because rσ(m)=σ(rm), if m is in the kernel/image so is rm.
Like with groups, an R-module is simple if it has no proper non-trivial submodules.
Theorem: Any homomorphism σ:M→N between simple modules M and N is either trivial or an isomorphism.Since kernel and image must both be submodules of M and N respectively, and simple modules have only two possible choices for submodules (trivial submodule and the module itself), the only possible homomorphisms are when the kernel is trivial (implying that the image is N, therefore an isomorphism) and when the kernel is M (implying that the image is trivial, therefore the trivial homomorphism).
In this section, we describe properties of R-modules with only diagrams.
Here we’re going to represent R-matrices abstractly as homomorphisms σ on R-modules.
Consider two homomorphisms σ:A→B, τ:B→C where the image of σ is the kernel of τ. In other words, everything σ maps to will get mapped to 0 by τ. This property is called exactness, and we say this sequence is exact at B. Any sequence of homomorphisms $A\xrightarrow{\sigma}B\xrightarrow{\tau}C\xrightarrow{\upsilon}\cdots$ with the property that “the image of one homomorphism is the kernel of the next” is called an exact sequence.
In fact, we have discussed the concept of exact sequences before for groups. Exact sequences for groups and exact sequences for R-modules are precisely the same concept, but since R-modules have a richer structure, we may use exact sequences in ways that we cannot do for groups.
To recap, we showed the following for exact sequences for groups. These are true for R-modules as well:
- Theorem: The exact sequence $\{0\}\xrightarrow{\sigma}B\xrightarrow{\tau}C$ implies τ is injective.
- Theorem: The exact sequence $A\xrightarrow{\sigma}B\xrightarrow{\tau}\{0\}$ implies σ is surjective.
- First Isomorphism Theorem: Given σ:M→N, M/ker σ≅im σ.
- An exact sequence of the form $\{0\}\to A\xrightarrow{\sigma}B\xrightarrow{\tau}C\to\{0\}$ is known as a short exact sequence, with the following properties:
- Theorem: A is isomorphic to a submodule of B.
- Theorem: If σ is an inclusion map, then C≅B/A.
- Splitting lemma: If there is a left inverse homomorphism σ−1:B→A, or if there is a right inverse homomorphism τ−1:C→B, then the sequence splits and we have B≅A⊕C.
- A short exact sequence describes an embedding of A into B, and then C≅B/A captures the structure that A doesn’t account for. When σ or τ has an inverse, then the splitting lemma says you can directly piece together the structures A and C to obtain B, via A⊕C≅B.
For an example of how exact sequences are used, we can look at projections. A projection is an idempotent endomorphism, i.e. a homomorphism π:M→M such that π2=π.
Theorem: A module M can be decomposed into the direct sum M=ker π⊕im π iff a projection π:M→M exists.- (→) Any direct sum A⊕B has a canonical projection onto each factor (e.g. (a,b)↦(a,0)), so the forward direction is trivial.
- (←) For any homomorphism like π, we can always write the short exact sequence $\{0\}\to\ker\pi\xrightarrow{\iota}M\xrightarrow{\pi}\operatorname{im}\pi\to\{0\}$ because the kernel of π is exactly the image of the injective inclusion map ι:ker π→M, and π is surjective onto its image im π.
- Since π is idempotent, π(π(x))=π(x) for all x∈M, i.e. π(m)=m for every m∈im π. So π acts as the identity on its image, and the inclusion π′:im π→M is a right inverse of π:M→im π.
- By the splitting lemma, since π:M→im π has a right inverse π′:im π→M, the sequence splits and therefore M=ker π⊕im π.
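A tiny example of my own to make this concrete: take M=R² and the projection π(x,y)=(x,0). Then π²=π, and
$$\ker\pi=\{(0,y)\}\cong R,\qquad\operatorname{im}\pi=\{(x,0)\}\cong R,\qquad R^2=\ker\pi\oplus\operatorname{im}\pi$$
exactly as the theorem predicts.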
In this section, we utilize exact sequences in the context of R-modules specifically.
Take this exact sequence, for instance: $$R^m\xrightarrow{A}R^n\xrightarrow{\sigma}M\to\{0\}$$ This exact sequence is known as a presentation of the R-module M. The idea is that M is entirely described by the presentation, and the presentation is entirely described by the R-matrix A, called the presentation matrix. In other words, it only takes a single R-matrix, A, to define all you need to know about an R-module M. Let’s see why this is.
Theorem: In the above exact sequence, M is isomorphic to Rn/im A, i.e. Rn where every element of the form AX (for X∈Rm) is set to zero.- By the first isomorphism theorem on σ, we have Rn/ker σ≅im σ. However:
- The exact sequence Rn→M→{0} implies that the homomorphism σ:Rn→M is surjective, and thus M is isomorphic to the image of σ. Thus im σ≅M.
- By exactness, im A=ker σ.
- Then we have Rn/im A≅M. This is the same as saying M is Rn, except we send all elements that are linear combinations of the columns of A to 0 (AX=0).
In the context of presentations, the basis of Rn (which is the codomain of A) are known as generators of M, and the columns of A describe relations AX=0 on those generators. So the R-matrix A, by virtue of defining both the generators of M and the relations on those generators, completely determines the R-module M. Because every R-matrix A defines an R-module this way, we say that M≅Rn/im A is the cokernel of A. It is so named because there is a duality between the kernel and the cokernel. Where the kernel is comprised of the elements in the domain sent to zero by A, and a trivial kernel implies injectivity of A, the cokernel is what remains of the codomain after collapsing everything A maps onto (the codomain modulo im A), and a trivial cokernel implies surjectivity of A.
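For example (a quick illustration of my own), the 1×1 Z-matrix (2) gives the presentation
$$\mathbb{Z}\xrightarrow{(2)}\mathbb{Z}\xrightarrow{\sigma}\mathbb{Z}/2\mathbb{Z}\to\{0\}$$
with one generator and the single relation 2v₁=0, so the cokernel is Z/im(2)=Z/2Z. More generally, a diagonal presentation matrix diag(d₁,…,dₙ) presents R/(d₁)⊕⋯⊕R/(dₙ).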
Theorem: An R-matrix defines a surjective homomorphism iff its cokernel is trivial.Surjectivity of A:Rm→Rn means im A=Rn. But then the cokernel Rn/im A is trivial if and only if im A=Rn.
Hopefully this makes clear the reason we only study R-matrices in this exploration — because every R-module can be defined as the cokernel of a suitable R-matrix, studying R-matrices completely subsumes the need to study R-modules directly!
In this section, we describe how to derive the R-matrix that presents a given R-module.
Let’s do the reverse. How do you construct the A-matrix (the presentation matrix) that presents a given (finitely generated) R-module? Here is a step-by-step:
- Step 1: Find its generators vi.
- Step 2: Find the relations among those generators.
- Step 3: Express the relation vectors as columns of the presentation matrix.
- Step 4: Reduce the matrix.
(Note that a non-finitely generated R-module will ask for infinitely many vi, so describing that class of R-modules would require a different process.)
Example:
Let M be a Z-module generated by (v1,v2,v3) under the relations
$$\begin{aligned}3v_1+2v_2+v_3&=0\\8v_1+4v_2+2v_3&=0\\7v_1+6v_2+2v_3&=0\\9v_1+6v_2+v_3&=0\end{aligned}$$
Express each relation as a column of the presentation matrix A:
$$A=\begin{pmatrix}3&8&7&9\\2&4&6&6\\1&2&2&1\end{pmatrix}$$
so that the above system of relations can be summarized as
$$\begin{pmatrix}v_1&v_2&v_3\end{pmatrix}A=\begin{pmatrix}0&0&0&0\end{pmatrix}$$
And theoretically we are done, since A presents M.
However, we can reduce this matrix down a bit. The following simplifying operations also do not change the isomorphism class of the module presented by A:
- Row or column operations.
- Removing a column of zeroes (since that represents the zero relation 0v1+0v2+0v3=0)
- Removing the ith row and jth column whenever the jth column consists of zeros except for one 1 (since that represents the relation vi=0, and so we can remove vi from consideration)
Reducing our matrix above:
$$\begin{aligned}
\begin{pmatrix}3&8&7&9\\2&4&6&6\\1&2&2&1\end{pmatrix}
&\to\begin{pmatrix}0&2&1&6\\0&0&2&4\\1&2&2&1\end{pmatrix}&&r_1-3r_3\to r_1,\quad r_2-2r_3\to r_2\\
&\to\begin{pmatrix}2&1&6\\0&2&4\end{pmatrix}&&\text{remove }c_1,r_3\\
&\to\begin{pmatrix}2&1&6\\-4&0&-8\end{pmatrix}&&r_2-2r_1\to r_2\\
&\to\begin{pmatrix}-4&-8\end{pmatrix}&&\text{remove }c_2,r_1\\
&\to\begin{pmatrix}-4&0\end{pmatrix}&&-2c_1+c_2\to c_2\\
&\to\begin{pmatrix}-4\end{pmatrix}&&\text{remove the zero column}\\
&\to\begin{pmatrix}4\end{pmatrix}&&-1\cdot c_1\to c_1
\end{aligned}$$
Since the image of [4]:Z→Z is 4Z, the big 3×4 matrix above actually just presents the module Z/4Z! Thus M=Z/4Z.
We’ve seen how it’s done for a specific finitely generated R-module, and this approach generalizes for all finitely generated R-modules. Thus we can say that every finitely generated R-module M can be presented by an R-matrix. But how do you tell when M is finitely generated in the first place?
In this section, we learn the conditions under which an R-module is finitely generated.
To see whether an R-module M is finitely generated (and thus presentable), let’s try building one.
Start with the zero submodule W0={0} of M. At every step, add an element vi∈M outside of Wi−1 to get Wi, generated by vi and the old Wi−1.
The fact of the matter is, if M is finitely generated, then eventually there are no more elements v∈M outside of Wi to add, so the process stops. Otherwise, the process keeps going. One way to characterize this is known as the ascending chain condition (ACC) on submodules: “There is no infinite strictly increasing chain W1<W2<… of submodules of M.” Satisfying the ACC is the same as saying that the process stops.
Theorem: Every submodule W≤M is finitely generated iff it satisfies the ACC.- (→) Towards contradiction, say you have an infinite strictly increasing chain W1<W2<… of submodules of M. Let U be the union of these submodules, so U is a submodule of M too, and by assumption U is finitely generated. Each of its finitely many generators lies in some Wi, so all of them lie in a single WN (take N to be the largest such index). Then U⊆WN, which forces WN=WN+1=…, so the chain is not strictly increasing, contradiction.
- (←) Let W≤M be an arbitrary submodule. Inductively construct a set of generators wi: start with W0={0} generated by the empty set, and while Wi≠W, pick some v∈W not in Wi and let Wi+1 be generated by the existing generators together with v. This builds a strictly increasing chain of submodules, and since there is no infinite strictly increasing chain of submodules by the ACC, the process ends after finitely many steps with Wi=W, so W is generated by the finitely many wi.
Recall that integral domains that satisfy the ascending chain condition on principal ideals (ACCP) are exactly the factorization domains. We can say something similar here. A noetherian R-module is one where every submodule is finitely generated, i.e. satisfies the ACC (by the above proof). Every Noetherian R-module can be described by a presentation matrix.
In this section, we explore the properties of Noetherian rings.
More generally, a noetherian ring is one where every ideal is finitely generated, i.e. satisfies the ACC on ideals. Note that a ring is always finitely generated as a module over itself (by the element 1), yet it can still have ideals that are not finitely generated, so proving noetherianity requires proving facts about ideals, not the whole ring. We’re going to prove a number of theorems that let us work with such rings:
Theorem: Surjective homomorphisms preserve noetherianity. (If the domain is Noetherian, so is the codomain.)Because ideals of the codomain pull back to ideals of the domain (containing the kernel), and this pullback preserves strict inclusions, an infinite strictly increasing chain in the codomain would give one in the domain, violating the ACC.
Theorem: Quotients preserve noetherianity. (If R is Noetherian, so are its quotients R/I.)Because the canonical map π:R→R/I is surjective, and surjective homomorphisms preserve noetherianity.
Theorem: Finite direct sums preserve noetherianity. (If R,S are Noetherian, so is the direct sum R⊕S.)- Ideals in R⊕S are in the form IR⊕IS, where IR is an ideal in R and IS is an ideal in S. This is precisely because there is no way for elements of R to influence elements of S and vice versa.
- Because R,S are Noetherian, IR and IS are finitely generated, from which you can construct a finite set of generators for IR⊕IS.
Corollary: Free modules over Noetherian rings are Noetherian. (If R is a Noetherian ring, then Rn is a Noetherian R-module.)You can take the direct sum of n copies of R to get Rn. Since finite direct sum preserves noetherianity, when R is Noetherian, Rn is Noetherian.
Theorem: Every ideal of a Noetherian ring R is contained in a maximal ideal, except R itself.Given a proper ideal I, if I is not maximal then it is strictly contained in some proper ideal I1; if I1 is not maximal, it is strictly contained in some proper ideal I2; and so on. By the ACC this strictly increasing chain must stop, and it can only stop at a maximal ideal containing I.
Hilbert Basis Theorem: If R is Noetherian, so is R[x].- The goal is to prove that an arbitrary ideal I for R[x] is finitely generated, given the ACC for R.
- Lemma: The leading coefficients of degree n polynomials in any ideal I of R[x] form an ideal Cn of R.
If ci∈Cn is the leading coefficient of fi∈I, then ci+cj is the leading coefficient of fi+fj (or the sum of coefficients is zero), and rci (for arbitrary r∈R) is the leading coefficient of rfi (or is zero). fi+fj and rfi are both in I since it’s an ideal, and therefore ci+cj and rci are both in Cn (taking Cn to also contain 0), making it an ideal.
- We have Cn⊆Cn+1 because for every ci∈Cn and corresponding degree n polynomial fi, xfi exists (due to I being an ideal) as a degree n+1 polynomial with the same leading coefficient ci∈Cn+1.
- Thus the Cn form a chain of ideals in R, which stops strictly increasing at some point due to R being Noetherian and therefore satisfying the ACC. Let CN be the last strictly increasing ideal in this chain (the one where Cn=CN for all n≥N).
- Take J to be a finite set of polynomials fi∈I whose leading coefficients generate each Cn for n≤N. Each fi has degree at most N. This is a finite number of polynomials since each Cn is finitely generated, due to R being Noetherian. Clearly J⊆I, as the fi are taken from I.
- We can prove any polynomial f∈I can be expressed in terms of these fi, i.e. I⊆J.
- First, if deg f>N, then pick an R-linear combination g of degree-N elements of J whose leading coefficient equals that of f, which exists since Cdeg f=CN is generated by the leading coefficients of the degree-N elements of J. Then $f'=f-x^{\deg f-\deg g}g$ is a lower-degree polynomial in I due to I being an ideal. Repeat this process until you obtain a polynomial whose degree is ≤N, reducing it to the second case:
- Then if deg f≤N, the leading coefficient of f is in Cdeg f, and therefore you can generate f using the elements in J. To see this, subtract an appropriate R-linear combination of the degree-(deg f) elements of J to cancel the leading coefficient (their leading coefficients generate Cdeg f), and repeat the same process on the lower-degree remainder.
- Since I=J, and J is finitely generated, any arbitrary ideal I of R[x] is finitely generated, thus R[x] is Noetherian.
Corollary: If R is a Noetherian ring, so is anything in the form R[x]/(f) because quotients preserve noetherianity.
Theorem: The Noetherian Bézout domains are exactly the PIDs.- (→) Every ideal is finitely generated in a Noetherian ring, and every finitely generated ideal is principal in a Bézout domain. Thus every ideal is principal in a Noetherian Bézout domain.
- (←) In a PID, every ideal is principal making it trivially Bézout, and every ideal is generated by one element making it trivially Noetherian.
Structure Theorem for Modules over a PID: if M is a finitely generated R-module where R is a PID, then it can be uniquely decomposed as a direct sum of quotients of R. To be specific, M≅R/(f1)⊕…⊕R/(fn) where fi are elements of R that divide each other: f1∣…∣fn.- We can always find the R-presentation matrix A for any finitely generated Noetherian R-module M (so that M≅Rn/im A). In this case, M is given as finitely generated over a PID, and is Noetherian because PIDs are Noetherian.
- The Smith normal form is defined for any matrix defined over a Bézout domain. PIDs are also Bézout domains, so taking the Smith normal form of A gives A=PDQ−1, obtaining a diagonal matrix D of invariant factors fi.
- In particular, the resulting diagonal matrix $D=\begin{pmatrix}f_1&&&\\&f_2&&\\&&\ddots&\\&&&f_n\end{pmatrix}$ has the property that f1∣…∣fn, which is a property of the Smith normal form of any matrix.
- Then, since P and Q are invertible (i.e. isomorphisms), we have im A≅im PDQ−1≅im D
- The image of a matrix is the span of its columns. For the diagonal matrix D, the i-th column is fi times the i-th standard basis vector, so the image is the direct sum ⨁ifiR, with one submodule fiR for each coordinate of Rn.
- Then we have M≅Rn/im A≅Rn/(f1R⊕f2R⊕…⊕fnR) which by the Third Isomorphism Theorem is isomorphic to M≅R/(f1)⊕R/(f2)⊕…⊕R/(fn)
- For the proof of uniqueness, any other decomposition into some R/(g1)⊕R/(g2)⊕…⊕R/(gn) would imply that the gi are invariant factors of A appearing on the diagonal of its Smith normal form, which must be the same invariant factors as fi since the Smith normal form is unique.
This theorem is a generalization of the structure theorem for finitely generated abelian groups, which is the specific case where R=Z, using the fact that the Z-modules are isomorphic to the abelian groups.
In this section, we examine the components of finitely generated R-modules.
In the above theorem, we found that any finitely generated R-module (over a PID R) can be expressed as a direct sum of components in the form R/(f). R-modules in the form R/(f) are called cyclic R-modules.
In general, a cyclic R-module M is one in which every element of the module can be expressed as scalar multiples of a single element m∈M. Because of this, cyclic R-modules can be written as Rm. If m∈R, then Rm is isomorphic to the principal ideal (m).
Theorem: R/(f) is a cyclic R-module.Every element of R/(f) is a coset of the form [r]=r+(f), and [r]=r⋅[1]. So the coset [1] generates all elements of R/(f), thus R/(f) is a cyclic R-module.
So finitely generated R-modules are composed of a direct sum of cyclic R-submodules R/(f).
Recall that quotienting a ring/module essentially sends all quotiented elements to zero. In other words, R/(f) means “R, where f=0”. Any relation can be expressed this way – if you want to apply the relation a=b, then you quotient by a−b to send it to zero.
Cyclic R-modules Rm make this even simpler. Since every possible element can be expressed in the form am for some element a, every relation on Rm can be expressed in the form am=0 instead of a−b=0.
In fact, the elements a that make am=0 true are given a special name: the annihilator of m.
Theorem: Given a cyclic R-module Rm, the elements a∈R such that am=0 form an ideal of R.- It is enough to prove that these elements a are closed under subtraction and by multiplication with elements r∈R.
- If am=0 and bm=0, then (a−b)m=am−bm=0−0=0, thus elements a are closed under subtraction.
- If am=0, then for arbitrary r∈R we have (ra)m=r(am)=r0=0.
- Thus these elements a∈R where am=0 form an ideal of R.
So this ideal of R, containing all elements a∈R where am=0, is called the annihilator of m∈Rm, and is written AnnR(m). The idea is that the annihilator consists of all elements that send the generator m (and therefore all elements) to zero. In fact:
Theorem: Every cyclic module Rm is isomorphic to R/AnnR(m).- Let σ:R→Rm be the map r↦rm.
- By definition of cyclic module, the image of σ is all of Rm, thus σ is surjective.
- The kernel of σ consists of elements a where am=0, which is exactly AnnR(m).
- Then by the First Isomorphism Theorem we have R/ker σ≅im σ, which is R/AnnR(m)≅Rm.
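A small sanity check of my own in the Z-module Z/6Z:
$$\operatorname{Ann}_{\mathbb{Z}}([1])=6\mathbb{Z},\quad\mathbb{Z}[1]=\mathbb{Z}/6\mathbb{Z}\cong\mathbb{Z}/6\mathbb{Z};\qquad\operatorname{Ann}_{\mathbb{Z}}([2])=3\mathbb{Z},\quad\mathbb{Z}[2]=\{[0],[2],[4]\}\cong\mathbb{Z}/3\mathbb{Z}$$
In each case the cyclic module generated by m is isomorphic to Z modulo the annihilator of m, as the theorem says.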
Thus we have that every quotient by a principal ideal R/(f) is cyclic (with m=[1]), and every cyclic module Rm is isomorphic to the quotient R/AnnR(m). If R is a PID, then AnnR(m) is a principal ideal as well, and we get an isomorphism between quotients by principal ideals R/(f) and cyclic modules Rm. The important part is: R/(f) is either the free module R (when f=0) or a torsion module R/(f) (when f≠0).
Recall the structure theorem of modules over a PID. Now that we know more about cyclic modules, we know that any module M defined over a PID R is isomorphic to a direct sum of free submodules R/(0)≅R, and torsion cyclic submodules R/(fi), which are ordered by divisibility: f1∣…∣fr for some order of the fi.
$$M\cong\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(f_1)\oplus R/(f_2)\oplus\cdots}_{\text{torsion}}$$
We can go a bit further:
Theorem: Over a PID R, every cyclic module Rm is a direct sum of cyclic modules, each of which is either a free cyclic module R/(0)≅R, or a torsion cyclic module R/(pk) where p is irreducible in R and k>0.- Since R is a PID, AnnR(m) (being an ideal) is a principal ideal (a).
- Recall that Rm≅R/AnnR(m).
- If a=0 then Rm≅R/(0)≅R, meaning Rm is a free cyclic module.
- Otherwise, if a is nonzero, note that R is a PID and therefore a UFD. Then every nonzero a has a unique factorization into primes p1k1p2k2…pnkn.
- So we have Rm≅R/AnnR(m)≅R/(p1k1p2k2…pnkn), which factors by the Chinese Remainder Theorem into R/(p1k1)⊕R/(p2k2)⊕…⊕R/(pnkn).
- Therefore Rm is either a free cyclic module, or factors into torsion cyclic modules where the annihilator is a prime power pk.
Thus another way to decompose a finitely generated module over a PID is $$M\cong\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\cdots}_{\text{torsion}}$$ where these prime power factors $p_i^{k_i}$ are known as elementary divisors.
Thus there are two ways to decompose a module according to the structure theorem:
- A hierarchical decomposition: $$M\cong\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(f_1)\oplus R/(f_2)\oplus\cdots}_{\text{torsion}}$$ where the invariant factors fi are ordered by divisibility f1∣…∣fr, and are obtained by the Smith normal form.
- Into irreducibles: $$M\cong\underbrace{R^n}_{\text{free}}\oplus\underbrace{R/(p_1^{k_1})\oplus R/(p_2^{k_2})\oplus\cdots}_{\text{torsion}}$$ where the elementary divisors $p_i^{k_i}$ are powers of irreducibles pi in the ring R, obtained by factoring each R/(fi)=R/(p1k1p2k2…pnkn) via the Chinese Remainder Theorem.
The product of invariant factors must be equal to the product of the elementary divisors, since these describe the same module.
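For example, over R=Z (again a quick example of my own), the same module can be written both ways:
$$M\cong\mathbb{Z}/2\oplus\mathbb{Z}/12\quad(\text{invariant factors }2\mid 12)\qquad\cong\qquad\mathbb{Z}/2\oplus\mathbb{Z}/4\oplus\mathbb{Z}/3\quad(\text{elementary divisors }2,\ 2^2,\ 3)$$
since Z/12≅Z/4⊕Z/3 by the Chinese Remainder Theorem, and indeed 2·12=24=2·4·3.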
In this exploration, we discovered how R-matrices can be used to fully describe all the homomorphisms between free R-modules. As a bonus, we also learned how presentation matrices can be used to fully describe any R-module, and further, broke down the structure of finitely generated modules over a PID.
Next time we’ll be applying this structure theorem to characterize completely all the homomorphisms between non-free modules.
-
January 2, 2024.
Exploration 2: Module actions
Questions:
- How can we represent algebraic objects as actions?
Recall that R-matrices are homomorphisms between free modules Rn. How do we visualize homomorphisms between non-free modules M?
Let’s first focus on endomorphisms. Every endomorphism T:M→M must satisfy the homomorphism laws for R-modules M:
- T(m+n)=T(m)+T(n) for all m,n∈M
- T(km)=kT(m) for all k∈R,m∈M
In other words, T is R-linear.
R-linearity allows us to define module actions. Since T is R-linear, it respects addition and scalar multiplication, and can therefore be described as acting on M. We can express this by extending the R-module M to an R[t]-module: an R-module together with an action of the indeterminate t. When the action of t is ordinary scalar multiplication, an R[t]-module is just a module over the polynomial ring R[t].
Instead of scalar multiplication by t, define t to be an application of T, and define the action of constant polynomials r∈R to be scalar multiplication by r. Thus each endomorphism T:M→M is associated with an R[t]-module M′ whose action is to apply some linear combination of T, represented by a polynomial in R[t].
By this construction, every endomorphism is associated with a unique R[t]-module, and therefore studying these R[t]-modules is enough to classify the endomorphisms.
In this section, we explore properties of R[t]-modules when R[t] is a PID.
When R[t] is a PID, it’s equivalent to saying that R is a field F. So we’ll refer to R[t] as F[t] in this section.
Importantly, it turns out these F[t]-modules are torsion whenever M is finite-dimensional over F, i.e. they have a nontrivial annihilator AnnF[t](M).
Theorem: Given an F[t]-module M that is finite-dimensional over F, where t acts as an endomorphism M→M, there is some polynomial f∈F[t] whose corresponding endomorphism ∈End(M) is the zero map. In other words, AnnF[t](M) is nontrivial.- Since t acts as T∈End(M), we can construct a ring homomorphism φ:F[t]→End(M) that maps linear combinations of t (∑iaiti) to their corresponding endomorphism (∑iaiTi).
- The kernel of φ represents all polynomials ∈F[t] that map to the zero map in End(M). Thus AnnF[t](M)=ker φ, and it is enough to show that φ has a non-trivial kernel.
- Since M is finite-dimensional over F, End(M) is also a finite-dimensional F-vector space, while F[t] is infinite-dimensional over F (spanned by {1,t,t2,…}). An F-linear map φ from an infinite-dimensional space to a finite-dimensional one cannot be injective.
- Since injectivity corresponds with having a trivial kernel, φ has a non-trivial kernel.
Corollary: Every F[t]-module M that is finite-dimensional over F (where t acts as an endomorphism M→M) is torsion.By definition, a torsion module is one with a non-trivial annihilator, which we proved true above.
Note that a related decomposition can be obtained via the Chinese remainder theorem, which says F[t]/(f⋅g)≅F[t]/(f)⊕F[t]/(g) when f,g are coprime. Observe that two distinct linear factors t−α and t−β are always coprime in F[t], so whenever f splits into linear factors, F[t]/(f) decomposes into a direct sum of factors F[t]/((t−α)^k), one for each distinct root α of f.
Theorem: Torsion R-modules M have no nontrivial free submodules.If M is torsion, then there is a nonzero a∈R such that am=0 for all m∈M, and the same a annihilates every submodule of M. But a nontrivial free submodule would contain a basis element e, and ae=0 forces a=0 (coordinates with respect to a basis are unique), contradicting a≠0.
Because we’re working over a PID, we can apply the structure theorem to decompose such a F[t]-module M into a direct sum of free submodules ≅F[t] and cyclic torsion submodules ≅F[t]/(fk) (for an irreducible polynomial f).
$$M\cong\underbrace{(F[t])^n}_{\text{free}}\oplus\underbrace{F[t]/(f_1^{k_1})\oplus F[t]/(f_2^{k_2})\oplus\cdots\oplus F[t]/(f_n^{k_n})}_{\text{torsion}}$$
But M, being torsion, has no nontrivial free submodules, so there is no free part:
$$M\cong F[t]/(f_1^{k_1})\oplus F[t]/(f_2^{k_2})\oplus\cdots\oplus F[t]/(f_n^{k_n})$$
In summary, we’ve reduced the problem of classifying endomorphisms to studying the torsion cyclic F[t]-submodules F[t]/(fk) of their corresponding F[t]-module.
In this section, we derive the notion of eigenvalues in module theory.
Given a F-vector space V, consider the endomorphism T:V→V. As before, define a F[t]-module VT as V where the action of t is to apply T. Since VT is a torsion module over a PID, apply the structure theorem to break it into a direct sum of torsion cyclic modules ⨁iF[t]/(fi) where f1∣f2∣…∣fn. Note that each fi generates the annihilator for their corresponding cyclic module F[t]/(fi). In this context where the module is defined over a polynomial ring F[t], we say that a polynomial is the minimal polynomial for a module M if it is the lowest-degree monic polynomial that annihilates M.
Theorem: f is the minimal polynomial for a cyclic module F[t]/(f).Every element in the annihilator of F[t]/(f) annihilates F[t]/(f) by definition. Since f is the generator of the annihilator of F[t]/(f), it must divide every element of this annihilator, and therefore is the least-degree element in the annihilator. Thus f is the lowest degree polynomial that annihilates F[t]/(f).
Theorem: Every F[t]-module M (where t acts as an endomorphism M→M) has a minimal polynomial f∈F[t] that generates the annihilator of M.We know that the annihilator of M is the kernel of the homomorphism φ:F[t]→End(M). Since the kernel is an ideal of F[t], a PID, the kernel must be generated by a single element f∈F[t]. Since F is a field, we can choose f to be monic, since if it’s not monic we can multiply f by the inverse of its leading coefficient to get a monic f. This makes f the lowest degree monic polynomial in the annihilator of M.
For an R-module M, an eigenvalue λ∈R of an R-module endomorphism T:M→M is a scalar for which there exists a nonzero element m∈M such that Tm=λm.
An endomorphism T:M→M is nilpotent if some power of T is equal to the zero map.
Lemma: An R-module endomorphism T:M→M has an eigenvalue λ if T−λI is nilpotent.- If T−λI is nilpotent, then (T−λI)k=0 for some k≥1.
- If k=1, then we have T−λI=0 implying T=λI, which directly shows that λ is an eigenvalue.
- Otherwise if k≥2, take k to be the smallest power with (T−λI)k=0, so that (T−λI)k−1≠0. Then there is some element m∈M such that n=(T−λI)k−1m≠0. But then we have (T−λI)n=(T−λI)[(T−λI)k−1m]=(T−λI)km=0. Since there exists a nonzero element n∈M where (T−λI)n=0, this proves Tn=λn, thus λ is an eigenvalue of T.
A counterexample for the converse is the identity I, which has eigenvalue λ=1 but is not nilpotent.
Lemma: The eigenvalues of an endomorphism of a direct sum T:M⊕N→M⊕N are exactly the eigenvalues of TM and TN combined (where TM denotes T restricted to M.)It is enough to show that the eigenvalues of TM are eigenvalues of T, since the N case has an identical proof. If TM has an eigenvalue λ, then TMm=λm for some m∈M. Thus in M⊕N, we have T(m,0)=(TMm,TN0)=(λm,0)=λ(m,0) proving that the eigenvalue λ of TM is an eigenvalue of T on M⊕N.
Theorem: For a torsion component F[t]/(f) of a F[t]-module M, the roots λ of its minimal polynomial f correspond to eigenvalues of T (the action of t).- If λ is a root of f, then by the factor theorem, (t−λ) is an irreducible factor of f. Note that in a UFD like F[t], irreducibles and primes are the same thing.
- Since the annihilator is generated by a prime power, and we know that the prime is (t−λ), the annihilator is exactly (t−λ)k for some k≥1. So we’re working with F[t]/(t−λ)k.
- Recall that polynomials in F[t] act as endomorphisms on M. The endomorphism corresponding to this polynomial (t−λ)k should look like (T−λI)k, since we defined the action of F[t] to treat constant polynomials like λ as scalar multiplication.
- Since (t−λ)k annihilates this component, the corresponding endomorphism satisfies (T−λI)k=0 on it — that is, T−λI is nilpotent on the component F[t]/(t−λ)k, implying (by the lemma above) that λ is an eigenvalue of T restricted to that component.
- But M is a direct sum that includes F[t]/(t−λ)k as a component, so by the direct sum lemma above, λ is also an eigenvalue of T on all of M.
Theorem: The minimal polynomial for a F[t]-module M is exactly the largest invariant factor fn of M.If the decomposition is M≅⨁iF[t]/(fi), then the minimal polynomial of M must annihilate every component, so every fi (which generates the annihilator of its component) must divide it; hence the minimal polynomial is a multiple of the LCM of all fi. Since the decomposition has f1∣f2∣…∣fn, that LCM is fn, and fn does annihilate every component (each fi divides fn), making fn the minimal polynomial of M.
Since the roots of f for each F[t]/(f) correspond to the eigenvalues of T:M→M (the action of t), it is useful to encapsulate all of the eigenvalues (including repeats) as the characteristic polynomial χT∈F[t]: a polynomial with a root λ every time λ appears as an eigenvalue of T. Therefore:
Theorem: The characteristic polynomial for a F[t]-module M is exactly the product of all invariant factors fi of M.This follows directly from knowing that the roots of each fi are eigenvalues of M. Then the product of each fi captures all eigenvalues of M.
Corollary: The characteristic polynomial of an endomorphism T:M→M for a finitely generated F[t]-module M is the product of the minimal polynomials fi of its cyclic components.
There is another way to compute this characteristic polynomial:
Theorem: Over a finitely generated F[t]-module M, the characteristic polynomial of an endomorphism T:M→M is exactly det(T−tI).- Since M is finitely generated, we can construct its presentation matrix A so that M≅F[t]n/im A. The fact that M is equivalent to a quotient of a free module F[t]n means all endomorphisms of M (e.g. T) can be expressed as a F[t]-matrix modulo the quotient im A.
- Now consider the equation Tm=tm, where we treat t∈F[t] as a scalar. If this is true for some nonzero m∈M, then t is an eigenvalue of T by definition.
- Rearranging the above to (T−tI)m=0 reduces the problem of finding eigenvalues of T to finding values of t such that T−tI zeroes some nonzero m∈M.
- Recall that we can define a Smith normal form PDQ−1 for every matrix over a Bézout domain like F[t], such as for T−tI. This results in a diagonal matrix $D=\begin{pmatrix}f_1&&&\\&f_2&&\\&&\ddots&\\&&&f_n\end{pmatrix}$ where fi are the invariant factors of T−tI (which may differ from the invariant factors of M.)
- Thus our equation becomes (PDQ−1)m=0, which we can simplify to D(Q−1m)=0. Since Q is invertible, Q−1m is also an arbitrary nonzero element of M, so WLOG we may write Dm=0. Because D is diagonal, we may rewrite this as a system of equations: $$f_1m_1=0,\quad f_2m_2=0,\quad\dots,\quad f_nm_n=0$$ where the mi are the components of m. Note that a nonzero m only requires at least one of the mi to be nonzero. That means we can assume mi=0 for all but one of the equations.
- When mi is nonzero, then for the equation fimi=0 to be true, fi must be equal to zero. This is because F[t], being an integral domain, has no zero divisors. In summary, if we can find values λ for t that make any fi zero (i.e. the roots of fi), then Dm=0 is true for some nonzero m∈M, so (T−λI)m=0 is true. So the roots of fi are eigenvalues of T.
- This means one can obtain every eigenvalue of T by finding the roots of the product ∏ifi. So the characteristic polynomial χT is ∏ifi, which is exactly the determinant of T−tI.
The companion matrix of a polynomial f∈F[t] is a matrix constructed so that its characteristic polynomial is exactly f. Given a polynomial $t^n+a_{n-1}t^{n-1}+\dots+a_1t+a_0$, its companion matrix is
$$t^n+a_{n-1}t^{n-1}+\dots+a_1t+a_0\iff\begin{pmatrix}0&0&\cdots&0&-a_0\\1&0&\cdots&0&-a_1\\0&1&\cdots&0&-a_2\\\vdots&\vdots&\ddots&\vdots&\vdots\\0&0&\cdots&1&-a_{n-1}\end{pmatrix}$$
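For instance (a quick example of my own, using the convention above), for f=t²+1 we have a₀=1, a₁=0, so
$$t^2+1\iff\begin{pmatrix}0&-1\\1&0\end{pmatrix},\qquad\det\left(\begin{pmatrix}0&-1\\1&0\end{pmatrix}-tI\right)=t^2+1$$
recovering f as the characteristic polynomial.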
Corollary: All square F-matrices M (of size ≥1) have at least one eigenvalue iff F is an algebraically closed field.- For the forward direction: since the companion matrix of every polynomial ∈F[t] (of degree ≥1) has an eigenvalue, every polynomial ∈F[t] (of degree ≥1) has a root in F, therefore F is algebraically closed.
- The backward direction is because in an algebraically closed field, all degree ≥1 polynomials in F[t] have a root in F, including det(M−tI), whose roots correspond to eigenvalues of M.
Theorem: Given a F[t]-module M where t acts as an endomorphism M→M, M is cyclic iff the minimal and characteristic polynomials coincide.- Let M≅⨁iF[t]/(fi) by the structure theorem. Since the minimal polynomial is exactly fn and the characteristic polynomial is exactly ∏ifi, they can only coincide when there is only one invariant factor. But that indicates that the decomposition includes only one cyclic submodule F[t]/(fi), meaning the original module M must be cyclic as well.
- The proof in the other direction is trivial: if M is cyclic M≅F[t]/(f), then both fn and ∏ifi are equal to f.
If a matrix T has eigenvalue λ, then the kernel of T−λI is its corresponding eigenspace. If the image of T is equal to one of its eigenspaces, the map is a multiplication by a scalar λ.
In this section, we describe the module action of polynomial rings.
We can express such a linear transformation as the action of a polynomial ring F[t] on an F-module.
Let's explore F[t]-modules, where F[t] is the polynomial ring over a field F.
For instance, take the F[t]-module F[t]/(t−2)3. Since it's quotiented by a degree 3 polynomial (t−2)3=t3−6t2+12t−8, the obvious basis is {1,t,t2}. We use the fact that 0=t3−6t2+12t−8 in the quotient to get the constraint T(t2)=t3=6t2−12t+8. This means the F[t]-matrix T (with respect to the basis {1,t,t2}) is one where T(t2)=6t2−12t+8, i.e. we have
$$T\begin{bmatrix}1\\0\\0\end{bmatrix}=\begin{bmatrix}0\\1\\0\end{bmatrix},\quad T\begin{bmatrix}0\\1\\0\end{bmatrix}=\begin{bmatrix}0\\0\\1\end{bmatrix},\quad T\begin{bmatrix}0\\0\\1\end{bmatrix}=\begin{bmatrix}8\\-12\\6\end{bmatrix}$$
which corresponds to the presentation matrix
$$\begin{bmatrix}0&0&8\\1&0&-12\\0&1&6\end{bmatrix}$$
So we have found T for the F[t]-module F[t]/(t−2)3.
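As a sanity check (a sketch assuming sympy), this T has characteristic polynomial (t−2)3 and satisfies (T−2I)3=0, as it should, since (t−2)3 is zero in the quotient.

```python
from sympy import Matrix, symbols, eye, factor

t = symbols('t')

# Matrix of multiplication by t on F[t]/(t-2)^3 in the basis {1, t, t^2}.
T = Matrix([[0, 0, 8],
            [1, 0, -12],
            [0, 1, 6]])

print(factor(T.charpoly(t).as_expr()))   # (t - 2)**3
print((T - 2 * eye(3))**3)               # the zero matrix: (t - 2)^3 acts as zero
```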
From the structure theorem, we know that direct sums of F[t]-modules describe all F[t]-modules. Let’s do this again where M is a direct sum.
Take the F[t]-module M≅F[t]/(t−2)⊕F[t]/(t−2)2. The obvious basis is {(1,0),(0,1),(0,t)}. We do the same thing as before, realizing 0=t−2 on the left and 0=(t−2)2=t2−4t+4 on the right, and obtaining the constraints T(1)=t=2 on the left and T(t)=t2=4t−4 on the right. We have:
$$T\begin{bmatrix}1\end{bmatrix}=\begin{bmatrix}2\end{bmatrix},\quad T\begin{bmatrix}1\\0\end{bmatrix}=\begin{bmatrix}0\\1\end{bmatrix},\quad T\begin{bmatrix}0\\1\end{bmatrix}=\begin{bmatrix}-4\\4\end{bmatrix}$$
corresponding to the presentation matrix
$$\begin{bmatrix}2&&\\&0&-4\\&1&4\end{bmatrix}=[2]\oplus\begin{bmatrix}0&-4\\1&4\end{bmatrix}$$
where blank entries are zeroes.
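A quick check (a sketch assuming sympy) ties this example back to the earlier theorems: the characteristic polynomial of this block matrix is the product of the invariant factors (t−2)(t−2)2=(t−2)3, while the minimal polynomial is the largest one, (t−2)2.

```python
from sympy import Matrix, symbols, eye, zeros, factor

t = symbols('t')

# Block matrix for F[t]/(t-2) ⊕ F[t]/(t-2)^2 in the basis {(1,0), (0,1), (0,t)}.
T = Matrix([[2, 0,  0],
            [0, 0, -4],
            [0, 1,  4]])

print(factor(T.charpoly(t).as_expr()))   # (t - 2)**3, the product of the invariant factors
print((T - 2 * eye(3))**2)               # zero matrix, so the minimal polynomial is (t - 2)**2
print(T - 2 * eye(3) == zeros(3, 3))     # False: (t - 2) alone does not annihilate M
```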
So we've described how to find a presentation matrix for any F[t]-module. We could reduce this matrix using the operations allowed on presentation matrices, but in the case of F[t]-modules there is a standard form for the presentation matrix.
The Chinese remainder theorem says F[t]/(f⋅g)≅F[t]/(f)⊕F[t]/(g) when f,g are coprime. Consider F[t]/(f). Since powers of distinct linear factors are coprime, whenever f splits into linear factors we can split F[t]/(f) into a direct sum of quotients by powers of linear factors, one for each repeated root. For instance, F[t]/((t−α)2(t−β)3)≅F[t]/(t−α)2⊕F[t]/(t−β)3.
For an easy example, consider the F[t]-module M≅F[t]/(t−α)4. To find T, first find some basis for M (as a vector space). A clever basis is {(t−α)3,(t−α)2,(t−α),1}. This conveniently gives us the constraints
$$\begin{aligned} T((t-\alpha)^3) &= t(t-\alpha)^3 = (t-\alpha)^4 + \alpha(t-\alpha)^3 = \alpha(t-\alpha)^3 \\ T((t-\alpha)^2) &= t(t-\alpha)^2 = (t-\alpha)^3 + \alpha(t-\alpha)^2 \\ T(t-\alpha) &= t(t-\alpha) = (t-\alpha)^2 + \alpha(t-\alpha) \\ T(1) &= t(1) = (t-\alpha) + \alpha(1) \end{aligned}$$
using the fact that (t−α)4=0 in the quotient. Converting this into terms of basis vectors, this is
$$T\begin{bmatrix}1\\0\\0\\0\end{bmatrix}=\begin{bmatrix}\alpha\\0\\0\\0\end{bmatrix},\quad T\begin{bmatrix}0\\1\\0\\0\end{bmatrix}=\begin{bmatrix}1\\\alpha\\0\\0\end{bmatrix},\quad T\begin{bmatrix}0\\0\\1\\0\end{bmatrix}=\begin{bmatrix}0\\1\\\alpha\\0\end{bmatrix},\quad T\begin{bmatrix}0\\0\\0\\1\end{bmatrix}=\begin{bmatrix}0\\0\\1\\\alpha\end{bmatrix}$$
which corresponds to the presentation matrix
$$\begin{bmatrix}\alpha&1&&\\&\alpha&1&\\&&\alpha&1\\&&&\alpha\end{bmatrix}$$
Such a matrix is known as a Jordan matrix (or Jordan block) with eigenvalue α of multiplicity 4. It has α along the diagonal and ones on the superdiagonal, and this is precisely because we chose the basis in such a way that the coefficients in each constraint are 1 and α. Next, here's a direct sum example. Consider the F[t]-module M≅F[t]/(t−α)4⊕F[t]/(t−α)3⊕F[t]/(t−β)3. We'll end up with the following presentation matrix, which is composed of blocks like the one in the previous example:
$$\begin{bmatrix}\alpha&1&&\\&\alpha&1&\\&&\alpha&1\\&&&\alpha\end{bmatrix}\oplus\begin{bmatrix}\alpha&1&\\&\alpha&1\\&&\alpha\end{bmatrix}\oplus\begin{bmatrix}\beta&1&\\&\beta&1\\&&\beta\end{bmatrix}$$
This is a direct sum of Jordan matrices! We call this the Jordan normal form of the presentation matrix. It exists whenever the invariant factors split into linear factors (for instance, over an algebraically closed field), and it is the standard form for a presentation matrix of such modules.
In general, given an R-module over a PID R, you may:
- Express the R-module as a direct sum of cyclic modules R/(fi) where fi is the minimal polynomial of that cyclic component.
- Split each R/(fi) into a direct sum of quotients by powers of linear factors (like R/(t−α)2⊕R/(t−β)3) by factoring fi (assuming each fi splits into linear factors), which gives you the eigenvalues and their multiplicities.
- Construct a direct sum of presentation matrices, which is in Jordan normal form.
All this to find the F[t]-matrix T that defines scalar multiplication by t in an F[t]-module.
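Computer algebra systems can carry out these steps directly. Here is a sketch assuming sympy, whose jordan_form method returns a change-of-basis matrix P and the Jordan matrix J with A = PJP⁻¹ (using the block matrix from the direct sum example above).

```python
from sympy import Matrix

# The block matrix for F[t]/(t-2) ⊕ F[t]/(t-2)^2 from above.
A = Matrix([[2, 0,  0],
            [0, 0, -4],
            [0, 1,  4]])

P, J = A.jordan_form()
print(J)                      # a 1x1 Jordan block [2] and a 2x2 Jordan block [[2, 1], [0, 2]]
print(P * J * P.inv() == A)   # True: J is A written in a Jordan basis
```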
-
January 6, 2024.
Exploration 3: Representation theory
Questions:
- TODO
Last exploration, we studied the properties of individual R-matrices. This time we’re studying R-matrices collectively.
First, we define HomR(M,N) as the set of all R-module homomorphisms M→N. When M,N are free R-modules, the elements of HomR(M,N) are R-matrices. When M=N, composition makes HomR(M,M) into a ring, and generally a noncommutative one, since matrix multiplication is not commutative in general.
Let’s start by representing groups using subsets of Hom.
For instance, we can express the group C× (nonzero complex numbers under multiplication) as 2×2 real matrices. We define a map ρ:C×→HomR(R2,R2):
$$\rho(1)=\begin{bmatrix}1&0\\0&1\end{bmatrix},\qquad \rho(i)=\begin{bmatrix}0&-1\\1&0\end{bmatrix},\qquad \rho(a+bi)=\begin{bmatrix}a&-b\\b&a\end{bmatrix}$$
so that
$$\rho((a+bi)(c+di))=\rho(a+bi)\,\rho(c+di)=\begin{bmatrix}a&-b\\b&a\end{bmatrix}\begin{bmatrix}c&-d\\d&c\end{bmatrix}=\begin{bmatrix}ac-bd&-(ad+bc)\\ad+bc&ac-bd\end{bmatrix}=\rho((ac-bd)+(ad+bc)i)$$
and
$$\rho((a+bi)^{-1})=(\rho(a+bi))^{-1}=\begin{bmatrix}a&-b\\b&a\end{bmatrix}^{-1}=\frac{\operatorname{adj}\begin{bmatrix}a&-b\\b&a\end{bmatrix}}{\det\begin{bmatrix}a&-b\\b&a\end{bmatrix}}=\frac{1}{a^2+b^2}\begin{bmatrix}a&b\\-b&a\end{bmatrix}=\rho\!\left(\frac{a-bi}{a^2+b^2}\right)$$
Since ρ preserves products and inverses, these matrices (R-module homomorphisms) form a group isomorphic to C×. This is just one of many encodings of complex numbers into matrices. The key is that ρ(i)2=−I, and that ρ(1),ρ(i) don't otherwise interact with each other.
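For a quick symbolic check of these identities (a sketch assuming sympy), one can verify the multiplicativity of ρ and that ρ(i)2=−I:

```python
from sympy import Matrix, symbols

a, b, c, d = symbols('a b c d', real=True)

def rho(x, y):
    """Matrix representing the complex number x + y*i."""
    return Matrix([[x, -y],
                   [y,  x]])

# Multiplicativity: rho(a+bi) * rho(c+di) equals rho((ac-bd) + (ad+bc)i).
lhs = rho(a, b) * rho(c, d)
rhs = rho(a*c - b*d, a*d + b*c)
print((lhs - rhs).expand())   # zero matrix

# rho(i)^2 = -I, mirroring i^2 = -1.
print(rho(0, 1)**2)           # Matrix([[-1, 0], [0, -1]])
```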
Here we found a suitable subgroup inside HomR(R2,R2). In general, to represent a group, we must select invertible R-module homomorphisms, so that every group element gets an invertible image. In practice, M and N are free modules Rm and Rn, and invertibility implies m=n. Thus, when representing a group, we can choose among the invertible matrices in HomR(Rn,Rn), also written as Aut(Rn), the group of automorphisms on Rn, whose elements we express as invertible n×n R-matrices.
So we assign to each group element an invertible n×n R-matrix. A matrix R-representation of a group is a group homomorphism G→Aut(Rn).
Theorem: There is a one-to-one correspondence between actions of a finite group G on Rn by R-module automorphisms and matrix R-representations of G.- Every representation G→Aut(Rn) assigns each g∈G a specific invertible matrix in Aut(Rn).
- But this implicitly defines an action of G on Rn. The action is given by matrix multiplication in Rn, where each g acts by multiplying by its corresponding matrix.
- Conversely, by Cayley's theorem every group is isomorphic to a permutation group, and therefore can act on R∣G∣ by permuting the ∣G∣ basis R-vectors. Permuting basis vectors is an automorphism of R∣G∣, so this assigns each element g∈G an automorphism of R∣G∣, i.e. it constructs a matrix R-representation G→Aut(R∣G∣).
The representation G→Aut(R∣G∣) (mentioned in the above proof) is known as the regular representation of a finite group G. The main idea of this representation is to have G permute the indices of R∣G∣ according to the permutation group isomorphic to G.
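As a small illustration (a sketch assuming sympy, using C3 for concreteness), the regular representation sends each group element to the permutation matrix that shifts the basis vectors indexed by group elements:

```python
from sympy import zeros, eye

# Regular representation of C3 = {e, g, g^2}: each g^k permutes the basis
# vectors of R^3, which are indexed by the group elements themselves.
def rho(k):
    M = zeros(3, 3)
    for j in range(3):
        M[(j + k) % 3, j] = 1   # g^k sends the basis vector for g^j to the one for g^(j+k)
    return M

print(rho(1))                       # the 3-cycle permutation matrix
print(rho(1) * rho(2) == eye(3))    # True: g * g^2 = e
print(all(rho(j) * rho(k) == rho((j + k) % 3) for j in range(3) for k in range(3)))  # True: a homomorphism
```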
In this section, we show the module analog of group actions.
In the same manner that adjoining an indeterminate symbol x to R gives you a polynomial ring R[x], we can adjoin a whole group to R to get a group ring R[G]. Elements look like R-linear combinations of elements of G: ∑_{g∈G} r_g g
Theorem: Every group ring R[G] is a free module.Since every element in R[G] is by definition uniquely represented as a linear combination of elements g∈G, we know that G spans R[G]. Like in polynomial rings, the only linear combination equal to zero is the one where every coefficient is zero, so the elements of G are linearly independent and therefore form a basis of R[G]. Having a basis means the elements of R[G] can be represented by R-vectors of coefficients, and therefore the group ring R[G] is a free module by construction.
Thus a group ring is simultaneously a group and a free R-module.
Theorem: The group ring R[G] is exactly the regular representation of G over R. Since R[G] is a free R-module, it has a basis whose elements are precisely the elements of the group G. Any element g∈G permutes the elements of G by left-multiplication. Permuting the basis elements is an automorphism of the R-module R∣G∣, thus R[G] is exactly the regular representation G→Aut(R∣G∣).
Therefore: Every linear group action of G on an R-module M can be encoded by an appropriate group ring R[G].A representation G→Aut(M) assigns an automorphism of the R-module M to every element g∈G.
Since the automorphisms of R-modules are exactly the invertible linear transformations, every g∈G is assigned a linear transformation on M. In other words, the action of g on M must be linear in M.
But R[G] consists of all R-linear combinations of elements of G, and therefore contains every possible linear action of G on M.
This means that every action defined for G on some module M can be linearly extended to an action of R[G] on M. Specifically, if the group action g⋅m is defined for all g∈G,m∈M, then we define (∑_{g∈G} r_g g)⋅m as ∑_{g∈G} r_g(g⋅m). Such a module M is known as an R[G]-module, a module over a group ring.
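Here is a small sketch (assuming sympy, reusing the C3 permutation matrices from above; group_ring_action is a hypothetical helper) of linearly extending the group action to a group ring element like 2e+3g:

```python
from sympy import Matrix, zeros

def rho(k):
    """Permutation matrix for g^k in the regular representation of C3."""
    M = zeros(3, 3)
    for j in range(3):
        M[(j + k) % 3, j] = 1
    return M

def group_ring_action(coeffs, v):
    """(sum_k coeffs[k] * g^k) . v  =  sum_k coeffs[k] * (g^k . v)"""
    return sum((c * rho(k) * v for k, c in enumerate(coeffs)), zeros(3, 1))

v = Matrix([1, 0, 0])
# (2e + 3g + 0g^2) . v = 2*(e.v) + 3*(g.v)
print(group_ring_action([2, 3, 0], v))   # Matrix([[2], [3], [0]])
```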
Theorem: Every group ring R[G] is an R[G]-module.R[G] is a free R-module, and G has a natural group action on R[G] by left-multiplication in the ring. Thus it is a R[G]-module.
To preserve the R[G]-module structure, R[G]-submodules of R[G]-modules must be G-invariant: just as how R[G]-modules are closed under the action of R[G], R[G]-submodules W must be closed under the action of R[G] in the sense that for all g∈R[G],w∈W, gw∈W.
Similarly, homomorphisms σ:M→N between R[G]-modules M,N must be G-equivariant: they must commute with the action of R[G] in the sense that for all g∈R[G],m∈M, gσ(m)=σ(gm).
Theorem: Given finite G, we can always construct a G-equivariant version σ~:M→M of any endomorphism σ:M→M of the R[G]-module M. The trick to get G-equivariance is to take a sum over all actions of G: σ~(m) = ∑_{g∈G} g^{-1}σ(gm), which we can do because G is finite. Then we can show that for h∈G:
$$h\tilde{\sigma}(m) = h\sum_{g\in G} g^{-1}\sigma(gm) = \sum_{g\in G} hg^{-1}\sigma(gm) = \sum_{g\in G} h(gh)^{-1}\sigma((gh)m) = \sum_{g\in G} g^{-1}\sigma(g(hm)) = \tilde{\sigma}(hm)$$
where the middle step reindexes by g↦gh (which simply reorders the sum over G) and uses h(gh)^{-1}=g^{-1}. Thus σ~ is G-equivariant.
Theorem: If G is finite and ∣G∣ is a unit in R and M is a free R[G]-module, then every R[G]-submodule W of M implicitly defines a G-equivariant projection M→W.- Since M is free, it has a basis, and so does the submodule W. We can obtain a projection π:M→W by sending the basis vectors of W to themselves and the remaining basis vectors to elements of W (for instance, to 0).
- Recall that endomorphisms like projections can be made G-equivariant via an averaging trick. To ensure that the resulting endomorphism π~:M→M is still a projection, we need to ensure that it is idempotent and that its image is W. It is enough to construct π~ as a map that fixes elements w∈W (thus ensuring idempotence and that the image is at least W) and maps other elements in M to an element in W (thus ensuring that the image is at most W).
- Given that ∣G∣ is a unit in R, the map obtained by the averaging trick can be modified to ensure the above properties: π~(m) = (1/∣G∣) ∑_{g∈G} g^{-1}π(gm)
- This π~ fixes w∈W:
$$\tilde{\pi}(w) = \frac{1}{|G|}\sum_{g\in G} g^{-1}\pi(gw) = \frac{1}{|G|}\sum_{g\in G} g^{-1}\pi(w') = \frac{1}{|G|}\sum_{g\in G} g^{-1}w' = \frac{1}{|G|}\sum_{g\in G} w = \frac{1}{|G|}\,|G|\,w = w$$
where w′=gw lies in W since W is G-invariant, π(w′)=w′ since π is a projection onto W, and g^{-1}w′=g^{-1}gw=w. And π~ maps all m∈M to an element of W, since π(gm)∈W by definition of π, hence g^{-1}π(gm)∈W since W is G-invariant, hence (1/∣G∣)∑_{g∈G} g^{-1}π(gm)∈W by linearity, i.e. π~(m)∈W.
- Thus π~ as defined is a G-equivariant projection onto W.
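Here is a tiny worked instance of the averaging trick (a sketch assuming sympy): take G=C2 acting on Q2 by swapping coordinates, W the G-invariant line spanned by (1,1), and a projection onto W that is not G-equivariant. Averaging over G produces an idempotent, G-equivariant projection onto W.

```python
from sympy import Matrix, Rational, eye, zeros

# G = C2 acting on Q^2 by swapping coordinates; W = span{(1,1)} is G-invariant.
S = Matrix([[0, 1],
            [1, 0]])              # the swap
group = [eye(2), S]               # matrices for e and s

# A projection onto W that is NOT G-equivariant: (x, y) |-> x*(1,1).
P = Matrix([[1, 0],
            [1, 0]])

# Averaging trick: P_avg = (1/|G|) * sum over g of g^{-1} P g.
P_avg = Rational(1, len(group)) * sum((g.inv() * P * g for g in group), zeros(2, 2))
print(P_avg)                                        # Matrix([[1/2, 1/2], [1/2, 1/2]])
print(P_avg * P_avg == P_avg)                       # True: still idempotent (a projection)
print(all(g * P_avg == P_avg * g for g in group))   # True: now G-equivariant
```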
In this section, we consider representations as algebraic structures in their own right.
Note that for our representation of C× in the beginning, each element of G is assigned a distinct element of Aut(Rn), i.e. the representation is injective. When a representation is injective (i.e. the domain is isomorphic to its image), we call it a faithful representation: each element of G is represented by a distinct matrix in Aut(Rn).
There are also non-faithful representations. An example of a very not faithful representation is the trivial representation, which represents every element as the zero element in the zero module. Another is the sign representation: Sn→Aut(Z), which simply takes the sign (1 or −1) of every permutation in Sn. Another example is det∘ρ, where ρ is some matrix representation. In general, a representation is a group homomorphism G→Aut(M) (where M is some R-module).
Theorem: An R[G]-module M fully describes a representation G→Aut(M). This result comes naturally from the fact that a group ring R[G] encodes all possible linear group actions, and therefore its action on a module M models every possible automorphism of the R-module M, since automorphisms of R-modules are linear by definition.
Theorem: A faithful representation ρ:G→Aut(M) is one where no g∈G acts as the identity on M except for the identity element e∈G. A faithful representation is injective, i.e. its domain G is isomorphic to its image in Aut(M). This means that only one element g∈G maps to the identity automorphism, so only one element g∈G acts as identity on M.
Theorem: A faithful representation ρ, when represented as an R[G]-module M, is one where each x∈R[G] has a unique action on M. If two elements x,y∈R[G] have the same action on M, then x−y acts as zero (sending every element of M to zero); if only 0∈R[G] acts as zero, then x−y=0, i.e. x=y, so distinct elements of R[G] act differently.
The regular representation ρ:G→Aut(R∣G∣) (or equivalently, ρ=R[G]) represents elements of G as automorphisms on the free R-module R∣G∣ (whose elements are R-vectors). It is faithful, since it is composed of distinct permutations of G, so there is always a faithful matrix representation of any group G. Can we do better? Ideally, we want to represent elements of a group G using R-vectors that are perhaps smaller than ∣G∣, without affecting the faithfulness of ρ.
By using the above fact that any representation ρ is equivalent to some R[G]-module M, we can take subrepresentations of ρ as R[G]-submodules of M. To begin, given a representation ρ:G→Aut(M), we can think about quotienting the corresponding R[G]-module M by one of its R[G]-submodules. However, quotienting might create a representation that isn’t faithful. When does quotienting the underlying module M of a faithful representation ρ:G→Aut(M) preserve faithfulness?
Therefore: Faithful representations are exactly those where the group action is injective.Recall that preserving the group action means that every element of the group is identified with a distinct action. In particular, that means there is only one element that behaves like the identity action: the identity element e∈G. But if the representation ρ:G→Aut(Rn) maps only e to the identity in Aut(Rn), then it is injective, i.e. faithful.
Therefore, quotienting M preserves faithfulness of ρ exactly when the quotient M/W preserves the group action on M. In other words, the submodule W must be G-invariant: it must remain unchanged under the group action of G. But since R[G]-submodules are G-invariant by definition, quotienting by an R[G]-submodule always preserves faithfulness.
As it turns out, by quotienting repeatedly by different R[G]-submodules, we can “factor” M into a direct sum of submodules known as irreducible representations, or irreps.
Theorem: For a finite group G, if W is a G-invariant submodule of the R[G]-module M, then M/W is (isomorphic to) a complementary G-invariant submodule, with M≅M/W⊕W, assuming char R∤∣G∣.- Recall that if an idempotent endomorphism (a projection) σ exists on M, then M is isomorphic to ker σ⊕im σ. So it is enough to define a projection σ:M→M where ker σ≅M/W and im σ=W. Let's see what those conditions imply.
- To ensure that ker σ≅M/W, we need to ensure that every element in ker σ differs by some element in W.
- The requirement im σ=W implies that σ maps M to the G-invariant submodule W of M. Thus we need σ to be G-equivariant (gσ(m)=σ(gm)) so that it preserves G-invariance.
- Finally, to be a projection, σ must be idempotent.
- We'll start with the requirement that σ is G-equivariant. In order to get G-equivariance, one trick is to take the sum of all products with g: σ~(m) = ∑_{g∈G} gm. Then we can show that for h∈G:
$$h\tilde{\sigma}(m) = h\sum_{g\in G} gm = \sum_{g\in G} (hg)m = \sum_{g\in G} gm = \sum_{g\in G} g(hm) = \tilde{\sigma}(hm)$$
by linearity in R[G]-modules, and because g↦hg (and likewise g↦gh) is a bijection on G, so multiplying every term by h just permutes the order of the sum. Therefore, σ~ commutes with the action of G on M, and is therefore G-equivariant.
- Next up is ensuring that the elements of ker σ differ by elements of W. One way is to send every gm to W via some projection π:M→W. We redefine σ~: σ~(m) = ∑_{g∈G} π(gm). Then σ~(m) is a linear combination of elements of W. This means that im σ~ is a subset of W, and we can show that two elements of ker σ~ differ by a sum of terms π(gm)−π(gn)∈W: σ~(m)−σ~(n) = ∑_{g∈G} π(gm) − ∑_{g∈G} π(gn) = ∑_{g∈G} (π(gm)−π(gn))
- Finally, σ must be a projection. Therefore, it must be idempotent, and its image must be all of W. Without losing the previous properties, we can arrange σ(w)=w for every w∈W by taking the inverse action of G and dividing by ∣G∣ (which works since char R∤∣G∣): σ(m) = (1/∣G∣) ∑_{g∈G} g^{-1}π(gm). Then
$$\sigma(w) = \frac{1}{|G|}\sum_{g\in G} g^{-1}\pi(gw) = \frac{1}{|G|}\sum_{g\in G} g^{-1}\pi(w') = \frac{1}{|G|}\sum_{g\in G} g^{-1}w' = \frac{1}{|G|}\sum_{g\in G} w = \frac{1}{|G|}\,|G|\,w = w$$
where w′=gw lies in W since W is G-invariant, and π(w′)=w′ since π is a projection onto W. Therefore σ(w)=w for all w∈W, meaning W⊆im σ. By definition, σ(m) remains a linear combination of elements of W, and therefore im σ⊆W. Therefore im σ=W.
- Finally, since σ is an idempotent endomorphism, we know that M decomposes into ker σ⊕im σ. Since ker σ≅M/W and im σ=W by construction, we have M≅M/W⊕W.
Maschke's Theorem: Every representation ρ of a finite group G (over a field with characteristic not dividing ∣G∣) is a direct sum of irreducible representations.- If the R[G]-module M corresponding to the representation ρ is not already irreducible, it has a proper nonzero G-invariant submodule W.
- By the above theorem, which we can use since char R∤∣G∣, there is some complementary G-invariant submodule W′ such that M≅W⊕W′.
- By recursively decomposing proper G-invariant submodules W of each of the factors, you get smaller and smaller submodules. Since the submodules strictly shrink (for a finite-dimensional M, their dimensions strictly decrease), this process eventually stops when you are left with a direct sum of irreps that is isomorphic to the given representation ρ.
An R[G]-module that can be written as a direct sum of simple modules is semisimple. Hence:
Corollary: A R[G]-module is semisimple if G is finite and char R∤∣G∣.
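To see Maschke's theorem in a tiny case (a sketch assuming sympy): over C, the regular representation of C3 decomposes into three 1-dimensional irreps, spanned by the eigenvectors of the generator's permutation matrix.

```python
from sympy import zeros, simplify

# Permutation matrix of the generator of C3 in its regular representation.
g = zeros(3, 3)
for j in range(3):
    g[(j + 1) % 3, j] = 1

# Over C, g is diagonalizable; each eigenvector spans a one-dimensional
# C[C3]-submodule, and their direct sum is all of C^3.
P, D = g.diagonalize()
print(D)                               # diagonal entries 1, zeta, zeta^2 (zeta a primitive cube root of 1)
print(simplify(P.inv() * g * P - D))   # zero matrix
```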
In this section, we try to find irreps of any matrix representation.
Now that we know by Maschke's theorem that we can factor any finite group representation into irreps (assuming char R∤∣G∣), let's do so for matrix representations (which are always finite-dimensional).
A matrix representation G→Aut(Rn) is special compared to more general representations G→Aut(M) because it implies that the module M=Rn is a free module. This simplifies finding irreps considerably.
In this section, we find an easier way to discover the eigenvalues in an algebraically closed field.
Find the eigenvalues of a given R-matrix, where R is a PID.
I know that the characteristic equation is used to find eigenvalues in vector spaces, which are defined over fields. What happens if we move to the world of modules defined over integral domains?
-
February 27, 2024.
Exploration 4: Character theory
Questions:
- TODO
There are two problems when it comes to working with matrix representations:
- You have to work with (possibly huge) matrices
- They implicitly require a basis, which seems arbitrary
Let’s start by defining what it means for two (general) representations ρ:G→Aut(M) and ρ′:G→Aut(M′) to be the same representation. Say that these two representations are equivalent if there is an isomorphism ϕ between M and M′ that preserves the group action, i.e. ϕ∘ρ(g)=ρ′(g)∘ϕ for all g∈G, i.e. the following diagram commutes:
M ──ρ(g)──→ M
│ϕ          │ϕ
↓           ↓
M′ ─ρ′(g)─→ M′
where ϕ:M→M′ is an isomorphism
Let’s explore what happens in the case of matrix representations. When M,M′ are free modules Rn, then ρ,ρ′ become matrix representations, and thus Aut(Rn) is composed of n×n matrices. Then the problem of checking whether M,M′ are isomorphic comes down to comparing properties of the representative matrices.
Theorem: Conjugate elements g,h in G are represented by similar matrices A,B in any matrix representation ρ:G→Aut(Rn).- g,h being conjugate elements means kgk−1=h for some k∈G. Applying ρ gives ρ(k)ρ(g)ρ(k)−1=ρ(h), i.e. CAC−1=B where A=ρ(g), B=ρ(h), C=ρ(k), which is the same as saying A and B are similar.
Corollary: Conjugacy classes in G are represented by similarity classes in Aut(Rn).
Corollary: Equivalent matrix representations differ only in which similarity class they assign to each conjugacy class.
In this section, we explore some facts about class functions.
Since representations of elements of G in the same conjugacy class are similar matrices, we’d like to classify the similarity classes of matrices by defining a function on matrices that is unchanged under conjugation. So we’re interested in class functions on G, functions G→R that are invariant under conjugation in G.
Theorem: Given R a commutative ring, the class functions G→R form an R-module.- Most of this comes from R itself. Given two arbitrary class functions f1,f2:G→R on G, we can show that the class functions form an additive abelian group:
- Closure: (f1+f2)(g)=f1(g)+f2(g)
- Identity: 0(g)=0
- Inverse: (−f1)(g)=−f1(g)
- Commutativity: inherited from R
- Associativity: inherited from R
- To show that they form an R-module, we have closure under scalar multiplication: (rf)(g)=r⋅f(g)
Corollary: Given G a finite group and R a commutative ring, the class functions G→R form a free R-module.- This is the same as the previous theorem, but now we need to prove that class functions on finite groups form a free R-module. This requires showing a basis.
- The standard basis (in the form of indicator functions) will do. Since G is finite, index the conjugacy classes from 1 to n, and let ui be the class function whose value is 1 on elements from the ith class and 0 otherwise.
- Then the ui form a basis, meaning that the R-module of class functions is free.
Ideally we can decompose every class function G→R into a matrix representation G→Aut(Rn) followed by a linear class function Aut(Rn)→R. This works out nicely because of the following theorem:
Theorem: For matrices M over an integral domain R, the trace tr(M) is the unique nontrivial linear class function, up to scalar multiplication.- Invariance under conjugation means t(A)=t(BAB−1). Due to linearity of t, this is equivalent to t(AB)=t(BA).
- We decompose the input matrix M into its matrix units Eij, where Eij has a one on the ith row and jth column and zero elsewhere. Then t(AB)=t(BA) implies t(EijEkl)=t(EklEij).
- Assume that both sides are nonzero, since otherwise the integral domain lets us factor out t(I)=0 which implies t(anything)=0 by linearity, and we’re not interested in trivial t.
- Since t(0)=0 by linearity, t(M) is only nonzero when M is nonzero. Over an integral domain, this means EijEkl and EklEij must be nonzero as well.
- But EijEkl is only nonzero when j=k, and EklEij is only nonzero when i=l. Then EijEkl=Eil=Eii and EklEij=Ekj=Ejj must be nonzero. Further, we have t(Eii)=t(Ejj) for all i,j implying that all t(Eii) is a constant λ.
- Therefore t(Eij)=λ if i=j and is zero otherwise. Then we can decompose the original matrix and apply linearity of t to get:
$$t(M) = t\Big(\sum_{i,j} m_{ij}E_{ij}\Big) = \sum_{i,j} m_{ij}\,t(E_{ij}) = \sum_{i} m_{ii}\,t(E_{ii}) = \lambda\sum_{i} m_{ii}$$
by linearity of t, since t(Eij)=0 for i≠j, and since t(Eii)=λ.
- The trace of a matrix tr(M) is exactly ∑imii, therefore t(M)=λtr(M), and so t must be some scalar multiple of the trace operator.
That is, the only linear class function is the trace, and therefore we can decompose every class function G→R into a matrix representation ρ:G→Aut(Rn) followed by the trace (multiplied by some scalar).
In this section, we discover how to classify equivalent matrix representations.
Given a matrix representation ρ:G→Aut(Rn), define the character of ρ as χρ=tr∘ρ, essentially taking the trace of each representative matrix for g∈G. By construction, characters are functions G→R that are invariant under conjugation in G, i.e. class functions. Because equivalent matrix representations only differ by conjugation, characters are perfect for classifying matrix representations. In particular, χρ is constant on each conjugacy class of G.
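For instance (a sketch assuming sympy, reusing the C3 permutation matrices from earlier), the character of the regular representation of C3 takes the value 3=∣C3∣ on the identity and 0 everywhere else:

```python
from sympy import zeros

# Character of the regular representation of C3: chi(g^k) = tr(rho(g^k)).
def rho(k):
    M = zeros(3, 3)
    for j in range(3):
        M[(j + k) % 3, j] = 1
    return M

print([rho(k).trace() for k in range(3)])   # [3, 0, 0]
```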
Assume we’re in an integral domain R whose characteristic is 0, so that char R∤∣G∣ is true for all finite G. Then Maschke’s theorem lets us describe any group representation in terms of its irreps, and so we shift our focus of study towards irreps. Characters of irreps are known as irreducible characters.
Characters have a number of other properties in an integral domain, and much more in a field. Let’s go over the integral domain properties first.
- χ(e) is the degree of χ, which turns out to be equal to the dimension of the underlying representation.
Theorem: χ(e) gives the dimension n of its underlying representation in Aut(Rn).
Since the identity element in G is always represented by the identity matrix in Aut(Rn), taking the trace simply adds the n ones along the diagonal.
- TODO
In this section, we view a strategy for classifying group representations over fields.
Recall Schur’s lemma:
Theorem: Any homomorphism σ:M→N between simple modules M and N is either trivial or an isomorphism.
A corollary (often also called Schur’s lemma) appears when R is actually an algebraically closed field F:
Schur's Lemma: If the field F is algebraically closed, then any G-equivariant endomorphism ϕ of a finite-dimensional irrep V over F must be multiplication by a scalar.- Every endomorphism ϕ of a finite-dimensional vector space V is a matrix, and the characteristic polynomial of every matrix over an algebraically closed field F has at least one root λ.
- Since λ is an eigenvalue, ϕ−λI has a nonzero kernel (the corresponding eigenspace). Since ρ is an irrep, V is simple, and by the previous form of Schur's lemma the G-equivariant map ϕ−λI is either trivial or an isomorphism.
- It cannot be an isomorphism, since its kernel is nonzero. So ϕ−λI is the zero map, i.e. ϕ=λI, which is multiplication by a scalar.
Character theory is most often used over the field of complex numbers C. This is mostly because C is an algebraically closed field, which gives us the above.
TODO either expand on character theory theorems like orthogonality in C OR go back and highlight the importance of
TODO: three proofs that link together to get schur’s lemma
- In an algebraically closed field, every matrix has eigenvalue
- if matrix has eigenvalue, its image is an eigenspace (by defn)
- if image of linear map is an eigenspace, the map is a multiplication by scalar
Let ϕ : ρ_1 → ρ_2 be a nonzero homomorphism.
prove it in 2 parts:
If ρ_2 is irreducible then ϕ is surjective. If ρ_1 is irreducible then ϕ is injective.
The inner product ⟨,⟩ is a function of two arguments Rn×Rn→R that satisfies the following:
- Conjugate symmetry
- Linearity in the first argument
- Positive-definiteness
Irreducible characters are orthogonal. Orthogonal here means ⟨φ,ψ⟩=1 if φ,ψ come from the same representation up to isomorphism, and =0 otherwise. (Proof is in lecture 22.)
We call this property character orthogonality, as in “by character orthogonality…”
Computing character tables over ℂ
A character table has irreducible characters χ as rows and conjugacy classes of G as columns. Each entry is χ(g) where g is in that conjugacy class of G. For example, let ζ = a primitive third root of unity and use the degree 1 irreducible characters χ⁽ᵏ⁾ = g ↦ ζᵏ. Then the cyclic group C₃ = ⟨g⟩ has the character table:
        1    1    1      ← size of each conjugacy class of C₃
       (1)  (g)  (g²)    ← the conjugacy classes of C₃
      -------------------
χ⁽⁰⁾ |  1    1    1      ← trivial representation g ↦ [1]
χ⁽¹⁾ |  1    ζ    ζ²     ← dim 1 representation g ↦ [ζ]
χ⁽²⁾ |  1    ζ²   ζ      ← dim 1 representation g ↦ [ζ²]
If we define χ⁽ᵏ⁾ = g ↦ iᵏ (a dim 1 representation), then the cyclic group C₄ = ⟨g⟩ has the character table:
        1    1    1    1     ← size of each conjugacy class of C₄
       (1)  (g)  (g²) (g³)   ← the conjugacy classes of C₄
      ------------------------
χ⁽⁰⁾ |  1    1    1    1     ← trivial representation g ↦ [1]
χ⁽¹⁾ |  1    i   -1   -i     ← dim 1 representation g ↦ [i]
χ⁽²⁾ |  1   -1    1   -1     ← dim 1 representation g ↦ [i²]
χ⁽³⁾ |  1   -i   -1    i     ← dim 1 representation g ↦ [i³]
The symmetric group S₃ = ⟨(1 2),(1 2 3)⟩ has the character table:
        1    3     2      ← size of each conjugacy class of S₃
       (1)  (12) (123)    ← the conjugacy classes of S₃
      ---------------------
  1  |  1    1     1      ← trivial representation π ↦ [1]
 χˢ  |  1   -1     1      ← dim 1 sign representation π ↦ sign(π)
 χᵁ  |  2    0    -1      ← dim 2 representation, χᵁ(π) = (number of fixed points of π) - 1
The dihedral group D₄ = ⟨r,s | r⁴=s²=e, srs = r⁻¹⟩ has the character table:
       1    1      2        2         2       ← size of each conjugacy class of D₄
      {e}  {r²}  {r,r³}  {s,sr²}  {sr,sr³}    ← the conjugacy classes of D₄
     ------------------------------------------
χ₁ |   1    1      1        1         1       ← trivial representation g ↦ [1]
χ₂ |   1    1     -1        1        -1       ← dim 1 representation g ↦ ?
χ₃ |   1    1      1       -1        -1       ← dim 1 representation g ↦ -1 if reflection
χ₄ |   1    1     -1       -1         1       ← dim 1 representation g ↦ ?
χ₅ |   2   -2      0        0         0       ← dim 2 representation g ↦ ?
Things to note:
- The first row is all 1s (since [1] has trace 1)
- You can compute the last row using the rest of the rows
- The first column is always the dimension of the representation. This is because the identity element is represented by an identity matrix, whose trace is a sum of (dim of the representation) ones.
- A linear (degree 1) character describes a homomorphism G → Cˣ. We can always find such homomorphisms by taking the abelianization Gᵃᵇ = G/[G,G], where [G,G] is the commutator subgroup generated by all xyx⁻¹y⁻¹.
- Row self-product: Self-inner product of a row with itself is always |G|, but you need to count differently. (If a conjugacy class has 3 elements, count that entry three times.)
- Column self-product: Self-inner product of a column with itself is always |G|/|C|, the order of the centralizer, where |C| is the size of the conjugacy class.
- Row orthogonality: Inner product of two different rows is always zero, but you need to count differently. (If a conjugacy class has 3 elements, count that entry three times.)
- Column orthogonality: Inner product of two different columns is always zero. (Use this to compute the last row.)
- Magic formula: sum of squares of degrees of the irreducible characters equals |G|. (Use this to complete the first column.)
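These bookkeeping rules are easy to verify numerically. Below is a sketch (plain Python, using the S₃ table above with class sizes 1, 3, 2; all entries happen to be real, so no complex conjugation is needed) checking row orthogonality, the column products, and the magic formula.

```python
from fractions import Fraction

# Character table of S3: rows are irreducible characters, columns are the
# conjugacy classes (1), (12), (123) with sizes 1, 3, 2.
sizes = [1, 3, 2]
table = [
    [1,  1,  1],   # trivial
    [1, -1,  1],   # sign
    [2,  0, -1],   # dim 2
]
order = sum(sizes)   # |S3| = 6

def row_inner(r1, r2):
    """<chi, chi'> = (1/|G|) * sum_g chi(g)*chi'(g), counting each class |class| times."""
    return Fraction(sum(s * a * b for s, a, b in zip(sizes, r1, r2)), order)

# Row orthogonality: the inner product matrix of the rows is the identity.
print([[row_inner(r1, r2) for r2 in table] for r1 in table])

# Magic formula: the squares of the degrees (first column) sum to |G|.
print(sum(row[0] ** 2 for row in table) == order)   # True

# Column products: distinct columns are orthogonal, and a column dotted with
# itself gives |G| / |class| (the order of the centralizer).
for i in range(3):
    print([sum(row[i] * row[j] for row in table) for j in range(3)])
```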
This website gives all the character tables for all groups with at most 10 irreps: https://people.maths.bris.ac.uk/~matyd/GroupNames/characters.html
TODO
Recall that a field F is algebraically closed iff every nonconstant polynomial in F[x] has a root in F.
Theorem: Any linear operator over an algebraically closed field F has at least one eigenvalue.- The characteristic polynomial det(xI−A) of the operator is a nonconstant polynomial in F[x], so over an algebraically closed field it has a root λ∈F.
- A root of the characteristic polynomial is exactly an eigenvalue, so the operator has at least one eigenvalue λ.
Characters completely determine representations up to isomorphism
The original motivation for characters is to identify representations without regard to a basis or its matrix at all. There is also an amazing theorem that:
Theorem: ⟨χ,χ′⟩ = (1/|G|)∑_{g∈G} \overline{χ(g)} χ′(g), which is just a normalized dot product in the complex vector space.
Corollary: In particular, ⟨χ,χ⟩=1 iff χ is irreducible. (Writing χ = ∑nᵢχᵢ as a combination of irreducible characters, orthogonality gives ⟨χ,χ⟩ = ∑nᵢ², which equals 1 exactly when one nᵢ is 1 and the rest are 0, i.e. when χ is itself irreducible.)
Theorem: TODO- TODO
Cayley-Hamilton Theorem: Every (square) R-matrix (for a commutative ring R) satisfies its own characteristic equation. TODO
Characteristic polynomials
Every n×n matrix A over F has a characteristic polynomial χA equal to det(xI−A).
Motivation: To find the eigenvectors. χA(λ)=0 iff λ is an eigenvalue of A. So just factor χA to get the eigenvalues.
Minimal polynomials
Recall you can evaluate a polynomial m at a matrix A: m(A).
Every n×n matrix A over F has a unique minimal polynomial mA that is monic, lowest degree, and mA(A)=0.
Motivation: the roots of mA are the eigenvalues. Proof: WTS mA(λ)=0 iff λ is an eigenvalue of A. (→) mA∣χA, so χA(λ)=0, which means λ is an eigenvalue of A. (←) If λ is an eigenvalue with eigenvector v, then 0=mA(A)v=mA(λ)v since Av=λv; because v≠0, we get mA(λ)=0. Since mA has lowest degree, it has no factors other than those contributed by eigenvalues.
Another motivation: to detect diagonalizability of A. If mA factors into distinct factors it’s diagonalizable. Proof involves finding the Jordan block matrix conjugate to A…
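A small sketch (assuming sympy) of both motivations: read off eigenvalues from the minimal polynomial, and detect diagonalizability from whether its factors are distinct.

```python
from sympy import Matrix, symbols, eye, factor

t = symbols('t')

A = Matrix([[2, 0,  0],
            [0, 0, -4],
            [0, 1,  4]])
print(factor(A.charpoly(t).as_expr()))   # (t - 2)**3
print((A - 2 * eye(3))**2)               # zero matrix, so m_A = (t - 2)^2 (a repeated factor)
print(A.is_diagonalizable())             # False

B = Matrix([[2, 1],
            [0, 3]])
print((B - 2 * eye(2)) * (B - 3 * eye(2)))   # zero matrix, so m_B = (t - 2)(t - 3)
print(B.is_diagonalizable())                 # True: m_B has distinct linear factors
```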
Given f is the minimal polynomial of α over F, we can make an isomorphism φ:F[x]/(f)≅F(α) given by φ: g+(f) ↦ g(α).
Conjugation doesn’t change these polynomials
Recall a conjugate matrix of A is a matrix PAP−1 where P is any invertible matrix of the same size over the same field as A. Note:
- f(PAP−1)=P f(A) P−1 for any polynomial f, so f(PAP−1)=0 iff f(A)=0; in particular A and PAP−1 have the same minimal polynomial
- det(xI−PAP−1)=det(xPIP−1−PAP−1)=det(P(xI−A)P−1)=det(P)det(xI−A)det(P−1)=det(xI−A)
So conjugation doesn’t change either of these polynomials.
Cayley-Hamilton theorem: the characteristic polynomial χ_T of T is det(tI-T)=f₁f₂…fᵣ ∈ F[t]. Then χ_T(T) = O.
Character theory: - We care about the conjugacy classes of groups, since they help us find the irreducible characters of groups