Exploration 1: Modules and vector spaces
December 27, 2023.

Questions:
- How do we use matrices to represent mathematical objects?
This entire exploration is actually going to be about matrices. Specifically, we're going to talk about $R$-matrices, which are matrices with entries in a ring $R$. It turns out the study of $R$-matrices subsumes the study of modules, which we will see by the end of this exploration.
What are modules? It is easiest to start with free modules $R^n$, which are additive abelian groups of $R$-vectors (row or column vectors whose entries are in $R$), where addition and scalar multiplication are defined in the usual way:
$$(v_1, \ldots, v_n) + (w_1, \ldots, w_n) = (v_1 + w_1, \ldots, v_n + w_n), \qquad r\,(v_1, \ldots, v_n) = (r v_1, \ldots, r v_n).$$
These should be familiar from linear algebra, only that the entries of $R$-vectors are elements of the ring $R$. In this context, elements of $R$ are called scalars and are not in $R^n$ themselves — only $R$-vectors are in $R^n$. Since $R^n$ is composed of $R$-vectors, it is straightforward to describe $R$-matrices as maps $R^n \to R^m$, like in linear algebra:
$$A\mathbf{v} = \mathbf{w},$$
where the LHS vector $\mathbf{v}$ is in $R^n$ and the RHS vector $\mathbf{w}$ is in $R^m$.
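To make the correspondence with linear algebra concrete, here is a minimal sketch of an $R$-matrix acting as a map $R^3 \to R^2$ (the choice $R = \mathbb{Z}$ and the use of sympy are my own assumptions, not fixed by anything above):

```python
from sympy import Matrix

# A 2x3 matrix with entries in R = Z, viewed as a map Z^3 -> Z^2.
A = Matrix([[1, 2, 3],
            [4, 5, 6]])

v = Matrix([1, 0, -1])   # an R-vector in Z^3
w = A * v                # left multiplication lands in Z^2
print(w)                 # Matrix([[-2], [-2]])
```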
Every element in the ring $R$ can be treated as a scalar or as a (one-dimensional) $R$-vector, so scalar multiplication is given by multiplication in $R$.
This theorem introduces some ambiguity: writing $R$ could either mean the ring $R$, or the one-dimensional module $R^1$ over $R$. For clarity, here we write $R$ when we mean the ring, and we always write $R$-modules using letters such as $M$ or $V$ or $W$.
In this section, we describe $R$-modules beyond free modules.
$R$-modules are more general than free modules $R^n$, however. They are additive abelian groups of elements (not necessarily $R$-vectors!), where elements of $R$ come into play by defining a scalar multiplication $r\mathbf{v}$ governed by the following laws:
- Scalar distributivity: $(r + s)\mathbf{v} = r\mathbf{v} + s\mathbf{v}$
- Vector distributivity: $r(\mathbf{v} + \mathbf{w}) = r\mathbf{v} + r\mathbf{w}$
- Scalar associativity: $r(s\mathbf{v}) = (rs)\mathbf{v}$
- Scalar unit law: $1\mathbf{v} = \mathbf{v}$
Abelian groups are exactly the $\mathbb{Z}$-modules:
- ($\Rightarrow$) Every $\mathbb{Z}$-module is an additive abelian group by definition.
- ($\Leftarrow$) If you have an additive abelian group, you can define scalar multiplication $n\mathbf{v}$ as the sum of $n$ copies of $\mathbf{v}$ (and $(-n)\mathbf{v}$ as the sum of $n$ copies of $-\mathbf{v}$). This satisfies the laws above, so we obtain a $\mathbb{Z}$-module.
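As a quick illustration of the backward direction, here is a small sketch (the helper `z_scale` is a hypothetical name of my own, not from the text) that gives any additive abelian group a $\mathbb{Z}$-scalar multiplication by repeated addition:

```python
def z_scale(n, v, add, zero, neg):
    """Multiply a group element v by the integer n using only the group
    operations: repeated addition for n > 0, negation for n < 0."""
    if n < 0:
        return z_scale(-n, neg(v), add, zero, neg)
    result = zero
    for _ in range(n):
        result = add(result, v)
    return result

# Example: the abelian group Z/7 under addition mod 7 becomes a Z-module.
print(z_scale(10, 3, lambda a, b: (a + b) % 7, 0, lambda a: (-a) % 7))  # 2, since 10*3 = 30 = 2 (mod 7)
```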
Like with groups, rings, and fields, we can define a couple of concepts for $R$-modules:
- A submodule $N$ of an $R$-module $M$ is the analogue of a subgroup/subring/subfield – it's just an $R$-module that is a subset of another $R$-module. The zero module $0$ is always a submodule of $M$, and so is $M$ itself.
- The quotient module $M/N$ is always defined – there is no need for $N$ to be a special kind of submodule, the same way you need ideals for quotient rings and normal subgroups for quotient groups. This is because we can always eliminate the elements generating $N$ from the elements generating $M$ without breaking any of the module laws.
- The direct product of two $R$-modules $M$ and $N$ is called the direct sum $M \oplus N$. It works identically to the direct product (elements are pairs $(\mathbf{m}, \mathbf{n})$ with $\mathbf{m} \in M$ and $\mathbf{n} \in N$) except we write $\oplus$ instead of $\times$, referring to the additive nature of $R$-modules.
In this section, we describe how to construct $R$-modules.
Just like how groups and rings can be finitely generated, we can construct an $R$-module $M$ out of a generating set of elements $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots\}$, known as a spanning set for $M$. We can write $M = \operatorname{span}(S)$, and say that $M$ consists of $R$-linear combinations of its spanning set $S$, i.e. $M$ is the set
$$\{r_1\mathbf{v}_1 + r_2\mathbf{v}_2 + \cdots\},$$
where the coefficients $r_i$ are elements of $R$, and each sum is finite (though the spanning set can be infinite).
No matter the spanning set, it is always possible to generate the zero element of the $R$-module — just take the $R$-linear combination where all the coefficients are zero. Of interest is whether this is the only way to generate the zero element. If there is another way to generate the zero element, i.e. using a nonzero coefficient somewhere, then we say that the spanning set is linearly dependent. Otherwise, there is only one way to generate the zero vector, and we have a linearly independent spanning set, also known as a basis.
If we consider a spanning set of $n$ elements as a map from $n$-tuples of coefficients in $R$ to elements of the $R$-module, then a spanning set that is a basis is interesting. This is because all spanning sets give trivially surjective maps (the spanning set generates the module), and if the spanning set is a basis, then there is only one way to generate the zero vector — thus the kernel is trivial, thus the map is injective and therefore bijective. This bijection between $n$-tuples of coefficients and elements of an $R$-module means that elements of the $R$-module are essentially $n$-tuples of coefficients, also known as $R$-vectors. Sound familiar?
- Every spanning set of $n$ elements gives rise to a surjective map from $n$-tuples of coefficients to elements of the module.
- If the spanning set is a basis, then only the zero tuple maps to the zero element, i.e. the map has a trivial kernel and is injective. Therefore a basis represents a bijection between $n$-tuples of coefficients, i.e. $R$-vectors, and elements of the module.
- Thus the elements of the $R$-module can be expressed as $R$-vectors, which is the definition of a free module $R^n$.
For a free module $R^n$, the standard basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ is defined as the columns of the $n \times n$ identity matrix $I$:
$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix},\quad \ldots,\quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.$$
Linear algebra is exactly the case when $R$ is a field $k$. In fact, $k$-modules are exactly $k$-vector spaces. We will draw parallels to linear algebra extensively throughout this exploration.
Over a field $k$, every spanning set of a vector space $V$ contains a basis:
- Let $S$ be an arbitrary spanning set for $V$. If this spanning set is linearly independent, it is a basis and we are done. Otherwise, assume that $S$ is linearly dependent.
- If $S$ is linearly dependent, then zero can be represented by some $k$-linear combination $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0}$ with a nonzero coefficient. WLOG let $c_1$ be one of the nonzero coefficients.
- Since we're working in a field $k$, $c_1$ has a multiplicative inverse $c_1^{-1}$. Then the corresponding $\mathbf{v}_1$ can be written as a $k$-linear combination of the other elements:
$$\mathbf{v}_1 = -c_1^{-1}\left(c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n\right).$$
- Since $\mathbf{v}_1$ can be represented as a $k$-linear combination of the others, its contribution to the spanning set is redundant. We can remove $\mathbf{v}_1$ from the spanning set because $S$ and $S \setminus \{\mathbf{v}_1\}$ generate the same module.
- If the resulting spanning set is not a basis, we can repeat this process to further decrease the size of the spanning set. This process terminates either at a basis or at the empty set, which is trivially a basis (for the zero module). Thus the original spanning set contains a basis for $V$.
In this section, we start exploring $R$-matrices for real.
At the beginning we mentioned that this entire exploration is going to be about studying $R$-matrices. One big reason is because $R$-matrices are homomorphisms of free $R$-modules. For instance, an $m \times n$ $R$-matrix $A$ gives a homomorphism $R^n \to R^m$: left multiplication by $A$ sends $R$-vectors in $R^n$ to $R$-vectors in $R^m$, compatibly with addition and scalar multiplication. Typically we just refer to the $R$-matrix $A$ itself as the homomorphism, falling back to a symbol like $f$ when we're talking about homomorphisms in the abstract.
Our first question is, which $R$-matrices correspond to isomorphisms?
This translates to the question: when are $R$-matrices invertible? This requires the concept of a determinant, which should be familiar from linear algebra. For (square) $R$-matrices, the determinant is the unique function $\det : R^{n \times n} \to R$ that is:
- Multilinear: linear in each row and column.
- Alternating: Swapping two rows or columns swaps the sign, which also implies that two identical rows/columns make the function zero (since the matrix is unchanged by swapping those identical rows/columns yet the value must also change sign, and only $0$ is invariant under a sign swap).
- Equal to $1$ for the identity matrix $I$.
This unique function is
$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)},$$
which:
- Is multilinear: each term of the sum is a product of $n$ matrix entries of $A$, exactly one taken from each row $i$ and each column $\sigma(i)$, so the formula is linear in each row and column.
- Is alternating: since a property of $\operatorname{sgn}$ is that it switches sign when you swap two elements in the permutation (i.e. when you swap two rows or columns).
- Is $1$ for the identity: because the summand is only nonzero for the identity permutation, where $\sigma(i) = i$.
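As a sanity check on the formula, here is a direct (and intentionally inefficient) implementation of the sum over permutations, checked against determinants computed by hand; the matrices and helper names are my own:

```python
from itertools import permutations
from math import prod

def sign(perm):
    """Signature of a permutation in one-line form: (-1)^(number of inversions)."""
    inv = sum(1 for i in range(len(perm))
                for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def leibniz_det(A):
    """det(A) = sum over permutations s of sgn(s) * prod_i A[i][s(i)]."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(leibniz_det([[2, 1], [5, 3]]))                     # 1  (= 2*3 - 1*5)
print(leibniz_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3
```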
Although the above shows that the given function is a determinant, the proof that it is the unique such function requires some more advanced tools we'll introduce later. Let's first determine when an $n \times n$ square $R$-matrix is invertible.
Recall the definition of the determinant from above:
$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{k=1}^{n} a_{k,\sigma(k)}.$$
The determinant can be defined recursively. First, fix a row index $i$ and factor the single entry $a_{i,\sigma(i)}$ out of the product:
$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{i,\sigma(i)} \prod_{k \neq i} a_{k,\sigma(k)}.$$
Now note that since the sum goes over every permutation $\sigma$, the column index $\sigma(i)$ we just took out gets permuted to every value $j$ from $1$ to $n$, $(n-1)!$ times each (since there are $(n-1)!$ permutations in $S_n$ that take $i$ to any given value $j$). This means $a_{ij}$ appears $(n-1)!$ times in the sum, so we can group the terms by the value $j = \sigma(i)$ and factor $a_{ij}$ out of each group — this grouping works precisely because there are $(n-1)!$ permutations of the remaining elements as well:
$$\det(A) = \sum_{j=1}^{n} a_{ij} \sum_{\substack{\sigma \in S_n \\ \sigma(i) = j}} \operatorname{sgn}(\sigma) \prod_{k \neq i} a_{k,\sigma(k)}.$$
Note that the inner sum and product is very close to the formula for the determinant of a matrix that has the $i$th row removed (since $k = i$ is excluded from the product) and the $j$th column removed (since $\sigma$ is defined to send $i$ to $j$, the $j$th column could only appear in the form $a_{ij}$, but the $i$th row is excluded so it never appears).
Define the minor $M_{ij}$ as the matrix $A$ with the $i$th row and $j$th column removed. Then we have
$$\sum_{\substack{\sigma \in S_n \\ \sigma(i) = j}} \operatorname{sgn}(\sigma) \prod_{k \neq i} a_{k,\sigma(k)} \;\approx\; \det(M_{ij}).$$
This is almost the determinant function as defined above, except we're relying on permutations in $S_n$ instead of $S_{n-1}$, the actual size of the minor. To correct this, we want to take permutations $\tau \in S_{n-1}$ and use the entries of the minor instead. Note that we're still mentioning $\sigma$, which we define as the permutation in $S_n$ corresponding to $\tau$, the one where $\sigma(i) = j$. To convert $\sigma$ to $\tau$, let's move the $i$th row to the top of the matrix, which is $i - 1$ transpositions, multiplying the determinant by $(-1)^{i-1}$ due to the alternating property of the determinant. Similarly, we multiply the determinant by $(-1)^{j-1}$ to represent moving the $j$th column to the left of the matrix. This ensures that all the rows and columns of the minor are contiguous, so the permutations in $S_{n-1}$ use the correct indices. Overall, we're multiplying by a total of $(-1)^{(i-1)+(j-1)} = (-1)^{i+j}$, and thus:
$$\sum_{\substack{\sigma \in S_n \\ \sigma(i) = j}} \operatorname{sgn}(\sigma) \prod_{k \neq i} a_{k,\sigma(k)} = (-1)^{i+j}\det(M_{ij}),$$
where $\det(M_{ij})$ matches our original definition of determinant applied to the minor $M_{ij}$. The value $C_{ij} = (-1)^{i+j}\det(M_{ij})$ is known as the $(i,j)$ cofactor of the matrix $A$.
Finally, we can plug this back into our original expression for $\det(A)$ to get:
$$\det(A) = \sum_{j=1}^{n} a_{ij}\, C_{ij}$$
for some fixed row index $i$. This is the cofactor expansion of the determinant along row $i$.
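The recursion can be written out directly. A minimal sketch (expanding along row 0; the helper names are my own):

```python
def minor(A, i, j):
    """The matrix A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def cofactor_det(A, row=0):
    """Determinant by cofactor expansion along the given row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (row + j) * A[row][j] * cofactor_det(minor(A, row, j))
               for j in range(len(A)))

print(cofactor_det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3, matching the Leibniz formula
```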
The cofactor expansion can be used to directly define the inverse of an $R$-matrix.
Recall the cofactor expansion:
$$\det(A) = \sum_{j=1}^{n} a_{ij}\, C_{ij}.$$
Notice how similar it is to matrix multiplication:
$$(AB)_{ik} = \sum_{j=1}^{n} a_{ij}\, b_{jk}.$$
In fact, let $\operatorname{adj}(A)$ be the adjugate matrix of $A$, where each entry is equal to $\operatorname{adj}(A)_{ij} = C_{ji}$ (note the swapped indices). Then we have exactly
$$\big(A \operatorname{adj}(A)\big)_{ik} = \sum_{j=1}^{n} a_{ij}\, C_{kj} = \begin{cases} \det(A) & \text{if } i = k, \\ 0 & \text{if } i \neq k. \end{cases}$$
The reason that the sum is zero when $i \neq k$ is because if you copy the values in row $i$ to row $k$, resulting in a matrix $A'$, the sum becomes $\sum_j a'_{kj} C'_{kj} = \det(A')$, where each cofactor $C'_{kj} = C_{kj}$ is unchanged since it is calculated on a minor with the $k$th row removed. But since $A'$ has two identical rows, then by the alternating property of determinants, $\det(A') = 0$. Thus the sum is zero.
So we have proved the following:
$$A \operatorname{adj}(A) = \det(A)\, I.$$
Note that if $\det(A)$ is a unit in the ring $R$, then we have
$$A \left(\det(A)^{-1}\operatorname{adj}(A)\right) = I,$$
implying that $\det(A)^{-1}\operatorname{adj}(A)$ is the right inverse of $A$. A similar argument (using the cofactor expansion along columns instead of rows) shows that it is the left inverse of $A$ as well.
Since you can only divide by units in a ring, and the inverse matrix is exactly the adjugate divided by the determinant, the determinant must be a unit in order to define the inverse matrix.
This should be familiar from linear algebra, where a matrix over a field is invertible iff its determinant is nonzero. Of course, this is exactly because the only nonunit in a field is zero. For an example that isn't a field, note that the units of $\mathbb{Z}$ are $\pm 1$. Thus a $\mathbb{Z}$-matrix is invertible iff its determinant is $\pm 1$.
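Here is a small check of the adjugate formula over $\mathbb{Z}$ (the matrices are my own examples; sympy's `adjugate` is the transposed matrix of cofactors described above):

```python
from sympy import Matrix, eye

A = Matrix([[3, 1],
            [5, 2]])                        # det = 1, a unit in Z
print(A * A.adjugate() == A.det() * eye(2)) # True: A * adj(A) = det(A) * I
print(A.adjugate())                         # Matrix([[2, -1], [-5, 3]]), an inverse over Z since det = 1

B = Matrix([[2, 0],
            [0, 1]])                        # det = 2, not a unit in Z
print(B * B.adjugate())                     # Matrix([[2, 0], [0, 2]]): only det(B)*I, so no inverse over Z
```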
For us, invertible matrices are significant because, being reversible, their corresponding homomorphism is an isomorphism. More importantly, this means multiplying an $R$-matrix $A$ by an invertible matrix is the same as composing an isomorphism with the $R$-module homomorphism corresponding to $A$. Right-multiplying $A$ by an invertible matrix $P$ is precomposition by $P$, and left-multiplying by an invertible matrix $Q$ is postcomposition by $Q$. As we know, isomorphisms are essentially renamings, so composing a homomorphism with isomorphisms gives you essentially the same homomorphism.
So we have proved the existence of, and a formula for, the inverse of an $R$-matrix. To build on this, let's introduce how invertible matrices let us manipulate $R$-matrices for the rest of this exploration.
We introduce the three row and column operations.
First, we may swap two rows of a matrix by left-multiplying by a suitable elementary matrix.
Second, we may multiply any row by a unit $u$ (here, assume $u$ is a unit in $R$) by left-multiplying by another kind of elementary matrix.
Third, we may add a multiple of any row to another row, again by left-multiplying by a third kind of elementary matrix.
Column operations use the same (transposed) elementary matrices, except you right-multiply instead of left-multiply.
Obviously these operations can be undone by an operation of the same kind – you can un-swap by swapping again, un-scale by scaling by the inverse unit $u^{-1}$, and un-add $r$ times a row by adding $-r$ times that row. This implies that the elementary matrices are invertible. In fact, the inverse of an elementary matrix represents the inverse operation.
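A sketch of the three kinds of elementary matrices as actual matrices (constructed with sympy; the function names are mine):

```python
from sympy import eye, Matrix

def swap_rows(n, i, j):
    """Elementary matrix that swaps rows i and j when multiplied on the left."""
    E = eye(n)
    E[i, i] = E[j, j] = 0
    E[i, j] = E[j, i] = 1
    return E

def scale_row(n, i, u):
    """Elementary matrix that multiplies row i by the unit u."""
    E = eye(n)
    E[i, i] = u
    return E

def add_row_multiple(n, i, j, r):
    """Elementary matrix that adds r times row j to row i."""
    E = eye(n)
    E[i, j] = r
    return E

A = Matrix([[1, 2], [3, 4]])
print(swap_rows(2, 0, 1) * A)              # Matrix([[3, 4], [1, 2]])
print(add_row_multiple(2, 1, 0, -3) * A)   # Matrix([[1, 2], [0, -2]])
print(A * swap_rows(2, 0, 1))              # right multiplication swaps columns instead
```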
When you apply row and column operations to a matrix, you’re essentially doing a clever renaming of the domain (column operations) and codomain (row operations). The resulting matrix represents the same homomorphism but on the renamed domain/codomain.
How do the row and column operations affect the determinant? We can give a short proof of each:
Swapping two rows or columns multiplies the determinant by $-1$: this is exactly the alternating property of determinants.
Multiplying a row or column by a unit $u$ multiplies the determinant by $u$: this follows immediately from the multilinear property of determinants.
Adding a multiple of one row to another row leaves the determinant unchanged: if row $i$ becomes row $i$ plus $r$ times row $j$, then by multilinearity,
$$\det(A') = \det(A) + r\det(A''),$$
where $A''$ is the matrix $A$ with row $i$ replaced by a copy of row $j$. But that last matrix has two identical rows $i$ and $j$, so by the alternating property its determinant is zero. Therefore the resulting matrix has determinant equal to that of the original matrix.
Together, these show that row and column operations change the determinant only up to a unit. This follows immediately from the above three theorems, which show that the three row and column operations multiply the determinant by $-1$, $u$, and $1$ respectively ($u$ a unit).
The determinant is also multiplicative: $\det(AB) = \det(A)\det(B)$. Using the formula for the determinant for $AB$, we have
$$\det(AB) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} (AB)_{i,\sigma(i)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} \left(\sum_{j=1}^{n} a_{ij}\, b_{j,\sigma(i)}\right).$$
Summing over the index $j$ from $1$ to $n$ can be done in any order since addition is commutative. Thus we can add the terms in an arbitrary order. Since we're multiplying together one sum per index $i$, by the distributive property the result is essentially adding together all the ways of picking one term from each sum and multiplying them together, just like how $(a + b)(c + d) = ac + ad + bc + bd$. This is the same as adding together all the ways of picking one $j_i$ for each $i$. The choices where two of the $j_i$ coincide contribute nothing, because for those the inner alternating sum over $\sigma$ is the determinant of a matrix with two identical rows of $B$, which is zero. So we can express the total as a sum using all permutations $\tau$ taking $i$ to $j_i$:
$$\det(AB) = \sum_{\tau \in S_n} \left(\prod_{i} a_{i,\tau(i)}\right) \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i} b_{\tau(i),\sigma(i)}.$$
We can reorder the last product by applying $\tau^{-1}$ to the indices $i$: such a reordering of the rows of $B$ is equivalent to multiplying by the signature of $\tau$:
$$\sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i} b_{\tau(i),\sigma(i)} = \operatorname{sgn}(\tau)\det(B).$$
Plugging this in gives $\det(AB) = \left(\sum_{\tau}\operatorname{sgn}(\tau)\prod_i a_{i,\tau(i)}\right)\det(B) = \det(A)\det(B)$.
In this section, we show how row and column operations can be used to simplify a matrix.
Using row and column operations, you can always reduce an $R$-matrix down into the block form
$$\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},$$
where $D$ is a diagonal $R$-matrix, i.e. all its nonzero entries are on the main diagonal.
There is a way to get the matrix into this form such that the diagonal elements divide each other in order: $d_1 \mid d_2 \mid \cdots \mid d_k$. This form is called Smith normal form. Any $R$-matrix can be reduced to Smith normal form, provided $R$ is a Bézout domain, which is an integral domain in which every sum of principal ideals is principal. For instance, in PIDs every ideal is principal, so PIDs are Bézout domains.
- The fact that every sum of principal ideals $(a) + (b) = (d)$ is principal implies two things:
  - First, since $(d)$ contains each of $(a)$ and $(b)$, we know that $d$ is a common divisor of $a$ and $b$.
  - Second, $(a) + (b)$ represents all the $R$-linear combinations $ra + sb$ of $a$ and $b$. Since a common divisor of $a$ and $b$ must divide all linear combinations of those elements, and $d$ is itself such a linear combination, every common divisor must divide $d$ in particular.
- From this, we can conclude two things:
  - Since every common divisor divides $d$, it is in fact the greatest common divisor $d = \gcd(a, b)$.
  - $d$ is some linear combination of $a$ and $b$, i.e. we have Bézout's identity $\gcd(a, b) = ra + sb$ for some $r, s \in R$.
Method: To put a matrix into Smith normal form, choose $d_1$ to be the GCD of all its entries. Then use row/column operations to isolate $d_1$ in the upper left such that its row and column are zero except for $d_1$ itself, getting a block matrix
$$\begin{pmatrix} d_1 & 0 \\ 0 & A' \end{pmatrix},$$
where $A'$ is the minor in which every entry is divisible by $d_1$ because of how we picked $d_1$. Because of Bézout's identity, it's always possible to isolate the GCD using row and column operations this way. Repeating the process on the minor $A'$ gives you Smith normal form. Here's a short example:
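(The matrix below is an example of my own, not the one originally worked here; $R = \mathbb{Z}$ and sympy's `smith_normal_form` are assumptions.)

```python
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

A = Matrix([[12,  6,  4],
            [ 3,  9,  6],
            [ 2, 16, 14]])
print(smith_normal_form(A, domain=ZZ))
# A diagonal matrix with entries 1, 10, 30 (up to units/sign),
# each dividing the next: these are the invariant factors of A.
```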
Taking the Smith normal form of a matrix $A$ can be written $D = QAP$, where $Q$ and $P$ are products of the elementary matrices used to reduce $A$ to the diagonal matrix $D$. The entries along the diagonal of $D$ are called the invariant factors of $A$, because you will always arrive at these factors in the Smith normal form regardless of how you do the row and column operations.
At each step, the invariant factor is chosen as the GCD of the remaining elements, and since GCDs are unique up to units, the invariant factors are unique up to units.
In the definition of the determinant, you can see that unless $\sigma(i) = i$ for each $i$, the product will include a zero term that makes the whole product zero. This is because $a_{ij}$ for $i \neq j$ is zero in a diagonal matrix. Thus the only $\sigma$ that results in a nonzero term is the identity permutation, and so for diagonal matrices the formula simplifies down to
$$\det(D) = d_1 d_2 \cdots d_n,$$
the product of the diagonal entries.
This follows immediately from the fact that the determinant of a matrix equals (up to units) the determinant of its Smith normal form, which is equal to the product of the diagonal entries, which are the invariant factors.
If $A$ is invertible, then its determinant is a unit, which can only be a product of units. But since the determinant is the product of the invariant factors up to units, that's the same as saying every invariant factor is a unit, and therefore associate to $1$.
Corollary: The Smith normal form of an invertible matrix is the identity matrix.
For matrices over a Bézout domain, the Smith normal form is obtained via row and column operations, a process that only factors out corresponding elementary matrices. By the previous theorem, the resulting Smith normal form of an invertible matrix is the identity matrix, which is also an elementary matrix. Thus the original matrix factors into elementary matrices.
In this section, we abstractly represent relationships between $R$-modules via $R$-matrices.
We already know that an $m \times n$ $R$-matrix $A$ defines a homomorphism $R^n \to R^m$. Let's explore some properties of this homomorphism. First, the columns of an $R$-matrix generate its image (also known as the column space), which is written
$$\operatorname{im}(A) = \{A\mathbf{x} : \mathbf{x} \in R^n\}.$$
Second, the solutions $\mathbf{x}$ to $A\mathbf{x} = \mathbf{0}$ comprise the kernel (also known as the null space) of $A$. The kernel of a diagonal matrix is relatively straightforward. The equation $D\mathbf{x} = \mathbf{0}$ is equivalent to the system of equations $d_i x_i = 0$. In an integral domain, $x_i$ is zero when $d_i \neq 0$, and can take on any value when $d_i = 0$. This means that the kernel of a diagonal matrix is generated by the standard basis vectors $\mathbf{e}_i$ for the indices $i$ where $d_i = 0$. In particular, if there are no zeroes along the main diagonal of a diagonal matrix, then its kernel is trivial.
A homomorphism is injective if and only if its kernel is trivial:
- ($\Rightarrow$) If $A$ is injective, then $A\mathbf{x} = A\mathbf{y}$ implies $\mathbf{x} = \mathbf{y}$. In particular, elements of the kernel, where $A\mathbf{x} = \mathbf{0}$, must be zero because $A\mathbf{0} = \mathbf{0}$ as well.
- ($\Leftarrow$) If $A$ has a trivial kernel, then consider $A\mathbf{x} = A\mathbf{y}$, which implies $A\mathbf{x} - A\mathbf{y} = \mathbf{0}$ and $A(\mathbf{x} - \mathbf{y}) = \mathbf{0}$. Since the kernel is trivial, we have $\mathbf{x} - \mathbf{y} = \mathbf{0}$ and thus $\mathbf{x} = \mathbf{y}$, showing that $A$ is injective.
By definition. Since surjective means the whole codomain is mapped to, it’s the same as saying that the image is equal to the codomain.
Corollary: Since the columns of $A$ generate $\operatorname{im}(A)$ by definition, if $A : R^n \to R^m$ is surjective, then its columns generate $R^m$.
For example, to solve $A\mathbf{x} = \mathbf{0}$ we can take the Smith normal form $D = QAP$ of $A$ and solve $D\mathbf{y} = \mathbf{0}$ instead, since $\mathbf{x} = P\mathbf{y}$ then solves the original system. When the Smith normal form has no zero entries along the diagonal, the kernel of $D$, and therefore of $A$, is trivial: $\{\mathbf{0}\}$. We'll be talking about the kernel in the abstract, defined as the solutions to $A\mathbf{x} = \mathbf{0}$. We can see that the kernel is easily calculable when $R$ is a Bézout domain, but it may not be as easy for other rings.
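A small computational check (my own matrices, over $\mathbb{Z}$; sympy's `nullspace` works over the fraction field $\mathbb{Q}$, but the bases it returns here happen to be integral):

```python
from sympy import Matrix

A = Matrix([[2, 4],
            [1, 3]])        # det = 2: no zero invariant factors, so the kernel is trivial
print(A.nullspace())        # []

B = Matrix([[1, 2],
            [2, 4]])        # second row is a multiple of the first
print(B.nullspace())        # [Matrix([[-2], [1]])]: the kernel is generated by (-2, 1)
```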
Both the kernel and the image of $A$ are submodules (of the domain and codomain respectively):
- Contains $\mathbf{0}$: $A$ must send $\mathbf{0}$ to $\mathbf{0}$.
- Closed under addition: because $A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y}$, if both $\mathbf{x}$ and $\mathbf{y}$ are in the kernel (or both $A\mathbf{x}$ and $A\mathbf{y}$ are in the image), so is the sum.
- Closed under scalar multiplication: because $A(r\mathbf{x}) = r(A\mathbf{x})$, if $\mathbf{x}$ is in the kernel (or $A\mathbf{x}$ is in the image), so is the scalar multiple.
Like with groups, an $R$-module $M$ is simple if $M$ has no proper non-trivial submodules.
Since the kernel and image of a homomorphism $f : M \to N$ must be submodules of $M$ and $N$ respectively, and simple modules have only two possible choices of submodule (the trivial submodule and the module itself), the only possible homomorphisms between simple modules are those where the kernel is trivial (implying that the image is all of $N$, therefore an isomorphism) and those where the kernel is $M$ (implying that the image is trivial, therefore the trivial homomorphism).
In this section, we describe properties of $R$-modules with only diagrams.
Here we're going to represent $R$-matrices abstractly as homomorphisms on $R$-modules.
Consider two homomorphisms $f : A \to B$ and $g : B \to C$ where the image of $f$ is the kernel of $g$. In other words, the elements that $f$ maps to are exactly the elements that $g$ sends to zero. This property is called exactness, and we say this sequence is exact at $B$; any sequence of homomorphisms with the property that "the image of one homomorphism is the kernel of the next" is called an exact sequence.
In fact, we have discussed the concept of exact sequences before for groups. Exact sequences for groups and exact sequences for $R$-modules are precisely the same concept, but since $R$-modules have a richer structure, we may use exact sequences in ways that we cannot for groups.
To recap, we showed the following for exact sequences of groups. These are true for $R$-modules as well:
- Theorem: The exact sequence $0 \to A \xrightarrow{f} B$ implies $f$ is injective.
- Theorem: The exact sequence $A \xrightarrow{f} B \to 0$ implies $f$ is surjective.
- First Isomorphism Theorem: Given a homomorphism $f : A \to B$, we have $A/\ker f \cong \operatorname{im} f$.
- An exact sequence of the form
$$0 \to A \xrightarrow{\;f\;} B \xrightarrow{\;g\;} C \to 0$$
is known as a short exact sequence, with the following properties:
  - Theorem: $A$ is isomorphic to a submodule of $B$.
  - Theorem: If $f$ is an inclusion map, then $C \cong B/A$.
  - Splitting lemma: If there is a left inverse homomorphism $p : B \to A$ (with $p \circ f = \operatorname{id}_A$), or if there is a right inverse homomorphism $s : C \to B$ (with $g \circ s = \operatorname{id}_C$), then the sequence splits and we have $B \cong A \oplus C$.
  - A short exact sequence describes an embedding of $A$ into $B$, and then $C$ captures the structure that $A$ doesn't account for. When such a one-sided inverse of $f$ or $g$ exists, the splitting lemma says you can directly piece together the structures $A$ and $C$ to obtain $B$, via $B \cong A \oplus C$.
For an example of how exact sequences are used, we can look at projections. A projection is an idempotent endomorphism, i.e. a homomorphism $\pi : M \to M$ such that $\pi \circ \pi = \pi$.
An $R$-module $M$ decomposes as a direct sum of two submodules if and only if there is a projection $\pi : M \to M$; in that case $M \cong \operatorname{im}\pi \oplus \ker\pi$.
- ($\Rightarrow$) Any direct sum has a canonical projection onto the left and right summands, so the forward direction is trivial.
- ($\Leftarrow$) For any homomorphism like $\pi$, we can always write the short exact sequence
$$0 \to \ker\pi \hookrightarrow M \xrightarrow{\;\pi\;} \operatorname{im}\pi \to 0,$$
because the kernel of $\pi$ is exactly the image of the injective inclusion map $\ker\pi \hookrightarrow M$, and $\pi$ is surjective onto its image $\operatorname{im}\pi$.
- Since $\pi$ is idempotent, it acts like the identity on its image $\operatorname{im}\pi$. Since $\pi(\pi(\mathbf{x})) = \pi(\mathbf{x})$ implies $\pi(\mathbf{y}) = \mathbf{y}$ for $\mathbf{y} = \pi(\mathbf{x})$, the inclusion $\operatorname{im}\pi \hookrightarrow M$ is a right inverse of $\pi : M \to \operatorname{im}\pi$; in other words, $\pi$ acts as its own inverse for elements in its image $\operatorname{im}\pi$.
- By the splitting lemma, since $\pi$ has a right inverse, the sequence splits and therefore $M \cong \ker\pi \oplus \operatorname{im}\pi$.
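For a concrete instance (an example of my own, with $R = \mathbb{Z}$): the map $\pi : \mathbb{Z}^2 \to \mathbb{Z}^2$ given by $\pi(x, y) = (x, 0)$ satisfies $\pi \circ \pi = \pi$, with $\operatorname{im}\pi = \mathbb{Z} \oplus 0$ and $\ker\pi = 0 \oplus \mathbb{Z}$, and indeed
$$\mathbb{Z}^2 \cong (\mathbb{Z} \oplus 0) \oplus (0 \oplus \mathbb{Z}) = \operatorname{im}\pi \oplus \ker\pi.$$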
In this section, we utilize exact sequences in the context of $R$-modules specifically.
Take this exact sequence, for instance:
$$R^n \xrightarrow{\;A\;} R^m \xrightarrow{\;\pi\;} M \longrightarrow 0.$$
This exact sequence is known as a presentation of the $R$-module $M$. The idea is that $M$ is entirely described by the presentation, and the presentation is entirely described by the $R$-matrix $A$, called the presentation matrix. In other words, it only takes a single $R$-matrix, $A$, to define all you need to know about an $R$-module $M$. Let's see why this is.
- By the First Isomorphism Theorem on the map $\pi : R^m \to M$, we have $R^m/\ker\pi \cong \operatorname{im}\pi$. However:
  - The exact sequence implies that the homomorphism $\pi$ is surjective, and thus $M$ is isomorphic to the image of $\pi$. Thus $R^m/\ker\pi \cong M$.
  - By exactness, $\ker\pi = \operatorname{im} A$.
- Then we have $M \cong R^m/\operatorname{im} A$. This is the same as saying $M$ is $R^m$, except we send all elements that are $R$-linear combinations of the columns of $A$ to zero ($\operatorname{im} A \mapsto 0$).
In the context of presentations, the standard basis of $R^m$ (which is the codomain of $A$) is known as the generators of $M$, and the columns of $A$ describe relations on those generators. So the $R$-matrix $A$, by virtue of defining both the generators of $M$ and the relations on those generators, completely determines the $R$-module $M$. Because every $R$-matrix defines an $R$-module this way, we say that $M \cong R^m/\operatorname{im} A$ is the cokernel of $A$, written $\operatorname{coker}(A)$. It is so named because there is a duality between the kernel and the cokernel. Where the kernel comprises the elements of the domain sent to zero by $A$, and a trivial kernel implies injectivity of $A$, the cokernel comprises the structure in the codomain not accounted for by the image of $A$, and a trivial cokernel implies surjectivity of $A$.
Surjectivity of $A$ means $\operatorname{im} A = R^m$. But then the cokernel $R^m/\operatorname{im} A$ is trivial if and only if $\operatorname{im} A = R^m$, i.e. if and only if $A$ is surjective.
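As a sanity check (an example of my own, with $R = \mathbb{Z}$): the matrix $A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$, viewed as a map $\mathbb{Z}^2 \to \mathbb{Z}^2$, has cokernel
$$\operatorname{coker}(A) = \mathbb{Z}^2/\operatorname{im}(A) \cong \mathbb{Z}/2 \oplus \mathbb{Z}/3 \cong \mathbb{Z}/6,$$
while the matrix $\begin{pmatrix} 1 & 0 \end{pmatrix} : \mathbb{Z}^2 \to \mathbb{Z}$ is surjective and so has trivial cokernel.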
Hopefully this makes clear the reason we only study $R$-matrices in this exploration — because every $R$-module can be defined as the cokernel of a suitable $R$-matrix, studying $R$-matrices completely subsumes the need to study $R$-modules directly!
In this section, we describe how to derive the $R$-matrix that presents a given $R$-module.
Let's do the reverse. How do you construct the $R$-matrix $A$ (the presentation matrix) that presents a given (finitely generated) $R$-module $M$? Here is a step-by-step:
- Step 1: Find its generators $x_1, \ldots, x_m$.
- Step 2: Find the relations among those generators.
- Step 3: Express the relation vectors as columns of the presentation matrix.
- Step 4: Reduce the matrix.
(Note that a non-finitely generated $R$-module will call for infinitely many generators $x_i$, so describing that class of $R$-modules would require a different process.)
Example:
Let $M$ be an $R$-module generated by finitely many elements $x_1, \ldots, x_m$, subject to finitely many relations of the form $a_1 x_1 + \cdots + a_m x_m = 0$. Express each relation as a column $(a_1, \ldots, a_m)^T$ of the presentation matrix $A$, so that the whole system of relations can be summarized by the single matrix $A$. And theoretically we are done, since $A$ presents $M$.
However, we can reduce this matrix down a bit. The following simplifying operations also do not change the isomorphism class of the module presented by $A$:
- Row or column operations.
- Removing a column of zeroes (since that represents the zero relation $0 = 0$).
- Removing the $i$th row and $j$th column whenever the $j$th column consists of zeroes except for a $1$ in the $i$th row (since that column represents the relation $x_i = 0$, and so we can remove the generator $x_i$ from consideration).
Reducing the presentation matrix with these operations can simplify it drastically.
Once the matrix is reduced, say to a diagonal matrix with no zero columns and no unit entries, its image is easy to describe, and the matrix presents a direct sum of cyclic modules that we can read straight off the diagonal. Thus we obtain the isomorphism class of $M$ directly from the reduced matrix.
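For a concrete stand-in (my own example, with $R = \mathbb{Z}$): let $M$ be generated by $x, y$ subject to the relations $2x + 4y = 0$ and $6y = 0$. The relations become the columns of
$$A = \begin{pmatrix} 2 & 0 \\ 4 & 6 \end{pmatrix},$$
and row/column reduction (Smith normal form) turns $A$ into $\begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix}$, whose image is $2\mathbb{Z} \oplus 6\mathbb{Z}$. Hence $M \cong \mathbb{Z}^2/(2\mathbb{Z} \oplus 6\mathbb{Z}) \cong \mathbb{Z}/2 \oplus \mathbb{Z}/6$.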
We've seen how it's done for a specific finitely generated $R$-module, and this approach generalizes to all finitely generated $R$-modules. Thus we can say that every finitely generated $R$-module can be presented by an $R$-matrix. But how do you tell when $M$ is finitely generated in the first place?
In this section, we learn the conditions under which an $R$-module is finitely generated.
To see whether an $R$-module $M$ is finitely generated (and thus presentable), let's try building one.
Start with the zero submodule $W_0 = 0$ of $M$. At every step, add an element $\mathbf{v}_{i+1}$ of $M$ outside of $W_i$ to get $W_{i+1}$, generated by $\mathbf{v}_{i+1}$ and the old $W_i$.
The fact of the matter is, if $M$ is finitely generated, then eventually there are no more elements outside of $W_i$ to add, so the process stops. Otherwise, the process keeps going. One way to characterize this is known as the ascending chain condition (ACC) on submodules: "There is no infinite strictly increasing chain of submodules $W_1 \subset W_2 \subset \cdots$ of $M$." Satisfying the ACC is the same as saying that the process stops.
Every submodule of $M$ is finitely generated if and only if $M$ satisfies the ACC:
- ($\Rightarrow$) Towards contradiction, say you have an infinite strictly increasing chain of submodules $W_1 \subset W_2 \subset \cdots$ of $M$. Let $W$ be the infinite union of these submodules, so $W$ is a submodule of $M$ too, and in particular is a finitely generated superset of every submodule in the chain. Since $W$ is finitely generated, each of its finitely many generators appears in some $W_i$, so all of them appear in some single $W_N$; this means that $W$ appears at some point in the infinite chain. But $W$ is also a superset of every submodule of the infinite chain, so at that point $W_N = W$. This implies $W_N = W_{N+1} = \cdots$ at some point. So every infinite chain of submodules of $M$ fails to be strictly increasing, contradiction.
- ($\Leftarrow$) Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots\}$ be a special set of generators of a submodule $W$, inductively constructed by having $W_1$ be generated by $\mathbf{v}_1$, and having $W_{i+1}$ be generated by the existing $\mathbf{v}_1, \ldots, \mathbf{v}_i$ together with an element $\mathbf{v}_{i+1}$ in $W$ but not in $W_i$. Since we're constructing a strictly increasing chain of submodules $W_1 \subset W_2 \subset \cdots$, and there is no infinite strictly increasing chain of submodules by the ACC, this process ends with a finite number of generators in $S$; that is, $W$ is finitely generated.
Recall that integral domains that satisfy the ascending chain condition on principal ideals (ACCP) are exactly the factorization domains. We can say something similar here. A Noetherian $R$-module is one where every submodule is finitely generated, i.e. satisfies the ACC (by the above proof). Every Noetherian $R$-module can be described by a presentation matrix.
In this section, we explore the properties of Noetherian rings.
More generally, a Noetherian ring is one where every ideal is finitely generated, i.e. satisfies the ACC on ideals. Note that it's possible for a ring to be finitely generated but have ideals that are not finitely generated, so proving noetherianity requires proving facts about ideals, not the whole ring. We're going to prove a number of theorems that let us work with such rings:
The image of a Noetherian ring under a surjective homomorphism is Noetherian: surjective homomorphisms preserve all subset relations among ideals, and thus preserve the ACC.
If $R$ is Noetherian, then so is any quotient ring $R/I$: the canonical map $R \to R/I$ is surjective, and surjective homomorphisms preserve noetherianity.
If $R$ and $S$ are Noetherian rings, then so is the direct product $R \times S$:
- Ideals in $R \times S$ are of the form $I \times J$, where $I$ is an ideal in $R$ and $J$ is an ideal in $S$. This is precisely because there is no way for elements of $R$ to influence elements of $S$ and vice versa.
- Because $R$ and $S$ are Noetherian, $I$ and $J$ are finitely generated, from which you can construct a finite set of generators for $I \times J$.
If $R$ is a Noetherian ring, then $R^n$ is a Noetherian $R$-module: you can take the direct sum of $n$ copies of $R$ to get $R^n$. Since a finite direct sum preserves noetherianity, when $R$ is Noetherian, $R^n$ is Noetherian.
All Noetherian rings satisfy the ACC, which says every strictly increasing chain of ideals stops at some point; so any ideal is contained in a chain of proper ideals that goes up only finitely far, i.e. ends at a maximal ideal.
- The goal is to prove the Hilbert basis theorem: if $R$ is Noetherian, then so is the polynomial ring $R[x]$. That is, we want to show an arbitrary ideal $I$ of $R[x]$ is finitely generated, given the ACC for $R$.
- Lemma: The leading coefficients of the degree-$n$ polynomials in any ideal $I$ of $R[x]$ form an ideal $C_n$ of $R$.
  - If $a$ is the leading coefficient of $f$ and $b$ is the leading coefficient of $g$ (both of degree $n$ in $I$), then $a + b$ is the leading coefficient of $f + g$ (or else $f + g$ drops in degree and $a + b = 0$), and $ra$ (for arbitrary $r \in R$) is the leading coefficient of $rf$ in the same way. $f + g$ and $rf$ are both in $I$ since it's an ideal, and therefore $a + b$ and $ra$ are both in $C_n$, making it an ideal.
- We have $C_n \subseteq C_{n+1}$ because for every $a \in C_n$ and corresponding degree-$n$ polynomial $f \in I$, $xf$ exists in $I$ (due to $I$ being an ideal) as a degree-$(n+1)$ polynomial with the same leading coefficient $a$.
- Thus the $C_n$ form a chain of ideals in $R$, which stops strictly increasing at some point due to $R$ being Noetherian and therefore satisfying the ACC. Let $C_N$ be the last strictly increasing ideal in this chain (the one where $C_n = C_N$ for all $n \geq N$).
- Take $f_1, \ldots, f_k$ to be polynomials in $I$ corresponding to the generators of $C_0, C_1, \ldots, C_N$. Each has degree at most $N$. This is a finite number of polynomials since each $C_n$ is finitely generated, due to $R$ being Noetherian. Clearly $(f_1, \ldots, f_k) \subseteq I$, as the $f_i$ are taken from $I$.
- We can prove any polynomial $g \in I$ can be expressed in terms of these $f_i$, i.e. $I \subseteq (f_1, \ldots, f_k)$:
  - First, if $\deg g > N$, then pick some polynomial $h \in (f_1, \ldots, f_k)$ of the same degree with the same leading coefficient as $g$, which exists since $C_{\deg g} = C_N$ (multiply the $f_i$ corresponding to the generators of $C_N$ by suitable elements of $R$ and powers of $x$). Then $g - h$ is a lower-degree polynomial in $I$ due to $I$ being an ideal. Repeat this process until you obtain a polynomial whose degree is at most $N$, reducing it to the second case:
  - Then if $\deg g \leq N$, the leading coefficient of $g$ is in $C_{\deg g}$ and therefore you can generate $g$ using the elements in $(f_1, \ldots, f_k)$. To see this, notice that you can match the leading coefficient by adding multiples of the polynomials corresponding to the generators of $C_{\deg g}$, and you can do the same process for every lower coefficient.
- Since $I \subseteq (f_1, \ldots, f_k) \subseteq I$, and $(f_1, \ldots, f_k)$ is finitely generated, any arbitrary ideal $I$ of $R[x]$ is finitely generated, thus $R[x]$ is Noetherian.
Corollary: If $R$ is a Noetherian ring, so is anything of the form $R[x_1, \ldots, x_n]/I$, because polynomial extensions (applying the above repeatedly) and quotients preserve noetherianity.
A ring is a PID if and only if it is a Noetherian Bézout domain:
- ($\Leftarrow$) Every ideal is finitely generated in a Noetherian ring, and every finitely generated ideal is principal in a Bézout domain. Thus every ideal is principal in a Noetherian Bézout domain.
- ($\Rightarrow$) In a PID, every ideal is principal, making it trivially Bézout, and every ideal is generated by one element, making it trivially Noetherian.
Structure theorem: every finitely generated module $M$ over a PID $R$ decomposes as a direct sum of cyclic modules $R/d_iR$ whose invariant factors satisfy $d_1 \mid d_2 \mid \cdots \mid d_m$, and this decomposition is unique up to units.
- We can always find the $R$-presentation matrix $A$ for any finitely generated Noetherian $R$-module $M$ (so that $M \cong R^m/\operatorname{im} A$). In this case, $M$ is given as finitely generated over a PID $R$, and is Noetherian because PIDs are Noetherian and finitely generated modules over Noetherian rings are Noetherian.
- The Smith normal form is defined for any matrix over a Bézout domain. PIDs are also Bézout domains, so taking the Smith normal form of $A$ gives $D = QAP$, obtaining a diagonal matrix $D$ of invariant factors $d_1, d_2, \ldots, d_m$.
- In particular, the resulting diagonal matrix has the property that $d_1 \mid d_2 \mid \cdots \mid d_m$, which is a property of the Smith normal form of any matrix.
- Then, since $Q$ and $P$ are invertible (i.e. isomorphisms), we have $M \cong R^m/\operatorname{im} A \cong R^m/\operatorname{im} D$.
- The image of a matrix is the span of its columns. For a diagonal matrix $D$, each column corresponds to a scalar multiplication of a single dimension of $R^m$, and so the image can be expressed as a direct sum $\operatorname{im} D = d_1R \oplus d_2R \oplus \cdots \oplus d_mR$, with one submodule for each dimension.
- Then we have $M \cong R^m/(d_1R \oplus d_2R \oplus \cdots \oplus d_mR)$, which by the Third Isomorphism Theorem is isomorphic to $R/d_1R \oplus R/d_2R \oplus \cdots \oplus R/d_mR$.
- For the proof of uniqueness, any other decomposition into some $R/d_1'R \oplus \cdots \oplus R/d_m'R$ would imply that the $d_i'$ are invariant factors of $A$ appearing on the diagonal of its Smith normal form, which must be the same invariant factors as the $d_i$ (up to units) since the Smith normal form is unique.
This theorem is a generalization of the structure theorem for finitely generated abelian groups, which is the specific case where $R = \mathbb{Z}$, using the fact that $\mathbb{Z}$-modules are isomorphic to abelian groups.
In this section, we examine the components of finitely generated $R$-modules.
In the above theorem, we found that any finitely generated $R$-module (over a PID $R$) can be expressed as a direct sum of components in the form $R/dR$. $R$-modules in the form $R/I$ are called cyclic $R$-modules.
In general, a cyclic $R$-module is one in which every element of the module can be expressed as a scalar multiple $r\mathbf{v}$ of a single element $\mathbf{v}$. Because of this, cyclic $R$-modules can be written as $R\mathbf{v}$. If $\mathbf{v}$ is an element of $R$ itself, then $R\mathbf{v}$ is isomorphic to the principal ideal $(\mathbf{v})$.
Since every element of $R/I$ is a coset of the form $r + I$, the multiplicative identity is the coset $1 + I$. But the multiplicative identity generates all elements of $R/I$, since $r(1 + I) = r + I$; thus $R/I$ is a cyclic $R$-module.
So finitely generated $R$-modules over a PID are composed of a direct sum of cyclic $R$-submodules $R/d_iR$.
Recall that quotienting a ring/module essentially sends all quotiented elements to zero. In other words, $M/N$ means "$M$, where every element of $N$ is set to $0$". Any relation can be expressed this way – if you want to apply the relation $a = b$, then you quotient by (the submodule generated by) $a - b$ to send it to zero.
Cyclic $R$-modules make this even simpler. Since every possible element can be expressed in the form $r\mathbf{v}$ for some scalar $r$, every relation on $R\mathbf{v}$ can be expressed in the form $r\mathbf{v} = 0$ instead of $a = b$.
In fact, the scalars $r$ that make $r\mathbf{v} = 0$ true are given a special name: the annihilator of $\mathbf{v}$.
- It is enough to prove that these elements are closed under subtraction and under multiplication by elements of $R$.
- If $r\mathbf{v} = 0$ and $s\mathbf{v} = 0$, then $(r - s)\mathbf{v} = r\mathbf{v} - s\mathbf{v} = 0$, thus these elements are closed under subtraction.
- If $r\mathbf{v} = 0$, then for arbitrary $s \in R$ we have $(sr)\mathbf{v} = s(r\mathbf{v}) = s\,0 = 0$.
- Thus these elements $r$ where $r\mathbf{v} = 0$ form an ideal of $R$.
So this ideal of $R$, containing all elements $r$ where $r\mathbf{v} = 0$, is called the annihilator of $\mathbf{v}$, and is written $\operatorname{Ann}(\mathbf{v})$. The idea is that the annihilator consists of all scalars that send the generator $\mathbf{v}$ (and therefore all elements of $R\mathbf{v}$) to zero. In fact:
- Let $\varphi$ be the map $R \to R\mathbf{v}$ sending $r \mapsto r\mathbf{v}$.
- By definition of cyclic module, the image of $\varphi$ is all of $R\mathbf{v}$, thus $\varphi$ is surjective.
- The kernel of $\varphi$ consists of the elements $r$ where $r\mathbf{v} = 0$, which is exactly $\operatorname{Ann}(\mathbf{v})$.
- Then by the First Isomorphism Theorem we have $R/\ker\varphi \cong \operatorname{im}\varphi$, which is $R/\operatorname{Ann}(\mathbf{v}) \cong R\mathbf{v}$.
Thus we have that every quotient $R/(a)$ by a principal ideal is cyclic (with generator $\mathbf{v} = 1 + (a)$), and every cyclic module $R\mathbf{v}$ is isomorphic to the quotient $R/\operatorname{Ann}(\mathbf{v})$. If $R$ is a PID, then $\operatorname{Ann}(\mathbf{v})$ is a principal ideal $(a)$ as well, and we get an isomorphism between quotients by principal ideals and cyclic modules: $R\mathbf{v} \cong R/(a)$. The important part is: $R/(a)$ is either the free module $R$ (when $a = 0$) or a torsion module (when $a \neq 0$).
Recall the structure theorem of modules over a PID. Now that we know more about cyclic modules, we know that any finitely generated module defined over a PID is isomorphic to a direct sum of free submodules $R$ and torsion cyclic submodules $R/(d_i)$, which are ordered by divisibility: $d_1 \mid d_2 \mid \cdots \mid d_k$ for some ordering of the $d_i$.
We can go a bit further:
- Since $R$ is a PID, $\operatorname{Ann}(\mathbf{v})$ (being an ideal) is a principal ideal $(a)$.
- Recall that $R\mathbf{v} \cong R/\operatorname{Ann}(\mathbf{v}) = R/(a)$.
- If $a = 0$ then $R/(a) = R/(0) \cong R$, meaning $R\mathbf{v}$ is a free cyclic module.
- Otherwise, if $a$ is nonzero, note that $R$ is a PID and therefore a UFD. Then every nonzero $a$ has a unique factorization into primes $a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ (up to units).
- So we have $R\mathbf{v} \cong R/(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k})$, which factors by the Chinese Remainder Theorem into $R/(p_1^{e_1}) \oplus R/(p_2^{e_2}) \oplus \cdots \oplus R/(p_k^{e_k})$.
- Therefore $R\mathbf{v}$ is either a free cyclic module, or factors into torsion cyclic modules $R/(p^e)$ where the annihilator is generated by a prime power $p^e$.
Thus another way to decompose a finitely generated module over a PID is
$$M \cong R^f \oplus R/(p_1^{e_1}) \oplus \cdots \oplus R/(p_l^{e_l}),$$
where these prime power factors $p_i^{e_i}$ are known as elementary divisors.
Thus there are two ways to decompose a module according to the structure theorem:
- A hierarchical decomposition: $M \cong R^f \oplus R/(d_1) \oplus \cdots \oplus R/(d_k)$, where the invariant factors $d_i$ are ordered by divisibility $d_1 \mid d_2 \mid \cdots \mid d_k$, and are obtained from the Smith normal form.
- A decomposition into irreducibles: $M \cong R^f \oplus R/(p_1^{e_1}) \oplus \cdots \oplus R/(p_l^{e_l})$, where the elementary divisors $p_i^{e_i}$ are powers of irreducibles in the ring $R$, obtained by factoring each invariant factor $d_i$ via the Chinese Remainder Theorem.
The product of invariant factors must be equal to the product of the elementary divisors, since these describe the same module.
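For instance (my own example, over $R = \mathbb{Z}$): the module $M = \mathbb{Z}/4 \oplus \mathbb{Z}/6$ has elementary divisors $4, 2, 3$ (since $\mathbb{Z}/6 \cong \mathbb{Z}/2 \oplus \mathbb{Z}/3$), and regrouping the prime powers gives the invariant factor form $M \cong \mathbb{Z}/2 \oplus \mathbb{Z}/12$ with $2 \mid 12$. In both decompositions the product of the factors is $24$.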
In this exploration, we discovered how $R$-matrices can be used to fully describe all the homomorphisms between free $R$-modules. As a bonus, we also learned how presentation matrices can be used to fully describe any $R$-module, and further, broke down the structure of finitely generated modules over a PID.
Next time we’ll be applying this structure theorem to characterize completely all the homomorphisms between non-free modules.