The Gender Gap in Math: What Can Undergrads Do?

Fewer than one in five Harvard pure math majors is female, and Harvard has no tenured female math faculty. These statistics reflect a well-known gender gap in mathematics, one I had certainly been aware of before entering Harvard. Throughout high school I never thought that, as a student, I could make a significant difference in the math community. Freshman year, however, I realized that there were many simple, concrete actions I could take that would have a significant impact. This observation, and the encouragement of a few upperclassmen including Meena Boppana, spurred me to co-found Gender Inclusivity in Math (GIIM), a student organization designed to help reduce the gender gap in math and related departments at Harvard, at the end of my freshman year.

Leading GIIM has been one of the most eye-opening and transformative experiences of my college career thus far; it has taught me a great deal about how my behavior and language affect others, how much the little actions I take can matter (both positively and negatively), and why it is so vital that we tear down as many barriers as we can in quantitative fields like math. As I step down from the GIIM board after a year and a half, I hope to share some of my observations about why inclusivity issues in math matter to all of us and what students specifically can do about the gender gap and other similar inclusivity issues.

A note: My views do not necessarily represent those of GIIM board. Since I do not identify as a woman, I have no firsthand experience with many of the issues I outline; I will justify many of my claims with data from surveys or anecdotal evidence, but these are not meant to represent every single woman’s experience in math. Finally, while my focus has been on gender issues in math, I believe that many of the issues I raise are also applicable to inclusivity issues beyond gender; I have tried to pursue non-gender-specific initiatives as much as possible and will highlight cases where I think a particular barrier applies to other diversity and inclusion issues as well.

I’ll begin with a brief quantitative description of the gender gap. The gender gap is more than just a pipeline issue, as fewer and fewer women stay in math through each stage (40% of bachelor’s degrees, 28% of PhDs, and fewer than 10% of professors are female). At each stage, disproportionately more women are pushed away from continuing in math, indicating that there are problems at every stage of the process. The gender gap also worsens at more “elite” universities. Only 10–20% (depending on the year) of Harvard undergrad math majors are female, and this figure is representative of most Ivy League universities. Another interesting study, on AMC test scores, indicates that girls who score well on the AMC disproportionately tend to come from privileged backgrounds (wealthy families, parents interested in math, etc.), suggesting that the girls who do end up going into math have other advantages that allow them to overcome the barriers associated with being a woman. All of this evidence suggests a useful mental model: being female in math is like carrying an extra weight. It constantly affects you at every stage of the process regardless of how qualified you are, and it is easier to bear if you have other counterbalancing advantages.

So what causes this weight? While on the GIIM board, I tried to identify some causes that are easily fixable by undergraduate students and to discuss how students can help fix them. Here I list five: explicit sexism within the student body, the “genius” stereotype, the math-ability hierarchy, accessibility of resources within the math department, and the lack of communities and role models. I then describe how GIIM has tried to tackle these causes and what measures students at other universities can take to help reduce the gender gap.

  1. Explicit sexism within the student body.

    “Of course she got into MIT! She’s a girl….they always take girls!”

    “Girls are only good for shopping.”

    “Well that’s what you get for working with a girl. These kinds of labs aren’t really their thing.”

    “You’re good at math…for a girl.”

    “You’re a girl. Of course you overthink things.”

    “Are you sure you can handle this class? Not a lot of girls can take it.”

    These are all quotes (or paraphrases) of remarks made by students in the math community to other students, in high school or college, that I have heard about anecdotally or that were published in the recent MIT report on the gender gap in math. I’ve also heard plenty of others, including a few that don’t explicitly refer to gender but have heavy gender-related undertones (for example, “so easy your mom could do it”). Regardless of original intent, they are all unacceptable, since they all suggest that being a girl somehow makes one worse at math. Of course, these statements are not the norm in the math community, but a single gender-based insult can spoil a student’s experience in the math community and add to the weight women carry, regardless of how welcoming everyone else is.

    The simplest solution to this problem is to encourage students to be more respectful of who they are speaking to and to avoid overtly sexist language. As a student, you can do more than just watch your own language; respectfully calling out other students when they are being sexist or disrespectful helps the math community as a whole. To address these issues, GIIM has worked with the Math Department to create a training program for CAs (undergraduate course assistants) that helps them respond when other students make sexist comments; this may later be extended to all math undergraduates. It’s too early to tell whether these efforts have been successful, but I think that this is one of the simplest ways for students to make an intervention that can reduce the gender gap.

    Another important solution is to create a supportive community (primarily consisting of women, but men are also welcome) where these sexist comments do not exist. This was one of GIIM’s primary missions from the very beginning: we have hosted sushi community dinners and other socials to build a community of women in math and held discussion groups to provide an opportunity for women to discuss sexist comments they hear (among other things). Math Night was a collaborative problem set/office hours night for all math concentrators created as a collaboration among GIIM, the Math Department, and Leverett House; one of its goals was ensuring that math concentrators, including women, could find supportive problem set groups. One major purpose of our conference WIMS (Women in Math and Statistics) was creating a Boston-wide community of women in math for this reason. These efforts have had varying degrees of success; I’ve heard some resounding testimonials from WIMS, Math Night, and the dinners about the community we have developed, but many people (including me) feel we could do more. Unfortunately, getting people to socialize, especially Harvard students on busy schedules, is very difficult; GIIM will continue to experiment with types of socials to determine which are most effective.

  2. The “genius” stereotype.

    The classic stereotype of a mathematician is a white (or perhaps Asian) male bespectacled genius who is incredibly introverted to the point of being unable to function in society, has known he wants to do math (and only math) from an incredibly young age, and is completely incapable of understanding anything that is not directly math. This stereotype has been perpetuated by virtually every form of popular culture, to the point that almost every aspect of it is expected to hold for every math student. Thus many math students spend time trying to find the true geniuses who fit this stereotype and divide their world into those geniuses and everyone else.

    Virtually everything about this stereotype is not only wrong but also incredibly insulting to everyone who does math. Classifying some students (regardless of gender) as geniuses devalues the hard work that fuels their success and any skills they might have in other areas. It also insults every perceived non-genius (which, according to people’s stereotypes, usually includes all the women, racial minorities, and other underrepresented groups) by saying that they are inherently incapable of doing math no matter how hard they work. Research has shown that convincing students that talent, rather than hard work, drives success makes them less resilient to failure and lowers their eventual test scores. Despite all of this, the stereotype has been perpetuated by not just Harvard math undergraduates but also those higher up.

    Obviously, it’s impossible for students to convince Hollywood to portray mathematicians accurately as people, not stereotypes. But we can ensure that our community is free of these insinuations, just as we can rid our community of sexist comments. GIIM’s CA training program may prove somewhat successful in this regard as well. We have also found that students, administrators, and professors are often unaware of how their language may entrench this stereotype (e.g. “You can’t take [hard math class] and have a social life!”), and encouraging them to be more careful and aware of their language has had some success. This is yet another area that I think is firmly within the student body’s power to change.

  3. The math-ability hierarchy.

    Math students, for whatever reason, love to rank each other based on perceived math ability and create a social hierarchy of perceived competence. At Harvard, this is largely based on the freshman-year math class, since Harvard offers many different freshman-year math tracks at varying intensities depending on your initial background. At colleges without a variety of freshman-year math courses (e.g. MIT), social status is based on your background when entering the community. Within the high school math contest community, it is often based on how far you advanced in the AMC/AIME/USAMO sequence. Even within a class, students who show off by solving simple problems with high-powered theorems can ascend farther up this hierarchy. It’s virtually impossible to eliminate this social hierarchy; if math students want to rank each other, they will find some difference and some way to do so. Even worse, almost every single student I have met in math has an inherent tendency to rank others; it takes a strong conscious effort not to do this or to let it color one’s vision.

    This hierarchy increases competition and makes for a poor learning environment overall, both by encouraging students to focus on how others are doing in class rather than on how they themselves are doing, and by discouraging students who feel that they are relatively weak based on their past experience. But why is it a specifically gendered issue? Women entering the Harvard math department tend to have weaker backgrounds because of external cultural factors out of our control, and thus are, on average, forced to the lower tiers of the hierarchy, since it is fixed based on your initial math knowledge and does not respond to any improvement over time. Thus any action that negatively affects lower tiers of the hierarchy will disproportionately affect women and other minorities. Unfortunately, there are plenty of these: the pernicious, false myth that only students who take Math 55 (the highest-level freshman math course) can become concentrators pervades the department, many students feel (falsely) that lower-level freshman math courses are oriented toward applied math or are inadequate preparation for a math degree, and the highest tiers of the hierarchy are often treated as geniuses, fueling the genius stereotype above.

    There are two potential ways of tackling this hierarchy. One short-term fix is encouraging women to get into higher tiers; in Harvard’s case, this is possible since the hierarchy isn’t entirely fixed when you enter college but rather determined by your choice at the beginning of freshman year. There is some evidence that women at Harvard, lacking self-confidence (for cultural or social reasons), place themselves in lower-level freshman math courses relative to what they are capable of, while men tend to do the opposite; in some freshman math courses, women routinely score at the top of the class, suggesting they were overqualified to begin with. The official Harvard Math Department description of freshman math classes somewhat exaggerates their difficulty, which could certainly contribute to this. GIIM has published what we view as more accurate descriptions in the Advising Pamphlet, which should help fix this. This is only a short-term fix, since it doesn’t directly tackle the problem of the hierarchy but instead makes it less likely that women will be stuck on its lower tiers.

    The much better approach is removing the hierarchy directly. GIIM has tried doing this to a certain extent by debunking some of the false myths associated with the hierarchy in documents such as the Advising Pamphlet. We have also tried working with the Math Department to decrease hierarchy-based messages in official Math Department documents and working with the student body in ways similar to those described above. In theory, this problem should be within a student group’s power to tackle, since it is just a mindset students happen to possess, but there doesn’t appear to be an easy way to resolve it.

  4. Accessibility of resources within the department.

    Many students at Harvard have found official advising either inadequate or difficult to navigate, so they have resorted to alternatives, including finding professors they know well to talk to on a personal basis or relying on social connections made by being close to or at the top tier of the social hierarchy. This of course simply entrenches the hierarchy and often prevents women from receiving the advising that they deserve. I am fairly certain that similar situations exist at many other schools and colleges as well.

    To help solve this problem, GIIM began running our own advising events around major milestones (course selection and concentration declaration) aimed at including all students and perspectives in the math department. Further, we collected a large amount of crowd-sourced advice similar to what math concentrators would get and wrote the Advising Pamphlet to help make this information available to everyone in the math department. In addition, community-building initiatives like Math Night have helped women and other underrepresented students gain access to better peer-to-peer advising. These have all had reasonable success within Harvard, and I’m excited to see GIIM continue to pursue these initiatives and make them even more popular. Similar methods of increasing accessibility can undoubtedly work at other colleges and schools.

  5. Lack of communities and role models.

One potential solution I’ve brought up many times above is the creation of a safe community for women, one which provides a place to safely discuss obstacles unique to being a woman in math, peer-to-peer advising networks and social capital to combat the hierarchy and the lack of advising resources, and an environment free of stereotypes or gendered comments. Successfully creating this community mitigates many of the issues above; in fact, according to some students, countries that have strong communities of women have been able to mostly close the gender gap, even where structural or cultural inequities remain.

Another important aspect of community-building is the importance of role models whom women can look to for inspiration and mentoring. Many women cite important role models from their childhood, but often these are male, simply due to the lack of prominent female mathematicians to serve as role models. Fortunately, society is gradually improving this situation as formerly overlooked figures like Emmy Noether, Sophie Germain, and Ada Lovelace become more prominent; the recent book/movie Hidden Figures has helped bring these issues to light. On a more institutional level, we need female math professors or similar role models; unfortunately, Harvard has no tenured female math professors. As a result, GIIM has run a speaker series where we invite prominent female mathematicians from other universities and from industry to speak about their research and the challenges they faced; these events have had varying success in inspiring women in the department and raising issues related to the gender gap. The WIMS speakers play similar roles in their keynotes and panels as well as in the smaller breakout sessions where they closely interact with participants; these events were very well-received at WIMS 2016. Creating role models for women in math is not by any means easy, but it is definitely doable for a student group, even at a university without tenured female professors.

This list is by no means exhaustive, but hopefully it makes clear that there are many actions that we as students can take to help reduce the gender gap with minimal effort. So the only question that remains is this: why do we care so much about gender issues in the first place? To this, in addition to the inherent injustice, I can only give you my answer: because every day, there are students out there, perfectly capable of going into math, who are forced out for reasons totally out of their control. Because every single time a student leaves math or another quantitative field, humanity is less likely to solve any of the major mathematical and scientific challenges facing it today. And because every little way I make math more inclusive and every student I encourage to do math can do far, far more for the field and for humanity than I could ever do alone.


Linear Algebra without Determinants, Part IV: Jordan Canonical Form

In the last post, I introduced the minimal and characteristic polynomials, which I used to show that every operator (over an algebraically closed field) has an eigenvalue and can be written as an upper-triangular matrix. I ended with two remaining problems: how to determine what lies above the diagonal in this upper-triangular matrix and why the minimal and characteristic polynomials aren’t always equal. Here, I introduce Jordan canonical form, which conveniently solves both problems for us.

First, note that by the spectral theorem we really only care about the behavior of T on one particular V^{(\lambda)}. Further, we can subtract off \lambda \mathop{\mathrm{id}} to get a nilpotent operator, since that effectively just eliminates the diagonal. So we’ll look at the Jordan canonical form for nilpotent operators, which easily generalizes to all operators.

A. The Regular Operator

The regular operator is our “simplest” nilpotent operator. An operator is regular iff there exists a basis e_1, \ldots, e_n such that T(e_i) = e_{i-1} and T(e_1) = 0, i.e. the matrix with respect to this basis is 0 everywhere except for 1s on the superdiagonal.

As usual, we don’t want our definitions to rely on a basis. Therefore we have the following very long but fairly easy lemma.

Lemma 31. If \dim V = n then the following are equivalent.

(1) T is regular.

(2) For some v \in V, v, Tv, T^2 v, \ldots form a basis.

(3) T^{n-1} \neq 0.

(4) \dim \ker(T^i) = i for all 1 \leq i \leq n.

(5) \dim \ker T = 1.

Proof. (1) implies (2) is immediate by picking v = e_n.

(2) implies (3) is obvious as T^{n-1} v \neq 0.

(3) implies (4) follows from the following very nice inequality, which is true for any operator:

\dim \ker(T^{i+1}) - \dim \ker(T^i) \leq \dim \ker(T^i) - \dim \ker(T^{i-1})

The proof of this inequality is left to the reader. As a hint, consider the quotient space \ker(T^{i+1})/\ker(T^i).

(4) implies (5) is immediate.

(5) implies (4) follows from the above inequality as well.

(4) implies (3) is immediate.

(3) implies (2) follows by taking a vector v such that T^{n-1} v \neq 0 and doing some computations.

(2) implies (1) is also immediate. \blacksquare
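To see Lemma 31 in action, here is a minimal sympy sketch checking conditions (3) and (4) for the regular nilpotent operator on k^4 (the matrix with 1s on the superdiagonal; an arbitrary illustrative example, not part of the original argument):

```python
import sympy as sp

n = 4
# The regular nilpotent operator: T(e_{i+1}) = e_i, i.e. 1s on the superdiagonal.
T = sp.zeros(n, n)
for i in range(n - 1):
    T[i, i + 1] = 1

# Condition (3): T^{n-1} is nonzero while T^n = 0.
print(T**(n - 1) != sp.zeros(n, n), T**n == sp.zeros(n, n))  # True True
# Condition (4): dim ker(T^i) = i, computed as n - rank(T^i).
print([n - (T**i).rank() for i in range(1, n + 1)])          # [1, 2, 3, 4]
```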

Interestingly, we also get the following lemma, which answers one of our questions.

Lemma 32. A regular operator has minimal and characteristic polynomial equal.

Proof. By condition (3) in Lemma 31, T^{n-1} \neq 0 so t^{n-1} can’t be the minimal polynomial. But T^n = 0 so t^n must be the minimal and characteristic polynomial. \blacksquare

So regular operators have very nice behavior above the diagonal, and they have equal minimal and characteristic polynomials. Now we want to decompose an arbitrary nilpotent operator into regular operators; this is the goal of Jordan form.

B. Jordan Canonical Form

Theorem 33. (Jordan form) If T: V \to V is nilpotent, then

V = \oplus_{i} V_i

such that V_i are T-invariant and T|_{V_i} is regular (these are the Jordan blocks). Further, while the decomposition is not unique, the multiset \{\dim V_1, \dim V_2, \ldots \} is.

Proof. We start with existence. Not surprisingly, we induct on \dim V. The base case is trivial.

For the inductive step, consider the minimal integer n such that T^n = 0 but T^{n-1} \neq 0 (also the degree of the minimal polynomial). Consider v such that T^{n-1} v \neq 0 and let V' = \mathop{\mathrm{Span}}(v, Tv, \ldots, T^{n-1} v). It’s T-invariant and T|_{V'} is regular by Lemma 31.

Now we need the rest of our decomposition. Project using the map \pi: V \to V/V' and let T'' be the induced map. We can decompose V/V' = \underset{i} \oplus V''_i by the induction hypothesis. So all we want to do is find an inverse j to our projection map \pi that respects the action of T so we can recover a decomposition of V.

In order to do this, map out of each summand V''_i (we can combine the maps afterwards) and omit the index i. Since T''|_{V''} is regular, by Lemma 31 we have a basis w, T''w, \ldots, (T'')^{m-1} w where m = \dim V''. Arbitrarily assign j(w) = \bar{w} such that \pi(\bar{w}) = w. Since j must respect the action of T, this determines j on T''w, \ldots, (T'')^{m-1} w. However, since (T'')^m w = 0, we need T^m \bar{w} = 0. If this isn’t the case, we look for a “correction term” u \in V' with T^m \bar{w} = T^m u; then replacing \bar{w} by \bar{w} - u works. Note that T^m \bar{w} lies in V', since it projects to (T'')^m w = 0, and when restricted to V' we have \mathop{\mathrm{Im}}(T^m) = \ker(T^{n-m}), since m \leq n and V' is regular. So T^{n-m}(T^m \bar{w}) = T^n \bar{w} = 0 gives us such a u, and we are done with existence.

Now for uniqueness. A naive approach is to consider the kernel and show that it uniquely defines the dimensions, but this doesn’t quite work. Instead, let n_i = \dim V_i and consider the partition of \dim V corresponding to \{n_1, n_2, \ldots \}. This partition has a corresponding Young diagram. Flipping the diagram gives us a dual partition which we call n_i^*. Now we have the following lemma, which implies uniqueness.

Lemma 34. n_i^* = \dim \ker(T^i) - \dim \ker(T^{i-1})

Proof. It’s equivalent to show that the first i rows of the Young diagram have volume \dim \ker T^i. However, note that \dim \ker T^i = \sum_{j} \dim \ker T^i|_{V_j}; it’s easy to see that this corresponds to the volume of the first i rows by Lemma 31. \blacksquare

Thus we have our existence and uniqueness, so we are done. \blacksquare
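As a sanity check on both the theorem and Lemma 34, here is a small sympy sketch (the nilpotent matrix is an arbitrary example built from regular blocks of sizes 3 and 2):

```python
import sympy as sp

# A nilpotent operator on k^5 assembled from regular blocks of sizes 3 and 2.
T = sp.zeros(5, 5)
T[0, 1] = T[1, 2] = T[3, 4] = 1

P, J = T.jordan_form()
print(J)  # Jordan blocks of sizes 3 and 2, all with eigenvalue 0

# Lemma 34: n_i^* = dim ker(T^i) - dim ker(T^{i-1}) gives the dual partition.
dims = [5 - (T**i).rank() for i in range(4)]
print([dims[i] - dims[i - 1] for i in range(1, 4)])  # [2, 2, 1], dual to {3, 2}
```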

So now we have our answers. The degree of the minimal polynomial corresponds to the dimension of the largest Jordan block, not the dimension of the generalized eigenspace. So the minimal and characteristic polynomials are equal exactly when each generalized eigenspace consists of a single Jordan block, i.e. when T - \lambda \mathop{\mathrm{id}} is regular on each V^{(\lambda)}. Further, we can always write a matrix in a form with 1s or 0s on the superdiagonal.

Most linear algebra problems about endomorphisms of vector spaces fall quickly to the following strategy: apply the spectral theorem to focus on only one eigenspace, then apply Jordan form to focus on regular operators within that eigenspace. This gives us a complete characterization of the simplest form that any particular matrix can take.

Thanks to Prof. Gaitsgory, who taught me this material in Harvard’s Math 55a this semester.

Linear Algebra without Determinants, Part III: Minimal and Characteristic Polynomials

In the last post, I presented the spectral theorem, which gave us a way of breaking down a vector space in terms of eigenvalues. This left some important questions unanswered, such as the existence of eigenvalues and the meaning of the non-spectral term. Here we’ll analyze a slightly better way of looking at eigenvalues in terms of the minimal polynomial, which helps us understand when eigenvalues exist and why this is related to the underlying field k.

A. The Minimal Polynomial

Consider the set of polynomials of an operator T: V \to V, i.e. the set of things like T^2 + 2T + 5, etc. We’re going to look at which of these operators are actually zero. First, I claim that we will find the zero operator somewhere in this set.

Lemma 23. Some polynomial of an operator is actually the zero operator.

Proof. Consider a basis of V and for each vector v in the basis, consider the vectors v, Tv, \ldots, T^n v where n = \dim V. These vectors are linearly dependent, since there are more of them than the dimension of the vector space. So we know that some linear combination, i.e. some polynomial of T, evaluates to 0 on v. Doing this for all the vectors of the basis, we can then multiply these polynomials together and get a polynomial of T that is 0 on the whole vector space. \blacksquare

So consider the set of polynomials that evaluate to the zero operator. This set seems pretty hard to characterize, but the following lemma is well motivated by a bit of ring theory.

Lemma 24. This set is actually generated by a polynomial, i.e. it consists of all multiples of a particular polynomial.

Proof. It’s easy to see that if any particular polynomial p is in the set, then all of its multiples are. So we just need to prove that the gcd of any two polynomials in the set is also in the set; then the nonzero polynomial of minimal degree in the set generates it. But the claim about gcds follows immediately from Bezout’s lemma. \blacksquare

Call the polynomial that generates our set of zero operators the minimal polynomial of T, or \mathop{\mathrm{min}}_T.

(Note: If you’re familiar with a little more algebra, a much faster way to say the above is the following. Note that \mathop{\mathrm{End}}(V) is a k-algebra, as is k[t]. There exists a map \phi: k[t] \to \mathop{\mathrm{End}}(V) that sends t \mapsto T. This map must have a kernel, i.e. it is non-injective, since k[t] is infinite-dimensional while \mathop{\mathrm{End}}(V) is finite-dimensional. So the polynomial that generates its kernel is just the minimal polynomial.)

Computing the minimal polynomial of an operator seems pretty hard. So let’s look at the relationship between the minimal polynomial of an operator and its minimal polynomial on a subspace and quotient space.

Lemma 25. For any T-invariant subspace V' \subset V, we have two induced maps T': V' \to V' and T'': V/V' \to V/V'. Then

(a) \mathop{\mathrm{min}}_{T'} | \mathop{\mathrm{min}}_T and \mathop{\mathrm{min}}_{T''} | \mathop{\mathrm{min}}_T

(b) \mathop{\mathrm{min}}_T | \mathop{\mathrm{min}}_{T'} \mathop{\mathrm{min}}_{T''}

Proof. (a) Obviously \mathop{\mathrm{min}}_T kills everything in V', so we have our first divisibility by definition. Similarly, \mathop{\mathrm{min}}_T kills everything in V/V', so we get our second divisibility.

(b) We want to show that multiplying the minimal polynomials of the subspace and the quotient space kills all the vectors in V. However, note that after applying \mathop{\mathrm{min}}_{T''}, we get a vector that projects to zero in V/V', so it is in V'. Then applying \mathop{\mathrm{min}}_{T'} kills it, so we get our desired divisibility. \blacksquare

Now, here’s the link between minimal polynomials and eigenvalues.

Lemma 26. \lambda \in \mathop{\mathrm{Spec}}(T) iff \mathop{\mathrm{min}}_T (\lambda) = 0

Proof. If \lambda is an eigenvalue, then consider V' = \ker(T - \lambda \mathop{\mathrm{id}}) and note that \mathop{\mathrm{min}}_{T'} | \mathop{\mathrm{min}}_T by Lemma 25 (where T' is the restriction to V' as V' is T-invariant). It’s easy to see that \mathop{\mathrm{min}}_{T'} = t - \lambda, so \lambda is a root of \mathop{\mathrm{min}}_T.

For the other direction, if t - \lambda divides \mathop{\mathrm{min}}_T, then T - \lambda \mathop{\mathrm{id}} must be non-invertible: otherwise, writing \mathop{\mathrm{min}}_T = (t - \lambda) q, we could cancel the invertible factor to get q(T) = 0, so \mathop{\mathrm{min}}_T wouldn’t actually be minimal. A non-invertible endomorphism has nontrivial kernel, so \lambda is an eigenvalue and we are done. \blacksquare

So eigenvalues actually correspond to linear polynomials that divide the minimal polynomial. It’s interesting to note that non-linear irreducible polynomials could divide the minimal polynomial if k is not algebraically closed; this is discussed in my posts on modules over PIDs.
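Here is a minimal sympy sketch of Lemma 26 on an arbitrary example (a 3×3 matrix whose minimal polynomial is easy to verify by hand):

```python
import sympy as sp

t = sp.symbols('t')
T = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])
I, Z = sp.eye(3), sp.zeros(3, 3)

# T - 2 id is nonzero but (T - 2 id)^2 = 0, so min_T = (t - 2)^2; its only
# root, 2, is the only eigenvalue, exactly as Lemma 26 predicts.
print(T - 2*I == Z, (T - 2*I)**2 == Z)     # False True
print(T.eigenvals())                        # {2: 3}
print(sp.factor(T.charpoly(t).as_expr()))   # (t - 2)**3
```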

B. Eigenvalues Always Exist

To make our lives convenient, we’re now going to let k be algebraically closed, which means that (1) every non-constant polynomial factors into linear factors; (2) every non-constant polynomial has a root; and (3) every irreducible polynomial is linear. These are all easily seen to be equivalent. The classic example of an algebraically closed field is \mathbb{C}, but others do exist.

Lemma 27. The non-spectral term is zero if the field is algebraically closed.

Proof. If the non-spectral term were nonzero, the minimal polynomial of T restricted to it would have a root \lambda, so T would have an eigenvector there. But the non-spectral term lies inside \mathop{\mathrm{Im}}((T - \lambda \mathop{\mathrm{id}})^\infty) for every eigenvalue \lambda, on which T - \lambda \mathop{\mathrm{id}} is invertible, contradiction. \blacksquare

Note that this approach is actually the same in spirit as the determinant-free approach presented in Axler. However, the minimal polynomial makes it easier to see the generalization to non-algebraically closed fields. For now, we’ll just note that over any non-algebraically closed field, there exists an operator on some vector space with no eigenvalues. The example is left to the reader.

Let’s reconnect this discussion with matrices for a bit.

Lemma 28. V admits a basis in which T is upper-triangular.

Proof. By the spectral theorem and Lemma 27 we get V = \underset{\lambda} \bigoplus V^{(\lambda)}. Pick a basis of each V^{(\lambda)} such that T - \lambda \mathop{\mathrm{id}} is strictly upper-triangular (as it is nilpotent). Then combine these to get a basis of V in which T is upper-triangular. \blacksquare

Lemma 29. In the basis of Lemma 28, the number of occurrences of \lambda \in k as a diagonal entry equals \dim V^{(\lambda)}.

Proof. This is immediate from the construction above. \blacksquare

So now we have a better idea of the “simplest” possible matrix for any operator: it’s always upper triangular and we know exactly how many occurrences of the eigenvalues will appear on the diagonal. All that remains is to characterize what’s found above the diagonal.

C. The Characteristic Polynomial

We still want to compute the minimal polynomial explicitly though. Lemmas 28 and 29 suggest that we try what follows. Let the characteristic polynomial \mathop{\mathrm{ch}}_T be \underset{\lambda \in \mathop{\mathrm{Spec}}(T)} \prod (t - \lambda)^{\dim V^{(\lambda)}}.

Lemma 30. (A version of Cayley-Hamilton) \mathop{\mathrm{min}}_T always divides \mathop{\mathrm{ch}}_T.

Proof. Each factor (T - \lambda \mathop{\mathrm{id}})^{\dim V^{(\lambda)}} kills the corresponding V^{(\lambda)}, so \mathop{\mathrm{ch}}_T(T) vanishes on every generalized eigenspace; by the spectral theorem we are done. \blacksquare

(To fully prove Cayley-Hamilton, this needs to be extended to non-algebraically closed fields. This can be done as shown in my posts about modules over PIDs or it can be done in other ways.)

It’s easy to check using Lemmas 28 and 29 that this characteristic polynomial is the same as canonically defined. Specifically, \mathop{\mathrm{ch}}_T (\mu) = \det(\mu \mathop{\mathrm{id}} - T).

Ideally, we would always have \mathop{\mathrm{ch}}_T = \mathop{\mathrm{min}}_T. Unfortunately, this isn’t actually true. Specifically, consider the operator T: \mathbb{C}^4 \to \mathbb{C}^4 such that T((1, 0, 0, 0)) = (0, 0, 0, 0), T((0, 1, 0, 0)) = (1, 0, 0, 0), T((0, 0, 1, 0)) = (0, 0, 0, 0) and T((0, 0, 0, 1)) = (0, 0, 1, 0). It’s easy to see that this operator is nilpotent, so its only eigenvalue is 0. However, V^{(0)} is four-dimensional, so \mathop{\mathrm{ch}}_T = t^4, while the minimal polynomial is t^2.
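A quick sympy verification of this example (the matrix below encodes the operator just described, with the images of the basis vectors as columns):

```python
import sympy as sp

t = sp.symbols('t')
# e1 -> 0, e2 -> e1, e3 -> 0, e4 -> e3.
T = sp.Matrix([[0, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]])
print(T**2 == sp.zeros(4, 4))   # True, so min_T = t^2
print(T.charpoly(t).as_expr())  # t**4 = ch_T, since V^(0) is all of C^4
```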

We have two major questions unanswered: what lies above the diagonal of an operator in its simplest possible form, and why the minimal polynomial need not equal the characteristic polynomial. Both of these questions are answered in my next post about Jordan canonical form, which completes the structure theory of endomorphisms.

Thanks to Prof. Gaitsgory, who taught me this material in Harvard’s Math 55a this semester.

Linear Algebra without Determinants, Part II: Spectral Theory

In the last post, I presented the basic definitions of a vector space and a linear map and showed that the structure theory of vector spaces alone and of linear maps between different vector spaces is rather trivial. In this post, we begin exploring the structure theory of endomorphisms by looking at spectral theory.

A. Nilpotent and Invertible Endomorphisms

We are already familiar with the invertible endomorphism, and from Lemma 13 we know that for an endomorphism of a finite-dimensional vector space, invertibility is equivalent to injectivity and to surjectivity. This is extremely useful.

Let a subspace V' \subset V be T-invariant if T(V') \subset V'. In that case, let the restriction map T|_{V'} be the map T restricted to V'. Conveniently, we have the following lemma.

Lemma 15. T|_{V'} is invertible if T is invertible.

Proof. T|_{V'} is clearly injective, so it is invertible by Lemma 13. \blacksquare

Now we define the property that is in some sense the “opposite” of invertibility. Let a nilpotent endomorphism T be such that T^n = 0 for some n. It’s easy to see that T is nilpotent iff for every v \in V there exists an m such that T^m v = 0. Similarly, it’s easy to see that T|_{V'} is nilpotent if T is nilpotent.

A good exercise for a connection to matrices is to show that T is nilpotent iff the matrix of T is strictly upper triangular with respect to some basis. A useful intermediate step is showing that there exists a sequence of subspaces 0 = V_0 \subset V_1 \subset \ldots \subset V_n = V such that T sends V_i into V_{i-1} and \dim(V_i) = i. Using this, it’s not hard to show that T^n = 0 where n = \dim V.
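For the easier direction, here is a one-matrix sympy check (the entries are arbitrary):

```python
import sympy as sp

# A strictly upper-triangular matrix: each power pushes the nonzero entries
# further above the diagonal, so T^n = 0 with n = dim V = 3.
T = sp.Matrix([[0, 5, 7],
               [0, 0, 3],
               [0, 0, 0]])
print(T**2, T**3 == sp.zeros(3, 3))  # T^2 has a single nonzero entry; T^3 = 0
```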

Now most maps are obviously neither nilpotent nor invertible. But as it turns out, we can split a vector space up so that the map is nilpotent on one part and invertible on the other; this is extremely useful.

B. Eventual Kernel and Eventual Image

Theorem 16. (Eventual kernel and eventual image) If T: V \to V, then there exist V^{\mathop{\mathrm{nilp}}}, V^{\mathop{\mathrm{inv}}} \subset V such that both are T-invariant, T|_{V^{\mathop{\mathrm{nilp}}}} is nilpotent, T|_{V^{\mathop{\mathrm{inv}}}} is invertible, and

V^{\mathop{\mathrm{nilp}}} \oplus V^{\mathop{\mathrm{inv}}} \simeq V

Proof. Before we begin the proof, let’s note a flawed naive approach. It is not enough to take V^{\mathop{\mathrm{nilp}}} = \ker T and V^{\mathop{\mathrm{inv}}} = \mathop{\mathrm{Im}} T, since \ker T \oplus \mathop{\mathrm{Im}} T \simeq V sometimes fails. Specifically, let V = k^2 and T be such that T((0,1)) = (1, 0) and T((1, 0)) = (0, 0). Then \ker T = \mathop{\mathrm{Im}} T = \mathop{\mathrm{Span}}((1, 0)). Note that our earlier proof of the corresponding statement implicitly assumed that we were not dealing with endomorphisms.

While this naive approach fails, a relatively simple modification works. Specifically, consider the sequence of subspaces \ker T \subset \ker T^2 \subset \ker T^3 \subset \ldots. This sequence stabilizes eventually by a simple dimension argument. The stabilized subspace is the eventual kernel which we denote by \ker(T^{\infty}). It’s easy to see that the eventual kernel is indeed T-invariant and T|_{\ker(T^{\infty})} is nilpotent.

Similarly, consider the sequence \mathop{\mathrm{Im}} T \supset \mathop{\mathrm{Im}} T^2 \supset \mathop{\mathrm{Im}} T^3 \supset \ldots. This sequence also stabilizes eventually; the stabilized subspace is the eventual image, \mathop{\mathrm{Im}}(T^\infty). It’s easy to see that the eventual image is T-invariant and that T restricted to it is invertible.

Now we show that \ker(T^{\infty}) \oplus \mathop{\mathrm{Im}} (T^{\infty}) \simeq V. To show that this is an isomorphism, we need to show injectivity and that the dimensions of both sides are equal by Lemma 13. Injectivity reduces to showing that if T acts on a vector both nilpotently and invertibly, the vector is actually zero. The dimensions argument just follows from rank-nullity. So we are done. \blacksquare
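Here is a small sympy sketch of Theorem 16 on an arbitrary example, using the fact (implicit in the dimension argument) that both chains stabilize by the n-th power:

```python
import sympy as sp

T = sp.Matrix([[1, 1, 1],
               [0, 0, 1],
               [0, 0, 0]])
n = T.rows
Tn = T**n  # the kernel and image chains have stabilized by the n-th power

ker = Tn.nullspace()   # basis of ker(T^infty): here 2-dimensional
im = Tn.columnspace()  # basis of Im(T^infty): here 1-dimensional
print(len(ker), len(im))                     # 2 1
print(sp.Matrix.hstack(*(ker + im)).rank())  # 3, so the sum is direct and is V
```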

This is a nice result, but we can do better. After all, there are still a lot of nilpotent and invertible endomorphisms that we have no idea how to work with.

C. Eigenvalues and Eigenvectors

The eigenvalue magically makes our structure theory work. For motivation behind the magic of eigenvalues, see my series of posts on modules over PIDs.

An eigenvector is a nonzero vector v \in V such that Tv = \lambda v for some scalar \lambda \in k, which is called an eigenvalue. The spectrum \mathop{\mathrm{Spec}}(T) is the set of all eigenvalues. The eigenspace V^{\lambda} is just \ker(T - \lambda \mathop{\mathrm{id}}).

We’d like eigenvectors to always exist. Unfortunately this isn’t the case; consider V = \mathbb{R}^2 and the operator T corresponding to rotation about the origin by 90^{\circ}. There are no eigenvectors in this case.
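A quick sympy confirmation of this example:

```python
import sympy as sp

# Rotation of R^2 by 90 degrees: the eigenvalues are +-i, which are not real,
# so over k = R there are no eigenvectors at all.
T = sp.Matrix([[0, -1],
               [1,  0]])
print(T.eigenvals())  # {-I: 1, I: 1}
```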

However, we do have the following nice result.

Lemma 17. |\mathop{\mathrm{Spec}}(T)| \leq \dim V.

Proof. I claim that eigenvectors with pairwise distinct eigenvalues are linearly independent; the proof is left to the reader. Thus we are done. \blacksquare

Let T be diagonalizable if it has a basis consisting entirely of eigenvectors. Note that a map is diagonalizable iff its matrix with respect to some basis is diagonal. Diagonal matrices are easy to deal with, so ideally everything should be diagonal! However, this turns out not quite to be the case. The goal of spectral theory is to make everything as diagonal as possible.

A couple more useful computations involving eigenvalues are left as exercises:

1. If V' is T-invariant, then \mathop{\mathrm{Spec}}(T) = \mathop{\mathrm{Spec}}(T|_{V'}) \cup \mathop{\mathrm{Spec}}(T|_{V/V'})

2. If T is upper triangular with respect to some basis, then its diagonal entries are its eigenvalues.

D. Spectral Theory

Just as the kernel didn’t work in Theorem 16, we’re going to find that the eigenspace (a kernel) doesn’t work with spectral theory. So define the generalized eigenspace V^{(\lambda)} to be \ker((T - \lambda \mathop{\mathrm{id}})^\infty). Note that the generalized eigenspace is nonzero iff the eigenspace is nonzero.

Now the spectral theorem allows us to divide the vector space up into generalized eigenspaces and other “junk”, where the junk is nicely characterizable.

Theorem 18. (Spectral Theorem) For any T: V \to V, we have

\left(\underset{\lambda \in \mathop{\mathrm{Spec}}(T)} \bigoplus V^{(\lambda)} \right) \oplus V^{\mathop{\mathrm{non-spec}}} \to V

is an isomorphism, where the non-spectral term is V^{\mathop{\mathrm{non-spec}}} = \underset{\lambda \in \mathop{\mathrm{Spec}}(T)} \bigcap \mathop{\mathrm{Im}}((T - \lambda \mathop{\mathrm{id}})^\infty)

Proof. First we start with the following lemmas.

Lemma 19. (V' \cap \ker(T^\infty)) \oplus (V' \cap \mathop{\mathrm{Im}}(T^\infty)) \simeq V' if V' is T-invariant.

Proof. Note that V' \cap \ker(T^\infty) = \ker(T|_{V'}^\infty) and V' \cap \mathop{\mathrm{Im}}(T^\infty) = \mathop{\mathrm{Im}}(T|_{V'}^\infty). So just apply Theorem 16. \blacksquare

Lemma 20. If S \circ T = T \circ S, then

(\ker(T^\infty) \cap \ker(S^\infty)) \oplus (\ker(T^\infty) \cap \mathop{\mathrm{Im}}(S^\infty)) \oplus (\mathop{\mathrm{Im}}(T^\infty) \cap \ker(S^\infty)) \oplus (\mathop{\mathrm{Im}}(T^\infty) \cap \mathop{\mathrm{Im}}(S^\infty)) \simeq V

Proof. Note that if two operators commute, their eventual kernels and images are invariant with respect to each other. Then apply Lemma 19. \blacksquare

By induction, we get a very obvious but annoying generalization of Lemma 20. Here’s a more useful specific case.

Lemma 21. If we have a family of endomorphisms T_1, \ldots, T_k that pairwise commute and whose eventual kernels pairwise intersect in zero, then

\left( \underset{i} \bigoplus \ker(T_i^\infty) \right) \oplus \left( \underset{i} \bigcap \mathop{\mathrm{Im}} (T_i^\infty) \right) \simeq V

Proof. Apply Lemma 20 as many times as necessary, but then note that the other cross-terms disappear because the pairwise intersections of the eventual kernels are zero. Thus we only need to show that \ker(T_i^\infty) = \ker(T_i^\infty) \cap \mathop{\mathrm{Im}}(T_j^\infty), i.e. \ker(T_i^\infty) \subset \mathop{\mathrm{Im}}(T_j^\infty), for all i \neq j; but this is immediate from Lemma 19 applied to V' = \ker(T_i^\infty). \blacksquare

Now we prove our theorem. Unsurprisingly, let T_i = T - \lambda_i \mathop{\mathrm{id}} (recall that there are only finitely many of these since the spectrum has finite size by Lemma 17). We can easily check that these pairwise commute, so we only need to check that the pairwise intersection of the eventual kernels is zero to apply Lemma 21. Consider the operators T - \lambda \mathop{\mathrm{id}} and T - \mu \mathop{\mathrm{id}} over V^{(\lambda)} \cap V^{(\mu)}. Both are nilpotent operators that differ by a multiple of the identity. However, I claim the following lemma.

Lemma 22. If both S and S - \nu \mathop{\mathrm{id}} are nilpotent for some \nu \neq 0, then the vector space is 0.

Proof. Since S is nilpotent, the geometric series \mathop{\mathrm{id}} + S/\nu + (S/\nu)^2 + \ldots is a finite sum and inverts \mathop{\mathrm{id}} - S/\nu, so S - \nu \mathop{\mathrm{id}} = -\nu(\mathop{\mathrm{id}} - S/\nu) is invertible. Thus S - \nu \mathop{\mathrm{id}} is both invertible and nilpotent, which forces the vector space to be 0. \blacksquare

Thus our proof of the spectral theorem is complete. \blacksquare

The spectral theorem allows us to focus on every eigenspace individually, instead of looking at the vector space as a whole. This is highly advantageous, except for the currently extremely annoying non-spectral term. Our next step will be eliminating the non-spectral term, in order to gain the full power of spectral theory.
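Looking ahead, over an algebraically closed field the non-spectral term will turn out to vanish; here is a minimal sympy sketch of the decomposition on an arbitrary example, computing each generalized eigenspace as \ker((T - \lambda \mathop{\mathrm{id}})^n) with n = \dim V:

```python
import sympy as sp

T = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
n = T.rows
# The generalized eigenspace dimensions sum to dim V = 3, so the non-spectral
# term happens to be zero here.
for lam in T.eigenvals():
    V_lam = ((T - lam * sp.eye(n))**n).nullspace()
    print(lam, len(V_lam))  # 2 -> 2, 3 -> 1
```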

Thanks to Prof. Gaitsgory, who taught me this material in Harvard’s Math 55a this semester.

Linear Algebra without Determinants, Part I: Vector Spaces and Linear Maps

Traditionally, linear algebra is taught by heavily employing the determinant, an arbitrary computational construct that is essentially random, hard to motivate, and obscures most of the beauty. In this series of posts, I present a straightforward, completely determinant-free approach to the structure theory of linear algebra, which trivializes essentially all linear algebra problems.

In this part, we begin by introducing the concept of a vector space and a linear map. We then provide a description of the structure theory of vector spaces and of linear maps between different vector spaces.

A. Preliminaries

Linear algebra is the study of vector spaces. A vector space is an algebraic object defined in relation to a field k. Fields are a generalization of familiar objects such as \mathbb{Q}, \mathbb{R}, and \mathbb{C}. Specifically, a field k is a set with two operations: addition and multiplication. It is closed, associative, and commutative with respect to addition; the additive identity exists, denoted 0, and additive inverses exist, denoted -a. The same properties hold for multiplication; the multiplicative identity is 1, and multiplicative inverses a^{-1} exist for all a \neq 0. Finally, multiplication distributes over addition. Other examples of fields beyond the ones listed above include the finite fields, such as the integers modulo a prime p.

A k-vector space V is a set of objects, called vectors, with two operations: addition and action by k, denoted k \cdot v. Addition of vectors is closed, associative, and commutative, and the additive identity 0 and inverses -v exist. With respect to the action, we know that a \cdot (b \cdot v) = (ab) \cdot v (a version of associativity), (a+b) \cdot v = a \cdot v + b \cdot v and a \cdot (v+w) = a \cdot v + a \cdot w (versions of distributivity), and 1 \cdot v = v. Given these axioms, it’s not hard to show that 0 \cdot v = 0, as expected.

The most common example of a k-vector space is k^n. This vector space consists of the n-tuples of elements of k, where addition and k-action are performed componentwise. For example, \mathbb{R}^2 and \mathbb{R}^3 are the familiar Euclidean plane and space. We’ll see later in this post that all finite-dimensional k-vector spaces are actually (non-canonically) isomorphic to k^n.

A generalization of the above is the direct sum. Specifically, suppose we have two k-vector spaces V and W. Then we define a vector space V \oplus W to be the set of elements (v, w) with v \in V and w \in W. Addition and k-action are componentwise (as expected). It’s easy to check that this is a k-vector space. Note that there is an obvious isomorphism k^n \oplus k^m \simeq k^{n+m}, so the direct sum works as we expect.

A linear map T is a map of k-vector spaces V \to W that respects addition and k-action, i.e. T(v+w) = T(v) + T(w) and T(a \cdot v) = a \cdot T(v).

B. Subspaces and Quotient Spaces

A subspace V' \subset V is a subset of V that is also a vector space under the same addition and action as V. For example, k^m naturally sits inside k^n as a subspace (via the first m coordinates) for any m \leq n.

As is typical in algebra, we want to define a quotient space V/V'. The elements of V/V' are simply sets of elements of V that differ only by an element of V'; for the remainder of these posts, the element containing v will be denoted as v + V'. We want to make this into a vector space by giving it an addition and action. The “canonical” way to do this is as follows: consider the projection map \pi: V \to V/V' such that \pi(v) = v + V'. We’d like \pi to be a linear map. Therefore, we have the following lemma:

Lemma 1. There exists a unique addition and action of V/V' such that \pi is a linear map.

Proof. The addition operation is (v + V') + (w + V') = (v+w) + V' and the action is a \cdot (v + V') = (a \cdot v) + V'. (This should not be surprising.) One should check that these operations are well-defined, i.e. independent of the representatives chosen. Proving uniqueness is left to the reader as a nontrivial but fun exercise; proving that this makes V/V' a vector space is left to the highly diligent and bored reader. \blacksquare

Conveniently, our projection map works just as we wanted! (This turns out not to be true for subgroups of a non-abelian group in general or for subrings of a ring.) So we have a convenient quotient space.

C. Linear Independence, Span, and Bases

For the following let v_i \in V. If a_i \in k, then sums of the form \sum a_i v_i are known as linear combinations.

Suppose a_i \in k. If \sum a_i v_i = 0 implies that a_i = 0 for all i, then the v_i are linearly independent.

If for all v \in V, there exist a_i \in k such that \sum a_i v_i = v, then the v_i span V.

If the v_i are both linearly independent and span V, then they form a basis of V.

Now it’s easy to check that (1,0) and (0,1) form a basis of k^2, and similarly (1,0, \ldots, 0), \ldots, (0, \ldots, 0, 1) form a basis of k^n. This is the standard basis.
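These definitions are also easy to test computationally; a minimal sympy sketch with arbitrary vectors, using the fact that a nontrivial nullspace of the matrix with the v_i as columns is exactly a vanishing linear combination:

```python
import sympy as sp

v = [sp.Matrix([1, 0, 1]), sp.Matrix([0, 1, 1]), sp.Matrix([1, 1, 2])]
M = sp.Matrix.hstack(*v)
# A nontrivial nullspace vector gives a vanishing linear combination: here
# the third vector is the sum of the first two, so the v_i are dependent
# and do not form a basis of k^3.
print(M.nullspace())  # [Matrix([[-1], [-1], [1]])]
```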

D. All (Finite-Dimensional) Vector Spaces are k^n

Now we’re going to completely characterize all vector spaces.

We start by getting a bit better notion of a basis.

Lemma 2. Let V be a vector space and v_1, \ldots, v_n \in V. The following are equivalent:

(1) v_1, \ldots, v_n are a basis.

(2) v_1, \ldots, v_n span, but no proper subset of them does.

(3) v_1, \ldots, v_n are linearly independent, but adding any other vector w makes them not linearly independent.

Proof. (1) implies (2): Suppose for contradiction that a proper subset, WLOG v_2, \ldots, v_n, spans. Then v_1 must be a linear combination of v_2, \ldots, v_n, contradicting linear independence.

(1) implies (3): Clearly w is a linear combination of v_1, \ldots, v_n, so adding it breaks linear independence.

(2) implies (1): Suppose that \sum a_i v_i = 0 with some nonzero a_i. WLOG a_1 \neq 0, but then v_1 is a linear combination of the others, so v_2, \ldots, v_n still span, contradiction.

(3) implies (1): Take any w \in V and suppose that \sum a_i v_i + a_0 w = 0 with not all coefficients zero. Since the v_i are linearly independent, a_0 \neq 0. So w is a linear combination of the v_i, and hence the v_i span. \blacksquare

This gives some convenient results.

Lemma 3. Any spanning set can be reduced to a basis.

Proof. Take a minimal subset that spans; it works by Lemma 2. \blacksquare

Now define a finite-dimensional vector space to be any space with a finite spanning set. From now on, we will only work with finite-dimensional vector spaces.

Lemma 4. Any finite-dimensional vector space has a basis.

Proof. Apply Lemma 3. \blacksquare

Lemma 5. Any finite-dimensional vector space is isomorphic to k^n for some n.

Proof. Take the basis of the vector space and consider the linear map that takes every element of that basis to the standard basis of k^n. \blacksquare

It’s still possible though that a single vector space could be isomorphic to multiple k^n for different n. What we want is a theorem stating that all bases of the vector space have the same size. We’ll derive it as a corollary of the following.

Theorem 6. Suppose v_1, \ldots, v_n is a spanning set of V of size n and w_1, \ldots, w_m is a linearly independent set of size m. Then m \leq n.

Proof. Induct on m. If m = 1 then this is obvious. So assume theorem holds for m-1.

First I claim that it suffices to consider v_1 = w_1. Note that w_1 is some linear combination of the v_i, and since w_1 \neq 0, some coefficient is nonzero; WLOG it is the coefficient of v_1. Then v_1 is a linear combination of w_1, v_2, \ldots, v_n (just solve the linear equation). So v_1, \ldots, v_n, w_1 obviously span, but v_1 is a linear combination of the other elements, so w_1, v_2, \ldots, v_n must also span. Thus we can let v_1 = w_1.

Now, set V' = \mathop{\mathrm{Span}}(w_1) and project as above to V/V'. Clearly v_2 + V', \ldots, v_n + V' span V/V', as w_1 + V' = 0. Now note that the projections of w_2, \ldots, w_m are still linearly independent: a nontrivial relation among them would place a nonzero linear combination of w_2, \ldots, w_m inside \mathop{\mathrm{Span}}(w_1), contradicting the independence of the w_i. Thus we apply our inductive hypothesis to get m-1 \leq n-1, i.e. m \leq n. \blacksquare

The following series of corollaries now give us all the theory of vector spaces and of linear maps between different vector spaces.

Lemma 7. Any two bases of a vector space have the same size, i.e. the dimension of that space or \dim V.

Proof. If the sizes of two bases are m and n, then Theorem 6 tells us that m \leq n \leq m. \blacksquare

Lemma 8. Any linearly independent set can be completed to a basis.

Proof. Start with the linearly independent set and keep adding vectors that are not in its span; this preserves linear independence. By Theorem 6 the set cannot grow past the dimension of the vector space, so the process stops, at which point you must have a basis. \blacksquare

Also note that any spanning set or linearly independent set with length equal to the dimension of the vector space must be a basis.

Now that we’ve destroyed the theory of vector spaces, we’ll note that the theory of linear maps between different vector spaces is also rather trivial.

E. Linear Maps between Different Vector Spaces

First, note that it’s obvious that any linear map between two vector spaces is fully determined by its values on a basis of the domain. A good exercise for the reader is to connect these maps to the traditional matrix. We will not use matrices for linear algebra, however.

Lemma 9. If T: k^n \to k^m is linear, then:

(1) if T is surjective, n \geq m.

(2) If T is injective, n \leq m.

(3) If T is an isomorphism, n = m.

Proof. For (1) and (2), just pick a basis and apply Theorem 6. For (3), combine (1) and (2). \blacksquare

Lemma 10. If V' \subset V, then \dim V = \dim V' + \dim V/V'.

Proof. Pick a basis of V' and a basis of V/V', and lift the latter to vectors in V. It’s easy to see that the two collections combine to form a basis of V. Thus \dim V = \dim V' + \dim V/V'. \blacksquare

An easy extension of the above gives a (not necessarily canonical) isomorphism V = V' \oplus V/V'.

Let the kernel \ker T of an operator T: V \to W be the subspace of vectors v \in V such that Tv = 0 (this is sometimes called the nullspace). Let the image \mathop{\mathrm{Im}} T of an operator T: V \to W be the subspace of vectors w \in W such that Tv = w for some v \in V. Let the cokernel \mathop{\mathrm{coker}} T be W/\mathop{\mathrm{Im}} T.

Lemma 11. (First Isomorphism Theorem for Vector Spaces) If T: V \to W is a linear map, then V/\ker T \simeq \mathop{\mathrm{Im}} T canonically.

Proof. The isomorphism is simply the map v + \ker T \mapsto Tv. It’s left to the reader to show that this is indeed an isomorphism. \blacksquare

The rank \mathop{\mathrm{rank}} T is \dim \mathop{\mathrm{Im}} T.

Lemma 12. (Rank-nullity) \mathop{\mathrm{rank}} T + \dim \ker T = \dim V

Proof. Apply Lemmas 10 and 11. \blacksquare
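A quick sympy check of rank-nullity on an arbitrary map k^3 \to k^2:

```python
import sympy as sp

T = sp.Matrix([[1, 2, 3],
               [2, 4, 6]])
# rank T + dim ker T = dim V: the rows are proportional, so T has rank 1
# and a 2-dimensional kernel, adding up to 3.
print(T.rank(), len(T.nullspace()), T.cols)  # 1 2 3
```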

Lemma 13. Let T: V \to V. Then T is injective iff it is surjective.

Proof. Note that injectivity is equivalent to \dim \ker T = 0 and surjectivity is equivalent to \dim \mathop{\mathrm{coker}} T = 0. However, \dim V - \dim \ker T = \dim V - \dim \mathop{\mathrm{coker}} T, since both sides equal \mathop{\mathrm{rank}} T by rank-nullity; thus \dim \ker T = \dim \mathop{\mathrm{coker}} T and we are done. \blacksquare

Lemma 14. (Structure theory of linear maps between different vector spaces) Let T: V \to W be a map of finite-dimensional vector spaces. Then V admits a basis v_1, \ldots, v_k, v_{k+1}, \ldots, v_n and W admits a basis w_1, \ldots, w_k, w_{k+1}, \ldots, w_m such that T(v_i) = w_i for all i \leq k and T(v_i) = 0 for all i > k.

Proof. Note that V \simeq \ker T \oplus \mathop{\mathrm{Im}} T and W \simeq \mathop{\mathrm{Im}} T \oplus \mathop{\mathrm{coker}} T by the above. Let w_1, \ldots, w_k be a basis of \mathop{\mathrm{Im}} T and pick v_1, \ldots, v_k \in V with T(v_i) = w_i. Let w_{k+1}, \ldots, w_m be lifts to W of a basis of \mathop{\mathrm{coker}} T and v_{k+1}, \ldots, v_n be a basis of \ker T. It’s left to the reader to check that this works. \blacksquare

Rewriting the above in matrix notation, we find that every map between two different vector spaces can be written, once appropriate bases are picked, as a matrix consisting entirely of 1s on some diagonal entries and 0s elsewhere. This is unfortunately rather boring.

However, note that this doesn’t work if V = W, simply because you can’t pick two different bases of V at the same time. Therefore, in our subsequent posts we’re going to focus entirely on endomorphisms, i.e. maps T: V \to V. This is the beautiful part of linear algebra.

Thanks to Prof. Gaitsgory, who taught me this material in Harvard’s Math 55a this semester.

Structure Theory of Modules over PIDs, Part IV: Jordan Canonical Form

In the last post, I proved the spectral theorem for modules over PIDs and applied it to show the Chinese Remainder Theorem and a more general form of the linear algebraic spectral theorem. In this post, I will do the same for the Jordan canonical form and provide definitions of the minimal and characteristic polynomial.

A. Jordan Canonical Form

Once again, we’ll proceed just as we did with vector spaces and consider a nilpotent operator, i.e. let M = M^{(I)} for some I. The proof here is not surprisingly very similar to that over vector spaces.

Theorem 24. (Jordan canonical form for modules over PIDs) Let R be a PID and I \subset R a maximal ideal. Let M be an R-module such that M = M^{(I)}. There exists a unique multiset \{n_1, n_2, \ldots, n_k\} whose elements sum to \mathop{\mathrm{lg}}(M) such that

M \simeq \underset{i} \oplus R/\left(r_I^{n_i}\right)

Proof. We’ll show existence first. Induct on \mathop{\mathrm{lg}}(M). The base case is obvious.

We want to “divide” our modules up in some way where we know one component is isomorphic to R/\left(r_I^n \right) for some n. To do this, let n be the minimal integer such that r_I^n is the zero operator on M. Pick m \in M such that r_I^{n-1} m \neq 0. Consider M' = \mathop{\mathrm{Span}}(m, r_I m, \ldots, r_I^{n-1} m). Note that M' \simeq R/\left(r_I^n \right), as the map 1 + \left(r_I^n \right) \mapsto m extends to an isomorphism.

Now define M'' such that 0 \to M' \to M \to M'' \to 0 is short exact and apply the induction hypothesis to get M'' \simeq \underset{i} \oplus R/\left(r_I^{n_i} \right). We’d like to extend these submodules to M by creating a right inverse j to \pi: M \to M'' that respects the action of r_I. Since we are mapping out of a direct sum, simply map out of each R/\left(r_I^{n_i} \right). Note that we have a basis 1, r_I, \ldots, r_I^{n_i - 1}. Let j(1) = m arbitrarily (such that \pi(m) = 1), and note that we need r_I^{n_i} m = 0. Now I claim that we can find a “correction factor” \bar{m} \in M' such that r_I^{n_i} m = r_I^{n_i} \bar{m}; then replacing m by m - \bar{m} works. Note that r_I^{n_i} m \in M', since it projects to zero in M''. Since n \geq n_i we know that r_I^{n - n_i} r_I^{n_i} m = 0, so r_I^{n_i} m is in \ker(r_I^{n - n_i}) = \mathop{\mathrm{Im}}(r_I^{n_i}) within M', so we can find such an \bar{m}. This proves existence.

Now we prove the uniqueness. The easiest way to do this is by looking at the “orders” at which the action of r_I vanishes on M. In order to do this, we can simply consider the Young diagram created by the n_i. Flipping the Young diagram gives a dual partition defined by n_i^{\vee}. Now I claim that n_i^{\vee} = \mathop{\mathrm{lg}}(\ker \phi_{r_I}^i) - \mathop{\mathrm{lg}}(\ker \phi_{r_I}^{i-1}), or alternatively that the volume of the first i rows is just \mathop{\mathrm{lg}}(\ker \phi_{r_I}^i). But this is easy to see just by drawing out the diagram, so we are done. \blacksquare

Once again, we can refine our original version of Jordan canonical form to work over non-algebraically closed fields, as follows.

Theorem 25. (Jordan canonical form for vector spaces, refined) Let T: V \to V be an endomorphism such that for some irreducible polynomial p, p(T) acts nilpotently on V. Then there exists a unique multiset \{n_1, n_2, \ldots, n_k \} whose elements sum to \mathop{\mathrm{lg}}(V) (which is \dim V / \deg p, not \dim V) such that

V \simeq \underset{i} \oplus k[t]/\left(p^{n_i}\right)

Proof. Apply Theorem 24 with R = k[t] and I = (p). \blacksquare

Note that this is equivalent to the traditional version of Jordan canonical form when p is linear; for instance, taking p = t, a single summand k[t]/\left(t^{n_i}\right) is exactly a nilpotent Jordan block, and when T is regular there is just one block and V looks like k[t]/\left(t^{\dim V} \right).
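
As a concrete illustration of the refined statement, consider rotation by 90 degrees over \mathbb{Q}. A minimal sketch in sympy (the matrix is my own choice of example):

```python
# Rotation by 90 degrees over QQ has characteristic polynomial t^2 + 1,
# irreducible over the rationals, so V = QQ^2 is QQ[t]/(t^2 + 1):
# a single summand with lg(V) = 1 even though dim V = 2.
from sympy import Matrix, symbols, factor

t = symbols('t')
T = Matrix([[0, -1],
            [1,  0]])
p = T.charpoly(t).as_expr()
print(factor(p))   # t**2 + 1: no linear factors over QQ
```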

B. The Characteristic and Minimal Polynomial

The characteristic polynomial \mathop{\mathrm{ch}}_T is traditionally defined as \det(\lambda \mathop{\mathrm{id}} - T), but this definition (as with most things involving determinants) isn’t very useful for generalizing to anything conceptual. A slightly more suggestive definition for a generalization is \mathop{\mathrm{ch}}_T = \underset{\lambda \in \mathop{\mathrm{Spec}}(T)} \prod (t - \lambda)^{\dim V^{(\lambda)}}. This is rather suggestive because the polynomials t - \lambda are exactly the irreducibles over an algebraically closed field k, and \dim V^{(\lambda)} corresponds to \mathop{\mathrm{lg}}(M^{(I)}).

Therefore, we can define the characteristic of a module \mathop{\mathrm{ch}}_M to be the ideal generated by \underset{I} \prod r_I^{\mathop{\mathrm{lg}}(M^{(I)})}. It’s easy to see that for R = k[t] this ideal is generated by \mathop{\mathrm{ch}}_T. This ideal also gives us a good definition of the characteristic polynomial over a non-algebraically closed field: specifically, the polynomial (up to a unit, i.e. a constant multiple) that generates \mathop{\mathrm{ch}}_M. A good exercise is to check that this corresponds exactly to the determinant definition of the characteristic polynomial given above.
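
Here is a small sympy check of that exercise on one sample matrix (my own example, not a proof): the determinant definition agrees with the product over the spectrum weighted by generalized eigenspace dimensions.

```python
# eigenvals() returns algebraic multiplicities, which are exactly the
# dimensions of the generalized eigenspaces V^(lambda).
from sympy import Matrix, symbols, expand, prod

t = symbols('t')
T = Matrix([
    [2, 1, 0],
    [0, 2, 0],
    [0, 0, 3],
])
det_def = T.charpoly(t).as_expr()                      # det(t*id - T)
spec_def = prod((t - lam)**m for lam, m in T.eigenvals().items())
print(expand(det_def - spec_def))                      # 0: they agree
```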

Next we consider what generates the characteristic for \mathbb{Z}-modules, i.e. finite abelian groups A. Here it’s not immediately obvious what we should be looking for. However, some basic group theory shows that when we take a flag of A^{(p)} for p prime, we’ll get a length equal to the number of times p divides |A|. Therefore our characteristic is just generated by the order of the group |A|.

Now we want to define an analogue of the minimal polynomial. Define the annihilator of the module \mathop{\mathrm{Ann}}(M) to be the ideal consisting of all elements r that kill every element m \in M, i.e. r \cdot m = 0. It’s easy to see that for a vector space, the minimal polynomial of T generates the annihilator. We can also explicitly characterize the generator of the annihilator.

Lemma 26. The annihilator is generated by \underset{I} \prod r_I^{n_I} where n_I is the largest exponent appearing in the Jordan canonical form of M^{(I)}.

Proof. By Theorem 20 we can just focus on any one M^{(I)} and multiply the results together. But by Theorem 24, r_I^n kills all of M^{(I)} exactly when n \geq n_I, so the annihilator of M^{(I)} is \left(r_I^{n_I}\right). \blacksquare

Unsurprisingly, the annihilator of a finite abelian group A is generated by the lcm of the orders of the elements of A, i.e. the exponent of the group.
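
For a concrete example (my own), take A = \mathbb{Z}/4 \oplus \mathbb{Z}/6 \simeq \mathbb{Z}/12 \oplus \mathbb{Z}/2; a tiny Python check:

```python
# For A = Z/4 x Z/6, the characteristic is generated by |A| = 24 while
# the annihilator is generated by the exponent lcm(4, 6) = 12.
from math import lcm, prod

invariants = [4, 6]
print(prod(invariants))    # 24: generates ch_A
print(lcm(*invariants))    # 12: generates Ann(A)
```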

Our last step is to define an analogue of diagonalizability and regularity. Call a module semi-simple if it is isomorphic to a direct sum of irreducible modules.

Lemma 27. A module is semi-simple iff the Jordan canonical form for every maximal ideal has all lengths equal to 1.

Proof. By Theorem 20 we can just focus on any one M^{(I)}. Theorem 24 immediately tells us that if the Jordan canonical form only has terms of the form R/\left(r_I\right), then the module is a direct sum of irreducibles and hence semi-simple. On the other hand, if the module is semi-simple, it is a direct sum of irreducibles R/\left(r_I\right), so the uniqueness part of Theorem 24 forces every n_I = 1, as desired. \blacksquare

Over algebraically closed fields semi-simplicity is equivalent to diagonalizability. When the field is not algebraically closed, diagonalizability requires the additional condition that the minimal polynomial split into linear factors. For finite abelian groups, semi-simplicity is equivalent to the order of the group being squarefree.
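
A quick sympy illustration of Lemma 27 in the vector space case (my own example): a size-2 Jordan block is not semi-simple, while a matrix whose lengths are all 1 is.

```python
# A single Jordan block of size 2 corresponds to k[t]/((t-1)^2), which
# has length 2, so it is not semi-simple (not diagonalizable).
from sympy import Matrix

J = Matrix([[1, 1], [0, 1]])   # one block of length 2
D = Matrix([[1, 0], [0, 2]])   # all lengths equal to 1
print(J.is_diagonalizable())   # False
print(D.is_diagonalizable())   # True
```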

A module is regular if it is isomorphic to R/(r) for some r \in R. An alternative condition for regularity is similar to the one for vector spaces.

Lemma 28. A module is regular iff \mathop{\mathrm{ch}}_M = \mathop{\mathrm{Ann}}(M).

Proof. Note that if M is regular, then by Theorem 23 we know that M^{(I)} = R/\left(r_I^{n_I} \right) for every maximal ideal I. Thus by Theorem 24, \mathop{\mathrm{ch}}_M and \mathop{\mathrm{Ann}}(M) agree on each M^{(I)} definitionally, so by Theorem 20 this is true for M.

If \mathop{\mathrm{ch}}_M = \mathop{\mathrm{Ann}}(M), then for each I the exponent of r_I in \mathop{\mathrm{ch}}_M (the sum of the n_i) equals its exponent in \mathop{\mathrm{Ann}}(M) (the largest n_i), so the Jordan canonical form of every M^{(I)} has only one summand. Thus M^{(I)} \simeq R/\left(r_I^{n_I} \right). Theorem 23 tells us that we then have M \simeq R/(r) for some r \in R, as desired. \blacksquare

Note that regularity of a k[t]-module recovers the usual notion of a regular endomorphism of a vector space. On the other hand, regularity of a finite abelian group is the same as the group being cyclic, i.e. isomorphic to \mathbb{Z}/N \mathbb{Z}.
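
Combining Lemmas 26 and 28 gives a cheap cyclicity test for finite abelian groups given by invariant factors. A sketch (the helper is hypothetical, for illustration):

```python
# With A presented by invariant factors, ch_A = (order) and
# Ann(A) = (exponent), so by Lemma 28, A is regular (cyclic) exactly
# when order == exponent.
from math import lcm, prod

def is_cyclic(invariants):
    return prod(invariants) == lcm(*invariants)

print(is_cyclic([4, 3]))   # True:  Z/4 x Z/3 = Z/12
print(is_cyclic([4, 6]))   # False: ch = (24) but Ann = (12)
```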

In the past two posts we’ve seen that linear algebra and finite abelian group theory are both special cases of the structure theory of modules over PIDs. This has given us some elegant (determinant-free) proofs of key results and allowed us to expand our understanding of both beyond what is typically presented, especially over non-algebraically closed fields. There are ways to generalize the structure theory of modules to rings that are not PIDs, but we will not go into them here.

Thanks to Prof. Gaitsgory, who taught me this material in Harvard’s Math 55a this semester.

Structure Theory of Modules over PIDs, Part III: Spectral Theory

In the last post, I introduced the ideas of irreducibility and length, which gave us the basic framework to discuss module structure theory. I also placed additional constraints on the rings by making them PIDs, which makes them easier to work with in what follows. Here I generalize the standard linear algebraic proof of the spectral theorem to prove the spectral theorem for modules over PIDs.

For this post and the next, all modules are of finite length.

A. Eventual Kernel and Eventual Image

Our first goal is proving an analogue of the following linear algebra theorem, which is used in the proof of the spectral theorem. (Note that the corresponding finite abelian group structure theorem is just the First Isomorphism Theorem.)

Theorem 13. For any map T: V \to V, the map

\ker(T^{\infty}) \oplus \mathop{\mathrm{Im}}(T^{\infty}) \to V

is an isomorphism. Note that T acts on the first component nilpotently and on the second component invertibly.

In order to do this, it will be very convenient to have an analogue of the fact that, for maps between vector spaces of equal dimension, injectivity is equivalent to surjectivity. We start with this analogue.

Lemma 14. If T: M_1 \to M_2 is a map of R-modules and M_1 and M_2 have equal length, then T is injective iff it is surjective.

Proof. Suppose T is injective. Consider any filtration of M_1 and apply T to every component. The resulting quotients must be irreducible since T is injective, so our filtration tells us that T(M_1) is a module of length \mathop{\mathrm{lg}}(M_1) inside M_2. The only such module is M_2, so T is surjective.

Suppose T is surjective. Consider any filtration of M_1 and apply T to every component to get a flag of M_2. If T were not injective, some quotient would be 0 so we would have a filtration of M_2 of length less than \mathop{\mathrm{lg}}(M_1), contradiction. So T is injective. \blacksquare

An almost immediate consequence of this fact is the following lemma.

Lemma 15. Let \phi be an endomorphism of M and M' \subset M a \phi-invariant submodule. If \phi is invertible, then the induced endomorphisms on M' and M/M' are also invertible.

Proof. By Lemma 14 we only need to show injectivity in both cases. Injectivity on M' is immediate. Injectivity on M/M' follows from considering any filtration of M that contains M'. Applying the projection map \pi: M \to M/M' to this filtration gives us a filtration of M/M'. To show injectivity of \phi|_{M/M'}, we just need to show injectivity of every quotient in the filtration. But this is immediate as each quotient is irreducible. \blacksquare

Now we are ready to prove an analogue of Theorem 13 for modules. The proof is very similar to the one over vector spaces.

Theorem 16. If \phi is an endomorphism of M, then the map

\ker(\phi^\infty) \oplus \mathop{\mathrm{Im}}(\phi^\infty) \to M

is an isomorphism. Note that \phi acts on the first component nilpotently and on the second component invertibly.

Proof. It’s easy to see that \ker(\phi^\infty) and \mathop{\mathrm{Im}}(\phi^\infty) are well-defined by considering the lengths of \ker(\phi^n) and \mathop{\mathrm{Im}}(\phi^n): these are monotonic and bounded by \mathop{\mathrm{lg}}(M), so both chains stabilize. It’s also easy to see that \phi acts on \ker(\phi^\infty) nilpotently by definition. Note that \phi acts on \mathop{\mathrm{Im}}(\phi^\infty) surjectively, so it is invertible by Lemma 14.

Now by Lemma 14, we only need to show either surjectivity or injectivity to get an isomorphism (in addition to doing a length computation). We’ll show injectivity; this is equivalent to showing that the eventual kernel and the eventual image intersect trivially. But any element of the intersection must be zero, since \phi would act on it both nilpotently and invertibly.

So we are left with just the length computation. We have the following short exact sequence trivially, which gives us the length computation from Lemma 9.

0 \to \ker(\phi^\infty) \to M \to \mathop{\mathrm{Im}}(\phi^\infty) \to 0

So we are done. \blacksquare
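
Here is a quick numerical check of the decomposition in the vector space case, using sympy (the example matrix is my own):

```python
# phi acts nilpotently on span(e1, e2) and invertibly on span(e3), and
# the eventual kernel and eventual image together span all of V.
from sympy import Matrix

phi = Matrix([
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 5],
])
n = phi.shape[0]
P = phi**n                            # both chains have stabilized by now
ker = P.nullspace()                   # eventual kernel: e1, e2
im = P.columnspace()                  # eventual image: e3
print(Matrix.hstack(*ker, *im).rank() == n)   # True: they fill up V
```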

B. Spectral Theory

Now we are going to present the generalization of Theorems 1 and 4: spectral theory of modules over rings. Note that while it is possible to prove a spectral theorem for more general commutative rings, we will restrict to PIDs in order to make our lives easier.

The outline of our proof here is generally similar to that of vector spaces. We’re going to first generalize Theorem 16 to deal with infinitely many endomorphisms instead of just one, and then strategically pick endomorphisms that allow the left hand side to collapse into the spectral theorem that we want.

Theorem 17. Consider a family of pairwise commuting endomorphisms \phi_i of M for i \in I (I may be infinite). The map

\underset{J \subset I} \oplus M_{(J)} \to M

is an isomorphism, where we define

M_{(J)} = \underset{i \in J} \cap \ker(\phi_i^\infty) \bigcap \underset{i \notin J} \cap \mathop{\mathrm{Im}}(\phi_i^\infty)

Proof. Let’s start with the finite case. Obviously we want to induct on the size of I. We’re going to need the following two lemmas to make this induction work.

Lemma 18. For any \phi-invariant submodule M', we have

(\ker(\phi^\infty) \cap M') \oplus (\mathop{\mathrm{Im}}(\phi^\infty) \cap M') \to M'

is an isomorphism.

Proof. Note that \ker(\phi^\infty) \cap M' = \ker(\phi|_{M'}^\infty) and \mathop{\mathrm{Im}}(\phi^\infty) \cap M' = \mathop{\mathrm{Im}}(\phi|_{M'}^\infty), so we are done by Theorem 16. \blacksquare

Lemma 19. If two endomorphisms \phi and \psi of M commute, then \ker(\phi^\infty) and \mathop{\mathrm{Im}}(\phi^\infty) are \psi-invariant.

Proof. This follows immediately from the fact that \psi \circ \phi^n = \phi^n \circ \psi. \blacksquare

Now the induction goes through: Theorem 16 gives the base case, and Lemmas 18 and 19 give the inductive step.

We reduce the infinite case to the finite case as follows. Consider all finite subsets I' \subset I, and pick an I' whose decomposition has the maximal number of nonzero summands (this maximum exists because the number of nonzero summands is bounded by \mathop{\mathrm{lg}}(M)). We will define a bijection between the M_{(J')} for J' \subset I' and the M_{(J)} for J \subset I. Specifically, any additional endomorphism \phi_k applied to M_{(J')} must produce either the eventual kernel or the eventual image, and not both, because our I' had the maximal number of nonzero summands. If it produces the eventual kernel, let k \in J; otherwise let k \notin J. Now our assertion is equivalent to the assertion over I', which we have just proven. \blacksquare

Now in order to create our spectral theorem, we have to decide what our endomorphisms are going to be. Ideally, we would want a set of endomorphisms that reduces the left hand side of Theorem 17 into just a direct sum of kernels. Our examples from linear algebra and finite abelian group theory can motivate us here: in those cases we used irreducible polynomials and primes, respectively. These are exactly the generators of the maximal ideals of k[t] and \mathbb{Z}, so we are going to take our endomorphisms to be the actions of the generators of the maximal ideals.

Theorem 20. (Spectral theory of modules over PIDs) Let R be a PID and M be an R-module of finite length. Then the map

\underset{I} \oplus M^{(I)} \to M

is an isomorphism, where the direct sum is taken over the maximal ideals I with generators r_I and M^{(I)} is the eventual kernel of the action of r_I.

Proof. Our first step is to apply Theorem 17, where we pick our endomorphisms to be \phi_{r_I}, the action of r_I for each maximal ideal I. This gives us the isomorphism

\underset{J} \oplus M_{(J)} \simeq M

where we have

M_{(J)} = \underset{r_I \in J} \cap \ker(\phi_{r_I}^\infty) \bigcap \underset{r_I \notin J} \cap \mathop{\mathrm{Im}}(\phi_{r_I}^\infty)

Note that an element is in the eventual kernel of r_I iff it is acted on nilpotently by every element in I. This motivates the use of the following lemma.

Lemma 21. If R is a commutative ring, I \subset R a maximal ideal, and M an R-module such that the action of every element of I on M is nilpotent, then the action of every element r \in R - I on M is invertible.

Proof. Once again, we use the action map. For any nonzero m \in M, consider the set of all r \in R such that r^n \cdot m = 0 for some n. This set is an ideal (it is the radical of the annihilator of m) and it contains I by hypothesis, so by maximality it equals either I or R. But the second case would imply 1 \cdot m = 0, i.e. m = 0, so we are in the first case. Thus any r \in R - I satisfies r \cdot m \neq 0 for every nonzero m, i.e. the action of r is injective, and by Lemma 14 it is invertible. \blacksquare

Lemma 21 allows us to clean up the above equation in exactly the way we wanted. For distinct maximal ideals I \neq I', the generator r_{I'} does not lie in I, so by Lemma 21 it acts invertibly on the eventual kernel of r_I. Thus the intersection of the eventual kernels of any two generators is zero, and the eventual kernel of any one generator is contained inside the eventual image of every other generator. Finally, the intersection of the eventual images of all the generators is also zero: any nonzero module of finite length contains an irreducible submodule R/I for some maximal ideal I, on which r_I would have to act both invertibly and as zero. So we get precisely the desired result. \blacksquare

This has some nice consequences. For example, it immediately reduces to Theorem 4 when R = \mathbb{Z}. When we move to finite-dimensional vector spaces, i.e. R = k[t], we get a more refined version of Theorem 1 that doesn’t have the V^{\mathop{\mathrm{non-spec}}} term.
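
To see Theorem 22 in action over \mathbb{Q}, here is a short sympy sketch (the matrix and the poly_at_T helper are my own): the eventual kernels of p(T), as p runs over the irreducible factors of the characteristic polynomial, fill up V.

```python
# The characteristic polynomial here is (t^2 + 1)(t - 2) over QQ.
from sympy import Matrix, Poly, factor_list, symbols, zeros

t = symbols('t')
T = Matrix([
    [0, -1, 0],
    [1,  0, 0],
    [0,  0, 2],
])
n = T.shape[0]

def poly_at_T(expr):
    # Evaluate a polynomial in t at T (constant term becomes a multiple of I).
    out = zeros(n, n)
    for i, c in enumerate(reversed(Poly(expr, t).all_coeffs())):
        out += c * T**i
    return out

dims = []
for p, _ in factor_list(T.charpoly(t).as_expr())[1]:
    dims.append(len((poly_at_T(p)**n).nullspace()))   # dim of V^(p)
print(sorted(dims), sum(dims) == n)   # [1, 2] True: the V^(p) fill up V
```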

Theorem 22. (Spectral theorem of finite-dimensional vector spaces, refined) For any endomorphism T: V \to V, the following map

\underset{p} \oplus V^{(p)} \to V

is an isomorphism, where p runs over all irreducible polynomials in k[t] and V^{(p)} is simply the eventual kernel of p(T).

Proof. This is an immediate consequence of Theorem 20. \blacksquare

Now we know exactly what the V^{\mathop{\mathrm{non-spec}}} term consists of: the V^{(p)} for all irreducible polynomials p that aren’t linear, i.e. don’t correspond to an eigenvalue. So from here we get the classical spectral theorem. Further, we also know why the algebraically closed condition on k makes the V^{\mathop{\mathrm{non-spec}}} term die: all irreducibles are linear in that case.

We also get an interesting generalization of Theorem 3 (the Chinese Remainder Theorem) in this case.

Theorem 23. (Chinese Remainder Theorem, refined) Let R be a PID and 0 \neq r \in R a non-unit. Let the factorization of r be

r = \underset{I} \prod r_I^{n_I} \cdot u

Then the map

\underset{I} \oplus R/\left(r_I^{n_I}\right) \to R/(r)

is an isomorphism. (Again, the direct sum is identical to the Cartesian product since only finitely many terms on the left hand side are nonzero.)

Proof. Apply Theorem 20 to M = R/(r). The idea is to show that R/(r)^{(I)} = R/\left(r_I^{n_I} \right). Some simple computations left to the reader show that this submodule is 0 if r_I does not divide r, and otherwise is generated by the image of r / r_I^{n_I}, which is isomorphic to R/\left(r_I^{n_I} \right). Thus our map is surjective by Theorem 20. For injectivity, note that \left(\underset{I} \prod r_I^{n_I} \right) = \underset{I} \cap \left(r_I^{n_I} \right), so no nonzero element can map to zero. \blacksquare

This reduces to Theorem 3 when we take R = \mathbb{Z}.
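
For the classical case R = \mathbb{Z}, sympy ships a crt helper; a quick check (the numbers are my own):

```python
# Z/60 = Z/4 x Z/3 x Z/5, and crt reconstructs a residue mod 60 from
# its components mod 4, 3, and 5.
from sympy.ntheory.modular import crt

x, m = crt([4, 3, 5], [3, 2, 4])
print(x, m)   # 59 60: indeed 59 = 3 mod 4, 2 mod 3, 4 mod 5
```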

In the next post, I’ll finish the structure theory of modules over PIDs by proving generalizations of the Jordan canonical form and of the minimal and characteristic polynomial, which will allow us to refine linear algebra and finite abelian group theory.

Thanks to Prof. Gaitsgory, who taught me this material in Harvard’s Math 55a this semester.