Crystal Field Theory I

School is back in session, so I figure now is as good a time as any to start writing again!  In this series of posts I will discuss a subject that is a perennial favorite: the physical basis of crystal field theory.  We will consider the origin of the infamous crystal field parameter 10Dq, and later dwell on the Slater/Racah parameters (F0/F2/F4 and A/B/C, respectively).


Crystal field theory (CFT) was first developed by Hans Bethe and John Van Vleck – two physicists who made significant contributions to the modern quantum-mechanical theory of magnetism.  Bethe and Van Vleck wanted to understand the paramagnetic behavior of simple transition metal salts (e.g. ferrous chloride). Their ideas grew from the theory of atomic electronic structure, which was relatively well developed at the time; and they elegantly showcase the power of group theory without requiring too much mathematical virtuosity on the part of the reader – a good combination, I think!

Before moving forward, we need to remind ourselves of a few things.  First of all, we need to understand free ion terms.  There are two ways we can think about them.  For first row transition metal ions (TMI), the most commonly-used description comes from the Russell-Saunders (LS) coupling scheme introduced in most undergraduate inorganic chemistry classes.  In the LS coupling scheme, the spin-orbit splitting of a given free ion term is taken as a perturbation on the relatively large crystal field splitting.  This is in contrast to the splitting of lanthanide free ion terms, whose components’ characteristics are largely determined by spin-orbit coupling.  The validity of the LS coupling scheme for first row TMIs follows from the fact that 1) the 3d orbitals are relatively small, leading to large Coulomb/exchange integrals and 2) the magnitude of spin-orbit coupling has a quartic dependence on nuclear charge, and is therefore small for first row metals.

Implicit in our choice of the LS coupling scheme is the notion that an ‘uncoupled’ basis of direct product microstates

\left| {d_m ,\chi } \right\rangle \in \left\{ {\left| {d_m } \right\rangle \otimes \left| \chi \right\rangle \left| {m = 0, \pm 1, \pm 2,\;\chi = \alpha ,\beta } \right.} \right\}

can be used to construct good approximate descriptions of the components of first row TMI free ion terms.  For example: a d1 electronic configuration yields a 2D free ion term which is split into 2D5/2 and 2D3/2 levels by S.O. coupling.  The magnitude of the S.O. coupling is taken to be relatively small, however, so we turn our attention to the parent 2D term and its 10 constituent microstates (we will consider S.O. coupling later as a perturbation to the crystal field splitting).
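
The counting here is easy to verify. Below is a minimal Python sketch (the numeric spin labels are just stand-ins for alpha and beta) that enumerates the d1 microstates and reads off the term symbol:

```python
from itertools import product

# A d^1 microstate is a pair (m_l, m_s): one electron in a d orbital
# (m_l = -2..2) with spin alpha (+1/2) or beta (-1/2).
microstates = list(product(range(-2, 3), (0.5, -0.5)))
print(len(microstates))  # 10 components, as expected for a 2D term

# The largest M_L and M_S present identify the term symbol:
# L = max(M_L) = 2 ('D'), and 2S + 1 = 2(1/2) + 1 = 2, i.e. 2D.
L = max(ml for ml, ms in microstates)
multiplicity = int(2 * max(ms for ml, ms in microstates) + 1)
print(L, multiplicity)  # 2 2
```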

CFT attempts to explain, in a perturbation-theoretic sense, how the components of a parent free ion term (2D in this example) are mixed by an external potential due to a symmetrical collection of point charges (i.e. ligands).  In other words, we will be studying a sort of intramolecular Stark effect. This point of view is admittedly simplistic;  but by throwing covalent contributions to bonding under the bus, we are free to focus our attention on the symmetry-determined aspects of electronic structure that I find most interesting.

The Crystal Field Hamiltonian as a Multipole Expansion

To understand how a 2D free ion term is split by a crystal field, we need to phrase the question in a way that is amenable to calculation.  In other words, the way we represent the crystal field perturbation Hamiltonian \hat H_{cf} should take advantage of whatever symmetry the problem has.  We will consider an octahedral crystal field for the sake of simplicity, but what follows may be generalized to other cases without substantial effort.

Moving forward, the general strategy will be to expand \hat H_{cf} in an appropriate (infinite) basis, and then use symmetry considerations to determine which terms in the expansion contribute to the physics at hand.  Once we have chosen a suitably truncated representation of \hat H_{cf}, we will study how it mixes the 10 components of a 2D free ion term into Eg and T2g levels split by 10Dq.

The angular components of the 2D basis states may be represented by spherical harmonics, i.e.

\left| {d_m } \right\rangle = \left| {R_{n,2} } \right\rangle \otimes \left| {Y_{2,m} } \right\rangle

Spherical harmonics have a number of nice algebraic properties which we can utilize if we expand \hat H_{cf} as a series of spherical harmonics sharing the same coordinate origin as the metal-centered basis states (i.e. as a multipole expansion):

\hat H_{cf} = \alpha \sum\limits_{l = 0}^\infty {\sum\limits_{m = - l}^{m = l} {c_{lm} \hat Y_{lm} } }

where scalar properties of the potential have been subsumed into the constant \alpha (which I will promptly drop and forget about until later posts).

As written, this expansion isn’t terribly useful.  The rules of vector coupling allow us to perform a massive truncation, however:  two states for which l=2 can only be coupled by components of \hat H_{cf} having l \le 2 + 2 = 4.  For l > 4, every matrix element vanishes:

\left\langle {d_a ,\chi _b } \right|\hat Y_{l,m} \left| {d_c ,\chi _d } \right\rangle = \left\langle {Y_{2a} } \right|\hat Y_{lm} \left| {Y_{2c} } \right\rangle \times \delta _{bd} = 0 .
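
This selection rule can be checked directly: the angular integrals above are Gaunt coefficients, which sympy evaluates in closed form. A quick sketch (the identification with gaunt(2, l, 2, -a, m, c) holds up to a phase):

```python
from sympy.physics.wigner import gaunt

# <Y_{2,a}| Y_{l,m} |Y_{2,c}> is (up to a phase) the Gaunt integral of
# three spherical harmonics.  The triangle rule kills l > 4, and parity
# kills odd l.
print(gaunt(2, 4, 2, 0, 0, 0))  # nonzero: l = 4 couples d states
print(gaunt(2, 3, 2, 0, 0, 0))  # 0 by parity (2 + 3 + 2 is odd)
print(gaunt(2, 6, 2, 0, 0, 0))  # 0 by the triangle rule (6 > 2 + 2)
```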

Next, we take advantage of the octahedral symmetry of \hat H_{cf}: the potential must remain invariant under Oh symmetries, i.e. it must transform as the totally symmetric (a1g) representation of Oh.  Right off the bat, then, we can toss out the {\hat Y_{1m} } and {\hat Y_{3m} } terms, as they are ungerade and therefore cannot contribute to the totally symmetric potential.  The contribution from the  {\hat Y_{00} } component affects all d orbitals equivalently (since {\hat Y_{00} } transforms as a1g), so it is uninteresting to us.  The {\hat Y_{2m} } components, which transform like the 5 d orbitals, are mixed into eg and t2g sets (in accord with the symmetry of the potential); no linear combination of the {\hat Y_{2m} } spans a1g, so these components are also uninteresting to us.

This leaves the {\hat Y_{4m} } set, which we consider a bit more carefully. First, we formalize our statement about invariance of \hat H_{cf} under Oh symmetries by insisting \hat \Lambda \hat H_{cf} = \hat H_{cf} \forall \hat \Lambda \in O_h. Now suppose \hat \Lambda = \hat C_4. Then

\hat H_{cf} = \hat C_4 \hat H_{cf} = \hat C_4 \sum\limits_{m = - 4}^{m = 4} {c_{4m} \hat Y_{4m} } = \sum\limits_{m = - 4}^{m = 4} {e^{im\pi /2} c_{4m} \hat Y_{4m} }

Comparing coefficients term by term, c_{4m} must vanish unless e^{im\pi /2} = 1, i.e. unless m = 0, \pm 4.  Only the {\hat Y_{4-4} }, {\hat Y_{40} }, and {\hat Y_{44} } components survive, so some linear combination of these components transforms as a1g. To understand the nature of this linear combination, we investigate the action of a few more symmetry operations on \hat H_{cf}. Suppose \hat \Lambda = \hat C'_2. Then

\hat C'_2 \hat H_{cf} = \hat C'_2 \left( {c_{44} \hat Y_{44} + c_{40} \hat Y_{40} + c_{4 - 4} \hat Y_{4 - 4} } \right) = c_{44} \hat Y_{4 - 4} + c_{40} \hat Y_{40} + c_{4 - 4} \hat Y_{44}

From this manipulation, we see c_{44} = c_{4 - 4}, i.e. (ignoring normalization) \hat H_{cf} = \hat Y_{40} + c\left( {\hat Y_{4 - 4} + \hat Y_{44} } \right). To find the value of the remaining coefficient c, we consider one final equation. Suppose \hat \Lambda = \hat C_3. In this case, it is helpful to write out the spherical harmonics in their cartesian form (modulo a multiplicative factor of \sqrt {9/4\pi }):

\hat H_{cf} = \sqrt {\frac{1}{{64}}} \left( {35\frac{{z^4 }}{{r^4 }} - 30\frac{{z^2 }}{{r^2 }} + 3} \right) + c\sqrt {\frac{{35}}{{128}}} \left[ {\left( {\frac{{x + iy}}{r}} \right)^4 + \left( {\frac{{x - iy}}{r}} \right)^4 } \right]

It follows that

\hat C_3 \hat H_{cf} = \sqrt {\frac{1}{{64}}} \left( {35\frac{{x^4 }}{{r^4 }} - 30\frac{{x^2 }}{{r^2 }} + 3} \right) + c\sqrt {\frac{{35}}{{128}}} \left[ {\left( {\frac{{y + iz}}{r}} \right)^4 + \left( {\frac{{y - iz}}{r}} \right)^4 } \right]

Since the two previous expressions must be equivalent, we can solve for c by collecting the terms in z^4 (remembering that r^2 = x^2 + y^2 + z^2, so the 3 and -30 terms hide additional powers of z), and ultimately arrive at

3/8 + 2c\sqrt {35/128} = 1 \Rightarrow c = \sqrt {5/14}

which leaves us with a suitable representation of \hat H_{cf} (modulo the constant describing its magnitude):

\hat H_{cf} = \hat Y_{40} + \sqrt {5/14} \left( {\hat Y_{4 - 4} + \hat Y_{44} } \right)
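
Since everything above is concrete, we can sanity-check the result numerically. The sketch below (plain Python; the cartesian forms are the ones written above, with the overall scale dropped) verifies the value of c and then confirms that the truncated \hat H_{cf} is invariant under representative octahedral rotations:

```python
import math
import random

c = math.sqrt(5 / 14)
# The z^4 matching condition from above: 3/8 + 2c*sqrt(35/128) = 1
assert abs(3 / 8 + 2 * c * math.sqrt(35 / 128) - 1) < 1e-12

def h_cf(x, y, z):
    # Cartesian form of Y_40 + c(Y_44 + Y_4-4), overall factor dropped.
    # Note (x+iy)^4 + (x-iy)^4 = 2(x^4 - 6x^2y^2 + y^4) is real.
    r2 = x * x + y * y + z * z
    y40 = math.sqrt(1 / 64) * (35 * z**4 / r2**2 - 30 * z**2 / r2 + 3)
    y4pm4 = math.sqrt(35 / 128) * 2 * (x**4 - 6 * x**2 * y**2 + y**4) / r2**2
    return y40 + c * y4pm4

random.seed(0)
for _ in range(100):
    x, y, z = (random.uniform(0.1, 1.0) for _ in range(3))
    f = h_cf(x, y, z)
    assert abs(f - h_cf(y, -x, z)) < 1e-9   # C4 about z
    assert abs(f - h_cf(y, z, x)) < 1e-9    # C3 about [111]
    assert abs(f - h_cf(y, x, -z)) < 1e-9   # C2' (x <-> y, z -> -z)
print("H_cf is invariant under the sampled O_h operations")
```

Expanding the polynomial by hand gives h_cf proportional to x^4 + y^4 + z^4 - 3(x^2y^2 + y^2z^2 + z^2x^2), the familiar totally symmetric quartic, which is why every test above passes.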

In the next post, we will apply this perturbation Hamiltonian to a 2D free ion term, characterize the resulting states, and hopefully gain a detailed understanding of how the d orbitals mix and split in the presence of an octahedral crystal field!



Filed under Group Theory in QM

Interlude: What, exactly, is a representation?

I have decided to take a detour from the ongoing series on operator algebra to briefly discuss the titular question, which I think has confounded most of us physical/inorganic chemists at some point. For some, perhaps it still does! There is no shame in this – in my opinion, chemists do a pretty lousy job teaching the representation theory of point groups.  Do not take this as an indictment of chemical education, however.  This is just a consequence of the fact that, as chemists, we are more interested in applications of group theory than the nuts and bolts.  Many are perfectly happy treating group theory as a black box, and this is fine if your work demands no more than an undergraduate level of sophistication.  For ‘higher-level’ manipulations involving applied group theory, however, I think it is important (on principle!) to appreciate the nuts and bolts.  Here, I will try to offer a lucid explanation of what representation theory is all about.  Note that my explanation will assume a working understanding of rings and homomorphisms.  These concepts are both very important and relatively friendly, so if they are foreign, I encourage you to take a few minutes to familiarize yourself before moving forward.  It will be worth the effort!

What is a representation?

What I want to do is pick apart the definition of ‘representation’.  I will toss in some examples along the way to make things concrete. An n-dimensional real/complex representation \Pi of a group G is a group homomorphism from G into the general linear group GL\left( {n,{\rm{R}}} \right)/GL\left( {n,{\rm{C}}} \right) (respectively). \Pi therefore identifies each element of G with an n \times n invertible matrix, and the resulting collection of matrices (the image of G under \Pi )  obeys the group multiplication law.  From the definition of group homomorphism: for a,b,c \in G, a*b = c{\rm{ }} \Rightarrow {\rm{ }}\prod \left( a \right)\prod \left( b \right) = \prod \left( c \right){\rm{ }}.

I want to emphasize right off the bat that we are concerned with \Pi , the homomorphism, and not \prod \left( G \right) = \left\{ {\prod \left( g \right) \left| {g \in G} \right.} \right\} the image of G under \Pi !  While \Pi and its graph \left\{ {\left( {g,\prod \left( g \right) } \right)\left| {g \in G} \right.} \right\} \subset G\times GL\left( {n,R} \right) ultimately provide the same information, I find it much easier to wrap my head around the former.  Note that this is very much akin to how we can study a function f without reference to its argument (e.g. Dirac formalism). Let me illustrate my point with an example.  The spherical harmonics \left\{ {\left| {Y^m _\ell } \right\rangle } \right\} sharing the same value of \ell span the (2\ell+1)-dimensional irreducible representation of the special orthogonal group SO\left( 3 \right).  This means that the \left\{ {\left| {Y^m _\ell } \right\rangle } \right\} provide a homomorphism from SO\left( 3 \right) into GL\left(2\ell+1 ,C \right): for {\hat R} \in SO\left( 3 \right) ,

\left[ {\prod _{2\ell + 1} \left( {\hat R} \right)} \right]_{ij} = \left\langle {Y^i _\ell } \right|\hat R\left| {Y^j _\ell } \right\rangle ,\quad \prod _{2\ell + 1} \left( {\hat R} \right) \in GL\left( {2\ell + 1,C} \right)

This example illustrates why {\prod _{2\ell + 1} } necessarily transforms like (i.e. ‘looks’ like) the \left\{ {\left| {Y^m _\ell } \right\rangle } \right\}, and how we can study the transformation properties of {\prod _{2\ell + 1} } with no knowledge of the image \prod _{2\ell + 1} \left( {SO\left( 3 \right)} \right).  I think it also provides some nice insight into why we so often visualize representations as symmetrical objects, or collections of symmetrical objects.  It follows that the statement

d_{xy} spans the A_2 irreducible representation of C_{2v}

really means that, for g \in C_{2v}, A_2 \left( g \right) = \left\langle {d_{xy} } \right|g\left| {d_{xy} } \right\rangle.  To make this more concrete, consider an example:

A_2 \left( \hat \sigma _v \right) = \left\langle {d_{xy} } \right|\hat \sigma _v \left| {d_{xy} } \right\rangle = \left\langle {d_{xy} } \right|\left( { - 1} \right)\left| {d_{xy} } \right\rangle = -\left\langle {d_{xy} } \right|\left. {d_{xy} } \right\rangle = - 1
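
The whole construction fits in a few lines of code. Below is a sketch that reads off the 1×1 ‘matrices’ of A_2 from any function transforming like xy, and checks the homomorphism property (the operation names and the test point are, of course, arbitrary choices of mine):

```python
# The four operations of C2v, acting on points of R^3 (z is the C2 axis)
ops = {
    "E":       lambda x, y, z: (x, y, z),
    "C2(z)":   lambda x, y, z: (-x, -y, z),
    "sv(xz)":  lambda x, y, z: (x, -y, z),
    "sv'(yz)": lambda x, y, z: (-x, y, z),
}

def A2(op):
    # d_xy transforms like the product xy; the 1x1 'matrix' of the rep
    # is the factor picked up by xy at an arbitrary test point.
    x, y, z = op(1.0, 2.0, 3.0)
    return (x * y) / (1.0 * 2.0)

chars = {name: A2(op) for name, op in ops.items()}
print(chars)  # characters +1, +1, -1, -1: the A2 row of the C2v table

# Homomorphism: C2(z) * sv(xz) = sv'(yz), so A2 must multiply the same way
assert A2(ops["C2(z)"]) * A2(ops["sv(xz)"]) == A2(ops["sv'(yz)"])
```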

Note that there was nothing particularly special about d_{xy} in all of this – we just need an object that transforms like d_{xy}, together with some inner product structure, to construct the representation.

Where do representations ‘live’?

Vectors live in vector spaces, and operators corresponding to physical observables inhabit a Lie algebra. So where do representations live?  Anybody who has manipulated representations using a character table has probably realized that representations behave suspiciously like vectors.  As you might have guessed, this is no coincidence! Let \prod be an m-dimensional complex representation of the group G = \left\{ {g_1 ,g_2 ,...,g_n } \right\}, and consider the graph \left\{ {\left( {g,\prod \left( g \right)} \right)\left| {g \in G} \right.} \right\}.  We can identify \left\{ {\left( {g,\prod \left( g \right)} \right)\left| {g \in G} \right.} \right\} (and therefore \prod) in a natural way with the ordered n-tuple

\left( {\prod \left( {g_1 } \right),\prod \left( {g_2 } \right),...,\prod \left( {g_n } \right)} \right) \in GL\left( {m,C} \right)^n

where
GL\left( {m,C} \right)^n = \underbrace {GL\left( {m,C} \right) \times GL\left( {m,C} \right) \times ... \times GL\left( {m,C} \right)}_{n{\rm{ times}}}

This vector-like quantity, which is equivalent to \prod, is an element of the free module M_{m \times m} \left( C \right)^n (edit: thanks u/AngelTC for pointing out that this is a free module).  Free modules are very much like vector spaces.  The crucial difference is that their elements are expressed in terms of coordinates taken from a ring (e.g. the integers, n \times n matrices, etc.), not (necessarily) a field.  For example, let Z denote the integers.  Then Z \times Z \times Z is a free module, while R \times R \times R is a vector space.  It follows that all vector spaces are free modules (since all fields are rings), but not all free modules are vector spaces.

The operations we use to combine representations (i.e. the direct sum “\oplus” and the Kronecker product “\otimes”) do not rely on modular structure beyond what is provided by the base ring, however.  For two complex representations of G of dimension n and m (\prod _1 and \prod _2, respectively), we have \prod _1 \otimes \prod _2 :G \to GL(mn,C) and \prod _1 \oplus \prod _2 :G \to GL(m + n,C).  In other words, these operations take us into a different module than the one we started in.

The integers form a ring under addition and multiplication, and the representations of a group G under \oplus and \otimes have similar structure (thanks to u/eruonna for pointing out that these only form a semi-ring, however, as there is no additive inverse defined in this case). The details can really give you a headache here, but we have a tool at our disposal that makes everything nice and tidy: the character function \chi, given as

\chi \left( {\prod \left( g \right)} \right) = {\rm{Tr}}\,\prod \left( g \right) \in C

where Tr denotes the matrix trace.  \chi has some nice properties, including:

\begin{array}{l}  \chi \left( {\prod _1 \otimes \prod _2 \left( g \right)} \right) = \chi \left( {\prod _1 \left( g \right)} \right)\chi \left( {\prod _2 \left( g \right)} \right) \\  \chi \left( {\prod _1 \oplus \prod _2 \left( g \right)} \right) = \chi \left( {\prod _1 \left( g \right)} \right) + \chi \left( {\prod _2 \left( g \right)} \right) \\  \end{array}
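
These two identities are easy to confirm numerically for arbitrary matrices (a sketch; any square matrices will do, since both identities are properties of the trace rather than of representations per se):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(7)
P1 = rng.normal(size=(2, 2))  # stands in for Pi_1(g)
P2 = rng.normal(size=(3, 3))  # stands in for Pi_2(g)

# chi(Pi_1 (x) Pi_2) = chi(Pi_1) * chi(Pi_2): trace of a Kronecker product
assert np.isclose(np.trace(np.kron(P1, P2)), np.trace(P1) * np.trace(P2))
# chi(Pi_1 (+) Pi_2) = chi(Pi_1) + chi(Pi_2): trace of a block-diagonal sum
assert np.isclose(np.trace(block_diag(P1, P2)), np.trace(P1) + np.trace(P2))
print("character identities verified")
```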

It follows that \chi is a map from the semi-ring of representations of the group G (under the operations “\oplus” and “\otimes“) into the ring of functions f:G \to C (thanks again to u/eruonna for your correction).  If \prod is an m-dimensional complex representation of the group G = \left\{ {g_1 ,g_2 ,...,g_n } \right\}, then

\chi \left( {\prod \left( {g_1 } \right),\prod \left( {g_2 } \right),...,\prod \left( {g_n } \right)} \right) = \left( {{\rm{Tr}}\prod \left( {g_1 } \right),{\rm{Tr}}\prod \left( {g_2 } \right),...,{\rm{Tr}}\prod \left( {g_n } \right)} \right)

What does this buy us?  Well, for one, this is where character tables come from!  And the inner product on C^n (which formally should be taken as a module in this case) gives us a simple way to define geometric relationships between representations of different dimension.  This structure provides the basis for the ‘great orthogonality theorem’, aka Schur orthogonality relations – the machinery that we use to painlessly decompose reducible representations into direct sums of irreducible representations.  And to think, all of this wonderful structure was hiding below the surface, just waiting to be revealed by \chi!  Of course there is even more to be seen, but I will save it for another day.  In my next post, we will return to the scheduled programming.  Thanks for reading!



Filed under Representation Theory

Operator Algebra III – Time Evolution and the Hamiltonian

At the end of my last post I promised we would spend some time thinking about the transformations that are brought about by Hermitian operators.  This may sound horribly abstract, so let me remind you of a very concrete claim that I have made a few times in the past: that the Hamiltonian generates time translations.  The goal of this post is to pick apart this statement and understand it on a fundamental level. I believe it is well within our reach now that we have the commutator in our pocket.  Note that the logic presented here generalizes, and I will speak to this generality later, but for now the specific case of the Hamiltonian provides a useful illustration of how to go about studying transformations induced by quantum mechanical operators.

As I said just a moment ago, we know an important fact about the Hamiltonian – i.e. that it generates time translations – and I have, up until now, parroted this fact without providing any intuition for why it is true.  Off the top of my head I can think of two ways to demonstrate the veracity of my claim.  The first is totally superficial, but worth seeing if you haven’t already.  Starting from the time-dependent Schrodinger equation, this is an almost trivial endeavor:

i\hbar \frac{\partial }{{\partial t}}\left| {\psi \left( t \right)} \right\rangle = \hat H\left| {\psi \left( t \right)} \right\rangle \Leftrightarrow \frac{\partial }{{\partial t}}\left| {\psi \left( t \right)} \right\rangle = \frac{{\hat H}}{{i\hbar }}\left| {\psi \left( t \right)} \right\rangle \Leftrightarrow \left| {\psi \left( t \right)} \right\rangle = e^{ - i\hat Ht/\hbar } \left| {\psi \left( 0 \right)} \right\rangle

(Edit: note that we have taken the Hamiltonian to be time-independent, and will continue to do so for the rest of this post.  Thank you Alec for making me realize I should have pointed this out explicitly!)

While formally correct, this statement is totally opaque.  It doesn’t admit any physical insight because (in my mind) the time-dependent Schrodinger equation is subordinate to the fact that the Hamiltonian generates time translations.  In short, we are letting the tail wag the dog, and we should instead try to understand how the Hamiltonian generates time translations, and then use this intuition to “derive” the TD Schrodinger equation.

I think a brief jaunt through the world of classical mechanics will help us achieve this goal. Suppose we have an uncharged, non-relativistic particle moving about in regular 3-d space through a potential that varies with position.  The particle’s trajectory in position space is dictated by its kinetic energy (a function of momentum), while the particle’s trajectory in momentum space is dictated by the potential (a function of position).  In other words, as we march forward in time by an infinitesimal step \varepsilon , the particle’s position in phase space (which is, speaking loosely, the direct sum of position and momentum space) evolves according to

\begin{array}{l}  \vec x\left( {t + \varepsilon } \right) = \vec x\left( t \right) + \varepsilon \frac{{d\vec x\left( t \right)}}{{dt}} = \vec x\left( t \right) + \frac{\varepsilon }{m}\vec p\left( t \right) = \vec x\left( t \right) + \varepsilon \nabla _p T\left( {\vec p\left( t \right)} \right) \\  \vec p\left( {t + \varepsilon } \right) = \vec p\left( t \right) - \varepsilon \nabla _x V\left( {\vec x\left( t \right)} \right) \\  \end{array}

where T\left( {\vec p\left( t \right)} \right) = \vec p^2 \left( t \right)/2m and V\left( {\vec x\left( t \right)} \right) represent the kinetic and potential energy (respectively) of the particle; and {\nabla _p } and {\nabla _x } operate on the momentum and position components of the vector describing the particle’s location in phase space. These observations can be written suggestively as
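
These update equations are exactly what a simple phase-space integrator implements. Here is a toy sketch for a 1-d harmonic oscillator (V(x) = kx^2/2); the one liberty I take is evaluating the force at the updated position (the ‘semi-implicit’ variant), which keeps the energy error bounded over long runs:

```python
m, k, eps = 1.0, 1.0, 1e-3   # mass, spring constant, time step
x, p = 1.0, 0.0              # start at rest, displaced from equilibrium
E0 = p * p / (2 * m) + 0.5 * k * x * x

for _ in range(10_000):      # march forward to t = 10
    x = x + eps * p / m      # x(t + eps) = x(t) + eps * grad_p T
    p = p - eps * k * x      # p(t + eps) = p(t) - eps * grad_x V

E = p * p / (2 * m) + 0.5 * k * x * x
print(abs(E - E0))           # small: the trajectory hugs the energy shell
```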

\begin{array}{l} \frac{d}{{dt}}\left( {\vec x\left( t \right);\vec p\left( t \right)} \right) = \left( {\nabla _p - \nabla _x } \right)\left( {T\left( {\vec x\left( t \right);\vec p\left( t \right)} \right) + V\left( {\vec x\left( t \right);\vec p\left( t \right)} \right)} \right) = ... \\ ... = \left( {\nabla _p - \nabla _x } \right)H\left( {\vec x\left( t \right);\vec p\left( t \right)} \right) \\ \end{array}

Here, we see that the time evolution of the system is given by the symplectic gradient \left( {\nabla _p - \nabla _x } \right) of the Hamiltonian.  This is the mechanism through which the Hamiltonian generates time translation in classical mechanics. We don’t have to work too hard to extend this result to the quantum Hamiltonian. At this point, the commutator becomes quite useful. Recall that the commutator can be thought of as a directional derivative with respect to conjugation

\left[ {A,B} \right] = \mathop {\lim }\limits_{t \to 0} \frac{{e^{itA} Be^{ - itA} - B}}{{it}}

for Hermitian operators A and B. With this in mind, let’s consider the transformation that the Hamiltonian induces on {\hat x} and {\hat p} .

\left[ {\hat H,\hat x} \right] = \left[ {\frac{{\hat p^2 }}{{2m}},\hat x} \right] = \frac{1}{{2m}}\left( {\hat p\left[ {\hat p,\hat x} \right] + \left[ {\hat p,\hat x} \right]\hat p} \right) = \frac{\hbar }{{im}}\hat p = \frac{\hbar }{i}\frac{{\partial \hat T\left( {\hat p} \right)}}{{\partial \hat p}}

\begin{array}{l}  \left[ {\hat H,\hat p} \right] = \left[ {\hat V\left( {\hat x} \right),\hat p} \right] = \left[ {\sum\limits_n {a_n \hat x^n } ,\hat p} \right] = \sum\limits_n {a_n \left[ {\hat x^n ,\hat p} \right]} = ... \\  ... = \sum\limits_n {na_n \hat x^{n - 1} \left[ {\hat x,\hat p} \right]} = - \frac{\hbar }{i}\frac{{\partial \hat V\left( {\hat x} \right)}}{{\partial \hat x}} \\  \end{array}

These results look familiar! It is not a coincidence that we can make the identifications

\begin{array}{l}  \frac{{d\vec x}}{{dt}} = \nabla _p T\left( {\vec p} \right) \Leftrightarrow i/\hbar \left[ {\hat H,\hat x} \right] = \frac{{\partial \hat T\left( {\hat p} \right)}}{{\partial \hat p}} \\  \frac{{d\vec p}}{{dt}} = - \nabla _x V\left( {\vec x} \right) \Leftrightarrow i/\hbar \left[ {\hat H,\hat p} \right] = - \frac{{\partial \hat V\left( {\hat x} \right)}}{{\partial \hat x}} \\  \end{array}
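
We can watch these operator identities hold in a finite matrix representation. The sketch below builds x and p from harmonic-oscillator ladder operators (with hbar = m = omega = 1, my choice for convenience) and compares [H, x] and [H, p] with the derivatives above; truncation corrupts the last basis state, so only an interior block is compared:

```python
import numpy as np

# Truncated harmonic-oscillator basis, N levels, hbar = m = omega = 1
N = 30
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # annihilation operator
x = (a + a.T) / np.sqrt(2)
p = 1j * (a.T - a) / np.sqrt(2)
H = p @ p / 2 + x @ x / 2                    # T(p) + V(x), V = x^2 / 2

comm_Hx = H @ x - x @ H
comm_Hp = H @ p - p @ H

k = N - 2  # interior block unaffected by the truncation
# [H, x] = (hbar / im) p  (here: -1j * p)
assert np.allclose(comm_Hx[:k, :k], (-1j * p)[:k, :k])
# [H, p] = -(hbar / i) dV/dx  (here: +1j * x for V = x^2 / 2)
assert np.allclose(comm_Hp[:k, :k], (1j * x)[:k, :k])
print("Heisenberg commutators verified on the interior block")
```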

Next, we take advantage of the fact that expectation values necessarily behave classically.  It therefore makes sense to write \frac{{d\left\langle {\hat x} \right\rangle }}{{dt}} = \frac{i}{\hbar}\left\langle {\left[ {\hat H,\hat x} \right]} \right\rangle and \frac{{d\left\langle {\hat p} \right\rangle }}{{dt}} = \frac{i}{\hbar}\left\langle {\left[ {\hat H,\hat p} \right]} \right\rangle .  I will leave out the details, but these two results imply that \frac{{d\left\langle {\hat \Omega } \right\rangle }}{{dt}} = \frac{i}{\hbar}\left\langle {\left[ {\hat H,\hat \Omega } \right]} \right\rangle for an arbitrary operator {\hat \Omega } , since {\hat \Omega } can be written in terms containing powers of {\hat x} and {\hat p} .

This result – Ehrenfest’s theorem – is one of the most important in quantum mechanics, and we will use it to our advantage. Let’s examine the implications of this theorem, working from the definition of the commutator (vide supra).

\begin{array}{l}  \frac{i}{\hbar }\left\langle {\left[ {\hat H,\hat \Omega } \right]} \right\rangle = i\left\langle {\left[ {\hat H/\hbar ,\hat \Omega } \right]} \right\rangle = \left\langle {\mathop {\lim }\limits_{t \to 0} \left( {e^{it\hat H/\hbar } \hat \Omega e^{ - it\hat H/\hbar } - \hat \Omega } \right)/t} \right\rangle = ... \\  ... = \mathop {\lim }\limits_{t \to 0} \left( {\left\langle {\psi \left( 0 \right)} \right|e^{it\hat H/\hbar } \hat \Omega e^{ - it\hat H/\hbar } \left| {\psi \left( 0 \right)} \right\rangle - \left\langle {\psi \left( 0 \right)} \right|\hat \Omega \left| {\psi \left( 0 \right)} \right\rangle } \right)/t \\  \end{array}

For this result to be physically meaningful, it must be true that

\left\langle {\psi \left( 0 \right)} \right|e^{it\hat H/\hbar } \hat \Omega e^{ - it\hat H/\hbar } \left| {\psi \left( 0 \right)} \right\rangle = \left\langle {\psi \left( t \right)} \right|\hat \Omega \left| {\psi \left( t \right)} \right\rangle

It follows that the Hamiltonian generates the dynamics of our system, and we get the time-dependent Schrodinger equation for free:

e^{ - it\hat H/\hbar } \left| {\psi \left( 0 \right)} \right\rangle = \left| {\psi \left( t \right)} \right\rangle \Leftrightarrow i\hbar \frac{\partial }{{\partial t}}\left| {\psi \left( t \right)} \right\rangle = \hat H\left| {\psi \left( t \right)} \right\rangle
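
To close the loop, the sketch below exponentiates a small (time-independent) Hamiltonian and checks both claims numerically: the propagated state satisfies the TD Schrodinger equation, and the evolution preserves the norm. The 2×2 Hamiltonian is an arbitrary stand-in of mine:

```python
import numpy as np
from scipy.linalg import expm

# A two-level Hamiltonian, hbar = 1 (any Hermitian matrix will do)
H = np.array([[1.0, 0.5], [0.5, -1.0]])
psi0 = np.array([1.0, 0.0], dtype=complex)

def psi(t):
    # |psi(t)> = exp(-iHt/hbar) |psi(0)>
    return expm(-1j * H * t) @ psi0

# Check i * d/dt |psi> = H |psi> by a centered finite difference
t, dt = 0.7, 1e-6
lhs = 1j * (psi(t + dt) - psi(t - dt)) / (2 * dt)
rhs = H @ psi(t)
assert np.allclose(lhs, rhs, atol=1e-6)
# Unitary evolution preserves the norm
assert np.isclose(np.linalg.norm(psi(t)), 1.0)
print("exp(-iHt) solves the time-dependent Schrodinger equation")
```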

Hopefully this post has provided some insight into why the Hamiltonian generates time translations (in both classical and quantum mechanics).  In the next post, I will generalize the logic presented here to arbitrary operators, and wrap up this subject with a discussion of Noether’s theorem, symmetry, and conservation laws.  Thanks for reading!



Filed under Group Theory in QM

Operator Algebra II – The Commutator

Anybody who has taken an undergraduate course on quantum chemistry or quantum mechanics is familiar with the commutator on some level.  In these courses, I was told how to use it to carry out some basic (but very important!) calculations, and was given a teaser of its physical significance in terms of incompatible observables.  In short, I learned some very important facts about the commutator, but my understanding remained superficial until I started to read about the theory of Lie algebras and Lie groups.

In this post, I want to provide some geometric insight into the commutator.  I want to build this intuition because I like to think geometrically – particularly when it comes to linear algebra and group theory – and because there is often a fundamental connection between physics and geometry.  I will start by saying that Lie groups can have very interesting geometric structure, and that the rules of algebra provide insight into this structure.  In the case of Lie groups and Lie algebras, this insight comes from the commutator.

The geometry of Lie groups and the commutator are tied together by a number of intermediate concepts, so before I go further down the rabbit hole, let’s step back and think about where geometric structure comes from.  We already know that some vector spaces come endowed with a way to talk geometry that is, more or less, in line with our intuition for what is ‘normal’ as non-relativistic creatures living in Euclidean 3-space.  In fact, any Hilbert space has enough structure for us to think about geometry in this way (modulo some extra dimensions).  This all boils down to having a way to define lengths and angles between vectors – structure endowed by an inner product.

Now, suppose we take the vector space \left\{ {\left( {x,y,0} \right)} \right\} \subset R^3 and continuously deform it into a smooth 2-dimensional surface (manifold) embedded in R^3 .  Perhaps this deformation does nothing more than introduce a few hills and valleys.  In any case, it destroys quite a bit of the structure that made life simple in R^2 ; and the resulting manifold is not a vector space.  One casualty of this deformation is the inner product on R^2 .  So how do we talk about the geometry in its absence?  In this case, we can still talk about local geometry without getting into trouble by considering the plane tangent to wherever I happen to be standing on the manifold.  This tangent plane is a perfectly legitimate vector space (isomorphic to R^2 ), and we can use the local structure of this tangent space to make sense of the local structure of the manifold.

We can get a feel for the local ‘shape’ of the manifold by noting how it deviates from the tangent plane in each direction.  In other words, we get insight from directional derivatives, and the fact that we can take directional derivatives is due to the local coordinates provided by the tangent space.  From directional derivatives, we recover useful information about things like curvature – information that gives us geometric insight without any reference to the space in which the manifold is embedded.

This is not too hard to think about for a nice, well-behaved 2-d manifold embedded in 3-d space, where we can think about directional derivatives in the ‘normal’ calc II/III sense.  But now I assert that we should be thinking about our favorite Lie group G as a manifold, and its Lie algebra as the tangent space of G at G‘s identity element.  To help us digest this last statement, let’s pause to consider a simple example: the complex unit circle U(1) = \left\{ {e^{i\theta } \left| {\theta \in R} \right.} \right\} .  This is a one-dimensional Lie group under regular multiplication.  The identity of the group is ‘1’, and the tangent space at the identity is therefore \left\{ {iy\left| {y \in R} \right.} \right\} \subset C .  There are two points of interest here:  first, the fact that the Lie group and Lie algebra are related through the exponential map really hits you in the face!  Second: the algebra has linear structure, while the group is…curvy.  This need not be true in general, but it illustrates that the Lie group can have a ‘shape’ to it that is a bit more interesting than that of a linear vector space.

Of course, the geometry of the complex unit circle is not terribly complicated (or interesting).  To return to the concept of directional derivatives, we require a Lie group with a few more dimensions and a bit more algebraic structure, i.e. one for which the commutator does not vanish trivially.  I think SO(3) , the group of 3×3 proper rotation matrices, is a good place to start because it is simple (in both the literal and group theoretical sense) and serves as the foundation for the theory of angular momentum in 3-d space.  We can think of SO(3) as a manifold embedded in M_{3 \times 3} \left( R \right) , and its Lie algebra, so(3) , is the space of 3×3 antisymmetric matrices endowed with the commutator.

Now we are finally ready to talk about directional derivatives. Choose two elements A,B \in so(3) . Exponentiate the first and introduce a parameter t \in R so that A \to e^{tA} \in SO(3) . Suppose we allow e^{tA} to act on B by conjugation: B_A (t) = e^{tA} Be^{ - tA} \in so(3) .  Recall from linear algebra that conjugation is equivalent to a change-of-basis transformation – in this case, a rotation.  As we vary the parameter 0 \le t < \infty  , the point B_A (t) moves along a trajectory in so(3) . The precise nature of this trajectory will, of course, depend on our choice of A , but we can at least say that it is a closed loop, since conjugation by a rotation matrix is periodic in t .  Since so(3) is a vector space, we can take derivatives along B_A (t) with respect to the parameter t without any major issues.  Now, in principle, we can take derivatives anywhere on our trajectory, but it turns out that the derivative evaluated at t=0 is of particular interest:

\left. {\frac{{dB_A (t)}}{{dt}}} \right|_{t = 0}  = \mathop {\lim }\limits_{t \to 0} \frac{{e^{tA} Be^{ - tA}  - B}}{t} = \left. {\frac{d}{{dt}}e^{tA} Be^{ - tA} } \right|_{t = 0}  = AB - BA = [A,B]
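Here is a small numerical check of this limit (my own sketch; I use the ordering B_A(t) = e^{tA} B e^{-tA} , whose derivative at t = 0 is [A,B] , while the opposite ordering just flips the sign):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
M1, M2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
A, B = M1 - M1.T, M2 - M2.T  # two elements of so(3)

def conjugate(t):
    # B_A(t): B carried along the rotation generated by A.
    return expm(t * A) @ B @ expm(-t * A)

# Central finite difference at t = 0 versus the commutator [A, B].
h = 1e-5
numeric = (conjugate(h) - conjugate(-h)) / (2 * h)
assert np.allclose(numeric, A @ B - B @ A, atol=1e-6)
```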

And there we have it – a formal definition of the commutator that really drives home the fact that it is a special sort of directional derivative – it evaluates how B transforms in the ‘direction’ of A under conjugation.  This is a pretty neat idea!  Let’s solidify it with an example, again from the algebra of rotations (this time from su(2) , which is very closely related to so(3) ).  The angular momentum operators \left\{ {iJ_x , iJ_y , iJ_z } \right\} generate infinitesimal rotations.  If this does not sound familiar, I will point you to my last post, which contains a simple example of how elements of a Lie algebra can generate infinitesimal transformations.  Let’s consider the canonical commutation relations for these operators: \left[ {iJ_i ,iJ_j } \right] = -\left[ {J_i ,J_j } \right] = -i\varepsilon _{ijk} J_k , where \varepsilon _{ijk} is the Levi-Civita symbol. We know from above that [iJ_x, iJ_y] tells us how iJ_y changes in the ‘direction’ of iJ_x .  Think of iJ_y as a unit vector oriented along the y-axis.  Conjugation by e^{itJ_x} (which represents a rotation by –t radians about the x-axis) rotates this vector clockwise in the yz plane, as viewed from the positive x-axis, as the parameter t increases.  If we evaluate the derivative of this circular trajectory with respect to t at t=0, the result is a vector pointing in the negative z direction.  In other words, we arrive at a familiar result: [iJ_x,iJ_y] = -[J_x,J_y] = -iJ_z .
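These relations are easy to verify with an explicit representation. Below I use the standard spin-1 matrices with hbar = 1; this particular choice of representation is mine, not part of the original discussion.

```python
import numpy as np

# Spin-1 angular momentum matrices (hbar = 1).
s = 1.0 / np.sqrt(2.0)
Jx = s * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)
Jy = s * np.array([[0, -1j, 0], [1j, 0, -1j], [0, 1j, 0]])
Jz = np.diag([1.0, 0.0, -1.0]).astype(complex)

def comm(X, Y):
    return X @ Y - Y @ X

# Canonical relation [Jx, Jy] = i Jz ...
assert np.allclose(comm(Jx, Jy), 1j * Jz)
# ... so for the anti-Hermitian generators: [iJx, iJy] = -[Jx, Jy] = -i Jz.
assert np.allclose(comm(1j * Jx, 1j * Jy), -1j * Jz)
```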

I think this example is very nice because it appeals to our geometric intuition.  It is also easy to imagine how it could be generalized to other cases.  For example: if we take it as a given that the Hamiltonian generates time translations, we have the tools to make sense of the fact that \left[ {H, * } \right] = - i\hbar \partial /\partial t .  But suppose we didn’t know that the Hamiltonian generates time translations.  How could we figure this out for ourselves from first principles?  I hope to address this in the next post as part of a discussion of the transformation properties (symmetries) of operators and compatible observables.  Thanks for reading!



Filed under Group Theory in QM

Operator Algebra – Preamble

As a chemist, I am used to thinking of operators as tools for extracting physical observables from quantum states.  This perspective is very intuitive for me, as the act of making a spectroscopic measurement in the laboratory can be identified in a very concrete way with the action of a quantum-mechanical operator.  The relationship between operators and kets therefore occupies a special place in my heart. I think a true physicist or mathematician (i.e. somebody who is not me!) might be quick to point out a certain backwardness in this point of view, however.  As my understanding of quantum mechanics develops, I gain more of an appreciation for the fact that quantum theories are constructed from the relationships between operators; and that kets are ‘merely’ decoration we hang from the resulting algebraic structure.  Hopefully this series of posts will help you appreciate this point of view if you do not already.  I am still very much in the process of wrapping my head around it, so I invite you to join me as I explore some basic elements of operator algebra in quantum mechanics.  Feel free to ask some questions along the way, and I will do my best to get you an answer.


Where do the algebraic relationships between quantum mechanical operators come from?  We have some axioms – fundamental ‘truths’ about the world that we hold as self-evident.  These tell us, for example, which Hilbert space makes a natural setting for a particular theory.  They also might tell us which states within the Hilbert space are physically meaningful, and which are pathological.  In addition to physical axioms, we have mathematical axioms that are even more fundamental.  Together, these sets of axioms tell us how to go about building a quantum theory.

If we want to understand the mathematical rules governing the operators of a quantum theory, we first need to understand where these operators ‘live’.  Speaking loosely (this will, for better or worse, be the norm here), operators can be thought of as maps from a vector space V to itself.  For the sake of brevity and my sanity, I will restrict the scope of this discussion to Hermitian operators and unitary operators, which are particularly important in quantum theory.  Let’s start with Hermitian operators.  They have real eigenvalues, and a complete orthonormal basis can be constructed from their eigenvectors.  Great.  These are desirable properties, but I won’t get distracted by them right now.  What is important here is that we can construct a vector space over the real numbers from the set of Hermitian operators acting on a vector space V.  This is in contrast to, say, the unitary operators acting on V, which are not even closed under addition (closure being one of the vector space axioms): the sum of two unitary operators is generally not unitary.

The space of Hermitian operators acting on V, which we will call H, has a bit more algebraic structure than a regular vector space.  This is perhaps more obvious if we represent the elements of H as matrices (we will assume for now that this is possible), because matrix multiplication gives us a new way to combine two operators to get a third.  There is a slight problem, though.  We are looking for algebraic structure in the form of a new binary operation on H that is algebraically closed.  The matrix representation of H is not closed under matrix multiplication: since \left( {AB} \right)^\dag   = BA , the product of two Hermitian matrices A and B is Hermitian if and only if [A,B]=0.  H is closed under a suitably modified commutator, however, as i[A,B] is Hermitian whenever A and B are.  Endowing the vector space H with the bracket i[*,*] makes it an algebra over the real numbers (a more familiar example of an algebra is R3 equipped with the cross product).  This bracket satisfies some additional axioms that make H a special type of algebra – a Lie algebra –  the natural habitat of the Hermitian operator in quantum theory.
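Both claims are easy to test numerically. Here is a minimal sketch of my own (the random matrices and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_hermitian(n=4):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

A, B = random_hermitian(), random_hermitian()

def is_hermitian(M):
    return np.allclose(M, M.conj().T)

# The plain matrix product of two generic Hermitian matrices is NOT Hermitian...
assert not is_hermitian(A @ B)
# ...but i[A, B] is, so H closes under the bracket i[*, *].
C = 1j * (A @ B - B @ A)
assert is_hermitian(C)
```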

Hopefully we now have a bit of intuition for why Lie algebras are a natural setting for studying quantum mechanical operators.  What remains to be shown is how they are useful – we want to understand the connection between Lie algebras and the physical world!  The phrase ‘generator of infinitesimal transformations’ is often thrown around by the physics folk when talking about elements of our Lie algebra of Hermitian operators, H.  Since physicists say this, it presumably has a physical interpretation.  Which is…what, exactly?  By infinitesimal transformation, they mean an infinitesimal transformation of state space (e.g. through a change of basis).  And by generator, they mean that which brings about the infinitesimal transformation.  To make this concrete, let’s pick apart a statement that sounds like it should be true: momentum operators generate translations. This exposition will undoubtedly be familiar to some, but I think it illustrates some core concepts nicely, so it is always worth revisiting. Suppose \psi \left( x \right) is a suitable wavefunction (in accord with physical axioms) defined on the real line, and the operator \hat A induces the infinitesimal change of basis \hat A\left| x \right\rangle = \left| {x - \varepsilon } \right\rangle (note the sign: shifting each basis ket backward shifts the wavefunction forward, so \hat A\psi \left( x \right) = \psi \left( {x + \varepsilon } \right) ). We want to understand what form \hat A takes. Because \varepsilon is infinitesimal, we just need a Taylor series to first order to arrive at the desired result:

\hat A\psi \left( x \right) = \psi \left( {x + \varepsilon } \right) = \psi \left( x \right) + \varepsilon \frac{{d\psi }}{{dx}} = \left( {1 + \varepsilon \frac{d}{{dx}}} \right)\psi \left( x \right) = \left( {1 + \frac{{i\varepsilon }}{\hbar }\hat p} \right)\psi \left( x \right)

We are left to conclude \hat A = \left( {1 + \frac{{i\varepsilon }}{\hbar }\hat p} \right) . Now, if momentum operators generate infinitesimal translations, they should (in a just world) be able to generate finite translations too.  Suppose we lift the restriction that \varepsilon be an infinitesimal quantity. In this case, we need the full Taylor expansion:

\hat A\psi \left( x \right) = \psi \left( {x + \varepsilon } \right) = \sum\limits_{n = 0}^\infty {\frac{1}{{n!}}\left( {\varepsilon \frac{d}{{dx}}} \right)^n } \psi \left( x \right) = e^{\varepsilon \frac{d}{{dx}}} \psi \left( x \right) = e^{\frac{{i\varepsilon }}{\hbar }\hat p} \psi \left( x \right)

We are left to conclude that \hat A = e^{\frac{{i\varepsilon }}{\hbar }\hat p} . Note that this agrees with the infinitesimal version above to first order in \varepsilon . This technique – constructing, through exponentiation, an operator that induces a finite transformation from an operator that induces an infinitesimal transformation – generalizes nicely.  In fact, the exponential map provides the fundamental connection between our Lie algebra of infinitesimal transformations and the associated Lie group of finite transformations.  There is a vast amount of information packed into these statements, and we will spend some time unpacking them over the next few posts.
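We can watch the exponentiated derivative translate a function. For a cubic polynomial the Taylor series terminates after four terms, so the check below (my own illustration, with an arbitrarily chosen polynomial and step size) is exact up to floating-point error.

```python
import math
import numpy as np

poly = np.polynomial.Polynomial([1.0, -2.0, 0.5, 3.0])  # psi(x) = 1 - 2x + 0.5x^2 + 3x^3
eps = 0.7
x = np.linspace(-1.0, 1.0, 11)

# Apply exp(eps * d/dx) = sum_n (eps^n / n!) (d/dx)^n term by term;
# the series terminates because the fourth derivative of a cubic vanishes.
translated = np.zeros_like(x)
term = poly
for n in range(4):
    translated += (eps ** n / math.factorial(n)) * term(x)
    term = term.deriv()

# The exponentiated derivative really is the translation operator: psi(x + eps).
assert np.allclose(translated, poly(x + eps))
```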

Moving forward, I hope to ‘formalize’ (there will be lots of hand waving) and generalize some of the statements I made here.  The commutator will take a central role in this discussion, so the main goal for the next post is to start thinking about the geometric (and therefore physical) interpretation of this operation.  I don’t think I began to appreciate the depth of statements like ‘the commutator tells us when two observables can be measured simultaneously with arbitrary precision’ until I developed this mathematical intuition, so I think this is a very exciting topic!



Filed under Group Theory in QM