But what exactly is \(i\) really?

Most books and courses on complex numbers introduce the mysterious square root of minus one by assigning it a symbol, usually but not always the letter \(i\), and then use it in calculations as if it were any other number. They make no attempt to explain what \(i\) ‘really is’ and assume that once the student has done enough calculations with it they will forget any metaphysical angst they may originally have had. This could be likened to the attitude in certain branches of physics that has been caricatured as ‘shut up and calculate’. For example in 1831 the great Carl Friedrich Gauss, after initially worrying about ‘the true metaphysics of the square root of \(-1\)’, wrote:

If one formerly contemplated this subject from a false point of view and therefore found a mysterious darkness, this is in large part attributable to clumsy terminology. Had one not called \(+1\), \(-1\) and \(\sqrt{-1}\) positive, negative and imaginary (or even impossible) units but instead, say, direct, inverse, or lateral units, then there could scarcely have been talk of such darkness.

This is just brushing the problem under the carpet. Calling it a ‘lateral unit’ merely renames it without providing any insight, and no matter how familiar you are with its use, as soon as someone asks you what \(i\) really is you will realise you still don’t know. Some books answer this question by saying that a complex number is ‘really’ just a two-dimensional vector, but that is at best a partial explanation, and technically not even that. Fortunately it is possible to give a satisfactory answer to this question. This article explains how.

But before diving into this it may be worth looking at a related question that you also probably didn’t realise you didn’t know the answer to.

Why does \((-1)^2 = +1\) and not \(-1\) or even some as-yet-undiscovered kind of thing?

The fact that the square of a negative number is positive is usually taken for granted, but can you prove it? If you were shown a proof of this at school it may have been along the lines of the diagram shown in Figure 1.

Simplistic argument

The argument is that the square in the upper right quadrant has an area equal to \( (+1) \times (+1) \) while the square in the lower left quadrant has an area equal to \( (-1) \times (-1) \). These are both equal because you can exactly superimpose one on top of the other. And anyway all areas are positive. This argument is false because it also implies that \((-1) \times (+1)\) is positive.

Another common explanation is based on the fact that multiplication is just another name for repeated addition, at least when talking about integers. So, for example, if you start with zero and add five bundles of twelve you get the same answer as \(5 \times 12\). By extension, multiplying by a negative number should have the same effect as repeated subtraction, so if you start with zero and subtract five bundles of twelve you get the same answer as \((-5) \times 12\). It is then only a short step to the idea that subtracting five bundles of \(-12\) is the same as \((-5) \times (-12)\), but subtracting five bundles of \(-12\) is the same as adding five bundles of \(+12\), so the product of two negatives comes out positive.
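The repeated-subtraction picture is easy to check mechanically. Here is a minimal Python sketch of the argument (the loop and variable names are mine, purely for illustration):

```python
# (-5) * 12 as "start at zero and subtract five bundles of 12":
total = 0
for _ in range(5):
    total -= 12
print(total)        # -60, the same as (-5) * 12

# (-5) * (-12) as "subtract five bundles of -12":
total = 0
for _ in range(5):
    total -= -12    # subtracting -12 is the same as adding 12
print(total)        # 60, so the product of two negatives comes out positive
```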

Yet another explanation I found on YouTube is that it is an arbitrary convention, and that we could have chosen it to be the other way round.

The real reason the square of a negative number is positive is the distributive property of multiplication. That is, for any three numbers \(a\), \(b\) and \(c\):

\begin{equation}\label{eq:left_distrib} a(b+c)=ab+ac \end{equation}

There are two varieties of this property. Equation \eqref{eq:left_distrib} shows left-distributivity, but there is also right-distributivity, as given by

\begin{equation}\label{eq:right_distrib} (b+c)a=ba+ca \end{equation}

Using only these two properties you can rigorously prove that \((-x)^2 = (+x)^2\) when \(x\) is any type of mathematical entity that has these two properties.

Proof

Let \(y = -x\), which is the same as

\begin{equation}\label{eq:xy0} x + y = 0 \end{equation}

Left-multiply Equation \eqref{eq:xy0} by \(x\):

\begin{equation*} x \left(x + y \right) = 0 \end{equation*}

Expand using left-distributivity:

\begin{equation}\label{eq:left_expanded} x^2 + xy = 0 \end{equation}

Right-multiply Equation \eqref{eq:xy0} by \(y\):

\begin{equation*} \left(x + y \right) y = 0 \end{equation*}

Expand using right-distributivity:

\begin{equation}\label{eq:right_expanded} xy + y^2 = 0 \end{equation}

Subtract Equation \eqref{eq:right_expanded} from Equation \eqref{eq:left_expanded}:

\begin{equation*} x^2 - y^2 = 0 \end{equation*}

Add \(y^2\) to both sides:

\begin{equation*} x^2 = y^2 \end{equation*}

But we know from Equation \eqref{eq:xy0} that \(y = -x\) so

\begin{equation}\label{eq:plus_squared_equals_minus_squared} (+x)^2 = (-x)^2 \end{equation}

We haven’t relied on the commutative property of multiplication, so Equation \eqref{eq:plus_squared_equals_minus_squared} is true for entities such as square matrices that are left- and right-distributive but not commutative.
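The matrix case is easy to check numerically. A quick sanity check in Python with NumPy (my own sketch, not part of the original argument):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))   # a random square matrix

# Matrix multiplication is not commutative, but it is left- and
# right-distributive, which is all the proof above used:
assert np.allclose((-X) @ (-X), X @ X)
print("(-X)^2 equals (+X)^2 for this matrix")
```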

Equation \eqref{eq:plus_squared_equals_minus_squared} shows that \((-x)^2\) is the same kind of mathematical entity as \((+x)^2\), otherwise they couldn't be equal. So if one of them is an ordinary number, or a vector or a matrix or some other kind of mathematical object, then so too is the other.

To be able to say that \(x\) is ‘positive’ or ‘negative’ requires that \(x\) has an ordering relation. We cannot call a vector or a matrix positive or negative, for example, because there is no ‘greater than’ or ‘less than’ defined for those kinds of things. So although the relation \((-x)^2 = (+x)^2\) may be true when \(x\) is, for example, a square matrix, we can’t say that either \((+x)^2\) or \((-x)^2\) is greater than or less than zero. For ordinary numbers, which do have an ordering relation, we know that if \(x > 0\) and \(y > 0\) then \(xy > 0\), from which it follows that \((+x)^2 > 0\) and so \((-x)^2 > 0\).

The algebraic structure definition

Now let’s explore the true meaning of \(i\).

Instead of constructing complex numbers using mysterious symbols you can define them solely in terms of their algebraic structure. An algebraic structure is just a set of entities together with one or more operations that act on those entities. In this case the entities are the numbers themselves and, because complex numbers are supposed to be generalisations of ordinary numbers, they have the same operations; namely addition and multiplication.

To construct the required algebraic structure, first consider the form in which complex numbers are usually written:

\begin{equation}\label{eq:complex_number} z = x + iy \end{equation}

If you know \(x\) and \(y\) then you know everything there is to know about \(z\). In fact \(x\) and \(y\) are all you need to construct \(z\). Therefore the ordered pair \((x, \, y)\) is just as complete a specification of \(z\) as is \(x+iy\) so it is perfectly valid to write it as such:

\begin{equation}\label{eq:vector_form} z = (x, \, y) \end{equation}

Next you need to work out the addition and multiplication rules:

\begin{equation}\label{eq:incomplete_addition_rule} (x_1, \, y_1) + (x_2, \, y_2) = \ ????? \end{equation}

and

\begin{equation}\label{eq:incomplete_multiplication_rule} (x_1, \, y_1) \times (x_2, \, y_2) = \ ????? \end{equation}

We can find these by starting with the \(x + iy\) form, performing the operations and then translating to the \((x, \, y)\) form. If you add \(x_1 + iy_1\) and \(x_2 + iy_2\) you get:

\begin{equation*} (x_1 + iy_1) + (x_2 + i y_2) = (x_1 + x_2) + i(y_1 + y_2) \end{equation*}

Replacing \(x + iy\) with \((x, \, y)\) gives:

\begin{equation}\label{eq:addition_rule} \boxed{ (x_1, y_1) + (x_2, y_2) = (x_1 + x_2, \, y_1 + y_2) } \end{equation}

If you multiply them you get:

\begin{equation*} (x_1 + i y_1)(x_2 + i y_2) = (x_1 x_2 - y_1y_2) + i(x_1y_2 + y_1x_2) \end{equation*}

which translates to

\begin{equation}\label{eq:multiplication_rule} \boxed{ (x_1, y_1)(x_2, y_2) = (x_1 x_2 - y_1y_2, \, x_1y_2 + y_1x_2) } \end{equation}

The multiplication rule embodied in Equation \eqref{eq:multiplication_rule} is what makes an ordered pair into a complex number. A good way to remember the order of the terms is that the first component (the “real” part) is of the form \(xx\) minus \(yy\), and the second component (the “imaginary” part) is of the form \(xy\) plus \(xy\). It makes sense that the component with the \(xy\)s should have the plus sign, since there is no reason why either \(x_1y_2\) or \(x_2y_1\) should come first, whereas the component with the \(xx\) and the \(yy\) does provide an unambiguous way of specifying which term should be on the left and which on the right of the minus sign. It also makes sense that the first component should be the one with the minus sign since this represents the ‘real’ part and the product of two positive imaginary numbers is a negative real number.
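The two boxed rules are the whole definition, which makes them easy to state as code. Here is a minimal Python sketch (the function names cadd and cmul are my own) that implements them directly on ordered pairs and compares one product against Python’s built-in complex type:

```python
def cadd(a, b):
    """Addition rule: add component-wise."""
    return (a[0] + b[0], a[1] + b[1])

def cmul(a, b):
    """Multiplication rule: ('xx minus yy', 'xy plus xy')."""
    return (a[0] * b[0] - a[1] * b[1],
            a[0] * b[1] + a[1] * b[0])

print(cmul((1, 2), (3, 4)))    # (-5, 10)
print((1 + 2j) * (3 + 4j))     # (-5+10j), the same numbers
```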

Equation \eqref{eq:addition_rule} is the same as the addition rule for ordinary two-dimensional vectors, but Equation \eqref{eq:multiplication_rule} doesn't have a counterpart in ordinary vectors because vectors don't generally have a multiplication rule. Certain particular kinds do, such as the three-dimensional ones that feature in continuum mechanics and electromagnetism, but their multiplication rules don’t apply to all vectors, only to that particular kind. Generally vectors only have a rule for addition and a rule for multiplication by a scalar.

Multiplication of complex numbers is commutative since interchanging the subscripts in the right hand side of Equation \eqref{eq:multiplication_rule} leaves it unchanged, provided that the \(x\)s and \(y\)s are commutative, which they are since they are ordinary ‘real’ numbers.

Applying the addition and multiplication rules (Equations \eqref{eq:addition_rule} and \eqref{eq:multiplication_rule}) to complex numbers of the form \((x, \, 0)\) gives:

\begin{equation}\label{eq:addition_rule_real_subset} (x_1, 0) + (x_2, 0) = (x_1 + x_2, \, 0) \end{equation}

and

\begin{equation}\label{eq:multiplication_rule_real_subset} (x_1, 0)(x_2, 0) = (x_1 x_2, \, 0) \end{equation}

which means that there is a one-to-one correspondence between the real number \(x\) and the complex number \((x, \, 0)\), so complex numbers of the form \((x, \, 0)\) have the same algebraic structure as the real numbers. It is therefore possible to regard \(x\) and \((x, \, 0)\) as the same object, i.e.:

\begin{equation}\label{eq:complex_scalar_identity} x \equiv (x, \, 0) \end{equation}

In other words:

The real number line is just the horizontal axis of the complex plane.

Applying any combination of the two arithmetic operations, addition and multiplication, to complex numbers of the form \((x, \, 0)\) can only produce another complex number of the form \((x, \, 0)\), so arithmetic performed within the real number line always stays within the real number line.
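This closure is easy to check with the cadd and cmul functions defined earlier (repeated here so the snippet runs on its own): pairs with a zero second component combine to give pairs with a zero second component.

```python
def cadd(a, b): return (a[0] + b[0], a[1] + b[1])
def cmul(a, b): return (a[0]*b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

print(cadd((3, 0), (4, 0)))   # (7, 0)  -- still on the real axis
print(cmul((3, 0), (4, 0)))   # (12, 0) -- still on the real axis
```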

Unlike ordinary two-dimensional vectors, which have no preferred direction and which can be expressed relative to an arbitrarily chosen coordinate system, the complex plane is fixed and its two axes are absolute and baked into its structure. To see this it is only necessary to observe that a real number \((x,0)\) would acquire an imaginary component if expressed in a rotated coordinate system. It follows that a rotated coordinate system is not a valid representation of the complex plane.

At first sight identity \eqref{eq:complex_scalar_identity} would appear to lead to an infinite regress since:

\begin{align*} x &\equiv \left( x, \, 0\right) \\ &\equiv \left( \left( x, \, 0\right), \, 0\right) \\ &\equiv \left( \left( \left( x, \, 0\right), \, 0\right), \, 0\right) \\ &\equiv \left( \left( \left( \left( x, \, 0\right), \, 0\right), \, 0\right), \, 0\right) \\ & \cdots \end{align*}

and so on, but this does not prevent identity \eqref{eq:complex_scalar_identity} being true. You can always replace \(x\) by \((x, \, 0)\) and \((x, \, 0)\) by \(x\) as you see fit. If you still feel queasy about identity \eqref{eq:complex_scalar_identity} you can regard \(x\) merely as a convenient shorthand for \((x, 0)\).

It follows from the multiplication rule, Equation \eqref{eq:multiplication_rule}, that

\begin{equation*} (k, 0) \cdot (x, y) = (kx, \, ky) \end{equation*}

and

\begin{align*} (x, y) \cdot (k, 0) &= (xk, \, yk) \\ &= (kx, \, ky) \end{align*}

Writing \(k\) as a shorthand for \((k, 0)\) gives:

\begin{equation}\label{eq:scalar_multiplication_rule} \boxed{ k \cdot (x, y) = (x, y) \cdot k = (kx, \, ky) } \end{equation}

This looks like the scalar multiplication rule obeyed by ordinary vectors. However, the above derivation shows that it is in fact a special case of the general complex-number multiplication rule (Equation \eqref{eq:multiplication_rule}) so it isn’t necessary to explicitly state it in stand-alone form as part of the axioms of the complex numbers.
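The ‘special case’ claim can be checked directly: pushing the pair \((k, 0)\) through the general rule gives exactly the scalar-multiplication behaviour. A quick check, reusing the cmul sketch from earlier:

```python
def cmul(a, b): return (a[0]*b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

k, z = 2.0, (3.0, 4.0)
print(cmul((k, 0.0), z))   # (6.0, 8.0)
print(cmul(z, (k, 0.0)))   # (6.0, 8.0) -- the same either way round
```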

It also follows from the multiplication rule, Equation \eqref{eq:multiplication_rule}, that:

\begin{equation}\label{eq:identity_times_real} (1, 0) \times (x, 0) = (x,0) \end{equation}

and

\begin{equation}\label{eq:imaginary_unit_times_real} (0, 1) \times (y, 0) = (0, y) \end{equation}

which you can verify by directly multiplying them out. Looking at Equation \eqref{eq:imaginary_unit_times_real} it is interesting to observe that multiplying the complex number \((y, 0)\) by the complex number \((0, 1)\) has the effect of rotating it anticlockwise by \(90^\circ\) about the origin. The significance of this will soon become apparent.

Combining Equations \eqref{eq:identity_times_real} and \eqref{eq:imaginary_unit_times_real} using the addition rule, Equation \eqref{eq:addition_rule}, you get:

\begin{align} (1, 0) \cdot (x,0) \; + \; (0, 1) \cdot (y,0) &= (x, 0) + (0, y) \notag \\[6pt] &= (x, y) \label{eq:unit_vectors} \end{align}

Equation \eqref{eq:complex_scalar_identity} says that a complex number of the form \((x,0)\) is the same thing as the ‘real’ number \(x\), so Equation \eqref{eq:unit_vectors} is the same as:

\begin{equation} (x, y) = (1,0) \cdot x \; + \; (0, 1) \cdot y \end{equation}

We already know that \((1,0)\) is just the real number \(1 \) but what is \((0, 1)\)? To find the answer, square it using the multiplication rule, Equation \eqref{eq:multiplication_rule}:

\begin{align} (0, 1)^2 &= (0, 1)(0, 1) \notag \\ &= ( \; 0 \times 0 - 1 \times 1, \ 0 \times 1 + 1 \times 0 \;) \notag \\ &= (-1, 0) \notag \\ &= -(1,0) \notag \\ &= -1 \end{align}

that is, \((0, 1)\) is a square root of \(-1\), a.k.a. \(i\), so Equation \eqref{eq:unit_vectors} can be re-written as:

\begin{align*} (x, y) &= (1,0) \cdot x + (0,1) \cdot y\\ &= 1 \cdot x + i \cdot y\\ &= x + iy \end{align*}

so we have recovered the ‘\(x + iy\)’ notation that we threw away in favour of the ordered pair notation, showing that the addition and multiplication rules do indeed contain all the properties of complex numbers and are all you need to completely define them.

Defining complex numbers this way involves no mysterious symbols, only ‘real’ numbers, and so completely eliminates all metaphysical angst associated with what \(i\) ‘really is’. Complex numbers are ordered pairs of real numbers that obey the addition rule (Equation \eqref{eq:addition_rule}) and the multiplication rule (Equation \eqref{eq:multiplication_rule}). The ‘real’ number \(1\) is ‘really’ the pair \((1,\,0)\) and \(i\) is ‘really’ the pair \((0,\,1)\).
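To close the loop, here is the whole construction in a few lines of Python (a sketch reusing the cadd/cmul functions from earlier): \(1\) and \(i\) are just the pairs \((1, 0)\) and \((0, 1)\), \(i^2\) comes out as \(-1\), and \(x + iy\) is recovered purely from pairs.

```python
def cadd(a, b): return (a[0] + b[0], a[1] + b[1])
def cmul(a, b): return (a[0]*b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

ONE = (1.0, 0.0)   # the real number 1
I   = (0.0, 1.0)   # 'i', just another ordered pair

assert cmul(I, I) == (-1.0, 0.0)   # i^2 = -1, no mysterious symbols needed

x, y = 3.0, 4.0
# x + iy rebuilt from pairs: (1,0) times x plus (0,1) times y
z = cadd(cmul(ONE, (x, 0.0)), cmul(I, (y, 0.0)))
assert z == (x, y)
```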

Polar coordinates

The multiplication rule, Equation \eqref{eq:multiplication_rule}, looks rather complicated but it becomes much simpler when written in polar coordinates. Figure 2 below shows how the Cartesian coordinates relate to the polar coordinates.

The polar coordinates of the Cartesian point \((x, \, y)\)

That is

\begin{equation} \label{eq:polar_defi} x = r \cos \theta, \quad y = r \sin \theta \end{equation}

so

\begin{equation*} (x_1, y_1) = (r_1 \cos \theta_1, r_1 \sin \theta_1) \end{equation*}

and

\begin{equation*} (x_2, y_2) = (r_2 \cos \theta_2, r_2 \sin \theta_2) \end{equation*}

Applying the multiplication rule to these gives:

\begin{align} (x_1, y_1)(x_2, y_2) &= (r_1 \cos \theta_1 r_2 \cos \theta_2 - r_1 \sin \theta_1 r_2 \sin \theta_2, \ r_1 \cos \theta_1 r_2 \sin \theta_2 + r_1 \sin \theta_1 r_2 \cos \theta_2)\notag \\ &= \left( r_1 r_2 ( \cos \theta_1 \cos \theta_2 - \sin \theta_1 \sin \theta_2 ), \; r_1 r_2 ( \cos \theta_1 \sin \theta_2 + \sin \theta_1 \cos \theta_2)\right) \label{eq:multiplication_rule_polar_cartesian_1} \end{align}

Hang on, I hear you say. That is NOT simpler! But wait. If you look up the formulae for the sines and cosines of the sum of two angles, you will find the following standard results:

\begin{align} \sin(\theta_1 + \theta_2) &= \sin \theta_1 \cos \theta_2 + \cos \theta_1 \sin \theta_2 \label{eq:trig_formulae_1}\\ \cos(\theta_1 + \theta_2) &= \cos \theta_1 \cos \theta_2 - \sin \theta_1 \sin \theta_2 \label{eq:trig_formulae_2} \end{align}

Substituting \eqref{eq:trig_formulae_1} and \eqref{eq:trig_formulae_2} into the right hand side of \eqref{eq:multiplication_rule_polar_cartesian_1} and substituting \eqref{eq:polar_defi} into the left hand side gives

\begin{equation*} (r_1 \cos \theta_1, r_1 \sin \theta_1)(r_2 \cos \theta_2, r_2 \sin \theta_2) = (r_1 r_2 \cos(\theta_1 + \theta_2), \, r_1 r_2 \sin(\theta_1 + \theta_2) ) \end{equation*}

The above is still in Cartesian form, despite containing lots of \(r\)s and \(\theta\)s. Writing it in actual polar form makes it look like this:

\begin{equation}\label{eq:multiplication_rule_polar_1} \boxed{ (r_1, \, \theta_1)(r_2, \, \theta_2) = (r_1 r_2 \, , \ \theta_1 + \theta_2) } \end{equation}

Now that IS simpler! To really emphasise how simple this is, you can write it as follows:

\begin{equation*}\label{eq:multiplication_rule_polar_2} (r_1,\theta_1)(r_2, \theta_2) = (R, \, \Theta) \end{equation*}

where

\begin{equation*}\label{eq:radius_part_of_multiplication_rule} \boxed{ R = r_1r_2 } \end{equation*}

and

\begin{equation*}\label{eq:angle_part_of_multiplication_rule} \boxed{ \Theta = \theta_1 + \theta_2 } \end{equation*}

In other words:
  1. The magnitude of the product vector is the product of the magnitudes of the two multiplicand vectors.
  2. The angle of the product vector is the sum of the angles of the two multiplicand vectors.

This simplicity is not apparent when the rule is expressed in Cartesian coordinates. It is also unexpected. The unexpected emergence of simplicity out of apparent complexity suggests that some sort of hidden reality is being exposed, like unearthing a skeleton in an archaeological dig.

The angle \(\theta\) must be measured from the horizontal axis (i.e. the real line) otherwise Equation \eqref{eq:multiplication_rule_polar_1} won’t work. This is because, if \(\phi\) is some arbitrary non-zero angle, \((\theta_1 - \phi)+(\theta_2 - \phi) \neq \theta_1 + \theta_2\). This is further evidence of the fact that the complex plane has two absolute directions baked into its structure, unlike ordinary two-dimensional vectors, which have no preferred direction.

Equation \eqref{eq:multiplication_rule_polar_1} implies that the effect of multiplying a complex number \(z_1 = \left(r_1, \, \theta_1\right)\) by another complex number \(z_2 = \left(r_2, \, \theta_2\right)\) is to rotate the point \(z_1\) about the origin through the angle \(\theta_2\) and then ‘stretch’ it by a factor of \(r_2\). It is also the same as rotating the point \(z_2\) about the origin through the angle \(\theta_1\) and then ‘stretching’ it by a factor of \(r_1\).
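The polar rule is easy to check numerically against the Cartesian rule. A minimal Python sketch (helper names are mine; angles are in radians because that is what the math module uses):

```python
import math

def cmul(a, b):   # the Cartesian multiplication rule from earlier
    return (a[0]*b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

z1, z2 = (1.0, 2.0), (3.0, -1.0)
r1, t1 = math.hypot(*z1), math.atan2(z1[1], z1[0])
r2, t2 = math.hypot(*z2), math.atan2(z2[1], z2[0])

# Multiply the radii and add the angles, then convert back ...
R, T = r1 * r2, t1 + t2
print((R * math.cos(T), R * math.sin(T)))   # ~(5.0, 5.0)
# ... and compare with the Cartesian rule:
print(cmul(z1, z2))                          # (5.0, 5.0)
```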

An interesting observation is that the operation of taking the complex conjugate is the same in both Cartesian and polar coordinates. In Cartesian coordinates the complex conjugate of \((x, y)\) is \((x, -y)\) and in polar coordinates, the complex conjugate of \((r, \theta)\) is \((r, -\theta)\).
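A quick check that the conjugate really does look the same in both systems: negate \(y\) in Cartesian, or negate \(\theta\) in polar, and you land on the same point.

```python
import math

x, y = 3.0, 4.0
r, theta = math.hypot(x, y), math.atan2(y, x)

# Polar conjugate (r, -theta), converted back to Cartesian:
print(r * math.cos(-theta), r * math.sin(-theta))   # ~3.0 -4.0
# Cartesian conjugate (x, -y):
print(x, -y)                                        # 3.0 -4.0
```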

In summary:

  1. The addition rule is simplest in Cartesian coordinates
  2. The multiplication rule is simplest in polar coordinates

That’s really all there is to removing all the metaphysics associated with the mysterious \(i\). However, there are a couple of other interesting developments that follow on directly from this and which can be fitted in the available space:

Are complex numbers vectors?

Most textbooks say that complex numbers are ‘really’ two-dimensional vectors. There are, however, both similarities and differences between these two types of mathematical object.

The most noticeable similarity is that a complex number and a two-dimensional vector with real components can both be represented by an ordered pair of real numbers. However, for vectors there is an infinity of such representations that are all equally valid and which depend on an arbitrarily chosen set of unit vectors called a basis, which acts as a ‘coordinate system’. Vectors therefore have no preferred direction. The axes of the complex plane, on the other hand, are absolute and baked into its structure. The same complex number always has the same values of its components.

The most noticeable difference is that the complex numbers have a multiplication rule, whereas ordinary vectors don’t. This difference could be removed by simply ‘bolting on’ the multiplication rule, in which case a complex number could be defined as a vector with this particular multiplication rule. This is the approach taken with the three-dimensional vectors used in electromagnetism and continuum mechanics, which have two multiplication rules (the dot and cross products) bolted on but are still called vectors.

Another important difference is that the real numbers in the complex plane are not analogous to the scalars associated with a given type of vector, because:

  1. Complex numbers are generalisations of real numbers, but vectors are not generalisations of scalars. That is, the scalars that are associated with a given type of vector are a different kind of thing from the vectors themselves, whereas the real numbers are the same kind of thing as the complex numbers. Another way of looking at it is that a real number is a complex number whose second component is zero, but a scalar is not a vector in which all components except one are zero, nor is it a one-component vector.
  2. The algebraic structure of a given type of vector is called a vector space and consists of two sets: the vectors themselves and an associated set of objects called scalars, different in kind from the vectors. The algebraic structure of complex numbers, on the other hand, consists of one set, which contains only complex numbers and is complete in itself. The axioms of a vector space have to include a scalar multiplication rule that looks like Equation \eqref{eq:scalar_multiplication_rule}, otherwise there would be nothing to connect the two kinds of object, vector and scalar, that comprise it. The complex numbers don’t need such an axiom because Equation \eqref{eq:scalar_multiplication_rule} is a special case of the general multiplication rule.

The \(n\)th root of 1

Textbooks usually write the polar form of complex numbers in exponential notation, \(z = re^{i\theta}\), which is every bit as mysterious as \(x + iy\). The validity of this notation can only be proved using infinite series and is not 100% convincing. Also, exponential notation only works if the angle is expressed in radians, whereas simple polar coordinates, \((r, \, \theta)\), can cope with angles in arbitrary units, such as degrees, which are conceptually easier to deal with (at least for me!). Fortunately it is not necessary to use exponential notation to derive many of the well-known results that are usually associated with complex polar coordinates such as, for example, finding the \(n\)th roots of unity. So here goes:

It follows from the polar form of the multiplication rule, Equation \eqref{eq:multiplication_rule_polar_1}, that when \(n\) is a (real) integer:

\begin{equation}\label{eq:complex_number_to_real_integer_power} \left(r,\theta \right)^n = \left(r^n, n\theta \right) \end{equation}

so

\begin{equation*} \left(r^{1/n}, \frac{\theta}{n} \right)^n = \left(r, \theta \right) \end{equation*}

so if we denote by \(\left(r, \theta \right)^{1/n}\) the complex number which, when raised to the power \(n\) (i.e. multiplied by itself \(n\) times) is equal to \(\left(r, \theta \right)\) then

\begin{equation*} \left(r, \theta \right)^{1/n} = \left(r^{1/n}, \frac{\theta}{n} \right) \end{equation*}

In Cartesian coordinates the real number \(1\) is \((1, \; 0)\) and in polar coordinates it is also \((1, \; 0)\) because the distance from the origin is \(1\) and the angle is zero. In polar coordinates all angles of the form \((\theta + n \times 360)^\circ\) map onto the same angle in the plane, which most people would instinctively call \(\theta\) even though it would be just as correct to call it \((\theta + 720)^\circ\) or \((\theta + 1080)^\circ\) and so on. The real number 1 has angle zero so:

\begin{align*} 1 &= (1, \; 0 )\\ &= (1, \; 0 + n \times 360^\circ)\\ &= (1, \; n \times 360^\circ) \end{align*}

where \(n = 0, 1, 2, \ldots\). So:

\begin{align*} 1^{1/2} &= \left( 1^{1/2}, \;\frac{1}{2} \times n \times 360^\circ \right) \\ &= (1, \; n \times \frac{1}{2} \times 360^\circ) \\ &= (1, \; n \times 180^\circ) \end{align*}

Even multiples of \(180^\circ\) are also multiples of \(360^\circ\) which all map onto \(\theta = 0\) as discussed above (remember that zero is even). The smallest positive odd multiple of \(180^\circ\) is \(180^\circ\), and the complex number \((1, 180^\circ)\) when expressed in Cartesian coordinates is \((-1, 0)\), i.e. the real number -1. So in Cartesian coordinates:

\begin{align*} 1^{1/2} &= \begin{cases} (1, 0) = 1 & n \ \text{even}\\ (-1, 0) = -1 & n \ \text{odd} \end{cases} \end{align*}

which corresponds to what we already knew about the square roots of \(1\). Next consider \(z = -1\).

\begin{align*} z &= -1\\ &= (-1, 0) \quad \text{(in Cartesian coordinates)}\\ &= (1, \; 180^\circ + n \times 360^\circ) \quad \text{(in polar coordinates)} \end{align*}

and so, in polar coordinates:

\begin{align*} (-1)^{1/2} &= \left( 1^{1/2}, \; \frac{1}{2} \times \left(180^\circ + n \times 360^\circ \right) \right) \\ &= \left( 1^{1/2}, \; \left( 90^\circ + n \times 180^\circ \right) \right) \\ &= \begin{cases} (1, 90^\circ) & n \ \text{even}\\ (1, 270^\circ) & n \ \text{odd} \end{cases} \end{align*}

In Cartesian coordinates:

\begin{align*} (-1)^{1/2} &= \begin{cases} (0, 1) = i & n \ \text{even}\\ (0, -1) = -i & n \ \text{odd} \end{cases} \end{align*}

We already knew that, so this hasn’t told us anything new. We do discover something new, however, when we consider other roots of \(1\). First, let’s consider the cube root of \(1\), or \(1^{1/3}\).

\begin{align*} 1^{1/3} &= (1^{1/3}, \frac{1}{3} \times n \times 360^\circ) \\ &= (1, n \times 120^\circ) \end{align*}

Angles of the form \(n \times 120^\circ\) each map onto one of three distinct values, which are shown in Figure 3. It is actually quite fiddly to convert these points from polar coordinates, in which they are very simple, to Cartesian coordinates, in which they are somewhat more complicated: the three roots turn out to be \((1, \, 0)\), \(\left( -\frac{1}{2}, \, \frac{\sqrt{3}}{2}\right)\) and \(\left( -\frac{1}{2}, \, -\frac{\sqrt{3}}{2}\right)\).

The cube roots of \(1\).
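The fiddly polar-to-Cartesian conversion is exactly the sort of thing a few lines of code can do for us. A sketch (the function name roots_of_unity is my own):

```python
import math

def roots_of_unity(n):
    """The n distinct nth roots of 1: radius 1, angle k * (360/n) degrees."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

for root in roots_of_unity(3):
    print(root)
# (1.0, 0.0)
# (-0.4999..., 0.8660...)   i.e. (-1/2,  sqrt(3)/2)
# (-0.5000..., -0.8660...)  i.e. (-1/2, -sqrt(3)/2)
```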

We can check that these really do cube to unity by explicitly multiplying them out. It will make the calculation slightly shorter if we first observe that the multiplication rule \eqref{eq:multiplication_rule} implies that the square of a complex number is given by Equation \eqref{eq:square}:

\begin{equation}\label{eq:square} (x, y)^2 = (x^2 - y^2, \, 2xy) \end{equation}

so

\begin{align*} \left( -\frac{1}{2}, \, \frac{\sqrt{3}}{2}\right)^2 &= \left( \frac{1}{4} - \frac{3}{4}, \, -\frac{\sqrt{3}}{2}\right) \\ &= \left( -\frac{1}{2}, \, -\frac{\sqrt{3}}{2}\right) \end{align*}

so

\begin{align*} \left( -\frac{1}{2}, \, \frac{\sqrt{3}}{2}\right)^3 &= \left( -\frac{1}{2}, \, \frac{\sqrt{3}}{2}\right)^2 \left( -\frac{1}{2}, \, \frac{\sqrt{3}}{2}\right)\\ &= \left( -\frac{1}{2}, \, -\frac{\sqrt{3}}{2}\right) \left( -\frac{1}{2}, \, \frac{\sqrt{3}}{2}\right)\\ &= \left( \frac{1}{4} + \frac{3}{4}, \, -\frac{\sqrt{3}}{4} + \frac{\sqrt{3}}{4}\right) \\ &= (1, \, 0) \end{align*}

The corresponding calculation for the remaining root, \(\left( -\frac{1}{2}, \, -\frac{\sqrt{3}}{2}\right)\), is left as an exercise for the reader. Next consider the fifth-roots of \(1\), or \(1^{1/5}\), which are given by

\begin{align*} 1^{1/5} &= (1^{1/5}, \frac{1}{5} \times n \times 360^\circ) \\ &= (1, n \times 72^\circ) \end{align*}

Angles of the form \(n \times 72^\circ\) each map onto one of five distinct values, which are shown in Figure 4.

The fifth roots of \(1\).

It would be far too tedious to calculate the fifth powers of any of the points shown in Figure 4 by hand, but you may rest assured that they are all equal to positive unity.
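Rather than grinding through the algebra, you can let the multiplication rule do the work. A sketch that raises each fifth root to the fifth power (reusing the cmul function from earlier):

```python
import math

def cmul(a, b):
    return (a[0]*b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

for k in range(5):
    t = 2 * math.pi * k / 5              # k * 72 degrees, in radians
    z = (math.cos(t), math.sin(t))       # a fifth root of 1
    w = (1.0, 0.0)
    for _ in range(5):
        w = cmul(w, z)                   # build up z^5
    print(round(w[0], 12), round(w[1], 12))   # 1.0 and 0.0 (or -0.0), every time
```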

Operator representation

The Wikipedia article on complex numbers mentions that their algebraic structure can be replicated using matrices, since

\begin{equation} \begin{aligned}\ \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}^2 &= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \times \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\\[6pt] &=\begin{bmatrix} -1 & 0\\ 0 & -1 \end{bmatrix}\\[6pt] &= (-1) \times \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \end{aligned} \end{equation}

so a general complex number can be written as:

\begin{align} x + iy &= x \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + y \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \notag \\[10pt] &= \begin{bmatrix} x & -y \\ y & x \end{bmatrix} \label{eq:matrix_xiy} \end{align}
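Before unpacking why this works, the claim is easy to test numerically: multiply two matrices of this form and compare with ordinary complex multiplication. A minimal NumPy sketch (the helper name as_matrix is my own):

```python
import numpy as np

def as_matrix(z):
    """The 2x2 matrix [[x, -y], [y, x]] standing in for z = x + iy."""
    return np.array([[z.real, -z.imag], [z.imag, z.real]])

a, b = 1 + 2j, 3 + 4j
print(a * b)                        # (-5+10j)
print(as_matrix(a) @ as_matrix(b))  # [[-5. -10.], [10. -5.]], the same x and y
```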

That is, the set of \(2 \times 2\) matrices of the form \(\begin{bmatrix} x & -y \\ y & x \end{bmatrix}\) is algebraically identical to the set of complex numbers \(x + iy\). The reason for this is that, as you have probably already noticed, the matrices:

\begin{equation} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{equation}

are specific cases of the rotation matrix

\begin{equation}\label{eq:rotation_matrix_theta} \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \end{equation}

when \(\theta = 90^\circ\) and \(\theta = 0^\circ\) respectively. These are just representations in matrix form of the more general concept of a rotation operator, \(R_\theta\), and these operators have the same algebraic structure as complex numbers because:

  1. Successive application of the rotation operators \(R_{\theta_1}\) and \(R_{\theta_2}\) is equivalent to applying the single operator \(R_{\theta_1 + \theta_2}\). In other words: \begin{equation} R_{\theta_1} \, R_{\theta_2} \, \vec{x} = R_{\theta_1 + \theta_2} \, \vec{x} \end{equation}

  2. An arbitrary two-dimensional vector in the plane can be transformed into any other by a combination of a stretch and a rotation. A stretch is the same as multiplication by a scalar, and a rotation can be represented by applying the operator \(R_\theta\). So the transformation of an arbitrary vector \(\vec{x}\) into another vector \(\vec{y}\) can be written:

    \begin{equation} \vec{y} = r R_\theta \vec{x} \end{equation}

  3. Successive applications of stretch-rotation can be written as:

    \begin{equation}\label{eq:operator_mult} \left( r_1 R_{\theta_1} \right) \, \left( r_2 R_{\theta_2} \right) \, \vec{x} = r_1 r_2 R_{\theta_1 + \theta_2} \, \vec{x} \end{equation}

    The vector \(\vec{x}\) is irrelevant and could be dropped from Equation \eqref{eq:operator_mult} to emphasise the fact that we are only interested in the behaviour of the operators by themselves, without reference to whatever thing they may happen to be operating on. In fact, from now on, we’ll just call \(\vec{x}\) an ‘operand’ to emphasise that we don’t care what it is.

    \begin{equation}\label{eq:operator_mult_2} \left( r_1 R_{\theta_1} \right) \, \left( r_2 R_{\theta_2} \right) = r_1 r_2 R_{\theta_1 + \theta_2} \end{equation}

    Look familiar? Yes, successive application of the stretch-rotation operator (which is itself the result of successive application of rotation followed by stretch or vice versa) has the same form as the polar form of the complex multiplication rule!

    To bring out the correspondence even more clearly, instead of writing what we have called ‘stretch-rotation’ as the product of two symbols each characterised by a single parameter, we can use one symbol, say \(O\), (for ‘operator’) characterised by two parameters:

    \begin{equation}\label{eq:combined_operator} r R_{\theta} = O_{r, \, \theta} \end{equation}

    so Equation \eqref{eq:operator_mult_2} becomes

    \begin{equation}\label{eq:operator_mult_3} O_{r_1, \, \theta_1} O_{r_2, \, \theta_2} = O_{r_1 r_2, \, (\theta_1 + \theta_2)} \end{equation}

    This leaves absolutely no doubt that the stretch-rotation operator has the same algebraic structure as the polar form of the complex numbers. It follows from \eqref{eq:operator_mult_3} that:

    \begin{align*} O_{1, \, 0}^2 &= O_{1, \, 0} O_{1, \, 0}\\ &= O_{1 \times 1, \, 0+0}\\ &= O_{1, \, 0} \end{align*}

    and

    \begin{align*} O_{1, \, 90^\circ}^2 &= O_{1, \, 90^\circ} O_{1, \, 90^\circ}\\ &= O_{1 \times 1, \, 90^\circ + 90^\circ}\\ &= O_{1, \, 180^\circ}\\ &= O_{-1, \, 0}\\ &= (-1) \times O_{1, \, 0} \end{align*}

    as expected.
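These operator identities can also be checked numerically, using rotation matrices as concrete stand-ins for \(R_\theta\). A sketch (the function name O is mine, matching the notation above):

```python
import numpy as np

def O(r, theta_deg):
    """Stretch-rotation operator: an r-scaled 2x2 rotation matrix."""
    t = np.radians(theta_deg)
    return r * np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])

# Composing multiplies the stretches and adds the angles:
assert np.allclose(O(2, 30) @ O(3, 45), O(2 * 3, 30 + 45))
# And O(1, 90) squares to minus the identity, just like i:
assert np.allclose(O(1, 90) @ O(1, 90), -np.eye(2))
```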

If successive application is equivalent to multiplication, what is the operator equivalent of addition? What could possibly be meant by the sum of two operators? Well, if we apply two different operators separately to the same operand then, provided that the operands have an addition rule, we get:

\begin{equation}\label{eq:operator_add_1} \vec{b_1} = O_{r_1, \, \theta_1} \vec{a} \end{equation}

and

\begin{equation}\label{eq:operator_add_2} \vec{b_2} = O_{r_2, \, \theta_2} \vec{a} \end{equation}

so when we add the results, we get:

\begin{align*} \vec{b_1} + \vec{b_2} &= O_{r_1, \, \theta_1} \vec{a} + O_{r_2, \, \theta_2} \vec{a}\\[7pt] &= \left( O_{r_1, \, \theta_1} + O_{r_2, \, \theta_2} \right) \vec{a} \end{align*}

What use is that, I hear you ask? Well, if \(\vec{b_1}\) and \(\vec{b_2}\) are not parallel to each other, then any vector can be written as a linear combination of them, i.e.

\begin{align} \vec{z} &= x\vec{b_1} + y\vec{b_2}\\[7pt] &= \left( xO_{r_1, \, \theta_1} + yO_{r_2, \, \theta_2} \right) \vec{a} \end{align}

or

\begin{equation*} O_{R, \, \Theta} \, \vec{a} = \left( xO_{r_1, \, \theta_1} + yO_{r_2, \, \theta_2} \right) \vec{a} \end{equation*}

dropping the irrelevant operand, we have

\begin{equation*} O_{R, \, \Theta} = xO_{r_1, \, \theta_1} + yO_{r_2, \, \theta_2} \end{equation*}

If the two operators are orthogonal and of unit stretch, then \(x\) and \(y\) are the Cartesian components of the complex number whose polar coordinates are \(R\) and \(\Theta\):

\begin{equation*} O_{R, \, \Theta} = xO_{1, \, 0} + yO_{1, \, 90^\circ} \end{equation*}
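As a final check, the sum \(xO_{1, \, 0} + yO_{1, \, 90^\circ}\) can be evaluated numerically (reusing the O helper from the previous sketch, repeated here so the snippet stands alone):

```python
import numpy as np

def O(r, theta_deg):
    t = np.radians(theta_deg)
    return r * np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])

x, y = 3.0, 4.0
print(np.round(x * O(1, 0) + y * O(1, 90)))
# [[ 3. -4.]
#  [ 4.  3.]]  -- the matrix standing in for x + iy
```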

This is the same as Equation \eqref{eq:matrix_xiy}. Does this mean that stretch-rotation operators ‘really are’ complex numbers in the same sense that the ordered pairs discussed above are? Or are they just some random mathematical object that coincidentally happens to have the same algebraic structure? We noticed earlier that the effect of multiplying one complex number by another, say \(\left(r, \, \theta \right)\), is to stretch it by a factor of \(r\) and rotate it by \(\theta\), so in a sense all complex numbers are stretch-rotation operators when applied to other complex numbers, but are all stretch-rotation operators complex numbers?