Chapter 7 Systems of Linear Equations

Linear equations often arise in mathematical problems in science and engineering. In this chapter we will see a very useful method for solving simultaneous linear equations, known as Gaussian elimination. We’ll start by understanding the geometric nature of linear equations.

7.1 Lines and planes

A linear equation is one of the form \[a_1x_1+a_2x_2+\dotsb+a_nx_n=b\] where the \(x_i\)’s are variables, the \(a_i\)’s are coefficients, \(b\) is a constant and \(n\) is the number of variables in the equation. For \(n=2\), we have an equation of the form \[a_1x_1+a_2x_2=b\] which we know to be the equation of a line in the Cartesian plane (see section 2.1, where we used the notation \(ax+by=c\)). Suppose we have a pair of such equations. Then we could conveniently write them as \[\begin{align*} a_{11}x_1+a_{12}x_2&=b_1\\ a_{21}x_1+a_{22}x_2&=b_2 \end{align*}\] where now the first subscript \(i\) on the \(a_{ij}\)’s and \(b_i\)’s denotes the equation it belongs to and the second subscript \(j\) denotes the variable it multiplies. If we need to solve these equations to find the values of \(x_1\) and \(x_2\) that satisfy both equations simultaneously, then we call this a system of linear equations.

Figure 7.1: The two intersecting lines from Example 7.1.

Geometrically, a solution to the equations corresponds to those coordinates \((x_1,x_2)\) where the lines meet – these are values of \(x_1\) and \(x_2\) that satisfy both equations simultaneously – see Figure 7.1. We therefore have three different scenarios for the solutions of such equations, each illustrated with a concrete pair of equations after the list below.

  1. Most of the time, two lines in the plane will cross at exactly one point, hence we have just one solution (one pair of coordinates \((x_1,x_2)\)).
  2. It is possible that both lines are parallel to one another, with different axis intercepts. In this case, the lines never meet – there are no solutions.
  3. Finally, it is possible that the two equations actually describe the same line. In this case, any pair of coordinates \((x_1,x_2)\) that lies on the (common) line is a solution – there are infinitely many solutions.
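For instance (these particular equations are chosen purely to illustrate the three scenarios and do not appear elsewhere in the chapter), the pair \[\begin{align*} x_1+x_2&=2\\ x_1-x_2&=0 \end{align*}\] has the unique solution \((1,1)\); the pair \[\begin{align*} x_1+x_2&=2\\ x_1+x_2&=3 \end{align*}\] describes two parallel lines and has no solutions; and the pair \[\begin{align*} x_1+x_2&=2\\ 2x_1+2x_2&=4 \end{align*}\] describes the same line twice, so every point on that line is a solution.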

This idea extends to having more equations. For example, if we have a system of three equations in two variables: \[\begin{align*} a_{11}x_1+a_{12}x_2&=b_1\\ a_{21}x_1+a_{22}x_2&=b_2\\ a_{31}x_1+a_{32}x_2&=b_3 \end{align*}\]

then it is possible that:

  1. All three lines will cross at exactly one point, hence we have just one solution.
  2. The three lines don’t cross at a common point – there are no solutions. This is the most probable scenario in this set-up: whilst any two of the lines might cross at a point, we need all three to cross at the same point in order for there to be a solution to the simultaneous equations.
  3. Finally, it is possible that the three equations actually describe the same line and again there are infinitely many solutions.

When there are more equations than variables, the system is sometimes called overdetermined, since we are then usually in scenario 2 above: it is not possible to satisfy all of the constraints imposed by the three equations at once. On the other hand, a system with fewer equations than variables is sometimes called underdetermined, since in these cases we usually have an infinite number of solutions. For example, if we had a system with just one equation in two variables \[a_{11}x_1+a_{12}x_2=b_1\] then all points \((x_1,x_2)\) lying on this line are solutions.
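To make the underdetermined case explicit, suppose for the sake of illustration that \(a_{12}\neq 0\). Then we may choose \(x_1\) freely, say \(x_1=t\), and solve for \(x_2\): \[x_1=t,\qquad x_2=\frac{b_1-a_{11}t}{a_{12}},\] so every value of the parameter \(t\) gives a different solution point on the line.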

We can extend these ideas to higher dimensional systems, i.e. a higher number of variables. For example, if we have three equations in three variables

\[\begin{align*} a_{11}x_1+a_{12}x_2+a_{13}x_3&=b_1\\ a_{21}x_1+a_{22}x_2+a_{23}x_3&=b_2\\ a_{31}x_1+a_{32}x_2+a_{33}x_3&=b_3 \end{align*}\]

then each equation describes a plane in 3-dimensional Cartesian coordinates. Now we have:

  1. A unique solution \((x_1,x_2,x_3)\) if all three planes cross at a single point – the most likely scenario.
  2. No solutions if none of the planes cross at a common point – think about all the ways this could happen, there are more ways than in two dimensions!
  3. Infinitely many solutions if all three planes intersect along a line or are all the same plane – again, think about the ways this could happen!

You should also think about the meaning of overdetermined and underdetermined systems in 3 dimensions and how they look geometrically. We can of course extend these ideas to as many variables and equations as we wish: a linear equation in \(n\) variables represents a hyperplane in \(n\)-dimensional space, a flat subset of dimension \(n-1\), but it is impossible for us to directly visualise things in dimensions greater than 3!

7.2 Solving linear systems - Gaussian elimination

There are a number of different approaches to solving linear equations. They all ultimately amount to the same thing, but some methods are more straightforward or computationally efficient depending on the situation. In example ?? we used rearrangement and substitution to solve \[\begin{align*} 3x_1+2x_2&=5\\ x_1-4x_2&=1. \end{align*}\]

Now we will instead add and subtract multiples of the equations themselves to reduce the system to its solution.

Example 7.1 Solve \[\begin{align*} 3x_1+2x_2&=5\\ x_1-4x_2&=1. \end{align*}\]

First, let’s swap the order of the equations so that the coefficient of \(x_1\) in the first equation is \(1\) \[\begin{align*} x_1-4x_2&=1\\ 3x_1+2x_2&=5. \end{align*}\] Next, we subtract 3 times the first equation from the second to eliminate \(x_1\) from the second equation. The second equation becomes \[\begin{align*} 3x_1+2x_2-3(x_1-4x_2)&=5-3\times 1\\ 14x_2&=2 \end{align*}\] so we have the equivalent system \[\begin{align*} x_1-4x_2&=1\\ 14x_2&=2. \end{align*}\] Now divide the second equation by 14 \[\begin{align*} x_1-4x_2&=1\\ x_2&=\frac{1}{7}. \end{align*}\] Add 4 times the second equation to the first to eliminate \(x_2\) from the first equation \[\begin{align*} x_1-4x_2+4x_2&=1+\frac{4}{7}\\ x_2&=\frac{1}{7} \end{align*}\] and hence we have the solution \[\begin{align*} x_1&=\frac{11}{7}\\ x_2&=\frac{1}{7}. \end{align*}\]
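As a quick check on this hand calculation, we can substitute the claimed solution back into the original equations. The short script below is only an illustration (using Python’s fractions module, which is not part of the text) and confirms that \((x_1,x_2)=\left(\tfrac{11}{7},\tfrac{1}{7}\right)\) satisfies both equations exactly:

```python
from fractions import Fraction

# The solution found in Example 7.1, kept as exact fractions
x1, x2 = Fraction(11, 7), Fraction(1, 7)

print(3 * x1 + 2 * x2)   # prints 5: the first equation is satisfied
print(x1 - 4 * x2)       # prints 1: the second equation is satisfied
```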

Note that in the above example, it is the coefficients \(a_{ij}\) and \(b_i\) that we are manipulating – the \(x_j\)’s just came along for the ride. So, let’s simplify things by just writing down the coefficients in the system of equations in a table as follows: \[\left(\begin{array}{rr} 3&2\\ 1&-4 \end{array} \right). \] This is called the matrix of coefficients. We also need to keep track of the right-hand side of the equation, so we add this as another column separated by a vertical line: \[\left(\begin{array}{rr|r} 3&2&5\\ 1&-4&1 \end{array} \right). \] We call this the augmented matrix. Now in solving the system of equations, we simply add/subtract multiples of rows of this augmented matrix. Labelling each row as \(R_1\) and \(R_2\) we perform the following operations:

  1. swap rows 1 and 2: \(R_1\leftrightarrow R_2\) \[\left(\begin{array}{rr|r} 1&-4&1\\ 3&2&5 \end{array} \right). \]

  2. subtract 3 times row 1 from row 2: \(R_2 \to R_2-3R_1\) \[\left(\begin{array}{rr|r} 1&-4&1\\ 0&14&2 \end{array} \right). \]

  3. divide row 2 by 14: \(R_2\to \frac{1}{14}R_2\) \[\left(\begin{array}{rr|r} 1&-4&1\\ 0&1&\frac{1}{7} \end{array} \right). \]

  4. add 4 times the second row to the first: \(R_1 \to R_1+4 R_2\) \[\left(\begin{array}{rr|r} 1&0&\frac{11}{7}\\ 0&1&\frac{1}{7} \end{array} \right) \]

This represents \[\begin{align*} 1\times x_1 + 0 \times x_2 &=x_1= \frac{11}{7}\\ 0\times x_1 + 1 \times x_2 &=x_2= \frac{1}{7} \end{align*}\] that is, we can just read off the solutions in the right-most column. This method is known as Gaussian elimination and provides a systematic way to solve simultaneous linear equations. Note that each step in the above process is reversible. For instance, the operation \[R_1 \to R_1 + 4R_2\] can be reversed by performing \[R_1 \to R_1 - 4R_2.\] This means that the final set of equations produced by Gaussian elimination is equivalent to the original set of equations and therefore has the same set of solutions.
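The same sequence of row operations is easy to replay on a computer. The following sketch (written in Python with NumPy purely as an illustration; nothing in the text prescribes this) applies steps 1 to 4 above to the augmented matrix:

```python
import numpy as np

# Augmented matrix for the system 3x1 + 2x2 = 5, x1 - 4x2 = 1
M = np.array([[3.0,  2.0, 5.0],
              [1.0, -4.0, 1.0]])

M[[0, 1]] = M[[1, 0]]     # step 1: R1 <-> R2
M[1] = M[1] - 3 * M[0]    # step 2: R2 -> R2 - 3 R1
M[1] = M[1] / 14          # step 3: R2 -> (1/14) R2
M[0] = M[0] + 4 * M[1]    # step 4: R1 -> R1 + 4 R2

print(M)   # right-hand column now holds the solution (11/7, 1/7)
```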

No matter the number of variables or equations, the aim is to reduce the coefficient matrix to a diagonal of \(1\)’s (or as close as possible, we’ll clarify this shortly) and \(0\)’s everywhere else, like so \[\left(\begin{array}{rrrr|r} 1&0&\dotsb&0&\omega_1\\ 0&1&\dotsb&0&\omega_2\\ \vdots&&\ddots&\vdots&\vdots\\ 0&0&\dotsb&1&\omega_n \end{array} \right), \] then read off the solutions as \(x_1=\omega_1, x_2=\omega_2,\dotsc, x_n=\omega_n\) from the right hand column.

The following elementary row operations (EROs) comprise all operations required for Gaussian elimination:

  1. \(R_i \to R_i + \alpha R_k\), where \(k \neq i\) and \(\alpha \neq 0\) is some scalar;
  2. \(R_i \to \alpha R_i\), where \(\alpha \neq 0\) is some scalar;
  3. \(R_i \leftrightarrow R_k\), where \(k\neq i\), denotes the swapping of the \(i\)-th and \(k\)-th rows.

We now present a sketch of the full Gaussian elimination procedure using these EROs; a short code version of the procedure follows the list below. First, note that we call the first non-zero entry (reading from left to right) in any row of a matrix the leading entry. A leading entry will always be used as a pivot, where a pivot is an element used to zero out the other entries in the same column as the pivot.

  1. select the leading entry in the top row as a pivot;
  2. perform EROs to eliminate the entries below the pivot;
  3. move down to the next row and continue the process;
  4. having reached the bottom row, work back up again, that is:
  5. select the bottom right entry in the matrix of coefficients as the pivot;
  6. perform EROs to eliminate the entries above the pivot;
  7. move to the row above and continue the process;
  8. having reached the top row, stop and read off the solutions.
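As promised, here is the procedure written as a program. This is only a minimal sketch (in Python with NumPy, assuming the system has a unique solution so that a non-zero pivot can always be found in each column; the function name gauss_jordan is our own choice):

```python
import numpy as np

def gauss_jordan(aug):
    """Reduce an augmented matrix [A | b] to the form described above.

    Minimal sketch: it assumes the system has a unique solution, so a
    non-zero pivot can always be found in each column.
    """
    M = aug.astype(float)
    n = M.shape[0]                                # number of equations

    # Forward pass: work down the rows, clearing entries below each pivot.
    for i in range(n):
        if M[i, i] == 0:                          # if necessary, swap in a lower
            j = i + np.nonzero(M[i:, i])[0][0]    # row with a non-zero entry (ERO 3)
            M[[i, j]] = M[[j, i]]
        M[i] = M[i] / M[i, i]                     # make the pivot equal to 1 (ERO 2)
        for k in range(i + 1, n):
            M[k] = M[k] - M[k, i] * M[i]          # clear below the pivot (ERO 1)

    # Backward pass: work back up, clearing entries above each pivot.
    for i in range(n - 1, 0, -1):
        for k in range(i - 1, -1, -1):
            M[k] = M[k] - M[k, i] * M[i]

    return M

# The system from Example 7.1
aug = np.array([[3.0,  2.0, 5.0],
                [1.0, -4.0, 1.0]])
print(gauss_jordan(aug))   # right-hand column holds the solution (11/7, 1/7)
```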

Let’s have a look at another example applying these steps, this time in 3-dimensions.

Example 7.2 Solve the system of equations \[\begin{align*} x+y+z&=3,\\ 2x+3y+z&=10, \\ 3x-y+2z&=-2. \end{align*}\]

We start by writing down the augmented matrix:

\[ \left(\begin{array}{rrr|r} 1 & 1 & 1 & 3 \\ 2 & 3 & 1 & 10 \\ 3 & -1 & 2 & -2 \end{array}\right). \] We select the top-left hand entry of the augmented matrix as a pivot. We eliminate the entries in the same column that lie below the pivot, that is \[\require{enclose} \begin{array}{l} \phantom{\enclose{circle}{1}}\quad \\ R_2 \to R_2 - 2R_1\colon \quad\\ R_3 \to R_3 - 3R_1\colon \quad \end{array} \left(\begin{array}{crr|r} \enclose{circle}{1} & 1 & 1 & 3 \\ 0 & 1 & -1 & 4 \\ 0 & -4 & -1 & -11 \end{array}\right). \] Then we select the first non-zero entry in the second row as our pivot and eliminate the entry below it, to obtain \[ \begin{array}{l} \quad \\ \phantom{\enclose{circle}{1}} \quad\\ R_3 \to R_3 + 4R_2\colon \quad \end{array} \left(\begin{array}{rcr|r} 1 & 1 & 1 & 3 \\ 0 & \enclose{circle}{1} & -1 & 4 \\ 0 & 0 & -5 & 5 \end{array}\right). \] Next perform \[ \begin{array}{l} \quad \\ \quad\\ R_3 \to -\frac{1}{5}R_3\colon \quad \end{array} \left( \begin{array}{rrr|r} 1 & 1 & 1 & 3 \\ 0 & 1 & -1 & 4 \\ 0 & 0 & 1 & -1 \end{array}\right). \] We continue, similarly to the steps above, by eliminating the entries above the pivot in the bottom row. \[ \begin{array}{l} R_1 \to R_1 - R_3\colon \quad \\ R_2 \to R_2 + R_3\colon \quad \\ \phantom{\enclose{circle}{1}}\quad \end{array} \left(\begin{array}{rrc|r} 1 & 1 & 0 & 4 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & \enclose{circle}{1} &-1 \end{array}\right). \] Finally, we eliminate the second column using the \(1\) in the second row as a pivot, that is \[ \begin{array}{l} R_1 \to R_1 - R_2\colon \quad \\ \phantom{\enclose{circle}{1}}\quad \\ \quad \end{array} \left(\begin{array}{rcr|r} 1 & 0 & 0 & 1 \\ 0 & \enclose{circle}{1} & 0 & 3 \\ 0 & 0 & 1 & -1 \end{array}\right). \] We read off the solution \((x,y,z) = (1,3,-1)\).
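Again, the result is easy to verify numerically. The following one-off check (a NumPy sketch, not part of the worked example) solves the same system directly:

```python
import numpy as np

# The system from Example 7.2
A = np.array([[1.0,  1.0, 1.0],
              [2.0,  3.0, 1.0],
              [3.0, -1.0, 2.0]])
b = np.array([3.0, 10.0, -2.0])

print(np.linalg.solve(A, b))   # approximately [ 1.  3. -1.], i.e. (x, y, z) = (1, 3, -1)
```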

7.3 Echelon Form

Recall that in general there may be one, none or infinitely many solutions to a system of linear equations. We now see what happens in Gaussian elimination in these scenarios. When following the procedure of Gaussian elimination we reduce the coefficient matrix to make the leading entry in each row equal to 1 and then clear all the values (make them zero) above and below this entry. This results in a coefficient matrix in what is known as reduced echelon form. This allows us to easily read off the solutions. Here is a formal definition of the reduced echelon form.

Definition 7.1 (Reduced Echelon Form) A matrix is said to be in reduced echelon form (REF) if:

  1. all the rows consisting entirely of zeros lie below all non-zero rows (note, there need not be any zero or non-zero rows);
  2. each leading entry in a row lies at least one place to the right of all leading entries in any previous rows;
  3. every leading entry is \(1\);
  4. each leading entry is the only non-zero entry in its column.

The following are some examples of possible reduced echelon forms for \(3\times 3\) coefficient matrices.

\[ \left(\begin{array}{rrr|r} 1 & 0 & 0 & \omega_1 \\ 0 & 1 & 0 & \omega_2 \\ 0 & 0 & 1 & \omega_3 \end{array}\right),\qquad \left(\begin{array}{rrr|r} 1 & 0 & 2 & \omega_1 \\ 0 & 1 & 0 & \omega_2 \\ 0 & 0 & 0 & \omega_3 \end{array}\right),\qquad \left(\begin{array}{rrr|r} 1 & 0 & 0 & \omega_1 \\ 0 & 0 & 1 & \omega_2 \\ 0 & 0 & 0 & \omega_3 \end{array}\right). \]

The previous examples were all square coefficient matrices (the same number of equations as variables). It is also possible that we have rectangular coefficient matrices. Here are some possible reduced echelon forms.

\[ \left(\begin{array}{rrr|r} 1 & 0 & 0 & \omega_1 \\ 0 & 1 & 0 & \omega_2 \\ 0 & 0 & 1 & \omega_3 \\ 0 & 0 & 0 & \omega_4 \end{array}\right),\qquad \left(\begin{array}{rrrr|r} 1 & 0 & 2 & -4&\omega_1 \\ 0 & 1 & 0 & 1 & \omega_2 \\ 0 & 0 & 0 & 0 & \omega_3 \end{array}\right),\qquad \left(\begin{array}{rrrr|r} 1 & 3 & 0 & 0 & \omega_1 \\ 0 & 0 & 1 & 0 & \omega_2 \\ 0 & 0 & 0 & 1 & \omega_3 \end{array}\right). \]

Note that the reduced echelon form results in a “staircase” of \(1\)’s appearing either on or above the diagonal, with all entries below the staircase being zero. Each step down the staircase drops by exactly one row, but a step to the right may span several columns.
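A computer algebra system can compute the reduced echelon form for us. As an illustration (using SymPy’s rref method, which the text does not mention; SymPy calls this the reduced row echelon form), the augmented matrix from Example 7.2 reduces to the first of the square forms shown above:

```python
from sympy import Matrix

# Augmented matrix from Example 7.2
M = Matrix([[1, 1, 1, 3],
            [2, 3, 1, 10],
            [3, -1, 2, -2]])

R, pivot_columns = M.rref()   # reduced form and the columns containing leading 1's
print(R)                      # Matrix([[1, 0, 0, 1], [0, 1, 0, 3], [0, 0, 1, -1]])
print(pivot_columns)          # (0, 1, 2)
```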

So far, we have only seen examples like the first matrix above, where we have a diagonal of \(1\)’s. Since in this case we simply read off the solutions, it is easy to see that such forms result in a unique solution. We now give some examples and interpretations for other echelon forms. In the following \(\alpha_i\) and \(\omega_i\) can be any value.

Example 7.3 (Zero coefficient row) \[ \left(\begin{array}{rrr|r} 1 & 0 & \alpha & \omega_1 \\ 0 & 1 & 0 & \omega_2 \\ 0 & 0 & 0 & \omega_3 \end{array}\right). \] If \(\omega_3=0\) the equations read \[\begin{align*} x_1+\alpha x_3&=\omega_1\\ x_2&=\omega_2\\ 0&=0 \end{align*}\] In effect, we only have two equations. There is no constraint on \(x_3\) and we call this a free variable. We can parameterise the solution by setting \(x_3=\beta\) where \(\beta\) can take any value – there are infinitely many solutions. So the solution is all points \[(\omega_1-\alpha\beta,\omega_2,\beta).\] Geometrically, this is a line in 3-dimensional Cartesian coordinates, traced out by varying \(\beta\).

On the other hand, if \(\omega_3\neq 0\), then in the bottom row we have the equation \[0=\omega_3\] which is not possible. In this case, there are no solutions to the system of equations. We sometimes say that the set of equations is inconsistent.
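Both behaviours are easy to reproduce with concrete numbers. In the sketch below (a SymPy illustration; the coefficients are invented for this example), the first system has a redundant third equation and hence a free variable, while the second has the same left-hand sides but an incompatible right-hand side and is inconsistent:

```python
from sympy import Matrix

# Infinitely many solutions: the third row is the sum of the first two.
consistent = Matrix([[1, 0, 2, 4],
                     [0, 1, 0, 3],
                     [1, 1, 2, 7]])
print(consistent.rref())
# (Matrix([[1, 0, 2, 4], [0, 1, 0, 3], [0, 0, 0, 0]]), (0, 1))
# x3 is free, so the solutions are all points (4 - 2*beta, 3, beta).

# No solutions: only the last right-hand side value has changed.
inconsistent = Matrix([[1, 0, 2, 4],
                       [0, 1, 0, 3],
                       [1, 1, 2, 8]])
print(inconsistent.rref())
# (Matrix([[1, 0, 2, 0], [0, 1, 0, 0], [0, 0, 0, 1]]), (0, 1, 3))
# The bottom row reads 0 = 1, so the equations are inconsistent.
```

Note that rref treats the augmented column like any other, so in the inconsistent case it also scales the impossible bottom row to \(0=1\) and clears the column above it.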

Example 7.4 (A rectangular reduced echelon form) \[ \left(\begin{array}{rrrrr|r} 1 & 0 & \alpha_1 & 0 & \alpha_2 & \omega_1 \\ 0 & 1 & \alpha_3 & 0 & \alpha_4 & \omega_2 \\ 0 & 0 & 0 & 1 & \alpha_5 & \omega_3 \end{array}\right). \]

In this case, we have the solutions \[\begin{align*} x_1 & = \omega_1 - \alpha_1x_3 - \alpha_2x_5\\ x_2 & = \omega_2 - \alpha_3x_3 -\alpha_4 x_5\\ x_4 & = \omega_3 - \alpha_5x_5 \end{align*}\] where \(x_3\) and \(x_5\) are free variables, and hence we can parameterise them as \(x_3=\beta\), \(x_5=\gamma\). More generally, any column that does not have a leading entry will correspond to a free variable. Since we have two parameters, this describes a plane in 5-dimensional space. The solution is all points \[(\omega_1 - \alpha_1\beta - \alpha_2\gamma, \omega_2 - \alpha_3\beta -\alpha_4\gamma, \beta, \omega_3 - \alpha_5\gamma, \gamma).\]
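As a quick check, substituting this point into the equation represented by the first row of the matrix gives \[(\omega_1 - \alpha_1\beta - \alpha_2\gamma) + \alpha_1\beta + \alpha_2\gamma = \omega_1,\] and the second and third rows are satisfied in exactly the same way, whatever values \(\beta\) and \(\gamma\) take.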


  1. The method is named after Carl Friedrich Gauss (1777–1855), born in Braunschweig in the Holy Roman Empire, in what is now Germany. Gauss worked at the University of Göttingen.↩︎