- Keep in mind that each node/processor will need to know the "global" indexes of its rows after the scatter. E.g. if we scatter the 4x5 matrix

      1  2  3  4  5
      6  7  8  9 10
     11 12 13 14 15
     16 17 18 19 20

  to two nodes, they will get

      rank 0:  1  2  3  4  5  6  7  8  9 10
      rank 1: 11 12 13 14 15 16 17 18 19 20

  To do this with a custom MPI_Datatype, use the "rows" and "rowstype" approach from filewrite2.c + matblockscatter.c. Then, once each node calculates its own displacement and sendcounts, it knows its global indexes: in the example above, rank 0 knows its offset is 0 and that it gets 2 rows, so in the global context it has rows 0 and 1; rank 1 knows its offset is A.rows/2 == 2 and that it gets 2 rows, so its global indexes are 2+0 and 2+1.
- When doing the outermost (k) loop, it is a good idea to MPI_Barrier() at the top, though this is technically unneeded in some cases.
- Then, if k is in the range of 'your' global indexes, you are responsible for broadcasting the vector l. E.g. rank 1 will check whether k == 2 or k == 3; if so, it has to create l and Bcast it.
- You can use MPI_Type_contiguous to create the derived datatype, since we store the matrices in a row-major way.
- Recall that GJ can be used to find inv(A) by solving Ax = (1,0,0,...,0) to get the first column of inv(A), then Ax = (0,1,0,...,0) to get the second, ..., and Ax = (0,0,...,1) to get the last column. With this in mind, you could write your inverse function to do exactly that with a loop and then put all the results together into one matrix, but the "real" way is to solve them all simultaneously, which will of course require some refactoring of the given GJ pseudocode, because B is now a whole matrix instead of a vector.
- Good idea to test your code! How? Run the function to get b_soln = GJ(A, b), then call your matrix-vector multiply to compute A*b_soln and make sure it equals the b that was passed to GJ. Really easy test case: GJ(Identity, b) should return b.