- Keep in mind that each node/processor will need to know the "global" indexes of its rows after the scatter. E.g. if we scatter the 4x5 matrix

      1  2  3  4  5
      6  7  8  9 10
     11 12 13 14 15
     16 17 18 19 20

  to two nodes, they will get

      rank 0:  1  2  3  4  5  6  7  8  9 10
      rank 1: 11 12 13 14 15 16 17 18 19 20

  To do this with a custom MPI_Datatype, use the "rows" and "rowstype" approach from filewrite2.c + matblockscatter.c. Then, once each node calculates its own displacement and sendcounts, it knows its global indexes: in the example above, rank 0 knows its offset is 0 and that it gets 2 rows, so in the global context it has rows 0 and 1; rank 1 knows its offset is A.rows/2 == 2 and that it gets 2 rows, so its global indexes are 2+0 and 2+1.
- When doing the outermost (k) loop, it is a good idea to MPI_Barrier() at the top, though this is technically unneeded in some cases.
- Then, if k is in the range of 'your' global indexes, you are responsible for broadcasting the vector l. E.g. rank 1 will check whether k == 2 or k == 3; if so, it has to create l and Bcast it.
- You can use MPI_Type_contiguous to create the derived datatype, since we store the matrices in a row-major way.
- Recall that GJ can be used to find inv(A) by solving Ax = (1,0,0,...,0) to get the first column of inv(A), then Ax = (0,1,0,...,0) to get the second, ..., and Ax = (0,0,...,1) to get the last column. With this in mind, you could write your inverse function to do exactly that with a loop and then put all the results together into one matrix, but the "real" way is to solve them all simultaneously, which will of course require some refactoring of the given GJ pseudocode, because B is now a whole matrix instead of a vector.
- Good idea to test your code! How? Run the function to get b_soln = GJ(A, b), then call your matrix-vector multiply to compute A*b_soln and make sure it equals the b that was passed to GJ. Really easy test case: GJ(Identity, b) should return b.