We carry out a series a basic experiments to compare Python related packages (Python, NumPy) and compilers (GNU Fortran, Intel Fortran). We also add Matlab and Java in our study. The primary objective of this exercise is to determine how NumPy performs with respect to the other packages and compilers. All the calculations were carried out in *dali*.

Compilers/Packages | Version |
---|---|

Python | 2.5.4 |

NumPy | 1.3.0 |

Matlab | matlab-R2008a |

GNU Fortran (gfortran) | 4.1.2 |

Intel Fortran (ifort) | intel-11.1.038 |

Java | 1.6.0 |

**Problem 1**

This example shows the importance of avoiding loops (as far as possible) when manipulating arrays in NumPy. We have a 5000x5000x3 matrix A and we want to perform the following operations on A:

* A(i,j,1) = A(i,j,2)*

* A(i,j,3) = A(i,j,1)*

* A(i,j,2) = A(i,j,3)*

For instance, in Python the code looks like:

**for** i in range(5000):
**for** j in range(5000):
a[i,j,0] = a[i,j,1]
a[i,j,2] = a[i,j,0]
a[i,j,1] = a[i,j,2]

We record the elapsed time needed to do the array assignments. The results are summarized on the table below.

* *

* *

Packages/Compilers | Elapsed Time (s) |
---|---|

Python | 48.55 |

NumPy | 0.96 |

Matlab | 2.398 |

gfortran | 0.540 |

gfortran with -O3 | 0.280 |

ifort | 0.276 |

ifort with -O3 | 0.2600 |

Java | 12.5518 |

In Fortran, NumPy and Matlab we did not use the loop to carry out the desired operations. We instead use array indexing. Note how slow was Python and how efficient was NumPy.

**Problem 2**

Here, we want to multiply two randomly generated *nxn* matrices A and B:

C=AxB

For NumPy and Matlab, we use the predefined matrix multiplication functions whereas in Fortran, we wrote the code to perform the multiplication. The elapsed times presented here only measure the times spent on the multiplication (as the size of the matrix varies).

Compilers/Packages | Configuration | n=1000 | n=1200 | n=1500 |
---|---|---|---|---|

NumPy | intrinsic function | 6.1 | 10.48 | 29.31 |

Matlab | intrinsic function | 0.548 | 0.714 | 0.97 |

triple do-loop | 116.90 | 201.16 | 385.95 | |

gfortran | triple do-loop | 7.844 | 13.736 | 28.077 |

matmul | 1.328 | 2.5401 | 8.484 | |

-O3 and triple do-loop | 1.7121 | 3.236202 | 9.156573 | |

-O3 and matmul | 1.328 | 2.532 | 8.364 | |

ifort | triple do-loop | 0.9960630 | 2.112131 | 7.372461 |

-O3 and triple do-loop | 0.9800609 | 2.056128 | 7.356460 | |

matmul | 3.5642 | 6.224 | 12.6927 | |

-O3 and matmul | 0.3760 | 0.6480 | 1.3560 | |

DGEMM(*) | 0.2360 | 0.4480 | 0.8080 | |

Java | 1.92805 | 3.6075 | 10.2055 |

(*) We did not use any compilation option and only link with "*-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread*".

The above table suggests that built-in functions are more appropriate to perform matrix multiplication. *DGEMM* is far more efficient. It is important to note that *DGEMM *is more suitable for large size matrices. If for instance n=100, the function *matmul* out performs *DGEMM*. An interesting discussion on the performance of DGEMM and matmul using the Intel Fortran compiler can be read at:

NumPy is very slow for matrix multiplication. It was suggested that it should be built against ATLAS BLAS for better performance.

**Problem 3**

We find the numerical solution of the 2D Laplace equation:

_{xx}+ U

_{yy}= 0

We use the Jacobi iterative solver. We are interested in two finite difference schemes:

The second-order scheme:

_{i,j}= (U

_{i-1,j}+ U

_{i,j-1}+ U

_{i+1,j}+ U

_{i,j+1})/4

and the fourth-order compact scheme:

_{i,j}= (4(U

_{i-1,j}+ U

_{i,j-1}+ U

_{i+1,j}+ U

_{i,j+1}) + U

_{i-1,j-1}+ U

_{i+1,j-1}+ U

_{i+1,j+1}+ U

_{i-1,j+1})/20

The Jacobi iterative solver stops when the difference of two consecutive approximations falls below 10^{-6}.

Second-Order Scheme

Compilers/Packages | n=50 | n=100 |
---|---|---|

Python | 31.9193 | 509.859 |

NumPy | 0.41324 | 4.19535 |

Matlab | 0.460252 | 5.483992 |

gfortran | 0.212 | 2.496 |

gfortran with -O3 | 0.056 | 0.696 |

ifort | 0.036 | 0.368 |

ifort -O3 | 0.036 | 0.368 |

Java | 0.0537 | 0.9012 |

Fourth-Order Compact Scheme

Compilers/Packages | n=50 | n=100 |
---|---|---|

Python | 46.15203 | 751.783 |

NumPy | 0.610216 | 6.38891 |

Matlab | 0.640044 | 6.531990 |

gfortran | 0.236 | 3.248 |

gfortran with -O3 | 0.088 | 1.256 |

ifort | 0.052 | 0.656 |

ifort -O3 | 0.052 | 0.672 |

Java | 0.12180 | 2.2022 |

Once again, we observe that Python is not suitable to manipulate arrays. NumPy and Matlab have comparable results whereas the Intel Fortran compiler displays the best performance. Java did not use array indexing like NumPy, Matlab and Fortran, but did better than NumPy and Matlab.

The results presented above are consistent with the ones done by other groups: