We carry out a series of basic experiments to compare Python-related packages (Python, NumPy) and compilers (GNU Fortran, Intel Fortran). We also include Matlab and Java in our study. The primary objective of this exercise is to determine how NumPy performs with respect to the other packages and compilers. All the calculations were carried out on dali.
Compilers/Packages        Version
Python                    2.5.4
NumPy                     1.3.0
Matlab                    R2008a
GNU Fortran (gfortran)    4.1.2
Intel Fortran (ifort)     11.1.038
Java                      1.6.0
Problem 1
This example shows the importance of avoiding loops (as far as possible) when manipulating arrays in NumPy. We have a 5000x5000x3 array A, and we want to perform the following operations on A:
A(i,j,1) = A(i,j,2)
A(i,j,3) = A(i,j,1)
A(i,j,2) = A(i,j,3)
For instance, in Python (where array indices are 0-based) the code looks like:
for i in range(5000):
    for j in range(5000):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]
We record the elapsed time needed to do the array assignments. The results are summarized in the table below.
Packages/Compilers    Elapsed Time (s)
Python                48.55
NumPy                 0.96
Matlab                2.398
gfortran              0.540
gfortran with -O3     0.280
ifort                 0.276
ifort with -O3        0.2600
Java                  12.5518
In Fortran, NumPy, and Matlab we did not use loops to carry out the desired operations; we instead used array indexing. Note how slow Python is and how efficient NumPy is.
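The array-indexing approach can be sketched in NumPy as follows. This is a minimal example, not the benchmark code itself: a 10x10x3 array is used here instead of 5000x5000x3 so it runs quickly, and the initial values are arbitrary.

```python
import numpy as np

# Small stand-in for the benchmark's 5000x5000x3 array
a = np.arange(10 * 10 * 3, dtype=float).reshape(10, 10, 3)
channel1 = a[:, :, 1].copy()  # keep the original second channel for checking

# Whole-array slice assignments replace the explicit double loop
a[:, :, 0] = a[:, :, 1]
a[:, :, 2] = a[:, :, 0]
a[:, :, 1] = a[:, :, 2]

# After the three assignments, every channel holds the original channel 1
assert np.array_equal(a[:, :, 0], channel1)
assert np.array_equal(a[:, :, 2], channel1)
```

Each slice assignment operates on the whole 2D plane at once inside NumPy's compiled loops, which is why it is orders of magnitude faster than the interpreted double `for` loop.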
Problem 2
Here, we want to multiply two randomly generated n x n matrices A and B:
C = A x B
For NumPy and Matlab, we use the predefined matrix multiplication functions, whereas in Fortran we wrote the code to perform the multiplication. The elapsed times presented here only measure the time spent on the multiplication as the size of the matrices varies.
Compilers/Packages    Configuration             n=1000      n=1200      n=1500
NumPy                 intrinsic function        6.1         10.48       29.31
Matlab                intrinsic function        0.548       0.714       0.97
                      triple do-loop            116.90      201.16      385.95
gfortran              triple do-loop            7.844       13.736      28.077
                      matmul                    1.328       2.5401      8.484
                      -O3 and triple do-loop    1.7121      3.236202    9.156573
                      -O3 and matmul            1.328       2.532       8.364
ifort                 triple do-loop            0.9960630   2.112131    7.372461
                      -O3 and triple do-loop    0.9800609   2.056128    7.356460
                      matmul                    3.5642      6.224       12.6927
                      -O3 and matmul            0.3760      0.6480      1.3560
                      DGEMM(*)                  0.2360      0.4480      0.8080
Java                                            1.92805     3.6075      10.2055
(*) We did not use any compilation option and only linked with "-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread".
The table above suggests that built-in functions are more appropriate for performing matrix multiplication. DGEMM is far more efficient. It is important to note that DGEMM is more suitable for large matrices: if for instance n=100, the function matmul outperforms DGEMM. An interesting discussion on the performance of DGEMM and matmul using the Intel Fortran compiler can be read at:
NumPy is very slow at matrix multiplication. It was suggested that NumPy be built against ATLAS BLAS for better performance.
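The NumPy measurement above can be sketched as follows. This is an illustrative timing harness only, using a smaller hypothetical size (n=200 rather than the benchmark's 1000-1500) so it runs quickly; `time.time()` is used for elapsed wall-clock time, though the benchmark's exact timing method is not stated.

```python
import time
import numpy as np

n = 200  # smaller than the benchmark sizes, just to illustrate the setup
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.time()
c = np.dot(a, b)  # NumPy's intrinsic matrix multiplication
elapsed = time.time() - t0

print("n=%d: multiplication took %.4f s" % (n, elapsed))
```

Only the `np.dot` call is timed, matching the note above that the elapsed times exclude matrix generation.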
Problem 3
We find the numerical solution of the 2D Laplace equation:
U_{xx} + U_{yy} = 0
We use the Jacobi iterative solver. We are interested in two finite difference schemes:
The second-order scheme:
U_{i,j} = (U_{i-1,j} + U_{i,j-1} + U_{i+1,j} + U_{i,j+1})/4
and the fourth-order compact scheme:
U_{i,j} = (4(U_{i-1,j} + U_{i,j-1} + U_{i+1,j} + U_{i,j+1}) + U_{i-1,j-1} + U_{i+1,j-1} + U_{i+1,j+1} + U_{i-1,j+1})/20
The Jacobi iterative solver stops when the difference of two consecutive approximations falls below 10^{-6}.
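A NumPy sketch of the Jacobi iteration with the second-order scheme is given below. The boundary condition (U = 1 on one edge, 0 elsewhere) is an assumed example, since the benchmark's actual boundary values are not stated, and the stopping test assumes the maximum norm for the difference of consecutive iterates.

```python
import numpy as np

def jacobi_second_order(n, tol=1.0e-6, max_iter=100000):
    u = np.zeros((n, n))
    u[0, :] = 1.0  # assumed boundary condition (not specified in the text)
    for it in range(max_iter):
        u_new = u.copy()
        # Second-order scheme applied to all interior points at once,
        # via whole-array slicing rather than explicit i, j loops
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[1:-1, :-2]
                                    + u[2:, 1:-1] + u[1:-1, 2:])
        # stop when consecutive approximations differ by less than tol
        if np.abs(u_new - u).max() < tol:
            return u_new
        u = u_new
    return u

u = jacobi_second_order(50)
```

The fourth-order compact scheme differs only in the update line, which adds the four diagonal neighbors with the weights shown above.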
Second-Order Scheme
Compilers/Packages    n=50        n=100
Python                31.9193     509.859
NumPy                 0.41324     4.19535
Matlab                0.460252    5.483992
gfortran              0.212       2.496
gfortran with -O3     0.056       0.696
ifort                 0.036       0.368
ifort with -O3        0.036       0.368
Java                  0.0537      0.9012
Fourth-Order Compact Scheme
Compilers/Packages    n=50        n=100
Python                46.15203    751.783
NumPy                 0.610216    6.38891
Matlab                0.640044    6.531990
gfortran              0.236       3.248
gfortran with -O3     0.088       1.256
ifort                 0.052       0.656
ifort with -O3        0.052       0.672
Java                  0.12180     2.2022
Once again, we observe that pure Python is not suitable for manipulating arrays. NumPy and Matlab have comparable results, whereas the Intel Fortran compiler displays the best performance. Java used explicit loops rather than the array indexing available in NumPy, Matlab, and Fortran, yet still did better than NumPy and Matlab.
The results presented above are consistent with those obtained by other groups: