Skip navigation
NASA Logo, National Aeronautics and Space Administration
Currently Being Moderated

Basic Comparison of Python, Julia, R, Matlab and IDL

VERSION 20  Click to view document history
Created on: Dec 15, 2016 9:25 AM by Jules Kouatchou - Last Modified:  Feb 20, 2018 11:47 AM by Jules Kouatchou

February 20, 2018: An updated version of this analysis can be found HERE.

 

History:

  • January 9, 2017 (Original ducument)
  • February 28, 2017 (Added Problem 5 and Problem 6)
  • June 22, 2017 (At the recommendation of Simon Byrne, edited the Julia script for Problem 6)
  • July 26, 2017 (Simon Danish proposed different optimization options for solving Problem 3 with Julia)

 

 

Comparing programming languages such as Python, Julia, R, etc. is not an easy task. Many researchers and practinioners have attempted to determine how fast a particular language performs against others when solving a specific problem (or a set of problems). Raschka presents Matlab, Numpy, R and Julia while they performed matrix calculations (Raschka, 2014). Hirsch does a benchmarking analysis of Matlab, Numpy, Numba CUDA, Julia and IDL (Hirsch, 2016). From his experiments, he states which language has the best speed in doing matrix multiplication and iteration. Rogozhnikov uses the calculation of the log-likelihood of normal distribution to compare Numpy, Cython, Parakeet, Fortran, C++, etc. He draws conclusions on which ones of them are faster to solve the problem (Rogozhnikov, 2015). Puget determines how several languages scire in carrying out the LU factorization (Puget, 2016).

 

All these analyses are important to assess how fast a language performs. However, focusing only on the speed may not give us a good picture on the capability of each language. It turns out if we compare how fast languages execute a given computation over the years, we might reach different conclusions as some of them evolve over time (to be more efficiency in solving a set of problems). To determine the usefulness of a language, we want to take into consideration its accessibility (open source or commercial), its readability, its support base, how it can interface with other languages, its strengths/weaknesses, the availabilty of a vast collection of libraries.

 

As we deal with legacy scientific applications (written in Fortran or C for instance), our primary intent is not to find a new language that can be used to rewrite existing codes. We rather want to identify and leverage "new" languages to facilitate and speed up pre/post-processing, initialization and visualization procedures. As far as possible, we may want to interface our legacy codes to "new" languages. We also intend to use new language to prototype some applications before they are written in languages like Fortran and C.

 

In this work, we are intested in how each package handles loops and vectorization, reads a large collection of netCDF files and does multiprocessing. The goal is not to highlight which software is faster than the other but to provide  basic information on the strengths and weaknesses of individual packages when dealing with specific applications.

 

 

All the experiments presented here were done on Intel Xeon Haswell processor node. Each node has 28 cores (2.6 GHz each) and 128 Gb of available memory. We consider the following versions of the languages:


LanguageVersion
Open Source?
Python2.7.1Yes
Julia0.6.0Yes
R3.2.2Yes
IDL8.5No
MatlabR2016aNo
GNU Compilers 6.1.0Yes
Intel Compilers17.0.0.098No
Scala2.12.1Yes

 

Remark: We assume that Python refers to Numpy too.

 

 

We consider here six problems.

 

 

Problem 1

Consider an arbitrary nxnx3 matrix A. We want to perform the following operations on A:

 

             A(i,j,1) = A(i,j,2)

             A(i,j,3) = A(i,j,1)

             A(i,j,2) = A(i,j,3)

 

For instance, in Python the code looks like:

 

for i in range(n):

     for j in range(n):

          A[i,j,0] = A[i,j,1]

          A[i,j,2] = A[i,j,0]

          A[i,j,1] = A[i,j,2]

 

The above code segment uses loops. We are also interested on how the same operations are done using vectorization:

 

          A[:,:,0] = A[:,:,1]

          A[:,:,2] = A[:,:,0]

          A[:,:,1] = A[:,:,2]

 

The problem allows us to see how each language handles loops and vectorization. We record the elapsed time needed to do the array assignments. The results are summarized on the tables below.

 

 

LanguageArray/Matrix Storage
Optionn=5000n=7000n=9000
PythonRow-major
19.1237.4961.97
Python + Numba

0.250.220.30
JuliaColumn-major
0.100.220.34
RColumn-major
233.78451.77744,93
IDLColumn-major
7.7515.2114.77
MatlabColumn-major
2.204.116.80
FortranColumn-majorgfortran0.230.330.76


gfortran -O30.0680.1360.22


ifort0.070.180.29


ifort -O30.0680.1360.22
CRow-majorgcc0.170.340.55


gcc -Ofast0.090.180.37


icc0.090.180.30


icc -Ofast0.090.180.42
ScalaRow-major
0.220.440.707

Table 1.1: Elapsed times obtained by copying a matrix using loops.

 

 

Language
Optionn=5000n=7000n=9000
Python
0.500.971.61
Python + Numba
0.360.540.77
Julia
0.460.731.09
R
3.556.9511.46
IDL
0.561.602.93
Matlab
0.280.530.90
Fortrangfortran0.140.280.50

gfortran -O30.0960.180.30

ifort0.160.300.50

ifort -O30.120.230.38

Table 1.2: Elapsed times obtained by copying a matrix using vectorization.

 

 

Apart from Julia, vectorization is the fastest method for accessing arrays/matrices.

 

Problem 2

We multiply two randomly generated nxn matrices A and B:

 

C=AxB

 

This problem shows the importance of taking advantage of built-in libraries available in each language.

 

The elapsed times presented here only measure the times spent on the multiplication (as the size of the matrix varies).

 

 

LanguageOptionn=1500n=1750n=2000
Pythonintrinsic0.490.800.95
Python + Numba (loops)
3.66.2813.4
Juliaintrinsic0.540.630.73
Rintrinsic12.0919.1828.63
IDLintrinsic0.220.280.36
Matlabintrinsic0.771.020.99
Fortrangfortran (loop)24.0142.5783.66

gfortran -O3 (loop)3.325.3112.13

gfortran (matmul)1.582.524.34

gfortran -O3 (matmul)1.282.053.68

ifort (loop)1.552.014.48

ifort -O3 (loop)0.510.811.24

ifort (matmul)1.562.474.48

ifort -O3 (matmul)0.520.821.25

ifort (DGEMM)0.190.230.33
Cgcc (loop)13.3321.1831.77

gcc -Ofast (loop)1.342.354.30

icc  (loop)1.252.193.99

icc -Ofast (loop)1.231.722.62
ScalaSimple loop10.7618.2232.30

with la4j4.0566.3549.592

with JAMA3.516.2119.377

Table 2.1: Elapsed times (in seconds) obtained by multiplying two randomly generated matrices.

 

 

The above table suggests that built-in functions are more appropriate to perform matrix multiplication. DGEMM is far more efficient. It is important to note that DGEMM is more suitable for large size matrices. If for instance n=100, the function matmul out performs DGEMM. An interesting discussion on the performance of DGEMM and matmul using the Intel Fortran compiler can be read at:

 

How to calculate a multiplication of two matrices efficiently?

 

 

 

 

Problem 3

We find the numerical solution of the 2D Laplace equation:

 

Uxx + Uyy = 0

 

We use the Jacobi iterative solver. We are interested in fourth-order compact finite difference scheme (Gupta, 1984):

 

Ui,j =  (4(Ui-1,j + Ui,j-1 + Ui+1,j + Ui,j+1) + Ui-1,j-1 + Ui+1,j-1 + Ui+1,j+1 + Ui-1,j+1 )/20

 

The Jacobi iterative solver stops when the difference of two consecutive approximations falls below 10^{-6}.

 

LanguageOptionn=100n=150n=200
Python
144.54715.962196.97
Python + Numba
1.235.3716.34
Julia
1.0495.25318.00

optimized_time_step1.1025.61718.91

optimized_time_step_simd0.8403.99413.075
R
935.934560.91-
IDL
95.48498.231521.97
Matlab
5.0612.5023.40
Fortrangfortran1.215.5615.64

gfortran -O30.6683.0728.897

ifort0.382.156.10

ifort -O30.5362.467.15
Cgcc0.512.477.85

gcc -Ofast0.211.043.18

gcc -fPIC -Ofast -O3 -xc -shared1.1395.700118.318

icc0.452.236.78

icc -Ofast0.321.604.87
Scala
0.692.817.63

Table 3.1: Elapsed times (in seconds) obtained by numerically solving the Poisson equation using a Jacobi iterative solver with loops.

 

 

 

LanguageOption n=100n=150n=200
Python
2.5211.6647.13
Python + Numba
3.5513.0535.59
Julia
2.118.9927.96
R
21.16112.27389.16
IDL
2.3812.1339.67
Matlab
3.578.0116.17
Fortrangfortran0.8724.03211.53

gfortran -O30.3561.825.11

ifort0.2881.5684.568

ifort -O30.2881.5604.284

Table 3.2: Elapsed times (in seconds) obtained by numerically solving the Poisson equation using a Jacobi iterative solver with vectorization.

 

 

 

 

Problem 4

We have a set of daily NetCDF files (7305) covering a period of 20 years (1990-2009). The files for a given month are in a sub-directory labeled YYYYMM (for instance 199001, 199008, 199011). We want to write a script that  opens each file, reads a three-dimensional variable (longitude/latitude/level), manipulates it and does a contour plot after all the files are read. A pseudo code for the script reads:

 

Loop over the years

         Loop over the months

                  Obtain the list of NetCDF files

                  Loop over the files

                           Read the variable (longitude/latitude/level)

                           Compute the zonal mean average (new array of latitude/level)

                           Extract the column array at latitude 86 degree South

                           Append the column array to a "master" array (or matrix)


create a contour plot using the "master" array

(the x-axis should be the days (1 to 7035)to be converted into years)

(the y-axis should be the vertical pressure levels in log scale)

 

A sample plot obtained with Python is shown in the figure below:

 

 

fig_TimeSeries_AgeOfAir.png

This is the kind of problems that a typical user we support faces: a collection of thousands of files that needs to be manipulated to extract the desired information. Having tools that allow us to quickly read data from files (in formats such as NetCDF, HDF4, HDF5, grib) is critical for the work we do.

 

We report in Table 4.1 the elapsed times it took to solve Problem 4 with the various languages.

 

LanguageElapsed time (s)
Python1399
Julia1317 (no plot)
IDL1689
R2220
Matlab1678

Table 4.1: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files on a single processor. We were not able to produce the plot with Julia because we could not build the plotting tool.

 

All the above runs were conducted on a node that has 28 cores. Basically, only one core was used. We want to take advantage of all the available cores by spreading the reading of the files and making sure that the data of interest are gathered in the proper order. We use the multi-processing capabilities of the various languages to slightly modify the scripts. For each month, the daily files are read in by different threads (cores).The results are shown in Table 4.2. We were able to fully complete the task with Python, R and Julia only. The Julia script is fragile and we could run with 8 threads. We obtained unexpected error messages Matlab and could not resolve the issues (we will continue to look into it). We did not try to do the task in IDL because we could not find a simple IDL multi-processing documentation that could help us.

 

LanguageElapsed time (s)
Python273
Julia520 (no plot)
IDL
R420
Matlab

Table 4.2: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files using multiple threading.


We observe that the use of multiple threads significantly reduces the processing time without requiring more resources (all the calculations were done within a node). The multi-thread processing scripts were written by making minor modifications of the serial ones. In fact, the multi-thread scripts ended up being more modular (use of functions) and more readable.

 

Problem 5

We implement the Belief Propagation calculations that can be seen as a repeated sequence of matrix multiplications, followed by normalization. The Matlab, C and Julia codes are shown in the Justin Domke's weblog (Domke 2012). We report the computing times for various values of the number of iterations (N) when the matrix dimension is 5000x5000.

 

LanguageOptionN=250N=500N=1000
Python
5.019.518.8
Julia
4.45108.142415.137
R
42.30282.963164.33
IDL
15.47230.08559.318
Matlab
2.39304.57208.4749
Fortrangfortran5.18439.596518.457

gfortran -O35.22439.712618.705

ifort4.69229.004520.881

ifort -O34.69629.024517.709
Cgcc3.83007.570015.160

gcc -Ofast3.54007.100014.240

icc1.78003.44006.9000

icc -Ofast1.73003.46006.9300
Scala



Table 5.1: Elapsed times (in seconds) obtained by doing the Belief Propagation computations.

 

 

Problem 6

We perform calculations for the implementation of a Metropolis-Hastings algorithm using a two dimeensional distribution (Domke 2012). Results are shown when the number of iterations (N) varies.

 

 

LanguageOptionN=5000N=10000N=15000
Python
0.020.050.08
Julia
0.0002280.0004410.000676
R
0.0880.1690.243
IDL
0.009600.018740.0393
Matlab
0.026190.062440.0870
Fortrangfortran0.000000.000000.00000

gfortran -O30.000000.000000.00000

ifort0.000000.0040.004

ifort -O30.0040.000000.008
Cgcc0.000000.000000.00000

gcc -Ofast0.000000.000000.00000

icc0.000000.000000.00000

icc -Ofast0.000000.000000.00000
Scala
0.0110.0130.017

Table 6.1: Elapsed times (in seconds) obtained by doing the Metropolis algorithm computations.

 

Remarks:

  1. All the experiments were done on a Linux cluster (with thousands of nodes) shared by hundreds of users.
  2. We did not attempt to optimize any of the scripts we wrote. It is possible that developers of each languages may come with faster approaches to solve each of the problems presented here.
  3. We also did the tests with Python 3.5 and we obtained the same results as in Python 2.7.
  4. Using IDL and Matlab was difficult because at several occasions, there was not enough available licence.
  5. When we install an open-source software, our preference is to do it from source because we have more control over the installation process (we can freely select any configuration we need). In addition, we want to be able to create a self-contained module (for instance Python together with Numpy, SciPy, Matplotlib, NetCDF4, etc.) and make it available to users. We are not sure that we can achieve it with Julia that seems to assume that each user is expected to add/build on his/her own packages on top of Julia.

 

References

  1. Justin Domke, Julia, Matlab and C, September 17, 2012.
  2. Michael Hirsch, Speed of Matlab vs. Python Numpy Numba CUDA vs Julia vs IDL, June 2016.
  3. Murli M. Gupta, A fourth Order poisson solver, Journal of Computational Physics, 55(1):166-172, 1984.
  4. Jean Francois Puget, A Speed Comparison Of C, Julia, Python, Numba, and Cython on LU Factorization, January 2016.
  5. Alex Rogozhnikov, Log-likelihood benchmark, September 2015.
  6. Sebastian Raschka, Numeric matrix manipulation - The cheat sheet for MATLAB, Python Nympy, R and Julia, June 2014.
  7. Yousef Saad, Iterative Methods for Sparse Linear Systems (2 ed.), SIAM, ISBN 0898715342, 200366

 

 

Source Files

All the source files for the problems presented here are in the attached file: sourceFiles.tar.gz

 

If you have a comment/suggestion/question, contact Jules Kouatchou (Jules.Kouatchou@nasa.gov)

Attachments:
Comments (3)
USAGov logo NASA Logo - nasa.gov