
Basic Comparison of Python, Julia, Matlab, IDL and Java (2019 Edition)

Created on: Jul 2, 2019 2:51 PM by Alexander Medema - Last Modified:  Nov 4, 2019 10:15 PM by Jules Kouatchou

Python, Julia, Java, Scala, IDL, Matlab, R, C, Fortran

_______________________________________________________________________________

Authors:

_______________________________________________________________________________

 

NOTICE: This project is now Open-Source. All the source files are available on github.com.

 

We plan to test the updated version of Julia in the future and add results with Python/Numba.

 

 

See the 2018 edition for previous source code.

Introduction

We use simple test cases to compare various high-level programming languages. We implement the test cases from the perspective of a novice programmer who is not familiar with the optimization techniques available in the languages. The goal is to highlight the strengths and weaknesses of each language, not to claim that one language is better than the others. Timing results are presented in seconds to four digits of precision, and any value less than 0.0001 is reported as 0.

 

The tests presented here are run on an Intel Xeon Haswell processor node. Each node has 28 cores (2.6 GHz each) and 128 GB of available memory. The Python, Java, and Scala tests are also run on a Mac computer with an Intel i7-7700HQ (4 cores, 2.8 GHz each) with 16 GB of available memory to compare with the Xeon node. We consider the following versions of the languages:

 

Language          Version       Free?
Python            3.7           Yes
Julia             0.6.2         Yes
Java              10.0.2        Yes
Scala             2.13.0        Yes
IDL               8.5           No
R                 3.6.1         Yes
Matlab            R2017b        No
GNU Compilers     9.1           Yes
Intel Compilers   18.0.5.274    No

 

The GNU and Intel compilers are used for C and Fortran. These languages are included to serve as a baseline, which is why their tests also come with optimized (-O3, -Ofast) versions.

 

The test cases are listed in four categories:

  • Loops and Vectorization
  • String Manipulations
  • Numerical Calculations
  • Input/Output

 

Each test is "simple" enough to be quickly written in any of the languages and is meant to address issues such as:

 

  • Access of non-contiguous memory locations
  • Use of recursive functions
  • Utilization of loops or vectorization
  • Opening of a large number of files
  • Manipulation of strings of arbitrary lengths
  • Multiplication of matrices
  • Use of iterative solvers

 

The source files are contained in the directories:

 

C\    Fortran\  IDL\  Java\  Julia\  Matlab\  Python\  R\  Scala\

 

There is also a directory


Data\

 

that contains a Python script that generates the NetCDF4 files needed for the test case on reading a large collection of files. It also has sample text files for the "Count Unique Words in a File" test case.

 

Remark:

In the results presented below, we used an older version of Julia because we had difficulties installing the latest version (1.1.1) on the Xeon Haswell nodes. In addition, the Python experiments do not include Numba because the Haswell nodes we had access to run an older version of the operating system, which prevented Numba from being installed properly.


_______________________________________________________________________________


Loops and Vectorization

  • Copying Multidimensional Arrays

 

Given an arbitrary n x n x 3 matrix A, we perform the operations:


A(i, j, 1) = A(i, j, 2)
A(i, j, 3) = A(i, j, 1)
A(i, j, 2) = A(i, j, 3)   

 

using loops and vectorization. This test case is meant to measure the speed of languages' access to non-contiguous memory locations, and to see how each language handles loops and vectorization.
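
For reference, a minimal Python/NumPy sketch of the two variants might look like the following (0-based indices replace the A(i, j, 1) notation above; the actual benchmark scripts live in the Python\ directory and may differ in detail):

    import time
    import numpy as np

    n = 5000
    A = np.random.rand(n, n, 3)

    # Loop version: visit each (i, j) location explicitly.
    t0 = time.time()
    for i in range(n):
        for j in range(n):
            A[i, j, 0] = A[i, j, 1]
            A[i, j, 2] = A[i, j, 0]
            A[i, j, 1] = A[i, j, 2]
    print("loops:        ", time.time() - t0)

    # Vectorized version: whole-slice assignments, no explicit loops.
    t0 = time.time()
    A[:, :, 0] = A[:, :, 1]
    A[:, :, 2] = A[:, :, 0]
    A[:, :, 1] = A[:, :, 2]
    print("vectorization:", time.time() - t0)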

 

Table CPA-1.0: Elapsed times to copy the matrix elements using loops on the Xeon node.

 

Language   Option         n=5000     n=7000     n=9000
Python                    16.2164    31.7867    52.5485
Julia                      0.0722     0.1445     0.2359
Java                       0.1810     0.3230     0.5390
Scala                      0.2750     0.4810     0.7320
IDL                        6.4661    11.9068    19.4499
R                         22.9510    44.9760    74.3480
Matlab                     0.2849     0.5203     0.8461
Fortran    gfortran        0.1760     0.3480     0.5720
           gfortran -O3    0.0680     0.1720     0.2240
           ifort           0.0680     0.1360     0.2240
           ifort -O3       0.0680     0.1360     0.2800
C          gcc             0.1700     0.3400     0.5600
           gcc -Ofast      0.0900     0.1800     0.3100
           icc             0.1000     0.1800     0.3000
           icc -Ofast      0.1000     0.1800     0.3000

 

 

Table CPA-1.1: Elapsed times to copy the matrix elements using loops on the i7 Mac.

 

Language          n=5000     n=7000     n=9000
Python           18.6675    36.4046    60.2338
Python (Numba)    0.3398     0.3060     0.3693
Java              0.1260     0.2420     0.4190
Scala             0.2040     0.3450     0.5150

 

 

Table CPA-2.0: Elapsed times to copy the matrix elements using vectorization on the Xeon node.

 

Language   Option         n=5000    n=7000     n=9000
Python                    0.4956    0.9739     1.6078
Julia                     0.3173    0.5575     0.9191
IDL                       0.3900    0.7641     1.2643
R                         3.5290    6.9350    11.4400
Matlab                    0.2862    0.5591     0.9188
Fortran    gfortran       0.0960    0.2520     0.3240
           gfortran -O3   0.0960    0.2440     0.3120
           ifort          0.1400    0.2280     0.3840
           ifort -O3      0.1200    0.2360     0.4560

 

 

Table CPA-2.1: Elapsed times to copy the matrix elements using vectorization on the i7 Mac.

 

Language          n=5000    n=7000    n=9000
Python            0.5602    1.0832    1.8077
Python (Numba)    0.8507    1.3650    2.0739

 

 

String Manipulations

  • Look and Say Sequence

 

The look and say sequence starts from a single integer. Each subsequent entry is obtained by reading off the previous entry: the count of each run of identical digits is written in front of the digit itself. For example, an entry of

 

1223

 

would be followed by

 

112213,

 

or "one 1, two 2's, one 3." Here, we start with the number

 

1223334444

 

and determine the look and say sequence of order n (as n varies). This test case highlights how languages manipulate strings of arbitrary length.
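
One possible Python implementation, using itertools.groupby to collect the runs of identical digits (the benchmark code may be written differently), is sketched below:

    from itertools import groupby

    def look_and_say(seed, order):
        # Repeatedly read off runs of identical digits: "1223" -> "112213".
        term = seed
        for _ in range(order):
            term = "".join(str(len(list(run))) + digit
                           for digit, run in groupby(term))
        return term

    print(look_and_say("1223334444", 40))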

 

Table LKS-1.0: Elapsed times to find the look and say sequence of order n on the Xeon node.

 

Language   Option         n=40        n=45         n=48
Python                      2.0890      44.4155      251.1905
Java                        0.0694       0.0899        0.1211
Scala                       0.0470       0.1270        0.2170
IDL                        20.2926     304.5049     1612.4277
Matlab                    423.2241    6292.7255     exceeded time limit
Fortran    gfortran         0.0080       0.0120        0.0120
           gfortran -O3     0.0080       0.0120        0.0120
           ifort            0.0040       0.0160        0.0120
           ifort -O3        0.0080       0.0040        0.0080
C          gcc              0.0600       0.1900        0.4300
           gcc -Ofast       0.0400       0.1800        0.4000
           icc              0.0600       0.1900        0.4100
           icc -Ofast       0.0500       0.1900        0.4100

 

 

Table LKS-1.1: Elapsed times to find the look and say sequence of order n on the i7 Mac.

 

Language   n=40      n=45       n=48
Python     1.7331    22.3870    126.0252
Java       0.0665     0.0912      0.1543
Scala      0.0490     0.0970      0.2040

 

 

 

  • Unique Words in a File

 

We open an arbitrary file and count the number of unique words in it with the assumption that words such as:

 

ab   Ab   aB    a&*(-b:    17;A#~!b

 

are the same (so that case, special characters, and numbers are ignored). For our tests, we use the four files:

 

world192.txt, plrabn12.txt, bible.txt, and book1.txt

 

taken from The Canterbury Corpus.
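
A short Python sketch of the set method might read as follows (the exact normalization used in the benchmark code may differ slightly):

    import re

    def count_unique_words(filename):
        # Case, digits, and special characters are ignored, so that
        # "ab", "Ab", "a&*(-b:" and "17;A#~!b" all count as the word "ab".
        unique = set()
        with open(filename, "r", errors="ignore") as f:
            for line in f:
                for word in re.sub("[^a-z ]", "", line.lower()).split():
                    unique.add(word)
        return len(unique)

    print(count_unique_words("book1.txt"))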

 

Table UQW-1.0: Elapsed times to count the unique words in the file on the Xeon node.

 

Language                     world192.txt     plrabn12.txt    bible.txt        book1.txt
                             (19626 words)    (9408 words)    (12605 words)    (12427 words)
Python (dictionary method)      0.5002           0.1090          0.8869           0.1850
Python (set method)             0.3814           0.0873          0.7548           0.1458
Julia                           0.2190           0.0354          0.3239           0.0615
Java                            0.5624           0.2299          1.0135           0.2901
Scala                           0.4600           0.2150          0.6930           0.2190
R                             104.5820           8.6440         33.8210          17.6720
Matlab                          3.0270           0.9657          6.0348           1.0390

 

 

Table UQW-1.1: Elapsed times to count the unique words in the file on the i7 Mac.

 

Language                     world192.txt     plrabn12.txt    bible.txt        book1.txt
                             (19626 words)    (9408 words)    (12605 words)    (12427 words)
Python (dictionary method)      0.3541           0.0866          0.7346           0.1448
Python (set method)             0.3685           0.0820          0.7197           0.1417
Java                            0.5129           0.2530          0.9183           0.3220
Scala                           0.5810           0.1540          0.6650           0.2330

 

 

Numerical Calculations

  • Fibonacci Sequence

 

The Fibonacci Sequence is a sequence of numbers where each successive number is the sum of the two that precede it:

 

F(n) = F(n-1) + F(n-2).

 

Its first entries are

 

F(0) = 0,  F(1) = F(2) = 1.

 

Fibonacci numbers find applications in economics, computer science, biology, combinatorics, and other fields. We measure the elapsed time to calculate the nth Fibonacci number, using both an iterative and a recursive method.
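
The two variants can be sketched in Python as follows (the recursive version deliberately avoids memoization, since the test targets naive recursion):

    def fib_iterative(n):
        a, b = 0, 1                   # F(0), F(1)
        for _ in range(n):
            a, b = b, a + b
        return a

    def fib_recursive(n):
        if n < 2:
            return n
        return fib_recursive(n - 1) + fib_recursive(n - 2)

    print(fib_iterative(45), fib_recursive(25))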

 

Table FBC-1.0: Elapsed times to find the Fibonacci number using iteration on the Xeon node.

 

Language   Option         n=25      n=35      n=45
Python                    0         0         0
Julia                     0         0         0
Java                      0         0         0
Scala                     0         0         0
IDL                       0         0         0
R                         0.0330    0.0320    0.0320
Matlab                    0.0026    0.0034    0.0038
Fortran    gfortran       0         0         0
           gfortran -O3   0         0         0
           ifort          0         0         0
           ifort -O3      0         0         0
C          gcc            0         0         0
           gcc -Ofast     0         0         0
           icc            0         0         0
           icc -Ofast     0         0         0

 

 

Table FBC-1.1: Elapsed times to find the Fibonacci number using iteration on the i7 Mac.

 

Language          n=25      n=35      n=45
Python            0         0         0
Python (Numba)    0.1100    0.1095    0.1099
Java              0         0         0
Scala             0         0         0

 

 

Table FBC-2.0: Elapsed times to find the Fibonacci number using recursion on the Xeon node.

 

Language   Option         n=25      n=35      n=45
Python                    0.0593    7.0291    847.9716
Julia                     0.0003    0.0308      3.7870
Java                      0.0011    0.0410      4.8192
Scala                     0.0010    0.0560      5.1400
IDL                       0.0238    2.5692    304.2198
R                         0.0090    0.0100      0.0100
Matlab                    0.0142    1.2631    149.9634
Fortran    gfortran       0         0.0840     10.4327
           gfortran -O3   0         0           0
           ifort          0         0           0
           ifort -O3      0         0           0
C          gcc            0         0.0400      5.0600
           gcc -Ofast     0         0.0200      2.2000
           icc            0         0.0300      3.1400
           icc -Ofast     0         0.0200      3.2800

 

 

Table FBC-2.1: Elapsed times to find the Fibonacci number using recursion on the i7 Mac.

 

Language          n=25      n=35       n=45
Python            0.0519     6.4022     800.0381
Python (Numba)    0.4172    43.7604    5951.6544
Java              0.0030     0.0442       5.0130
Scala             0.0010     0.0470       5.7720

 

 

  • Matrix Multiplication

 

Two randomly generated n x n matrices A and B are multiplied. The time to perform the multiplication is measured. This problem shows the importance of taking advantage of built-in libraries available in each language.
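
As an illustration, the intrinsic and loop approaches might look as follows in Python (the small matrix in the loop example is only there to keep the pure-Python version from running for hours):

    import numpy as np

    n = 2000
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    # Intrinsic method: numpy.matmul dispatches to an optimized BLAS routine.
    C = A @ B

    # Loop method (the "loop" rows in the tables), sketched for a much
    # smaller matrix.
    m = 200
    X, Y = np.random.rand(m, m), np.random.rand(m, m)
    Z = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            for k in range(m):
                Z[i, j] += X[i, k] * Y[k, j]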

 

Table MXM-1.0: Elapsed times to multiply the matrices on the Xeon node.

 

Language   Option                  n=1500     n=1750     n=2000
Python     intrinsic                0.1560     0.2430     0.3457
Julia      intrinsic                0.1497     0.2398     0.3507
Java       loop                    13.8610    17.8600    32.3370
Scala      loop                     9.8380    19.1450    32.1310
R          intrinsic                0.1600     0.2460     0.3620
Matlab     intrinsic                1.3672     1.3951     0.4917
IDL        intrinsic                0.1894     0.2309     0.3258
Fortran    gfortran (loop)         17.4371    31.4660    62.1079
           gfortran -O3 (loop)      3.3282     5.3003    12.1648
           gfortran (matmul)        0.3840     0.6160     0.9241
           gfortran -O3 (matmul)    0.3880     0.6160     0.9161
           ifort (loop)             1.1401     1.8161     2.9282
           ifort -O3 (loop)         1.1481     1.8081     2.9802
           ifort (matmul)           1.1441     1.8121     2.9242
           ifort -O3 (matmul)       0.5160     0.8281     1.2441
           ifort (DGEMM)            0.2160     0.2360     0.3320
C          gcc (loop)              13.2000    20.9800    31.4400
           gcc -Ofast (loop)        1.4500     2.3600     4.0400
           icc (loop)               1.2300     2.1500     4.0500
           icc -Ofast (loop)        1.1500     1.7500     2.5900

 

 

Table MXM-1.1: Elapsed times to multiply the matrices on the i7 Mac.

 

 

Language   Option          n=1500     n=1750     n=2000
Python     intrinsic        0.0906     0.1104     0.1611
           Numba (loop)     9.2595    20.2012    35.3174
Java       loop            32.5080    47.7680    82.2810
Scala      loop            23.0540    38.9110    60.3180

 

 

 

  • Belief Propagation Algorithm


Belief propagation is an algorithm used for inference, often in the fields of artificial intelligence, speech recognition, computer vision, image processing, medical diagnostics, parity check codes, and others. We measure the elapsed time when performing n iterations of the algorithm with a 5000x5000-element matrix. The Matlab, C and Julia code is shown in Justin Domke's weblog (Domke 2012), which states that the algorithm is "a repeated sequence of matrix multiplications, followed by normalization."
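
We do not reproduce Domke's code here, but the structure he describes can be caricatured in Python as repeated matrix-vector products followed by a normalization; the function below is purely illustrative and is not the benchmark code:

    import numpy as np

    def belief_propagation_like(A, x, n_iter):
        # Illustrative only: multiply, then normalize, n_iter times.
        for _ in range(n_iter):
            x = A @ x
            x /= x.sum()
        return x

    A = np.random.rand(5000, 5000)
    x = np.random.rand(5000)
    belief_propagation_like(A, x, 250)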

 

Table BFP-1.0: Elapsed time to run the belief propagation algorithm on the Xeon node.

 

Language   Option         n=250      n=500       n=1000
Python                     3.7076      7.0824     13.8950
Julia                      4.0280      7.8220     15.1210
Java                      63.9240    123.3840    246.5820
Scala                     53.5170    106.4950    212.3550
IDL                       16.9609     33.2086     65.7071
R                         23.4150     45.4160     89.7680
Matlab                     1.9760      3.8087      7.4036
Fortran    gfortran       21.0013     41.0106     87.6815
           gfortran -O3    4.4923      8.2565     17.5731
           ifort           4.7363      9.1086     17.8651
           ifort -O3       4.7363      9.1086     21.1973
C          gcc             2.6400      5.2900     10.5800
           gcc -Ofast      2.4200      4.8500      9.7100
           icc             2.1600      4.3200      8.6500
           icc -Ofast      2.1800      4.3400      8.7100

 

 

Table BFP-1.1: Elapsed time to run the belief propagation algorithm on the i7 Mac.

 

Language   n=250      n=500       n=1000
Python      2.4121      4.5422      8.7730
Java       55.3400    107.7890    214.7900
Scala      47.9560     95.3040    189.8340

 

 

  • Metropolis-Hastings Algorithm

 

The Metropolis-Hastings algorithm is a Markov chain Monte Carlo method for drawing random samples from a probability distribution. This implementation uses a two-dimensional distribution (Domke 2012), and we measure the elapsed time to perform n iterations.
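
A generic two-dimensional Metropolis-Hastings sampler can be sketched in Python as below; the target density here is only a placeholder, whereas the benchmark uses the specific two-dimensional distribution from Domke (2012):

    import numpy as np

    def log_density(x, y):
        # Placeholder target distribution (unnormalized log-density).
        return -0.5 * (x * x + y * y + (x * y) ** 2)

    def metropolis_hastings(n_iter, scale=1.0):
        samples = np.zeros((n_iter, 2))
        x, y = 0.0, 0.0
        for i in range(n_iter):
            xp = x + scale * np.random.randn()
            yp = y + scale * np.random.randn()
            # Accept with probability min(1, p(proposal) / p(current)).
            if np.log(np.random.rand()) < log_density(xp, yp) - log_density(x, y):
                x, y = xp, yp
            samples[i] = (x, y)
        return samples

    metropolis_hastings(15000)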

 

Table MTH-1.0: Elapsed times to run the Metropolis-Hastings algorithm on the Xeon node.

 

Language   Option         n=5000    n=10000   n=15000
Python                    0.0404    0.0805    0.1195
Julia                     0.0002    0.0004    0.0006
Java                      0.0040    0.0050    0.0060
Scala                     0.0080    0.0090    0.0100
IDL                       0.0134    0.0105    0.0157
R                         0.0760    0.1500    0.2230
Matlab                    0.0183    0.0211    0.0263
Fortran    gfortran       0         0         0
           gfortran -O3   0         0         0
           ifort          0.0040    0         0
           ifort -O3      0.0040    0.0040    0
C          gcc            0         0         0
           gcc -Ofast     0         0         0
           icc            0         0         0
           icc -Ofast     0         0         0

 

 

Table MTH-1.1: Elapsed times to run the Metropolis-Hastings algorithm on the i7 Mac.

 

Language   n=5000    n=10000   n=15000
Python     0.0346    0.0638    0.0989
Java       0.0060    0.0040    0.0060
Scala      0.0090    0.0100    0.0130

 

 

  • Fast Fourier Transform

 

We create an n x n matrix M that contains random complex values. We then compute the Fast Fourier Transform (FFT) of M and the absolute value of the result. The FFT algorithm is used for signal and image processing in a wide variety of scientific and engineering fields.
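
In Python this amounts to a couple of NumPy calls; the sketch below uses a smaller matrix than the benchmark to keep the memory footprint modest:

    import numpy as np

    n = 2000                                   # the benchmark uses n up to 20000
    M = np.random.rand(n, n) + 1j * np.random.rand(n, n)

    # Two-dimensional FFT of M, followed by the absolute value of the result.
    result = np.abs(np.fft.fft2(M))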

 

Table FFT-1.0: Elapsed times to compute the FFT on the Xeon node.

 

Language   Option      n=10000    n=15000     n=20000
Python     intrinsic     8.0797     19.6357     34.7400
Julia      intrinsic     3.9790     11.4900     20.7510
IDL        intrinsic    16.6699     38.9857     70.8142
R          intrinsic    58.2550    150.1260    261.5460
Matlab     intrinsic     2.6243      6.0010     10.6623

 

 

Table FFT-1.1: Elapsed times to compute the FFT on the i7 Mac.

 

Language   Option      n=10000    n=15000    n=20000
Python     intrinsic    7.9538     21.5355    55.9375

 

 

  • Iterative Solver

 

We use the Jacobi iterative solver to numerically approximate a solution of the two-dimensional Laplace equation that was discretized with a fourth order compact scheme (Gupta, 1984). We record the elapsed time as the number of grid points varies.
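
For orientation, a vectorized Jacobi sweep for the Laplace equation might be sketched in Python as follows; note that this sketch uses the standard five-point stencil rather than the fourth order compact scheme of the actual benchmark:

    import numpy as np

    def jacobi(u, tol=1.0e-6, max_iter=100000):
        # Repeated Jacobi sweeps until the largest update falls below tol.
        for it in range(max_iter):
            u_new = u.copy()
            u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                        + u[1:-1, :-2] + u[1:-1, 2:])
            if np.max(np.abs(u_new - u)) < tol:
                return u_new, it
            u = u_new
        return u, max_iter

    n = 100
    u = np.zeros((n + 2, n + 2))
    u[0, :] = 1.0                # a simple Dirichlet boundary condition
    jacobi(u)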

 

Table ITS-1.0: Elapsed times to compute the approximate solution using iteration on the Xeon node.

 

Language   Option         n=100       n=150       n=200
Python                    158.2056    786.3425    2437.8560
Julia                       1.0308      5.1870      16.1651
Java                        0.4130      1.8950       5.2220
Scala                       0.5400      2.1030       5.7380
IDL                        73.2353    364.1329    1127.1094
R                         157.1490    774.7080    2414.1030
Matlab                      2.8163      5.0543       8.6276
Fortran    gfortran         0.8240      3.7320      10.7290
           gfortran -O3     0.6680      3.0720       8.8930
           ifort            0.5400      2.4720       7.1560
           ifort -O3        0.5400      2.4680       7.1560
C          gcc              0.5000      2.4200       7.7200
           gcc -Ofast       0.2200      1.0500       3.1900
           icc              0.4600      2.2300       6.7800
           icc -Ofast       0.3300      1.6000       4.8700

 

 

Table ITS-1.1: Elapsed times to compute the approximate solution using iteration on the i7 Mac.

 

Language          n=100       n=150       n=200
Python            174.7663    865.1203    2666.3496
Python (Numba)      1.3226      5.0324      15.1793
Java                0.4600      1.7690       4.7530
Scala               0.5970      2.0950       5.2830

 

 

Table ITS-2.0: Elapsed times to compute the approximate solution using vectorization on the Xeon node.

 

Language   Option         n=100      n=150       n=200
Python                     2.6272     14.6505     40.2124
Julia                      2.4583     13.1918     41.0302
IDL                        1.7119     28.6841     28.0683
R                         25.2150    121.9870    340.4990
Matlab                     3.3291      7.6486     15.9766
Fortran    gfortran        0.8680      4.2040     11.5410
           gfortran -O3    0.3600      1.8040      5.0880
           ifort           0.2800      1.5360      4.4560
           ifort -O3       0.2800      1.5600      4.4160

 

 

Table ITS-2.1: Elapsed times to compute the approximate solution using vectorization on the i7 Mac.

 

Language          n=100     n=150     n=200
Python            1.7051    7.4572    22.0945
Python (Numba)    2.4451    8.5094    21.7833

 

 

 

  • Square Root of a Matrix

 

Given an n x n matrix A, we are looking for the matrix B such that:

 

B * B = A

 

B is the square root of A. In our calculations, we take A to have 6s on the diagonal and 1s elsewhere.
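
One way to compute B in Python is scipy.linalg.sqrtm, sketched below (the benchmark implementations may compute the square root differently, e.g. through an eigendecomposition):

    import numpy as np
    from scipy.linalg import sqrtm

    n = 1000
    A = np.ones((n, n)) + 5.0 * np.eye(n)   # 6s on the diagonal, 1s elsewhere

    B = sqrtm(A)
    print(np.max(np.abs(B @ B - A)))        # should be near machine precision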

 

Table SQM-1.0: Elapsed times to calculate the square root of the matrix on the Xeon node.

 

Language   n=1000    n=2000    n=4000
Python     1.0101    5.2376    44.4574
Julia      0.4207    2.5080    19.0140
R          0.5650    3.0660    19.2660
Matlab     0.3571    1.6552     2.6250

 

 

Table SQM-1.1: Elapsed times to calculate the square root of the matrix on the i7 Mac.

 

Language   n=1000    n=2000    n=4000
Python     0.5653    3.3963    25.9180

 

 

  • Gauss-Legendre Quadrature

 

Gauss-Legendre quadrature is a numerical method for approximating definite integrals. It uses a weighted sum of n values of the integrand function. The result is exact if the integrand function is a polynomial of degree 0 to 2n - 1. Here we consider an exponential function over the interval [-3, 3] and record the time to perform the integral when n varies.
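
In Python, the nodes and weights can be obtained from numpy.polynomial.legendre.leggauss and then mapped from [-1, 1] to [-3, 3]; a minimal sketch:

    import numpy as np

    def gauss_legendre(f, a, b, n):
        # n-point Gauss-Legendre approximation of the integral of f over [a, b].
        x, w = np.polynomial.legendre.leggauss(n)
        t = 0.5 * (b - a) * x + 0.5 * (b + a)
        return 0.5 * (b - a) * np.sum(w * f(t))

    print(gauss_legendre(np.exp, -3.0, 3.0, 100))   # exact value is e**3 - e**-3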

 

Table GLQ-1.0: Elapsed times to find the approximate value of the integral on the Xeon node.

 

Language   Option         n=50      n=75      n=100
Python                    0.0079    0.0095    0.0098
Julia                     0.0002    0.0004    0.0007
IDL                       0.0043    0.0009    0.0014
R                         0.0260    0.0240    0.0250
Matlab                    0.7476    0.0731    0.4982
Fortran    gfortran       0         0.0040    0.0080
           gfortran -O3   0         0.0120    0.0120
           ifort          0.0080    0.0080    0.0080
           ifort -O3      0.0080    0.0040    0.0080

 

 

Table GLQ-1.1: Elapsed times to find the approximate value of the integral on the i7 Mac.

 

Language   n=50      n=75      n=100
Python     0.0140    0.0035    0.0077

 

 

  • Trigonometric Functions

 

We iteratively calculate trigonometric functions on an n-element list of values, and then compute inverse trigonometric functions on the same list. The time to complete the full operation is measured as n varies.
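
One possible reading of this test in Python is sketched below; the number of repetitions is illustrative, and the benchmark code may structure the iteration differently:

    import numpy as np

    n = 100000
    a = np.linspace(0.0, 1.0, n)

    # Repeatedly apply trigonometric functions and their inverses
    # to the same n-element array.
    for _ in range(1000):
        b = np.arcsin(np.sin(a))
        c = np.arccos(np.cos(a))
        d = np.arctan(np.tan(a))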

 

Table TRG-1.0: Elapsed times to evaluate the trigonometric functions on the Xeon node.

 

Language   Option         n=80000     n=90000     n=100000
Python                     14.6891     16.5084     23.6273
Julia                      55.3920     62.9490     69.2560
IDL                        37.4413     41.9695     35.2387
R                          91.5250    102.8720    113.8600
Matlab                      5.2794      5.8649      6.3699
Scala                     357.3730    401.8960    446.7080
Java                      689.6560    774.9110    865.0570
Fortran    gfortran        53.4833     60.0317     66.6921
           gfortran -O3    49.9271     56.0235     62.1678
           ifort           18.6411     20.9573     23.2654
           ifort -O3       18.6451     20.9573     23.2694
C          gcc            107.4400    120.7300    134.0900
           gcc -Ofast      93.0400    104.5700    116.0600
           icc             76.2600     85.7900     95.3100
           icc -Ofast      48.8400     54.9600     61.0600

 

 

Table TRG-1.1: Elapsed times to evaluate the trigonometric functions on the i7 Mac.

 

Language   n=80000   n=90000   n=100000
Python     3.5399    6.1984    6.9207

 

 

  • Munchausen Numbers

 

A Munchausen number is a natural number that is equal to the sum of its digits, each raised to its own power. In base 10, there are four such numbers: 0, 1, 3435, and 438579088. We measure how much time it takes to find all four.
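
A brute-force Python search can be sketched as follows; the upper bound of the search is chosen just large enough to include the known base-10 Munchausen numbers, and the benchmark code may use a different bound (in pure Python this search takes many minutes, consistent with the timings below):

    def is_munchausen(n):
        # Sum of each digit raised to its own power, with the convention 0**0 = 0.
        return n == sum(0 if d == 0 else d ** d
                        for d in (int(c) for c in str(n)))

    munchausen = [n for n in range(440000000) if is_munchausen(n)]
    print(munchausen)          # expected: [0, 1, 3435, 438579088]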

 

Table MCH-1.0: Elapsed times to find the Munchausen numbers on the Xeon node.


Language   Option         Elapsed time
Python                    1130.6220
Julia                      102.7760
Java                         4.9008
Scala                       72.9170
R                         exceeded time limit
IDL                       exceeded time limit
Matlab                     373.9109
Fortran    gfortran         39.7545
           gfortran -O3     21.3933
           ifort            29.6458
           ifort -O3        29.5218
C          gcc             157.3500
           gcc -Ofast      126.7900
           icc             228.2300
           icc -Ofast      228.1900

 

 

Table MCH-1.1: Elapsed times to find the Munchausen numbers on the i7 Mac.


Language   Elapsed time
Python     1013.5649
Java          4.7434
Scala        64.1800

 

 

Input/Output

  • Reading a Large Collection of Files

 

We have a set of daily NetCDF files (7305 of them) covering a period of 20 years. The files for a given year are in a sub-directory labeled by year (for instance Y1990, Y1991, Y1992, etc.). We want to write a script that opens each file, reads a three-dimensional variable (longitude/latitude/level), and manipulates it. Pseudocode for the script reads:

 

    Loop over the years
        Obtain the list of NetCDF files
        Loop over the files
            Read the variable (longitude/latitude/level)
            Compute the zonal mean average (new array of latitude/level)
            Extract the column array at latitude 86 degree South
            Append the column array to a "master" array (or matrix)

 

The goal is to generate a three-dimensional array (year/level/value) and produce a contour plot from it. This is the type of problem a typical user we support faces: a collection of thousands of files that need to be manipulated to extract the desired information. Having tools that can quickly read data from files (in formats such as NetCDF, HDF4, HDF5, and GRIB) is critical for the work we do.
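
A Python sketch of this script, using the netCDF4 package, is given below; the directory layout, file extension, variable name, and the latitude index for 86 degrees South are placeholders for whatever the actual data set uses:

    import glob
    import numpy as np
    from netCDF4 import Dataset

    lat_index_86S = 2                                   # placeholder index for 86S
    master = []
    for year in range(1990, 2010):                      # the 20-year period
        files = sorted(glob.glob("Y%d/*.nc4" % year))   # e.g. Y1990/, Y1991/, ...
        for fname in files:
            with Dataset(fname, "r") as nc:
                var = nc.variables["T"][:]              # lon/lat/lev variable
            zonal_mean = var.mean(axis=0)               # average over longitude
            column = zonal_mean[lat_index_86S, :]       # column at 86 degrees South
            master.append(column)

    master = np.array(master)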

 

Table RCF-1.0: Elapsed times to process the NetCDF files on the Xeon node.

 

Language   Elapsed time
Python      660.8084
Julia       787.4500
IDL         711.2615
R          1220.2220
Matlab      848.5086

 

Table RCF-1.1: Elapsed times to process the NetCDF files on the i7 Mac.

 

Language   Elapsed time
Python     89.1922

 

 

Table RCF-2.0: Elapsed times to process the NetCDF files with Python using multiple cores on the Xeon node.

 

Cores   Elapsed time
1       570.9791
2       317.6108
4       225.4647
8       147.4527
16       84.0102
24       59.7646
28       51.2191

 

 

Table RCF-2.1: Elapsed times to process the NetCDF files with Python using multiple cores on the i7 Mac.

 

Cores   Elapsed time
1       84.1032
2       63.5322
4       56.6156

 

_______________________________________________________________________________

Summary with a Plot

In the plots below, we summarize the timing results above, using the numbers obtained with gcc (last column only, i.e., the largest problem size) as the reference.

 

fig_languages_scatter.png

 

fig_languages_histo.png

 

Findings

General:

  • No single language outperforms the others in all tests.
  • It is important to reduce the memory footprint by creating variables only when necessary and by "emptying" variables that are no longer used.
  • Using intrinsic functions results in better performance than writing the equivalent code by hand for the same task.
  • Julia and R offer simple benchmarking tools. We wrote a simple Python tool that allows us to run Python test cases as many times as we wish.

 

Loops and Vectorization:

  • Python (and Numpy), IDL, and R consistently run more quickly when vectorized compared to when using loops.
  • When using Numba, Python is faster with loops as long as Numpy arrays are used.
  • With Julia, loops run more quickly than vectorized code.
  • Matlab does not appear to change significantly in performance when using loops versus vectorization in a case that involves no calculations. When calculations are performed, vectorized Matlab code is faster than iterative code.

 

String Manipulations:

  • Java and Scala perform notably well relative to the other languages when manipulating large strings.

 

Numerical Calculations:

  • R appears to have notable performance relative to the other languages when using recursion.
  • A language's performance in numerical calculations relative to the others depends on the specific task.
  • Matlab's intrinsic FFT function seems to run the most quickly.

 

Input/Output:

  • While some of the languages run the test more quickly than others, running the test on a local Mac instead of the processor node results in the largest performance gain. The processor node uses hard drives, whereas the Mac has a solid-state disk. This indicates that hardware has a larger impact on I/O performance than the language used.

_______________________________________________________________________________

Acknowledgements

This work was partially funded by the Michigan Space Grant Consortium, NASA grant #NNX15AJ20H.

 

_______________________________________________________________________________

References

  1. Justin Domke, Julia, Matlab and C, September 17, 2012.
  2. Murli M. Gupta, A fourth-order Poisson solver, Journal of Computational Physics, 55(1):166-172, 1984.


_______________________________________________________________________________

Source Files

The source files are now available on github.com (see the notice at the top of this page).
