The Julia language for Scientific Computing

Sebastian Nowozin - Fri 02 October 2015 -

Julia is a relatively new programming language with the declared goal of becoming the leading language for scientific computing.

I have probably annoyed half of my colleagues by raving about how great the language is and what it is good at. Before we get to that, and in my defense, let me provide some context. I have been developing in C and C++ for 20 years, and have been using Matlab and Python for over ten years. These are great languages and I can be productive in each; in fact, I continue to use them regularly.

Also, I tend to be quite conservative about adopting new languages and development tools: while learning a new language and environment is fun, it also takes a lot of effort, most languages, tools, and libraries come and go rather quickly, and every developer carries with him a graveyard of tools and languages long gone.

Because of this short-lived nature of software, when someone approaches me with a new language or tool I am skeptical by default, and my litmus-test question is usually how confident they are that this tool will still be around in five years' time. This is of course unfair, but I prefer to invest my time in learning things that have long-term value. Which brings me to the point: I firmly believe Julia is here to stay, and it may in fact become a popular language in scientific computing.

Enough rambling, let's get to the good parts.

I have been using Julia for the last 18 months, both for work and pleasure. Counting all code I wrote at work (just counting .jl files, no notebooks), I see that I wrote more than 15k lines of Julia code in that time, including several larger projects, ports of existing Matlab and C++ code, and interfaces to C libraries. In my experience, Julia is ready for production in internal projects (as opposed to shipping executable code to a customer) and is particularly well suited to research-type projects.

[Logo of the Julia language]

Developing code for research projects is in many ways similar to developing other software, but the key difference for me is that I need a quick turnaround time from idea to result not just once but in multiple iterations, sometimes changing the idea and implementation drastically.

In a very real sense most research projects should fail to achieve their original goals; almost by definition research is beyond what is known to work. If you only attempt known-to-work ideas it is not research. If your project fails it is important to learn as much as possible from the failure, that is, increasing the understanding of the problem and finding suitable new research ideas, and quick iterations make this process fun. The new ideas are often variants of earlier ideas and thus can reuse code. If this code happens to be compact and flexible this translates directly into productivity.

Matlab, R, and Python achieve this tight cycle of iterations quite successfully, but in all three languages there is a price in the later iterations: to achieve a high-performance implementation, significant parts of the code need to be rewritten in a lower-level language such as C++, which then has to be interfaced to the rest of the code through some interface specification. For big high-value projects in industry with dedicated engineering support this additional effort is typically not a problem, but for individual researchers it means hours and days spent writing additional code without adding functionality.

This process is cumbersome, error-prone, and creates a strong coupling, making further iterations of changing ideas and implementations slower. (As an example, in my grante library I prototyped many algorithms in Matlab, then programmed them in C++, then wrote a Matlab interface which by itself is almost 2,000 lines of C++ code.)

Julia also achieves this tight cycle, but does not require you to resort to compiled statically-typed languages such as C++ in order to achieve high performance. Using a single language maintains productivity both at the very beginning (prototyping) and towards the later iterations (productization).

Productivity in Julia (roughly "scientific results per wallclock developer time") is achieved through a number of features:

  • compact syntax, for example I can declare a function using f(x) = 2x+5. As mentioned above, I see the advantage of a compact syntax not in the keystrokes saved initially, but in lowering the barrier to future understanding and modification as the code evolves.
  • optional type annotation, the above function will work for x being an integer, or a float, or anything that has multiplication and addition with integer arguments defined; in fact, I could write f(x::Float64) = 2x+5 to require that x is a float, but performance-wise both yield the same code. This means that I can be strict about types when I need to be, but have the feel of a dynamic programming language (see the short sketch after this list).
  • Jupyter notebook interface for quick think-implement-results cycles.
  • excellent default choices of numerical libraries: dense linear algebra, sparse linear algebra, numerical optimization, arbitrary-precision computation, special functions, FFT, and so on; most of what you could wish for in a technical computing environment is there by default or available in the many numerical packages. In terms of numerical optimization codes, Julia is probably one of the best environments available. All of these libraries are carefully chosen to be best-in-class for the functionality they implement.
  • foreign function interfaces to a number of languages: C and Fortran, C++ (unfortunately planned only for Julia 0.5), Python, R, and Matlab. This makes it relatively easy to use code in any of these languages, and I have used several Python libraries without issues (a ccall sketch also follows after this list).
  • high performance, I regularly find my first-attempt Julia code for a problem to be an order of magnitude faster than the equivalent Matlab code. In fact, I unlearned a number of bad Matlab programming patterns such as using bsxfun and vectorizing all code. Last year I wrote Julia code for an R-tree data structure to maintain a dynamic spatial index. Doing this in Matlab/R/Python in a reasonably performant way would be unthinkable! Instead you would have to resort to wrapping native libraries. In Julia it was fun to write and it is fast, and I could easily add the methods I needed for my application, including fancy filtering iterators.
  • no separation between user and developer, almost all of the base library is implemented in Julia itself, and it is easy to find where things are. For example, want to find out how two complex numbers are multiplied in Julia's base library? Enter methods(*) and have a look! This transparency makes it easy to learn good Julian style and extends further to how code is run: want to see what machine code is executed when you call the sqrt function on a single-precision float? Enter code_native(sqrt, (Float32,)) and see:
.text
    Filename: math.jl
Source line: 132
    push    RBP
    mov RBP, RSP
    xorps   XMM1, XMM1
    ucomiss XMM1, XMM0
Source line: 132
    ja  6
    sqrtss  XMM0, XMM0
    pop RBP
    ret
    movabs  RAX, 140269793784104
    mov RDI, QWORD PTR [RAX]
    movabs  RAX, 140269778958624
    mov ESI, 132
    call    RAX

Almost nothing is hidden from the eyes of the user and this makes it easy and fun to look into the implementation.
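
To make the compact-syntax and optional-typing points above concrete, here is a minimal sketch (the function name g is just illustrative):

    # Compact one-line definitions; the untyped version works for any x
    # that supports multiplication and addition with an integer.
    f(x) = 2x + 5

    # Optional type annotation restricting x to Float64; for Float64
    # arguments both definitions compile to the same machine code.
    g(x::Float64) = 2x + 5

    f(3)       # 11
    f(3.0)     # 11.0
    g(3.0)     # 11.0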
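
As an illustration of the C interface, calling into libm or libc requires no wrapper code at all; this is a standard ccall sketch (library names can differ slightly between platforms):

    # Call C's cos() from libm directly; no glue code or build step needed.
    x = ccall((:cos, "libm"), Float64, (Float64,), 1.0)

    # Similarly, clock() from libc:
    t = ccall((:clock, "libc"), Int32, ())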

Weak parts

Julia, while ready for serious use, is not yet at version 1.0 and lacks several important features. In my work, I found the following pieces missing (as of version 0.4).

  • Simple single-machine parallelism. In C/C++/Fortran this would be OpenMP, and in Matlab it is parfor. While Julia has good support for distributed parallel computing, it currently does not have simple single-machine parallelism. In my experience, using the distributed computing abstractions for single-machine parallelism incurs severe performance overheads because all data is serialized and remote method invocations are used to execute code. (Also, I found the use of @everywhere macros cumbersome; a short sketch of these abstractions follows this list.) Apparently simpler single-machine parallelism is difficult to implement but is in the works, as shown in this recent work by Intel presented at JuliaCon 2015.
  • Debugger. Quite simply, a debugger is essential for larger projects where errors can arise that are difficult to understand and debug without being able to interactively inspect the context in which the error appeared. Currently Julia has Debug.jl which provides debugging at gdb level in terms of functionality. But Julia lacks an interactive debugging capability on par with what is available in Matlab or most C/C++ environments (actually, I am not sure about Python debuggers here, is there a single popular tool?). As far as I understand, this is planned for the 0.5 version of Julia.
  • Shipping/productization/static compilation. By this I mean the ability to select the distribution mechanism for the software, in particular whether all dependencies are included so that the software "will just run" on the target system, and whether binaries or source code are delivered. For most researchers and open-source programmers this is not an issue and the Julia package system caters for all their needs, but I found it relevant in a company environment, because explaining to someone how to install Julia and a piece of code takes a while, whereas for C++ I can typically just send an executable file and a few library dependencies. As far as I understand, static compilation is planned for a future version of Julia.
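
To make the parallelism point concrete, here is a minimal sketch of the distributed abstractions as they look in Julia 0.4; the serialization of arguments and results between worker processes is exactly where the overhead comes from:

    addprocs(4)    # start four local worker processes

    # The function must be defined on every worker before it can be called remotely.
    @everywhere slow_square(x) = (sleep(0.1); x^2)

    # pmap distributes the evaluations across workers; inputs and results are
    # serialized between processes, which is cheap here but costly for large data.
    results = pmap(slow_square, 1:100)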

Further Reading

If you want to give Julia a spin, here are a few links:

Packages which I use frequently and can recommend:

The Julia package ecosystem has a lot more packages, so if you are looking for a particular thing, have a look there.