Rust performance
As part of a thought experiment on building a super-fast data processing engine at petabyte scale, the choice of programming language was one of the major considerations. I asked around among friends working or researching in the field of programming languages and compilers. Many of them were quite excited about Rust, so I started exploring it.
While learning about it, I came across the chapter “Comparing Performance: Loops vs. Iterators” in The Rust Programming Language. One thing that caught my attention was this excerpt from Bjarne Stroustrup's “Foundations of C++” (2012):
“In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.”
It got me thinking that modern languages have introduced many user-friendly abstractions and, in the process, introduced performance overhead. So I wanted to know how fast Rust is compared to JVM languages like Scala, which are widely used for processing large amounts of data. I took the example from the same chapter, which has a `buffer` array of 10,000 integers and a coefficient array of 12 integers. For every element of `buffer`, we take the dot product of the 12 consecutive elements of `buffer` starting at that element with the coefficient array. I used this example to measure performance in Scala and Rust.
The following snippets were used to check the performance difference:
On the initial run, the loop itself (excluding creation of the buffer, etc.) took 12.55 ms in Rust vs 92.96 ms in Scala, a roughly 7x difference. Then I realized I had run the debug build of Rust through Cargo, which is unoptimized. When I ran Rust’s optimized release build, the loop took just 0.05 ms.
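The computation described above can be sketched roughly as follows. This is an illustrative version, not the benchmark code itself: the function name `sliding_dot`, the `i64` element type, and the input values are assumptions, and it only produces a result for positions where a full window of 12 elements fits.

```rust
/// Number of coefficients, as in the example described above.
const WINDOW: usize = 12;

/// For each position `i` where a full window fits, compute the dot product of
/// `buffer[i..i + WINDOW]` with `coefficients`.
/// (Hypothetical helper; name and element type are assumptions.)
fn sliding_dot(buffer: &[i64], coefficients: &[i64; WINDOW]) -> Vec<i64> {
    buffer
        .windows(WINDOW)
        .map(|w| {
            w.iter()
                .zip(coefficients.iter())
                .map(|(&b, &c)| b * c)
                .sum::<i64>()
        })
        .collect()
}

fn main() {
    // Assumed inputs: the values 0..10_000 and all-ones coefficients.
    let buffer: Vec<i64> = (0..10_000).collect();
    let coefficients = [1i64; WINDOW];

    let out = sliding_dot(&buffer, &coefficients);
    // 10_000 - 12 + 1 = 9989 windows; the first sums 0 + 1 + ... + 11 = 66.
    println!("{} windows, first = {}", out.len(), out[0]);
}
```

The iterator chain (`windows`, `zip`, `map`, `sum`) is the kind of abstraction the zero-overhead quote is about: it reads like a high-level description, yet the compiler is free to turn it into a plain loop.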
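To avoid mixing up debug and release numbers, a timing harness along these lines may help. It is a sketch under assumptions, not the code measured here: `sliding_dot_sum` and the input values are hypothetical, timing uses `std::time::Instant`, and `std::hint::black_box` is there so the optimizer does not discard the work in a release build. Running it with `cargo run` and then `cargo run --release` shows both profiles.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Sum of all windowed dot products (hypothetical helper for timing).
/// Panics if `coefficients` is empty, because `windows(0)` is not allowed.
fn sliding_dot_sum(buffer: &[i64], coefficients: &[i64]) -> i64 {
    buffer
        .windows(coefficients.len())
        .map(|w| {
            w.iter()
                .zip(coefficients)
                .map(|(&b, &c)| b * c)
                .sum::<i64>()
        })
        .sum::<i64>()
}

fn main() {
    // `debug_assertions` is enabled by default in debug builds and
    // disabled by default with `--release`.
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };

    // Assumed inputs; the real benchmark data is not shown in the article.
    let buffer: Vec<i64> = (0..10_000).collect();
    let coefficients: Vec<i64> = (1..=12).collect();

    let start = Instant::now();
    // black_box hides the inputs and the result from the optimizer.
    let total = black_box(sliding_dot_sum(
        black_box(buffer.as_slice()),
        black_box(coefficients.as_slice()),
    ));
    let elapsed = start.elapsed();

    println!(
        "{} build: total = {}, loop took {:.3} ms",
        profile,
        total,
        elapsed.as_secs_f64() * 1e3
    );
}
```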
Rust: 0.05 ms vs Scala: 92.96 ms
Now that’s blazing fast!
On closer look, I noticed that the Rust arrays were declared with a fixed size, whereas in Scala the Array collections carried no size information. The compiler could well use size information for optimizations such as loop unrolling. However, internal buffers in Big Data systems are generally user-configurable, and their sizes may not always be known at compile time. Hence, I used `std::vec::Vec` in the Rust implementation instead and read both arrays from a file, so that the compiler could not guess their sizes. New code snippet for Rust:
The time taken by the new code did not change much: 0.086 ms.
However, there are other factors, such as a newer version of the JVM (I used version 8 for this comparison), which might boost the performance of Scala too. Moreover, this example may not be representative of many of the things we do in our applications. But it is great to see a modern programming language built on the principle of zero-cost abstractions that provides functional features at the same time. If you are passionate about programming, you should definitely check out Rust.
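A minimal sketch of what a `Vec`-based variant with file input might look like is given below. It is not the code that produced the timings: the whitespace-separated text format, the file names, and the helpers `read_ints` and `dot_windows` are all assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parse whitespace-separated integers from a text file into a Vec.
/// (The file format is an assumption; the article does not specify one.)
fn read_ints(path: impl AsRef<Path>) -> io::Result<Vec<i64>> {
    let text = fs::read_to_string(path)?;
    text.split_whitespace()
        .map(|tok| {
            tok.parse::<i64>()
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
        })
        .collect()
}

/// Windowed dot products where both lengths are only known at run time.
fn dot_windows(buffer: &[i64], coefficients: &[i64]) -> Vec<i64> {
    if coefficients.is_empty() {
        return Vec::new(); // `windows(0)` would panic
    }
    buffer
        .windows(coefficients.len())
        .map(|w| {
            w.iter()
                .zip(coefficients)
                .map(|(&b, &c)| b * c)
                .sum::<i64>()
        })
        .collect()
}

fn main() -> io::Result<()> {
    // Write tiny sample inputs so the sketch runs on its own; a real run
    // would point at existing files instead (hypothetical file names).
    let dir = std::env::temp_dir();
    let buffer_path = dir.join("buffer_sketch.txt");
    let coeff_path = dir.join("coefficients_sketch.txt");
    fs::write(&buffer_path, "1 2 3 4 5")?;
    fs::write(&coeff_path, "1 0 2")?;

    let buffer = read_ints(&buffer_path)?;
    let coefficients = read_ints(&coeff_path)?;

    // prints [7, 10, 13]
    println!("{:?}", dot_windows(&buffer, &coefficients));
    Ok(())
}
```

Because both inputs arrive as `Vec<i64>` filled at run time, neither length is a compile-time constant, which is the property this variant is meant to test.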