Mandelbrot
Purpose
Mandelbrot plot generation is a good example to demonstrate the ability of the language to make efficient use of hardware resources as it requires creating a data structure to hold complex numbers. Some languages including Python already have a builtin Complex data type but the builtin type is intentionally ignored here to make the benchmark a more realistic representation of performance in presence of user defined types.
The Code
The mandelbrot set is generated
by iteratively applying the following function on each point c
in the complex
plane starting with z = 0
:
Those points c
for which the final value of z
remains bounded are
considered to be within the set.
The complex number type
Let's start with a simple mandelbrot demonstration. First we need a class to represent complex numbers:
type Complex {
#default
= (s Self, o Self)
fun construct(s Self, real Float64 = 0, imag Float64 = 0) -> Self {
s.real = real
s.imag = imag
}
+ (s Self, o Self) -> SelfType {
Complex(s.real + o.real, s.imag + o.imag)
}
* (s Self, o Self) -> SelfType {
Complex(s.real * o.real - s.imag * o.imag,
s.real * o.imag + s.imag * o.real)
}
fun norm(s Self) -> Float64 { s.real * s.real + s.imag * s.imag }
real Float64
imag Float64
}
class Complex:
def __init__(self, r, i):
self.real = r
self.imag = i
def __add__(lhs, rhs):
return Complex(lhs.real + rhs.real, lhs.imag + rhs.imag)
def __mul__(lhs, rhs):
return Complex(lhs.real * rhs.real - lhs.imag * rhs.imag, \
lhs.real * rhs.imag + lhs.imag * rhs.real)
def norm(self):
return self.real * self.real + self.imag * self.imag
class MyComplex
attr_reader :real, :imag
def initialize(real, imag)
@real = real
@imag = imag
end
def +(other)
MyComplex.new(@real + other.real, @imag + other.imag)
end
def *(other)
MyComplex.new(@real * other.real - @imag * other.imag,
@real * other.imag + @imag * other.real)
end
def norm
@real * @real + @imag * @imag
end
end
struct C64
real::Float64
imag::Float64
end
function Base.:+(c1::C64, c2::C64)
return C64(c1.real + c2.real, c1.imag + c2.imag)
end
function Base.:*(c1::C64, c2::C64)
real_part = c1.real * c2.real - c1.imag * c2.imag
imag_part = c1.real * c2.imag + c1.imag * c2.real
return C64(real_part, imag_part)
end
function norm(c::C64)
return c.real * c.real + c.imag * c.imag
end
#include <iostream>
#include <chrono>
#include <vector>
class Complex {
public:
Complex(double r, double i) : real(r), imag(i) {}
Complex operator+(const Complex &o) const {
return Complex(real + o.real, imag + o.imag);
}
Complex operator*(const Complex&o) const {
return Complex(real * o.real - imag * o.imag,
real * o.imag + imag * o.real);
}
double norm() const {
return real * real + imag * imag;
}
double real;
double imag;
};
The kernel
To compute the mandelbrot set in practice the number of iterations are are going to be limited. The term bounded is also going to be defined as staying lower than a certain value for all iterations. The following kernel used to generate mandelbrot plots returns the number of iterations after which the above function diverges or otherwise will return the maximum number of iterations defined:
const! kImageWidth = 1920
const! kImageHeight = 1080
const! kDPI = 200
const! kXMin = -2.1
const! kXMax = 1.3
const! kYMin = -1.5
const! kYMax = 1.5
const! kMaxIters = 100
const! kNormBound = 32.0
fun mandelbrot_kernel(c) {
z = c
for i in ..kMaxIters {
if z.norm() > kNormBound then return i end
z = z * z + c
}
kMaxIters
}
const! kImageWidth = 1920
const! kImageHeight = 1080
const! kDPI = 200
const! kXMin = -2.1
const! kXMax = 1.3
const! kYMin = -1.5
const! kYMax = 1.5
const! kMaxIters = 100
const! kNormBound = 32.0
#Gambol.function.nounwind
fun mandelbrot_kernel_opt(c) {
z = c
for i in ..kMaxIters {
if z.norm() > kNormBound then return i end
z = z * z + c
}
kMaxIters
}
IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080
DPI = 200
xmin = -2.1
xmax = 1.3
ymin = -1.5
ymax = 1.5
MAX_ITERS = 100
NORM_BOUND = 32.0
def mandelbrot_kernel(c):
z = c
for i in range(MAX_ITERS):
z = z * z + c
if z.norm() > NORM_BOUND:
return i
return MAX_ITERS
IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080
DPI = 200
XMIN = -2.1
XMAX = 1.3
YMIN = -1.5
YMAX = 1.5
MAX_ITERS = 100
NORM_BOUND = 32.0
def mandelbrot_kernel(c)
z = c
for i in 0..MAX_ITERS
if z.norm > NORM_BOUND
return i
end
z = z * z + c
end
MAX_ITERS
end
IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080
DPI = 200
xmin = -2.1
xmax = 1.3
ymin = -1.5
ymax = 1.5
MAX_ITERS = 100
NORM_BOUND = 32.0
function mandelbrot_kernel(c)
z = c
for i = 1:MAX_ITERS
if norm(z) > NORM_BOUND
return i
end
z = z * z + c
end
return MAX_ITERS
end
constexpr int64_t kImageWidth = 1920;
constexpr int64_t kImageHeight = 1080;
constexpr int64_t kDPI = 200;
constexpr double kXMin = -2.1;
constexpr double kXMax = 1.3;
constexpr double kYMin = -1.5;
constexpr double kYMax = 1.5;
constexpr int64_t kMaxIters = 100;
constexpr double kNormBound = 32.0;
int64_t mandelbrot_kernel(Complex c) {
auto z = c;
for (int64_t i = 0; i < kMaxIters; ++i) {
if (z.norm() > kNormBound) return i;
z = z * z + c;
}
return kMaxIters;
}
Applying the kernel to the complex plane
Next we define a function to compute the mandelbrot plot and put the results into a list along with the execution time:
fun compute_mandelbrot(result List[Int64]) {
dx = (kXMax - kXMin) / kImageWidth
dy = (kYMax - kYMin) / kImageHeight
y = kYMin
start_time = Time.checkpoint()
for j in ..kImageHeight {
x = kXMin
for i in ..kImageWidth {
result[i + j*kImageWidth] = mandelbrot_kernel(Complex(x, y))
x += dx
}
y += dy
}
end_time = Time.checkpoint()
end_time - start_time
}
#Gambol.function.nounwind
fun compute_mandelbrot_opt(result List[Int64]) {
dx = (kXMax - kXMin) / kImageWidth
dy = (kYMax - kYMin) / kImageHeight
y = kYMin
start_time = Time.checkpoint()
for j in ..kImageHeight {
x = kXMin
for i in ..kImageWidth {
result.set_unchecked(mandelbrot_kernel_opt(Complex(x, y)), i + j*kImageWidth)
x += dx
}
y += dy
}
end_time = Time.checkpoint()
end_time - start_time
}
import time
def compute_mandelbrot():
result = [0] * IMAGE_HEIGHT * IMAGE_WIDTH
dx = (xmax - xmin) / IMAGE_WIDTH
dy = (ymax - ymin) / IMAGE_HEIGHT
y = ymin
start = time.time()
for j in range(IMAGE_HEIGHT):
x = xmin
for i in range(IMAGE_WIDTH):
result[i + j*IMAGE_WIDTH] = mandelbrot_kernel(Complex(x, y))
x += dx
y += dy
end = time.time()
print(end - start)
return result
def compute_mandelbrot(result)
dx = (XMAX - XMIN) / IMAGE_WIDTH
dy = (YMAX - YMIN) / IMAGE_HEIGHT
y = YMIN
start = Time.now
for j in 0..IMAGE_HEIGHT
x = XMIN
for i in 0..IMAGE_WIDTH
result.push(mandelbrot_kernel(MyComplex.new(x, y)))
x += dx
end
y += dy
end
puts Time.now - start
end
function compute_mandelbrot()
result = zeros(IMAGE_HEIGHT, IMAGE_WIDTH)
dx = (xmax - xmin) / IMAGE_WIDTH
dy = (ymax - ymin) / IMAGE_HEIGHT
y = ymin
elapsed = @elapsed begin
for j = 1:IMAGE_HEIGHT
x = xmin
for i = 1:IMAGE_WIDTH
result[j, i] = mandelbrot_kernel(C64(x, y))
x += dx
end
y += dy
end
end
println("elapsed ", elapsed)
return result
end
void compute_mandelbrot(std::vector<int64_t> &result) {
auto dx = (kXMax - kXMin) / kImageWidth;
auto dy = (kYMax - kYMin) / kImageHeight;
auto y = kYMin;
auto start = std::chrono::nanoseconds(
std::chrono::steady_clock::now().time_since_epoch()).count();
for (int64_t j = 0; j < kImageHeight; ++j) {
auto x = kXMin;
for (int64_t i = 0; i < kImageWidth; ++i) {
result[i + j*kImageWidth] = mandelbrot_kernel(Complex(x, y));
x += dx;
}
y += dy;
}
auto elapsed = std::chrono::nanoseconds(
std::chrono::steady_clock::now().time_since_epoch()).count() - start;
std::cout << (elapsed/1000000) << std::endl;
}
Generating the plot
And finally to measure the execusion time and plot the results we rely on Python's matplotlib and numpy:
Note
Plotting is not implemented for C++ and Ruby
list auto = List[Int64].full(0, kImageWidth*kImageHeight)
elapsed = compute_mandelbrot(list)
print(elapsed)
import python as py
plt = py.import(`matplotlib.pyplot`)
colors = py.import(`matplotlib.colors`)
np = py.import(`numpy`)
#! converting our List[Int64] to an np array through a Python memory view is much faster
np_arr = np.asarray(py.PythonObject.memory_view(list))
np_arr = np_arr.reshape([kImageHeight, kImageWidth])
fig = plt.figure()
fig.set_figwidth((_ NFloat = kImageWidth)/kDPI)
fig.set_figheight((_ NFloat = kImageHeight)/kDPI)
fig.set_dpi(kDPI)
fig.add_axes([0.0, 0.0, 1.0, 1.0])
light = colors.LightSource(0, 90, 0, 1, 1, 0)
image =
light.shade(np_arr, plt.cm.PuOr, py._None, `overlay`, 0, kMaxIters, 2.0)
plt.imshow(image)
plt.axis(`off`)
plt.show()
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors
def make_plot(l):
np_arr = np.array(l)
np_arr = np_arr.reshape((IMAGE_HEIGHT, IMAGE_WIDTH))
fig = plt.figure()
fig.set_figwidth(float(IMAGE_WIDTH)/DPI)
fig.set_figheight(float(IMAGE_HEIGHT)/DPI)
fig.set_dpi(DPI)
fig.add_axes((0.0, 0.0, 1.0, 1.0))
light = colors.LightSource(0, 45, 0, 1, 1, 0)
image = \
light.shade(np_arr, plt.cm.PuOr, None, "overlay", 0, MAX_ITERS, 2.0)
plt.imshow(image)
plt.axis("off")
plt.show()
if __name__ == '__main__':
l = compute_mandelbrot()
make_plot(l)
list = Array.new IMAGE_WIDTH*IMAGE_HEIGHT, 0
compute_mandelbrot(list)
result = compute_mandelbrot()
using Plots
x_range = range(xmin, xmax, IMAGE_WIDTH)
y_range = range(ymin, ymax, IMAGE_HEIGHT)
display(heatmap(x_range, y_range, result, color=:PuOr, cbar=true))
readline()
int main() {
std::vector<int64_t> result(kImageWidth * kImageHeight, 0);
compute_mandelbrot(result);
//to prevent the optimizer from removing the above
std::cout << result[0] << std::endl;
return 0;
}
Results
The above code produces the following plot and execution times:
Gambol (0.0.0) | Gambol Optimized | Python (3.12) | Ruby (3.3) | Julia (1.11) | C++ (Clang++ 18.1) | C++ (g++ 14.2) | |
---|---|---|---|---|---|---|---|
Time | 236ms | 169ms | 49.8s | 40.4s | 4.24s | 188ms | 155ms |
That is a 211x speedup over Python and a surprising 17x speedup over Julia just by writing the same code in Gambol with no modification to the algorithm! The optimized Gambol version is still using exactly the same algorithm with a couple of adjustments to implmentation to disable exception handling and array bounds checking.
There are additional ways to speed up the Mandelbrot algorithm. For example by
rearranging the math formula, fma
instructions and SIMD
could be used to
take advantage of vector instructions in the hardware. Parallelization would
also linearly scale for this particular algorithm though it can be implemented
in Python and other languages as well. This would yield results that are
thousands of times faster than the above canonical single threaded Python
implementation on the right hardware. However optimizations to the algorithm
itself eventhough they demonstrate the capability to take advantage of hardware
don't exactly serve the purpose of this benchmark and are not included here.
There is an excellent 3-part blog
post
by Abdul Dakkak on this subject using the Mojo programming language instead and
the exact same techniques could be implemented in Gambol.