Mandelbrot

Purpose

Mandelbrot plot generation is a good example to demonstrate the ability of the language to make efficient use of hardware resources as it requires creating a data structure to hold complex numbers. Some languages including Python already have a builtin Complex data type but the builtin type is intentionally ignored here to make the benchmark a more realistic representation of performance in presence of user defined types.

The Code

The mandelbrot set is generated by iteratively applying the following function on each point c in the complex plane starting with z = 0:

\[ f_{c}(z) = z^{2}+c \]

Those points c for which the final value of z remains bounded are considered to be within the set.

The complex number type

Let's start with a simple mandelbrot demonstration. First we need a class to represent complex numbers:

GambolPythonRubyJuliaC++

type Complex {

  #default
  = (s Self, o Self)

  fun construct(s Self, real Float64 = 0, imag Float64 = 0) -> Self {

    s.real = real
    s.imag = imag
  }

  + (s Self, o Self) -> SelfType { 

    Complex(s.real + o.real, s.imag + o.imag)
  }

  * (s Self, o Self) -> SelfType {

    Complex(s.real * o.real - s.imag * o.imag,
            s.real * o.imag + s.imag * o.real)
  }

  fun norm(s Self) -> Float64 { s.real * s.real + s.imag * s.imag }

  real Float64
  imag Float64
}

class Complex:

    def __init__(self, r, i):
        self.real = r
        self.imag = i

    def __add__(lhs, rhs):
        return Complex(lhs.real + rhs.real, lhs.imag + rhs.imag)

    def __mul__(lhs, rhs):
        return Complex(lhs.real * rhs.real - lhs.imag * rhs.imag, \
                lhs.real * rhs.imag + lhs.imag * rhs.real)

    def norm(self):
        return self.real * self.real + self.imag * self.imag

class MyComplex
  attr_reader :real, :imag

  def initialize(real, imag)
    @real = real
    @imag = imag
  end

  def +(other)
    MyComplex.new(@real + other.real, @imag + other.imag)
  end

  def *(other)
    MyComplex.new(@real * other.real - @imag * other.imag,
                @real * other.imag + @imag * other.real)
  end

  def norm
    @real * @real + @imag * @imag
  end
end

struct C64
    real::Float64
    imag::Float64
end

function Base.:+(c1::C64, c2::C64)
    return C64(c1.real + c2.real, c1.imag + c2.imag)
end

function Base.:*(c1::C64, c2::C64)
    real_part = c1.real * c2.real - c1.imag * c2.imag
    imag_part = c1.real * c2.imag + c1.imag * c2.real
    return C64(real_part, imag_part)
end

function norm(c::C64)
  return c.real * c.real + c.imag * c.imag
end

#include <iostream>
#include <chrono>
#include <vector>


class Complex {
  public:
    Complex(double r, double i) : real(r), imag(i) {}

    Complex operator+(const Complex &o) const {
      return Complex(real + o.real, imag + o.imag);
    }

    Complex operator*(const Complex&o) const {
      return Complex(real * o.real - imag * o.imag,
                     real * o.imag + imag * o.real);
    }

    double norm() const {
      return real * real + imag * imag;
    }

    double real;
    double imag;
};

The kernel

To compute the mandelbrot set in practice the number of iterations are are going to be limited. The term bounded is also going to be defined as staying lower than a certain value for all iterations. The following kernel used to generate mandelbrot plots returns the number of iterations after which the above function diverges or otherwise will return the maximum number of iterations defined:

GambolGambol OptimizedPythonRubyJuliaC++

const! kImageWidth = 1920
const! kImageHeight = 1080
const! kDPI = 200

const! kXMin = -2.1
const! kXMax = 1.3
const! kYMin = -1.5
const! kYMax = 1.5
const! kMaxIters = 100
const! kNormBound = 32.0

fun mandelbrot_kernel(c) {

  z = c

  for i in ..kMaxIters {

    if z.norm() > kNormBound then return i end

    z = z * z + c

  }

  kMaxIters 
}

const! kImageWidth = 1920
const! kImageHeight = 1080
const! kDPI = 200

const! kXMin = -2.1
const! kXMax = 1.3
const! kYMin = -1.5
const! kYMax = 1.5
const! kMaxIters = 100
const! kNormBound = 32.0

#Gambol.function.nounwind
fun mandelbrot_kernel_opt(c) {

  z = c

  for i in ..kMaxIters {

    if z.norm() > kNormBound then return i end

    z = z * z + c

  }

  kMaxIters 
}

IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080
DPI = 200

xmin = -2.1
xmax = 1.3
ymin = -1.5
ymax = 1.5
MAX_ITERS = 100
NORM_BOUND = 32.0

def mandelbrot_kernel(c):

    z = c

    for i in range(MAX_ITERS):

        z = z * z + c

        if z.norm() > NORM_BOUND:
            return i

    return MAX_ITERS

IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080
DPI = 200

XMIN = -2.1
XMAX = 1.3
YMIN = -1.5
YMAX = 1.5
MAX_ITERS = 100
NORM_BOUND = 32.0


def mandelbrot_kernel(c)

  z = c

  for i in 0..MAX_ITERS

    if z.norm > NORM_BOUND 
      return i
    end

    z = z * z + c
  end

  MAX_ITERS
end

IMAGE_WIDTH = 1920
IMAGE_HEIGHT = 1080
DPI = 200

xmin = -2.1
xmax = 1.3
ymin = -1.5
ymax = 1.5
MAX_ITERS = 100
NORM_BOUND = 32.0

function mandelbrot_kernel(c)

  z = c

  for i = 1:MAX_ITERS

    if norm(z) > NORM_BOUND
      return i
    end

    z = z * z + c
  end

  return MAX_ITERS
end

constexpr int64_t kImageWidth = 1920;
constexpr int64_t kImageHeight = 1080;
constexpr int64_t kDPI = 200;

constexpr double kXMin = -2.1;
constexpr double kXMax = 1.3;
constexpr double kYMin = -1.5;
constexpr double kYMax = 1.5;
constexpr int64_t kMaxIters = 100;
constexpr double kNormBound = 32.0;


int64_t mandelbrot_kernel(Complex c) {

  auto z = c;

  for (int64_t i = 0; i < kMaxIters; ++i) {

    if (z.norm() > kNormBound) return i;

    z = z * z + c;
  }

  return kMaxIters;
}

Applying the kernel to the complex plane

Next we define a function to compute the mandelbrot plot and put the results into a list along with the execution time:

GambolGambol OptimizedPythonRubyJuliaC++

fun compute_mandelbrot(result List[Int64]) {

  dx = (kXMax - kXMin) / kImageWidth
  dy = (kYMax - kYMin) / kImageHeight

  y = kYMin

  start_time = Time.checkpoint()

  for j in ..kImageHeight {

    x = kXMin

    for i in ..kImageWidth {

      result[i + j*kImageWidth] = mandelbrot_kernel(Complex(x, y))

      x += dx
    }

    y += dy
  }

  end_time = Time.checkpoint()

  end_time - start_time
}

#Gambol.function.nounwind
fun compute_mandelbrot_opt(result List[Int64]) {

  dx = (kXMax - kXMin) / kImageWidth
  dy = (kYMax - kYMin) / kImageHeight

  y = kYMin

  start_time = Time.checkpoint()

  for j in ..kImageHeight {

    x = kXMin

    for i in ..kImageWidth {

      result.set_unchecked(mandelbrot_kernel_opt(Complex(x, y)), i + j*kImageWidth)

      x += dx
    }

    y += dy
  }

  end_time = Time.checkpoint()

  end_time - start_time
}

import time

def compute_mandelbrot():

    result = [0] * IMAGE_HEIGHT * IMAGE_WIDTH

    dx = (xmax - xmin) / IMAGE_WIDTH
    dy = (ymax - ymin) / IMAGE_HEIGHT

    y = ymin

    start = time.time()

    for j in range(IMAGE_HEIGHT):

        x = xmin

        for i in range(IMAGE_WIDTH):
            result[i + j*IMAGE_WIDTH] = mandelbrot_kernel(Complex(x, y))
            x += dx

        y += dy

    end = time.time()

    print(end - start)

    return result

def compute_mandelbrot(result)

  dx = (XMAX - XMIN) / IMAGE_WIDTH
  dy = (YMAX - YMIN) / IMAGE_HEIGHT

  y = YMIN

  start = Time.now

  for j in 0..IMAGE_HEIGHT

    x = XMIN

    for i in 0..IMAGE_WIDTH

      result.push(mandelbrot_kernel(MyComplex.new(x, y)))
      x += dx

    end

    y += dy
  end

  puts Time.now - start

end

function compute_mandelbrot()
  result = zeros(IMAGE_HEIGHT, IMAGE_WIDTH)

  dx = (xmax - xmin) / IMAGE_WIDTH
  dy = (ymax - ymin) / IMAGE_HEIGHT

  y = ymin

  elapsed = @elapsed begin

    for j = 1:IMAGE_HEIGHT

      x = xmin

      for i = 1:IMAGE_WIDTH

        result[j, i] = mandelbrot_kernel(C64(x, y))
        x += dx

      end

      y += dy
    end
  end

  println("elapsed ", elapsed)
  return result
end

void compute_mandelbrot(std::vector<int64_t> &result) {


  auto dx = (kXMax - kXMin) / kImageWidth;
  auto dy = (kYMax - kYMin) / kImageHeight;

  auto y = kYMin;

  auto start = std::chrono::nanoseconds(
      std::chrono::steady_clock::now().time_since_epoch()).count();

  for (int64_t j = 0; j < kImageHeight; ++j) {

    auto x = kXMin;

    for (int64_t i = 0; i < kImageWidth; ++i) {

      result[i + j*kImageWidth] = mandelbrot_kernel(Complex(x, y));

      x += dx;
    }

    y += dy;
  }

  auto elapsed = std::chrono::nanoseconds(
      std::chrono::steady_clock::now().time_since_epoch()).count() - start;

  std::cout << (elapsed/1000000) << std::endl;
}

Generating the plot

And finally to measure the execusion time and plot the results we rely on Python's matplotlib and numpy:

Note

Plotting is not implemented for C++ and Ruby

GambolPythonRubyJuliaC++

list auto = List[Int64].full(0, kImageWidth*kImageHeight)

elapsed = compute_mandelbrot(list)
print(elapsed)

import python as py

plt = py.import(`matplotlib.pyplot`)
colors = py.import(`matplotlib.colors`)
np = py.import(`numpy`)

#! converting our List[Int64] to an np array through a Python memory view is much faster 
np_arr = np.asarray(py.PythonObject.memory_view(list))
np_arr = np_arr.reshape([kImageHeight, kImageWidth])

fig = plt.figure()
fig.set_figwidth((_ NFloat = kImageWidth)/kDPI)
fig.set_figheight((_ NFloat = kImageHeight)/kDPI)
fig.set_dpi(kDPI)
fig.add_axes([0.0, 0.0, 1.0, 1.0])
light = colors.LightSource(0, 90, 0, 1, 1, 0)

image = 
  light.shade(np_arr, plt.cm.PuOr, py._None, `overlay`, 0, kMaxIters, 2.0)

plt.imshow(image)
plt.axis(`off`)
plt.show()

import numpy as np
from matplotlib import pyplot as plt
from matplotlib import colors


def make_plot(l):

    np_arr = np.array(l)
    np_arr = np_arr.reshape((IMAGE_HEIGHT, IMAGE_WIDTH))

    fig = plt.figure()
    fig.set_figwidth(float(IMAGE_WIDTH)/DPI)
    fig.set_figheight(float(IMAGE_HEIGHT)/DPI)
    fig.set_dpi(DPI)
    fig.add_axes((0.0, 0.0, 1.0, 1.0))
    light = colors.LightSource(0, 45, 0, 1, 1, 0)

    image = \
      light.shade(np_arr, plt.cm.PuOr, None, "overlay", 0, MAX_ITERS, 2.0)

    plt.imshow(image)
    plt.axis("off")
    plt.show()

if __name__ == '__main__':
  l = compute_mandelbrot()
  make_plot(l)

list = Array.new IMAGE_WIDTH*IMAGE_HEIGHT, 0

compute_mandelbrot(list)

result = compute_mandelbrot()

using Plots
x_range = range(xmin, xmax, IMAGE_WIDTH)
y_range = range(ymin, ymax, IMAGE_HEIGHT)
display(heatmap(x_range, y_range, result, color=:PuOr, cbar=true))
readline()

int main() {

    std::vector<int64_t> result(kImageWidth * kImageHeight, 0);

    compute_mandelbrot(result);

    //to prevent the optimizer from removing the above
    std::cout << result[0] << std::endl; 

    return 0;
}

Results

The above code produces the following plot and execution times:

Mandelbrot plot generated by Gambol and visualized using numpy and matplotlib

	Gambol (0.0.0)	Gambol Optimized	Python (3.12)	Ruby (3.3)	Julia (1.11)	C++ (Clang++ 18.1)	C++ (g++ 14.2)
Time	236ms	169ms	49.8s	40.4s	4.24s	188ms	155ms

That is a 211x speedup over Python and a surprising 17x speedup over Julia just by writing the same code in Gambol with no modification to the algorithm! The optimized Gambol version is still using exactly the same algorithm with a couple of adjustments to implmentation to disable exception handling and array bounds checking.

There are additional ways to speed up the Mandelbrot algorithm. For example by rearranging the math formula, fma instructions and SIMD could be used to take advantage of vector instructions in the hardware. Parallelization would also linearly scale for this particular algorithm though it can be implemented in Python and other languages as well. This would yield results that are thousands of times faster than the above canonical single threaded Python implementation on the right hardware. However optimizations to the algorithm itself eventhough they demonstrate the capability to take advantage of hardware don't exactly serve the purpose of this benchmark and are not included here. There is an excellent 3-part blog post by Abdul Dakkak on this subject using the Mojo programming language instead and the exact same techniques could be implemented in Gambol.