Introduction
Gambol is a full-spectrum, progressively typed, general-purpose application programming language. It aims to bring together the fluidity and extensibility of scripting languages and the performance and safety of systems languages without being either. Gambol builds on the state of the art and pushes it further, taking a novel approach that tackles this problem from the ground up. As a result, it is the world's first general-purpose, progressively typed programming language to be both AoT (Ahead-of-Time) compiled and to provide full AST (Abstract Syntax Tree) reflection at runtime.
Overview
Gambol offers a new approach to solving the two-language problem. Using a self-caching compiler that lazily loads the live AST at runtime, Gambol can produce executables that are highly optimized, do not suffer from the delays and inefficiencies of JIT-compiled languages and at the same time offer the full dynamic experience of scripting languages. In addition, Gambol offers improvements in many areas, from syntax to the type system, language features and the standard library. To get a peek at what Gambol code looks like, check out the code snippets below.
Syntax
Gambol draws inspiration from a variety of contemporary languages and strives to provide a syntax that is familiar but with improvements that conform to the principle of progressive disclosure of capability. Let's get started with some naming conventions. Types are upper case, variables are lower case, and you can write all code on one line or on multiple lines, with optional semicolons to separate instructions:
w = `world`
print(`hello ` w ` from Gambol`) #! prints hello world from Gambol
The above Hello World program (and that's the entire program) also demonstrates basic string interpolation and automatic type assignment. The following are a few more example programs to illustrate the look and feel of Gambol's syntax.
The second example might be considered too complicated in some languages, and is not even supported in others, but hopefully Gambol's syntax makes it easier on the eyes: a user-defined type with one property of type Int64 that overloads the += operator and the str method.
type MyType {
+= (s Self, o Self) -> Self { s.prop += o.prop }
fun str(s Self) -> String { `value of prop is ` s.prop }
prop Int64
}
x = MyType() {prop: 1}
y auto = MyType() {prop: 2}
x += y
print(x) #! prints value of prop is 3
You probably also noticed the explicit variable declaration for y with the keyword auto. It is always good practice to declare variables explicitly, but sometimes you may not have the time to invest. Here is another application demonstrating progressive disclosure of capability and type inference:
fun add_no_type(a, b) { a + b }
fun add_with_type(a Int64, b Int64) -> Int64 { a + b }
#Gambol.function.alwaysinline
fun add_parametric[T](a T, b Int16) -> T { return a + b }
print(add_no_type(1, 2)) #! prints 3
print(add_with_type(1 Int32, 2)) #! prints 3
print(add_parametric(1, 2)) #! prints 3
And yes, you can declare types for literals too, including your own types!
A basic list comprehension:
l = [3 * x for x in ..20 keeping x % 2 == 0]
print(l)
#! prints [0, 6, 12, 18, 24, 30, 36, 42, 48, 54]
There is a lot more to cover regarding the syntax, semantics, type system and more. If you'd like to learn more, follow the Getting Started Guide.
Type System
One of Gambol's primary goals is to expose every operation available in the target CPU. This includes vector operations exposed through SIMD and its derivatives, fma operations and other things normally not found in scripting languages or even some systems languages.
This commitment to performance also means giving the programmer the ability to create types with value semantics as well as reference types, offering unsafe operations when necessary and enabling the creation of zero-cost abstractions and libraries.
To that end, Gambol offers a four-tiered gradual type system with automatic memory management that allows for both performance-focused applications and dynamic operations wherever needed. When annotating a variable with a type, you can choose the level of specificity you like to indicate whether the variable is to conform strictly, structurally or nominally with that type. You can also create dynamic variables that can hold values of any type, as in a scripting language.
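Gambol's own annotation syntax for these tiers is not shown in this section, so the following is only a rough sketch in Python of the distinction between nominal, structural and dynamic conformance. The names Meters, Feet and HasValue are hypothetical, and the strict tier is left out because its exact meaning is not defined here.

```python
from typing import Any, Protocol, runtime_checkable


class Meters:
    """Nominal target: only Meters (or a subclass) counts as Meters."""

    def __init__(self, value: float) -> None:
        self.value = value


class Feet:
    """Same shape as Meters, but a different, unrelated name."""

    def __init__(self, value: float) -> None:
        self.value = value


@runtime_checkable
class HasValue(Protocol):
    """Structural target: anything with a `value` attribute conforms."""

    value: float


def conforms_nominally(obj: Any) -> bool:
    # Nominal conformance looks at the declared type name.
    return isinstance(obj, Meters)


def conforms_structurally(obj: Any) -> bool:
    # Structural conformance looks only at the shape of the object.
    return isinstance(obj, HasValue)


# A dynamic variable may be rebound to values of unrelated types.
anything: Any = Meters(1.0)
anything = "now a string"

print(conforms_nominally(Feet(3.0)))     # False: wrong name
print(conforms_structurally(Feet(3.0)))  # True: right shape
```

Feet fails the nominal check because it is not declared as Meters, yet it passes the structural check because it has the required attribute.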
Some of the additional tools Gambol offers to reuse code, write safe code, draw maximal performance and reduce cognitive load include:
- automatic type assignment
- parametric polymorphism with automatic type inference
- compile-time evaluation of expressions, branches and loops that enables efficient use of hardware resources and conditional compilation
- hygienic macros
- strict or structural function interfaces
- support for object-oriented and functional programming
- support for horizontal as well as vertical extension of types
- the RAII idiom
- enum variants
- arbitrary-precision arithmetic
- support for async/await with a generalized coroutine library
- a threading library with no such limitations as Python's GIL
- safe closures that capture by value and by reference
- zero-cost exceptions with the full exception stack trace accessible at runtime
- function decorators and attributes
- low-level features such as pointers, calling conventions, pass by reference etc.
The Mirror Library
The mirror library provides facilities to parse, compile and run new code at runtime. It also provides facilities to inspect every element of the live AST if necessary. The AST is live in the sense that variables in the program, e.g. fat references to objects or functions, point to the same AST. As mentioned earlier, AST nodes are loaded as needed to minimize memory usage.
The following is an example of using the mirror library to inspect a function.
import mirror
fun my_fun(a Int32, b String) -> Int64 { a += b.len(); 123 * a }
f = mirror.Function(my_fun)
print(`function's symbol: ` f.symbol)
print(`number of positional parameters: ` f.parameters.len())
print(`name of the second parameter: ` f.parameters[1].symbol)
print(`function's return type: ` f.return_type)
print(`first instruction in the function: ` f.body.instruction(0))
print(`\nentire function: \n\n` f)
Output:
function's symbol: my_fun
number of positional parameters: 2
name of the second parameter: b
function's return type: Int64
first instruction in the function: a += b.len()
entire function:
fun my_fun(a Int32, b String) -> Int64 {
a += b.len()
123 * a
};
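For readers coming from Python, a rough analogue of this kind of inspection can be sketched with the standard ast module. This is Python rather than Gambol, and it is only an approximation: it parses source text instead of inspecting a live, compiled function, and the variable names are chosen here for illustration.

```python
import ast

# Python rendering of my_fun, kept as source text so it can be parsed.
SOURCE = '''
def my_fun(a: int, b: str) -> int:
    a += len(b)
    return 123 * a
'''

tree = ast.parse(SOURCE)
func = tree.body[0]  # the ast.FunctionDef node for my_fun

symbol = func.name
param_names = [arg.arg for arg in func.args.args]
return_type = ast.unparse(func.returns)
first_statement = ast.unparse(func.body[0])

print("function's symbol:", symbol)                            # my_fun
print("number of positional parameters:", len(param_names))    # 2
print("name of the second parameter:", param_names[1])         # b
print("function's return type:", return_type)                  # int
print("first statement in the function:", first_statement)     # a += len(b)
```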
But you can do even more with the mirror library. Here is another example of something that is talked about a lot, mostly in the movies (but also in scripting languages): a program that runs new code at runtime. It has its use cases in the real world as well.
from mirror import Program, Module
a = 12
print(a) #! prints 12
prg_string = `a = 1332`
module, _ = Module.parse(prg_string)
Program.analyze(module)
Program.generate_code(module)
Program.run_module(module)
print(a) #! prints 1332
In the above example new code is created from a string but there are more ways to create new code. You could for example use the API of the mirror library to create new functions programmatically.
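For comparison, the same parse, compile and run sequence looks roughly like this in Python, using only built-ins and the standard ast module. It is a sketch of the scripting-language behavior that mirror is compared against, not of Gambol's API; the explicit namespace dictionary is an assumption made here to keep the example self-contained.

```python
import ast

# The variables the generated code is allowed to see and modify.
namespace = {"a": 12}
print(namespace["a"])  # prints 12

program_text = "a = 1332"
module = ast.parse(program_text)               # source text -> AST
code = compile(module, "<generated>", "exec")  # AST -> code object
exec(code, namespace)                          # run it against the namespace

print(namespace["a"])  # prints 1332
```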
One important application of full runtime AST reflection is the possibility to extend the language to new hardware domains like GPUs, TPUs or custom ASICs. You could inspect code for a function and then generate binary for a custom target. This has been one of Python's advantages over systems languages, making projects such as Taichi Lang or Numba possible.
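To make the idea of inspecting a function and generating code for another target concrete, here is a toy sketch in Python. It walks the AST of a small arithmetic function and emits instructions for an imaginary stack machine; the instruction names (PUSH, LOAD, ADD, MUL, SUB) and the function scale_add are made up for this illustration and do not correspond to any real target or to how Taichi Lang or Numba work internally.

```python
import ast

SOURCE = "def scale_add(x, y):\n    return 3 * x + y\n"

# Hypothetical opcodes of an imaginary stack machine.
OPCODES = {ast.Add: "ADD", ast.Mult: "MUL", ast.Sub: "SUB"}


def emit(node: ast.expr) -> list[str]:
    """Translate an arithmetic expression into postfix stack instructions."""
    if isinstance(node, ast.Constant):
        return [f"PUSH {node.value}"]
    if isinstance(node, ast.Name):
        return [f"LOAD {node.id}"]
    if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
        # Operands first, operator last: the usual postfix order.
        return emit(node.left) + emit(node.right) + [OPCODES[type(node.op)]]
    raise NotImplementedError(ast.dump(node))


func = ast.parse(SOURCE).body[0]   # the FunctionDef for scale_add
return_stmt = func.body[0]         # its single return statement
instructions = emit(return_stmt.value)

print(instructions)
# prints ['PUSH 3', 'LOAD x', 'MUL', 'LOAD y', 'ADD']
```

A real backend would of course handle control flow, types and memory, but the shape is the same: the reflective AST is the input and the target's instructions are the output.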
Other applications of fully reflective code include automatic differentiation of functions, probabilistic programming and generating documentation from the source code. As a readily available example of the latter, the entire documentation for the standard library on this website is generated from the source code using a script that uses mirror to find the signature of every function and type and their corresponding documentation.
A last hypothetical use of the mirror library is the creation of continuously self-modifying code that runs fast, with applications in optimization or Artificial Intelligence.
Interoperability
Gambol exposes the GNU libtool with additional tools to help find and call foreign functions easily. The ability to override the member access operator and to define properties with string literals as names helps to create wrapper modules that expose an idiomatic API for calling into foreign libraries. For demonstration purposes only, a Python library is included to show the idiomatic syntax for calling into Python libraries such as matplotlib below:
from python import *
plt = import(`matplotlib.pyplot`)
plt.plot([1, 2, 3, 4])
plt.ylabel(`some numbers`)
plt.show()
There is a Jupyter kernel included that further demonstrates interactions with Python (in addition to running Gambol in Jupyter, of course!). The kernel uses the mirror library to compile and run each code snippet.
It is also possible to define #extern functions in Gambol to be called from other programs.
In addition, Gambol can generate debug symbols and integrate with debugging tools such as asan and others, which can help shed light on issues in situations that require low-level debugging, e.g. in the presence of unsafe code.
Prior Art
Gambol's inception was sparked primarily by frustration with the arsenal of languages that dominate software development today. While languages such as Python, Ruby and Lua may seem to offer a much quicker pathway to translate thoughts into code, their lack of type safety and of a capable type system often makes it more difficult to perform that translation correctly if the project is, or has the potential to grow, larger or more complex than initially planned.
Another important shortcoming of scripting languages is their inability to take full advantage of the CPU. Where this becomes a priority, the current practice is to make use of the extensibility of these languages and rely on systems languages such as C, C++, Rust etc. to do the heavy lifting, which creates what's known as the two-language problem. Systems languages are not known for their ease of use. They impose a much higher cognitive burden on the programmer even for simple tasks, as they require tracking additional pieces of information throughout the code and sometimes present many different options at each step of coding that bifurcate, trifurcate or break the programmer's chain of thought. This not only takes away the joy of coding but can also hinder productivity when it seems completely unnecessary. For example, the level of control that systems languages provide may not be necessary at all merely to take full advantage of the hardware. It may be necessary to meet certain requirements when bootstrapping an operating system, but not when creating a higher-level application. Some systems languages like C++ also suffer from a bloated syntax that has evolved over many decades to cram in as many new features as possible as newer competing languages discovered useful higher-level constructs.
With all that said, there have been attempts to combine the best of both worlds before.
Just AoT Compile it Instead?
One approach is to create an AoT compiler for existing scripting languages. Examples of these efforts are projects that imitate the syntax of scripting languages like Python, Ruby etc. with a compiler that infers and adds static types during compilation in order to produce performant code. Recent languages like Codon and Crystal fall into this category. This approach, however, does not address the deeper issue of marrying the dynamic features of the language with the safety and performance of static typing. Often, as in the examples above, dynamic elements of the parent language are omitted in these derivatives, and with that omission goes the flexibility and extensibility that scripting languages provide, as those traits arise out of more than just having a nice syntax.
Language Extensions
A different category of languages claims to gradually implement enough features to one day become supersets of the languages they try to emulate. Superset here means that the ideal language encompasses the entire syntax and semantics of the parent language, with additional features added, usually to improve the performance of the code. Cython (created 17 years ago) is an example of such a language; it attempts to compile Python code into C. It however falls short of its goal of covering the entire Python semantics (you can't, for example, inspect Cython code like you can in Python), and the resultant program is not as efficient as a program that is hand-optimized in C. It also makes debugging such programs extremely difficult, as there are multiple compilers and languages involved in the process. A more recent systems language, Mojo, uses the same approach to compile Python-looking programs to MLIR and eventually LLVM. While Mojo's static code runs fast due to compiler optimizations and its compile-time evaluation capabilities, it outsources all dynamic functionality to the Python interpreter (CPython) and therefore loses the flexibility and extensibility that Python offers at runtime. For example, it is incapable of full AST inspection at runtime, and it is not a truly dynamic language but tries to emulate dynamism by marshalling data back and forth between the Python interpreter and Mojo proper wherever such assignments are necessary.
At a higher level, trying to fully emulate another language carries a lot of risk, as the semantics of that other language may conflict with the concepts of static typing entirely, either now or in the future. The other language is a project of its own, with its own decision-making processes, and may not necessarily take into account compatibility with derivative programming languages. Emulating another language faithfully also means you inherit all the flaws of that language by definition and have no chance to improve upon them.
JIT Compilers
A third group of languages provides dynamism through JIT (Just-in-Time) compilation and therefore, at the very least, suffers from delays at runtime by definition. The JIT compiler in these languages also may not produce the most efficient code possible, as there are timing constraints imposed on the compiler. Numba is a library that JIT compiles Python functions by adding a decorator to the function. This comes with many limitations and does not work for every function. Numba, for example, is not aware of the entire codebase, just the function passed to it, and therefore cannot make the inferences and deductions that a proper compiler can. Julia is a JIT-compiled programming language that offers full dynamism and has great performance. It can be demonstrated, however, that Julia does not produce the fastest code possible to take full advantage of the hardware in certain situations. Julia also lacks a fully fledged type system, limiting its usability primarily to the domain of numerical computing. It has an unfamiliar syntax that is more suitable for mathematical notation than software engineering. The JIT compiler in Julia also consumes an enormous amount of memory at runtime. There are third-party packages that attempt to precompile Julia code and produce an executable to address some of the above issues; however, at this time they do not support all Julia features and are experimental. It is unclear if Julia's semantics even allow for the entire language to be precompiled in the first place.