Julia ASTs¶
Julia has two AST representations. First there is a surface syntax AST returned
by the parser (e.g. the parse()
function), and manipulated by macros.
It is a structured representation of code as it is written,
constructed by julia-parser.scm
from a character stream.
Next there is a lowered AST which is used by type inference and code generation.
In the lowered form, there are generally fewer types of nodes, all macros
are expanded, and all control flow is converted to explicit branches and sequences
of statements. The lowered form is constructed by julia-syntax.scm
.
First we will focus on the lowered form, since it is more important to the compiler. It is also less obvious to the human, since it results from a significant rearrangement of the input syntax.
Lowered form¶
The following data types exist in lowered ASTs:
Expr
- has a node type indicated by the
head
field, and anargs
field which is aVector{Any}
of subexpressions. Symbol
- used to name local variables and static parameters within a function.
LambdaStaticData
- wraps the AST of each function, including inner functions.
LineNumberNode
- line number metadata
LabelNode
- branch target, a consecutively-numbered integer starting at 0
GotoNode
- unconditional branch
QuoteNode
- wraps an arbitrary value to reference as data. For example, the function
f()=:a
contains aQuoteNode
whosevalue
field is the symbola
, in order to return the symbol itself instead of evaluating it. GlobalRef
- refers to global variable
name
in modulemod
TopNode
- forces a name to be resolved as a global in Base. This is now mostly
redundant with
GlobalRef(Base,:x)
. SymbolNode
- used to annotate a local variable with a type
GenSym
- refers to a consecutively-numbered (starting at 0) static single assignment (SSA) variable inserted by the compiler.
NewvarNode
- marks a point where a closed variable needs to have a new box allocated.
Expr types¶
These symbols appear in the head
field of Expr
s in lowered form.
call
- function call.
args[1]
is the function to call,args[2:end]
are the arguments. line
- line number and file name metadata. Unlike a
LineNumberNode
, can also contain a file name. gotoifnot
- conditional branch. If
args[1]
is false, goes to label identified inargs[2]
. =
- assignment
method
adds a method to a generic function and assigns the result if necessary.
args[1]
- function name (symbol), or aGlobalRef
, or anExpr
with headkw
. In the(kwf)
case, the method is actually a keyword argument sorting function forf
. It will be stored instead ingeneric_function->env->kwsorter
.If
method
has only one argument, it corresponds to the formfunctionfooend
and only creates a function without adding any methods.args[2]
- aSimpleVector
of argument type data.args[2][1]
is aTupletype
of the argument types, andargs[2][2]
is aSimpleVector
of type variables corresponding to the method’s static parameters.args[3]
- aLambdaStaticData
of the method itself.args[4]
-true
orfalse
, identifying whether the method is staged (@generatedfunction
)const
- declares a (global) variable as constant
null
- has no arguments; simply yields the value
nothing
static_typeof
- a horrible misfeature used to determine the result type of array comprehensions. Planned to be removed.
type_goto
- a virtual control flow edge used to convey type data to
static_typeof
, also to be removed. new
- allocates a new struct-like object. First argument is the type. The
new
pseudo-function is lowered to this, and the type is always inserted by the compiler. This is very much an internal-only feature, and does no checking. Evaluating arbitrarynew
expressions can easily segfault. return
- returns its argument as the value of the enclosing function.
the_exception
- yields the caught exception inside a
catch
block. This is the value of the run time system variablejl_exception_in_transit
. enter
- enters an exception handler (
setjmp
).args[1]
is the label of the catch block to jump to on error. leave
- pop exception handlers.
args[1]
is the number of handlers to pop. boundscheck
- controls turning bounds checks on or off. A stack is maintained; if the
first argument of this expression is true or false (
true
means bounds checks are enabled), it is pushed onto the stack. If the first argument is:pop
, the stack is popped. copyast
- part of the implementation of quasi-quote. The argument is a surface syntax AST that is simply copied recursively and returned at run time.
meta
- metadata. Currently used for inlining hints, represented by the symbols
:inline
and:noinline
.
LambdaStaticData¶
Has an ->ast
field pointing to an Expr
with head lambda
. This
Expr
has the following layout:
args[1]
Vector{Any}
of argument name symbols. For varargs functions, the last element is actually anExpr
with head...
. The argument of thisExpr
is anExpr
with head::
. The first argument of::
is a symbol (the argument name), and the second argument is a type declaration.args[2]
A
Vector{Any}
with variable information:args[2][1]
- An array of 3-elementvarinfo
arrays, one for each argument or local variable. Avarinfo
array has the formAny[:name,type,bits]
. Thebits
field is an integer describing variable properties as follows: - 1 - captured (closed over) - 2 - assigned (only false if there are no assignment statements with this var on the left) - 4 - assigned by an inner function - 8 - const (currently unused for local variables) - 16 - statically assigned once - 32 - might be used before assigned. This flag is only valid after type inference.args[2][2]
- An array ofvarinfo
triples for each outer variable this function captures.args[2][3]
- The types of variables represented byGenSym
objects. GivenGenSym
g
, its type will be atargs[2][3][g.id+1]
.args[2][4]
- The names (symbols) of static parameters.args[3]
- an
Expr
with headbody
whose arguments are the statements comprising the function body.
Surface syntax AST¶
Front end ASTs consist entirely of Expr
s and atoms (e.g. symbols, numbers).
There is generally a different expression head for each visually distinct
syntactic form.
Examples will be given in s-expression syntax. Each parenthesized list corresponds
to an Expr, where the first element is the head.
For example (callfx)
corresponds to Expr(:call,:f,:x)
in Julia.
Calls¶
Input | AST |
---|---|
f(x) | (call f x) |
f(x, y=1, z=2) | (call f x (kw y 1) (kw z 2)) |
f(x; y=1) | (call f (parameters (kw y 1)) x) |
f(x...) | (call f (... x)) |
Do
syntax:
f(x)doa,bbodyend
parses as (callf(->(tupleab)(blockbody))x)
.
Operators¶
Most uses of operators are just function calls, so they are parsed with the
head call
.
However some operators are special forms (not necessarily function calls),
and in those cases the operator itself is the expression head.
In julia-parser.scm these are referred to as “syntactic operators”.
Some operators (+
and *
) use N-ary parsing; chained calls are parsed as
a single N-argument call.
Finally, chains of comparisons have their own special expression structure.
Input | AST |
---|---|
x+y | (call + x y) |
a+b+c+d | (call + a b c d) |
2x | (call * 2 x) |
a&&b | (&& a b) |
x += 1 | (+= x 1) |
a ? 1 : 2 | (if a 1 2) |
a:b | (: a b) |
a:b:c | (: a b c) |
a,b | (tuple a b) |
a==b | (comparison a == b) |
1<i<=n | (comparison 1 < i <= n) |
a.b | (. a (quote b)) |
a.(b) | (. a b) |
Bracketed forms¶
Input | AST |
---|---|
a[i] | (ref a i) |
t[i;j] | (typed_vcat t i j) |
t[i j] | (typed_hcat t i j) |
t[a b; c d] | (typed_vcat t (row a b) (row c d)) |
a{b} | (curly a b) |
a{b;c} | (curly a (parameters c) b) |
[x] | (vect x) |
[x,y] | (vect x y) |
[x;y] | (vcat x y) |
[x y] | (hcat x y) |
[x y; z t] | (vcat (row x y) (row z t)) |
[x for y in z, a in b] | (comprehension x (= y z) (= a b)) |
T[x for y in z] | (typed_comprehension T x (= y z)) |
[a=>b for x in y] | (dict_comprehension (=> a b) (= x y)) |
(k=>v)[a=>b for x in y] | (typed_dict_comprehension (=> k v) (=> a b) (= x y)) |
(a, b, c) | (tuple a b c) |
(a; b; c) | (block a (block b c)) |
Macros¶
Input | AST |
---|---|
@m x y | (macrocall @m x y) |
Base.@m x y | (macrocall (. Base (quote @m)) x y) |
@Base.m x y | (macrocall (. Base (quote @m)) x y) |
Strings¶
Input | AST |
---|---|
“a” | “a” |
x”y” | (macrocall @x_str “y”) |
x”y”z | (macrocall @x_str “y” “z”) |
“x = $x” | (string “x = ” x) |
`a b c` | (macrocall @cmd “a b c”) |
x ~ distr | (macrocall @~ x distr) |
Doc string syntax:
"some docs"f(x)=x
parses as (macrocall(|.|Base'@doc)"somedocs"(=(callfx)(blockx)))
Imports and such¶
Input | AST |
---|---|
import a | (import a) |
import a.b.c | (import a b c) |
import ...a | (import . . . a) |
import a.b, c.d | (toplevel (import a b) (import c d)) |
import Base: x | (import Base x) |
import Base: x, y | (toplevel (import Base x) (import Base y)) |
export a, b | (export a b) |
Numbers¶
Julia supports more number types than many scheme implementations, so not all numbers are represented directly as scheme numbers in the AST.
Input | AST |
---|---|
11111111111111111111 | (macrocall @int128_str “11111111111111111111”) |
0xfffffffffffffffff | (macrocall @uint128_str “0xfffffffffffffffff”) |
1111...many digits... | (macrocall @big_str “1111....”) |
Block forms¶
A block of statements is parsed as (blockstmt1stmt2...)
.
If statement:
ifabelseifcdelseefend
parses as:
(ifa(block(line2)b)(block(line3)(ifc(block(line4)d)(block(line5)e(line6)f))))
A while
loop parses as (whileconditionbody)
.
A for
loop parses as (for(=variter)body)
.
If there is more than one iteration specification, they are parsed as a block:
(for(block(=v1iter1)(=v2iter2))body)
.
break
and continue
are parsed as 0-argument expressions
(break)
and (continue)
.
let
is parsed as (letbody(=var1val1)(=var2val2)...)
.
A basic function definition is parsed as (function(callfx)body)
.
A more complex example:
function f{T}(x::T;k=1)returnx+1end
parses as:
(function(call(curlyfT)(parameters(kwk1))(::xT))(block(line2file.jl)(return(call+x1))))
Type definition:
type Foo{T<:S}x::Tend
parses as:
(type #t (curly Foo (<: T S))
(block (line 2 none) (:: x T)))
The first argument is a boolean telling whether the type is mutable.
try
blocks parse as (trytry_blockvarcatch_blockfinally_block)
.
If no variable is present after catch
, var
is #f
.
If there is no finally
clause, then the last argument is not present.