Julia ASTs

Julia ASTs

Julia has two representations of code. First there is a surface syntax AST returned by the parser (e.g. the Meta.parse function), and manipulated by macros. It is a structured representation of code as it is written, constructed by julia-parser.scm from a character stream. Next there is a lowered form, or IR (intermediate representation), which is used by type inference and code generation. In the lowered form there are fewer types of nodes, all macros are expanded, and all control flow is converted to explicit branches and sequences of statements. The lowered form is constructed by julia-syntax.scm.

First we will focus on the AST, since it is needed to write macros.

Surface syntax AST

Front end ASTs consist almost entirely of Exprs and atoms (e.g. symbols, numbers). There is generally a different expression head for each visually distinct syntactic form. Examples will be given in s-expression syntax. Each parenthesized list corresponds to an Expr, where the first element is the head. For example (call f x) corresponds to Expr(:call, :f, :x) in Julia.

Calls

InputAST
f(x)(call f x)
f(x, y=1, z=2)(call f x (kw y 1) (kw z 2))
f(x; y=1)(call f (parameters (kw y 1)) x)
f(x...)(call f (... x))

do syntax:

f(x) do a,b
    body
end

parses as (do (call f x) (-> (tuple a b) (block body))).

Operators

Most uses of operators are just function calls, so they are parsed with the head call. However some operators are special forms (not necessarily function calls), and in those cases the operator itself is the expression head. In julia-parser.scm these are referred to as "syntactic operators". Some operators (+ and *) use N-ary parsing; chained calls are parsed as a single N-argument call. Finally, chains of comparisons have their own special expression structure.

InputAST
x+y(call + x y)
a+b+c+d(call + a b c d)
2x(call * 2 x)
a&&b(&& a b)
x += 1(+= x 1)
a ? 1 : 2(if a 1 2)
a:b(: a b)
a:b:c(: a b c)
a,b(tuple a b)
a==b(call == a b)
1<i<=n(comparison 1 < i <= n)
a.b(. a (quote b))
a.(b)(. a b)

Bracketed forms

InputAST
a[i](ref a i)
t[i;j](typed_vcat t i j)
t[i j](typed_hcat t i j)
t[a b; c d](typed_vcat t (row a b) (row c d))
a{b}(curly a b)
a{b;c}(curly a (parameters c) b)
[x](vect x)
[x,y](vect x y)
[x;y](vcat x y)
[x y](hcat x y)
[x y; z t](vcat (row x y) (row z t))
[x for y in z, a in b](comprehension x (= y z) (= a b))
T[x for y in z](typed_comprehension T x (= y z))
(a, b, c)(tuple a b c)
(a; b; c)(block a (block b c))

Macros

InputAST
@m x y(macrocall @m (line) x y)
Base.@m x y(macrocall (. Base (quote @m)) (line) x y)
@Base.m x y(macrocall (. Base (quote @m)) (line) x y)

Strings

InputAST
"a""a"
x"y"(macrocall @x_str (line) "y")
x"y"z(macrocall @x_str (line) "y" "z")
"x = $x"(string "x = " x)
`a b c`(macrocall @cmd (line) "a b c")

Doc string syntax:

"some docs"
f(x) = x

parses as (macrocall (|.| Core '@doc) (line) "some docs" (= (call f x) (block x))).

Imports and such

InputAST
import a(import (. a))
import a.b.c(import (. a b c))
import ...a(import (. . . . a))
import a.b, c.d(import (. a b) (. c d))
import Base: x(import (: (. Base) (. x)))
import Base: x, y(import (: (. Base) (. x) (. y)))
export a, b(export a b)

using has the same representation as import, but with expression head :using instead of :import.

Numbers

Julia supports more number types than many scheme implementations, so not all numbers are represented directly as scheme numbers in the AST.

InputAST
11111111111111111111(macrocall @int128_str (null) "11111111111111111111")
0xfffffffffffffffff(macrocall @uint128_str (null) "0xfffffffffffffffff")
1111...many digits...(macrocall @big_str (null) "1111....")

Block forms

A block of statements is parsed as (block stmt1 stmt2 ...).

If statement:

if a
    b
elseif c
    d
else
    e
end

parses as:

(if a (block (line 2) b)
    (elseif (block (line 3) c) (block (line 4) d)
            (block (line 5 e))))

A while loop parses as (while condition body).

A for loop parses as (for (= var iter) body). If there is more than one iteration specification, they are parsed as a block: (for (block (= v1 iter1) (= v2 iter2)) body).

break and continue are parsed as 0-argument expressions (break) and (continue).

let is parsed as (let (= var val) body) or (let (block (= var1 val1) (= var2 val2) ...) body), like for loops.

A basic function definition is parsed as (function (call f x) body). A more complex example:

function f(x::T; k = 1) where T
    return x+1
end

parses as:

(function (where (call f (parameters (kw k 1))
                       (:: x T))
                 T)
          (block (line 2) (return (call + x 1))))

Type definition:

mutable struct Foo{T<:S}
    x::T
end

parses as:

(struct true (curly Foo (<: T S))
        (block (line 2) (:: x T)))

The first argument is a boolean telling whether the type is mutable.

try blocks parse as (try try_block var catch_block finally_block). If no variable is present after catch, var is #f. If there is no finally clause, then the last argument is not present.

Quote expressions

Julia source syntax forms for code quoting (quote and :( )) support interpolation with $. In Lisp terminology, this means they are actually "backquote" or "quasiquote" forms. Internally, there is also a need for code quoting without interpolation. In Julia's scheme code, non-interpolating quote is represented with the expression head inert.

inert expressions are converted to Julia QuoteNode objects. These objects wrap a single value of any type, and when evaluated simply return that value.

A quote expression whose argument is an atom also gets converted to a QuoteNode.

Line numbers

Source location information is represented as (line line_num file_name) where the third component is optional (and omitted when the current line number, but not file name, changes).

These expressions are represented as LineNumberNodes in Julia.

Macros

Macro hygiene is represented through the expression head pair escape and hygienic-scope. The result of a macro expansion is automatically wrapped in (hygienic-scope block module), to represent the result of the new scope. The user can insert (escape block) inside to interpolate code from the caller.

Lowered form

Lowered form (IR) is more important to the compiler, since it is used for type inference, optimizations like inlining, and code generation. It is also less obvious to the human, since it results from a significant rearrangement of the input syntax.

In addition to Symbols and some number types, the following data types exist in lowered form:

Expr types

These symbols appear in the head field of Exprs in lowered form.

Method

A unique'd container describing the shared metadata for a single method.

MethodInstance

A unique'd container describing a single callable signature for a Method. See especially Proper maintenance and care of multi-threading locks for important details on how to modify these fields safely.

CodeInstance

CodeInfo

A (usually temporary) container for holding lowered source code.

Optional Fields:

Boolean properties: