Julia ASTs

Julia ASTs

Julia has two representations of code. First there is a surface syntax AST returned by the parser (e.g. the parse() function), and manipulated by macros. It is a structured representation of code as it is written, constructed by julia-parser.scm from a character stream. Next there is a lowered form, or IR (intermediate representation), which is used by type inference and code generation. In the lowered form there are fewer types of nodes, all macros are expanded, and all control flow is converted to explicit branches and sequences of statements. The lowered form is constructed by julia-syntax.scm.

First we will focus on the lowered form, since it is more important to the compiler. It is also less obvious to the human, since it results from a significant rearrangement of the input syntax.

Lowered form

The following data types exist in lowered form:

Expr types

These symbols appear in the head field of Exprs in lowered form.

Method

A unique'd container describing the shared metadata for a single method.

MethodInstance

A unique'd container describing a single callable signature for a Method. See especially Proper maintenance and care of multi-threading locks for important details on how to modify these fields safely.

CodeInfo

A temporary container for holding lowered source code.

Boolean properties:

Surface syntax AST

Front end ASTs consist entirely of Exprs and atoms (e.g. symbols, numbers). There is generally a different expression head for each visually distinct syntactic form. Examples will be given in s-expression syntax. Each parenthesized list corresponds to an Expr, where the first element is the head. For example (call f x) corresponds to Expr(:call, :f, :x) in Julia.

Calls

InputAST
f(x)(call f x)
f(x, y=1, z=2)(call f x (kw y 1) (kw z 2))
f(x; y=1)(call f (parameters (kw y 1)) x)
f(x...)(call f (... x))

do syntax:

f(x) do a,b
    body
end

parses as (call f (-> (tuple a b) (block body)) x).

Operators

Most uses of operators are just function calls, so they are parsed with the head call. However some operators are special forms (not necessarily function calls), and in those cases the operator itself is the expression head. In julia-parser.scm these are referred to as "syntactic operators". Some operators (+ and *) use N-ary parsing; chained calls are parsed as a single N-argument call. Finally, chains of comparisons have their own special expression structure.

InputAST
x+y(call + x y)
a+b+c+d(call + a b c d)
2x(call * 2 x)
a&&b(&& a b)
x += 1(+= x 1)
a ? 1 : 2(if a 1 2)
a:b(: a b)
a:b:c(: a b c)
a,b(tuple a b)
a==b(call == a b)
1<i<=n(comparison 1 < i <= n)
a.b(. a (quote b))
a.(b)(. a b)

Bracketed forms

InputAST
a[i](ref a i)
t[i;j](typed_vcat t i j)
t[i j](typed_hcat t i j)
t[a b; c d](typed_vcat t (row a b) (row c d))
a{b}(curly a b)
a{b;c}(curly a (parameters c) b)
[x](vect x)
[x,y](vect x y)
[x;y](vcat x y)
[x y](hcat x y)
[x y; z t](vcat (row x y) (row z t))
[x for y in z, a in b](comprehension x (= y z) (= a b))
T[x for y in z](typed_comprehension T x (= y z))
(a, b, c)(tuple a b c)
(a; b; c)(block a (block b c))

Macros

InputAST
@m x y(macrocall @m x y)
Base.@m x y(macrocall (. Base (quote @m)) x y)
@Base.m x y(macrocall (. Base (quote @m)) x y)

Strings

InputAST
"a""a"
x"y"(macrocall @x_str "y")
x"y"z(macrocall @x_str "y" "z")
"x = $x"(string "x = " x)
`a b c`(macrocall @cmd "a b c")

Doc string syntax:

"some docs"
f(x) = x

parses as (macrocall (|.| Core '@doc) "some docs" (= (call f x) (block x))).

Imports and such

InputAST
import a(import a)
import a.b.c(import a b c)
import ...a(import . . . a)
import a.b, c.d(toplevel (import a b) (import c d))
import Base: x(import Base x)
import Base: x, y(toplevel (import Base x) (import Base y))
export a, b(export a b)

Numbers

Julia supports more number types than many scheme implementations, so not all numbers are represented directly as scheme numbers in the AST.

InputAST
11111111111111111111(macrocall @int128_str "11111111111111111111")
0xfffffffffffffffff(macrocall @uint128_str "0xfffffffffffffffff")
1111...many digits...(macrocall @big_str "1111....")

Block forms

A block of statements is parsed as (block stmt1 stmt2 ...).

If statement:

if a
    b
elseif c
    d
else e
    f
end

parses as:

(if a (block (line 2) b)
    (block (line 3) (if c (block (line 4) d)
                        (block (line 5) e (line 6) f))))

A while loop parses as (while condition body).

A for loop parses as (for (= var iter) body). If there is more than one iteration specification, they are parsed as a block: (for (block (= v1 iter1) (= v2 iter2)) body).

break and continue are parsed as 0-argument expressions (break) and (continue).

let is parsed as (let body (= var1 val1) (= var2 val2) ...).

A basic function definition is parsed as (function (call f x) body). A more complex example:

function f{T}(x::T; k = 1)
    return x+1
end

parses as:

(function (call (curly f T) (parameters (kw k 1))
                (:: x T))
          (block (line 2 file.jl) (return (call + x 1))))

Type definition:

mutable struct Foo{T<:S}
    x::T
end

parses as:

(type #t (curly Foo (<: T S))
      (block (line 2 none) (:: x T)))

The first argument is a boolean telling whether the type is mutable.

try blocks parse as (try try_block var catch_block finally_block). If no variable is present after catch, var is #f. If there is no finally clause, then the last argument is not present.