Julia ASTs¶
Julia has two representations of code. First there is a surface syntax AST returned
by the parser (e.g. the parse() function), and manipulated by macros.
It is a structured representation of code as it is written,
constructed by julia-parser.scm from a character stream.
Next there is a lowered form, or IR (intermediate representation), which is used by
type inference and code generation.
In the lowered form there are fewer types of nodes, all macros
are expanded, and all control flow is converted to explicit branches and sequences
of statements. The lowered form is constructed by julia-syntax.scm.
First we will focus on the lowered form, since it is more important to the compiler. It is also less obvious to the human, since it results from a significant rearrangement of the input syntax.
Lowered form¶
The following data types exist in lowered form:
Expr- has a node type indicated by the
headfield, and anargsfield which is aVector{Any}of subexpressions. Slot- identifies arguments and local variables by consecutive numbering.
Slotis an abstract type with subtypesSlotNumberandTypedSlot. Both types have an integer-valuedidfield giving the slot index. Most slots have the same type at all uses, and so are represented withSlotNumber. The types of these slots are found in theslottypesfield of theirLambdaInfoobject. Slots that require per-use type annotations are represented withTypedSlot, which has atypfield. LambdaInfo- wraps the IR of each method.
LineNumberNode- contains a single number, specifying the line number the next statement came from.
LabelNode- branch target, a consecutively-numbered integer starting at 0
GotoNode- unconditional branch
QuoteNode- wraps an arbitrary value to reference as data. For example, the function
f()=:acontains aQuoteNodewhosevaluefield is the symbola, in order to return the symbol itself instead of evaluating it. GlobalRef- refers to global variable
namein modulemod SSAValue- refers to a consecutively-numbered (starting at 0) static single assignment (SSA) variable inserted by the compiler.
NewvarNode- Marks a point where a variable is created. This has the effect of resetting a variable to undefined.
Expr types¶
These symbols appear in the head field of Exprs in lowered form.
call- function call (dynamic dispatch).
args[1]is the function to call,args[2:end]are the arguments. invoke- function call (static dispatch).
args[1]is the LambdaInfo to call,args[2:end]are the arguments (including the function that is being called, atargs[2]). static_parameter- reference a static parameter by index.
line- line number and file name metadata. Unlike a
LineNumberNode, can also contain a file name. gotoifnot- conditional branch. If
args[1]is false, goes to label identified inargs[2]. =- assignment
methodadds a method to a generic function and assigns the result if necessary.
Has a 1-argument form and a 4-argument form. The 1-argument form arises from the syntax
functionfooend. In the 1-argument form, the argument is a symbol. If this symbol already names a function in the current scope, nothing happens. If the symbol is undefined, a new function is created and assigned to the identifier specified by the symbol. If the symbol is defined but names a non-function, an error is raised. The definition of “names a function” is that the binding is constant, and refers to an object of singleton type. The rationale for this is that an instance of a singleton type uniquely identifies the type to add the method to. When the type has fields, it wouldn’t be clear whether the method was being added to the instance or its type.The 4-argument form has the following arguments:
args[1]- A function name, orfalseif unknown. If a symbol, then the expression first behaves like the 1-argument form above. This argument is ignored from then on. When this isfalse, it means a method is being added strictly by type,(::T)(x)=x.args[2]- aSimpleVectorof argument type data.args[2][1]is aTupletype of the argument types, andargs[2][2]is aSimpleVectorof type variables corresponding to the method’s static parameters.args[3]- aLambdaInfoof the method itself. For “out of scope” method definitions (adding a method to a function that also has methods defined in different scopes) this is an expression that evaluates to a:lambdaexpression.args[4]-trueorfalse, identifying whether the method is staged (@generatedfunction)const- declares a (global) variable as constant
null- has no arguments; simply yields the value
nothing new- allocates a new struct-like object. First argument is the type. The
newpseudo-function is lowered to this, and the type is always inserted by the compiler. This is very much an internal-only feature, and does no checking. Evaluating arbitrarynewexpressions can easily segfault. return- returns its argument as the value of the enclosing function.
the_exception- yields the caught exception inside a
catchblock. This is the value of the run time system variablejl_exception_in_transit. enter- enters an exception handler (
setjmp).args[1]is the label of the catch block to jump to on error. leave- pop exception handlers.
args[1]is the number of handlers to pop. inbounds- controls turning bounds checks on or off. A stack is maintained; if the
first argument of this expression is true or false (
truemeans bounds checks are disabled), it is pushed onto the stack. If the first argument is:pop, the stack is popped. boundscheck- indicates the beginning or end of a section of code that performs a bounds
check. Like
inbounds, a stack is maintained, and the second argument can be one of:true,false, or:pop. copyast- part of the implementation of quasi-quote. The argument is a surface syntax AST that is simply copied recursively and returned at run time.
metametadata.
args[1]is typically a symbol specifying the kind of metadata, and the rest of the arguments are free-form. The following kinds of metadata are commonly used::inlineand:noinline: Inlining hints.:push_loc: enters a sequence of statements from a specified source location.args[2]specifies a filename, as a symbol.args[3]optionally specifies the name of an (inlined) function that originally contained the code.
:pop_loc: returns to the source location before the matching:push_loc.
LambdaInfo¶
sparam_syms - The names (symbols) of static parameters.
sparam_vals - The values of the static parameters (once known), indexed by sparam_syms.
code - An Any array of statements, or a UInt8 array with a compressed representation of the code.
slotnames - An array of symbols giving the name of each slot (argument or local variable).
slottypes - An array of types for the slots.
slotflags- A UInt8 array of slot properties, represented as bit flags:- 1 - captured (closed over)
- 2 - assigned (only false if there are no assignment statements with this var on the left)
- 4 - assigned by an inner function
- 8 - const (currently unused for local variables)
- 16 - statically assigned once
- 32 - might be used before assigned. This flag is only valid after type inference.
ssavaluetypes- Either an array or an Int giving the number of compiler-inserted- temporary locations in the function. If an array, specifies a type for each location.
nargs- The number of argument slots. The firstnargsentries of the slots- arrays refer to arguments.
isva - A boolean indicating whether the function is variadic.
Surface syntax AST¶
Front end ASTs consist entirely of Exprs and atoms (e.g. symbols, numbers).
There is generally a different expression head for each visually distinct
syntactic form.
Examples will be given in s-expression syntax. Each parenthesized list corresponds
to an Expr, where the first element is the head.
For example (callfx) corresponds to Expr(:call,:f,:x) in Julia.
Calls¶
| Input | AST |
|---|---|
f(x) | (callfx) |
f(x,y=1,z=2) | (callfx(kwy1)(kwz2)) |
f(x;y=1) | (callf(parameters(kwy1))x) |
f(x...) | (callf(...x)) |
do syntax:
f(x)doa,bbodyendparses as (callf(->(tupleab)(blockbody))x).
Operators¶
Most uses of operators are just function calls, so they are parsed with the
head call.
However some operators are special forms (not necessarily function calls),
and in those cases the operator itself is the expression head.
In julia-parser.scm these are referred to as “syntactic operators”.
Some operators (+ and *) use N-ary parsing; chained calls are parsed as
a single N-argument call.
Finally, chains of comparisons have their own special expression structure.
| Input | AST |
|---|---|
x+y | (call+xy) |
a+b+c+d | (call+abcd) |
2x | (call*2x) |
a&&b | (&&ab) |
x+=1 | (+=x1) |
a?1:2 | (ifa12) |
a:b | (:ab) |
a:b:c | (:abc) |
a,b | (tupleab) |
a==b | (call==ab) |
1<i<=n | (comparison1<i<=n) |
a.b | (.a(quoteb)) |
a.(b) | (.ab) |
Bracketed forms¶
| Input | AST |
|---|---|
a[i] | (refai) |
t[i;j] | (typed_vcattij) |
t[ij] | (typed_hcattij) |
t[ab;cd] | (typed_vcatt(rowab)(rowcd)) |
a{b} | (curlyab) |
a{b;c} | (curlya(parametersc)b) |
[x] | (vectx) |
[x,y] | (vectxy) |
[x;y] | (vcatxy) |
[xy] | (hcatxy) |
[xy;zt] | (vcat(rowxy)(rowzt)) |
[xforyinz,ainb] | (comprehensionx(=yz)(=ab)) |
T[xforyinz] | (typed_comprehensionTx(=yz)) |
(a,b,c) | (tupleabc) |
(a;b;c) | (blocka(blockbc)) |
Macros¶
| Input | AST |
|---|---|
@mxy | (macrocall@mxy) |
Base.@mxy | (macrocall(.Base(quote@m))xy) |
@Base.mxy | (macrocall(.Base(quote@m))xy) |
Strings¶
| Input | AST |
|---|---|
"a" | "a" |
x"y" | (macrocall@x_str"y") |
x"y"z | (macrocall@x_str"y""z") |
"x=$x" | (string"x="x) |
`abc` | (macrocall@cmd"abc") |
x~distr | (macrocall@~xdistr) |
Doc string syntax:
"some docs"f(x)=xparses as (macrocall(|.|Core'@doc)"somedocs"(=(callfx)(blockx))).
Imports and such¶
| Input | AST |
|---|---|
importa | (importa) |
importa.b.c | (importabc) |
import...a | (import...a) |
importa.b,c.d | (toplevel(importab)(importcd)) |
importBase:x | (importBasex) |
importBase:x,y | (toplevel(importBasex)(importBasey)) |
exporta,b | (exportab) |
Numbers¶
Julia supports more number types than many scheme implementations, so not all numbers are represented directly as scheme numbers in the AST.
| Input | AST |
|---|---|
11111111111111111111 | (macrocall@int128_str"11111111111111111111") |
0xfffffffffffffffff | (macrocall@uint128_str"0xfffffffffffffffff") |
1111...manydigits... | (macrocall@big_str"1111....") |
Block forms¶
A block of statements is parsed as (blockstmt1stmt2...).
If statement:
ifabelseifcdelseefendparses as:
(ifa(block(line2)b)(block(line3)(ifc(block(line4)d)(block(line5)e(line6)f))))A while loop parses as (whileconditionbody).
A for loop parses as (for(=variter)body).
If there is more than one iteration specification, they are parsed as a block:
(for(block(=v1iter1)(=v2iter2))body).
break and continue are parsed as 0-argument expressions
(break) and (continue).
let is parsed as (letbody(=var1val1)(=var2val2)...).
A basic function definition is parsed as (function(callfx)body).
A more complex example:
function f{T}(x::T;k=1)returnx+1endparses as:
(function(call(curlyfT)(parameters(kwk1))(::xT))(block(line2file.jl)(return(call+x1))))Type definition:
type Foo{T<:S}x::Tendparses as:
(type #t (curly Foo (<: T S))
(block (line 2 none) (:: x T)))
The first argument is a boolean telling whether the type is mutable.
try blocks parse as (trytry_blockvarcatch_blockfinally_block).
If no variable is present after catch, var is #f.
If there is no finally clause, then the last argument is not present.