Introduction
This book is the primary reference for Oxur, a Lisp syntax for the Rust programming language. Unlike the Rust Reference, it provides only one type of material: informal descriptions of the syntax of each language construct and its use.
This book's content was originally based on the Rust Reference, and it attempts to track changes there as they affect Oxur.
Warning: This book is incomplete. Documenting everything takes a while. See the GitHub issues for what is not documented in this book.
What The Reference is Not
This book does not serve as an introduction to the Rust, Lisp, or Oxur languages. Background familiarity with Rust and Lisp is assumed. A separate book will be made available for learning Oxur itself, and thus help those interested in acquiring such background familiarity.
How to Use This Book
This book does not assume that you are reading it sequentially. Each chapter can generally be read on its own, and cross-links to other chapters for facets of the language that it refers to but does not discuss.
There are two main ways to read this document.
The first is to answer a specific question. If you know which chapter answers that question, you can jump to that chapter in the table of contents. Otherwise, you can press s or click the magnifying glass on the top bar to search for keywords related to your question. For example, say you wanted to know when a temporary value created in a let statement is dropped. If you didn't already know that the lifetime of temporaries is defined in the expressions chapter, you could search "temporary let" and the first search result will take you to that section.
The second is to generally improve your knowledge of a facet of the language. In that case, just browse the table of contents until you see something you want to know more about, and just start reading. If a link looks interesting, click it, and read about that section.
That said, there is no wrong way to read this book. Read it however you feel helps you best.
Conventions
Like all technical books, this book has certain conventions in how it displays information. These conventions are documented here.
- Statements that define a term contain that term in italics. Whenever that term is used outside of that chapter, it is usually a link to the section that has this definition.
  An example term is an example of a term being defined.
- Notes that contain useful information about the state of the book or point out useful, but mostly out of scope, information are in blockquotes that start with the word "Note:" in bold.
  Note: This is an example note.
- Warnings that show unsound behavior in the language or possibly confusing interactions of language features are in a special warning box.
  Warning: This is an example warning.
- Code snippets inline in the text are inside <code> tags.
  Longer code examples are in a syntax highlighted box that has controls for copying, executing, and showing hidden lines in the top right corner.
  # ;; This is a hidden line.
  (fn main ()
    (println! "This is a code example"))
- The grammar and lexical structure are in blockquotes with either "Lexer" or "Syntax" in bold superscript as the first line.
  Syntax
  ExampleGrammar:
        ~ Expression
     | box Expression
  See Notation for more detail.
Contributing
We welcome contributions of all kinds.
You can contribute to this book by opening an issue or sending a pull request to the Oxur Specification repository. If this book does not answer your question, and you think its answer is within its scope, please do not hesitate to file an issue or ask about it in the #oxur-spec channel on Slack (here's an invite to the Slack workspace). Knowing what people use this book for the most helps direct our attention to making those sections the best that they can be.
Notation
Grammar
The following notations are used by the Lexer and Syntax grammar snippets:
Notation | Examples | Meaning |
---|---|---|
CAPITAL | KW_IF, INTEGER_LITERAL | A token produced by the lexer |
ItalicCamelCase | LetStatement, Item | A syntactical production |
string | x, while, * | The exact character(s) |
\x | \n, \r, \t, \0 | The character represented by this escape |
x? | pub? | An optional item |
x* | OuterAttribute* | 0 or more of x |
x+ | MacroMatch+ | 1 or more of x |
x^a..b | HEX_DIGIT^1..6 | a to b repetitions of x |
\| | u8 \| u16, Block \| Item | Either one or another |
[ ] | [b B] | Any of the characters listed |
[ - ] | [a-z] | Any of the characters in the range |
~[ ] | ~[b B] | Any characters, except those listed |
~string | ~\n, ~|# | Any characters, except this sequence |
( ) | (, Parameter)? | Groups items |
String table productions
Some rules in the grammar — notably unary operators, binary operators, and keywords — are given in a simplified form: as a listing of printable strings. These cases form a subset of the rules regarding the token rule, and are assumed to be the result of a lexical-analysis phase feeding the parser, driven by a DFA, operating over the disjunction of all such string table entries.
When such a string in monospace font occurs inside the grammar, it is an implicit reference to a single member of such a string table production. See tokens for more information.
Lexical structure
Input format
Oxur input is interpreted as a sequence of Unicode code points encoded in UTF-8.
Keywords
Rust divides keywords into three categories:
Strict keywords
These keywords can only be used in their correct contexts. They cannot be used as the names of:
- Items
- Variables and function parameters
- Fields and variants
- Type parameters
- Lifetime parameters or loop labels
- Macros or attributes
- Macro placeholders
- Crates
Lexer:
KW_AS : as
KW_BREAK : break
KW_CONST : const
KW_CONTINUE : continue
KW_CRATE : crate
KW_ELSE : else
KW_ENUM : enum
KW_EXTERN : extern
KW_FALSE : false
KW_FN : fn
KW_FOR : for
KW_IF : if
KW_IMPL : impl
KW_IN : in
KW_LET : let
KW_LOOP : loop
KW_MATCH : match
KW_MOD : mod
KW_MOVE : move
KW_MUT : mut
KW_PUB : pub
KW_REF : ref
KW_RETURN : return
KW_SELFVALUE : self
KW_SELFTYPE : Self
KW_STATIC : static
KW_STRUCT : struct
KW_SUPER : super
KW_TRAIT : trait
KW_TRUE : true
KW_TYPE : type
KW_UNSAFE : unsafe
KW_USE : use
KW_WHERE : where
KW_WHILE : while
The following keywords were added beginning in the 2018 edition.
Lexer 2018+
KW_ASYNC : async
KW_AWAIT : await
KW_DYN : dyn
Reserved keywords
These keywords aren't used yet, but they are reserved for future use. They have the same restrictions as strict keywords. The reasoning behind this is to make current programs forward compatible with future versions of Rust by forbidding them to use these keywords.
Lexer
KW_ABSTRACT : abstract
KW_BECOME : become
KW_BOX : box
KW_DO : do
KW_FINAL : final
KW_MACRO : macro
KW_OVERRIDE : override
KW_PRIV : priv
KW_TYPEOF : typeof
KW_UNSIZED : unsized
KW_VIRTUAL : virtual
KW_YIELD : yield
The following keywords are reserved beginning in the 2018 edition.
Lexer 2018+
KW_TRY : try
Weak keywords
These keywords have special meaning only in certain contexts. For example, it is possible to declare a variable or method with the name union.
- union is used to declare a union and is only a keyword when used in a union declaration.
- 'static is used for the static lifetime and cannot be used as a generic lifetime parameter:
  ;; error[E0262]: invalid lifetime parameter name: `'static`
  (fn invalid_lifetime_parameter <'static> (s: &'static str) -> &'static str s)
- In the 2015 edition, dyn is a keyword when used in a type position followed by a path that does not start with ::. Beginning in the 2018 edition, dyn has been promoted to a strict keyword.
Lexer
KW_UNION : union
KW_STATICLIFETIME : 'static
Lexer 2015
KW_DYN : dyn
Identifiers
Lexer:
IDENTIFIER_OR_KEYWORD :
      [a-z A-Z] [a-z A-Z 0-9 _]*
   | _ [a-z A-Z 0-9 _]+
RAW_IDENTIFIER :
   r# IDENTIFIER_OR_KEYWORD Except crate, self, super, Self
NON_KEYWORD_IDENTIFIER :
   IDENTIFIER_OR_KEYWORD Except a strict or reserved keyword
IDENTIFIER :
   NON_KEYWORD_IDENTIFIER | RAW_IDENTIFIER
An identifier is any nonempty ASCII string of the following form:
Either
- The first character is a letter.
- The remaining characters are alphanumeric or _.
Or
- The first character is _.
- The identifier is more than one character. _ alone is not an identifier.
- The remaining characters are alphanumeric or _.
A raw identifier is like a normal identifier, but prefixed by r#. (Note that the r# prefix is not included as part of the actual identifier.) Unlike a normal identifier, a raw identifier may be any strict or reserved keyword except the ones listed above for RAW_IDENTIFIER.
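To make the keyword and raw-identifier rules concrete, here is a small sketch in Rust, which Oxur targets; it is not Oxur syntax, and the function names are hypothetical. It uses the strict keyword match as a function name via r#, uses the weak keyword union as an ordinary variable name, and shows that r#count and count name the same binding because the prefix is not part of the identifier.

```rust
// `match` is a strict keyword, so it can only be used as a name in raw form.
pub fn r#match(x: u32) -> u32 {
    x * 2
}

pub fn weak_and_raw() -> u32 {
    // `union` is a weak keyword: outside a union declaration it is an
    // ordinary identifier.
    let union = 3;
    // The `r#` prefix is not part of the identifier, so `r#count` and
    // `count` refer to the same variable.
    let r#count = 4;
    // Note: `r#crate`, `r#self`, `r#super`, and `r#Self` are rejected.
    r#match(union) + count
}
```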
Comments
Lexer
LINE_COMMENT :
      ;; (~[; !] | ;;) ~\n*
   | ;;
BLOCK_COMMENT :
      #| (~[| !] | || | BlockCommentOrDoc) (BlockCommentOrDoc | ~|#)* |#
   | #||#
   | #|||#
INNER_LINE_DOC :
   ;;! ~[\n IsolatedCR]*
INNER_BLOCK_DOC :
   #|! ( BlockCommentOrDoc | ~[|# IsolatedCR] )* |#
OUTER_LINE_DOC :
   ;;; (~; ~[\n IsolatedCR]*)?
OUTER_BLOCK_DOC :
   #|| (~| | BlockCommentOrDoc ) (BlockCommentOrDoc | ~[|# IsolatedCR])* |#
BlockCommentOrDoc :
      BLOCK_COMMENT
   | OUTER_BLOCK_DOC
   | INNER_BLOCK_DOC
IsolatedCR :
   A \r not followed by a \n
Non-doc comments
Comments in Oxur follow the general Common Lisp syntax, but their meaning is not the same. Rust has a specific set of line and block comments that serve different purposes, and Oxur comments follow that set closely.
Doc comments
Line doc comments beginning with exactly three semicolons (;;;), and block doc comments (#|| ... |#), both outer doc comments, are interpreted as a special syntax for doc attributes. That is, they are equivalent to writing #[doc "..."] around the body of the comment, i.e., ;;; Foo turns into #[doc "Foo"] and #|| Bar |# turns into #[doc "Bar"].
Line comments beginning with ;;! and block comments #|! ... |# are doc comments that apply to the parent of the comment, rather than the item that follows. That is, they are equivalent to writing #![doc "..."] around the body of the comment. ;;! comments are usually used to document modules that occupy a source file.
Isolated CRs (\r), i.e. not followed by LF (\n), are not allowed in doc comments.
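Judging by how the grammar above was adapted from Rust, Oxur's ;;;, #|| |#, ;;! and #|! |# appear to correspond to Rust's ///, /** */, //! and /*! */; that mapping is our reading, not something the source states. The following Rust sketch (item names hypothetical) shows outer doc comments and an explicit doc attribute attaching to the item that follows, and inner forms attaching to their enclosing module.

```rust
/// Outer line doc: attaches to the function that follows.
pub fn with_line_doc() -> u8 {
    1
}

/** Outer block doc: also attaches to the next item. */
pub fn with_block_doc() -> u8 {
    2
}

// Writing the attribute by hand has the same effect as an outer doc comment.
#[doc = "Outer doc written as an explicit attribute."]
pub fn with_doc_attribute() -> u8 {
    3
}

pub mod documented {
    //! Inner line doc: attaches to the enclosing module `documented`.
    #![doc = "An inner doc attribute, equivalent to another inner doc line."]

    pub fn answer() -> u8 {
        42
    }
}
```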
Examples
;;! A doc comment that applies to the implicit anonymous module of this crate
(pub mod outer_module
  ;;! - Inner line doc
  ;;!! - Still an inner line doc (but with a bang at the beginning)
  #|! - Inner block doc |#
  #|!! - Still an inner block doc (but with a bang at the beginning) |#
  ;; - Only a comment
  ;;; - Outer line doc (exactly 3 semicolons)
  ;;;; - Only a comment
  #| - Only a comment |#
  #|| - Outer block doc (exactly 2 vertical bars) |#
  #||| - Only a comment |#
  (pub mod inner_module)
  (pub mod nested_comments
    #| In Oxur #| we can #| nest comments |# |# |#
    ;; All three types of block comments can contain or be nested inside
    ;; any other type:
    #| #| |# #|| |# #|! |# |#
    #|! #| |# #|| |# #|! |# |#
    #|| #| |# #|| |# #|! |# |#
    (pub mod dummy_item))
  (pub mod degenerate_cases
    ;; empty inner line doc
    ;;!
    ;; empty inner block doc
    #|!|#
    ;; empty line comment
    ;;
    ;; empty outer line doc
    ;;;
    ;; empty block comment
    #||#
    (pub mod dummy_item)
    ;; an empty #|| block isn't a doc block, it is a block comment
    #|||#)
  #| The next one isn't allowed because outer doc comments
     require an item that will receive the doc |#
  ;;; Where is my item?
  )
Whitespace
Whitespace is any non-empty string containing only characters that have the Pattern_White_Space Unicode property, namely:
- U+0009 (horizontal tab, '\t')
- U+000A (line feed, '\n')
- U+000B (vertical tab)
- U+000C (form feed)
- U+000D (carriage return, '\r')
- U+0020 (space, ' ')
- U+0085 (next line)
- U+200E (left-to-right mark)
- U+200F (right-to-left mark)
- U+2028 (line separator)
- U+2029 (paragraph separator)
Oxur is a "free-form" language, meaning that all forms of whitespace serve only to separate tokens in the grammar, and have no semantic significance.
An Oxur program has identical meaning if each whitespace element is replaced with any other legal whitespace element, such as a single space character.
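A lexer needs a predicate for exactly the eleven Pattern_White_Space code points listed above. The Rust sketch below is one way to write it; the function name is hypothetical, not part of any Oxur API.

```rust
/// Returns true if `c` has the Pattern_White_Space property, i.e. is one of
/// U+0009..=U+000D, U+0020, U+0085, U+200E, U+200F, U+2028, or U+2029.
pub fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}'
            | '\u{0020}'
            | '\u{0085}'
            | '\u{200E}'
            | '\u{200F}'
            | '\u{2028}'
            | '\u{2029}'
    )
}
```

Note that Rust's char::is_whitespace follows the broader White_Space property instead. It accepts U+00A0 (no-break space) and rejects the directional marks U+200E and U+200F, so it is not a drop-in replacement here.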
Tokens
Tokens are primitive productions in the grammar defined by regular (non-recursive) languages. Oxur source input can be broken down into the following kinds of tokens:
- Keywords
- Identifiers
- Literals
- Lifetimes and loop labels
- Punctuation
- Delimiters
Within this documentation's grammar, "simple" tokens are given in string table production form, and appear in monospace font.
Literals
A literal is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. A literal is a form of constant expression, so is evaluated (primarily) at compile time.
Examples
Characters and strings
Literal | Example | # sets | Characters | Escapes |
---|---|---|---|---|
Character | 'H' | 0 | All Unicode | Quote & ASCII & Unicode |
String | "hello" | 0 | All Unicode | Quote & ASCII & Unicode |
Raw string | r#"hello"# | 0 or more* | All Unicode | N/A |
Byte | b'H' | 0 | All ASCII | Quote & Byte |
Byte string | b"hello" | 0 | All ASCII | Quote & Byte |
Raw byte string | br#"hello"# | 0 or more* | All ASCII | N/A |
* The number of #s on each side of the same literal must be equal.
ASCII escapes
Escape | Name |
---|---|
\x41 | 7-bit character code (exactly 2 digits, up to 0x7F) |
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash |
\0 | Null |
Byte escapes
Escape | Name |
---|---|
\x7F | 8-bit character code (exactly 2 digits) |
\n | Newline |
\r | Carriage return |
\t | Tab |
\\ | Backslash |
\0 | Null |
Unicode escapes
Escape | Name |
---|---|
\u{7FFF} | 24-bit Unicode character code (up to 6 digits) |
Quote escapes
Escape | Name |
---|---|
\' | Single quote |
\" | Double quote |
Numbers
Number literals* | Example | Exponentiation | Suffixes |
---|---|---|---|
Decimal integer | 98_222 | N/A | Integer suffixes |
Hex integer | 0xff | N/A | Integer suffixes |
Octal integer | 0o77 | N/A | Integer suffixes |
Binary integer | 0b1111_0000 | N/A | Integer suffixes |
Floating-point | 123.0E+77 | Optional | Floating-point suffixes |
* All number literals allow _ as a visual separator: 1_234.0E+18f64
Suffixes
A suffix is a non-raw identifier immediately (without whitespace) following the primary part of a literal.
Any kind of literal (string, integer, etc.) with any suffix is valid as a token, and can be passed to a macro without producing an error. The macro itself will decide how to interpret such a token and whether to produce an error or not.
(macro_rules! blackhole ($tt:tt) => () )
(blackhole! "string"suffix) ;; OK
However, suffixes on literal tokens parsed as Rust code are restricted. Any suffixes are rejected on non-numeric literal tokens, and numeric literal tokens are accepted only with suffixes from the list below.
Integer | Floating-point |
---|---|
u8 , i8 , u16 , i16 , u32 , i32 , u64 , i64 , u128 , i128 , usize , isize | f32 , f64 |
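The distinction between "valid as a token" and "valid as code" can be seen in Rust, which Oxur targets. In the sketch below (function name hypothetical), the suffixed string literal is accepted because the macro discards it, while outside a macro only suffixes from the table above are used.

```rust
// A macro that accepts any single token tree and expands to nothing.
macro_rules! blackhole {
    ($tt:tt) => {};
}

pub fn suffixes_demo() -> (u8, f32) {
    // Accepted: here the literal is only a token, and the macro drops it.
    blackhole!("string"suffix);
    // Outside a macro, numeric literals accept only the listed suffixes.
    let byte = 255u8;
    let single = 1.5f32;
    (byte, single)
}
```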
Character and string literals
Character literals
Lexer
CHAR_LITERAL :
   ' ( ~[' \ \n \r \t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) '
QUOTE_ESCAPE :
   \' | \"
ASCII_ESCAPE :
      \x OCT_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0
UNICODE_ESCAPE :
   \u{ ( HEX_DIGIT _* )^1..6 }
A character literal is a single Unicode character enclosed within two U+0027 (single-quote) characters, with the exception of U+0027 itself, which must be escaped by a preceding U+005C character (\).
String literals
Lexer
STRING_LITERAL :
   " (
        ~[" \ IsolatedCR]
      | QUOTE_ESCAPE
      | ASCII_ESCAPE
      | UNICODE_ESCAPE
      | STRING_CONTINUE
   )* "
STRING_CONTINUE :
   \ followed by \n
A string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself, which must be escaped by a preceding U+005C character (\).
Line-breaks are allowed in string literals. A line-break is either a newline (U+000A) or a pair of carriage return and newline (U+000D, U+000A). Both byte sequences are normally translated to U+000A, but as a special exception, when an unescaped U+005C character (\) occurs immediately before the line-break, the U+005C character, the line-break, and all whitespace at the beginning of the next line are ignored. Thus a and b are equal:
(let (a "foobar")
     (b "foo\
         bar")
  (assert_eq! a b))
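The same rule can be observed in Rust string literals. In this sketch (function names hypothetical), a trailing backslash removes the line break and the next line's leading whitespace, while a plain line break inside the literal is kept as a newline.

```rust
pub fn continued() -> &'static str {
    // Backslash, line break, and the indentation before `bar` are all
    // dropped, so this is "foobar".
    "foo\
     bar"
}

pub fn with_line_break() -> &'static str {
    // Without a trailing backslash, the line break stays in the string.
    "foo
bar"
}
```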
Character escapes
Some additional escapes are available in either character or non-raw string literals. An escape starts with a U+005C (\) and continues with one of the following forms:
- A 7-bit code point escape starts with U+0078 (x) and is followed by exactly two hex digits with value up to 0x7F. It denotes the ASCII character with value equal to the provided hex value. Higher values are not permitted because it is ambiguous whether they mean Unicode code points or byte values.
- A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.
- A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR) or U+0009 (HT) respectively.
- The null escape is the character U+0030 (0) and denotes the Unicode value U+0000 (NUL).
- The backslash escape is the character U+005C (\) which must be escaped in order to denote itself.
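Each escape form above denotes a specific code point. The Rust table below (constant name hypothetical) pairs escaped character literals with the code point each one denotes, which is a quick way to check the list.

```rust
/// Pairs of (escaped character literal, code point it denotes).
pub const ESCAPES: [(char, u32); 8] = [
    ('\x41', 0x41),       // 7-bit code point escape: 'A' ('\x80' is rejected)
    ('\u{7FFF}', 0x7FFF), // 24-bit code point escape, up to six hex digits
    ('\n', 0x0A),         // whitespace escape: LF
    ('\r', 0x0D),         // whitespace escape: CR
    ('\t', 0x09),         // whitespace escape: HT
    ('\0', 0x00),         // null escape
    ('\\', 0x5C),         // backslash escape
    ('\'', 0x27),         // quote escape
];
```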
Raw string literals
Lexer
RAW_STRING_LITERAL :
   r RAW_STRING_CONTENT
RAW_STRING_CONTENT :
      " ( ~ IsolatedCR )* (non-greedy) "
   | # RAW_STRING_CONTENT #
Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by zero or more of the character U+0023 (#) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.
All Unicode characters contained in the raw string body represent themselves; the characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) or U+005C (\) do not have any special meaning.
Examples for string literals:
;; foo
"foo"
r"foo"
;; "foo"
"\"foo\""
r#""foo""#
;; foo #"# bar
"foo #\"# bar"
r##"foo #"# bar"##
;; R
"\x52"
"R"
r"R"
;; \x52
"\\x52"
r"\x52"
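Each group in the examples above consists of literals that denote the same string. Since these forms are identical in Rust, the claim can be checked there; in this sketch (function name hypothetical), each pair holds an escaped literal and its raw counterpart.

```rust
/// Pairs of (escaped literal, raw literal) that denote the same string.
pub fn raw_string_pairs() -> [(&'static str, &'static str); 5] {
    [
        ("foo", r"foo"),
        ("\"foo\"", r#""foo""#),
        // Two #s are needed because the body contains the sequence "#.
        ("foo #\"# bar", r##"foo #"# bar"##),
        ("\x52", r"R"),
        // In a raw string the backslash is an ordinary character.
        ("\\x52", r"\x52"),
    ]
}
```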
Byte and byte string literals
Byte literals
Lexer
BYTE_LITERAL :
   b' ( ASCII_FOR_CHAR | BYTE_ESCAPE ) '
ASCII_FOR_CHAR :
   any ASCII (i.e. 0x00 to 0x7F), except ', \, \n, \r or \t
BYTE_ESCAPE :
      \x HEX_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0
A byte literal is a single ASCII character (in the U+0000 to U+007F range) or a single escape preceded by the characters U+0062 (b) and U+0027 (single-quote), and followed by the character U+0027. If the character U+0027 is present within the literal, it must be escaped by a preceding U+005C (\) character. It is equivalent to a u8 unsigned 8-bit integer number literal.
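Because a byte literal is just a u8, it can be compared directly with integers. A minimal Rust sketch (function name hypothetical):

```rust
/// Each byte literal has type u8 and equals the code of its character.
pub fn byte_literals() -> [u8; 4] {
    // b'\'' needs the escape; b'\xFF' uses a byte escape, which (unlike a
    // character escape) may exceed 0x7F.
    [b'A', b'\'', b'\n', b'\xFF']
}
```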
Byte string literals
Lexer
BYTE_STRING_LITERAL :
   b" ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )* "
ASCII_FOR_STRING :
   any ASCII (i.e. 0x00 to 0x7F), except ", \ and IsolatedCR
A non-raw byte string literal is a sequence of ASCII characters and escapes, preceded by the characters U+0062 (b) and U+0022 (double-quote), and followed by the character U+0022. If the character U+0022 is present within the literal, it must be escaped by a preceding U+005C (\) character.
Alternatively, a byte string literal can be a raw byte string literal, defined below. The type of a byte string literal of length n is &'static [u8; n].
Some additional escapes are available in either byte or non-raw byte string literals. An escape starts with a U+005C (\) and continues with one of the following forms:
- A byte escape starts with U+0078 (x) and is followed by exactly two hex digits. It denotes the byte equal to the provided hex value.
- A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the byte values 0x0A (ASCII LF), 0x0D (ASCII CR) or 0x09 (ASCII HT) respectively.
- The null escape is the character U+0030 (0) and denotes the byte value 0x00 (ASCII NUL).
- The backslash escape is the character U+005C (\) which must be escaped in order to denote its ASCII encoding 0x5C.
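The length n in the type &'static [u8; n] counts bytes after escapes are processed, not source characters. In this Rust sketch (function name hypothetical), five escapes and characters produce a five-byte array.

```rust
/// "h", "\x69", "\n", "\0", and "\\" each contribute exactly one byte,
/// so the literal has type &'static [u8; 5].
pub fn byte_string() -> &'static [u8; 5] {
    b"h\x69\n\0\\"
}
```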
Raw byte string literals
Lexer
RAW_BYTE_STRING_LITERAL :
   br RAW_BYTE_STRING_CONTENT
RAW_BYTE_STRING_CONTENT :
      " ASCII* (non-greedy) "
   | # RAW_BYTE_STRING_CONTENT #
ASCII :
   any ASCII (i.e. 0x00 to 0x7F)
Raw byte string literals do not process any escapes. They start with the character U+0062 (b), followed by U+0072 (r), followed by zero or more of the character U+0023 (#), and a U+0022 (double-quote) character. The raw string body can contain any sequence of ASCII characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character. A raw byte string literal cannot contain any non-ASCII byte.
All characters contained in the raw string body represent their ASCII encoding; the characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) or U+005C (\) do not have any special meaning.
Examples for byte string literals:
;; foo
b"foo"
br"foo"
;; "foo"
b"\"foo\""
br#""foo""#
;; foo #"# bar
b"foo #\"# bar"
br##"foo #"# bar"##
;; R
b"\x52"
b"R"
br"R"
;; \x52
b"\\x52"
br"\x52"
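As with the string examples, each group above denotes the same bytes. A Rust sketch (function name hypothetical) that compares every pair; each comparison is between fixed-size arrays of equal length:

```rust
/// Returns true if every escaped byte string equals its raw counterpart.
pub fn raw_byte_strings_match() -> bool {
    b"foo" == br"foo"
        && b"\"foo\"" == br#""foo""#
        && b"foo #\"# bar" == br##"foo #"# bar"##
        && b"\x52" == br"R"
        // The raw form keeps the backslash as an ordinary byte.
        && b"\\x52" == br"\x52"
}
```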
Number literals
A number literal is either an integer literal or a floating-point literal. The grammar for recognizing the two kinds of literals is mixed.
Integer literals
Lexer
INTEGER_LITERAL :
   ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) INTEGER_SUFFIX?
DEC_LITERAL :
   DEC_DIGIT (DEC_DIGIT|_)*
TUPLE_INDEX :
      0
   | NON_ZERO_DEC_DIGIT DEC_DIGIT*
BIN_LITERAL :
   0b (BIN_DIGIT|_)* BIN_DIGIT (BIN_DIGIT|_)*
OCT_LITERAL :
   0o (OCT_DIGIT|_)* OCT_DIGIT (OCT_DIGIT|_)*
HEX_LITERAL :
   0x (HEX_DIGIT|_)* HEX_DIGIT (HEX_DIGIT|_)*
BIN_DIGIT : [0-1]
OCT_DIGIT : [0-7]
DEC_DIGIT : [0-9]
NON_ZERO_DEC_DIGIT : [1-9]
HEX_DIGIT : [0-9 a-f A-F]
INTEGER_SUFFIX :
      u8 | u16 | u32 | u64 | u128 | usize
   | i8 | i16 | i32 | i64 | i128 | isize
An integer literal has one of four forms:
- A decimal literal starts with a decimal digit and continues with any mixture of decimal digits and underscores.
- A hex literal starts with the character sequence U+0030 U+0078 (0x) and continues as any mixture (with at least one digit) of hex digits and underscores.
- An octal literal starts with the character sequence U+0030 U+006F (0o) and continues as any mixture (with at least one digit) of octal digits and underscores.
- A binary literal starts with the character sequence U+0030 U+0062 (0b) and continues as any mixture (with at least one digit) of binary digits and underscores.
A tuple index is either 0, or starts with a non-zero decimal digit and continues with zero or more decimal digits. Tuple indexes are used to refer to the fields of tuples, tuple structs, and tuple variants.
Like any literal, an integer literal may be followed (immediately, without any spaces) by an integer suffix, which forcibly sets the type of the literal. The integer suffix must be the name of one of the integral types: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, or isize.
The type of an unsuffixed integer literal is determined by type inference:
- If an integer type can be uniquely determined from the surrounding program context, the unsuffixed integer literal has that type.
- If the program context under-constrains the type, it defaults to the signed 32-bit integer i32.
- If the program context over-constrains the type, it is considered a static type error.
Examples of integer literals of various forms:
123 ;; type i32
123i32 ;; type i32
123u32 ;; type u32
123_u32 ;; type u32
(let (a: u64 123) ;; type u64
  ...)
0xff ;; type i32
0xff_u8 ;; type u8
0o70 ;; type i32
0o70_i16 ;; type i16
0b1111_1111_1001_0000 ;; type i32
0b1111_1111_1001_0000i64 ;; type i64
0b________1 ;; type i32
0usize ;; type usize
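One way to observe the inference rules is to measure the size of the type each literal ends up with. The Rust sketch below (function name hypothetical) uses std::mem::size_of_val: an unconstrained literal defaults to i32 (4 bytes), a suffix forces its type, and an annotated binding constrains it.

```rust
use std::mem::size_of_val;

/// Sizes in bytes of the types that several literals are given.
pub fn literal_sizes() -> [usize; 5] {
    let unconstrained = 123;                // no context: defaults to i32
    let suffixed = 0xff_u8;                 // suffix forces u8
    let inferred: u64 = 123;                // context forces u64
    let binary = 0b1111_1111_1001_0000i64;  // suffix forces i64
    let index = 0usize;                     // suffix forces usize
    [
        size_of_val(&unconstrained),
        size_of_val(&suffixed),
        size_of_val(&inferred),
        size_of_val(&binary),
        size_of_val(&index),
    ]
}
```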
Examples of invalid integer literals:
;; invalid suffixes
0invalidSuffix
;; uses numbers of the wrong base
123AFB43
0b0102
0o0581
;; integers too big for their type (they overflow)
128_i8
256_u8
;; bin, hex, and octal literals must have at least one digit
0b_
0b____
Note that the Rust syntax considers -1i8 as an application of the unary minus operator to an integer literal 1i8, rather than a single integer literal.
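Because the minus sign is a separate operator, it interacts with precedence: a method call binds tighter than unary minus. The Rust sketch below (function name hypothetical) shows this; it also shows that rustc accepts -128i8 even though 128i8 on its own would overflow.

```rust
pub fn negative_literals() -> (i8, i8, i8, i8) {
    let minus_one = -1i8;              // unary `-` applied to the literal `1i8`
    let min = -128i8;                  // accepted, although `128i8` alone overflows
    let method_first = -1i8.abs();     // parsed as -(1i8.abs()), i.e. -1
    let negate_first = (-1i8).abs();   // negate first, then call: 1
    (minus_one, min, method_first, negate_first)
}
```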
Floating-point literals
Lexer
FLOAT_LITERAL :
      DEC_LITERAL . (not immediately followed by ., _ or an identifier)
   | DEC_LITERAL FLOAT_EXPONENT
   | DEC_LITERAL . DEC_LITERAL FLOAT_EXPONENT?
   | DEC_LITERAL (. DEC_LITERAL)? FLOAT_EXPONENT? FLOAT_SUFFIX
FLOAT_EXPONENT :
   (e|E) (+|-)? (DEC_DIGIT|_)* DEC_DIGIT (DEC_DIGIT|_)*
FLOAT_SUFFIX :
   f32 | f64
A floating-point literal has one of two forms:
- A decimal literal followed by a period character U+002E (.). This is optionally followed by another decimal literal, with an optional exponent.
- A single decimal literal followed by an exponent.
Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with U+002E (.). The suffix forcibly sets the type of the literal. There are two valid floating-point suffixes, f32 and f64 (the 32-bit and 64-bit floating-point types), which explicitly determine the type of the literal.
The type of an unsuffixed floating-point literal is determined by type inference:
- If a floating-point type can be uniquely determined from the surrounding program context, the unsuffixed floating-point literal has that type.
- If the program context under-constrains the type, it defaults to f64.
- If the program context over-constrains the type, it is considered a static type error.
Examples of floating-point literals of various forms:
123.0f64 ;; type f64
0.1f64 ;; type f64
0.1f32 ;; type f32
12E+99_f64 ;; type f64
(let (x: f64 2.) ;; type f64
  ...)
This last example is different because it is not possible to use the suffix syntax with a floating-point literal ending in a period. 2.f64 would attempt to call a method named f64 on 2.
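The same size-based check works for floating-point literals. In this Rust sketch (function name hypothetical), an unconstrained 2. defaults to f64, while a suffix or an annotated binding selects f32.

```rust
use std::mem::size_of_val;

/// Sizes in bytes of the types that several float literals are given.
pub fn float_sizes() -> [usize; 4] {
    let unconstrained = 2.;      // no context: defaults to f64
    let single = 0.1f32;         // suffix forces f32
    let exponent = 12E+99_f64;   // exponent followed by a suffix
    let inferred: f32 = 2.;      // context forces f32 (`2.f32` would not parse)
    [
        size_of_val(&unconstrained),
        size_of_val(&single),
        size_of_val(&exponent),
        size_of_val(&inferred),
    ]
}
```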
The representation semantics of floating-point numbers are described in "Machine Types".
Boolean literals
Lexer
BOOLEAN_LITERAL :
   true | false
The two values of the boolean type are written true and false.
Lifetimes and loop labels
Lexer
LIFETIME_TOKEN :
      ' IDENTIFIER_OR_KEYWORD
   | '_
LIFETIME_OR_LABEL :
   ' NON_KEYWORD_IDENTIFIER
Lifetime parameters and loop labels use LIFETIME_OR_LABEL tokens. Any LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in macros.
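Both uses of LIFETIME_OR_LABEL appear in the Rust sketch below (function names hypothetical): 'outer labels a loop so an inner break can exit both loops, and 'a is a lifetime parameter tying a returned slice to its input.

```rust
/// Finds the first pair (i, j), both greater than 1, with i * j == target.
pub fn find_factors(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 2..target {
        for j in 2..target {
            if i * j == target {
                found = Some((i, j));
                break 'outer; // exits both loops at once
            }
        }
    }
    found
}

/// `'a` ties the lifetime of the result to the lifetime of `s`.
pub fn first_word<'a>(s: &'a str) -> &'a str {
    s.split(' ').next().unwrap_or("")
}
```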
Punctuation
Punctuation symbol tokens are listed here for completeness. Their individual usages and meanings are defined in the linked pages.
Delimiters
Bracket punctuation is used in various parts of the grammar. An open bracket must always be paired with a close bracket. Brackets and the tokens within them are referred to as "token trees" in macros. The two types of brackets are:
Bracket | Type |
---|---|
[ ] | Square brackets |
( ) | Parentheses |
Paths
A path is a sequence of one or more path segments logically separated by a namespace qualifier (::). If a path consists of only one segment, it refers to either an item or a variable in a local control scope. If a path has multiple segments, it always refers to an item.
Two examples of simple paths consisting of only identifier segments:
x
x::y::z
Types of paths
Simple Paths
Syntax
SimplePath :
   ::? SimplePathSegment (:: SimplePathSegment)*
SimplePathSegment :
   IDENTIFIER | super | self | crate | $crate
Simple paths are used in visibility markers, attributes, macros, and use items.
Examples:
(use std::io::(self Write))
(mod m
#[clippy::cyclomatic_complexity = "0"]
(pub (in super) (fn f1 ())))
Paths in expressions
Syntax
PathInExpression :
   ::? PathExprSegment (:: PathExprSegment)*
PathExprSegment :
   PathIdentSegment (:: GenericArgs)?
PathIdentSegment :
   IDENTIFIER | super | self | Self | crate | $crate
GenericArgs :
      < >
   | < GenericArgsLifetimes ,? >
   | < GenericArgsTypes ,? >
   | < GenericArgsBindings ,? >
   | < GenericArgsTypes , GenericArgsBindings ,? >
   | < GenericArgsLifetimes , GenericArgsTypes ,? >
   | < GenericArgsLifetimes , GenericArgsBindings ,? >
   | < GenericArgsLifetimes , GenericArgsTypes , GenericArgsBindings ,? >
GenericArgsLifetimes :
   Lifetime (, Lifetime)*
GenericArgsTypes :
   Type (, Type)*
GenericArgsBindings :
   GenericArgsBinding (, GenericArgsBinding)*
GenericArgsBinding :
   IDENTIFIER = Type
Paths in expressions allow for paths with generic arguments to be specified. They are used in various places in expressions and patterns.
The :: token is required before the opening < for generic arguments to avoid ambiguity with the less-than operator. This is colloquially known as "turbofish" syntax.
(0..10).collect::(<Vec<_>>)
Vec::<u8>::(with_capacity 1024)
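For reference, the same two turbofish expressions written in Rust, wrapped in a hypothetical helper so the results can be inspected:

```rust
pub fn turbofish() -> (Vec<i32>, usize) {
    // `::<...>` supplies a generic argument in expression position.
    let v = (0..10).collect::<Vec<_>>();
    // Here the generic argument belongs to the type segment `Vec`.
    let buf = Vec::<u8>::with_capacity(1024);
    // `with_capacity` guarantees room for at least 1024 elements.
    (v, buf.capacity())
}
```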
Qualified paths
Syntax
QualifiedPathInExpression :
   QualifiedPathType (:: PathExprSegment)+
QualifiedPathType :
   < Type (as TypePath)? >
QualifiedPathInType :
   QualifiedPathType (:: TypePathSegment)+
Fully qualified paths allow for disambiguating the path for trait implementations and for specifying canonical paths. When used in a type specification, it supports using the type syntax specified below.
(struct S)
(impl S
(fn f ()
(println! "S")))
(trait T1
(fn f ()
(println! "T1 f")))
(impl T1 for S)
(trait T2
(fn f ()
(println! "T2 f")))
(impl T2 for S)
S::(f) ;; Calls the inherent impl.
<S as T1>::(f) ;; Calls the T1 trait function.
<S as T2>::(f) ;; Calls the T2 trait function.
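The equivalent Rust, with the functions returning strings instead of printing so the dispatch can be checked. S::f resolves to the inherent function, and the qualified forms select each trait's function explicitly. The calls helper is hypothetical.

```rust
pub struct S;

impl S {
    pub fn f() -> &'static str { "S" }
}

pub trait T1 {
    fn f() -> &'static str { "T1 f" }
}
impl T1 for S {}

pub trait T2 {
    fn f() -> &'static str { "T2 f" }
}
impl T2 for S {}

pub fn calls() -> [&'static str; 3] {
    [
        S::f(),         // the inherent impl takes priority
        <S as T1>::f(), // the T1 trait function
        <S as T2>::f(), // the T2 trait function
    ]
}
```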
Paths in types
Syntax
TypePath :
   ::? TypePathSegment (:: TypePathSegment)*
TypePathSegment :
   PathIdentSegment ::? (GenericArgs | TypePathFn)?
TypePathFn :
   ( TypePathFnInputs? ) (-> Type)?
Type paths are used within type definitions, trait bounds, type parameter bounds, and qualified paths.
Although the :: token is allowed before the generic arguments, it is not required because there is no ambiguity like there is in PathInExpression.
(impl ops::Index<ops::Range<usize>> for S
#| ... |#)
(fn i<'a> () -> (impl Iterator<Item ops::Example<'a>>)
#| ... |#)
(type G std::boxed::Box<dyn std::ops::FnOnce(isize) -> isize>)
Path qualifiers
Paths can be denoted with various leading qualifiers to change how they are resolved.
::
Paths starting with :: are considered to be global paths where the segments of the path start being resolved from the crate root. Each identifier in the path must resolve to an item.
Edition Differences: In the 2015 Edition, the crate root contains a variety of different items, including external crates, default crates such as std and core, and items in the top level of the crate (including use imports). Beginning with the 2018 Edition, paths starting with :: can only reference crates.
(mod a
(pub fn foo ()))
(mod b
(pub fn foo ()
;; call `a`'s foo function
;; Note that in Rust 2018, `::a` would be interpreted as the crate `a`.
::a::(foo)))
self
self resolves the path relative to the current module. self can only be used as the first segment, without a preceding ::.
(fn foo ())
(fn bar ()
self::(foo))
Self
Self, with a capital "S", is used to refer to the implementing type within traits and implementations. Self can only be used as the first segment, without a preceding ::.
(trait T
(type Item)
(const C: i32)
;; `Self` will be whatever type that implements `T`.
(fn new () -> Self)
;; `Self::Item` will be the type alias in the implementation.
(fn f (&self) -> Self::Item))
(struct S)
(impl T for S
(type Item i32)
(const C: i32 9)
;; `Self` is the type `S`.
(fn new () -> Self
S)
;; `Self::Item` is the type `i32`.
(fn f (&self) -> Self::Item
;; `Self::C` is the constant value `9`.
Self::C))
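For comparison, the same trait and implementation in Rust. Inside the impl, Self is S, Self::Item is i32, and Self::C is 9, so calling f on a value built with new returns 9.

```rust
pub trait T {
    type Item;
    const C: i32;
    // `Self` will be whatever type implements `T`.
    fn new() -> Self;
    // `Self::Item` will be the type alias in the implementation.
    fn f(&self) -> Self::Item;
}

pub struct S;

impl T for S {
    type Item = i32;
    const C: i32 = 9;
    // `Self` is the type `S`.
    fn new() -> Self {
        S
    }
    // `Self::Item` is `i32`; `Self::C` is the constant value 9.
    fn f(&self) -> Self::Item {
        Self::C
    }
}
```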
super
super in a path resolves to the parent module. It may only be used in leading segments of the path, possibly after an initial self segment.
(mod a
(pub fn foo ()))
(mod b
(pub fn foo ()
;; call a's foo function
super::a::(foo)))
super may be repeated several times after the first super or self to refer to ancestor modules.
(mod a
(fn foo ())
(mod b
(mod c
(fn foo ()
;; call a's foo function
super::super::(foo)
;; call a's foo function
self::super::super::(foo)))))
crate
crate resolves the path relative to the current crate. crate can only be used as the first segment, without a preceding ::.
(fn foo ())
(mod a
(fn bar ()
crate::(foo)))
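The qualifiers self, super, and crate behave the same way in Rust. In the sketch below (module and function names hypothetical), two routes from the nested module c reach a::foo, while crate:: starts at the crate root. It assumes the code sits at the root of its crate.

```rust
pub fn foo() -> &'static str {
    "crate root foo"
}

pub mod a {
    pub fn foo() -> &'static str {
        "a::foo"
    }

    pub mod b {
        pub mod c {
            pub fn via_super() -> &'static str {
                super::super::foo() // c -> b -> a
            }
            pub fn via_self_super() -> &'static str {
                self::super::super::foo() // same target, starting at `self`
            }
            pub fn via_crate() -> &'static str {
                crate::foo() // resolved from the crate root
            }
        }
    }
}
```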
$crate
$crate is only used within macro transcribers, and can only be used as the first segment, without a preceding ::. $crate will expand to a path to access items from the top level of the crate where the macro is defined, regardless of which crate the macro is invoked in.
(pub fn increment (x: u32) -> u32
(+ x 1))
#[macro_export]
(macro_rules! inc
($x:expr) => ( $crate::increment($x)))
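The Rust form of this macro follows. Because $crate names the defining crate's root, an expansion of inc! reaches increment even in a crate that has not imported it.

```rust
pub fn increment(x: u32) -> u32 {
    x + 1
}

#[macro_export]
macro_rules! inc {
    // `$crate` expands to a path to the root of the crate defining `inc!`.
    ($x:expr) => {
        $crate::increment($x)
    };
}
```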
Canonical paths
Items defined in a module or implementation have a canonical path that corresponds to where within its crate it is defined. All other paths to these items are aliases. The canonical path is defined as a path prefix appended by the path segment the item itself defines.
Implementations and use declarations do not have canonical paths, although the items that implementations define do have them. Items defined in block expressions do not have canonical paths. Items defined in a module that does not have a canonical path do not have a canonical path. Associated items defined in an implementation that refers to an item without a canonical path, e.g. as the implementing type, the trait being implemented, a type parameter or bound on a type parameter, do not have canonical paths.
The path prefix for modules is the canonical path to that module. For bare implementations, it is the canonical path of the item being implemented surrounded by angle (<>) brackets. For trait implementations, it is the canonical path of the item being implemented followed by as followed by the canonical path to the trait, all surrounded in angle (<>) brackets.
The canonical path is only meaningful within a given crate. There is no global namespace across crates; an item's canonical path merely identifies it within the crate.
;; Comments show the canonical path of the item.
(mod a ;; ::a
(pub struct Struct) ;; ::a::Struct
(pub trait Trait ;; ::a::Trait
    (fn f (&self))) ;; ::a::Trait::f
(impl Trait for Struct
(fn f (&self))) ;; <::a::Struct as ::a::Trait>::f
(impl Struct
(fn g (&self)))) ;; <::a::Struct>::g
(mod without ;; ::without
(fn canonicals () ;; ::without::canonicals
(struct OtherStruct) ;; None
(trait OtherTrait ;; None
(fn g (&self))) ;; None
(impl OtherTrait for OtherStruct
(fn g (&self))) ;; None
(impl OtherTrait for ::a::Struct
(fn g (&self))) ;; None
(impl ::a::Trait for OtherStruct
(fn f (&self))))) ;; None