Tokens

Tokens are primitive productions in the grammar defined by regular (non-recursive) languages. Oxur source input can be broken down into the following kinds of tokens:

Keywords
Identifiers
Literals
Lifetimes
Punctuation
Delimiters

Within this documentation's grammar, "simple" tokens are given in string table production form, and appear in monospace font.

Literals

A literal is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. A literal is a form of constant expression, so it is evaluated (primarily) at compile time.

Examples

Characters and strings

	Example	`#` sets	Characters	Escapes
Character	`'H'`	0	All Unicode	Quote & ASCII & Unicode
String	`"hello"`	0	All Unicode	Quote & ASCII & Unicode
Raw string	`r#"hello"#`	0 or more*	All Unicode	`N/A`
Byte	`b'H'`	0	All ASCII	Quote & Byte
Byte string	`b"hello"`	0	All ASCII	Quote & Byte
Raw byte string	`br#"hello"#`	0 or more*	All ASCII	`N/A`

* The number of #s on each side of the same literal must be equal

ASCII escapes

	Name
`\x41`	7-bit character code (exactly 2 digits, up to 0x7F)
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash
`\0`	Null

Byte escapes

	Name
`\x7F`	8-bit character code (exactly 2 digits)
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash
`\0`	Null

Unicode escapes

	Name
`\u{7FFF}`	24-bit Unicode character code (up to 6 digits)

Quote escapes

	Name
`\'`	Single quote
`\"`	Double quote

Numbers

Number literals`*`	Example	Exponentiation	Suffixes
Decimal integer	`98_222`	`N/A`	Integer suffixes
Hex integer	`0xff`	`N/A`	Integer suffixes
Octal integer	`0o77`	`N/A`	Integer suffixes
Binary integer	`0b1111_0000`	`N/A`	Integer suffixes
Floating-point	`123.0E+77`	`Optional`	Floating-point suffixes

* All number literals allow _ as a visual separator: 1_234.0E+18f64

Suffixes

A suffix is a non-raw identifier immediately (without whitespace) following the primary part of a literal.

Any kind of literal (string, integer, etc.) with any suffix is valid as a token, and can be passed to a macro without producing an error. The macro itself decides how to interpret such a token and whether to produce an error.

(macro_rules! blackhole ($tt:tt) => () )

(blackhole! "string"suffix); ;; OK

However, suffixes on literal tokens parsed as Rust code are restricted. Any suffixes are rejected on non-numeric literal tokens, and numeric literal tokens are accepted only with suffixes from the list below.

Integer	Floating-point
`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, `isize`	`f32`, `f64`
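The token-level rule can be sketched in Rust, on the assumption that Oxur suffixes behave as Rust's do. The macro name one_token and both function names are illustrative and not part of the specification.

```rust
// Hypothetical macro mirroring the `blackhole` example above: it accepts
// exactly one token tree, discards it, and expands to the number 1.
macro_rules! one_token {
    ($tt:tt) => {
        1usize
    };
}

/// `"string"suffix` lexes as a single literal token, so the single-`tt`
/// matcher accepts it even though the suffix has no meaning for strings.
pub fn suffixed_string_is_one_token() -> usize {
    one_token!("string"suffix)
}

/// Outside a macro, only the listed suffixes are accepted on numeric
/// literals; here `u8` fixes the type of the literal.
pub fn suffixed_integer() -> u8 {
    255u8
}
```

Writing "string"suffix directly as an expression, rather than handing it to a macro that discards it, is rejected.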

Character and string literals

Character literals

^Lexer
CHAR_LITERAL :
   ' ( ~[' \ \n \r \t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) '

QUOTE_ESCAPE :
   \' | \"

ASCII_ESCAPE :
      \x OCT_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0

UNICODE_ESCAPE :
   \u{ ( HEX_DIGIT _^* )^1..6 }

A character literal is a single Unicode character enclosed within two U+0027 (single-quote) characters, with the exception of U+0027 itself, which must be escaped by a preceding U+005C character (\).

String literals

^Lexer
STRING_LITERAL :
   " (
      ~[" \ IsolatedCR]
      | QUOTE_ESCAPE
      | ASCII_ESCAPE
      | UNICODE_ESCAPE
      | STRING_CONTINUE
   )^* "

STRING_CONTINUE :
   \ followed by \n

A string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself, which must be escaped by a preceding U+005C character (\).

Line-breaks are allowed in string literals. A line-break is either a newline (U+000A) or a pair of carriage return and newline (U+000D, U+000A). Both byte sequences are normally translated to U+000A, but as a special exception, when an unescaped U+005C character (\) occurs immediately before the line-break, the U+005C character, the line-break, and all whitespace at the beginning of the next line are ignored. Thus a and b are equal:

(let (a "foobar")
     (b "foo\
        bar")
  (assert_eq! a b))
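The same behaviour can be checked in Rust, assuming Oxur string literals follow Rust's line-continuation rule. The function names are illustrative only.

```rust
/// A backslash directly before the line break removes the backslash, the
/// line break, and the leading whitespace of the next line.
pub fn continued() -> &'static str {
    "foo\
        bar"
}

/// Without the backslash, the line break is kept as U+000A.
pub fn multiline() -> &'static str {
    "foo
bar"
}
```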

Character escapes

Some additional escapes are available in either character or non-raw string literals. An escape starts with a U+005C (\) and continues with one of the following forms:

A 7-bit code point escape starts with U+0078 (x) and is followed by exactly two hex digits with value up to 0x7F. It denotes the ASCII character with value equal to the provided hex value. Higher values are not permitted because it is ambiguous whether they mean Unicode code points or byte values.
A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.
A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR) or U+0009 (HT) respectively.
The null escape is the character U+0030 (0) and denotes the Unicode value U+0000 (NUL).
The backslash escape is the character U+005C (\) which must be escaped in order to denote itself.
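As a rough illustration, the Rust sketch below pairs each escape form with the code point it is expected to denote, assuming Oxur shares Rust's escape semantics. The function name is an assumption made for this example.

```rust
/// Each entry pairs an escaped character literal with the Unicode code
/// point that the escape rules above say it denotes.
pub fn escaped_code_points() -> Vec<(char, u32)> {
    vec![
        ('\x41', 0x41),         // 7-bit code point escape
        ('\u{7FFF}', 0x7FFF),   // 24-bit code point escape
        ('\u{10FFFF}', 0x10FFFF), // six hex digits, the maximum
        ('\n', 0x0A),           // whitespace escapes
        ('\r', 0x0D),
        ('\t', 0x09),
        ('\0', 0x00),           // null escape
        ('\\', 0x5C),           // backslash escape
        ('\'', 0x27),           // quote escapes
        ('\"', 0x22),
    ]
}
```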

Raw string literals

^Lexer
RAW_STRING_LITERAL :
   r RAW_STRING_CONTENT

RAW_STRING_CONTENT :
      " ( ~ IsolatedCR )^{* (non-greedy)} "
   | # RAW_STRING_CONTENT #

Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by zero or more of the character U+0023 (#) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.

All Unicode characters contained in the raw string body represent themselves. The characters U+0022 (double-quote) and U+005C (\) do not have any special meaning, except that a U+0022 followed by at least as many U+0023 (#) characters as were used to start the raw string literal terminates it.

Examples for string literals:

;; foo
"foo"
r"foo"

;; "foo"
"\"foo\""
r#""foo""#

;; foo #"# bar
"foo #\"# bar"
r##"foo #"# bar"##

;; R
"\x52"
"R"
r"R"

;; \x52
"\\x52"
r"\x52"
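The equivalences listed above can be verified mechanically. The following Rust sketch assumes Oxur raw strings match Rust's; the function name is illustrative.

```rust
/// Pairs of (escaped string, raw string) that should denote the same text.
pub fn escaped_and_raw_pairs() -> Vec<(&'static str, &'static str)> {
    vec![
        ("foo", r"foo"),
        ("\"foo\"", r#""foo""#),
        ("foo #\"# bar", r##"foo #"# bar"##),
        ("\x52", r"R"),
        // In a raw string the backslash is an ordinary character, so the
        // escaped form needs a doubled backslash to match it.
        ("\\x52", r"\x52"),
    ]
}
```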

Byte and byte string literals

Byte literals

^Lexer
BYTE_LITERAL :
   b' ( ASCII_FOR_CHAR | BYTE_ESCAPE ) '

ASCII_FOR_CHAR :
   any ASCII (i.e. 0x00 to 0x7F), except ', \, \n, \r or \t

BYTE_ESCAPE :
      \x HEX_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0

A byte literal is a single ASCII character (in the U+0000 to U+007F range) or a single escape preceded by the characters U+0062 (b) and U+0027 (single-quote), and followed by the character U+0027. If the character U+0027 is present within the literal, it must be escaped by a preceding U+005C (\) character. It is equivalent to a u8 unsigned 8-bit integer number literal.

Byte string literals

^Lexer
BYTE_STRING_LITERAL :
   b" ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )^* "

ASCII_FOR_STRING :
   any ASCII (i.e. 0x00 to 0x7F), except ", \ and IsolatedCR

A non-raw byte string literal is a sequence of ASCII characters and escapes, preceded by the characters U+0062 (b) and U+0022 (double-quote), and followed by the character U+0022. If the character U+0022 is present within the literal, it must be escaped by a preceding U+005C (\) character. Alternatively, a byte string literal can be a raw byte string literal, defined below. The type of a byte string literal of length n is &'static [u8; n].

Some additional escapes are available in either byte or non-raw byte string literals. An escape starts with a U+005C (\) and continues with one of the following forms:

A byte escape starts with U+0078 (x) and is followed by exactly two hex digits. It denotes the byte equal to the provided hex value.
A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the byte values 0x0A (ASCII LF), 0x0D (ASCII CR) or 0x09 (ASCII HT) respectively.
The null escape is the character U+0030 (0) and denotes the byte value 0x00 (ASCII NUL).
The backslash escape is the character U+005C (\) which must be escaped in order to denote its ASCII encoding 0x5C.
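A short Rust sketch of these rules follows, assuming Oxur byte literals behave as Rust's do. The function names are illustrative only.

```rust
/// Byte literals are `u8` values; escapes denote the byte directly.
pub fn byte_values() -> [u8; 5] {
    [b'H', b'\x7F', b'\n', b'\\', b'\0']
}

/// Unlike the 7-bit `\x` escape in character literals, a byte escape may
/// use the full 8-bit range.
pub fn high_byte() -> u8 {
    b'\xFF'
}

/// A byte string literal of length n has type `&'static [u8; n]`.
pub fn byte_string() -> &'static [u8; 5] {
    b"hello"
}
```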

Raw byte string literals

^Lexer
RAW_BYTE_STRING_LITERAL :
   br RAW_BYTE_STRING_CONTENT

RAW_BYTE_STRING_CONTENT :
      " ASCII^{* (non-greedy)} "
   | # RAW_BYTE_STRING_CONTENT #

ASCII :
   any ASCII (i.e. 0x00 to 0x7F)

Raw byte string literals do not process any escapes. They start with the character U+0062 (b), followed by U+0072 (r), followed by zero or more of the character U+0023 (#), and a U+0022 (double-quote) character. The raw string body can contain any sequence of ASCII characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character. A raw byte string literal cannot contain any non-ASCII byte.

All characters contained in the raw string body represent their ASCII encoding. The characters U+0022 (double-quote) and U+005C (\) do not have any special meaning, except that a U+0022 followed by at least as many U+0023 (#) characters as were used to start the raw string literal terminates it.

Examples for byte string literals:

;; foo
b"foo"
br"foo"

;; "foo"
b"\"foo\""
br#""foo""#

;; foo #"# bar
b"foo #\"# bar"
br##"foo #"# bar"##

;; R
b"\x52"
b"R"
br"R"

;; \x52
b"\\x52"
br"\x52"
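The pairs above can be compared byte for byte. This Rust sketch assumes Oxur raw byte strings match Rust's; the function names are illustrative.

```rust
/// Pairs of (escaped byte string, raw byte string) that should hold the
/// same bytes. They are copied into vectors so that literals of different
/// lengths can share one collection.
pub fn escaped_and_raw_byte_pairs() -> Vec<(Vec<u8>, Vec<u8>)> {
    vec![
        (b"foo".to_vec(), br"foo".to_vec()),
        (b"\"foo\"".to_vec(), br#""foo""#.to_vec()),
        (b"foo #\"# bar".to_vec(), br##"foo #"# bar"##.to_vec()),
        (b"\x52".to_vec(), br"R".to_vec()),
        (b"\\x52".to_vec(), br"\x52".to_vec()),
    ]
}

/// In a raw byte string, `\x52` is four separate bytes, not one escape.
pub fn raw_backslash_bytes() -> [u8; 4] {
    *br"\x52"
}
```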

Number literals

A number literal is either an integer literal or a floating-point literal. The grammar for recognizing the two kinds of literals is mixed.

Integer literals

^Lexer
INTEGER_LITERAL :
   ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) INTEGER_SUFFIX^?

DEC_LITERAL :
   DEC_DIGIT (DEC_DIGIT|_)^*

TUPLE_INDEX :
      0
   | NON_ZERO_DEC_DIGIT DEC_DIGIT^*

BIN_LITERAL :
   0b (BIN_DIGIT|_)^* BIN_DIGIT (BIN_DIGIT|_)^*

OCT_LITERAL :
   0o (OCT_DIGIT|_)^* OCT_DIGIT (OCT_DIGIT|_)^*

HEX_LITERAL :
   0x (HEX_DIGIT|_)^* HEX_DIGIT (HEX_DIGIT|_)^*

BIN_DIGIT : [0-1]

OCT_DIGIT : [0-7]

DEC_DIGIT : [0-9]

NON_ZERO_DEC_DIGIT : [1-9]

HEX_DIGIT : [0-9 a-f A-F]

INTEGER_SUFFIX :
      u8 | u16 | u32 | u64 | u128 | usize
   | i8 | i16 | i32 | i64 | i128 | isize

An integer literal has one of four forms:

A decimal literal starts with a decimal digit and continues with any mixture of decimal digits and underscores.
A hex literal starts with the character sequence U+0030 U+0078 (0x) and continues as any mixture (with at least one digit) of hex digits and underscores.
An octal literal starts with the character sequence U+0030 U+006F (0o) and continues as any mixture (with at least one digit) of octal digits and underscores.
A binary literal starts with the character sequence U+0030 U+0062 (0b) and continues as any mixture (with at least one digit) of binary digits and underscores.

A tuple index is a separate production rather than one of these forms: it is either 0, or starts with a non-zero decimal digit and continues with zero or more decimal digits. Tuple indexes are used to refer to the fields of tuples, tuple structs, and tuple variants.

Like any literal, an integer literal may be followed (immediately, without any spaces) by an integer suffix, which forcibly sets the type of the literal. The integer suffix must be the name of one of the integral types: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, or isize.

The type of an unsuffixed integer literal is determined by type inference:

If an integer type can be uniquely determined from the surrounding program context, the unsuffixed integer literal has that type.
If the program context under-constrains the type, it defaults to the signed 32-bit integer i32.
If the program context over-constrains the type, it is considered a static type error.

Examples of integer literals of various forms:

123;                               ;; type i32
123i32;                            ;; type i32
123u32;                            ;; type u32
123_u32;                           ;; type u32
(let (a: u64 123)                  ;; type u64
  ...)

0xff;                              ;; type i32
0xff_u8;                           ;; type u8

0o70;                              ;; type i32
0o70_i16;                          ;; type i16

0b1111_1111_1001_0000;             ;; type i32
0b1111_1111_1001_0000i64;          ;; type i64
0b________1;                       ;; type i32

0usize;                            ;; type usize
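The base prefixes and the inference rules can be illustrated in Rust, assuming Oxur integer literals are typed the way Rust's are. The helper widen and the other function names are made up for this sketch.

```rust
use std::mem::size_of_val;

/// The same value written in decimal, hex, octal, and binary.
pub fn same_value_in_every_base() -> [i32; 4] {
    [240, 0xf0, 0o360, 0b1111_0000]
}

// Hypothetical helper whose parameter type constrains its argument.
fn widen(x: u64) -> u64 {
    x
}

/// Returns the size in bytes of three literals: one left to default,
/// one with a suffix, and one constrained by its later use.
pub fn literal_sizes() -> (usize, usize, usize) {
    let defaulted = 123; // under-constrained, so i32
    let suffixed = 123_u8; // the suffix forces u8
    let inferred = 123; // passing it to `widen` makes it u64
    let _ = widen(inferred);
    (
        size_of_val(&defaulted),
        size_of_val(&suffixed),
        size_of_val(&inferred),
    )
}
```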

Examples of invalid integer literals:

;; invalid suffixes

0invalidSuffix;

;; uses numbers of the wrong base

123AFB43;
0b0102;
0o0581;

;; integers too big for their type (they overflow)

128_i8;
256_u8;

;; bin, hex, and octal literals must have at least one digit

0b_;
0b____;

Note that the Rust syntax considers -1i8 as an application of the unary minus operator to an integer literal 1i8, rather than a single integer literal.
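One visible consequence, sketched in Rust with illustrative function names: a method call binds more tightly than the unary minus, so the minus sign is applied to the result of the call rather than being part of the literal.

```rust
/// Parsed as `-(1i8.abs())`: the method is called on the literal `1i8`
/// first, and the result is then negated.
pub fn minus_applied_after_call() -> i8 {
    -1i8.abs()
}

/// Parentheses make the negated value the receiver of the method call.
pub fn minus_applied_before_call() -> i8 {
    (-1i8).abs()
}
```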

Floating-point literals

^Lexer
FLOAT_LITERAL :
      DEC_LITERAL . (not immediately followed by ., _ or an identifier)
   | DEC_LITERAL FLOAT_EXPONENT
   | DEC_LITERAL . DEC_LITERAL FLOAT_EXPONENT^?
   | DEC_LITERAL (. DEC_LITERAL)^? FLOAT_EXPONENT^? FLOAT_SUFFIX

FLOAT_EXPONENT :
   (e|E) (+|-)^? (DEC_DIGIT|_)^* DEC_DIGIT (DEC_DIGIT|_)^*

FLOAT_SUFFIX :
   f32 | f64

A floating-point literal has one of two forms:

A decimal literal followed by a period character U+002E (.). This is optionally followed by another decimal literal, with an optional exponent.
A single decimal literal followed by an exponent.

Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with U+002E (.). The suffix forcibly sets the type of the literal. The two valid floating-point suffixes are f32 and f64, which name the 32-bit and 64-bit floating-point types.

The type of an unsuffixed floating-point literal is determined by type inference:

If a floating-point type can be uniquely determined from the surrounding program context, the unsuffixed floating-point literal has that type.
If the program context under-constrains the type, it defaults to f64.
If the program context over-constrains the type, it is considered a static type error.

Examples of floating-point literals of various forms:

123.0f64;        ;; type f64
0.1f64;          ;; type f64
0.1f32;          ;; type f32
12E+99_f64;      ;; type f64
(let (x: f64 2.) ;; type f64
  ...)

This last example is different because it is not possible to use the suffix syntax with a floating point literal ending in a period. 2.f64 would attempt to call a method named f64 on 2.
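The literal forms and the f64 default can be sketched in Rust, assuming Oxur floating-point literals behave as Rust's do. The function names are illustrative.

```rust
use std::mem::size_of_val;

/// Four spellings: fraction plus exponent, exponent only, trailing
/// period, and underscores combined with a suffix.
pub fn float_forms() -> [f64; 4] {
    [123.0E+2, 12E+2, 2., 1_234.0E+1f64]
}

/// Size in bytes of an unsuffixed literal (defaults to f64) and of an
/// f32-suffixed literal.
pub fn float_sizes() -> (usize, usize) {
    let defaulted = 0.1;
    let single = 0.1f32;
    (size_of_val(&defaulted), size_of_val(&single))
}
```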

The representation semantics of floating-point numbers are described in "Machine Types".

Boolean literals

^Lexer
BOOLEAN_LITERAL :
      true
   | false

The two values of the boolean type are written true and false.

Lifetimes and loop labels

^Lexer
LIFETIME_TOKEN :
      ' IDENTIFIER_OR_KEYWORD
   | '_

LIFETIME_OR_LABEL :
      ' NON_KEYWORD_IDENTIFIER

Lifetime parameters and loop labels use LIFETIME_OR_LABEL tokens. Any LIFETIME_TOKEN will be accepted by the lexer and can, for example, be used in macros.
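Both uses of the token can be seen in the Rust sketch below, which assumes Oxur lifetimes and labels carry Rust's meaning. The function names and the label 'outer are chosen for illustration.

```rust
/// `'outer` is a loop label: `break 'outer` leaves both loops at once.
pub fn first_pair_with_sum(limit: u32, target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 0..limit {
        for b in 0..limit {
            if a + b == target {
                found = Some((a, b));
                break 'outer;
            }
        }
    }
    found
}

/// `'a` is a lifetime parameter tying the result to both arguments.
pub fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}
```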

Punctuation

Punctuation symbol tokens are listed here for completeness. Their individual usages and meanings are defined in the sections of the specification that cover them.

Symbol	Name	Usage
`+`	Plus	Addition, Trait Bounds, Macro Kleene Matcher
`-`	Minus	Subtraction, Negation
`*`	Star	Multiplication, Dereference, Raw Pointers, Macro Kleene Matcher
`/`	Slash	Division
`%`	Percent	Remainder
`^`	Caret	Bitwise and Logical XOR
`!`	Not	Bitwise and Logical NOT, Macro Calls, Inner Attributes, Never Type
`&`	And	Bitwise and Logical AND, Borrow, References, Reference patterns
`|`	Or	Bitwise and Logical OR, Closures, Match
`&&`	AndAnd	Lazy AND, Borrow, References, Reference patterns
`||`	OrOr	Lazy OR, Closures
`<<`	Shl	Shift Left, Nested Generics
`>>`	Shr	Shift Right, Nested Generics
`+=`	PlusEq	Addition assignment
`-=`	MinusEq	Subtraction assignment
`*=`	StarEq	Multiplication assignment
`/=`	SlashEq	Division assignment
`%=`	PercentEq	Remainder assignment
`^=`	CaretEq	Bitwise XOR assignment
`&=`	AndEq	Bitwise And assignment
`|=`	OrEq	Bitwise Or assignment
`<<=`	ShlEq	Shift Left assignment
`>>=`	ShrEq	Shift Right assignment, Nested Generics
`=`	Eq	Assignment, Attributes, Various type definitions
`==`	EqEq	Equal
`!=`	Ne	Not Equal
`>`	Gt	Greater than, Generics, Paths
`<`	Lt	Less than, Generics, Paths
`>=`	Ge	Greater than or equal to, Generics
`<=`	Le	Less than or equal to
`@`	At	Subpattern binding
`_`	Underscore	Wildcard patterns, Inferred types
`.`	Dot	Field access, Tuple index
`..`	DotDot	Range, Struct expressions, Patterns
`...`	DotDotDot	Variadic functions, Range patterns
`..=`	DotDotEq	Inclusive Range, Range patterns
`,`	Comma	Various separators
`;`	Semi	Terminator for various items and statements, Array types
`:`	Colon	Various separators
`::`	PathSep	Path separator
`->`	RArrow	Function return type, Closure return type
`=>`	FatArrow	Match arms, Macros
`#`	Pound	Attributes
`$`	Dollar	Macros
`?`	Question	Question mark operator, Questionably sized, Macro Kleene Matcher

Delimiters

Bracket punctuation is used in various parts of the grammar. An open bracket must always be paired with a close bracket. Brackets and the tokens within them are referred to as "token trees" in macros. The two types of brackets are:

Bracket	Type
`[` `]`	Square brackets
`(` `)`	Parentheses
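To make the token-tree idea concrete, here is a Rust sketch with a hypothetical counting macro, count_tts. It assumes macros see bracketed groups the way Rust macros do: a bracket pair and everything inside it count as one token tree.

```rust
// Hypothetical recursive macro: counts the token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// `(a b c)` and `[d e]` are each one token tree, and `f` is a third.
pub fn grouped_count() -> usize {
    count_tts!((a b c) [d e] f)
}

/// Without brackets, every identifier is its own token tree.
pub fn flat_count() -> usize {
    count_tts!(a b c d e f)
}
```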

The Oxur Specification