Understanding ECMAScript Grammar: Lexical, Syntactic, and Identifier Rules
This article examines the four ECMAScript grammars—lexical, syntactic, regular‑expression, and numeric‑string—explains how context‑free productions define tokens, shows ambiguous cases such as the '/' and '`' characters, and details why the identifier await is prohibited in async functions through static‑semantic rules.
The purpose of this translation is to propose consistent Chinese terms for core concepts in the ECMAScript specification, helping readers better understand the language’s formal definition.
ECMAScript defines four distinct grammars: the lexical grammar (how Unicode code points are translated into input elements), the syntactic grammar (how tokens form valid programs), the regular-expression grammar (how Unicode code points are translated into regular-expression patterns), and the numeric-string grammar (how strings are converted to numeric values). Each is a context-free grammar defined by a set of productions.
The grammars are distinguished by the number of colons in their productions: the syntactic grammar writes LeftHandSideSymbol :, the lexical and regular-expression grammars write LeftHandSideSymbol ::, and the numeric-string grammar writes LeftHandSideSymbol :::.
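To make the difference between the lexical grammar and the numeric-string grammar concrete, the following sketch compares a numeric literal in source text with strings converted by Number(). The inputs are chosen purely for illustration; only standard built-ins are used.

```javascript
// A numeric literal in source text is tokenised by the lexical grammar,
// which accepts numeric separators.
const literal = 1_000;

// Strings passed to Number() are parsed with the numeric-string grammar,
// which tolerates surrounding whitespace but rejects separators.
const fromPadded = Number('  42  ');
const fromSeparated = Number('1_000');
const fromHex = Number('0x10');
const fromEmpty = Number('');

console.log(literal);       // 1000
console.log(fromPadded);    // 42
console.log(fromSeparated); // NaN
console.log(fromHex);       // 16
console.log(fromEmpty);     // 0
```

The same characters can therefore be valid under one grammar and invalid under another, which is why the specification keeps the grammars separate.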
Lexical Grammar
ECMAScript source text is a sequence of Unicode code points, so identifiers are not limited to ASCII and may contain non-ASCII Unicode letters and digits. Source text cannot be tokenised without knowing the syntactic context; for example, the '/' character may be a division operator or the start of a regular-expression literal.
const x = 10 / 5;
Here '/' is a DivPunctuator. In contrast:
const r = /foo/;
Here '/' begins a RegularExpressionLiteral. Template literals also create ambiguities; for example:
const what1 = 'temp';
const what2 = 'late';
const t = `I am a ${ what1 + what2 }`;
In this snippet, `I am a ${ is a TemplateHead and }` is a TemplateTail. To resolve such cases, the lexical grammar has several goal symbols, such as InputElementDiv and InputElementRegExp. The syntactic context determines which goal symbol applies, and therefore whether '/' is tokenised as a DivPunctuator or as the start of a RegularExpressionLiteral.
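The context dependence described above can be observed from ordinary code. The sketch below is only an illustration: the variable names and the tag function pieces are hypothetical and not part of the original article.

```javascript
// After a numeric literal the parser expects an operator, so both '/'
// characters are division punctuators: (16 / hi) / g.
const hi = 2;
const g = 4;
const quotient = 16 /hi/ g;

// At the start of an expression, '/' begins a regular-expression literal.
const pattern = /hi/g;

// Hypothetical tag function: it receives the literal text carried by the
// template tokens (head and tail) separately from the substitution values.
const pieces = (strings, ...values) => ({ strings: [...strings], values });
const what1 = 'temp';
const what2 = 'late';
const parts = pieces`I am a ${ what1 + what2 }`;

console.log(quotient);                  // 2
console.log(pattern instanceof RegExp); // true
console.log(parts.strings);             // [ 'I am a ', '' ]
console.log(parts.values);              // [ 'template' ]
```

The characters /hi/ appear in both statements, yet they are tokenised differently because the surrounding syntax differs.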
InputElementDiv ::
  WhiteSpace
  LineTerminator
  Comment
  CommonToken
  DivPunctuator
  RightBracePunctuator
InputElementRegExp ::
  WhiteSpace
  LineTerminator
  Comment
  CommonToken
  RightBracePunctuator
  RegularExpressionLiteral
Syntactic Grammar
The syntactic grammar builds on the lexical grammar, defining how tokens combine into syntactically correct programs. It includes mechanisms for adding new keywords without breaking existing code, illustrated with the await keyword.
function old() {
  var await;
}
When await becomes a keyword inside async functions, the same code becomes a syntax error:
async function modern() {
  var await; // SyntaxError
}
Productions use shorthand notations such as [Yield, Await], +In, and ?Await to express families of productions. For example, the production for VariableStatement[Yield, Await] expands into four concrete productions:
VariableStatement
VariableStatement_Yield
VariableStatement_Await
VariableStatement_Yield_Await
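The expansion of a parameterised production can be sketched as enumerating every subset of its parameters. The helper expandProduction below is hypothetical and exists only to illustrate the shorthand; it is not how the specification or any engine represents productions.

```javascript
// Hypothetical helper: lists the concrete production names that a
// parameterised production such as Name[A, B] stands for.
function expandProduction(name, params) {
  const variants = [];
  // Each bitmask selects one subset of the parameters.
  for (let mask = 0; mask < (1 << params.length); mask++) {
    const chosen = params.filter((_, index) => mask & (1 << index));
    variants.push([name, ...chosen].join('_'));
  }
  return variants;
}

const expanded = expandProduction('VariableStatement', ['Yield', 'Await']);
console.log(expanded.join('\n'));
// VariableStatement
// VariableStatement_Yield
// VariableStatement_Await
// VariableStatement_Yield_Await
```

With n parameters the shorthand stands for 2^n productions, which is why the specification writes the compact form.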
Whether a function body allows await as an identifier depends on which parameterised nonterminal describes it: the body of an async function is a FunctionBody_Await, while the body of a non-async function is a plain FunctionBody. The static-semantic rules for BindingIdentifier state that if the production has the [Await] parameter, using await as the identifier is a syntax error.
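BindingIdentifier[Yield, Await] : await
The grammar therefore still allows await as a BindingIdentifier syntactically, and this static-semantic rule rejects it inside async functions. The choice to use a static-semantic rule instead of removing the production is related to Automatic Semicolon Insertion (ASI).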
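The effect of the rule can be checked without executing the functions in question. The sketch below assumes an environment where the Function constructor is available (for example Node.js); the helper parses is hypothetical and introduced only for this illustration.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// The Function constructor parses its argument without running it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const inOld = parses('function old() { var await; }');
const inModern = parses('async function modern() { var await; }');
const asOperator = parses('async function modern() { await 0; }');

console.log(inOld);      // true
console.log(inModern);   // false
console.log(asOperator); // true
```

Only the declaration inside the async function is rejected; the same async function accepts await when it is used as an operator.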
Static Semantics and Identifier Names
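Static semantics also govern identifier names. The StringValue of an identifier is computed after each Unicode escape sequence has been replaced by the code point it represents, so \u0061wait has the StringValue "await". The static-semantic check compares StringValues, so the escaped form is rejected in async functions just like await itself, while remaining a valid identifier elsewhere.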
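A minimal sketch of the StringValue idea follows. It assumes an environment where the Function constructor is available; the identifier apple and the helper parsesEscaped are hypothetical names used only for illustration.

```javascript
// An escaped identifier names the same binding as its unescaped spelling,
// because both have the same StringValue.
const \u0061pple = 'fruit';
const sameBinding = apple === 'fruit';

// Hypothetical helper: reports whether `source` parses as a function body.
function parsesEscaped(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// '\\u0061' in these string literals produces the six characters \u0061
// in the source text that is parsed.
const escapedInOld = parsesEscaped('function old() { var \\u0061wait; }');
const escapedInModern = parsesEscaped('async function modern() { var \\u0061wait; }');

console.log(sameBinding);     // true
console.log(escapedInOld);    // true
console.log(escapedInModern); // false
```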
function old() {
  var \u0061wait;
}
async function modern() {
  var \u0061wait; // SyntaxError
}
Conclusion
This article covered the four ECMAScript grammars, the role of goal symbols, how ambiguous tokens are resolved, and why the identifier await is disallowed in async functions but allowed elsewhere. The next article will explore other interesting parts of the grammar, such as ASI and cover grammars.
ByteFE
Cutting‑edge tech, article sharing, and practical insights from the ByteDance frontend team.