
Understanding ECMAScript Grammar: Lexical, Syntactic, and Identifier Rules

This article examines the four ECMAScript grammars (lexical, syntactic, regular-expression, and numeric-string), explains how each is written as a set of context-free productions, walks through ambiguous cases such as the '/' character and '}`' in template literals, and details how static-semantic rules prohibit the identifier await in async functions.

ByteFE

The purpose of this translation is to propose consistent Chinese terms for core concepts in the ECMAScript specification, helping readers better understand the language’s formal definition.

ECMAScript defines four distinct grammars: the lexical grammar (how Unicode code points become input elements such as tokens, line terminators, comments, and white space), the syntactic grammar (how tokens combine into valid programs), the regular-expression grammar (how Unicode code points become RegExp patterns), and the numeric-string grammar (how strings are converted to numbers). Each is defined as a context-free grammar made up of productions.
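To make the separation concrete, here is a small illustration of our own (not from the original article): the same characters can be accepted by one grammar and rejected by another. The sketch assumes an engine with ES2021 numeric separators, such as a recent Node.js. isValidPattern is a hypothetical helper name.

```javascript
// Lexical grammar: a NumericLiteral in source code may contain
// numeric separators ("_") since ES2021.
const fromSource = 1_000;

// Numeric-string grammar: Number(string) parses its argument as a
// StringNumericLiteral, which does not allow separators but does allow
// leading and trailing white space and line terminators.
const fromString = Number("1_000");
const padded = Number("  42\n");
const hex = Number("0x10");

// Regular-expression grammar: new RegExp(pattern) parses the pattern at
// run time and throws a SyntaxError if it is not a valid pattern.
// isValidPattern is a hypothetical helper written for this example.
function isValidPattern(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(fromSource, fromString, padded, hex); // 1000 NaN 42 16
console.log(isValidPattern("a+"), isValidPattern("(")); // true false
```

The literal 1_000 and the string "1_000" look alike, but they are read by different grammars, which is why only the first one yields 1000.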

Each grammar marks its productions with a different separator: the syntactic grammar uses LeftHandSideSymbol : (one colon), the lexical and regular-expression grammars use LeftHandSideSymbol :: (two colons), and the numeric-string grammar uses LeftHandSideSymbol ::: (three colons). The number of colons tells you which grammar a production belongs to.

Lexical Grammar

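ECMAScript source text is a sequence of Unicode code points, so identifiers are not limited to ASCII: they may use letters from any script (formally, characters with the Unicode ID_Start and ID_Continue properties, plus a few extras such as $ and _). Splitting source text into tokens also cannot be done without context; for example, the '/' character may begin a division operator or a regular-expression literal.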
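One way to see which characters the lexical grammar admits in identifiers is to ask the engine directly. This is a sketch: isValidIdentifier is a hypothetical helper, and it relies on the Function constructor, which parses its body without executing it.

```javascript
// Returns true if `name` is accepted as a variable name, false if the
// engine reports a SyntaxError. new Function() parses the body but does
// not run it.
function isValidIdentifier(name) {
  try {
    new Function(`var ${name};`);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(isValidIdentifier("café"));  // true: é is an ID_Continue character
console.log(isValidIdentifier("π"));     // true: Greek letters are ID_Start
console.log(isValidIdentifier("😀"));    // false: emoji are neither ID_Start nor ID_Continue
console.log(isValidIdentifier("2fast")); // false: a digit cannot start an identifier
```

Returning to the context problem, consider the '/' character in this statement: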

const x = 10 / 5;

Here '/' is a DivPunctuator. In contrast:

const r = /foo/;

Here '/' begins a RegularExpressionLiteral. Template literals cause a similar ambiguity, this time involving the '}' character:

const what1 = 'temp';
const what2 = 'late';
const t = `I am a ${ what1 + what2 }`;

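In this snippet, `I am a ${ is a TemplateHead and }` is a TemplateTail. When the lexer reaches that '}', it cannot tell on its own whether the character closes a block (a RightBracePunctuator) or continues the template. The lexical grammar resolves both ambiguities with goal symbols, and the syntactic grammar tells the lexer which goal symbol applies at each point. Where a RegularExpressionLiteral is permitted, InputElementRegExp is used and '/' starts a regular-expression literal; in other contexts, InputElementDiv is used and '/' is tokenised as a DivPunctuator. (The specification defines further goal symbols, such as InputElementRegExpOrTemplateTail and InputElementTemplateTail, for positions where a template may continue.) The two goal symbols for '/' differ only in those alternatives: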

InputElementDiv ::
  WhiteSpace
  LineTerminator
  Comment
  CommonToken
  DivPunctuator
  RightBracePunctuator

InputElementRegExp ::
  WhiteSpace
  LineTerminator
  Comment
  CommonToken
  RightBracePunctuator
  RegularExpressionLiteral
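The specification resolves the '/' ambiguity by letting the parser choose the goal symbol. Stand-alone tools often approximate this with a simpler rule that looks only at the previous token. The sketch below uses that heuristic, which is an assumption and not the specification's algorithm; it gets some cases wrong, for example a regular expression right after ')' in if (x) /re/.test(s), or after the '}' that closes a block. tokenize, regExpAllowedAfter, and show are hypothetical names, and the toy lexer only handles identifiers, decimal integers, and single-character punctuators.

```javascript
// Words after which an expression, and so a regular expression, may begin.
// This list is deliberately incomplete.
const EXPRESSION_KEYWORDS = new Set([
  "return", "typeof", "instanceof", "in", "of", "new",
  "delete", "void", "throw", "case", "do", "else",
]);

// Heuristic stand-in for choosing InputElementRegExp (true) or
// InputElementDiv (false) from the previous token alone.
function regExpAllowedAfter(prev) {
  if (prev === null) return true; // start of input
  if (prev.type === "identifier") return EXPRESSION_KEYWORDS.has(prev.value);
  if (prev.type === "punctuator" || prev.type === "div") {
    // After ')' ']' '}' a value has usually just ended; after any other
    // punctuator (including a division operator) an expression may begin.
    return ![")", "]", "}"].includes(prev.value);
  }
  return false; // after a number or a regexp literal, a value has just ended
}

function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const rest = source.slice(i);
    const prev = tokens.length > 0 ? tokens[tokens.length - 1] : null;
    let match;
    if ((match = /^\s+/.exec(rest))) {
      // White space separates tokens but is not kept.
    } else if ((match = /^[A-Za-z_$][\w$]*/.exec(rest))) {
      tokens.push({ type: "identifier", value: match[0] });
    } else if ((match = /^\d+/.exec(rest))) {
      tokens.push({ type: "number", value: match[0] });
    } else if (rest[0] === "/" && regExpAllowedAfter(prev)) {
      // Body: escaped characters or anything except '/', '\' and newline; then flags.
      match = /^\/(?:\\.|[^\/\\\n])+\/[a-z]*/.exec(rest);
      if (match === null) throw new SyntaxError(`Unterminated regular expression at ${i}`);
      tokens.push({ type: "regexp", value: match[0] });
    } else if (rest[0] === "/") {
      match = ["/"];
      tokens.push({ type: "div", value: "/" });
    } else {
      match = [rest[0]];
      tokens.push({ type: "punctuator", value: rest[0] });
    }
    i += match[0].length;
  }
  return tokens;
}

const show = (src) => tokenize(src).map((t) => `${t.type}(${t.value})`).join(" ");

console.log(show("const x = 10 / 5;"));
// identifier(const) identifier(x) punctuator(=) number(10) div(/) number(5) punctuator(;)
console.log(show("const r = /foo/;"));
// identifier(const) identifier(r) punctuator(=) regexp(/foo/) punctuator(;)
```

The heuristic works for the two snippets above because a number can only be followed by an operator, while '=' can only be followed by an expression. Real parsers avoid the failure cases by feeding the syntactic context back to the lexer, which is exactly what the goal symbols express.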

Syntactic Grammar

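The syntactic grammar builds on the lexical grammar, defining how tokens combine into syntactically correct programs. It also needs a way to add new keywords without breaking existing code, and await is a good illustration. Before async functions were introduced, await was an ordinary identifier, so code like the following is still valid in script code (module code reserves await everywhere):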

function old() {
  var await;
}

Inside an async function, however, await is a keyword, so the same declaration is a syntax error:

async function modern() {
  var await; // SyntaxError
}
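Both declarations can be checked against a real engine. In this sketch, parses is a hypothetical helper; it uses the Function constructor, which parses a function body without running it, so the two examples above are wrapped in an outer function body.

```javascript
// Returns true if `source` is syntactically valid as the body of an
// ordinary function, false if the engine throws a SyntaxError.
// The Function constructor parses the source but does not run it.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// The two examples from the text.
console.log(parses("function old() { var await; }"));          // true
console.log(parses("async function modern() { var await; }")); // false
```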

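Productions use shorthand notation to describe families of related productions. A parameter list such as [Yield, Await] on the left-hand side declares the parameters a production takes; on the right-hand side, +In passes the In parameter, ~In withholds it, and ?Await passes Await on only if the enclosing production has it. For example, VariableStatement[Yield, Await] is shorthand for four concrete productions: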

  VariableStatement
  VariableStatement_Yield
  VariableStatement_Await
  VariableStatement_Yield_Await
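The expansion is mechanical, so it can be sketched in a few lines. The helper names expandParameters and concreteName, and the Name_Param naming scheme, are assumptions for illustration (the specification itself writes parameters as subscripts). The second function models how +, ~ and ? annotations on a right-hand-side nonterminal are resolved; the example uses the specification's right-hand side for VariableStatement, which is var VariableDeclarationList[+In, ?Yield, ?Await] ;.

```javascript
// Expand a parameterized nonterminal such as VariableStatement[Yield, Await]
// into every concrete variant: one per subset of its parameters.
function expandParameters(name, params) {
  const variants = [];
  for (let mask = 0; mask < 1 << params.length; mask++) {
    // Bit i of `mask` decides whether params[i] is present in this variant.
    const present = params.filter((_, i) => (mask & (1 << i)) !== 0);
    variants.push([name, ...present].join("_"));
  }
  return variants;
}

// Resolve annotations like "+In", "~In" or "?Await" on a right-hand-side
// nonterminal, given the parameters of the enclosing production variant.
function concreteName(name, annotations, enclosingParams) {
  const passed = [];
  for (const annotation of annotations) {
    const op = annotation[0];
    const param = annotation.slice(1);
    if (op === "+") {
      passed.push(param); // "+P": always pass P
    } else if (op === "?") {
      // "?P": pass P only if the enclosing variant has it
      if (enclosingParams.includes(param)) passed.push(param);
    } else if (op !== "~") {
      // "~P" never passes P; any other prefix is an error
      throw new Error(`Unknown annotation: ${annotation}`);
    }
  }
  return [name, ...passed].join("_");
}

console.log(expandParameters("VariableStatement", ["Yield", "Await"]).join(", "));
// VariableStatement, VariableStatement_Yield, VariableStatement_Await, VariableStatement_Yield_Await
console.log(concreteName("VariableDeclarationList", ["+In", "?Yield", "?Await"], ["Await"]));
// VariableDeclarationList_In_Await
```

In other words, inside VariableStatement_Await the parser continues with VariableDeclarationList_In_Await, which is how the Await parameter travels down to nested productions.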

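Whether await may be used as an identifier inside a function body depends on which variant of FunctionBody is used: async function bodies are parsed with FunctionBody[~Yield, +Await] (FunctionBody_Await), while ordinary function bodies use FunctionBody[~Yield, ~Await] (plain FunctionBody). The Await parameter is then passed down through ?Await annotations to nested productions, including BindingIdentifier, whose static-semantic (early error) rule makes await a syntax error whenever the production has the [Await] parameter: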

BindingIdentifier[Yield, Await] : await
  It is a Syntax Error if this production has an [Await] parameter.

Similar restrictions cover the other places an identifier can appear (IdentifierReference and LabelIdentifier), so inside an async function await can only begin an AwaitExpression.
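The chain from function kind to FunctionBody variant to early error can be modelled as a lookup plus one check. BODY_PARAMS, checkBindingIdentifier, and checkVarInFunction are hypothetical names. The generator rows follow the specification's GeneratorBody : FunctionBody[+Yield, ~Await] and AsyncGeneratorBody : FunctionBody[+Yield, +Await]; the separate rules for yield are not modelled here.

```javascript
// Parameters each kind of function passes to its FunctionBody, per the
// specification's body productions (for example, AsyncFunctionBody is
// FunctionBody[~Yield, +Await]). BODY_PARAMS is a hypothetical table.
const BODY_PARAMS = {
  "function": [],
  "async function": ["Await"],
  "function*": ["Yield"],
  "async function*": ["Yield", "Await"],
};

// Early error for BindingIdentifier[Yield, Await] : await
// "It is a Syntax Error if this production has an [Await] parameter."
// Only the await rule is modelled; yield has its own rules.
function checkBindingIdentifier(name, params) {
  if (name === "await" && params.includes("Await")) {
    return "SyntaxError: await cannot be a BindingIdentifier here";
  }
  return null; // no early error from this rule
}

// Check `var <name>;` written directly inside a function of the given kind.
function checkVarInFunction(kind, name) {
  const params = BODY_PARAMS[kind];
  if (params === undefined) throw new Error(`Unknown function kind: ${kind}`);
  return checkBindingIdentifier(name, params);
}

console.log(checkVarInFunction("function", "await"));       // null
console.log(checkVarInFunction("async function", "await")); // SyntaxError: await cannot be a BindingIdentifier here
```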

Static Semantics and Identifier Names

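Static semantics also govern identifier names written with Unicode escapes. The early-error rules for identifiers compare an identifier's StringValue, which is computed after escapes are decoded, so \u0061wait has the StringValue "await". Because keywords can never be written with escape sequences, \u0061wait does not match the await terminal and is parsed as an ordinary Identifier instead. It is still rejected in async functions, because another early-error rule forbids an Identifier whose StringValue is "await" in any production with the [Await] parameter. In an ordinary function it remains a valid name: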

function old() {
  var \u0061wait;
}
async function modern() {
  var \u0061wait; // SyntaxError
}
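The StringValue computation can be sketched as an escape decoder. stringValue, isAwaitInAsyncContext, and parsesWith are hypothetical helpers; the decoder handles only \uXXXX and \u{...}, the only escapes an identifier may contain, and assumes its input is already a well-formed IdentifierName. The last two lines cross-check the rule against the engine, using the Function constructor for an ordinary body and the AsyncFunction constructor for an async one.

```javascript
// Compute an identifier's StringValue by decoding its Unicode escape
// sequences (\uXXXX and \u{...}). No validation is done here.
function stringValue(source) {
  return source.replace(
    /\\u(?:\{([0-9A-Fa-f]+)\}|([0-9A-Fa-f]{4}))/g,
    (_, braced, fourDigits) => String.fromCodePoint(parseInt(braced ?? fourDigits, 16)),
  );
}

// The early-error rule compares StringValue, not the raw source text.
function isAwaitInAsyncContext(source, params) {
  return params.includes("Await") && stringValue(source) === "await";
}

// AsyncFunction is not a global; it is reached through an async function.
const AsyncFunction = (async function () {}).constructor;

// Returns true if `body` parses with the given constructor, false on SyntaxError.
function parsesWith(FunctionCtor, body) {
  try {
    new FunctionCtor(body);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(stringValue("\\u0061wait"));                      // await
console.log(isAwaitInAsyncContext("\\u0061wait", ["Await"])); // true
console.log(parsesWith(Function, "var \\u0061wait;"));        // true
console.log(parsesWith(AsyncFunction, "var \\u0061wait;"));   // false
```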

Conclusion

This article covered the four ECMAScript grammars, the role of goal symbols, how ambiguous tokens are resolved, and why the identifier await is disallowed in async functions but allowed in ordinary functions in script code. The next article will explore other interesting parts of the grammar, such as ASI and cover grammars.

Written by ByteFE: cutting-edge tech, article sharing, and practical insights from the ByteDance frontend team.
