How JavaScript Parsers Turn Code into ASTs: Lexical & Syntax Basics
This article explains how JavaScript parsers transform source code strings into abstract syntax trees through lexical analysis and syntax parsing, covering language types, V8’s execution flow, token generation, AST construction, and practical examples, while also linking to tools like AST Explorer for further exploration.
Explanation of Parsers
This article aims to explain clearly what a parser does.
Preface
Have you ever wondered how the JavaScript code we write, which is just a string, gets executed by the machine?
Concept
To the machine, the JavaScript code we write is just a series of characters that it cannot execute directly.
Runtime Environment
JavaScript runs in browsers and in Node.js, both of which embed a JavaScript engine. The most common is Google’s open‑source V8 engine, used by Chrome and Node.js. Other engines include Mozilla’s SpiderMonkey in Firefox and Apple’s JavaScriptCore in Safari.
JavaScript Definition
JavaScript (JS) is a lightweight, interpreted or just‑in‑time compiled programming language with first‑class functions.
Language Types
The definition mentions interpreted and just‑in‑time compiled languages; the third common category is compiled languages.
What are interpreted, compiled, and just‑in‑time compiled languages?
Compiled Language
Before a program runs, it must be compiled by a compiler into a binary file that the machine can read. The binary can be executed directly without recompilation each time. Examples include C/C++ and Go.
Interpreted Language
Programs written in interpreted languages are translated and executed by an interpreter each time they run, with no separate ahead‑of‑time compile step. Python and JavaScript are traditionally described as interpreted languages.
Just‑In‑Time Language
Just‑in‑time (JIT) compilation, also called dynamic translation or runtime compilation, compiles code during execution rather than before it. A JIT compiler continuously analyzes the running code and compiles the hot (frequently executed) parts, where the expected speedup outweighs the cost of compiling.
JIT combines some of the advantages and drawbacks of both ahead‑of‑time compilation and interpretation.
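The idea of compiling only the hot parts can be illustrated with a toy sketch. This is not how V8 or any real engine works; the node shapes, the `interpret`, `compile`, and `makeTiered` functions, and the threshold are all hypothetical. The code interprets a tiny expression tree at first, counts calls, and switches to a precompiled closure once the call count reaches the threshold.

```javascript
// Toy expression tree for (x + 1) * 2. The node shapes are invented for this sketch.
const expr = {
  op: '*',
  left: { op: '+', left: { name: 'x' }, right: { value: 1 } },
  right: { value: 2 },
};

// Slow path: inspect the tree again on every call.
function interpret(node, env) {
  if ('value' in node) return node.value;
  if ('name' in node) return env[node.name];
  const l = interpret(node.left, env);
  const r = interpret(node.right, env);
  return node.op === '+' ? l + r : l * r;
}

// "Compile" step: walk the tree once and return a closure
// that no longer has to inspect node shapes at call time.
function compile(node) {
  if ('value' in node) { const v = node.value; return () => v; }
  if ('name' in node) { const n = node.name; return (env) => env[n]; }
  const l = compile(node.left);
  const r = compile(node.right);
  return node.op === '+'
    ? (env) => l(env) + r(env)
    : (env) => l(env) * r(env);
}

// Interpret until the call count reaches `threshold`, then use the compiled closure.
function makeTiered(node, threshold) {
  let calls = 0;
  let compiled = null;
  const run = (env) => {
    if (compiled) return compiled(env);
    calls++;
    if (calls >= threshold) compiled = compile(node);
    return interpret(node, env);
  };
  run.isCompiled = () => compiled !== null;
  return run;
}

const run = makeTiered(expr, 3);
const results = [1, 2, 3, 4].map((x) => run({ x }));
console.log(results, run.isCompiled()); // results are 4, 6, 8, 10; compiled after the third call
```

Both paths return the same values; only the cost per call differs, which is the trade-off a JIT makes when it pays the compile cost for hot code.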
V8 Execution Flow
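During execution V8 uses both the Ignition interpreter and the TurboFan optimizing compiler. The parser first produces an AST, Ignition compiles the AST to bytecode and interprets it, and TurboFan compiles frequently executed code into optimized machine code.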
Summary
Whatever the language type, an early step is usually to convert the source code into an AST. This article focuses on how to obtain that AST through lexical analysis and syntax analysis.
Parser
Lexical Analysis
Lexical analysis is the process of converting a character sequence into a token sequence. The program that performs lexical analysis is called a lexical analyzer (lexer) or scanner.
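As a minimal sketch of that definition, the snippet below turns the string `let x = 42;` into tokens using a single sticky regular expression. The `lex` function, the keyword list, and the token type names (`keyword`, `identifier`, and so on) are assumptions made for illustration; real engines use their own token definitions and hand-written scanners.

```javascript
// Each alternative is a named group; whichever group matched gives the token type.
const TOKEN_RE = /(?<whitespace>\s+)|(?<number>\d+)|(?<word>[A-Za-z_$][\w$]*)|(?<punctuator>[=;+\-*\/(){}])/y;

// A deliberately tiny keyword list, assumed for this sketch.
const KEYWORDS = new Set(['let', 'const', 'var', 'function', 'return']);

function lex(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    TOKEN_RE.lastIndex = pos; // the sticky flag makes exec() match exactly at pos
    const match = TOKEN_RE.exec(source);
    if (match === null) {
      throw new SyntaxError(`Unexpected character "${source[pos]}" at index ${pos}`);
    }
    pos = TOKEN_RE.lastIndex;

    // Find the one named group that actually matched.
    const [type, value] = Object.entries(match.groups).find(([, v]) => v !== undefined);
    if (type === 'whitespace') continue; // separates tokens, but is not a token
    if (type === 'word') {
      tokens.push({ type: KEYWORDS.has(value) ? 'keyword' : 'identifier', value });
    } else {
      tokens.push({ type, value });
    }
  }
  return tokens;
}

const lexed = lex('let x = 42;');
console.log(lexed);
```

The character sequence goes in and a flat token sequence comes out; nothing about nesting or statement structure is decided at this stage.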
Syntax Analysis
In computer science, syntax analysis (parsing) determines the grammatical structure of an input token sequence according to a formal grammar.
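To make "according to a formal grammar" concrete, here is a small recursive-descent sketch for a two-rule arithmetic grammar. The grammar, the `parseExpression` function, and the node shapes are assumptions chosen for illustration; the grammar of JavaScript itself is far larger.

```javascript
// Grammar assumed for this sketch:
//   Expression := Term ( '+' Term )*
//   Term       := Number ( '*' Number )*
function parseExpression(tokens) {
  let pos = 0;

  function parseNumber() {
    const token = tokens[pos];
    if (token === undefined || token.type !== 'number') {
      throw new SyntaxError(`Expected a number at token index ${pos}`);
    }
    pos++;
    return { type: 'NumberLiteral', value: token.value };
  }

  // Parses: operand ( operator operand )*, building a left-leaning tree.
  function parseBinary(operator, parseOperand) {
    let left = parseOperand();
    while (
      pos < tokens.length &&
      tokens[pos].type === 'operator' &&
      tokens[pos].value === operator
    ) {
      pos++; // consume the operator
      left = { type: 'BinaryExpression', operator, left, right: parseOperand() };
    }
    return left;
  }

  const parseTerm = () => parseBinary('*', parseNumber);
  const root = parseBinary('+', parseTerm);
  if (pos < tokens.length) {
    throw new SyntaxError(`Unexpected token at index ${pos}`);
  }
  return root;
}

// Tokens for: 1 + 2 * 3
const exprTokens = [
  { type: 'number', value: '1' },
  { type: 'operator', value: '+' },
  { type: 'number', value: '2' },
  { type: 'operator', value: '*' },
  { type: 'number', value: '3' },
];
const tree = parseExpression(exprTokens);
console.log(JSON.stringify(tree, null, 2));
```

Because `Term` is nested inside `Expression`, the `2 * 3` part becomes the right child of the `+` node. The grammar, not the token order alone, determines the shape of the tree.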
Simple Lexical Parser Example
Pseudocode String
<code>(add 2 (subtract 4 2))</code>
Step 1: Tokenization
Tokens are generated by scanning characters one by one, handling various cases such as identifiers, numbers, parentheses, strings, and whitespace.
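<code>function tokenizer(input) {
  // Character classes used by the scanner, created once.
  const WHITESPACE = /\s/;
  const NUMBERS = /[0-9]/;
  const LETTERS = /[a-z]/i;

  let current = 0; // index of the character being examined
  const tokens = [];

  while (current < input.length) {
    let char = input[current];

    // Parentheses are single-character tokens.
    if (char === '(' || char === ')') {
      tokens.push({ type: 'paren', value: char });
      current++;
      continue;
    }

    // Whitespace separates tokens but is not a token itself.
    if (WHITESPACE.test(char)) {
      current++;
      continue;
    }

    // A number is a run of consecutive digits.
    if (NUMBERS.test(char)) {
      let value = '';
      while (current < input.length && NUMBERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'number', value });
      continue;
    }

    // A string is everything between a pair of double quotes.
    if (char === '"') {
      let value = '';
      char = input[++current]; // skip the opening quote
      while (current < input.length && char !== '"') {
        value += char;
        char = input[++current];
      }
      if (current >= input.length) {
        throw new SyntaxError('Unterminated string literal');
      }
      current++; // skip the closing quote
      tokens.push({ type: 'string', value });
      continue;
    }

    // A name is a run of consecutive letters. The bounds check matters:
    // past the end of the input, char is undefined, and LETTERS.test(undefined)
    // tests the string "undefined", which matches and would loop forever.
    if (LETTERS.test(char)) {
      let value = '';
      while (current < input.length && LETTERS.test(char)) {
        value += char;
        char = input[++current];
      }
      tokens.push({ type: 'name', value });
      continue;
    }

    throw new TypeError("I don't know what this character is: " + char);
  }

  return tokens;
}</code>
Resulting tokens: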
<code>[
  { type: 'paren', value: '(' },
  { type: 'name', value: 'add' },
  { type: 'number', value: '2' },
  { type: 'paren', value: '(' },
  { type: 'name', value: 'subtract' },
  { type: 'number', value: '4' },
  { type: 'number', value: '2' },
  { type: 'paren', value: ')' },
  { type: 'paren', value: ')' }
];</code>
Step 2: Build AST
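<code>function parser(tokens) {
  let current = 0; // index of the token being examined

  function walk() {
    let token = tokens[current];

    if (token.type === 'number') {
      current++;
      return { type: 'NumberLiteral', value: token.value };
    }

    if (token.type === 'string') {
      current++;
      return { type: 'StringLiteral', value: token.value };
    }

    // An opening parenthesis starts a call expression: (name param ...)
    if (token.type === 'paren' && token.value === '(') {
      token = tokens[++current]; // skip '(' and read the function name
      if (token === undefined || token.type !== 'name') {
        throw new SyntaxError("Expected a function name after '('");
      }
      const node = { type: 'CallExpression', name: token.value, params: [] };
      token = tokens[++current]; // move past the name

      // Collect parameters until the matching ')'.
      while (token !== undefined && !(token.type === 'paren' && token.value === ')')) {
        node.params.push(walk());
        token = tokens[current];
      }
      if (token === undefined) {
        throw new SyntaxError("Missing closing ')'");
      }
      current++; // skip ')'
      return node;
    }

    throw new TypeError('Unexpected token type: ' + token.type);
  }

  const ast = { type: 'Program', body: [] };
  while (current < tokens.length) {
    ast.body.push(walk());
  }
  return ast;
}</code>
Resulting AST: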
<code>{
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' }
        ]
      }
    ]
  }]
};</code>
AST nodes are predefined types that represent the language’s syntax; many more node types exist.
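One way to see what the tree structure buys you is to walk it. The sketch below evaluates the AST shown above with a recursive function. The `evaluate` function and the `operations` table are hypothetical helpers written for this example and assume that `add` and `subtract` mean ordinary arithmetic.

```javascript
// The AST from the example above, written out so this snippet runs on its own.
const ast = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

// Assumed meaning for each function name (not defined by the original example).
const operations = new Map([
  ['add', (a, b) => a + b],
  ['subtract', (a, b) => a - b],
]);

// Visit each node by type; children are evaluated before their parent.
function evaluate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map((child) => evaluate(child));
    case 'NumberLiteral':
      return Number(node.value);
    case 'StringLiteral':
      return node.value;
    case 'CallExpression': {
      const operation = operations.get(node.name);
      if (operation === undefined) {
        throw new ReferenceError(`Unknown function: ${node.name}`);
      }
      return operation(...node.params.map((param) => evaluate(param)));
    }
    default:
      throw new TypeError(`Unknown node type: ${node.type}`);
  }
}

const result = evaluate(ast);
console.log(result); // one top-level expression, so one value: 4
```

The nested `subtract` call is evaluated before `add` simply because it is a child node; no knowledge of parentheses is needed once the tree exists.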
Summary
After generating the AST, the parser’s job is done. The AST can then be processed further, for example by Babel plugins, ESLint rules, and Webpack plugins. Common JavaScript parsers include @babel/parser and acorn.
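As a small illustration of such further processing, the sketch below prints the example AST back out as C-style call syntax, which is the kind of work a code generator does. The `generate` function is a hypothetical helper written for this article's node shapes; tools such as Babel define their own, much richer node types and generators.

```javascript
// Turn each node back into source text, recursing into child nodes.
function generate(node) {
  switch (node.type) {
    case 'Program':
      // One statement per top-level expression.
      return node.body.map((child) => generate(child) + ';').join('\n');
    case 'CallExpression':
      return `${node.name}(${node.params.map((param) => generate(param)).join(', ')})`;
    case 'NumberLiteral':
      return node.value;
    case 'StringLiteral':
      return JSON.stringify(node.value); // re-add the quotes
    default:
      throw new TypeError(`Unknown node type: ${node.type}`);
  }
}

// The AST produced for (add 2 (subtract 4 2)).
const exampleAst = {
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [
      { type: 'NumberLiteral', value: '2' },
      {
        type: 'CallExpression',
        name: 'subtract',
        params: [
          { type: 'NumberLiteral', value: '4' },
          { type: 'NumberLiteral', value: '2' },
        ],
      },
    ],
  }],
};

const output = generate(exampleAst);
console.log(output); // add(2, subtract(4, 2));
```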
Various languages have their own parsers; you can explore them with AST Explorer.
Conclusion
Returning to the opening question, this article covered the first step—parsing. Subsequent steps involve bytecode, machine code, and interpreters, which you can explore further.
Discussion: The compilers for languages like C/C++ or Go must themselves be written in some language; can a language’s compiler be written in that same language?
References: AST Explorer, Browser Internals, the‑super‑tiny‑compiler.
Goodme Frontend Team
Regularly sharing the team's insights and expertise in the frontend field