Fundamentals 30 min read

Design and Implementation of TinyLanguage Using Flex and Bison

The article walks readers through designing and implementing TinyLanguage—a minimal interpreted language with 4‑byte integers, constants, if/else, for loops, and print—by using Flex for lexical analysis, Bison for parsing into an AST, and an execution engine, even showing a Fibonacci example and future LLVM compilation possibilities.

OPPO Kernel Craftsman
OPPO Kernel Craftsman
OPPO Kernel Craftsman
Design and Implementation of TinyLanguage Using Flex and Bison

This article introduces the concept of computer programming languages and focuses on designing and implementing a minimal language called TinyLanguage . It discusses why one might create a new language, the required features (4‑byte integer variables, constants, if/else, for loops, and a built‑in print function), and provides a high‑level overview of the language’s syntax.

1. Example C program

#include
#include
int main(int argc, char *argv[])
{
  int i, n;
  int t1 = 0, t2 = 1;
  int nextTerm = t1 + t2;

  if (argc != 2) {
    fprintf(stderr, "please input one number\n");
    return -1;
  }
  n = atoi(argv[1]);

  for (i = 3; i <= n; ++i) {
    t1 = t2;
    t2 = nextTerm;
    nextTerm = t1 + t2;
  }

  printf("%dth value is %d\n", n, nextTerm);
  return 0;
}

The program computes the N‑th Fibonacci number and prints the result.

2. Lexical analysis with Flex

Flex is used to generate a C lexer that recognizes digits, identifiers, keywords, and operators. The lexer definition is split into three sections (definitions, rules, and user code). Example Flex rules:

/* definitions */
DIGIT   [0-9]
ID      [a-zA-Z][a-zA-Z0-9]*

%%
{DIGIT}+    { printf("An integer %s(%d)\n", yytext, atoi(yytext)); yylval.integer = atoi(yytext); return INTEGER; }
{ID}       { printf("An identifier(%s)\n", yytext); strcpy(yylval.identifier, yytext); return VARIABLE; }
[\t\n]+   { /* eat up whitespace */ }
.          { printf("Unrecognized character: %s\n", yytext); }
%%
int yywrap() { return 1; }
int main(int argc, char *argv[]) {
  yyin = stdin;
  yylex();
  return 0;
}

Running the lexer on the input 123 abc abc123 produces:

An integer 123(123)
An identifier(abc)
An identifier(abc123)

3. Grammar with Bison

Bison generates a LR parser from a context‑free grammar. The grammar defines non‑terminals such as program , function , stmt , and expr . Tokens and their precedence are declared, and each rule can contain C actions that build an abstract syntax tree (AST). Example Bison fragment:

%union {
  int integer;
  char identifier[256];
  struct listnode *node_ptr;
}
%token
INTEGER
%token
VARIABLE
%token IF ELSE FOR
%nonassoc IFX FOR
%nonassoc ELSE
%nonassoc '='
%left AND OR
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%precedence NEG

%%
program: function { DEBUG("program done!\n"); } ;
function: function stmt { list_add_tail(&g_stmt_list, $2); } | /* empty */ ;
stmt: ';' { $$ = parse_operator(';', 2, NULL, NULL); }
    | expr ';' { $$ = $1; }
    | '{' stmt_list '}' { $$ = $2; }
    | IF '(' expr ')' stmt %prec IFX { $$ = parse_operator(IF, 2, $3, $5); }
    | IF '(' expr ')' stmt ELSE stmt { $$ = parse_operator(IF, 3, $3, $5, $7); }
    | FOR '(' expr ';' expr ';' expr ')' stmt { $$ = parse_operator(FOR, 4, $3, $5, $7, $9); }
    ;
expr: INTEGER { $$ = parse_constant($1); }
    | VARIABLE { $$ = parse_variable($1); }
    | VARIABLE '=' expr { $$ = parse_operator('=', 2, parse_variable($1), $3); }
    | expr '+' expr { $$ = parse_operator('+', 2, $1, $3); }
    | expr '-' expr { $$ = parse_operator('-', 2, $1, $3); }
    | expr '*' expr { $$ = parse_operator('*', 2, $1, $3); }
    | expr '/' expr { $$ = parse_operator('/', 2, $1, $3); }
    | '(' expr ')' { $$ = $2; }
    | '-' expr %prec NEG { $$ = parse_operator(NEG, 1, $2); }
    ;
%%

The actions use helper functions such as parse_constant , parse_variable , and parse_operator to construct nodes of types nbr_node , var_node , and opr_node . The opr_node stores the operator code, the number of operands, and a linked list of operand nodes.

4. AST node definitions

enum NodeType { NT_OPR = 0, NT_NBR, NT_VAR, NT_NON };

struct listnode { struct listnode *next; struct listnode *prev; };

struct ast_node { enum NodeType type; struct listnode node; };

struct nbr_node { struct ast_node ast_node; int value; };

struct var_node { struct ast_node ast_node; char name[256]; int value; };

struct opr_node { struct ast_node ast_node; int opr; int nops; struct listnode expr_list; };

Helper functions allocate these structures and link them into a tree. For example, parse_operator uses va_list to accept a variable number of operand nodes and stores them in expr_list .

5. Execution engine

The interpreter walks the AST. run_stmt dispatches on node type and calls run_operator for operators. Sample execution of the + operator:

case '>': {
    struct listnode *left_node = list_head(&opr_node->expr_list);
    struct listnode *right_node = left_node->next;
    return run_stmt(left_node) > run_stmt(right_node);
}

Variable handling uses a global symbol list ( g_sym_list ) to store name/value pairs. If a variable is not found, it is created with an initial value of 0.

6. Full program flow

The driver reads a source file into memory, calls yyparse() (generated by Bison) to build the AST, then iterates over the global statement list ( g_stmt_list ) and executes each statement with run_stmt . The result of the program is the result of the last executed statement.

7. Fibonacci example

The article shows a TinyLanguage program that computes Fibonacci numbers using the language features described above. When compiled and run as ./tiny_lang fibonacci.t 10 , it prints the sequence up to the 10th term:

=====PRINT:1=====
=====PRINT:1=====
=====PRINT:2=====
=====PRINT:3=====
=====PRINT:5=====
=====PRINT:8=====
=====PRINT:13=====
=====PRINT:21=====
=====PRINT:34=====
=====PRINT:55=====
program result:0

8. Extensions

The author notes that TinyLanguage is currently an interpreted language but could be transformed into a compiled language using LLVM, referencing the LLVM tutorial for building a compiler front‑end.

Overall, the article provides a step‑by‑step guide to building a tiny interpreted language, covering lexical analysis with Flex, grammar definition with Bison, AST construction, and execution, making it a valuable resource for developers interested in compiler construction and language design.

ASTcompilerInterpreterbisonflexlexical analysistiny language
OPPO Kernel Craftsman
Written by

OPPO Kernel Craftsman

Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.