Using Lex to Build a Simple cat‑Like Utility and an SQL Syntax Highlighter
This article demonstrates how to use the Lex lexical analyzer generator to build a minimal cat‑like utility from an empty rule file, then extends the technique to a simple SQL syntax highlighter with colorized output. Along the way it explains the role of the %% separators, the default rule, and the compilation steps.
This tutorial shows how to employ the Lex lexical analyzer generator to create a tiny program that mimics the behavior of the Unix cat command.
First, create a Lex rule file that contains nothing but %% separators (four percent signs in total): no definitions, no rules, and no user code. Save it under a name of your choice (this example uses %%%%), then run the following commands:
$ lex %%%%
$ gcc lex.yy.c -ll

The resulting executable (a.out) reads its standard input and writes it unchanged, effectively acting like cat. For example:
$ ./a.out < /etc/hosts

Lex works by converting a rule file into C source code. The rule file is divided into three sections separated by %% lines: a definitions section, a rules section, and a user‑code section. An empty file therefore produces a lexer whose only rule is the default one that echoes any unmatched character.
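To make the default rule concrete, here is a minimal hand-written C sketch of what a rule-less scanner amounts to from the outside: every input character is copied to the output. This is not the code Lex generates (the real lex.yy.c is a table-driven scanner); the function name copy_stream is hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Lex: copies `in` to `out` one
 * character at a time, which is the observable behavior of a scanner
 * that has no rules besides the default echo rule.
 * Returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;                      /* int, so EOF is distinguishable from data */

    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
        copied++;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) from a main() function would give the same visible result as running ./a.out on redirected input.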
The default rule can be written explicitly as:
%%

.|\n { printf("%s", yytext); }
%%

Building on this, the article presents a more practical example: a Lex program that highlights SQL statements. The Lex file begins with a definitions block (%{ … %}) that includes #include <stdio.h> and defines ANSI colour macros (BOLD, GREEN, BLUE) and a helper macro _format to wrap strings with colour codes.
%{
#include <stdio.h>

/* ANSI escape sequences; "\e" is a GCC/Clang extension, "\033" is the portable spelling. */
#define BOLD "\e[1;30m"
#define GREEN "\e[32m"
#define BLUE "\e[34m"
#define FORMAT_RESET "\e[0m"

/* Expands to a printf argument list: format string, colour, text, reset. */
#define _format(format, str) "%s%s%s", format, str, FORMAT_RESET
%}

%%

SELECT|FROM|WHERE { printf(_format(BOLD, yytext)); }
[0-9]+ { printf(_format(GREEN, yytext)); }
\"[^\"]*\" { printf(_format(BLUE, yytext)); }
%%

int main(void) {
    printf("%s\n", "Input SQL:");
    yylex();
    return 0;
}

After saving this file (e.g., sql.l), compile it with:
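$ lex sql.l
$ gcc lex.yy.c -ll

The resulting program reads an SQL query from standard input and prints the uppercase keywords SELECT, FROM, and WHERE in bold, numbers in green, and double-quoted string literals in blue. Everything else passes through unchanged via the default rule.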
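For comparison, the three rules can be approximated by hand in plain C. The sketch below is only an illustration of the work the generated scanner does for you, not the code Lex emits. The function names highlight and emit are hypothetical, and the colour macros use the portable "\033" spelling instead of "\e".

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BOLD         "\033[1;30m"
#define GREEN        "\033[32m"
#define BLUE         "\033[34m"
#define FORMAT_RESET "\033[0m"

/* Writes the first `len` bytes of `text` wrapped in a colour code. */
static void emit(FILE *out, const char *colour, const char *text, size_t len)
{
    fprintf(out, "%s%.*s%s", colour, (int)len, text, FORMAT_RESET);
}

/* Hypothetical hand-written counterpart of the three rules in sql.l. */
void highlight(const char *src, FILE *out)
{
    static const char *keywords[] = { "SELECT", "FROM", "WHERE" };
    const char *p = src;

    while (*p != '\0') {
        size_t len = 0;
        const char *colour = NULL;

        /* Rule 1: SELECT|FROM|WHERE (case-sensitive, like the Lex rule). */
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
            size_t n = strlen(keywords[i]);
            if (strncmp(p, keywords[i], n) == 0) {
                len = n;
                colour = BOLD;
                break;
            }
        }
        /* Rule 2: [0-9]+ */
        if (len == 0 && isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)p[len]))
                len++;
            colour = GREEN;
        }
        /* Rule 3: \"[^\"]*\" (matches only if a closing quote exists). */
        if (len == 0 && *p == '"') {
            const char *close = strchr(p + 1, '"');
            if (close != NULL) {
                len = (size_t)(close - p) + 1;
                colour = BLUE;
            }
        }

        if (len > 0) {
            emit(out, colour, p, len);
            p += len;
        } else {
            fputc(*p++, out);   /* default rule: echo one character */
        }
    }
}
```

Each new token class here needs another hand-coded branch, whereas in the Lex specification it is one more pattern line.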
Lex provides a default main() function in its static library (libl.a), which is linked when the -ll option is used. The symbol table can be inspected with:
$ nm /usr/lib/x86_64-linux-gnu/libl.a

libmain.o:
                 U _GLOBAL_OFFSET_TABLE_
                 U exit
0000000000000000 T main
                 U yylex

libyywrap.o:
0000000000000000 T yywrap

If the user supplies their own main(), as the SQL example does, the linker uses that definition and never pulls the library's default out of the archive. This flexibility, together with Lex's ability to generate scanners from concise specifications, makes it a handy tool for building command‑line utilities, compilers, configuration parsers, and syntax‑highlighting programs.
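The two library objects are small. As a rough sketch of what they are commonly understood to do (an assumption inferred from the symbols listed above, not a copy of the library source): the default main() keeps calling yylex() until it returns 0, and the default yywrap() returns 1 to signal that no further input follows. The name default_main_sketch and the stub yylex() below are hypothetical and exist only so the fragment compiles without lex.yy.c.

```c
/* Stand-in for the generated scanner, so this sketch compiles on its
 * own. It pretends to return three tokens and then 0 (end of input). */
static int tokens_left = 3;

int yylex(void)
{
    return tokens_left > 0 ? tokens_left-- : 0;
}

/* What the library's yywrap() amounts to: returning 1 tells the
 * scanner there is no further input file to switch to. */
int yywrap(void)
{
    return 1;
}

/* Hypothetical name; the real library function is called main().
 * It drives the scanner until yylex() reports end of input. */
int default_main_sketch(void)
{
    while (yylex() != 0)
        ;                       /* all work happens inside the rule actions */
    return 0;
}
```

A specification that defines main() in its user-code section still takes yywrap() from the library when linked with -ll, unless it provides its own.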
Overall, the article illustrates how a few percent signs and a simple rule file can be leveraged to produce functional tools, and encourages readers to explore more sophisticated Lex specifications for real‑world applications.
IT Services Circle