DFASTAR lexer generator
Author: p | 2025-04-25
DFASTAR Lexer Generator. Description: DFASTAR is a DFA lexer generator. It reads regular-expression (lexical) grammars and generates four types of lexical analyzers of different sizes.
maleeni

maleeni is a lexer generator for Go. maleeni also provides a command to perform lexical analysis, to allow easy debugging of your lexical specification.

Installation

Compiler:

$ go install github.com/nihei9/maleeni/cmd/maleeni@latest

Code generator:

$ go install github.com/nihei9/maleeni/cmd/maleeni-go@latest

Usage

1. Define your lexical specification

First, define your lexical specification in JSON format. As an example, let's write the definitions of whitespace, words, and punctuation.

{
    "name": "statement",
    "entries": [
        {
            "kind": "whitespace",
            "pattern": "[\\u{0009}\\u{000A}\\u{000D}\\u{0020}]+"
        },
        {
            "kind": "word",
            "pattern": "[0-9A-Za-z]+"
        },
        {
            "kind": "punctuation",
            "pattern": "[.,:;]"
        }
    ]
}

Save the above specification to a file. In this explanation, the file name is statement.json.

⚠️ The input file must be encoded in UTF-8.

2. Compile the lexical specification

Next, generate a DFA from the lexical specification using the maleeni compile command.

$ maleeni compile statement.json -o statementc.json

3. Debug (optional)

If you want to make sure that the lexical specification behaves as expected, you can use the maleeni lex command to try lexical analysis without having to generate a lexer. The maleeni lex command outputs tokens in JSON format. For simplicity, print the significant fields of the tokens in CSV format using the jq command.

⚠️ The only encoding that maleeni lex and the driver can handle is UTF-8.

$ echo -n 'The truth is out there.' | maleeni lex statementc.json | jq -r '[.kind_name, .lexeme, .eof] | @csv'
"word","The",false
"whitespace"," ",false
"word","truth",false
"whitespace"," ",false
"word","is",false
"whitespace"," ",false
"word","out",false
"whitespace"," ",false
"word","there",false
"punctuation",".",false
"","",true

The JSON format of the tokens that maleeni lex prints is as follows:

Field        | Type              | Description
mode_id      | integer           | An ID of a lex mode.
mode_name    | string            | A name of a lex mode.
kind_id      | integer           | An ID of a kind. This is unique among all modes.
mode_kind_id | integer           | An ID of a lexical kind. This is unique only within a mode. Note that you need to use the kind_id field if you want to identify a kind across all modes.
kind_name    | string            | A name of a lexical kind.
row          | integer           | A row number where a lexeme appears.
col          | integer           | A column number where a lexeme appears. Note that col is counted in code points, not bytes.
lexeme       | array of integers | A byte sequence of a lexeme.
eof          | bool              | When this field is true, the token is the EOF token.
invalid      | bool              | When this field is true, the token is an error token.

4. Generate the lexer

Using the maleeni-go command, you can generate source code for a lexer that recognizes your lexical specification.

$ maleeni-go statementc.json

The above command generates the lexer and saves it to the statement_lexer.go file. By default, the file name will be {spec name}_lexer.go. To use the lexer, you need to call the NewLexer function defined in statement_lexer.go. The following code is a simple example in which the lexer reads source text from stdin and writes the resulting tokens to stdout.

package main

import (
    "fmt"
    "os"
)

func main() {
    lex, err := NewLexer(NewLexSpec(), os.Stdin)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    for {
        tok, err := lex.Next()
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        if tok.EOF {
            break
        }
        if tok.Invalid {
            fmt.Printf("invalid: %#v\n", string(tok.Lexeme))
        } else {
            fmt.Printf("valid: %v: %#v\n", KindIDToName(tok.KindID), string(tok.Lexeme))
        }
    }
}

Please save the above source code to main.go and create a directory structure like the one below.

/project_root
├── statement_lexer.go ... Lexer generated from the compiled lexical specification (the result of `maleeni-go`).
└── main.go .............. Caller of the lexer.

Now, you can perform lexical analysis.

Moo!

Moo is a highly-optimised tokenizer/lexer generator. Use it to tokenize your strings, before parsing 'em with a parser like nearley or whatever else you're into.

- Fast
- Convenient
- uses Regular Expressions
- tracks Line Numbers
- handles Keywords
- supports States
- custom Errors
- is even Iterable
- has no dependencies
- 4KB minified + gzipped
- Moo!

Is it fast?

Yup! Flying-cows-and-singed-steak fast. Moo is the fastest JS tokenizer around. It's ~2–10x faster than most other tokenizers; it's a couple orders of magnitude faster than some of the slower ones.

Define your tokens using regular expressions. Moo will compile 'em down to a single RegExp for performance. It uses the new ES6 sticky flag where possible to make things faster; otherwise it falls back to an almost-as-efficient workaround. (For more than you ever wanted to know about this, read "Adventures in the land of substrings and RegExps".)

You might be able to go faster still by writing your lexer by hand rather than using RegExps, but that's icky. Oh, and it avoids parsing RegExps by itself. Because that would be horrible.

Usage

First, you need to do the needful: $ npm install moo, or whatever will ship this code to your computer. Alternatively, grab the moo.js file by itself and slap it into your web page via a <script> tag; moo is completely standalone.

Then you can start roasting your very own lexer/tokenizer:

const moo = require('moo')

let lexer = moo.compile({
    WS:      /[ \t]+/,
    comment: /\/\/.*?$/,
    number:  /0|[1-9][0-9]*/,
    string:  /"(?:\\["\\]|[^\n"\\])*"/,
    lparen:  '(',
    rparen:  ')',
    keyword: ['while', 'if', 'else', 'moo', 'cows'],
    NL:      { match: /\n/, lineBreaks: true },
})

And now throw some text at it:

lexer.reset('while (10) cows\nmoo')
lexer.next()
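The "compile everything down to a single sticky RegExp" idea described above can be sketched in plain JavaScript. This is a simplified illustration of the technique, not Moo's actual code; the rule names are made up for the example.

```javascript
// Sketch of the single-sticky-RegExp technique (not Moo itself):
// each rule becomes a named capture group in one combined RegExp, and the
// ES6 sticky ('y') flag forces every match to start exactly at lastIndex,
// so the scanner advances through the input without copying substrings.
const rules = {
  WS:     /[ \t]+/,
  number: /0|[1-9][0-9]*/,
  word:   /[A-Za-z]+/,
};

const combined = new RegExp(
  Object.entries(rules)
    .map(([name, re]) => `(?<${name}>${re.source})`)
    .join('|'),
  'y'
);

function* tokenize(input) {
  combined.lastIndex = 0;
  let m;
  while ((m = combined.exec(input)) !== null) {
    // Exactly one named group is defined per match; that's the token type.
    const [type, value] = Object.entries(m.groups)
      .find(([, v]) => v !== undefined);
    yield { type, value };
  }
}

// 'while 10 cows' yields word, WS, number, WS, word tokens.
console.log([...tokenize('while 10 cows')]);
```

If no rule matches at the current position, `exec` returns null and the loop stops; a real lexer (like Moo) would report an error token there instead.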
Papyrus Plugin for Notepad++

This plugin adds support for Bethesda's Papyrus scripting language to Notepad++. It provides syntax highlighting with automatic recognition of class names/functions/properties, supports keyword matching, and hyperlinks to referenced scripts. It also comes with a compiler that can provide compilation errors in a separate list window, as well as inline annotation and indication where errors are reported, plus anonymization of compiled .pex files.

This plugin is derived from the original PapyrusPlusPlus plugin created by tschilkroete, with many bug fixes and enhancements, and made to work with the latest Notepad++ release.

Changes from original work

Bug fixes

- [Lexer] Syntax highlighting with the lexer now properly works with the latest Notepad++ version, so you no longer need to use a separate user-defined language XML, which defeats the purpose of a lexer.
- [Compiler] Can now handle a huge compilation error list (this usually happens when a referenced script has errors or a referenced script source does not exist), so that the status message no longer gets stuck at "Compiling...".
- [Compiler] In the error list window, clicking on an error from a file that has not been opened yet will now correctly move the cursor to the error line.
- [Compiler] Any exceptions while trying to compile the script are now properly shown to the user (exceptions likely won't happen anyway).
- [Lexer] Operators are now correctly styled with the defined Operator style, instead of the wrong Type style.
- [Lexer] Proper syntax highlighting of strings that contain double-quote escapes.
- [Lexer] Proper syntax highlighting of integer literals that start with a minus sign.
- [Lexer] Proper syntax highlighting of float literals that contain a decimal point.
- [Lexer] Proper syntax highlighting of comments where "/" appears before/after ";" with spaces in between.
- [Lexer] The word "property" in comments is now correctly excluded from property handling.
- [Lexer] Correctly detect properties in edge cases, such as copying property lines and then editing them.
- [Lexer] White spaces are now styled correctly.
- [Compiler] Only run the compiler if the active file is using the Papyrus Script lexer. Configurable behavior, default on.
- [Compiler] Properly release the process handle after compilation.

Improvements

- [Lexer] Upgrade to support Scintilla's ILexer5.
- [Lexer] Support folding on properties. The original plugin likely omitted properties from folding since it could not exclude those that have definitions done in a single line.
- [Lexer] Support "folding in code, middle" so that Else and ElseIf can be folded as well. Configurable behavior, default on.
- [Compiler] The compilation error list window is hidden when starting a new compilation.
- [Compiler] The compilation error list window will not contain duplicate error messages.
- [Compiler] Support more compilation flags: -optimize (all games), -release and -final (Fallout 4).
- [Compiler] Handle the rare compilation error cases with the "-op" flag when errors are reported on .pas files.
- [Compiler] Handle generic compilation errors that are not reported on source files or .pas files, e.g. when one of the import directories is invalid.
- [Lexer] Separate the list of Papyrus language-defined keywords into two, so Parent/Self/True/False/None can be styled differently.
- [Compiler] The status bar shows the game name for the current Papyrus script file.
- [Compiler] The status bar shows compiling status when switching to another file and back while compiling.
- [Compiler] When compilation succeeds or fails, the status bar shows the file name alongside the result message, if the current file window is not the same as the one that got compiled.
- [Lexer] Slightly better performance in syntax highlighting with property name caching, especially with big script files. In addition, class name caching [...]

(Directory layout; the beginning of the tree was lost in extraction.)

[...] - function list configuration file for Papyrus scripts
│ │ └── userDefineLangs - user-defined Papyrus language instead of this plugin's lexer
│ └── themes - lexer configuration files for specific themes
│     └── DarkModeDefault - lexer configuration file for Dark Mode
└── src - source code
    ├── external - source files from external projects (may be modified)
    │   ├── gsl - references GSL as submodule
    │   ├── lexilla - Lexilla source files
    │   ├── npp - Notepad++ source files
    │   ├── scintilla - Scintilla source files
    │   ├── tinyxml2 - references TinyXML2 as submodule
    │   └── XMessageBox - adopted and modified XMessageBox to provide dark mode support
    └── Plugin - source files of this plugin
        ├── Common - common definitions and utilities shared by all modules
        ├── CompilationErrorHandling - show/annotate compilation errors
        ├── Compiler - invoke Papyrus compiler in a separate thread
        ├── Lexer - Papyrus script lexer that provides syntax highlighting
        ├── KeywordMatcher - matching keywords highlighter
        ├── Settings - read/write Papyrus.ini and provide configuration support to other modules
        └── UI - other UI dialogs, such as the About dialog

Disclaimer

Both the original work and this plugin are licensed under GPL v3, so make sure you read and understand it if you are creating derived work. Most importantly, you cannot modify the code and publish only the binary output without making the modified code publicly available as well.

- [Settings] [...] Can be turned on in the Settings menu. However, there is a caveat; see the configuration guide for details. Configurable behavior, default off.
- [Settings] No more forced setup on startup.
- [Settings] Revamped UI with many more settings now configurable.

New features

- [Compiler] Anonymize compiled .pex files. In case you are not aware, when you use PapyrusCompiler to compile any script, your user account and machine name are stored inside the generated .pex file, so it's a big privacy concern.
- [Annotator] Show annotations below error lines, and/or show indications where errors are. Configurable behavior, default on.
- [Compiler] Skyrim SE/AE and Fallout 4 support.
- [Compiler] Auto detection of game/compiler settings to be used, based on source script file location.
- [Lexer] Support of new Papyrus syntax/keywords of Fallout 4.
- [Lexer] Syntax highlighting of function names.
- [Lexer] Class names can be styled as links to open the script files. FO4's namespace support is included. Configurable behavior, default on (Ctrl + double click).
- [Lexer] Hover support on properties.
- [Matcher] Highlight on matching keywords.
- [Matcher] Go to matching keyword.
- [UI] A new Advanced submenu with:
  - Show langID - can be used to find out the internal langID assigned to the Papyrus Script lexer, which is useful if you need to manually configure Notepad++'s functionList feature.
  - Install auto completion support - provides auto-completion support for functions defined in the base game, SKSE, and even SkyUI.
  - Install function list support - allows using the View -> Function List menu to show all defined functions in a Papyrus script file.
- [UI] Dark mode support.

Download

Get the latest release from here.

Installation

Please find the installation guide here.

WARNING:
- Do not install versions prior to v0.3.0 if you are using Notepad++ v8.3+.
- Do not install v0.3.0+ if you are using an older Notepad++ version.
Method name, parameter list, and throws clause, like so:

methodDeclarator : Identifier '(' formalParameterList? ')' dims? ;

And so Java8BaseListener has a method enterMethodDeclarator, which will be invoked each time this pattern is encountered. So, let's override enterMethodDeclarator, pull out the Identifier, and perform our check:

public class UppercaseMethodListener extends Java8BaseListener {

    private List<String> errors = new ArrayList<>();
    // ... getter for errors

    @Override
    public void enterMethodDeclarator(Java8Parser.MethodDeclaratorContext ctx) {
        TerminalNode node = ctx.Identifier();
        String methodName = node.getText();
        if (Character.isUpperCase(methodName.charAt(0))) {
            String error = String.format("Method %s is uppercased!", methodName);
            errors.add(error);
        }
    }
}

5.4. Testing

Now, let's do some testing. First, we construct the lexer:

String javaClassContent = "public class SampleClass { void DoSomething(){} }";
Java8Lexer java8Lexer = new Java8Lexer(CharStreams.fromString(javaClassContent));

Then, we instantiate the parser:

CommonTokenStream tokens = new CommonTokenStream(java8Lexer);
Java8Parser parser = new Java8Parser(tokens);
ParseTree tree = parser.compilationUnit();

And then, the walker and the listener:

ParseTreeWalker walker = new ParseTreeWalker();
UppercaseMethodListener listener = new UppercaseMethodListener();

Lastly, we tell ANTLR to walk through our sample class:

walker.walk(listener, tree);

assertThat(listener.getErrors().size(), is(1));
assertThat(listener.getErrors().get(0), is("Method DoSomething is uppercased!"));

6. Building Our Grammar

Now, let's try something just a little bit more complex, like parsing log files:

2018-May-05 14:20:18 INFO some error occurred
2018-May-05 14:20:19 INFO yet another error
2018-May-05 14:20:20 INFO some method started
2018-May-05 14:20:21 DEBUG another method started
2018-May-05 14:20:21 DEBUG entering awesome method
2018-May-05 14:20:24 ERROR Bad thing happened

Because we have a custom log format, we're going to first need to create our own grammar.

6.1. Prepare a Grammar File

First, let's see if we can create a mental map of what each log line looks like in our file: a timestamp, followed by a log level, followed by the message. Going one level deeper, we might break the timestamp down into a date and a time, and so on. It's important to consider this so we can decide at what level of granularity we want to parse the text.

A grammar file is basically a set of lexer and parser rules. Simply put, lexer rules describe the syntax of the grammar while parser rules describe the semantics.

Let's start by defining fragments, which are reusable building blocks for lexer rules.

fragment DIGIT : [0-9];
fragment TWODIGIT : DIGIT DIGIT;
fragment LETTER : [A-Za-z];

Next, let's define the remaining lexer rules:

DATE : TWODIGIT TWODIGIT '-' LETTER LETTER LETTER '-' TWODIGIT;
TIME : TWODIGIT ':' TWODIGIT ':' TWODIGIT;
TEXT : LETTER+ ;
CRLF : '\r'? '\n' | '\r';
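To show where these lexer rules lead, here is a plausible set of parser rules built on top of them. The rule names are illustrative, not taken from the original grammar; they just mirror the timestamp/level/message structure of the log lines above.

```antlr
log       : entry+ ;
entry     : timestamp ' ' level ' ' message CRLF ;
timestamp : DATE ' ' TIME ;
level     : 'ERROR' | 'INFO' | 'DEBUG' ;
message   : (TEXT | ' ')+ ;
```

Note how the parser rules compose the lexer tokens (DATE, TIME, TEXT, CRLF) into the semantic shape of a log entry, which is exactly the lexer-rules-for-syntax, parser-rules-for-semantics split described above.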
Construct keyword objects:

Object.fromEntries(['class', 'def', 'if'].map(k => ['kw-' + k, k]))

States

Moo allows you to define multiple lexer states. Each state defines its own separate set of token rules. Your lexer will start off in the first state given to moo.states({}).

Rules can be annotated with next, push, and pop, to change the current state after that token is matched. A "stack" of past states is kept, which is used by push and pop.

- next: 'bar' moves to the state named bar. (The stack is not changed.)
- push: 'bar' moves to the state named bar, and pushes the old state onto the stack.
- pop: 1 removes one state from the top of the stack, and moves to that state. (Only 1 is supported.)

Only rules from the current state can be matched. You need to copy your rule into all the states you want it to be matched in.

For example, to tokenize JS-style string interpolation such as a${{c: d}}e, you might use:

let lexer = moo.states({
    main: {
        strstart: {match: '`', push: 'lit'},
        ident:    /\w+/,
        lbrace:   {match: '{', push: 'main'},
        rbrace:   {match: '}', pop: 1},
        colon:    ':',
        space:    {match: /\s+/, lineBreaks: true},
    },
    lit: {
        interp:   {match: '${', push: 'main'},
        escape:   /\\./,
        strend:   {match: '`', pop: 1},
        const:    {match: /(?:[^$`]|\$(?!\{))+/, lineBreaks: true},
    },
})
// => strstart const interp lbrace ident colon space ident rbrace rbrace const strend

The rbrace rule is annotated with pop, so it moves from the main state into either lit or main, depending on the stack.

Errors

If none of your rules match, Moo will throw an error.
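The next/push/pop bookkeeping described above can be illustrated with a tiny stand-alone sketch. This is a hypothetical helper written for this explanation, not part of Moo's API.

```javascript
// Hypothetical sketch of the state-stack semantics Moo documents:
// `push` saves the current state before switching, `pop` restores the
// most recently saved one, and `next` switches without touching the stack.
class StateStack {
  constructor(start) {
    this.current = start;
    this.stack = [];
  }
  apply(rule) {
    if (rule.push) {
      this.stack.push(this.current);
      this.current = rule.push;
    } else if (rule.pop) {
      this.current = this.stack.pop();
    } else if (rule.next) {
      this.current = rule.next;
    }
  }
}

// Walk the states for the string-interpolation example above:
const s = new StateStack('main');
s.apply({ push: 'lit' });  // strstart: main -> lit, stack [main]
s.apply({ push: 'main' }); // interp:   lit -> main, stack [main, lit]
s.apply({ pop: 1 });       // rbrace:   back to lit, stack [main]
console.log(s.current);    // -> 'lit'
```

This is why the rbrace rule can land in either lit or main: pop does not name a target state, it simply returns to whatever was pushed last.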