Metadata-Version: 2.1
Name: lex2
Version: 0.9.4
Summary: Flexible, ruleset-based tokenizer using regex.
Home-page: UNKNOWN
Author: DeltaRazero
Author-email: deltarazero@gmail.com
License: zlib
Keywords: lexer tokenizer sphinx
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: zlib/libpng License
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Compilers
Classifier: Topic :: Text Processing
Requires-Python: >=3.6
Description-Content-Type: text/markdown


# lex2-py3

<img align="right" width=5% src="https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg">

<!-- BADGES -->
<div align="left">
    <!--
        Python3 version
    --->
    <img src="https://img.shields.io/badge/python-3.6+-informational.svg?labelColor=363d45&logo=python&logoColor=white"
    alt="Python 3.6+"/>
    <!--
        Library tag version
    --->
    <a href="https://github.com/deltarazero/liblex2-py3/tags">
        <img src="https://img.shields.io/github/v/tag/deltarazero/liblex2-py3?labelColor=363d45&logo=github&logoColor=white"
        alt="Latest release tag version"/></a>
    <!--
        Issues open
    --->
    <a href="https://github.com/deltarazero/liblex2-py3/issues">
        <img src="https://img.shields.io/github/issues/deltarazero/liblex2-py3?labelColor=363d45&logo=github&logoColor=white"
        alt="GitHub issues open"/></a>
    <!--
        License
    --->
    <a href="https://choosealicense.com/licenses/zlib/">
        <img src="https://img.shields.io/github/license/DeltaRazero/liblex2-py3?labelColor=363d45&color=informational"
        alt="zlib license"/></a>
</div>


<!-- BUTTON LINKS -->
<div align="left">
    <!--
        Documentation
    --->
    <!--
    <a href="https://deltarazero.github.io/liblex2-py3">
        <img src="https://img.shields.io/badge/-Documentation_»-informational"
        height="24"
        alt="[Documentation]"/></a>
    --->
</div>

<div align="justify"><br/>

_Simple tokenizer using regex._

lex2 is a library intended for [lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis) (also called [tokenization](https://en.wikipedia.org/wiki/Lexical_analysis)). String analysis is performed using [regular expressions (regex)](https://en.wikipedia.org/wiki/Regular_expression), as specified in user-defined rules. Mechanisms such as a dynamic ruleset stack provide some flexibility at runtime.
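
To illustrate the idea of a ruleset stack, here is a minimal sketch that uses only Python's standard `re` module. It is not lex2's implementation or API: the names `Token`, `Ruleset`, `DEFAULT`, `IN_STRING` and `tokenize` are hypothetical, and the push/pop policy (a quote character switches rulesets) is an assumption chosen for the example.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    id: str
    data: str


# Hypothetical type: an ordered list of (identifier, regex pattern) pairs.
Ruleset = List[Tuple[str, str]]

DEFAULT: Ruleset = [
    ("WORD",   r"[a-zA-Z]+"),
    ("NUMBER", r"[0-9]+"),
    ("QUOTE",  r'"'),
]
IN_STRING: Ruleset = [
    ("TEXT",  r'[^"]+'),
    ("QUOTE", r'"'),
]


def tokenize(text: str) -> Iterator[Token]:
    stack: List[Ruleset] = [DEFAULT]  # the ruleset on top is the active one
    pos = 0
    while pos < len(text):
        in_string = stack[-1] is IN_STRING
        # Whitespace only separates tokens outside of a quoted string.
        if not in_string and text[pos].isspace():
            pos += 1
            continue
        # Try the rules of the active ruleset in order; the first match wins.
        for name, pattern in stack[-1]:
            match = re.compile(pattern).match(text, pos)
            if match:
                break
        else:
            raise ValueError("no rule matches at offset {}".format(pos))
        yield Token(name, match.group())
        pos = match.end()
        # A quote toggles the string ruleset: pop it if active, else push it.
        if name == "QUOTE":
            if in_string:
                stack.pop()
            else:
                stack.append(IN_STRING)
```

Calling `list(tokenize('say "hello world" 42'))` yields `hello world` as a single `TEXT` token, because the string ruleset is the active one between the two quotes.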

The library is written in platform-independent, pure Python 3 and avoids language-specific features, so that porting it to other programming languages is straightforward. Furthermore, the library is designed so that the end-user can easily plug in any external regex engine of their choice, while still being offered a simple, unified interface.
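
The general pattern behind a unified interface over interchangeable regex engines can be sketched as follows. This is a hedged illustration only, not lex2's actual mechanism: the `Matcher` base class, its `match_at` method, and both implementations are hypothetical names, and `LiteralMatcher` merely stands in for a second engine.

```python
import re
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class Matcher(ABC):
    """Hypothetical engine-agnostic interface: match a pattern at an offset."""

    @abstractmethod
    def match_at(self, text: str, pos: int) -> Optional[str]:
        """Return the matched text starting at `pos`, or None if no match."""


class StdlibReMatcher(Matcher):
    """Adapter around Python's built-in `re` engine."""

    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)

    def match_at(self, text: str, pos: int) -> Optional[str]:
        match = self._regex.match(text, pos)
        return match.group() if match else None


class LiteralMatcher(Matcher):
    """Stand-in for another engine: plain string comparison, no regex."""

    def __init__(self, literal: str) -> None:
        self._literal = literal

    def match_at(self, text: str, pos: int) -> Optional[str]:
        return self._literal if text.startswith(self._literal, pos) else None


def first_match(matchers: Iterable[Matcher], text: str, pos: int = 0) -> Optional[str]:
    # The caller depends only on the Matcher interface, never on the engine.
    for matcher in matchers:
        found = matcher.match_at(text, pos)
        if found is not None:
            return found
    return None
```

Because the calling code only sees `Matcher`, swapping one engine for another means writing one more adapter class; nothing else has to change. For how lex2 itself exposes external engines, refer to its documentation.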


## Getting Started

It is recommended to install the library from the Python Package Index (PyPI) through Python's package manager ``pip``:
```console
pip install lex2
```

However, you can also choose to manually include the library in your project by downloading a release on GitHub and copying the ``lex2`` folder to your project's includes/libraries folder.

Using lex2 is relatively simple, as the short example below demonstrates. For more in-depth examples, and for using an external regex engine of your choice, see the documentation.

```python
import lex2

# Define ruleset and prepare the lexer object instance
ruleset: lex2.ruleset_t = [
    #        Identifier     Regex pattern
    lex2.Rule("WORD",        r"[a-zA-Z]+"),
    lex2.Rule("NUMBER",      r"[0-9]+"),
    lex2.Rule("PUNCTUATION", r"[.,:;!?\-]")
]
lexer: lex2.ILexer = lex2.MakeLexer(ruleset=ruleset)

# Load input data by opening a file (uncomment and point to an existing file)...
# lexer.Open(r"C:/path/to/file.txt")
# ...or by directly passing a string
lexer.Load("The quick, brown fox jumps over 2 lazy dogs. \nMr. Jock, TV quiz PhD, bags few lynx.")

# Main tokenization loop
token: lex2.Token
while True:

    # Find the next token in the textstream
    try:
        token = lexer.GetNextToken()
    except lex2.excs.EndOfData:
        break

    info = [
         "ln: {}".format(token.position.ln +1),
        "col: {}".format(token.position.col+1),
        token.id,
        token.data,
    ]
    print("{: <12} {: <15} {: <20} {: <20}".format(*info))

lexer.Close()
```

```console
>>> ln: 1        col: 1          WORD                 The
>>> ln: 1        col: 5          WORD                 quick
>>> ln: 1        col: 10         PUNCTUATION          ,
>>> ln: 1        col: 12         WORD                 brown
>>> ln: 1        col: 18         WORD                 fox
>>> ln: 1        col: 22         WORD                 jumps
>>> ln: 1        col: 28         WORD                 over
>>> ln: 1        col: 33         NUMBER               2
>>> ln: 1        col: 35         WORD                 lazy
>>> ln: 1        col: 40         WORD                 dogs
>>> ln: 1        col: 44         PUNCTUATION          .
>>> ln: 2        col: 1          WORD                 Mr
>>> ln: 2        col: 3          PUNCTUATION          .
>>> ln: 2        col: 5          WORD                 Jock
>>> ln: 2        col: 9          PUNCTUATION          ,
>>> ln: 2        col: 11         WORD                 TV
>>> ln: 2        col: 14         WORD                 quiz
>>> ln: 2        col: 19         WORD                 PhD
>>> ln: 2        col: 22         PUNCTUATION          ,
>>> ln: 2        col: 24         WORD                 bags
>>> ln: 2        col: 29         WORD                 few
>>> ln: 2        col: 33         WORD                 lynx
>>> ln: 2        col: 37         PUNCTUATION          .
```


## Contributing

The repository is hosted at [deltarazero/liblex2-py3](https://github.com/deltarazero/liblex2-py3) on GitHub. Contributions are always welcome; you can contribute in either of the following ways:

* __Submitting a pull request:__ to contribute your own changes to the repository. See ["Proposing changes to your work with pull requests"](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests) for more information on pull requests using GitHub. Furthermore, please follow the guidelines below:

    1. File an issue to notify the maintainers about what you're working on.
    2. Fork the repo, develop and test your code changes, add docs/unit tests (if applicable).
    3. Make sure that your commit messages clearly describe the changes.
    4. Send a pull request, using the available template.

    For changes that address core functionality or would require breaking changes (i.e. for a major release), it's best to open an issue to discuss your proposal beforehand.

    _Maintaining your own fork of the repository is discouraged. Instead, please submit pull requests and delete your fork afterwards (if applicable). This will make it less confusing for end-users to know which repository is the most up-to-date._

* __Submitting an issue:__ to report a problem with the library, request a new feature, or to discuss potential changes before a pull request is created. Ensure the issue was not already reported. Furthermore, please use one of the available issue templates if possible.


## License

© 2020-2021 DeltaRazero.
All rights reserved.

All included scripts, modules, etc. are licensed under the terms of the [zlib license](https://github.com/deltarazero/liblex2-py3/LICENSE), unless stated otherwise in the respective files.

</div>


