Metadata-Version: 2.1
Name: multi-choices-parser
Version: 0.10.0
Summary: An efficient C++ incremental parser for multi-choices grammars (generalization of the trie structure) with Python bindings.
Author-Email: Hichem Ammar Khodja <hichem5696@gmail.com>
Project-URL: Homepage, https://github.com/HichemAK/multi-choices-parser
Requires-Python: >=3.8
Description-Content-Type: text/markdown

[![Coverage](https://codecov.io/gh/HichemAK/multi-choices-parser/branch/main/graph/badge.svg)](https://codecov.io/gh/HichemAK/multi-choices-parser)

# Multi-choices Parser

## Overview
Multi-choices Parser is an efficient incremental parser for multi-choices grammars, written in C++ with Python bindings (3.8+). These grammars are defined as a composition of lists of choices, where each choice is a literal string that may be empty (see the grammar form below). The parser is optimized for scenarios where the lists of choices are very large, such as representing entities preceded by a determiner.

Here is the type of grammar handled by this parser:

```
start: list1 list2 ... listn
list1: choice1_1 | choice1_2 | ... | choice1_k1
list2: choice2_1 | choice2_2 | ... | choice2_k2
...
listn: choicen_1 | choicen_2 | ... | choicen_kn
```

This parser is a **generalization of tries**, and more precisely a **concatenation of tries**. In fact, it is equivalent to a trie when $n=1$. 
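To make the "concatenation of tries" idea concrete, here is a minimal pure-Python sketch. It is an illustration only and not the library's C++ implementation: the names `TrieChain`, `build_trie`, `accepts` and the `END` marker are hypothetical, and the sketch assumes at least one list of choices. It builds one trie per list and tracks every position that is still compatible with the input, jumping to the root of the next trie whenever a choice is complete.

```python
END = object()  # hypothetical end-of-input marker, used only in this sketch


def build_trie(choices):
    """Build a nested-dict trie; the END key marks a complete choice."""
    root = {}
    for choice in choices:
        node = root
        for ch in choice:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


class TrieChain:
    """Illustrative concatenation of tries (assumes a non-empty list of lists)."""

    def __init__(self, lists):
        self.tries = [build_trie(choices) for choices in lists]
        self.reset()

    def reset(self):
        self.finished = False
        self.success = False
        self.states = self._closure([(0, self.tries[0])])

    def _closure(self, states):
        # Whenever a node completes a choice, the root of the next trie
        # becomes reachable as well (this also handles empty choices).
        seen, stack, out = set(), list(states), []
        while stack:
            i, node = stack.pop()
            if (i, id(node)) in seen:
                continue
            seen.add((i, id(node)))
            out.append((i, node))
            if END in node and i + 1 < len(self.tries):
                stack.append((i + 1, self.tries[i + 1]))
        return out

    def _can_end(self):
        # The input may end only when a choice of the last list is complete.
        last = len(self.tries) - 1
        return any(i == last and END in node for i, node in self.states)

    def next(self):
        """Return the set of symbols allowed at the current position."""
        allowed = {ch for _, node in self.states for ch in node if ch is not END}
        if self._can_end():
            allowed.add(END)
        return allowed

    def step(self, ch):
        if self.finished:
            return
        if ch is END:
            self.finished, self.success = True, self._can_end()
            return
        moved = [(i, node[ch]) for i, node in self.states if ch in node]
        self.states = self._closure(moved)
        if not self.states:  # no choice is compatible with the input anymore
            self.finished = True


def accepts(lists, text):
    """Feed `text` followed by END and report whether it matches the grammar."""
    parser = TrieChain(lists)
    for ch in text:
        parser.step(ch)
    parser.step(END)
    return parser.success
```

With a single list, `TrieChain` behaves like an ordinary trie lookup; with several lists, it accepts exactly the strings formed by picking one choice from each list in order.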

## Installation

```
pip install multi-choices-parser
```

## Features
- Handle large lists of choices efficiently (e.g. millions of choices).
- Incremental parsing: Each node and its transitions can be accessed at any moment of the parsing.
- Extensive testing
- Support for all Python versions >=3.8
- Support for Linux, Windows and macOS

## Usage
To use the `MultiChoicesParser`, follow these steps:

1. Initialize the parser with a list of choices.
2. Use the `step` method to feed characters to the parser.
3. After feeding the End symbol, check the `success` flag to determine whether the parsed string is valid.
4. Reset the parser state using the `reset` method if needed.

### Example
```python

from multi_choices_parser import MultiChoicesParser, DEFAULT_END_SYMB

# Define your list of choices
l = [
    ['the', 'an', "a", ""],
    ['orange', 'apple', 'banana']
]

# Initialize the parser
p = MultiChoicesParser(l)

# Parse a string (don't forget to add the End symbol)
for i, c in enumerate(tuple("anapple") + (DEFAULT_END_SYMB, )):
    print('Step %s' % i)
    print("Authorized characters:", sorted(p.next()))
    print('Adding character:', c)
    p.step(c)
    print("State: Finished=%s, Success=%s" % (p.finished, p.success))
    print()
```

<details> <summary>Example Output</summary>

```
Step 0
Authorized characters: ['a', 'b', 'o', 't']
Adding character: a
State: Finished=False, Success=False

Step 1
Authorized characters: ['a', 'b', 'n', 'o', 'p']
Adding character: n
State: Finished=False, Success=False

Step 2
Authorized characters: ['a', 'b', 'o']
Adding character: a
State: Finished=False, Success=False

Step 3
Authorized characters: ['p']
Adding character: p
State: Finished=False, Success=False

Step 4
Authorized characters: ['p']
Adding character: p
State: Finished=False, Success=False

Step 5
Authorized characters: ['l']
Adding character: l
State: Finished=False, Success=False

Step 6
Authorized characters: ['e']
Adding character: e
State: Finished=False, Success=False

Step 7
Authorized characters: [End]
Adding character: End
State: Finished=True, Success=True
```

</details>

## License
This project is licensed under the GNU GPL v2 License - see the LICENSE.txt file for details.

## Contact
For any queries or bug reports, please open an issue on the GitHub repository ;)

## Credit
This code was originally imported from this [Orange GitHub Repository](https://github.com/Orange-OpenSource/DistFactAssessLM/tree/94c7c8cd8f844d6e2efc045be699a2dded533150/multi-choices-parser).

The main contributions of this repository are the introduction of a C++ implementation of the parser (the original implementation was in pure Python) and the improvement of the unit test suite.

## How to cite?

```
@inproceedings{ammar-khodja-etal-2025-factual,
    title = "Factual Knowledge Assessment of Language Models Using Distractors",
    author = "Ammar Khodja, Hichem  and
      Ait gueni ssaid, Abderrahmane  and
      Bechet, Frederic  and
      Brabant, Quentin  and
      Nasr, Alexis  and
      Lecorv{\'e}, Gw{\'e}nol{\'e}",
    editor = "Rambow, Owen  and
      Wanner, Leo  and
      Apidianaki, Marianna  and
      Al-Khalifa, Hend  and
      Eugenio, Barbara Di  and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.537/",
    pages = "8043--8056",
    abstract = "Language models encode extensive factual knowledge within their parameters. The accurate assessment of this knowledge is crucial for understanding and improving these models. In the literature, factual knowledge assessment often relies on cloze sentences, which can lead to erroneous conclusions due to the complexity of natural language (out-of-subject continuations, the existence of many correct answers and the several ways of expressing them). In this paper, we introduce a new interpretable knowledge assessment method that mitigates these issues by leveraging distractors{---}incorrect but plausible alternatives to the correct answer. We propose several strategies for retrieving distractors and determine the most effective one through experimentation. Our method is evaluated against existing approaches, demonstrating solid alignment with human judgment and stronger robustness to verbalization artifacts. The code and data to reproduce our experiments are available on GitHub."
}
```
