Metadata-Version: 2.1
Name: wordleaisql
Version: 0.2.7
Summary: Wordle AI with SQL Backend
Home-page: https://github.com/kota7/wordleai-sql
License: UNKNOWN
Description: WORDLE AI with SQL Backend
        ==========================
        [![](https://badge.fury.io/py/wordleaisql.svg)](https://badge.fury.io/py/wordleaisql)
        [![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/kota7/wordleai-sql/main/streamlit/app.py)
        
        This package provides a [Wordle](https://www.nytimes.com/games/wordle/index.html) solver with an SQL backend.
        
        ## How to use
        
        ```shell
        # Install this library via PyPI
        pip install wordleaisql
        # Then run the executable that comes with the library
        wordleai-sql
        
        # Alternatively, clone this repository and run without pip-install
        python wordleai-sql.py
        ```
        
        
        ## Solver session example
        
        ```shell
        $ wordleai-sql
        
        Hi, this is Wordle AI (SQLite backend, approx).
        
        12947 remaining candidates: ['cigar', 'rebut', 'sissy', 'humph', 'awake', 'blush', 'focal', 'evade', 'naval', 'serve', '...']
        
        Type:
          '[s]uggest <criterion>'     to let AI suggest a word (<criterion> is optional)
          '[u]pdate <word> <result>'  to provide new information
          '[e]xit'                    to finish the session
        
        where
          <criterion>  is either 'max_n', 'mean_n', or 'mean_entropy'
          <result>     is a string of 0 (no match), 1 (partial match), and 2 (exact match)
        
        > s
        [INFO] Start AI evaluation (2022-03-09 00:37:13)
        [INFO] End AI evaluation (2022-03-09 00:37:18, elapsed: 0:00:04.153101)
        * Top 20 candidates ordered by mean_entropy
        --------------------------------------------------------------------
          input_word         max_n        mean_n  mean_entropy  is_candidate
        --------------------------------------------------------------------
               reais            30          12.0         3.094             1
               laers            33          13.5         3.218             1
               aeons            35          14.2         3.312             1
               races            32          14.4         3.323             1
               leads            34          15.1         3.349             1
               strae            33          14.8         3.376             1
               lines            43          16.4         3.386             1
               soral            35          15.6         3.427             1
               cries            48          17.4         3.429             1
               scrae            34          16.2         3.471             1
               rules            42          17.2         3.478             1
               oared            41          17.9         3.511             1
               losen            52          17.9         3.515             1
               sedan            40          17.6         3.516             1
               sured            52          19.1         3.546             1
               artis            45          19.4         3.547             1
               least            42          18.7         3.549             1
               stire            46          18.5         3.552             1
               stria            49          19.3         3.556             1
               nails            55          18.7         3.557             1
        --------------------------------------------------------------------
        12947 remaining candidates: ['cigar', 'rebut', 'sissy', 'humph', 'awake', 'blush', 'focal', 'evade', 'naval', 'serve', '...']
        
        Type:
          '[s]uggest <criterion>'     to let AI suggest a word (<criterion> is optional)
          '[u]pdate <word> <result>'  to provide new information
          '[e]xit'                    to finish the session
        
        where
          <criterion>  is either 'max_n', 'mean_n', or 'mean_entropy'
          <result>     is a string of 0 (no match), 1 (partial match), and 2 (exact match)
        
        > u races 00000
        896 remaining candidates: ['humph', 'outdo', 'digit', 'pound', 'booby', 'loopy', 'lying', 'moult', 'guild', 'thumb', '...']
        
        Type:
          '[s]uggest <criterion>'     to let AI suggest a word (<criterion> is optional)
          '[u]pdate <word> <result>'  to provide new information
          '[e]xit'                    to finish the session
        
        where
          <criterion>  is either 'max_n', 'mean_n', or 'mean_entropy'
          <result>     is a string of 0 (no match), 1 (partial match), and 2 (exact match)
        
        > s
        [INFO] Start AI evaluation (2022-03-09 00:37:35)
        [INFO] End AI evaluation (2022-03-09 00:37:39, elapsed: 0:00:03.439437)
        * Top 20 candidates ordered by mean_entropy
        --------------------------------------------------------------------
          input_word         max_n        mean_n  mean_entropy  is_candidate
        --------------------------------------------------------------------
               monty            41          16.8         3.454             1
               gipon            66          20.5         3.546             1
               lofty            53          20.0         3.686             1
               bilgy            70          24.2         3.746             1
               bundt            69          23.2         3.779             1
               limbo            69          23.6         3.780             1
               bundy            63          23.5         3.782             1
               found            56          23.7         3.816             1
               youth            50          22.6         3.827             1
               joint            65          23.9         3.895             1
               downy            61          25.5         3.902             1
               milko            78          27.5         3.924             1
               fungo            86          29.6         3.926             1
               lumbi            77          29.1         3.976             1
               tupik            68          28.0         3.981             1
               goopy            76          28.3         4.012             1
               jolty            59          24.3         4.015             1
               muhly            65          28.1         4.034             1
               nouny            59          25.0         4.041             1
               touzy            49          25.0         4.066             1
        --------------------------------------------------------------------
        896 remaining candidates: ['humph', 'outdo', 'digit', 'pound', 'booby', 'loopy', 'lying', 'moult', 'guild', 'thumb', '...']
        
        Type:
          '[s]uggest <criterion>'     to let AI suggest a word (<criterion> is optional)
          '[u]pdate <word> <result>'  to provide new information
          '[e]xit'                    to finish the session
        
        where
          <criterion>  is either 'max_n', 'mean_n', or 'mean_entropy'
          <result>     is a string of 0 (no match), 1 (partial match), and 2 (exact match)
        
        > u monty 22220
        'month' should be the answer!
        Thank you!
        ```
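        
        The `<result>` string typed after `u <word>` has one digit per letter of the input word. The sketch below shows one way to compute it for an input word and an answer. It is not the package's own code, and the name `judge` is hypothetical. It assumes the usual Wordle rule for repeated letters: exact matches are marked first, and each remaining answer letter can produce at most one partial match.
        
        ```python
        from collections import Counter
        
        
        def judge(input_word, answer_word):
            """Return the result string: 0 (no match), 1 (partial match), 2 (exact match)."""
            if len(input_word) != len(answer_word):
                raise ValueError("words must have the same length")
            result = ["0"] * len(input_word)
            # First pass: exact matches; count answer letters not matched exactly
            unmatched = Counter()
            for i, (x, y) in enumerate(zip(input_word, answer_word)):
                if x == y:
                    result[i] = "2"
                else:
                    unmatched[y] += 1
            # Second pass: partial matches, each unmatched answer letter used at most once
            for i, x in enumerate(input_word):
                if result[i] == "0" and unmatched[x] > 0:
                    result[i] = "1"
                    unmatched[x] -= 1
            return "".join(result)
        
        
        print(judge("races", "month"))  # 00000
        print(judge("monty", "month"))  # 22220
        print(judge("speed", "abide"))  # 00101
        ```
        
        The first two calls reproduce the results entered in the session above. In the last call only one of the two `e`s in `speed` is marked, because `abide` contains a single `e`.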
        
        ## Suggestion criteria
        
        Input words are evaluated by the following three criteria:
        
        - "max_n": Maximum number of candidate words that would remain, over all possible answers (the worst case).
        - "mean_n": Average number of candidate words that would remain, over all possible answers.
        - "mean_entropy": Average of the log2 of the number of candidate words that would remain.
        
        Note that if there are `n` candidate words with equal probability, then the probability of each word `i` is `p_i = 1/n`.
        The entropy is then `-sum(p_i log2(p_i)) = -n * (1/n) log2(1/n) = log2(n)`.
        Hence, the average of `log2(n)` can be seen as the average entropy.
        
        "mean_entropy" is often used in practice and is therefore the default criterion of the program.
        "max_n" can be seen as a pessimistic criterion since it focuses on the worst case.
        "mean_n" may seem intuitive but does not work as well as "mean_entropy", perhaps because the number of remaining words has a skewed distribution.
        
        See also the simulation results for a comparison of the criteria (notebook at [simulation/simulation-summary.ipynb](simulation/simulation-summary.ipynb) or view on [nbviewer](https://nbviewer.org/github/kota7/wordleai-sql/blob/main/simulation/simulation-summary.ipynb)).
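        
        To make the three criteria concrete, here is a small sketch that scores one input word. It assumes every remaining candidate is equally likely to be the answer (the package may weight answers differently). Answers that yield the same result string form a group, and an answer in a group of size `n` leaves `n` candidates. With `N` candidates, `mean_n` is therefore `sum(n * n) / N` and `mean_entropy` is `sum(n * log2(n)) / N`, both summed over groups. The helpers `judge` and `evaluate` are hypothetical names, and `judge` is included so the sketch runs on its own.
        
        ```python
        import math
        from collections import Counter
        
        
        def judge(input_word, answer_word):
            """Hypothetical result string: 0 (no match), 1 (partial), 2 (exact)."""
            result = ["0"] * len(input_word)
            unmatched = Counter()
            for i, (x, y) in enumerate(zip(input_word, answer_word)):
                if x == y:
                    result[i] = "2"
                else:
                    unmatched[y] += 1
            for i, x in enumerate(input_word):
                if result[i] == "0" and unmatched[x] > 0:
                    result[i] = "1"
                    unmatched[x] -= 1
            return "".join(result)
        
        
        def evaluate(input_word, candidates):
            """Return (max_n, mean_n, mean_entropy), treating candidates as equally likely."""
            # Size of each group of candidates sharing the same result string
            sizes = Counter(judge(input_word, a) for a in candidates).values()
            total = len(candidates)
            max_n = max(sizes)
            # n answers fall into a group of size n, and each of them leaves n candidates
            mean_n = sum(n * n for n in sizes) / total
            mean_entropy = sum(n * math.log2(n) for n in sizes) / total
            return max_n, mean_n, mean_entropy
        
        
        candidates = ["cat", "cot", "dog", "dig"]
        for w in candidates:
            print(w, evaluate(w, candidates))
        # cat (2, 1.5, 0.5)
        # cot (1, 1.0, 0.0)
        # dog (1, 1.0, 0.0)
        # dig (2, 1.5, 0.5)
        ```
        
        In this toy example `cot` splits the four candidates into four singleton groups, so it scores better than `cat` on every criterion.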
        
        
        ## Play and challenge mode
        
        - By default, the `wordleai-sql` command starts an interactive solver session.
        - `wordleai-sql --play` starts a self-play game.
        - `wordleai-sql --challenge` starts a competition against an AI.
        - With the `--answer_difficulty` option, one can change the set of possible answer words in the play and challenge modes. The choices range from `1` (basic) to `5` (unlimited), and the default is `3`.
          ```shell
          # Example
          wordleai-sql --challenge --answer_difficulty 1  # basic words only
          ```
        
        
        ## Using a custom word set
        
        - The default word list is at [wordleaisql/vocab/wordle-level3.txt](wordleaisql/vocab/wordle-level3.txt). The list is probably compatible with the [New York Times version](https://www.nytimes.com/games/wordle/index.html), but with a different choice of possible answer words.
        - One may use a different list with the `--vocabfile` option.
          - The file should contain words of the same length, separated by line breaks ("\n").
          - Each line may contain a nonnegative numeric value after the word, separated by a space, which is used as the relative probability that the word is chosen as the answer (in the play and challenge modes). If not specified, the word is given weight one.
          - The file can be gzip-compressed, in which case the filename must end with ".gz".
          - Although not thoroughly tested, the program should work with words containing multibyte characters (UTF-8 encoded) or digits.
        - By default, the file name without extension is used as the `vocabname`. This can be changed with `--vocabname`.
        - See the `vocab-examples/` folder for some examples.
        
        ```shell
        # Example
        wordleai-sql --vocabname myvocab --vocabfile my-vocab.txt
        ```
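        
        A reader for the file format described above might look like the following sketch. The helpers `read_vocab` and `pick_answer` are hypothetical and not part of the package's documented interface. The sketch assumes each line holds a word optionally followed by a weight, skips blank lines, and opens the file with `gzip` when its name ends with ".gz". `random.choices` requires Python 3.6 or later.
        
        ```python
        import gzip
        import random
        
        
        def read_vocab(path):
            """Read 'word [weight]' lines into a dict of word -> weight (default 1)."""
            opener = gzip.open if path.endswith(".gz") else open
            vocab = {}
            with opener(path, "rt", encoding="utf-8") as f:
                for line in f:
                    parts = line.split()
                    if not parts:
                        continue  # skip blank lines
                    weight = float(parts[1]) if len(parts) > 1 else 1.0
                    if weight < 0:
                        raise ValueError("negative weight in line: %r" % line)
                    vocab[parts[0]] = weight
            # All words must have the same length (counted in characters, not bytes)
            if len({len(w) for w in vocab}) > 1:
                raise ValueError("words must have the same length")
            return vocab
        
        
        def pick_answer(vocab, rng=random):
            """Draw an answer with probability proportional to its weight."""
            words = list(vocab)
            return rng.choices(words, weights=[vocab[w] for w in words], k=1)[0]
        ```
        
        For example, `pick_answer(read_vocab("my-vocab.txt"))` would draw an answer from the file used in the command above, favoring words with larger weights.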
        
        
        ## Backend options
        
        ### SQLite with approximate evaluation (default)
        
        ```shell
        wordleai-sql -b approx
        ```
        
        - With the `-b approx` option, words are evaluated approximately by sampling input and/or answer words.
        - The database setup completes quickly since it does not require precomputation of the judge results.
        - Evaluation also completes quickly since only small numbers of input and/or answer words are involved in the calculation.
        - Although approximate, the engine tends to provide close-to-optimal suggestions thanks to the law of large numbers.
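        
        The sampling idea can be illustrated with the sketch below. It is an assumption about the general approach, not the engine's actual SQL, and `judge` and `approx_rank` are hypothetical names. It evaluates a random subset of input words against a random subset of answers, then scales the sampled group counts up to estimate how many candidates would remain. When both samples cover the full lists, the estimate equals the exact `mean_entropy`.
        
        ```python
        import math
        import random
        from collections import Counter
        
        
        def judge(input_word, answer_word):
            """Hypothetical result string: 0 (no match), 1 (partial), 2 (exact)."""
            result = ["0"] * len(input_word)
            unmatched = Counter()
            for i, (x, y) in enumerate(zip(input_word, answer_word)):
                if x == y:
                    result[i] = "2"
                else:
                    unmatched[y] += 1
            for i, x in enumerate(input_word):
                if result[i] == "0" and unmatched[x] > 0:
                    result[i] = "1"
                    unmatched[x] -= 1
            return "".join(result)
        
        
        def approx_rank(words, candidates, n_inputs=100, n_answers=100, seed=None):
            """Rank sampled input words by an estimated mean_entropy (lower is better)."""
            rng = random.Random(seed)
            inputs = rng.sample(words, min(n_inputs, len(words)))
            answers = rng.sample(candidates, min(n_answers, len(candidates)))
            # Each sampled answer stands for len(candidates) / len(answers) candidates
            scale = len(candidates) / len(answers)
            scores = []
            for w in inputs:
                counts = Counter(judge(w, a) for a in answers)
                # c sampled answers share a result, so about c * scale candidates remain
                est = sum(c * math.log2(c * scale) for c in counts.values()) / len(answers)
                scores.append((est, w))
            return sorted(scores)
        
        
        words = ["cat", "cot", "dog", "dig"]
        print(approx_rank(words, words, n_inputs=2, n_answers=2, seed=1))
        ```
        
        Larger samples make the estimate more stable at the cost of more `judge` evaluations, which is the trade-off the approximate backend exploits.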
        
        ### SQLite with full evaluation
        
        ```shell
        wordleai-sql -b sqlite
        ```
        
        - This engine evaluates all words against all answer candidates.
        - To speed up the calculation, the engine precomputes the judge results for all word pairs during setup.
          - The database file becomes about 8.4GB.
          - The process may take about an hour, depending on CPU speed.
          - The setup time is significantly reduced if a C++ compiler command (e.g. `g++` or `clang++`) is available.
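        
        The precomputation approach can be sketched with Python's built-in `sqlite3` module: store the judge result of every (input word, answer word) pair in one table, keep the remaining candidates in another, and compute the three criteria with a `GROUP BY` query. The table and column names, the helpers `setup` and `update`, and the `py_log2` function registered from Python are assumptions for illustration; the package's actual schema and queries may differ. For the 12947 words in the session example, such a table would hold 12947² ≈ 1.7 × 10^8 rows, which helps explain the large file size.
        
        ```python
        import math
        import sqlite3
        from collections import Counter
        from itertools import product
        
        
        def judge(input_word, answer_word):
            """Hypothetical result string: 0 (no match), 1 (partial), 2 (exact)."""
            result = ["0"] * len(input_word)
            unmatched = Counter()
            for i, (x, y) in enumerate(zip(input_word, answer_word)):
                if x == y:
                    result[i] = "2"
                else:
                    unmatched[y] += 1
            for i, x in enumerate(input_word):
                if result[i] == "0" and unmatched[x] > 0:
                    result[i] = "1"
                    unmatched[x] -= 1
            return "".join(result)
        
        
        # Group sizes per (input_word, judge) over remaining candidates, then the criteria
        CRITERIA_SQL = """
        WITH counts AS (
          SELECT j.input_word, j.judge, COUNT(*) AS n
          FROM judges AS j JOIN candidates AS c ON j.answer_word = c.word
          GROUP BY j.input_word, j.judge
        )
        SELECT input_word,
               MAX(n) AS max_n,
               SUM(n * n) * 1.0 / SUM(n) AS mean_n,
               SUM(n * py_log2(n)) / SUM(n) AS mean_entropy
        FROM counts
        GROUP BY input_word
        ORDER BY mean_entropy, input_word
        """
        
        
        def setup(words):
            """Create an in-memory database with all judge results precomputed."""
            conn = sqlite3.connect(":memory:")
            conn.create_function("py_log2", 1, math.log2)
            conn.execute("CREATE TABLE judges (input_word TEXT, answer_word TEXT, judge TEXT)")
            conn.executemany(
                "INSERT INTO judges VALUES (?, ?, ?)",
                ((i, a, judge(i, a)) for i, a in product(words, words)),
            )
            conn.execute("CREATE TABLE candidates (word TEXT PRIMARY KEY)")
            conn.executemany("INSERT INTO candidates VALUES (?)", [(w,) for w in words])
            return conn
        
        
        def update(conn, input_word, result):
            """Keep only the candidates consistent with the observed result string."""
            conn.execute(
                "DELETE FROM candidates WHERE word NOT IN "
                "(SELECT answer_word FROM judges WHERE input_word = ? AND judge = ?)",
                (input_word, result),
            )
        
        
        conn = setup(["cat", "cot", "dog", "dig"])
        for row in conn.execute(CRITERIA_SQL):
            print(row)
        update(conn, "cat", "000")
        print([w for (w,) in conn.execute("SELECT word FROM candidates ORDER BY word")])
        ```
        
        Once the judge table exists, an update like `u cat 000` is a single `DELETE`, and re-ranking is a single aggregate query, so no judge results need to be recomputed during the session.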
        
        ### Google BigQuery backend
        
        ```shell
        # --vocabname is used as the dataset name
        wordleai-sql -bbq --bq_credential "gcp_credentials.json" --vocabname "wordle_dataset"
        ```
        
        - With the `-bbq` option, Google BigQuery is used as the backend SQL engine.
        - A credential JSON file of a GCP service account with the following permissions must be supplied:
          ```
          bigquery.datasets.create
          bigquery.datasets.get
          bigquery.jobs.create
          bigquery.jobs.get
          bigquery.routines.create
          bigquery.routines.delete
          bigquery.routines.get
          bigquery.routines.update
          bigquery.tables.create
          bigquery.tables.delete
          bigquery.tables.get
          bigquery.tables.getData
          bigquery.tables.list
          bigquery.tables.update
          bigquery.tables.updateData
          ```
        
        ## Other options
        
        See `wordleai-sql -h` for other options, which should mostly be self-explanatory.
        
        
        ## GUI application
        
        - A browser application built on [streamlit](https://streamlit.io/) is at [streamlit/app.py](streamlit/app.py). This can be run with the following command:
          ```shell
          # install dependencies if not already installed
          pip install pandas streamlit
          streamlit run ./streamlit/app.py
          ```
        - The app is also deployed on the [streamlit cloud](https://share.streamlit.io/kota7/wordleai-sql/main/streamlit/app.py).
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Description-Content-Type: text/markdown
