Metadata-Version: 2.4
Name: ankipan
Version: 0.7
Summary: A language learning utility with Anki integration
Author-email: Daniel Otto de Mentock <daniel.mentock@gmail.com>
License: AGPL-3.0
Project-URL: repository, https://gitlab.com/ankipan/ankipan
Classifier: Intended Audience :: Education
Classifier: Topic :: Education
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4
Requires-Dist: chardet
Requires-Dist: PyMuPDF
Requires-Dist: Flask
Requires-Dist: Jinja2
Requires-Dist: langdetect
Requires-Dist: Markdown
Requires-Dist: numpy
Requires-Dist: protobuf
Requires-Dist: pykakasi
Requires-Dist: pysubs2
Requires-Dist: PyYAML
Requires-Dist: Requests
Requires-Dist: tqdm
Requires-Dist: wcwidth
Requires-Dist: pytest
Requires-Dist: google-generativeai
Requires-Dist: ipywidgets
Dynamic: license-file

# Ankipan

Ankipan is a project to democratize language learning in a decentralized way.

It lets you choose the domains you want to become more fluent in and builds a custom learning curriculum designed to get you to that goal as effectively and efficiently as possible.

## Workflow

First, choose the fields you want on your flashcards: which online dictionaries to use for definitions, which sources to draw example sentences from (e.g. YouTube subtitles, particular YouTubers, Wikipedia, or open news corpora), and other fields such as statistically frequent contexts of the word or GPT-generated comparisons with synonyms and similar words.

Then parse any text you like (plain text, subtitles, PDF, HTML, or Ankipan DB sources).
Ankipan generates a frequency-sorted list of lemmas, [lets you pick which words to learn](/docs/select_new_words_screenshot.png), and creates the corresponding Anki flashcards, which can be synced directly with Anki through the [AnkiConnect](https://ankiweb.net/shared/info/2055492159) add-on.

Inside Anki, you can [color-tag useful example sentences](/docs/anki_screenshot_2.png). The cards remember these tags and automatically expand the tagged sentences on the next review, so when you see the word on the front of the card you can try to recall sentences you would actually use, rather than just a translation.

Example sentences come with GPT-generated translations and explanations, which by default are produced by the `ankipan_default` server hosted in Germany. If that server is too busy, or you prefer not to rely on it, you can use your own Google Gemini API key, which has a free quota of 1000 requests per day, or a local Ollama setup (see `ankipan/gpt_base.py`).
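If you go the local route, the code below is a minimal sketch of asking a local Ollama server for a translation and explanation through its `/api/generate` endpoint on the default port 11434. Ankipan's actual client lives in `ankipan/gpt_base.py` and may work differently; the prompt wording, the helper names, and the `llama3` model name are assumptions (use whichever model you have pulled with Ollama).

```python
import json
import urllib.request

# Ollama's default local REST endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_translation_prompt(sentence: str, target_language: str = "English") -> str:
    """Hypothetical prompt asking for a translation plus a short explanation."""
    return (
        f"Translate the following sentence into {target_language} "
        f"and briefly explain any difficult words:\n\n{sentence}"
    )


def ask_ollama(prompt: str, model: str = "llama3") -> str:
    """Send a non-streaming generate request and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": False, Ollama returns a single JSON object whose
        # "response" field holds the full generated text.
        return json.load(response)["response"]
```

Setting `"stream": False` keeps the client simple at the cost of waiting for the full answer; a streaming client would instead read one JSON object per line.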

Over time, you can use this tool to track which words you already know and to filter new sources you are interested in by their most relevant new words, so you can get started quickly and focus your progress on the areas you personally care about. Longer term, we aim to go beyond flashcards for single words and offer a more holistic way to work through a learning curriculum, using full sentences and other useful tasks adapted to each learner's individual style and language goals.

<p align="center">
  <img src="/docs/select_new_words_screenshot.png" alt="Selection overview" width="47%">
  <img src="/docs/anki_screenshot_1.png" alt="Dictionary definition" width="25%">
  <img src="/docs/anki_screenshot_2.png" alt="Example sentence field" width="25%">
</p>

## Getting started

### 1. Prerequisites

- Download and install Anki from https://apps.ankiweb.net/
- Create an AnkiWeb account
- Install the [AnkiConnect](https://ankiweb.net/shared/info/2055492159) add-on
- Open Anki and log in; keep it running while syncing your cards

### 2. Installation

- Using pip:

```bash
pip install ankipan
```

- From source:

```bash
git clone git@gitlab.com:ankipan/ankipan.git
cd ankipan
pip install .
```

### 3. (Optional) Install lemmatizers to parse your own texts

- [Install PyTorch](https://pytorch.org/get-started/locally/), which stanza requires for lemmatization
- Install the lemmatizer packages:

```bash
pip install stanza
pip install HanTa # optional but recommended for German; allows more accurate lemmatization
```
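Once stanza is installed, counting lemmas in a text can be sketched as follows. The pipeline is passed in as a parameter so the counting logic stays independent of model downloads; `count_lemmas` and the set of skipped part-of-speech tags are assumptions for illustration, not the code in Ankipan's `reader.py`. The stanza calls in the trailing comment (`stanza.download`, `stanza.Pipeline`, `word.lemma`, `word.upos`) are part of stanza's public API.

```python
from collections import Counter

# Universal POS tags that are rarely worth a flashcard (an assumption; tune per language).
SKIP_UPOS = {"PUNCT", "NUM", "SYM", "X", "PROPN"}


def count_lemmas(text: str, nlp) -> Counter:
    """Count lemmas in `text` using a stanza-style pipeline `nlp`.

    `nlp(text)` must return a document with `.sentences`, each holding `.words`
    that expose `.lemma` and `.upos`, which is what a stanza.Pipeline returns.
    """
    counts = Counter()
    doc = nlp(text)
    for sentence in doc.sentences:
        for word in sentence.words:
            # Skip punctuation, numbers, names etc., and words without a lemma.
            if word.upos in SKIP_UPOS or not word.lemma:
                continue
            counts[word.lemma] += 1
    return counts


# Usage with stanza (downloads the German models on first run):
#   import stanza
#   stanza.download("de")
#   nlp = stanza.Pipeline("de", processors="tokenize,mwt,pos,lemma")
#   print(count_lemmas("Die Hunde liefen schnell nach Hause.", nlp).most_common(5))
```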

## Usage

See the notebooks in the `examples` directory.

## Development

The goal of this project is to create a library that is highly modular and scalable, and flexible enough to adapt to the needs of any particular language or learning style. New flashcard fields can be added by creating a new file in the `ankipan/flashcard_fields` directory.

We try to be as decentralized as possible: anyone can host a server, privately or publicly, that provides text data of interest to language learners, or resources for generating and caching GPT translations and explanations.
Users can connect to any number of servers. The default list is in `servers.yaml`, and you can add custom servers, including your own, with `ankipan.Config.add_server('custom_server_name', url)`. For example, `ankipan.Config.add_server('my_local_server', 'http://127.0.0.1:5701')` connects to a server launched with `ankipan_db/server.py` on the same computer, which uses that address by default.

To initialize [ankipan_db](https://gitlab.com/ankipan/ankipan_db) as a submodule, run the following commands from the repository root:

```bash
git submodule update --init
cd ankipan_db
git fetch --unshallow || true
git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"
git fetch origin
cd .. && git submodule update --remote ankipan_db
```

If you would like your own public server added to the standard `servers.yaml` list, feel free to open an issue.

## Notes

- Lemmatization is currently done with the `stanza` library in the `reader.py` module. This mostly works well, but stanza relies on a statistical model to estimate the likely root form (lemma) of each word, so it occasionally makes mistakes or produces nonsensical lemmas. Users then need to filter these out manually in the `select_new_words` overview, or suspend the corresponding cards later in Anki.
