Metadata-Version: 2.1
Name: pyglossary
Version: 4.1.0
Summary: A tool for converting dictionary files aka glossaries.
Home-page: https://github.com/ilius/pyglossary
Author: Saeed Rasooli
Author-email: saeed.gnu@gmail.com
License: GPLv3+
Platform: UNKNOWN
Description-Content-Type: text/markdown
Provides-Extra: full
Requires-Dist: PyICU ; extra == 'full'
Requires-Dist: PyYAML ; extra == 'full'
Requires-Dist: beautifulsoup4 ; extra == 'full'
Requires-Dist: libzim ; extra == 'full'
Requires-Dist: lxml ; extra == 'full'
Requires-Dist: marisa-trie ; extra == 'full'
Requires-Dist: python-lzo ; extra == 'full'

PyGlossary
==========

A tool for converting dictionary files aka glossaries.

The primary purpose is to be able to use our offline glossaries in any Open
Source dictionary we like on any OS/device.

There are countless formats, and my time is limited, so I implement the formats
that seem most useful to me or to the Open Source community. Diversity of
languages is also taken into account. Pull requests are welcome.

Screenshots
-----------

<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40a-gtk-txt-stardict-aryanpour-dark.png" width="50%" height="50%"/>

Linux - Gtk3-based interface

------------------------------------------------------------------------

<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40b-tk-bgl-epub-es-en-2.png" width="50%" height="50%"/>

Windows - Tkinter-based interface

------------------------------------------------------------------------

<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/32-cmd-freedict-mids-de-ru.png" width="50%" height="50%"/>

Linux - command-line interface

------------------------------------------------------------------------

<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/40-cmdi-termux-zim-slob-en-med.jpg" width="50%" height="50%"/>

Android Termux - interactive command-line interface


Supported formats
-----------------

Format                                                    |   |    Extension    |Read|Write
----------------------------------------------------------|:-:|:---------------:|:--:|:---:
[Aard 2 (slob)](./doc/p/aard2_slob.md)                    |🔢 | .slob           | ✔  |  ✔
[ABBYY Lingvo DSL](./doc/p/dsl.md)                        |📝 | .dsl            | ✔  |
[Almaany.com](./doc/p/almaany.md) (SQLite3, Arabic)       |🔢 | .db             | ✔  |
[AppleDict Binary](./doc/p/appledict_bin.md)              |🔢 | .dictionary     | ✔  | ❌
[AppleDict Source](./doc/p/appledict.md)                  |📁 |                 |    |  ✔
[Babylon BGL](./doc/p/babylon_bgl.md)                     |🔢 | .bgl            | ✔  | ❌
[CC-CEDICT](./doc/p/cc_cedict.md) (Chinese)               |📝 |                 | ✔  | ❌
[cc-kedict](./doc/p/cc_kedict.md) (Korean)                |📝 |                 | ✔  | ❌
[CSV](./doc/p/csv.md)                                     |📝 | .csv            | ✔  |  ✔
[Dict.cc](./doc/p/dict_cc.md) (SQLite3, German)           |🔢 | .db             | ✔  |
[DICT.org / Dictd server](./doc/p/dict_org.md)            |📁 | (📝.index)      | ✔  |  ✔
[DICT.org / dictfmt source](./doc/p/dict_org_source.md)   |📝 | (.dtxt)         |    |  ✔
[dictunformat output file](./doc/p/dictunformat.md)       |📝 | (.dictunformat) | ✔  |
[DictionaryForMIDs](./doc/p/dicformids.md)                |📁 | (📁.mids)       | ✔  |  ✔
[DigitalNK](./doc/p/digitalnk.md) (SQLite3, N-Korean)     |🔢 | .db             | ✔  |
[EDLIN](./doc/p/edlin.md)                                 |📁 | .edlin          | ✔  |  ✔
[EPUB-2 E-Book](./doc/p/epub2.md)                         |📦 | .epub           | ❌ |  ✔
[FreeDict](./doc/p/freedict.md)                           |📝 | .tei            | ✔  | ❌
[Gettext Source](./doc/p/gettext_po.md)                   |📝 | .po             | ✔  |  ✔
[HTML Directory (by file size)](./doc/p/html_dir.md)      |📁 |                 | ❌ |  ✔
[JMDict](./doc/p/jmdict.md) (Japanese)                    |📝 |                 | ✔  | ❌
[JSON](./doc/p/json.md)                                   |📝 | .json           |    |  ✔
[Kobo E-Reader Dictionary](./doc/p/kobo.md)               |📦 | .kobo.zip       | ❌ |  ✔
[Kobo E-Reader Dictfile](./doc/p/kobo_dictfile.md)        |📝 | .df             | ✔  |  ✔
[Lingoes Source](./doc/p/lingoes_ldf.md)                  |📝 | .ldf            | ✔  |  ✔
[Mobipocket E-Book](./doc/p/mobi.md)                      |🔢 | .mobi           | ❌ |  ✔
[Octopus MDict](./doc/p/octopus_mdict.md)                 |🔢 | .mdx            | ✔  | ❌
[Sdictionary Binary](./doc/p/sdict.md)                    |🔢 | .dct            | ✔  |
[Sdictionary Source](./doc/p/sdict_source.md)             |📝 | .sdct           |    |  ✔
[SQL](./doc/p/sql.md)                                     |📝 | .sql            | ❌ |  ✔
[StarDict](./doc/p/stardict.md)                           |📁 | (📝.ifo)        | ✔  |  ✔
[Tabfile](./doc/p/tabfile.md)                             |📝 | .txt, .tab      | ✔  |  ✔
[Wiktionary Dump](./doc/p/wiktionary_dump.md)             |📝 | .xml            | ✔  | ❌
[Wordset.org](./doc/p/wordset.md)                         |📁 |                 | ✔  |
[XDXF](./doc/p/xdxf.md)                                   |📝 | .xdxf           | ✔  | ❌
[Zim (Kiwix)](./doc/p/zim.md)                             |🔢 | .zim            | ✔  |

Legend:
- 📁	Directory
- 📝	Text file
- 📦	Package/archive file
- 🔢	Binary file
- ✔		Supported
- ❌ 	Will not be supported

Note: SQLite3 `.db` files are not detected by extension, so you need to specify the format explicitly.

Requirements
------------

PyGlossary requires **Python 3.7 or higher**, and works on practically all
modern operating systems. While primarily designed for *GNU/Linux*, it works
on *Windows*, *Mac OS X* and other Unix-based operating systems as well.

As shown in the screenshots, there are multiple User Interface types (multiple
ways to use the program).

-	**Gtk3-based interface**, uses [PyGI (Python Gobject Introspection)](http://pygobject.readthedocs.io/en/latest/getting_started.html)
	You can install it on:
	-	Debian/Ubuntu: `apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0`
	-	openSUSE: `zypper install python3-gobject gtk3`
	-	Fedora: `dnf install pygobject3 python3-gobject gtk3`
	-	ArchLinux:
		* `pacman -S python-gobject gtk3`
		* https://aur.archlinux.org/packages/pyglossary/
	-	Mac OS X: `brew install pygobject3 gtk+3`
	-	Nix / NixOS: `nix-shell -p gnome3.gobjectIntrospection python38Packages.pygobject3 python38Packages.pycairo`

-	**Tkinter-based interface**, which works when Gtk is not available, especially on
	Windows, where the Tkinter library is installed with Python itself.
	You can also install it on:
	-	Debian/Ubuntu: `apt-get install python3-tk tix`
	-	openSUSE: `zypper install python3-tk tix`
	-	Fedora: `yum install python3-tkinter tix`
	-	Mac OS X: read <https://www.python.org/download/mac/tcltk/>
	-	Nix / NixOS: `nix-shell -p python38Packages.tkinter tix`

-	**Command-line interface**, works in all operating systems without
	any specific requirements, just type:

	`python3 main.py --help`

	- **Interactive command-line interface**
		- Requires: `pip3 install prompt_toolkit`
		- Perfect for mobile devices (like Termux on Android) where no GUI is available
		- Automatically selected if output file argument is not passed **and** one of these:
			- On Linux and `$DISPLAY` environment variable is empty or not set
				- For example when you are using a remote Linux machine over SSH
			- On Mac and no `tkinter` module is found
		- Manually select with `--cmd` or `--ui=cmd`
			- Minimally: `python3 main.py --cmd`
			- You can still pass input file, or any flag/option
		- If both input and output files are passed, non-interactive cmd ui will be default
		- If you are writing a script, you can pass `--no-interactive` to force disable interactive ui
			- Then you have to pass both input and output file arguments
		- Don't forget to use *Up/Down* or *Tab* keys in prompts!
			- Up/Down key shows you recent values you have used
			- Tab key shows available values/options
		- You can press Control+C (on Linux/Windows) at any prompt to exit
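
The automatic selection rule for the interactive command-line interface can be sketched as a small predicate. This is only an illustration of the conditions listed above, not PyGlossary's actual code; the function name and its parameters are hypothetical, and `platform` is assumed to hold a `sys.platform`-style value such as `"linux"` or `"darwin"`.

```python
def should_use_interactive_cmd(platform, environ, has_output_file, has_tkinter):
    """Hypothetical helper: mirrors the auto-selection rule described above."""
    # Interactive mode is only auto-selected when no output file was passed.
    if has_output_file:
        return False
    # On Linux: $DISPLAY is empty or not set (for example over SSH).
    if platform == "linux":
        return not environ.get("DISPLAY")
    # On Mac: no tkinter module is found.
    if platform == "darwin":
        return not has_tkinter
    return False
```

In a real script the arguments could come from `sys.platform` and `os.environ`; they are passed in explicitly here so the rule is easy to test.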



UI (User Interface) Selection
-----------------------------
When you run PyGlossary without any command-line arguments or options/flags,
PyGlossary tries to find PyGI and open the Gtk3-based interface. If it fails,
it tries to find Tkinter and open the Tkinter-based interface. If that fails,
it tries to find `prompt_toolkit` and run interactive command-line interface.
And if none of these libraries are found, it exits with an error.

But you can explicitly choose the user interface type using `--ui`:

- `python3 main.py --ui=gtk`
- `python3 main.py --ui=tk`
- `python3 main.py --ui=cmd`
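
The default fallback order (Gtk3, then Tkinter, then interactive command line) can be sketched roughly as follows. This is a minimal illustration, not PyGlossary's implementation: the function names are hypothetical, and the module names `gi`, `tkinter` and `prompt_toolkit` are assumed to be the importable names of the three libraries.

```python
from importlib.util import find_spec

# (ui name, module to look for), in the order described above.
# The module names are assumptions made for this sketch.
UI_CANDIDATES = [
    ("gtk", "gi"),
    ("tk", "tkinter"),
    ("cmd", "prompt_toolkit"),
]


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def pick_ui(is_available=module_available):
    """Return the first UI name whose library is available."""
    for ui_name, module_name in UI_CANDIDATES:
        if is_available(module_name):
            return ui_name
    # None of the libraries were found: give up with an error.
    raise RuntimeError("no usable UI library found")
```

Passing `--ui` on the command line bypasses this kind of detection entirely.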


Installation on Windows
-----------------------
- [Download and install Python](https://www.python.org/downloads/windows/) (3.8 or 3.9 is recommended)
- Open Start -> type Command -> right-click on Command Prompt -> Run as administrator
- To ensure you have `pip`, run: `python -m ensurepip --upgrade`
- To install, run: `pip install --upgrade pyglossary`
- Now you should be able to run the `pyglossary` command
- If the command is not found, make sure the Python environment variables are set up:
	<img src="https://raw.githubusercontent.com/wiki/ilius/pyglossary/screenshots/windows-python39-env-vars.png" width="50%" height="50%"/>


Feature-specific Requirements
-----------------------------

-	**Using `--remove-html-all` flag**

	`sudo pip3 install lxml beautifulsoup4`


Some formats have additional requirements.
If you have trouble with any format, please check the [link given for that format](#supported-formats) to see its documentation.

**Using Termux on Android?** See [doc/termux.md](./doc/termux.md)


Configuration
-------------
See [doc/config.md](./doc/config.md).

User Plugins
------------
If you want to add your own plugin without adding it to source code directory,
or you want to use a plugin that has been removed from repository,
you can place it in this directory:
- Linux or BSD: `~/.pyglossary/plugins/`
- Mac: `~/Library/Preferences/PyGlossary/plugins`
- Windows: `C:\Users\USERNAME\AppData\Roaming\PyGlossary\plugins`
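
If a script needs to locate this directory, the mapping above could be expressed roughly like this. It is only a sketch: `user_plugin_dir` is a hypothetical helper, not part of PyGlossary, and `system` is assumed to be a value as returned by `platform.system()`.

```python
from pathlib import PurePosixPath, PureWindowsPath


def user_plugin_dir(system, home):
    """Hypothetical helper: build the user plugin path listed above."""
    if system == "Windows":
        # e.g. C:\Users\USERNAME\AppData\Roaming\PyGlossary\plugins
        return PureWindowsPath(home) / "AppData" / "Roaming" / "PyGlossary" / "plugins"
    if system == "Darwin":
        # Mac: ~/Library/Preferences/PyGlossary/plugins
        return PurePosixPath(home) / "Library" / "Preferences" / "PyGlossary" / "plugins"
    # Linux or BSD: ~/.pyglossary/plugins/
    return PurePosixPath(home) / ".pyglossary" / "plugins"
```

For the current machine, this could be called as `user_plugin_dir(platform.system(), str(pathlib.Path.home()))`.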


Using PyGlossary as a Python library
------------------------------------
There are a few examples in [doc/lib-examples](./doc/lib-examples) directory.

Here is a basic script that converts any supported glossary format to [Tabfile](./doc/p/tabfile.md):


	import sys
	from pyglossary import Glossary

	# Glossary.init() must be called only once, so make sure you put it
	# in the right place
	Glossary.init()

	glos = Glossary()
	glos.convert(
		inputFilename=sys.argv[1],
		outputFilename=f"{sys.argv[1]}.txt",
		# although it can detect format for *.txt, you can still pass outputFormat
		outputFormat="Tabfile",
		# you can pass readOptions or writeOptions as a dict
		# writeOptions={"encoding": "utf-8"},
	)

See the docstring of `Glossary.convert` for the full list of keyword arguments.

If you need to add entries inside your Python program (rather than converting one glossary into another), use `write` instead of `convert`. Here is an example:


	from pyglossary.glossary import Glossary

	Glossary.init()

	glos = Glossary()
	mydict = {
		"a": "test1",
		"b": "test2",
		"c": "test3",
	}
	for word, defi in mydict.items():
		glos.addEntryObj(glos.newEntry(
			word,
			defi,
			defiFormat="m",  # "m" for plain text, "h" for HTML
		))

	glos.setInfo("title", "My Test StarDict")
	glos.setInfo("author", "John Doe")
	glos.write("test.ifo", format="Stardict")


And if you need to read a glossary from file into a `Glossary` object in RAM (without immediately converting it), you can use `glos.read(filename, format=inputFormat)`. Be wary of RAM usage in this case.


Internal Glossary Structure
---------------------------
A glossary contains a number of entries.

Each entry contains:

- Headword (title or main phrase for lookup)
- Alternates (some alternative phrases for lookup)
- Definition

In PyGlossary, the headword and alternates together are accessible as a single Python list, `entry.l_word`.

`entry.defi` is the definition as a Python Unicode `str`. `entry.b_defi` is the same definition as UTF-8 encoded bytes.

`entry.defiFormat` is the definition format. If the definition is plain text (not rich text), the value is `m`. If it is HTML (contains any HTML tag), then `defiFormat` is `h`. The value `x` is also allowed for XDXF, but XDXF is not widely supported in dictionary applications.

There is another type of `Entry` called a **Data Entry**, which generally contains image files, TTS or other audio files, or any other file that was included in the input glossary. For data entries:
- `entry.s_word` is the file name (and `l_word` is still a list containing this string),
- `entry.defiFormat` is `b`,
- `entry.data` gives the content of the file as `bytes`.
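
To make the structure concrete, here is a small stand-in class that mimics the attributes described above. It is not PyGlossary's own `Entry` class; `SketchEntry` and `describe` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Meaning of each defiFormat value, as described above.
FORMAT_NAMES = {"m": "plain text", "h": "HTML", "x": "XDXF", "b": "binary data"}


@dataclass
class SketchEntry:
    """Stand-in for an entry; NOT the real pyglossary Entry class."""
    l_word: list           # headword followed by alternates
    defi: str = ""         # definition as str
    defiFormat: str = "m"  # "m", "h", "x", or "b" for data entries
    data: bytes = b""      # file content, only used by data entries

    @property
    def b_defi(self):
        # Definition as UTF-8 encoded bytes.
        return self.defi.encode("utf-8")


def describe(entry):
    """Summarize an entry based on its defiFormat."""
    kind = FORMAT_NAMES.get(entry.defiFormat, "unknown")
    if entry.defiFormat == "b":
        return f"{entry.l_word[0]}: {kind}, {len(entry.data)} bytes"
    return f"{entry.l_word[0]}: {kind}, {len(entry.l_word) - 1} alternate(s)"
```

A regular entry such as `SketchEntry(["color", "colour"], "a hue")` has one alternate, while a data entry carries its payload in `data` and its file name as the only item of `l_word`.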



