Metadata-Version: 2.1
Name: test_1_varify_ic
Version: 0.2
Summary: testing and debugging project
Home-page: UNKNOWN
Author: ic
Author-email: quattroporte54@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown

UPDATE 05.05.2024:
----------------------
Due to changes in hosting, it is recommended to upgrade the package to the newest version with the following command:
```bash
pip install --upgrade speakleash
```

[SpeakLeash](https://pypi.org/project/speakleash) is a lightweight library providing datasets for the Polish language
and tools to make them useful:

- **Website:** https://speakleash.org/
- **Datasets:** https://speakleash.org/dashboard/
- **Source code:** https://github.com/speakleash/speakleash
- **Data in action:** https://github.com/speakleash/speakleash-examples
- **Bug reports:** https://github.com/speakleash/speakleash/issues

Installation
----------------------

The SpeakLeash package is available on PyPI and has to be installed in a virtual environment:

```bash
pip install speakleash
```

Basic Usage
----------------------

If you just want to see the details of the datasets:

```python
from speakleash import Speakleash
import os

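# Local directory passed to Speakleash: a "datasets" folder next to this script
base_dir = os.path.dirname(__file__)
replicate_to = os.path.join(base_dir, "datasets")

sl = Speakleash(replicate_to)

# Print a one-line summary for each available dataset
for d in sl.datasets:
    # Rough size estimate that treats one character as one byte
    size_mb = round(d.characters/1024/1024)
    print("Dataset: {0}, size: {1} MB, characters: {2}, documents: {3}".format(d.name, size_mb, d.characters, d.documents))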
```
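
Because each dataset object exposes `name`, `characters` and `documents`, the list can also be sorted to find the largest datasets. The following is a minimal sketch, not part of the library: `largest_datasets` is a hypothetical helper, and it is demonstrated on stand-in objects with placeholder values rather than on a real `sl.datasets` list.

```python
from types import SimpleNamespace


def largest_datasets(datasets, top_n=3):
    """Return up to top_n datasets, largest character count first."""
    return sorted(datasets, key=lambda d: d.characters, reverse=True)[:top_n]


# Stand-in objects with placeholder values; with the real library you would
# pass sl.datasets instead.
sample = [
    SimpleNamespace(name="dataset_a", characters=1_000, documents=10),
    SimpleNamespace(name="dataset_b", characters=5_000, documents=20),
    SimpleNamespace(name="dataset_c", characters=3_000, documents=15),
]

for d in largest_datasets(sample, top_n=2):
    print("Dataset: {0}, characters: {1}, documents: {2}".format(d.name, d.characters, d.documents))
```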

You can read individual properties (e.g. ***characters***, ***documents***), or you can display the entire manifest:

```python
sl = Speakleash(replicate_to)
print(sl.get("plwiki").manifest)
```

If you choose one of the datasets (***.get(name of dataset)***), its ***data*** property gives you the text of every document:

```python
from speakleash import Speakleash
import os

base_dir = os.path.dirname(__file__)
replicate_to = os.path.join(base_dir, "datasets")

sl = Speakleash(replicate_to)

wiki = sl.get("plwiki").data
for doc in wiki:
    print(doc[:40])
```
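
A dataset can hold a lot of documents, so it may be convenient to look at only the first few. The sketch below assumes only that the documents are an iterable of strings, as in the loop above; `preview_docs` is a hypothetical helper, not part of the library, and it is shown on stand-in strings.

```python
from itertools import islice


def preview_docs(docs, limit=5, width=40):
    """Return the first `width` characters of the first `limit` documents."""
    return [doc[:width] for doc in islice(docs, limit)]


# Stand-in documents; with the real library you would pass
# sl.get("plwiki").data instead.
sample_docs = ["first document text", "second document text", "third document text"]

for snippet in preview_docs(sample_docs, limit=2, width=10):
    print(snippet)
```

`islice` stops after `limit` items, so the remaining documents are never read.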

If you also need metadata, use the ***ext_data*** property, which yields a text and metadata pair for each document:

```python
ds = sl.get("plwiki").ext_data
for doc in ds:
    print(doc)
    txt, meta = doc
    print(meta.get("title"))
    print(txt)
```

Popular metadata fields:

* title
* length
* sentences
* words
* verbs
* nouns
* symbols
* punctuations
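
These fields can be used to filter documents. The following is a minimal sketch that assumes the pairs have the `(txt, meta)` shape shown above and that the chosen field (here `words`) holds a number; `filter_by_meta` is a hypothetical helper, not part of the library, and the sample pairs are placeholders.

```python
def filter_by_meta(docs, key, minimum):
    """Yield (txt, meta) pairs whose meta[key] is at least `minimum`.

    Pairs that do not have the key are skipped.
    """
    for txt, meta in docs:
        value = meta.get(key)
        if value is not None and value >= minimum:
            yield txt, meta


# Stand-in pairs; with the real library you would pass
# sl.get("plwiki").ext_data instead.
sample_pairs = [
    ("Short text.", {"title": "Example A", "words": 2}),
    ("A somewhat longer example text.", {"title": "Example B", "words": 5}),
    ("No word count here.", {"title": "Example C"}),
]

for txt, meta in filter_by_meta(sample_pairs, "words", 3):
    print(meta.get("title"), "-", txt)
```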


