Metadata-Version: 2.1
Name: crowdom
Version: 0.2.0
Summary: Crowdom
Author: Oleg Gulyaev
Author-email: o-gulyaev@yandex-team.ru
License: Apache 2.0
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS

# Crowdom

__Crowdom__ is a tool for simplifying data labeling.

Write plain Python code and launch data labeling without knowledge of crowdsourcing and underlying platform
(Crowdom uses [Toloka](https://toloka.ai) as a platform for publishing tasks for workers). Define task you solve and
load source data with few lines of code, choose quality-cost-speed tradeoff in interactive UI form, launch data
labeling, study result labeling in Pandas dataframes.

Crowdom uses [ʎzy](https://github.com/lambdazy/lzy/), cloud workflow runtime, to run data labeling workflow. 
This provides _reliability_ (automatic errors retry, possibility of data labeling relaunch without losing progress) and 
out-of-the-box _data persistence_.

## Quickstart

We recommend you to look first at [image classification](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/image_classification/image_classification.html)
example, since it demonstrates full data labeling workflow, proposed in Crowdom, with detailed explanations for each step.

In other [examples](#Examples), you can see how working with data labeling looks like for different types of tasks with
use of Crowdom.

To get the benefits of running on ʎzy, see ʎzy setup [example](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/lzy/lzy.ipynb).

Join our [Telegram chat](https://t.me/+axHOcdjbaFwzYzAy) if you want to learn more about the Crowdom or discuss your
task with us.

## Types of tasks

Tasks in Crowdom are divided into two types:
- _Classification tasks_, which have a fixed set of labels as output.
- _Annotation tasks_, for which output has "unlimited" dimension.

In a typical _classification_ task, worker is proposed to make a choice of one of the pre-determined options.
_Side-by-side_ (SbS) comparison is a special case of _classification_ task.

As for _annotation_ task, there may be many potential solutions, and there may be more than one correct one.
_Speech transcription_, _image annotation_ are examples of annotation tasks.

## Examples

The following table contains list of examples, which demonstrates data labeling for different types of tasks, as well
as other aspects of data labeling workflow.

Examples are presented as `.ipynb` files, located in this repository, but displayed by [nbviewer](https://nbviewer.org/github/lambdazy/crowdom/tree/main/examples/),
which do it more precisely than GitHub.

_Image classification_ and _audio transcript_) examples also have `.html` versions. These examples present full
labeling workflow, corresponding two _classification_ and _annotation_ types of  tasks respectively. `.html` allows
to collapse optional sections in notebook to simplify understanding of main steps of workflow,  as well as to display
interactive widgets contents (for example, to display quality-cost-speed tradeoff interactive form).

| Example                                                                                                                                                                                                                                                          | Full workflow | Function       | Data types  | Additionally                                                                                                   |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------------|-------------|----------------------------------------------------------------------------------------------------------------|
| [Image classification](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/image_classification/image_classification.ipynb) [(HTML)](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/image_classification/image_classification.html) | ✅             | Classification | Image       |                                                                                                                |
| [Audio transcript](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/audio_transcript/audio_transcript.ipynb) [(HTML)](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/audio_transcript/audio_transcript.html)                     | ✅             | Annotation     | Audio, Text |                                                                                                                |
| [Audio transcripts SbS](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/audio_transcripts_sbs/audio_transcripts_sbs.ipynb)                                                                                                                       |               | SbS            | Audio, Text |                                                                                                                |
| [Voice recording](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/voice_recording/voice_recording.ipynb)                                                                                                                                         |               | Annotation     | Text, Audio | Media output, checking annotations by the ML model                                                             |
| [Audio transcript, extended](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/audio_transcript_ex/audio_transcript_ex.ipynb)                                                                                                                      |               | Annotation     | Text, Audio | Custom task UI, custom task duration calculation, first annotations attempts by the ML model                   |
| [MOS](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/mos/MOS.ipynb)                                                                                                                                                                             |               | Classification | Audio       | [MOS](https://www.microsoft.com/en-us/research/wp-content/uploads/2011/05/0002416.pdf) algorithm usage example |
| [Audio questions](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/audio_questions/audio_questions.ipynb)                                                                                                                                         |               | Classification | Audio       | Output label set depending on the input data                                                                   |
| [Experts registration](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/experts_registration/experts_registration.ipynb)                                                                                                                          |               |                |             | Registration of your private expert workforce                                                                  |
| [Task update](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/task_update/task_update.ipynb)                                                                                                                                                     |               |                |             | Task update (instructions, UI and etc.)                                                                        |
| [ʎzy usage](https://nbviewer.org/github/lambdazy/crowdom/blob/main/examples/lzy/lzy.ipynb)                                                                                                                                                                       |               |                |             | ʎzy setup, parallel labelings                                                                                  |

## Communication

Join our communities if you have questions about Crowdom or want to discuss your data labeling task.

- [Telegram](https://t.me/+axHOcdjbaFwzYzAy)
