Metadata-Version: 2.1
Name: datamaestro-text
Version: 2020.1.17
Summary: Text-related datasets
Home-page: https://github.com/experimaestro/datamaestro_text
Author: Benjamin Piwowarski
Author-email: benjamin@piwowarski.fr
License: GPL-3
Keywords: dataset manager
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.5
Description-Content-Type: text/markdown
Provides-Extra: test
Requires-Dist: datamaestro (>=0.6.10)
Requires-Dist: tox; extra == 'test'

[![CircleCI](https://circleci.com/gh/experimaestro/datamaestro_text.svg?style=svg)](https://circleci.com/gh/experimaestro/datamaestro) [![PyPI version](https://badge.fury.io/py/datamaestro-text.svg)](https://badge.fury.io/py/datamaestro-text)

# Text-related datasets

This [datamaestro](https://github.com/bpiwowar/datasets) plugin covers text-related datasets:

- Information Retrieval
- Natural Language Processing tasks

The list of available datasets and usage instructions can be found in the [documentation](http://experimaestro.github.io/datamaestro_text/).

## List of available datasets

Below is the list of available datasets with their ids. Some datasets come in several versions; in that case, the dataset id carries a suffix identifying the version.
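
To illustrate how a suffixed id relates to the ids listed in this README, here is a minimal sketch that splits a full id into its base id and version suffix. The helper `split_dataset_id` is hypothetical and not part of datamaestro; the dot separator and the example suffix `6b.50` are assumptions made for illustration only.

```python
# Base ids copied from the lists in this README.
BASE_IDS = [
    "edu.upenn.ldc.aquaint",
    "gov.nist.trec.tipster",
    "io.metamind.research.wikitext",
    "edu.stanford.glove",
    "edu.stanford.aclimdb",
    "gov.nist.trec.adhoc",
]


def split_dataset_id(full_id, base_ids=BASE_IDS):
    """Split a dataset id into (base id, version suffix).

    Assumes the suffix follows the base id after a dot; that separator
    is an assumption of this sketch, not something the README states.
    """
    # Try longer base ids first so the most specific match wins.
    for base in sorted(base_ids, key=len, reverse=True):
        if full_id == base:
            return base, ""
        if full_id.startswith(base + "."):
            return base, full_id[len(base) + 1:]
    raise ValueError("unknown dataset id: {}".format(full_id))


# "6b.50" is a made-up suffix used only for illustration.
print(split_dataset_id("edu.stanford.glove.6b.50"))
print(split_dataset_id("gov.nist.trec.adhoc"))
```

An id without a suffix yields an empty string as its second element, and an id that matches none of the listed base ids raises `ValueError`.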

### Documents

- Aquaint `edu.upenn.ldc.aquaint`
- TIPSTER `gov.nist.trec.tipster`
- WikiText-2 and WikiText-103 `io.metamind.research.wikitext`

### Word embeddings

- [GloVe](http://nlp.stanford.edu/projects/glove/) `edu.stanford.glove`

### Sentiment analysis

- [IMDB](http://ai.stanford.edu/~amaas/data/sentiment) `edu.stanford.aclimdb`

### Information Retrieval

#### TREC

- [TREC-1 to TREC-8, Robust 2004 and 2005](https://trec.nist.gov/) `gov.nist.trec.adhoc`


