Metadata-Version: 2.1
Name: sigurd
Version: 1.2.5
Summary: Code for presentation at Graz, 6th November 2019
Home-page: https://github.com/clemsciences/comparison_sigurdr_siegfried
Author: Clément Besnier
Author-email: clem@clementbesnier.fr
License: MIT
Keywords: old-norse,middle-high-german,siegfried,sigurdr
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Markup :: XML
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: requests
Requires-Dist: PyPDF2
Requires-Dist: gensim
Requires-Dist: cltk (==0.1.110)
Requires-Dist: bs4
Requires-Dist: lxml

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/clemsciences/comparison_sigurdr_siegfried.git/master) ![PyPI](https://img.shields.io/pypi/v/sigurd)

# Nibelungenlied and Völsunga saga

A comparison between the Middle High German *Nibelungenlied* and the Old Norse *Völsunga saga*.

The name of the package is **sigurd**.

Slides and code are available at [www.clementbesnier.fr/projets/cltk_2019_graz](https://www.clementbesnier.fr/projets/cltk_2019_graz).

## Download and Installation

```bash
$ pip install sigurd
```

## Origins of the data

### Völsunga saga

The Old Norse texts come from https://github.com/clemsciences/old_norse_corpus and ultimately from https://heimskringla.no/wiki/Main_Page.

### Referenzkorpus Mittelhochdeutsch

#### Source and license

[Main page of the project](https://www.linguistics.rub.de/rem/access/index.html)

> Klein, Thomas; Wegera, Klaus-Peter; Dipper, Stefanie; Wich-Reif, Claudia (2016). Referenzkorpus Mittelhochdeutsch (1050–1350), Version 1.0, https://www.linguistics.ruhr-uni-bochum.de/rem/. ISLRN 332-536-136-099-5.

License:

> The Referenzkorpus Mittelhochdeutsch is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).

The corpus itself is not modified. This code only parses it.

#### Corpus retrieval

1. Go to https://www.linguistics.rub.de/rem/access/index.html.
2. Click on "CORA-XML ALS .TAR.XZ" or "CORA-XML ALS .ZIP".
3. Click on "Herunterladen" (download).
4. Uncompress the downloaded file.
5. You now have a folder named **rem-corraled-20161222** (as of 2019-09-18) containing XML files, each of which is an annotated text.
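
Steps 4 and 5 can also be done from Python. The following is only a minimal sketch and not part of the **sigurd** API: the function names and the archive file name are hypothetical, it uses the standard library (`shutil`, `xml.etree.ElementTree`) rather than `lxml`, and it assumes nothing about the CorA-XML schema beyond the files being well-formed XML.

```python
import shutil
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def unpack_corpus(archive_path, destination):
    """Unpack a downloaded .tar.xz or .zip archive into `destination`."""
    # shutil picks the archive format from the file extension.
    shutil.unpack_archive(str(archive_path), str(destination))
    return Path(destination)


def list_xml_files(corpus_dir):
    """Return all XML files found under `corpus_dir`, sorted by path."""
    return sorted(Path(corpus_dir).rglob("*.xml"))


def count_tags(xml_path):
    """Count how often each element tag occurs in one XML file."""
    tree = ET.parse(str(xml_path))
    return Counter(element.tag for element in tree.iter())


if __name__ == "__main__":
    # Hypothetical file name: use the name of the archive you downloaded.
    archive = Path("rem-corraled-20161222.tar.xz")
    if archive.exists():
        corpus_dir = unpack_corpus(archive, "rem_corpus")
        xml_files = list_xml_files(corpus_dir)
        print(len(xml_files), "XML files found")
        if xml_files:
            # Inspect which element tags the first text uses.
            print(count_tags(xml_files[0]).most_common(5))
```

Counting tags in one file is a quick way to see which elements the annotated texts contain before writing a more specific parser.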



