Metadata-Version: 1.0
Name: recluse
Version: 0.1.6
Summary: Reproducible Experimentation for Computational Linguistics Use
Home-page: https://github.com/lamber/recluse
Author: L. Amber Wilcox-O'Hearn
Author-email: amber@cs.toronto.edu
License: COPYING
Description: Recluse
        
        Author: L. Amber Wilcox-O'Hearn
        
        Contact: amber@cs.toronto.edu
        
        Released under the GNU Affero General Public License; see the COPYING file for details.
        
        ==============
        Introduction
        ==============
        
        Recluse (Reproducible Experimentation for Computational Linguistics Use) is a set of tools for running computational linguistics experiments reproducibly.
        
        This version contains:
        
        * utils, which provides a function for reading and writing Unicode in regular or compressed text files.
        * article_randomiser, which randomly but reproducibly divides a corpus into training, development, and test sets.
        * nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation.
          It is optimised for Wikipedia-style text and has a mode that preserves the untokenised text (modulo extra whitespace).
        * vocabulary_generator and its helper class vocabulary_cutter, which wrap SRILM to make unigram counts and then select the most frequent words.
        
Platform: UNKNOWN
