Metadata-Version: 2.1
Name: carkov
Version: 0.1.2
Summary: A markov chainer library
Home-page: https://github.com/aldercone/carkov
Author: Aldercone Studio
Author-email: alderconestudio@gmail.com
License: BSD
Description: # carkov #
        
        This is a library for creating and walking simple markov chains. It is
        meant for things like text generators (such as ebooks bots and word
        generators) and thus is not 'mathematically correct'. It has some
        tools for doing text analysis, but more are planned for the future
        (stubs exist to illustrate some plans; see TODO.md).
        
        ## Command line interface ##
        
        This library includes a command line interface for analyzing text and
        then walking the resulting chain to generate text from the analysis.
        
        To analyze a corpus of text files:
        
        `carkov analyze mychain.chain textfile1.txt textfile2.txt ... textfileN.txt`
        
        To walk a chain and generate text from it:
        
        `carkov chain mychain.chain -c 10`
        
        There are two analysis modes currently supported, `english` and
        `word`, which are passed to the `analyze` command with the `-m`
        argument. `english` mode analyzes the input word by word: the
        input is segmented into (English-style) sentences, each of which is
        analyzed as a separate chain of words. `word` mode segments the input
        into tokens, each of which is analyzed separately as a series of
        characters.
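        
        As a rough illustration of the difference between the two modes, the
        following sketch segments the same kind of input both ways. It is not
        the library's own code: `segment_english` and `segment_word` are
        hypothetical names, and the sentence splitting is a deliberately
        naive regular expression.
        
        ```python
        import re
        
        
        def segment_english(text):
            """Split text into sentences, then each sentence into words."""
            # Naive assumption: a sentence ends at '.', '!' or '?' followed
            # by whitespace.
            sentences = re.split(r"(?<=[.!?])\s+", text.strip())
            return [sentence.split() for sentence in sentences if sentence]
        
        
        def segment_word(text):
            """Split text into tokens, then each token into characters."""
            return [list(token) for token in text.split()]
        
        
        print(segment_english("Hello there. How are you?"))
        print(segment_word("foo bar"))
        ```
        
        Each inner list is one sequence to be chained: a sentence of words in
        the first case, a word of characters in the second.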
        
        Analysis also allows a window size to be specified, so that each item
        in the chain may be a fixed series of items of a specific length (for
        example, the word `foo` with a window of 2 would analyze to
        `(_, _) -> f`, `(_, f) -> o`, `(f, o) -> o`, etc.). The wider the
        window, the more similar or identical to the input stream the output
        becomes, since there are fewer total options to follow any given
        token. This is specified on the analysis command line with the `-w`
        argument.
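        
        The windowing idea can be sketched in a few lines of plain Python.
        This is only an illustration under assumptions, not the library's
        implementation: the `analyze` function, the `PAD` placeholder and the
        dictionary-of-counters layout are all made up for the example.
        
        ```python
        from collections import Counter, defaultdict
        
        PAD = "_"  # hypothetical placeholder for "before the start of input"
        
        
        def analyze(sequence, window=2):
            """Count which item follows each window-sized tuple of items."""
            chain = defaultdict(Counter)
            state = (PAD,) * window
            for item in sequence:
                chain[state][item] += 1
                # Slide the window: drop the oldest item, append the new one.
                state = state[1:] + (item,)
            return chain
        
        
        chain = analyze("foo", window=2)
        for state, followers in chain.items():
            print(state, "->", dict(followers))
        ```
        
        With a wider window there are more distinct states, each seen less
        often, which is why the output tends to reproduce the input.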
        
        ## About the library ##
        
        The library itself exposes objects and interfaces to do the same as
        the command line above. A todo item on this project is to generate
        documentation and examples, but looking at the contents of
        `__main__.py` should be instructive. The library is written to be
        largely agnostic about the items that are chained, so hypothetically
        any sequence of things could work. Chaining non-textual data should be
        possible, but some framework would have to be written to support
        displaying such items.
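        
        To make the point about non-textual items concrete, here is a small
        self-contained sketch that chains `(pitch, duration)` tuples instead
        of words. It does not use the library's API; `build`, `walk`, `START`
        and `END` are hypothetical names, and the only assumption about the
        items is that they are hashable.
        
        ```python
        import random
        from collections import Counter, defaultdict
        
        START = object()  # hypothetical marker: state before the first item
        END = object()    # hypothetical marker: follower that ends a walk
        
        
        def build(sequences):
            """Count, for every item, which items follow it (window of 1)."""
            chain = defaultdict(Counter)
            for sequence in sequences:
                previous = START
                for item in sequence:
                    chain[previous][item] += 1
                    previous = item
                chain[previous][END] += 1
            return chain
        
        
        def walk(chain, rng, max_items=20):
            """Follow the chain from START, picking followers by weight."""
            output = []
            state = START
            while len(output) < max_items:
                followers = chain.get(state)
                if not followers:
                    break  # nothing was ever observed after this state
                choices = list(followers)
                weights = list(followers.values())
                item = rng.choices(choices, weights=weights)[0]
                if item is END:
                    break
                output.append(item)
                state = item
            return output
        
        
        # Items are (MIDI pitch, duration) tuples rather than text.
        melodies = [[(60, 1), (62, 1), (64, 2)], [(60, 1), (64, 2)]]
        chain = build(melodies)
        print(walk(chain, random.Random(0)))
        ```
        
        Nothing here depends on the items being strings; only the final
        `print` would need replacing to present such data in a useful way.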
        
        The library also provides a few mechanisms for serializing a
        ready-to-use chain for reuse in other projects. The command line makes
        use of the binary serialization mechanism (which uses `msgpack`) to
        save chains from the analysis step for reuse in the chain step. There
        is also a mechanism which produces a Python source file that can be
        embedded in a target project, so that a Python project can use the
        chain without having to include an extra data file. Note that this is
        extremely inefficient for large chains.
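        
        The idea of embedding a chain as Python source can be sketched with
        the standard library alone. This is a guess at the general technique
        rather than a description of the library's serializer:
        `chain_to_source` and the `CHAIN` variable name are hypothetical, and
        the sketch assumes the chain is a plain dict made only of literal
        types (tuples, strings, integers).
        
        ```python
        import pprint
        
        
        def chain_to_source(chain, name="CHAIN"):
            """Render a dict-based chain as the text of a Python module."""
            literal = pprint.pformat(chain)
            return f"# Generated file, do not edit by hand.\n{name} = {literal}\n"
        
        
        chain = {
            ("_", "_"): {"f": 1},
            ("_", "f"): {"o": 1},
            ("f", "o"): {"o": 1},
        }
        source = chain_to_source(chain)
        print(source)
        
        # Round trip: executing the generated source recreates the chain.
        namespace = {}
        exec(source, namespace)
        assert namespace["CHAIN"] == chain
        ```
        
        The generated text can be written to a `.py` file and imported like
        any other module. Because the whole chain is spelled out as a literal,
        file size and import time grow with the chain.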
        
Keywords: text,markov,ebooks,chainer,generator,generative
Platform: any
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Artistic Software
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/markdown
