Metadata-Version: 1.1
Name: lachesis
Version: 0.0.1.0
Summary: lachesis automates the segmentation of a transcript into closed captions
Home-page: https://github.com/readbeyond/lachesis
Author: Alberto Pettarin
Author-email: alberto@albertopettarin.it
License: GNU Affero General Public License v3 (AGPL v3)
Description: lachesis
        ========
        
        **lachesis** automates the segmentation of a transcript into closed
        captions.
        
        -  Version: 0.0.1
        -  Date: 2017-01-18
        -  Developed by: `Alberto Pettarin <http://www.albertopettarin.it/>`__
        -  License: the GNU Affero General Public License Version 3 (AGPL v3)
        -  Contact: info@readbeyond.it
        
        Goal
        ----
        
        TBW
        
        Installation
        ------------
        
        .. code:: bash
        
            pip install lachesis
        
        TODO: add directions about installing model files and Python NLP
        libraries.
        
        Usage
        -----
        
        Tokenize, split sentences, and tag parts of speech (POS):
        
        .. code:: python
        
            from lachesis.elements import Text
            from lachesis.nlpwrappers import NLPEngine
        
            # work on this Unicode string
            s = u"Hello, World. This is a second sentence, with a comma too! And a third sentence."
        
            # but you can also pass a list with pre-split text
            # s = [u"Hello World.", u"This is a second sentence.", u"Third one, bla bla"]
        
            # create a Text object from the Unicode string
            t = Text(s, language=u"eng")
        
            # tokenize, split sentences, and tag parts of speech (POS)
            # the best NLP library will be chosen,
            # depending on the language of the text
            nlp1 = NLPEngine()
            nlp1.analyze(t)
            # use a distinct loop variable so the input string s is not shadowed
            for sentence in t.sentences:
                print(sentence)
                print(sentence.tagged_string)
        
            # explicitly specify an NLP library
            # in this case, use "nltk"
            # (other options include: "pattern", "spacy", "udpipe")
            nlp2 = NLPEngine()
            nlp2.analyze(t, wrapper="nltk")
            ...
        
            # preload NLP libraries
            nlp3 = NLPEngine(preload=[
                ("eng", "spacy"),
                ("deu", "nltk"),
                ("ita", "pattern"),
                ("fra", "udpipe")
            ])
            nlp3.analyze(t)
            ...
        
        Download closed captions from YouTube or parse an existing TTML file:
        
        .. code:: python
        
            from lachesis.downloaders import Downloader
        
            # URL of the video
            url = u"http://www.youtube.com/watch?v=NSL_xx2Qnyc"
        
            # download English automatic CC, storing the raw TTML file in /tmp/
            language = u"en"
            options = {"auto": True, "output_file_path": "/tmp/auto.ttml"}
            ccl = Downloader.download_closed_captions(url, language, options)
            print(ccl)
        
            # download English manual CC
            language = u"en"
            options = {"auto": False}
            ccl = Downloader.download_closed_captions(url, language, options)
            print(ccl)
        
            # parse a given TTML file (downloaded from YouTube)
            ifp = "/tmp/auto.ttml"
            ccl = Downloader.read_closed_captions(ifp, options={u"downloader": u"youtube"})
        
            # get various representations of the CCs
            print(ccl.single_string)        # print as a single string, collapsing CCs and lines
            print(ccl.plain_string)         # print as a plain string, one CC per row and collapsed lines
            print(ccl.cc_string)            # print as blank-separated, multiple line, SRT-like string
                                            # (but without timings and ids)
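        
        As a rough illustration of how the three representations differ, here
        is a minimal, library-independent sketch. The nested-list data layout,
        the helper function names, and the exact separators (space, newline,
        blank line) are assumptions made for this example; they are not taken
        from the lachesis API and may not match its actual output.
        
        .. code:: python
        
            # hypothetical data: each closed caption (CC) is a list of lines
            # (this is NOT the lachesis data model, just an illustration)
            captions = [
                ["Hello, World.", "This is a second sentence,"],
                ["with a comma too!"],
                ["And a third sentence."],
            ]
        
            def single_string(ccs):
                """Collapse all CCs and all lines into one string."""
                return " ".join(line for cc in ccs for line in cc)
        
            def plain_string(ccs):
                """One CC per row, with the lines of each CC collapsed."""
                return "\n".join(" ".join(cc) for cc in ccs)
        
            def cc_string(ccs):
                """Keep line breaks, separate CCs with a blank line (no timings or ids)."""
                return "\n\n".join("\n".join(cc) for cc in ccs)
        
            print(single_string(captions))
            print(plain_string(captions))
            print(cc_string(captions))
        
        Under these assumptions, the first form is convenient for feeding the
        whole transcript to an NLP engine, while the last one is the closest to
        what a viewer would see on screen.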
        
        License
        -------
        
        **lachesis** is released under the terms of the GNU Affero General
        Public License Version 3. See the `LICENSE <LICENSE>`__ file for
        details.
        
Keywords: ReadBeyond Sync,ReadBeyond,SBV,SRT,SSV,SUB,TSV,TTML,VTT,aeneas,captioning,captions,closed captions,forced alignment,lachesis,media overlay,speech to text,subtitles,sync,synchronization,transcript,video captions
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: C
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Education
Classifier: Topic :: Multimedia
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Printing
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Topic :: Utilities
