Metadata-Version: 2.1
Name: blockdivision
Version: 0.1.0
Summary: 
Author: Batyrkhan
Author-email: batirkhangainitdinov@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: pandas (>=2.0.1,<3.0.0)
Requires-Dist: sentence-transformers (>=2.2.2,<3.0.0)
Requires-Dist: sklearn (>=0.0.post5,<0.1)
Requires-Dist: torch (>=2.0.1,<3.0.0)
Requires-Dist: transformers[torch] (>=4.29.2,<5.0.0)
Description-Content-Type: text/markdown

TimeCoder
---------

timecoder.py - a pipeline that divides uploaded subtitles into blocks based on a cosine similarity threshold

The script implements two approaches:
1) summarize the subtitles first, then calculate cosine similarity
2) calculate cosine similarity first, divide the subtitles into blocks, then summarize each block
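
The block-division step can be sketched roughly as follows. This is a minimal illustration and not the code from timecoder.py: the function names `cosine_similarity` and `split_into_blocks`, the default threshold of 0.5, and the choice to compare each sentence only with the one before it are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_into_blocks(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Group sentence indices into blocks.

    Assumption for this sketch: a new block starts whenever the similarity
    between a sentence and the previous sentence drops below `threshold`.
    """
    if not embeddings:
        return []
    blocks = [[0]]
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            blocks.append([i])  # topic shift: open a new block
        else:
            blocks[-1].append(i)  # same topic: extend the current block
    return blocks


# Toy 2-D "embeddings": the first two point one way, the last two another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
blocks = split_into_blocks(toy_embeddings, threshold=0.5)
print(blocks)
```

With real subtitles the vectors would come from the sentence similarity model, and the threshold would need tuning on actual data.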

parse_subs.py - a parser that converts YouTube subtitles to a pd.DataFrame
sentence_similarity.py - a script that calculates cosine similarity
gpt_shortening.py - a script that performs summarization

Several models were compared for summarization and sentence similarity. Summarization currently uses "IlyaGusev/mbart_ru_sum_gazeta". Sentence similarity uses 'symanto/sn-xlm-roberta-base-snli-mnli-anli-xnli'.
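
As a rough illustration, the sentence similarity model can be loaded through the sentence-transformers package listed in the requirements. The helper name `pairwise_similarity` is hypothetical and is not taken from sentence_similarity.py. The model weights are downloaded on first use.

```python
from typing import List

# Model name taken from the text above.
SIMILARITY_MODEL = "symanto/sn-xlm-roberta-base-snli-mnli-anli-xnli"


def pairwise_similarity(sentences: List[str], model_name: str = SIMILARITY_MODEL):
    """Return a matrix of cosine similarities between all pairs of sentences.

    Hypothetical helper for illustration only. The import is done lazily so
    that the heavy dependency is loaded only when the function is called.
    """
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences, convert_to_tensor=True)
    # util.cos_sim returns a tensor of shape (len(sentences), len(sentences)).
    return util.cos_sim(embeddings, embeddings)
```

Calling `pairwise_similarity(["first subtitle line", "second subtitle line"])` would return a 2x2 tensor whose off-diagonal entries give the similarity between the two lines.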


