Metadata-Version: 2.1
Name: skipgrammar
Version: 0.1.2
Summary: A framework for representing sequences as embeddings.
Home-page: https://github.com/eifuentes/skipgrammar
License: MIT
Keywords: skipgram,word2vec,sequences,deep-learning,machine-learning
Author: Emmanuel Fuentes
Author-email: emmanuel.i.fuentes+pypi@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: numpy (>=1.17.2,<2.0.0)
Requires-Dist: pandas (>=1.1,<2.0)
Requires-Dist: pytorch-lightning (>=1.6,<1.7)
Requires-Dist: torch (>=1.8.2,<2.0.0)
Project-URL: Documentation, https://github.com/eifuentes/skipgrammar
Project-URL: Repository, https://github.com/eifuentes/skipgrammar
Description-Content-Type: text/markdown

# Skip-Grammar
A framework for representing sequences as embeddings.

## Models

### Skip-gram Negative Sampling (SGNS)

Popular natural language processing models such as `word2vec` and `bert` can be repurposed to learn relationships from arbitrary sequences of items. **Skip-gram Negative Sampling** (SGNS), the training scheme popularized by `word2vec`, is one such algorithm and is part of the `models` module. It is implemented as plain PyTorch components and can also be composed as a PyTorch Lightning module. Both are available under their respective namespaces, `skipgrammar.models.sgns` and `skipgrammar.models.lighting.sgns`.
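
To make the idea concrete, here is a minimal, dependency-free sketch of the two ingredients SGNS relies on: turning a sequence into (center, context) pairs and scoring one pair against sampled negatives. It is an illustration only and not this package's API. The function names `skipgram_pairs`, `sample_negatives`, and `sgns_loss` are hypothetical, and negatives are drawn uniformly here for simplicity, whereas `word2vec` draws them from a smoothed unigram distribution.

```python
import math
import random
from typing import Hashable, List, Sequence, Tuple


def skipgram_pairs(
    sequence: Sequence[Hashable], window: int = 2
) -> List[Tuple[Hashable, Hashable]]:
    """Return (center, context) pairs for items within `window` positions."""
    pairs = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def sample_negatives(
    vocab: Sequence[Hashable], exclude: set, k: int, rng: random.Random
) -> List[Hashable]:
    """Draw k items uniformly from vocab, skipping items in `exclude`.

    Uniform sampling is a simplifying assumption made for this sketch.
    """
    candidates = [item for item in vocab if item not in exclude]
    return [rng.choice(candidates) for _ in range(k)]


def _log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_loss(
    center_vec: Sequence[float],
    context_vec: Sequence[float],
    negative_vecs: Sequence[Sequence[float]],
) -> float:
    """Negative-sampling loss for one (center, context) pair.

    loss = -log sigmoid(c . o) - sum over negatives of log sigmoid(-c . n)
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    loss = -_log_sigmoid(dot(center_vec, context_vec))
    for neg in negative_vecs:
        loss -= _log_sigmoid(-dot(center_vec, neg))
    return loss


# Usage: a toy listening session treated as a "sentence" of items.
listens = ["artist_a", "artist_b", "artist_c", "artist_d"]
pairs = skipgram_pairs(listens, window=1)
negatives = sample_negatives(
    listens, exclude={"artist_a", "artist_b"}, k=3, rng=random.Random(0)
)
```

In a real model the vectors passed to `sgns_loss` would be rows of two learned embedding tables (one for center items, one for context items), and the loss would be averaged over a batch of pairs.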

## Datasets

### Last.FM

The [Last.FM Dataset-1K](http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html) contains the listening history of approximately 1,000 users of the music service [Last.FM](https://www.last.fm/). The dataset is available from the project's main site [here](http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html) and, preprocessed for ease of use, [here](https://github.com/eifuentes/lastfm-dataset-1K). The variants in the `dataset` module use the latter.

### MovieLens

The popular recommendation system dataset [MovieLens](https://grouplens.org/datasets/movielens/) is available in three variants via the `dataset` module.

