Metadata-Version: 2.1
Name: recbox
Version: 0.0.2
Summary: A configurable, tunable, and reproducible library for candidate item matching
Home-page: https://github.com/xue-pai/MatchBox
Author: zhujiem
Author-email: zhujiem@users.noreply.github.com
License: Apache-2.0 License
Download-URL: https://github.com/xue-pai/MatchBox/tags
Keywords: recommender systems,candidate item matching,collaborative filtering,two-tower models,pytorch
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: h5py
Requires-Dist: PyYAML (<5.0)
Requires-Dist: scikit-learn
Requires-Dist: tqdm

# MatchBox

Industrial recommender systems typically have two main phases: matching and ranking. In the first phase, candidate item matching (also known as candidate retrieval) aims for efficient, high-recall retrieval from a large item corpus. MatchBox is an open-source library for candidate item matching, designed for configurability, tunability, and reproducibility.
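To make the matching phase concrete, here is a minimal sketch of embedding-based retrieval and the recall@k metric. It assumes users and items are already embedded as vectors and scored by dot product; the function names `retrieve_top_k` and `recall_at_k` are hypothetical illustrations and not part of the MatchBox API.

```python
import numpy as np


def retrieve_top_k(user_vec, item_matrix, k):
    """Return the indices of the k highest-scoring items, best first."""
    scores = item_matrix @ user_vec          # one dot-product score per item
    k = min(k, scores.shape[0])
    # argpartition selects the top-k without sorting the whole corpus
    top = np.argpartition(-scores, k - 1)[:k]
    # sort only the k selected candidates by descending score
    return top[np.argsort(-scores[top])]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved ones."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved if int(item) in relevant)
    return hits / len(relevant)


# Toy corpus of four items in a 2-d embedding space (made-up numbers)
items = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
user = np.array([1.0, 0.0])

candidates = retrieve_top_k(user, items, k=2)
print(candidates, recall_at_k(candidates, relevant=[0, 1]))
```

In a real system the exhaustive scoring step would usually be replaced by an approximate nearest-neighbor index, since the item corpus is large.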


## Model Zoo

| Publication | Model          | Paper                                    | Benchmark | 
|:-----------:|:--------------:|:----------------------------------------------------------------- |:-------------:|
| UAI'09      | [MF-BPR](./model_zoo/MF)         | [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf)                            | [:arrow_upper_right:](./model_zoo/MF/config) |
| RecSys'16   | [YoutubeNet](./model_zoo/YoutubeNet)        | [Deep Neural Networks for YouTube Recommendations](https://dl.acm.org/doi/10.1145/2959100.2959190)                                            | [:arrow_upper_right:](./model_zoo/YouTubeNet/config) |
| CIKM'21     | [MF-CCL](./model_zoo/MF) / [SimpleX](./model_zoo/SimpleX)  | [SimpleX: A Simple and Strong Baseline for Collaborative Filtering](https://arxiv.org/pdf/2109.12613.pdf) | [:arrow_upper_right:](./model_zoo/SimpleX/config) |


## Dependency

We recommend the following environment, which is the only one MatchBox has been tested on.

+ python 3.6.x
+ torch 1.0.x
+ PyYAML<5.0
+ pandas
+ scikit-learn
+ numpy
+ h5py
+ tqdm
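
The pinned versions above can be checked programmatically. The following is a minimal sketch, assuming the caller supplies version strings (for example from `platform.python_version()`, `torch.__version__`, and `yaml.__version__`); the helpers `version_tuple` and `check_environment` are hypothetical and not shipped with MatchBox.

```python
import re


def version_tuple(version):
    """Parse the leading numeric parts of a version string, e.g. '1.0.1+cu100' -> (1, 0, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(python_version, torch_version, yaml_version):
    """Return a list of human-readable mismatches against the tested environment."""
    problems = []
    if version_tuple(python_version)[:2] != (3, 6):
        problems.append("python %s is not 3.6.x" % python_version)
    if version_tuple(torch_version)[:2] != (1, 0):
        problems.append("torch %s is not 1.0.x" % torch_version)
    # PyYAML must be strictly below 5.0, so any major version >= 5 is a mismatch
    if version_tuple(yaml_version)[:1] >= (5,):
        problems.append("PyYAML %s is not < 5.0" % yaml_version)
    return problems


# Example with hard-coded version strings (illustrative values only)
for problem in check_environment("3.8.10", "1.0.1", "5.4.1"):
    print(problem)
```

An empty list means the supplied versions match the tested environment; other versions may still work but are untested.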


## Get Started

The code workflow is structured as follows:

```python
# Set the data config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params

# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
datasets.build_dataset(feature_encoder, ...) # fit feature_encoder and build dataset 

# Load data generators
train_gen, valid_gen, test_gen = h5_generator(feature_encoder, ...)

# Define a model
model = SimpleX(...)

# Train the model
model.fit(train_gen, valid_gen, ...)

# Evaluation
model.evaluate(test_gen)
```

### Run the benchmark

To reproduce the experimental results, run the benchmarking script with the corresponding config. It accepts the following arguments:

+ --config: The config directory containing the dataset config and the model config.
+ --expid: The experiment id defined in a model config file, which denotes a specific hyper-parameter setting.
+ --gpu: The GPU index to use for the experiment; use -1 to run on CPU.
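
As an illustration of how these three flags could be handled, here is a minimal `argparse` sketch. It is not the actual `run_expid.py`: the function names `build_parser` and `resolve_device`, the `required` settings, and the default of `-1` for `--gpu` are all assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser for the three flags described above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Sketch of a benchmark launcher")
    parser.add_argument("--config", type=str, required=True,
                        help="directory holding the dataset and model configs")
    parser.add_argument("--expid", type=str, required=True,
                        help="experiment id defined in the model config file")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU index, or -1 for CPU (default is an assumption)")
    return parser


def resolve_device(gpu_index):
    """Map a GPU index to a PyTorch-style device string; negative means CPU."""
    return "cpu" if gpu_index < 0 else "cuda:{}".format(gpu_index)


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--config", "./config/SimpleX_yelp18_m1", "--expid", "SimpleX_yelp18_m1", "--gpu", "0"]
)
print(args.expid, resolve_device(args.gpu))
```

The resulting device string can be passed to `torch.device(...)` when PyTorch is installed.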

```bash
cd model_zoo/SimpleX
python run_expid.py --config ./config/SimpleX_yelp18_m1 --expid SimpleX_yelp18_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_amazonbooks_m1 --expid SimpleX_amazonbooks_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_gowalla_m1 --expid SimpleX_gowalla_m1 --gpu 0
```

The running logs are also available in each config directory.



