Metadata-Version: 2.1
Name: grape-pi
Version: 1.0.1
Summary: Using graph neural networks to enhance protein identification using protein interaction networks.
Author: Chunhui Gu
Author-email: fduguchunhui@gmail.com
Requires-Python: >=3.11.0,<=3.12.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Provides-Extra: notebooks
Requires-Dist: chardet (>=5.2.0,<6.0.0)
Requires-Dist: gseapy (>=1.1.3,<2.0.0) ; extra == "notebooks"
Requires-Dist: h5py (>=3.11.0,<4.0.0)
Requires-Dist: hydra-core (>=1.3.2,<2.0.0)
Requires-Dist: jupyter (>=1.1.1,<2.0.0) ; extra == "notebooks"
Requires-Dist: matplotlib (>=3.9.0,<4.0.0) ; extra == "notebooks"
Requires-Dist: matplotlib-venn (>=1.1.1,<2.0.0) ; extra == "notebooks"
Requires-Dist: numpy (>=1.26.4,<2.0.0)
Requires-Dist: ogb (>=1.3.6,<2.0.0)
Requires-Dist: overrides (>=7.7.0,<8.0.0)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: psutil (>=5.9.8,<6.0.0)
Requires-Dist: pyparsing (>=3.1.2,<4.0.0)
Requires-Dist: pytorch-lightning (>=2.2.5,<3.0.0)
Requires-Dist: qtconsole (>=5.6.1,<6.0.0) ; extra == "notebooks"
Requires-Dist: seaborn (==0.12.0) ; extra == "notebooks"
Requires-Dist: tensorboardx (>=2.6.2.2,<3.0.0.0)
Requires-Dist: torch (==2.4.0)
Requires-Dist: torch-geometric (>=2.6.1,<3.0.0)
Requires-Dist: yacs (>=0.1.8,<0.2.0)
Description-Content-Type: text/markdown

# Welcome to GrapePi: Graph neural network using Protein-protein-interaction for Enhancing Protein Identification

<p align="center">
  <img src="figures/grape-pie.png" />
</p>

# Introduction

GRAph-neural-network using Protein-protein-interaction for Enhancing Protein Identification (Grape-Pi) is a deep
learning framework for predict protein existence based on
protein feature generated from Mass spectrometry (MS) instrument/analysis software and protein-protein-interaction (PPI)
network.

The main idea is to promote proteins with medium evidence but are supported by protein-protein-interaction information
as existent. Unlike traditional network analysis, PPI information is used with strong assumptions and restricted to
specific sub-network structures (e.g. clique), Grape-Pi model is a fully data-driven model and can be much more
versatile.

--------------------------------------------------------------------------------

The contribution of Grape-Pi comes in threefold. First, we developed 
a dataloader module designed for loading MS protein data and 
protein-protein-interaction data into dataset format that can be readily
used by torch-geometry. Second, we customized the graphgym module
for the purpose of supervised learning in proteomics data. Third, we
explored the design space and discussed caveats for training such a 
model for the best performance.


# installation
**GrapePi** is built on top of the **PyTorch Geometric**, a geometric 
deep learning extension library for PyTorch. 
It consists of various methods for deep learning on graphs and
other irregular structures, also known as *[geometric deep learning](http://geometricdeeplearning.com/)*, from a 
variety of published papers.
In addition, it consists of easy-to-use mini-batch loaders for operating on many small and single giant
graphs, [multi GPU-support](https://github.com/pyg-team/pytorch_geometric/tree/master/examples/multi_gpu), [`DataPipe` support](https://github.com/pyg-team/pytorch_geometric/blob/master/examples/datapipe.py),
distributed graph learning via [Quiver](https://github.com/pyg-team/pytorch_geometric/tree/master/examples/quiver), a
large number of common benchmark datasets (based on simple interfaces to create your own),
the [GraphGym](https://pytorch-geometric.readthedocs.io/en/latest/advanced/graphgym.html) experiment manager, and
helpful transforms, both for learning on arbitrary graphs as well as on 3D meshes or point clouds.

## Create a virtual environment
It is highly recommended to use a virtual environment to install the required packages for Grape-Pi.
Please refer to https://conda.io/projects/conda/en/latest/user-guide/install/index.html for how to install a miniconda
or anaconda in your local machine.

To create a virtual environment, for example, if using conda
```angular2html
conda create -n [Name-of-the-Virtual-Environment] python=3.11
conda activate [Name-of-the-Virtual-Environment]
```
Replace `[Name-of-the-Virtual-Environment]` with your preferred name.

##  Get a copy of the project
Clone a copy of Grape-Pi from source and navigate to the root directory of the download folder. See [Cloning a repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) for more details about how to make a local copy of project.

```angular2html
git clone https://github.com/FDUguchunhui/GrapePi.git
cd GrapePi
```

## Install dependencies
We recommend the installation through Poetry using`pyproject.toml` which a new standard file for declare dependencies and build system information for Python projects. You can also use poetry to install the dependencies. Poetry is recommended to be installed in the global environment. See more at [Poetry](https://python-poetry.org/docs/)
```
cd GrapePi # navigate to the root directory of the project
poetry install
```

Some package may not be installed correctly, you may need to install them manually, including torch, torch_geometric,
and others packages that torch and torch_geometric depends. Those packages depend on the version of python and 
pytorch, architecture of the system, and operating system. Follow the instruction below to install the required 
packages based on your environment.

The original torch-geometry support python 3.7-3.11 with pytorch 1.3.0-2.0.0. For illustration purpose, we use python=3.10 and pytorch=2.0.0 here.
For using and debugging with other python and pytorch version, please refer
to https://github.com/pyg-team/pytorch_geometric for details


### Install pytorch

For mac
```angular2html
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 -c pytorch
```

For Linux and Windows:

```angular2html
# CUDA 11.8
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 12.1
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
# CPU Only
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 cpuonly -c pytorch
```

Follow the installation instructions to install additional libraries to using Grape-Pi:

torch-scatter, torch-sparse, torch-cluster and torch-spline-conv (if you haven't already). Replace {CUDA} with your 
specific CUDA version or cpu.
```angular2html
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+${CUDA}.html
```

If you are using pytorch=2.1.0 cpu only version, run
```angular2html
pip install torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cpu.html
```



## Quick start

### Use pre-trained model to make prediction
```angular2html
grapepi --cfg data/gastric_all_data/gastric-graphsage.yaml --checkpoint saved_results/gastric-graphsage/epoch=199-step=4800.ckpt --threshold 0.9 --num-promoted 100
```
The above command will initialize the model that is defined in provided configuration files and  load the trained model from the checkpoint 
file, make prediction on the unconfident proteins, 
which are defined as proteins with raw protein probability below 0.9 and without any groud-truth label, and promote 
100 proteins with the highest prediction protein probability. The prediction result "promoted_protein.csv" will be 
saved in the root directory of the provided data path. You can change the **dataset.dir** in the configuration file 
to the path of your own data to make prediction on your own data. 

You can also overwrite data path by providing 
additional argument in key and value format sequentially after the command. For example,
```angular2html
grapepi --cfg data/gastric_all_data/gastric-graphsage.yaml --checkpoint saved_results/gastric-graphsage/epoch=199-step=11800.ckpt --threshold 0.9 --num-promoted 100 --output {YOUR_DATA_PATH}
```
Replace `{YOUR_DATA_PATH}` with the path of your own data. Using this approach you can override any key-value pair 
related to model training in configuration file without changing the configuration file.

For the best performance, it is recommended to train a new model with your own data. To train a new model, you just 
need to run the same command without the `--checkpoint` argument.
You can use the following command to train a new model with your own data. It will use predefined configuration file,
which use "sageConv" as the 
message-passing layer and hyperparameters that have been optimized for this task. Once the training is done, it will 
be saved in the `results` folder by default. Go into the `results` folder, you can find a subfolder name after the 
configuration file name, the ckpt file will be saved in this subfolder under "{random_seed}/ckpt" folder, where 
{random_seed} is the random seed used for this training. 
```angular2html
### Train a new model with given hyper-parameters options
grapepi --cfg data/gastric_all_data/gastric-graphsage.yaml --threshold 0.9 --num-promoted 100
```
See more details about the format of the `protein` and `protein interaction`. You can also change the 
hyperparameters in the configuration file to optimize the model for your own data once you have a better 
understanding about what each hyperparameter does.

## GrapePi training framework
![GrapePi](figures/flowchart.jpg)
The complete GrapePi training framework consists of three main components: data preparation, hyperparameter training, 
and post-training analysis.

### Data preparation
The GraphPi training framework was based on graphgym. Please check `Design Space for Graph Neural Networks
https://arxiv.org/abs/2011.08843` for details.

The GraphPi framework features a built-in module for easily load raw 
protein and PPI data into torch.geometry.dataset
format that can be used for training model. The only things needed is to 
provide a path to the dataset folder.

The dataset folder structure should look like this, with a sub-folder 
named `raw` inside the `raw` sub-folder, it should have three
sub-folders: protein, interaction, and reference (optional).

The  `protein`  folder must contain exact only one **csv** file: 
the first column and second column must be protein ID and raw protein probability, and other columns can contain additional protein features.

The `interaction` folder must contain exact only one **csv** file: the first 
and second columns must be same type protein ID (e.g. Uniprot 
accession number or gene symbol) refer to the two protein interactors,
other columns can be additional features for the interaction relationship. The ID system used should be matched with 
the protein ID in the `protein` folder.

To provide ground-truth labels for the protein existence in the sample, you need to save the ground-true label information in the `protein` csv file and specify the column name of the ground-true label through `dataset.label_col` in config file.  Only labeled proteins will be used for model training, loss calculation, and then 
backward propagation to update model.

### Set up model configuration
Before you can train a model based on the provided data, a configuration file is needed to for some key information:
where to find the data, how to parse the data, what features to use, model structure, what loss function to use, how to
update model weights, etc.

You could find such an example of such config `Graph-Pi/graphgym/config/protein/protein-yeast-graphsage.yaml`

```ymal
# The recommended basic settings for GNN
# Select device: 'cpu', 'cuda', 'auto'
accelerator: 'auto'
  # number of devices: eg. for 2 GPU set cfg.devices=2
devices: 1
# the number of workers for data loader. Adjust according to your machine.
num_workers: 8
# the output directory for the results. Default is "results" folder in your root directory.
out_dir: results
# the metric used for select the best model. Default is "auc". Other options: accuracy, precision, recall, f1
metric_best: auc
# the random seed for reproducibility. It is used across the project, including data split, model initialization, etc.
seed: 1234
dataset:
  name: protein # dataset type. It should be keep unchanged
  # the directory of the dataset.
  dir: data/gastric_all_data
  # whether to rebuild the dataset. If true, the dataset will be reprocessed based on provided configuration in dataset section.
  # If false, the dataset will be loaded from the processed folder. Set to true if you keep changing your dataset configuration in the testing process
  # and then change it to false during hyperparameter tuning to avoid rebuilding the same data.
  rebuild: true
  # the column name of the label in the dataset
  label_col: hard_label
    # the column names of the node features in the dataset, you should only provide the numeric columns here.
  node_numeric_cols: [protein_probability] # , mRNA_TPM
  # the column name of the edge weight in the dataset. Set to "null" if no edge weight is used.
  # Be careful! Not all layer_type support edge weight. Currently, for use edge weight, you should use "gcnconv_with_edgeweight" as gnn.layer_type.
  # sageconv doesn't support edge weight.
  interaction_weight_col : null
  # the column to use for filter interaction. Set to "null" if use all interactions.
  interaction_conf_col: null
  # the threshold for interaction confidence. Only interactions with confidence higher than this threshold will be used. It only works when interaction_conf_col is not null.
  # be careful the scale of this parameter may differ from dataset to dataset (in STRINGR range between 150-1000).
  # Please use float number even for integer values, such as 700.0 instead of 700.
  interaction_conf_thresh: 0.0
  task: node # Type of task. For Grape-Pi it is a node classification task. It should be keep unchanged
  task_type: classification # Type of task. It should be keep unchanged
  # when you have non-numeric node feature you need to encode them. Only for advanced user.
  node_encoder: false
  # when you have non-numeric edge you need to encode them. Only for advanced user.
  edge_encoder: false
    # the split ratio of the dataset train, validation, and test respectively. The sum of the split ratio should be 1.0
  split: [0.6, 0.2, 0.2]
share:
  # the input dimension of the model. -1 means the input dimension is the same as the number of node features. It should be kept it as -1
  dim_in: -1
  # the output dimension of the model. 2 is for binary classification. It should be kept it as 2 for we use Grapei-Pi for a binary classification task.
  dim_out: 2
# train section is the main sections where hyperparameters tuning happens
train:
  # the number of samples in a batch. If you use sageconv as gnn.layer_type, it means the number of nodes in a batch, you can use 64, 128, 256, ...
  # when you use gcnconv as gnn.layer_type, it should be set to 1 as it is a full batch training (each time the entire graph as input).
  batch_size: 128
  # the period of saving the model checkpoint. It should be set to 10.
  ckpt_period: 10
  # whether to clean the checkpoint folder. If true, only the last checkpoint will be saved. If false, all checkpoints will be saved.
  ckpt_clean: false
  # the sampler used for sampling the neighbor. It should be set to neighbor when you use sageconv as gnn.layer_type. and full_batch when you use gcnconv as gnn.layer_type.
  sampler: neighbor
  # Evaluated model performance for every eval_period epochs.
  eval_period: 10
  # the neighbor sizes for sampling the neighbor for each hop from the target node. It is used when you use sageconv as gnn.layer_type.
  neighbor_sizes: [20, 10, 5]
  # the epoch to resume the training. It should be set to 0 when you start the training from scratch.
  epoch_resume: 0
  # whether to use early stopping. If true, the training will stop when the performance on the validation set does not improve for early_stop_patience epochs.
  early_stop: false
  # the patience for early stopping (numer of epochs without improving). It is only used when early_stop is true.
  early_stop_patience: 50
model:
  type: gnn
  loss_fun: cross_entropy
  # the weight is calculated by num_negative_sample/num_positive_sample
gnn:
  # the number of pre-message passing layers. It should be set to 1. pre-message passing layers are just MLP
  layers_pre_mp: 1
  # the number of message passing layers. The number of mp layer how far (hops) a neighbor node can be reached from the target node.
  layers_mp: 1
  # similar to pre-message passing layers, post-message passing layers are just MLP.
  layers_post_mp: 1
  # the inner dimension of the model. it is same across all inner layers.
  dim_inner: 10
  # the type of the layer. It should be set to sageconv when you use sageconv as gnn.layer_type and gcnconv when you use gcnconv as gnn.layer_type.
  layer_type: sageconv # sageconv gcnconv
  # the type of the stage. It should be set to skipsum when you use sageconv as gnn.layer_type. Other options are stack, skipsum, skipconcat.
  # stack: no skip connection, skipsum: skip connection with sum, skipconcat: skip connection with concatenation.
  stage_type: skipsum
  # whether to use batch normalization. It should be set to false.
  batchnorm: false
  # the activation function. Currently only relu is supported.
  act: relu
  # the dropout rate. It should between 0.0 and 1.0.
  dropout: 0.0
  # whether to normalize the adjacency matrix.
  normalize_adj: false
  # the head of the model. It should be set to protein.
  head: protein
optim:
  # the optimizer used for training. Options: adam, sgd
  optimizer: adam
  # base learning rate.
  base_lr: 0.001
  # weight decay. see https://pytorch.org/docs/stable/generated/torch.optim.Adam.html for more information.
  weight_decay: 5e-4
  # the maximum number of epochs for training.
  max_epoch: 300
  # try not use schedule and check fluctuation of training curve. Options: none, step, cos
  scheduler: none
run:
  # the number of times to repeat the training. When is larger than one. The training will be repeated multiple times with different random seeds (+1 each time).
  repeat: 1
  # the name to save the out_dir. When is none, it will be saved in a subfold in out_dir folder using the configuration file name.
  # useful when you use the same config  template while overriding some parameters and want to save it under a different name.
  name: none
  # whether to mark the configuration file as with suffix "_done" after the training is done. It is useful when you have a large number of configurations to run.
  mark_done: false
```

During the programming first running, a new "processed" folder will create under the provided data folder which stores
the
converted torch_geometry.data.dataset format and additional processed files. This allows a one-time processing and the
next time data the same data is used, the processed file will be loaded directly to save time.

**Caution**: In case you have updated the raw files, you need to manually deleted the entire `processed` folder to let
the program
rebuild processed data from modified raw files.

The instruction above only aim to provide a start point for user to check how we did our experiment.
Please refer to https://github.com/snap-stanford/GraphGym for more details about how to config a batch experiment.

### Set up batch experiment

Batch experiment allow user to run multiple experiments with different hyper-parameters in sequential with or without
parallel. To run a batch experiment, you need to provide three configuration files. The first configuration file is the
model configuration file as described above. The second configuration file is the grid search configuration file which
specify the hyper-parameters to be searched. The third configuration file is the batch experiment configuration file
which
specify the batch experiment setting such as how many times to repeat the experiment, how many jobs to run in parallel,
etc.

### Set up grid search configuration

Grid search configuration file is a text file with each row specify a hyper-parameter to be searched. The first column
is the name of the hyper-parameter in the model configuration file, the second column is the alias of the
hyper-parameter
which will be used in the output file name, the third column is the range of the hyper-parameter to be searched.

```angular2html
# Format for each row: name in config.py; alias; range to search
# No spaces, except between these 3 fields
# Line breaks are used to union different grid search spaces
# Feel free to add '#' to add comments

gnn.dropout drop [0.0,0.3,0.6]
gnn.layers_pre_mp l_pre [1,2]
gnn.layers_mp l_mp [1,2,3]
gnn.layers_post_mp l_post [1,2]
gnn.stage_type stage ['stack','skipsum','skipconcat']
optim.max_epoch epoch [100,200,300]
train.ckpt_clean ckpt_clean [True]
```

### setup batch experiment additional configuration
Batch experiment additional configuration file is a bash file with each row specify a bash variable to be used in the
batch experiment. The first parameter `CONFIG` is the name of the model configuration file (expect to find it under
`configs/protein/` folder. The second parameter `GRID` is the name of the grid search configuration file (expect to find
it under `grids` folder. The third parameter `REPEAT` is the number of times to repeat the experiment. The fourth
parameter
`MAX_JOBS` is the number of jobs to run in parallel. The fifth parameter `SLEEP` is the time to sleep between each job.
The sixth parameter `MAIN` is the name of the main python file to execute each experiment.

```angular2html
CONFIG=${CONFIG:-protein-yeast-gcnconv}
GRID=${GRID:-protein-yeast-gcnconv}
REPEAT=${REPEAT:-3}
MAX_JOBS=${MAX_JOBS:-1}
SLEEP=${SLEEP:-0}
MAIN=${MAIN:-main}

# generate configs (aft
# er controlling computational budget)
# please remove --config_budget, if don't control computational budget
python configs_gen.py --config configs/protein/${CONFIG}.yaml \
  --grid grids/${GRID}.txt \
  --out_dir configs
#python configs_gen.py --config configs/ChemKG/${CONFIG}.yaml --config_budget configs/ChemKG/${CONFIG}.yaml --grid grids/ChemKG/${GRID}.txt --out_dir configs
# run batch of configs
# Args: config_dir, num of repeats, max jobs running, sleep time
bash parallel.sh configs/${CONFIG}_grid_${GRID} $REPEAT $MAX_JOBS $SLEEP $MAIN
# rerun missed / stopped experiments
bash parallel.sh configs/${CONFIG}_grid_${GRID} $REPEAT $MAX_JOBS $SLEEP $MAIN
# rerun missed / stopped experiments
bash parallel.sh configs/${CONFIG}_grid_${GRID} $REPEAT $MAX_JOBS $SLEEP $MAIN

# aggregate results for the batch
python agg_batch.py --dir results/${CONFIG}_grid_${GRID}
```

### run batch experiment

```angular2html
bash run_batch_yeast_gcnconv.sh
```

### Aggregate results
Run `bash run_batch_yeast_gcnconv.sh` should automatically aggregate batch experiment result into `agg` folder.
However, in case it is not generated automatically, you can manually aggregate the results by run

```angular2html 
python agg_batch.py --dir results/protein-yeast-graphsage_grid_protein-yeast-graphsage
```

## Cite

Please cite the following papers if you use this code in your own work::
[Fast Graph Representation Learning with PyTorch Geometric

[Fast Graph Representation Learning with PyTorch Geometric
](https://arxiv.org/abs/1903.02428)

```
@inproceedings{Fey/Lenssen/2019,
  title={Fast Graph Representation Learning with {PyTorch Geometric}},
  author={Fey, Matthias and Lenssen, Jan E.},
  booktitle={ICLR Workshop on Representation Learning on Graphs and Manifolds},
  year={2019},
}
```

Common issues:

1. If you have the following problem during processing the `raw` data into `processed` data

```angular2html
utf8' codec can't decode byte 0x80 in position 3131: invalid start byte
```

This is caused by a hidden `.DS_Store` file created by the Mac OS system
Use terminal enter the `protein` folder under the `raw` folder

```angular2html
ls -a # check if there is a .DS_Store file
rm .DS_Store # remove the file
rm -r ../../processed # remove the ill-created `processed` data
```

2. Mac user may encounter the following problem

```angular2html
/Users/guchunhui/Documents/GNN-PPI/torch_geometric/typing.py:25: UserWarning: An issue occurred while importing 'pyg-lib'. Disabling its usage. Stacktrace: dlopen(/Users/guchunhui/opt/anaconda3/envs/Grape-Pi/lib/python3.10/site-packages/libpyg.so, 0x0006): Library not loaded: '/Users/runner/hostedtoolcache/Python/3.10.8/x64/lib/libpython3.10.dylib'
  Referenced from: '/Users/guchunhui/opt/anaconda3/envs/Grape-Pi/lib/python3.10/site-packages/libpyg.so'
  Reason: tried: '/Users/runner/hostedtoolcache/Python/3.10.8/x64/lib/libpython3.10.dylib' (no such file), '/usr/local/lib/libpython3.10.dylib' (no such file), '/usr/lib/libpython3.10.dylib' (no such file)
  warnings.warn(f"An issue occurred while importing 'pyg-lib'. "
/Users/guchunhui/Documents/GNN-PPI/torch_geometric/typing.py:76: UserWarning: An issue occurred while importing 'torch-sparse'. Disabling its usage. Stacktrace: dlopen(/Users/guchunhui/opt/anaconda3/envs/Grape-Pi/lib/python3.10/site-packages/libpyg.so, 0x0006): Library not loaded: '/Users/runner/hostedtoolcache/Python/3.10.8/x64/lib/libpython3.10.dylib'
  Referenced from: '/Users/guchunhui/opt/anaconda3/envs/Grape-Pi/lib/python3.10/site-packages/libpyg.so'
  Reason: tried: '/Users/runner/hostedtoolcache/Python/3.10.8/x64/lib/libpython3.10.dylib' (no such file), '/usr/local/lib/libpython3.10.dylib' (no such file), '/usr/lib/libpython3.10.dylib' (no such file)
  warnings.warn(f"An issue occurred while importing 'torch-sparse'. "
```

The solution is to run the following

```angular2html
pip uninstall pyg_lib
```

Source: https://github.com/pyg-team/pyg-lib/issues/217

3. If you see the following error during batch training, it means there is no enough system resources to performance
   such
   a batch training. Please be aware of the multiplication of `num_workers` in model configuration file and `MAX_JOBS`
   in
   parallel.sh should not exceed the total number of workers (threads) available in the system. For example, if
   `num_workers`: 2 and MAX_JOBS=${MAX_JOBS:-6} will raise error in a computer with only 8 cpu threads.

```angular2html
Sanity Checking: 0it [00:00, ?it/s]Traceback (most recent call last):
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 369, in reduce_storage
RuntimeError: user open shared memory object </torch_2070_3445168708_503> in read-write mode: Too many open files (24)
Traceback (mosusercall last):
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 370, in reduce_storage
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/multiprocessing/reduction.py", line 198, in DupFd
  File "/home/user/miniconda3/envs/grape-pi/lib/python3.10/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files
```

4.For Mac Apple M1/M2 user, you may encounter the following error when try Grape-Pi-SAGEConv model

```angular2html
Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
```

Try to install the following package.

```angular2html
conda install -y clang_osx-arm64 clangxx_osx-arm64 gfortran_osx-arm64
```

Find more details, please refer to https://github.com/rusty1s/pytorch_scatter/issues/241
