Metadata-Version: 2.1
Name: plm-cs
Version: 1.5
Summary: Protein chemical shift prediction based on Protein Language Model
Home-page: https://github.com/doorpro/predict-chemical-shifts-from-protein-sequence.git
Author: Zhu He
Author-email: 2260913071@qq.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: torch (==2.5.0)
Requires-Dist: torchaudio (==2.5.0)
Requires-Dist: torchvision (==0.20.0)
Requires-Dist: fair-esm (==2.0.0)
Requires-Dist: numpy (==2.1.2)
Requires-Dist: biopython (==1.84)
Requires-Dist: pandas (==2.2.3)

## PLM-CS  
## Predict protein chemical shifts from sequence


![image](/image/image1.png)
### Train your model
If you want to train your own PLM-CS model, this repository provides all the tools and data. Just follow these steps.

#### Requirements
    'torch == 2.5.0',
    'torchaudio == 2.5.0',
    'torchvision == 0.20.0',
    'fair-esm == 2.0.0',
    'numpy == 2.1.2',
    'biopython == 1.84',
    'pandas == 2.2.3'
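
Because every dependency is pinned to an exact version, it can help to confirm the environment before training. The following is a minimal sketch, not part of this repository: `PINNED` and `find_mismatches` are hypothetical names, and the check assumes that local build tags such as `+cu124` can be ignored when comparing versions.

```python
from importlib import metadata

# Pins copied from the requirement list above.
PINNED = {
    "torch": "2.5.0",
    "torchaudio": "2.5.0",
    "torchvision": "0.20.0",
    "fair-esm": "2.0.0",
    "numpy": "2.1.2",
    "biopython": "1.84",
    "pandas": "2.2.3",
}


def _installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=_installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore a local build tag (the part after "+") when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `find_mismatches()` means the installed packages match the pins.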
#### Training set
We provide the complete training set in [RefDB training dataset](./dataset/RefDB_test_remove). Each file in this folder is in NMR-STAR format and corresponds to one protein. All proteins contained in the *SHIFTX test* set have been removed from it.
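
To inspect one of these files without extra tooling, a small reader for NMR-STAR `loop_ ... stop_` blocks is often enough. This is a rough sketch, not the parser used by this repository: it splits values on whitespace only (no quoted or multi-line values), and the tag names `_Residue_seq_code`, `_Atom_name` and `_Chem_shift_value` are assumptions that may need adjusting to the tags actually present in the files.

```python
def read_star_loops(text):
    """Yield each loop_ ... stop_ block as a list of {tag: value} rows."""
    tags, values, in_loop = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            tags, values, in_loop = [], [], True
        elif in_loop and line == "stop_":
            width = len(tags)
            if width:
                # Rows may wrap across lines, so chunk the flat value list.
                yield [dict(zip(tags, values[i:i + width]))
                       for i in range(0, len(values), width)]
            in_loop = False
        elif in_loop and line.startswith("_") and not values:
            tags.append(line.split()[0])
        elif in_loop:
            values.extend(line.split())


def chemical_shifts(text, atom_name="CA",
                    seq_tag="_Residue_seq_code",   # assumed tag name
                    atom_tag="_Atom_name",         # assumed tag name
                    shift_tag="_Chem_shift_value"):  # assumed tag name
    """Return {residue number: shift in ppm} for one atom name."""
    shifts = {}
    for rows in read_star_loops(text):
        if not rows or shift_tag not in rows[0]:
            continue  # not a chemical shift loop
        for row in rows:
            value = row.get(shift_tag)
            if row.get(atom_tag) == atom_name and value not in (None, ".", "?"):
                shifts[int(row[seq_tag])] = float(value)
    return shifts
```

Calling `chemical_shifts(open(path).read(), atom_name="CB")` on a file would then return the Cβ shifts keyed by residue number, provided the assumed tags match.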
#### Training set processing
For convenience, ESM inference is kept separate from the training of our regression model. Therefore, first process the data with ESM-650M: change the "save_path" in [esm_process.py](./esm_process.py) to your own path and run the script. A TensorDataset containing the training data will be generated.
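
For orientation, per-residue embeddings can be extracted with the `fair-esm` package roughly as follows. This is an illustrative sketch rather than the logic of `esm_process.py`: it assumes that "ESM-650M" refers to the `esm2_t33_650M_UR50D` checkpoint, and `residue_slice` and `embed_sequences` are hypothetical helper names.

```python
def residue_slice(sequence_length):
    """Token positions of the residues; ESM prepends one start token per sequence."""
    return slice(1, sequence_length + 1)


def embed_sequences(named_sequences, layer=33):
    """Return {name: tensor of shape (len(sequence), 1280)} for (name, sequence) pairs."""
    # Imported here so the helper above stays usable without torch and fair-esm.
    import torch
    import esm

    # Assumed checkpoint: ESM-2 with 650M parameters and 33 layers.
    model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
    model.eval()  # disable dropout for deterministic embeddings
    batch_converter = alphabet.get_batch_converter()
    _, _, tokens = batch_converter(named_sequences)

    with torch.no_grad():
        output = model(tokens, repr_layers=[layer], return_contacts=False)
    representations = output["representations"][layer]

    # Drop the start token and any end/padding tokens for each sequence.
    return {
        name: representations[i, residue_slice(len(sequence))].clone()
        for i, (name, sequence) in enumerate(named_sequences)
    }


# Example call (downloads the model weights on first use):
# embeddings = embed_sequences([("example", "MKTAYIAKQR")])
```

Running inference once and saving the result means the language model does not have to be evaluated again in every training run.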
#### Train
Modify the path in [train.py](./train.py) to your own path. Note that a single run trains a model for only one atom type.
#### Training parameters
Different atom types use different optimizer settings. You can modify the corresponding parameters in [train.py](./train.py) according to the model you are training. The default number of training steps is 20,000, but you can reduce it to 5,000 to achieve very close performance with less training time.

Parameter | Cα | Cβ | C | Hα | H | N
--------|--|--|--|--|--|--
Learning rate | 0.02 | 5e-4 | 0.002 | 0.01 | 5e-4 | 5e-4
Optimizer | SGD | Adam | Adam | SGD | Adam | Adam
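
One way to keep these settings in a single place is a small lookup keyed by atom type. The sketch below is only an illustration and not the code in `train.py`: `OPTIMIZER_SETTINGS` and `build_optimizer` are hypothetical names, the ASCII keys `CA`, `CB` and `HA` stand for Cα, Cβ and Hα, and any hyperparameters not listed in the table (momentum, weight decay) are left at the PyTorch defaults.

```python
# (optimizer name, learning rate) per atom type, copied from the table above.
OPTIMIZER_SETTINGS = {
    "CA": ("SGD", 0.02),
    "CB": ("Adam", 5e-4),
    "C": ("Adam", 0.002),
    "HA": ("SGD", 0.01),
    "H": ("Adam", 5e-4),
    "N": ("Adam", 5e-4),
}


def build_optimizer(atom_type, parameters):
    """Create the optimizer listed in the table for one atom type."""
    # Imported lazily so the settings can be inspected without torch installed.
    import torch

    name, learning_rate = OPTIMIZER_SETTINGS[atom_type]
    optimizer_class = getattr(torch.optim, name)  # torch.optim.SGD or torch.optim.Adam
    return optimizer_class(parameters, lr=learning_rate)


# Example: optimizer = build_optimizer("CA", model.parameters())
```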
#### Evaluate

### Use PLM-CS through Python SDK


