1. Collect PTM Data
Gather experimentally validated PTM sites from databases such as UniProt and PhosphoSitePlus. Include the protein sequences and mutation (SNP) information.
2. Clean and Label Data
Remove duplicates and low-confidence sites. Label each candidate residue as modified or not (e.g., phosphorylated or not).
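A minimal sketch of the cleaning and labeling step, assuming each site is a record with a protein accession, a 1-based position, and a count of supporting reports. The field names, the `min_evidence` threshold, and the choice of S/T/Y as candidate residues are illustrative assumptions, not part of any database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    accession: str  # protein identifier (hypothetical field name)
    position: int   # 1-based residue position
    evidence: int   # number of supporting reports (hypothetical confidence proxy)


def clean_sites(records, min_evidence=2):
    """Drop duplicates (same accession and position) and low-evidence sites."""
    best = {}
    for rec in records:
        key = (rec.accession, rec.position)
        # Keep the best-supported copy of each duplicated site.
        if key not in best or rec.evidence > best[key].evidence:
            best[key] = rec
    return sorted(
        (r for r in best.values() if r.evidence >= min_evidence),
        key=lambda r: (r.accession, r.position),
    )


def label_residues(accession, sequence, sites, targets="STY"):
    """Return (position, residue, label) for every candidate residue.

    label is 1 if the position is an annotated site, else 0.
    """
    positives = {s.position for s in sites if s.accession == accession}
    return [
        (i, aa, int(i in positives))
        for i, aa in enumerate(sequence, start=1)
        if aa in targets
    ]


records = [SiteRecord("P1", 3, 5), SiteRecord("P1", 3, 1), SiteRecord("P1", 6, 1)]
sites = clean_sites(records)
labels = label_residues("P1", "MKSAYT", sites)
# labels == [(3, 'S', 1), (5, 'Y', 0), (6, 'T', 0)]
```

Note that a label of 0 here only means "not annotated"; an unannotated residue may still be modified in vivo, so negatives built this way are noisy.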
3. Map Sites to Structures
Map PTM sites to 3D structures (AlphaFold, PDB). Extract features such as solvent accessibility or disordered regions.
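One structural feature can be read without extra tools: AlphaFold model files store the per-residue pLDDT confidence in the B-factor column, and low pLDDT is often used as a rough proxy for disorder. The sketch below parses the fixed-column PDB text directly. The function names and the cutoff of 50 are assumptions for illustration. Solvent accessibility would need a dedicated tool and is not shown. AlphaFold DB models generally follow UniProt numbering, while experimental PDB entries often do not and would need a residue-level mapping first.

```python
def plddt_by_residue(pdb_text, chain="A"):
    """Map residue number -> pLDDT, read from the CA atoms of an AlphaFold PDB file."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: atom name in columns 13-16,
        # chain in column 22, residue number in 23-26, B-factor in 61-66.
        if (
            line.startswith("ATOM")
            and len(line) >= 66
            and line[12:16].strip() == "CA"
            and line[21] == chain
        ):
            scores[int(line[22:26])] = float(line[60:66])  # B-factor holds pLDDT
    return scores


def site_structure_features(site_positions, plddt, disorder_cutoff=50.0):
    """Attach pLDDT and a crude disorder flag to each site position."""
    features = {}
    for pos in site_positions:
        score = plddt.get(pos)  # None if the residue is missing from the model
        features[pos] = {
            "plddt": score,
            "likely_disordered": score is not None and score < disorder_cutoff,
        }
    return features
```

Sites missing from the model keep `plddt` as `None` rather than a default value, so downstream code can decide how to handle them.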
4. Generate Features
Use protein language models (ProtT5, ProteinBERT) to compute embeddings. Add sequence or structural features as model input.
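A minimal sketch of how the model input for one site might be assembled: a one-hot encoded sequence window around the site, concatenated with a language-model embedding and any structural features. The window half-width of 7, the `-` padding symbol, and the function names are assumptions. The embedding is taken as a precomputed list of floats; producing it with ProtT5 or ProteinBERT is outside this sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def site_window(sequence, position, half_width=7, pad="-"):
    """Return the residues within half_width of a 1-based position, padded at the termini."""
    center = position - 1
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else pad
        for i in range(center - half_width, center + half_width + 1)
    )


def one_hot(window):
    """Flatten a window into 20 values per residue; pads and unknown letters stay all-zero."""
    vec = []
    for aa in window:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        vec.extend(row)
    return vec


def build_input(sequence, position, embedding, structural, half_width=7):
    """Concatenate window one-hot, precomputed embedding, and structural features."""
    window = site_window(sequence, position, half_width)
    return one_hot(window) + list(embedding) + list(structural)


# Example: site at position 3 of a toy sequence, with a placeholder 2-value
# embedding and a single structural feature (e.g. pLDDT).
x = build_input("MKSAYT", 3, embedding=[0.1, 0.2], structural=[91.5], half_width=2)
# len(x) == 5 * 20 + 2 + 1 == 103
```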
5. Build Prediction Models
Use CNN, LSTM/Transformer, or GNN architectures to predict PTM sites or mutation effects.
6. Evaluate the Model
Evaluate model performance (accuracy, AUC). Compare predictions with experimental results.
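The two metrics named above can be sketched in plain Python. PTM datasets are often imbalanced (few modified residues among many candidates), in which case accuracy can look high for a model that never predicts a site, while AUC exposes the problem. The 0.5 decision threshold is an assumption, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this hand-written version.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of residues whose thresholded prediction matches the label."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation, averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# A constant predictor on 9 negatives and 1 positive:
labels = [0] * 9 + [1]
scores = [0.1] * 10
print(accuracy(labels, scores))  # 0.9
print(roc_auc(labels, scores))   # 0.5
```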
7. Experimental Confirmation
Validate top predictions via lab experiments (mass spectrometry). Update model with new verified data.
by KBS