Abstract:
Analysis of the interactions between target proteins and drugs is crucial not only for drug discovery, but also for a better understanding of the possible evolutionary pressure that the drugs exert on the proteins. Based on the hypothesis that similar proteins bind to similar ligands, we utilize ligand similarity in two different approaches. We first introduce ligand-centric network models to analyse the relationships of protein family members via the drugs that they bind to. We build three different types of networks in which the proteins are represented as nodes, and two proteins are connected by an edge whose weight depends on the number of identical or similar ligands they share. As a test case, we focus on β-lactamases and Penicillin-Binding Proteins. Using ligand sharing information to cluster proteins yields modules comprising proteins with both sequence and functional similarity. Consideration of ligand similarity not only enhances the clustering of the target proteins, but also highlights some interactions that were not detected in the identical ligand network. In the second part, we follow a machine learning approach for predicting protein-ligand interactions using Support Vector Machines (SVM), focusing on the comparison of different ligand similarity kernels. For this task, a larger data set of GPCRs and ion channels is examined. Among the 16 different ligand kernels we experiment with, LINGO-based TF-IDF cosine similarity achieves an AUC score 0.009 higher than that of the widely used 2D Fingerprint Tanimoto model on the GPCR data set.
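
To make the LINGO-based TF-IDF cosine similarity concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that LINGOs are the overlapping substrings of length q = 4 of a SMILES string and that a smoothed logarithmic IDF is used; the exact preprocessing and weighting are not stated above. The function names and the three example molecules (acetic acid, methyl acetate, benzene) are illustrative only.

```python
import math
from collections import Counter


def lingos(smiles, q=4):
    """Return the overlapping length-q substrings (LINGOs) of a SMILES string."""
    return [smiles[i:i + q] for i in range(len(smiles) - q + 1)]


def tfidf_vectors(smiles_list, q=4):
    """Build one sparse TF-IDF vector (a dict) per SMILES string."""
    counts = [Counter(lingos(s, q)) for s in smiles_list]
    n_docs = len(smiles_list)
    # Document frequency: in how many molecules each LINGO occurs.
    doc_freq = Counter(term for c in counts for term in c)
    # Smoothed IDF; this weighting is an assumption, not taken from the source.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative SMILES: acetic acid, methyl acetate, benzene.
examples = ["CC(=O)O", "CC(=O)OC", "c1ccccc1"]
vectors = tfidf_vectors(examples)
sim_acid_ester = cosine(vectors[0], vectors[1])
sim_acid_benzene = cosine(vectors[0], vectors[2])
```

In this toy set, acetic acid and methyl acetate share several LINGOs and therefore score higher than acetic acid and benzene, which share none. A matrix of such pairwise values could serve as a precomputed ligand kernel for an SVM.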