Abstract: Based on hydrophobicity and relative molecular mass, the twenty amino acids were divided into 8 classes and placed on a circumference at different intervals. With a further division along the z-axis, a coordinate space was established in which each amino acid corresponded to a point. Connecting these points in the order in which the amino acids appear in a given protein sequence yielded the 3D model of that sequence. The 3D model was converted into a 20-dimensional matrix diagram to analyze the number of amino acid pairs in the sequence and the similarity between sequences. The spatial coordinates were further converted into numerical sequences, and the discrete Fourier transform (DFT) was applied to these numerical sequences to obtain the power spectrum of the original protein sequence. Power spectra of different lengths were then evenly scaled to m, the longest length among the compared sequences, and the Euclidean distance between the scaled power spectra was used as the measure of similarity. Finally, the method was tested on different datasets, and the clustering results were consistent with the analysis of the matrix diagrams. Comparison with the results of other algorithms showed that the method was effective and reasonable.
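The spectral part of the pipeline described in the abstract (DFT, power spectrum, scaling to the longest length m, Euclidean distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the amino acid coordinates or the exact even-scaling formula, so the input is assumed to be an already-converted numerical sequence, and plain linear interpolation is swapped in as a stand-in for the scaling step. The function names `power_spectrum`, `stretch`, and `distance_matrix` are hypothetical.

```python
import cmath
import math


def power_spectrum(x):
    """Return |X[k]|^2 for the DFT of a numerical sequence x (naive O(n^2) DFT)."""
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def stretch(values, m):
    """Resample values to length m.

    Assumption: linear interpolation is used here as a stand-in; the paper's
    own even-scaling formula is not stated in the abstract.
    """
    n = len(values)
    if n == m:
        return list(values)
    if n == 1:
        return [values[0]] * m
    out = []
    for i in range(m):
        pos = i * (n - 1) / (m - 1)   # position of output point i on the input axis
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def distance_matrix(sequences):
    """Pairwise Euclidean distances between power spectra scaled to the longest length m."""
    m = max(len(s) for s in sequences)
    spectra = [stretch(power_spectrum(s), m) for s in sequences]
    return [[math.dist(p, q) for q in spectra] for p in spectra]


if __name__ == "__main__":
    # Toy numerical sequences (made up for illustration, not real protein data).
    toy = [[1.0, 2.0, 3.0, 2.0], [1.0, 2.0, 3.0, 2.0, 1.0, 2.0], [3.0, 1.0, 3.0]]
    for row in distance_matrix(toy):
        print([round(d, 3) for d in row])
```

The resulting distance matrix is symmetric with a zero diagonal and could be passed to any standard hierarchical clustering routine to build a tree of the compared sequences.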
Cite this article:
潘以红, 钱 东, 朱 平. 蛋白质序列图形变换及其相似性聚类分析[J]. 生命科学研究, 2018, 22(3): 191-200.
PAN Yi-hong, QIAN Dong, ZHU Ping. Graphical Transformation and Similarity Clustering Analysis for Protein Sequences. Life Science Research, 2018, 22(3): 191-200.