Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2018
Bacteria with resistance genes are becoming ever more common, and new methods of discovering antibiotics are being developed. One of these methods involves creating random peptides and testing their antimicrobial activity. Developing antibiotics from these peptides requires understanding which sequence motifs are toxic to bacteria. To determine whether the toxic peptides of a randomly generated peptide library can be uniquely classified based solely on sequence motifs, we created the PepSeq Pipeline: a new software tool that uses a Random Forest algorithm to extract motifs from a peptide library. We found that this pipeline can accurately classify 56% of the toxic peptides in the peptide library using motifs extracted from the model. Testing on simulated data with less noise, we could classify up to 94% of the toxic peptides. The pipeline extracted significant toxic motifs in every library that was tested, but its ability to classify all toxic peptides depended on the number of motifs in the library. Once extracted, these motifs can be used both to understand the biology behind why certain peptides are toxic and to create novel antibiotics. The code and data used in this analysis can be found at https://github.com/tjense25/pep-seq-pipeline.
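To make concrete what "classifying toxic peptides using extracted motifs" could mean, the following is a minimal sketch and not the PepSeq Pipeline's actual code. It assumes a peptide is called toxic when it contains at least one motif, and that `X` in a motif is a wildcard for any residue. The function names, the motifs, and the toy peptides are all hypothetical and chosen only for illustration; the Random Forest step that would produce the motifs is not shown.

```python
import re

def motif_to_regex(motif):
    # Assumption: 'X' in a motif stands for "any single residue".
    return re.compile(motif.replace("X", "."))

def classify(peptide, motifs):
    # A peptide is predicted toxic if any motif occurs anywhere in it.
    return any(motif_to_regex(m).search(peptide) for m in motifs)

def toxic_recall(peptides, labels, motifs):
    # Fraction of truly toxic peptides that are captured by the motif set.
    toxic = [p for p, is_toxic in zip(peptides, labels) if is_toxic]
    if not toxic:
        return 0.0
    hits = sum(classify(p, motifs) for p in toxic)
    return hits / len(toxic)

# Hypothetical toy data, not taken from the paper's library.
motifs = ["KXXK", "WW"]
peptides = ["AKLLKG", "GWWAS", "AAGGSS", "KAAAKG", "GGSSAA"]
labels = [True, True, True, False, False]  # True = toxic

recall = toxic_recall(peptides, labels, motifs)
print(f"Toxic peptides captured by motifs: {recall:.0%}")
```

In this toy setup the toxic peptide `AAGGSS` contains neither motif, which illustrates how a library whose toxicity is not fully explained by the extracted motifs would yield a classification rate below 100%.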
Papers by Mark Clement