Publication

bioRxiv (2025)
Predicting protein complexes in biosynthetic gene clusters

Authors

Moriwaki, Y.*; Shiraishi, T.; Katsuyama, Y.; Matsuda, K.; Ose, T.; Minami, A.; Oikawa, H.; Kuzuyama, T.; Ishitani, R.; Terada, T.

Category

Preprints

Abstract

Biosynthetic gene clusters (BGCs) are contiguous genomic regions that encode diverse, non-homologous proteins required for the production of specific natural products. Their genetic diversity underlies the structural complexity of these compounds, and in many cases their biosynthetic pathways remain to be clarified. The biosynthetic mechanisms rely not only on substrate specificity between proteins and ligands, but also on protein–protein interactions that mediate transport of intermediates, regulation of activity, and structural stabilization. However, sequence-based functional predictions have had limited success for many uncharacterized proteins within BGCs. To address this challenge, we built a high-throughput complex prediction pipeline by replacing AlphaFold3's multiple sequence alignment generation step with the faster MMseqs2. We systematically screened 487,828 protein pairs derived from 2,437 BGCs registered in the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database and predicted 15,438 heteromeric interactions with an ipTM ≥ 0.6. Among them, 1,390 protein pairs exhibited structural homology with an RMSD ≤ 2.0 Å. These predictions highlight intriguing molecular mechanisms involving proteins previously annotated as uncharacterized or potentially dysfunctional. Furthermore, we provide these results in a reusable format and as a protein interaction network map to facilitate future experimental validation.
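
The screening logic described in the abstract (pair up proteins within each BGC, then keep pairs whose ipTM is at least 0.6) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy cluster and protein identifiers, and the scores are all hypothetical, and it assumes that candidate pairs are unordered pairs of distinct proteins from the same cluster. Only the 0.6 cutoff is taken from the abstract.

```python
from itertools import combinations

IPTM_THRESHOLD = 0.6  # cutoff quoted in the abstract


def enumerate_pairs(bgc_proteins):
    """Return every unordered pair of distinct proteins within each cluster.

    Assumption: pairs are formed only inside a single BGC, never across BGCs.
    """
    pairs = []
    for bgc_id, proteins in bgc_proteins.items():
        for a, b in combinations(sorted(proteins), 2):
            pairs.append((bgc_id, a, b))
    return pairs


def filter_by_iptm(scores, threshold=IPTM_THRESHOLD):
    """Keep only the pairs whose predicted ipTM meets the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


# Hypothetical toy input: two clusters with made-up protein identifiers.
toy_bgcs = {
    "BGC_A": ["protA", "protB", "protC"],
    "BGC_B": ["protX", "protY"],
}
toy_pairs = enumerate_pairs(toy_bgcs)

# Made-up ipTM values standing in for structure-prediction output.
toy_scores = dict(zip(toy_pairs, [0.82, 0.35, 0.60, 0.12]))
confident = filter_by_iptm(toy_scores)

for (bgc_id, a, b), score in confident.items():
    print(f"{bgc_id}: {a} + {b} (ipTM = {score:.2f})")
```

In a real screen the scores would be read from the structure predictor's output files rather than typed in by hand, and the pairing rule would need to match however the 487,828 pairs were actually constructed.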