2013 SuccinctIntervalSplittingTreefo

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Analyzing functional interactions between small compounds and proteins is indispensable in genomic drug discovery. Since rich information on various compound-protein interactions is available in recent molecular databases, strong demands for making best use of such databases require to invent powerful methods to help us find new functional compound-protein pairs on a large scale. We present the succinct interval-splitting tree algorithm (SITA) that efficiently performs similarity search in databases for compound-protein pairs with respect to both binary fingerprints and real-valued properties. SITA achieves both time and space efficiency by developing the data structure called interval-splitting trees, which enables to efficiently prune the useless portions of search space, and by incorporating the ideas behind wavelet tree, a succinct data structure to compactly represent trees. We experimentally test SITA on the ability to retrieve similar compound-protein pairs / substrate-product pairs for a query from large databases with over 200 million compound-protein pairs / substrate-product pairs and show that SITA performs better than other possible approaches.
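
To make the query described in the abstract concrete, the following is a minimal brute-force sketch of similarity search over binary fingerprints combined with a constraint on a real-valued property. It is not SITA and uses no interval-splitting tree or wavelet tree; it is only the kind of linear scan that such an index is meant to avoid. Several details are assumptions not stated in the abstract: fingerprints are stored as sets of "on" bit positions, similarity is measured with the Jaccard (Tanimoto) coefficient, each pair carries a single property value, and the constraint is a closed interval. The names `jaccard` and `naive_search` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumed record layout: (fingerprint as a set of "on" bit positions, property value).
Record = Tuple[Set[int], float]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard (Tanimoto) similarity of two binary fingerprints (assumed measure)."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def naive_search(
    database: List[Record],
    query_fp: Set[int],
    prop_low: float,
    prop_high: float,
    threshold: float,
) -> List[int]:
    """Linear scan: indices of records whose property lies in
    [prop_low, prop_high] and whose fingerprint similarity to the
    query is at least `threshold`."""
    hits = []
    for idx, (fp, prop) in enumerate(database):
        if not (prop_low <= prop <= prop_high):
            continue  # property constraint fails; skip the similarity computation
        if jaccard(fp, query_fp) >= threshold:
            hits.append(idx)
    return hits


# Toy data, invented purely for illustration.
database = [
    ({1, 2, 3, 4}, 0.50),
    ({1, 2, 3}, 0.90),
    ({7, 8}, 0.55),
    ({1, 2, 3, 5}, 0.45),
]
hits = naive_search(database, {1, 2, 3, 4}, prop_low=0.4, prop_high=0.6, threshold=0.5)
print(hits)
```

The scan touches every record, so its cost grows linearly with the database size; at the scale of over 200 million pairs mentioned in the abstract, this is the cost an index that prunes the search space aims to reduce.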

References


2013 SuccinctIntervalSplittingTreefo
Author: Yasuo Tabei, Akihiro Kishimoto, Masaaki Kotera, Yoshihiro Yamanishi
Title: Succinct Interval-splitting Tree for Scalable Similarity Search of Compound-protein Pairs with Property Constraints
DOI: 10.1145/2487575.2487637
Year: 2013