Monday, June 3, 2013

Interval tree and next generation sequencing

Recently I was working with some genome data for a leisure project. I had a huge number of coordinates for genes and various transcription elements. The task was to fetch the names of the elements that fall within a coordinate range supplied as input for a gene.
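To make the task concrete, here is a minimal brute-force sketch in Python. The element names and coordinates are made up for illustration, and I am assuming that "within the range" means "overlaps the range" with inclusive coordinates; the function name `overlapping_names` is likewise just a placeholder.

```python
# Hypothetical sample data: (start, end, name) with inclusive coordinates.
ELEMENTS = [
    (100, 500, "geneA"),
    (300, 350, "enhancer1"),
    (600, 900, "geneB"),
    (850, 1200, "promoter2"),
    (2000, 2500, "geneC"),
]


def overlapping_names(elements, start, end):
    """Return names of elements that overlap [start, end] by scanning every element."""
    # Two closed intervals overlap when each one starts before the other ends.
    return [name for s, e, name in elements if s <= end and e >= start]


print(overlapping_names(ELEMENTS, 320, 650))
```

This looks at every element for every query, which is fine for a handful of records but becomes slow with a large annotation set and many queries.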

Although the problem above can be solved easily in several ways, I wanted the most optimized one. After a little R&D and a short discussion with my programmer cousin Najeeb, I came across a data structure called the "interval tree". I decided to give it a try and began exploring this data structure.

An interval tree is basically an ordered tree data structure that holds intervals, and you can use it to find all the intervals that overlap a given interval or point. The basic tree works on 1D intervals, and it can be extended to 2D data.
These trees can be dynamic, meaning you can add or delete nodes. A query takes O(log n + k) time, where k is the number of intervals reported, the preprocessing time to construct the data structure is O(n log n), and the space consumption is O(n).
These trees have many applications in next generation sequencing and can also be used to build applications for things like timetable management.
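Here is a minimal sketch of one common variant, the centered interval tree, in Python. It is static (no insert or delete) and is not the code from my project. The names `Node`, `build` and `find_overlapping` and the sample elements are made up for illustration, and coordinates are assumed to be inclusive.

```python
class Node:
    """One node of a centered interval tree over (start, end, name) tuples."""

    def __init__(self, intervals):
        # Use the median endpoint as the center so both subtrees stay small.
        points = sorted(p for s, e, _ in intervals for p in (s, e))
        self.center = points[len(points) // 2]
        # Intervals containing the center stay here, kept in two sort orders.
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        self.by_start = sorted(here, key=lambda iv: iv[0])
        self.by_end = sorted(here, key=lambda iv: iv[1], reverse=True)
        # Intervals entirely left or right of the center go to the subtrees.
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = Node(left) if left else None
        self.right = Node(right) if right else None


def build(intervals):
    """Build a tree, or return None for an empty input."""
    return Node(list(intervals)) if intervals else None


def find_overlapping(node, qs, qe, out=None):
    """Collect names of intervals overlapping the closed range [qs, qe]."""
    if out is None:
        out = []
    if node is None:
        return out
    if qe < node.center:
        # Query is left of the center: a stored interval overlaps iff it starts <= qe.
        for s, e, name in node.by_start:
            if s > qe:
                break
            out.append(name)
        find_overlapping(node.left, qs, qe, out)
    elif qs > node.center:
        # Query is right of the center: a stored interval overlaps iff it ends >= qs.
        for s, e, name in node.by_end:
            if e < qs:
                break
            out.append(name)
        find_overlapping(node.right, qs, qe, out)
    else:
        # Query contains the center, so every interval stored here overlaps it.
        out.extend(name for _, _, name in node.by_start)
        find_overlapping(node.left, qs, qe, out)
        find_overlapping(node.right, qs, qe, out)
    return out


# Hypothetical sample data: (start, end, name).
tree = build([
    (100, 500, "geneA"),
    (300, 350, "enhancer1"),
    (600, 900, "geneB"),
    (850, 1200, "promoter2"),
    (2000, 2500, "geneC"),
])
print(sorted(find_overlapping(tree, 320, 650)))  # range query
print(find_overlapping(tree, 2100, 2100))        # point query
```

A point query is just a range query with the same start and end. For simplicity this sketch re-sorts at every level while building, so its construction is a bit slower than the O(n log n) bound; for the dynamic version you would typically use a balanced binary search tree augmented with the maximum end point of each subtree.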

A search will turn up plenty of material, but I found the link below to be the best. Check it out, friends.
For more, check out this tutorial: tutorial 1 (this is the best one)
You can also try your hand at interval tree based problems at Rosalind.
