Description
The goal of the gencart package is to provide a framework for 1)
visualizing spatial dependencies in genomics data, 2) generating
statistical tests of these spatial dependencies, and 3) incorporating
these spatial dependencies into spatially explicit models of genome
data analysis. This framework will be developed in the R statistical
programming language with the eventual goal of providing a web-based
service of core functions to facilitate using this package for research
and education in genome biology.
This program is in development and is currently an unfunded side
project of James Estill.
Source code will be added to the Subversion source code repository as
I am able to work on it. To date, the code consists mostly of short
scripts that I have written for spatial genomics visualization of LTR
retrotransposon distribution in maize.
Anticipated Features
Plans are to develop a framework that contains the following:
- Data schema for whole genome scale sequence feature distribution
data that is compatible with existing spatial analysis packages in R
- Genome data visualization tools for exploratory data analysis of
the genomic landscape
- Genome data clustering and classification to facilitate 'heatmap'
visualization of the distribution of genomic data
- Intragenomic spatial data analysis tools
  - Spatial point pattern analysis in genome space
  - Univariate and multivariate analysis of categorical and
    continuous variables mapped in genome space
- Comparative genome spatial data analysis tools
  - Comparative dot plots
  - Automated detection of regions of synteny
- Interoperability with existing genomics software, specifically:
  - Import/export of GFF formatted genome annotation data
  - Connection to SQL based annotation data (Chado and BioSQL)
  - Export of Circos compatible text files for whole genome and
    comparative genome visualization
Example Visualization
An example of a choropleth "heatmap", and the associated color bins
for LTR retrotransposon coverage in the maize genome, is shown below.
This visualization provides an overview of the composition of LTR
retrotransposons across the entire genome, and quickly allows the user
to see that LTR retrotransposons have preferentially accumulated in
pericentromeric heterochromatin. This image illustrates an overview of
1) the distribution of LTR retrotransposon coverage for the 10
chromosomes in the maize genome, 2) the histogram of the percent
coverage of all 1 Mb bins in the genome, 3) the empirical
cumulative distribution of coverage for these data, and 4) the color
assignment of these data into ten color classes using an equal interval
classification approach. This visualization was generated by hand in
the R statistical programming language. A tool for quickly generating
intuitive visualizations such as this one could facilitate
exploratory spatial data analysis.
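
The equal interval color assignment and the empirical cumulative distribution described above can be sketched in a few lines. This is a minimal illustration only, not code from the gencart package: it is written in Python rather than R so that it runs with the standard library alone, the function names (equal_interval_breaks, classify, ecdf) are hypothetical, and the coverage values are made-up numbers standing in for percent coverage of 1 Mb bins.

```python
from bisect import bisect_right


def equal_interval_breaks(values, n_classes=10):
    """Return n_classes + 1 evenly spaced break points spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # Append hi explicitly so the last break is exactly the data maximum.
    return [lo + i * width for i in range(n_classes)] + [hi]


def classify(value, breaks):
    """Return the 0-based class index of value; the maximum goes in the last class."""
    n_classes = len(breaks) - 1
    idx = bisect_right(breaks, value) - 1
    return min(max(idx, 0), n_classes - 1)


def ecdf(values):
    """Return a function giving the fraction of values less than or equal to x."""
    ordered = sorted(values)
    n = len(ordered)

    def cumulative_fraction(x):
        return bisect_right(ordered, x) / n

    return cumulative_fraction


# Made-up percent coverage values for ten hypothetical 1 Mb bins.
coverage = [12.0, 35.5, 48.0, 61.2, 77.9, 92.0, 88.4, 70.1, 40.3, 20.0]

breaks = equal_interval_breaks(coverage, n_classes=10)
classes = [classify(v, breaks) for v in coverage]
cdf = ecdf(coverage)

for value, color_class in zip(coverage, classes):
    print(f"{value:5.1f}% -> class {color_class}, ECDF {cdf(value):.1f}")
```

Each class index would then be mapped to one of ten colors when drawing the bins along each chromosome. Equal interval classes are easy to read from a legend, but when coverage is strongly skewed most bins can fall into a few classes, in which case quantile based breaks may be worth considering.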