GenCart

Description

The goal of the GenCart package is to provide a framework for 1) visualizing spatial dependencies in genomics data, 2) generating statistical tests of these spatial dependencies, and 3) incorporating these spatial dependencies into spatially explicit models for genome data analysis. This framework will be developed in the R statistical programming language, with the eventual goal of providing a web-based service of core functions to facilitate use of the package for research and education in genome biology.

This program is in development and is currently an unfunded side project of James Estill. Source code will be added to the Subversion source code repository as I can work on it. To date, the code consists mostly of short scripts that I have written for spatial genomics visualization of LTR retrotransposon distribution in maize.

Anticipated Features

Plans are to develop a framework that contains the following:

Data schema for whole genome scale sequence feature distribution data that is compatible with existing spatial analysis packages in R
Genome data visualization tools for exploratory data analysis of the genomic landscape

This will include choropleth mapping, dasymetric mapping, and cartograms.

Genome data clustering and classification to facilitate 'heatmap' visualization of the distribution of genomic data.
Intragenomic spatial data analysis tools

Spatial point pattern analysis in genome space

Univariate and multivariate analysis of categorical and continuous variables mapped in genome space

Comparative genome spatial data analysis tools

Comparative dot plots
Automated detection of regions of synteny

Interoperability with existing genomics software, specifically:

Import/Export of GFF-formatted genome annotation data
Connection to SQL-based annotation data (Chado and BioSQL)
Export of Circos-compatible text files for whole-genome and comparative genome visualization
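
As a rough illustration of the GFF import step and of how annotation data could feed the planned data schema, the sketch below reads GFF-style lines and computes the percent of each fixed-width genome bin covered by features. It is written in Python purely for illustration and is not part of GenCart, which is planned for R. The function names parse_gff and bin_coverage are hypothetical, and the sketch assumes that features do not overlap one another and that every bin has the full width (the last bin of a real chromosome is usually shorter).

```python
from collections import defaultdict


def parse_gff(lines):
    """Yield (seqid, type, start, end) from GFF text lines.

    GFF rows have nine tab-separated columns; start and end are
    1-based and inclusive. Comment, blank, and short lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue
        yield cols[0], cols[2], int(cols[3]), int(cols[4])


def bin_coverage(features, bin_size):
    """Return {(seqid, bin_index): percent of the bin covered}.

    Hypothetical helper: assumes features do not overlap, so the
    covered bases can simply be summed per bin.
    """
    covered = defaultdict(int)
    for seqid, _type, start, end in features:
        pos = start - 1  # convert to a 0-based, half-open interval [pos, end)
        while pos < end:
            b = pos // bin_size
            bin_end = (b + 1) * bin_size
            span = min(end, bin_end) - pos  # bases falling inside this bin
            covered[(seqid, b)] += span
            pos += span
    return {key: 100.0 * bases / bin_size for key, bases in covered.items()}
```

For maize-scale data, bin_size would be 1000000 to match the 1 Mb bins used in the example visualization; the resulting per-bin percentages are the kind of values a choropleth map would color.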

Example Visualization

An example of a choropleth "heatmap" and the associated color bins for LTR retrotransposon coverage in the maize genome is shown below.

This visualization provides an overview of the distribution of LTR retrotransposons across the entire genome and quickly allows the user to see that LTR retrotransposons have preferentially accumulated in pericentromeric heterochromatin. The image shows 1) the distribution of LTR retrotransposon coverage across the 10 chromosomes of the maize genome, 2) a histogram of the percent coverage of all 1 Mb bins in the genome, 3) the empirical cumulative distribution of coverage for these data, and 4) the assignment of these data to ten color classes using an equal-interval clustering approach. The above visualization was generated by hand in the R statistical programming language; a tool for quickly generating intuitive visualizations such as this one could facilitate exploratory spatial data analysis.
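
The equal-interval class assignment and the empirical cumulative distribution mentioned above can be sketched as follows. This is a minimal Python illustration, not the R code used to produce the figure. The function names equal_interval_classes and ecdf are hypothetical, and the input is assumed to be a plain list of per-bin percent coverage values.

```python
def equal_interval_classes(values, n_classes=10):
    """Assign each value to one of n_classes equal-width classes.

    The range from min(values) to max(values) is split into n_classes
    intervals of equal width; class 0 is the lowest. The maximum value
    is placed in the top class rather than in a class of its own.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # no spread: everything in one class
    width = (hi - lo) / n_classes
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


def ecdf(values):
    """Return (value, cumulative fraction) pairs, one per observation."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]
```

Each class index would then be mapped to one color in a ten-step palette. Equal-interval classes are easy to interpret, but when coverage values are strongly skewed most bins can end up in only a few classes, which is one reason to inspect the histogram and cumulative distribution alongside the map.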

Author: James Estill
Last Updated: October 16, 2009