Multi-locus polony haplotyping: Data analysis

Data analysis for multi-locus polony haplotyping

Analysis of polony images is similar to the methods developed previously for two-locus haplotyping and exon typing. A few modifications were made to cope with new problems that occur in multi-locus haplotyping experiments.

First of all, polony images generated in multi-locus haplotyping experiments tend to have weaker signals and higher background, because we are amplifying multiple amplicons from a complex genomic background and the amplicons are usually larger. This calls for a better algorithm for polony identification. In previous methods, an intensity threshold was used, so that any cluster of pixels with intensity above the threshold was considered a potential polony. A fixed threshold doesn't work in our case. We have instead developed an algorithm based on the Laplacian-of-Gaussian (LOG) transformation (implemented in Pisa7.m), which relies on local intensity gradients instead of absolute intensity for polony identification.
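The idea can be illustrated with a minimal sketch. This is not the code in Pisa7.m; it only shows how a Laplacian-of-Gaussian filter responds to spots regardless of the background level. The synthetic image, the kernel scale sigma, and the relative threshold of 0.2 are all assumptions chosen for the illustration. It uses fspecial, imfilter, bwlabel and regionprops from the Image Processing Toolbox.

```matlab
% Illustrative sketch only (not the Pisa7.m implementation).
% Synthetic image: a bright and a dim spot on a sloping background.
[x, y] = meshgrid(1:100, 1:100);
spot = @(cx, cy, amp) amp * exp(-((x - cx).^2 + (y - cy).^2) / (2 * 4^2));
img = x / 100 + spot(70, 60, 1.0) + spot(20, 30, 0.3);

% The dim spot peaks near 0.5 while the background reaches 1.0 on the
% right, so no single intensity threshold isolates both spots.

sigma = 4;                              % assumed spot scale in pixels
hsize = 2 * ceil(3 * sigma) + 1;        % kernel covers about +/- 3 sigma
h = fspecial('log', hsize, sigma);      % Laplacian-of-Gaussian kernel
resp = -imfilter(img, h, 'replicate');  % negate: bright blobs become peaks

% Threshold the filter response rather than the raw intensity.
% The factor 0.2 is an assumed value for this synthetic example.
bw = resp > 0.2 * max(resp(:));
[L, num] = bwlabel(bw);                 % candidate objects
stats = regionprops(L, 'Centroid', 'Area');
fprintf('%d candidate objects found\n', num);
```

Because the kernel sums to zero, a smoothly varying background contributes almost nothing to the response, which is why the dim spot is still picked up. In real images the scale and threshold would have to be tuned per experiment.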

In addition, to improve haplotyping efficiency, we ran experiments at as high a polony density as possible. This creates the problem of many partially overlapping polonies. A cluster of overlapping polonies is identified by the LOG algorithm as a single object, which usually fails to pass the morphology filter. We found that a watershed algorithm is the most effective way to separate these overlapping polonies. A less stringent set of morphology criteria is then applied to the separated polonies.
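A common way to split touching round objects with a watershed transform is to run it on the negated distance transform of the binary mask. The sketch below follows that generic recipe on two synthetic overlapping disks; it is an assumption about the general approach, not a copy of the code in this package, and the disk geometry and the imhmin depth of 2 are illustrative values.

```matlab
% Illustrative sketch only: splitting two overlapping disks.
[x, y] = meshgrid(1:100, 1:100);
bw = ((x - 39).^2 + (y - 50).^2 <= 15^2) | ...
     ((x - 61).^2 + (y - 50).^2 <= 15^2);
[~, nBefore] = bwlabel(bw);     % the merged cluster is a single object

D = -bwdist(~bw);               % deep basins at the centre of each disk
D = imhmin(D, 2);               % suppress shallow minima (assumed depth)
L = watershed(D);               % ridge lines are labelled 0

split = bw & (L ~= 0);          % cut the cluster along the ridge lines
[~, nAfter] = bwlabel(split);
fprintf('objects before: %d, after: %d\n', nBefore, nAfter);
```

The imhmin step matters in practice: without it, small irregularities in the object outline create extra minima and the watershed over-segments a single polony into several pieces.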

Multi-locus haplotype calling is similar to calling alternatively spliced isoforms in the exon typing experiment. However, there is no master image to which all other images can be registered, so the algorithm is somewhat different. Finally, additional code was written for two-locus haplotype analysis and calculation of LD statistics.
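For reference, the standard two-locus LD statistics (D, D' and r^2) can be computed directly from the counts of the four haplotypes. The sketch below uses the textbook definitions; the counts are made-up numbers for illustration, not experimental results, and the variable names are not taken from the package.

```matlab
% Illustrative sketch only: LD statistics from two-locus haplotype counts.
% Rows: alleles A, a at locus 1. Columns: alleles B, b at locus 2.
counts = [40 10; 10 40];            % made-up example counts

f  = counts / sum(counts(:));       % haplotype frequencies
pA = sum(f(1, :));                  % frequency of allele A
pB = sum(f(:, 1));                  % frequency of allele B

D = f(1, 1) - pA * pB;              % coefficient of linkage disequilibrium

% D' normalises D by its maximum possible magnitude given the allele
% frequencies. Both loci are assumed polymorphic, so Dmax > 0.
if D >= 0
    Dmax = min(pA * (1 - pB), (1 - pA) * pB);
else
    Dmax = min(pA * pB, (1 - pA) * (1 - pB));
end
Dprime = D / Dmax;

r2 = D^2 / (pA * (1 - pA) * pB * (1 - pB));   % squared correlation

fprintf('D = %.3f, D'' = %.3f, r^2 = %.3f\n', D, Dprime, r2);
```

With these example counts, D = 0.15, D' = 0.6 and r^2 = 0.36.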

Matlab (with the Image Processing Toolbox) is used for all data analyses. A set of Matlab programs for multi-locus haplotyping is available for download. The two programs that need to be edited for each experiment are polMaster.m and getSlideInfo.m.

polMaster.m is the main entry point of this data analysis package. It is the program you actually run, and it can be modified further for different analyses.

getSlideInfo.m contains all information about the experiment, including the file names of all images, gene names, locus names, alleles, etc. This program needs to be edited for EVERY new experiment.

A few other Matlab programs are frequently called by polMaster.m and provide its supporting functions. Boolean-like operations on polony objects are carried out by polAdd.m, polAnd.m, polXor.m and polSubtract.m. PISA7.m and PISA7.fig together make up the polony identification program, which provides a GUI to make parameter adjustment easier. Overlapping polonies are identified with countOverlappedPolonies.m. GlobalAlign.m handles image alignment. The number of overlapping polonies between two images that could occur by chance is estimated by randomOverlaps.m.
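To give a sense of what a chance-overlap estimate involves, here is a minimal Monte Carlo sketch. It is not the method used in randomOverlaps.m, which is not described here; it simply assumes that polony centres are scattered uniformly over a rectangular field and counts how often a polony in one image has a neighbour in the other within a matching distance. The polony counts, field size, matching distance and number of trials are all hypothetical values.

```matlab
% Illustrative sketch only (not the randomOverlaps.m implementation).
rng(1);                        % fixed seed for reproducibility
n1 = 200;  n2 = 300;           % hypothetical polony counts in the two images
W = 1000;  H = 1000;           % hypothetical field size in pixels
d = 5;                         % hypothetical matching distance in pixels
nTrials = 200;

hits = zeros(nTrials, 1);
for t = 1:nTrials
    p1 = [W * rand(n1, 1), H * rand(n1, 1)];   % random centres, image 1
    p2 = [W * rand(n2, 1), H * rand(n2, 1)];   % random centres, image 2
    dx = bsxfun(@minus, p1(:, 1), p2(:, 1)');  % n1-by-n2 coordinate offsets
    dy = bsxfun(@minus, p1(:, 2), p2(:, 2)');
    % Count image-1 polonies with at least one image-2 polony within d.
    hits(t) = nnz(any(dx.^2 + dy.^2 <= d^2, 2));
end
expectedByChance = mean(hits);
fprintf('about %.1f overlaps expected by chance\n', expectedByChance);
```

As a rough check, ignoring edge effects, the expected count is close to n1 * n2 * pi * d^2 / (W * H), which is about 4.7 for these values. An observed overlap count far above this level is unlikely to be due to chance alone.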

Finally, you need to make a gel mask for each slide before doing image analysis. Having a gel mask speeds up the analysis. Here are the instructions for how to do it.