Data analysis for multi-locus polony haplotyping
Analysis of polony images is similar to the methods developed
previously for two-locus haplotyping and exon typing. A few
modifications were made to cope with new problems that occur in
multi-locus haplotyping experiments.
First of all, polony images generated in multi-locus haplotyping
experiments tend to have weaker signals and higher background, because
we are amplifying multiple amplicons from a complex genomic
background and the amplicon size is usually larger. This calls for a
better algorithm for polony identification. In previous methods, an
intensity threshold was used, so that any cluster of pixels with
intensity above the threshold was considered a potential polony. A
fixed threshold does not work in our case. We have instead developed an
algorithm based on the Laplacian-of-Gaussian (LOG) transformation
(implemented in Pisa7.m), which relies on the local intensity gradient
instead of absolute intensity for polony identification.
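To illustrate why a gradient-based filter helps when the background is
uneven, here is a minimal sketch written in Python (standard library
only) rather than Matlab. It is not the code in Pisa7.m: the function
names, the kernel radius of 3 sigma, the edge handling and the synthetic
test image are all assumptions made for illustration.

```python
import math


def log_kernel(sigma, radius):
    """Sampled Laplacian-of-Gaussian kernel, shifted so it sums to zero."""
    size = 2 * radius + 1
    kernel = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            r2 = dx * dx + dy * dy
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4
                       * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    mean = sum(sum(row) for row in kernel) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def log_response(image, sigma=2.0):
    """Filter with the negated LOG so that bright blobs give positive peaks."""
    radius = int(math.ceil(3 * sigma))  # assumed kernel half-width
    kernel = log_kernel(sigma, radius)
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), h - 1)  # replicate image edges
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + radius][dx + radius] * image[yy][xx]
            out[y][x] = -acc
    return out


def detect_blobs(image, sigma, min_response):
    """Return the (row, col) pixels whose LOG response exceeds min_response."""
    resp = log_response(image, sigma)
    return {(y, x) for y, row in enumerate(resp)
            for x, v in enumerate(row) if v > min_response}


def make_demo_image(h=40, w=40):
    """Synthetic image: steep background ramp plus one faint blob at (20, 10)."""
    return [[2.0 * x + 20.0 * math.exp(-((y - 20) ** 2 + (x - 10) ** 2) / 8.0)
             for x in range(w)] for y in range(h)]
```

In the synthetic image the blob peaks at an intensity of about 40 while
the background ramp climbs to 78, so any fixed threshold low enough to
catch the blob also selects a large part of the background. Because the
zero-sum, symmetric kernel cancels a linear background, the LOG response
peaks at the blob, and detect_blobs(make_demo_image(), 2.0, 30.0)
should return only pixels close to (20, 10).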
In addition, to improve haplotyping efficiency, we ran
experiments at as high a polony density as possible. This creates the
problem of many partially overlapping polonies. A cluster of
overlapping polonies is identified by the LOG algorithm as a single
object, which usually fails to pass the morphology filter. We found that
a watershed algorithm is the most effective way to separate these
overlapping polonies. A less stringent set of morphology criteria is
then applied to the separated polonies.
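The idea of splitting a merged cluster can be sketched as a simplified
marker-based watershed on the distance transform, written here in
Python (standard library only) for illustration. This is not the
package's Matlab implementation. The choice of markers (pixels at least
min_core_dist from the background), the brute-force distance transform
and all function names are assumptions.

```python
import heapq
import math
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def distance_to_background(mask):
    """Brute-force Euclidean distance from each object pixel to the background.

    Assumes the mask contains at least one background pixel.
    """
    h, w = len(mask), len(mask[0])
    bg = [(y, x) for y in range(h) for x in range(w) if not mask[y][x]]
    dist = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = min(math.hypot(y - by, x - bx) for by, bx in bg)
    return dist


def label_cores(dist, min_core_dist):
    """Label 4-connected groups of pixels lying deep inside the object."""
    h, w = len(dist), len(dist[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if dist[y][x] >= min_core_dist and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0
                                and dist[ny][nx] >= min_core_dist):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def split_cluster(mask, min_core_dist):
    """Grow each core outward, visiting pixels far from the background first."""
    h, w = len(mask), len(mask[0])
    dist = distance_to_background(mask)
    labels, count = label_cores(dist, min_core_dist)
    heap = [(-dist[y][x], y, x) for y in range(h) for x in range(w) if labels[y][x]]
    heapq.heapify(heap)
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (-dist[ny][nx], ny, nx))
    return labels, count


def two_disc_mask(h=25, w=33, radius=8):
    """Two overlapping discs centred at (12, 10) and (12, 22)."""
    return [[(y - 12) ** 2 + (x - 10) ** 2 <= radius ** 2
             or (y - 12) ** 2 + (x - 22) ** 2 <= radius ** 2
             for x in range(w)] for y in range(h)]
```

For example, split_cluster(two_disc_mask(), 7.0) finds two cores and
divides the merged object near the neck between the discs. The
resulting pieces are no longer perfectly round, which is consistent
with applying looser morphology criteria to them.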
Multi-locus haplotype calling is similar to alternatively spliced
isoform calling in the exon typing experiment. However, there is no
master image to which all other images can be registered, so the
algorithm is somewhat different. Finally, additional code was written
for two-locus haplotype analysis and calculation of linkage
disequilibrium (LD) statistics.
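As an illustration of the LD calculation, the sketch below computes the
standard statistics D, |D'| and r^2 from the four two-locus haplotype
counts. It is written in Python for illustration and is not the
package's Matlab code. The function name, argument names and the choice
to report the absolute value of D' are assumptions.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes.

    A/a are the alleles at the first locus, B/b at the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes counted")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = p_AB - p_A * p_B             # raw disequilibrium coefficient

    # Largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    nan = float("nan")               # undefined when a locus is monomorphic
    d_prime = abs(d) / d_max if d_max > 0 else nan
    r2 = d * d / denom if denom > 0 else nan
    return d, d_prime, r2
```

For counts of 40, 10, 10 and 40 (AB, Ab, aB, ab), this gives D of about
0.15, |D'| of about 0.6 and r^2 of about 0.36. Because polony
haplotyping counts phased haplotypes directly, no phase inference step
is needed before the calculation.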
Matlab (with the Image Processing Toolbox) is used for all data
analyses. A set of Matlab programs for multi-locus haplotyping is
available for download. The two programs that need to be edited for
each experiment are polMaster.m and getSlideInfo.m.
polMaster.m is the main entry point of this data analysis package. It
is the program you actually run. This program can be further modified
for different analyses.
getSlideInfo.m contains all information about the experiment, including
the file names of all images, gene names, locus names, alleles, etc.
This program needs to be edited for EVERY new experiment.
A few Matlab programs are called frequently by polMaster.m. They
provide supporting functions. For example, Boolean-like operations on
polony objects are carried out by polAdd.m, polAnd.m, polXor.m and
polSubtract.m; PISA7.m and PISA7.fig belong to the polony
identification program, which provides a GUI to make parameter
adjustment easier; overlapping polonies are identified with
countOverlappedPolonies.m; GlobalAlign.m performs image alignment; and
the number of overlapping polonies between two images that could occur
by chance is estimated by randomOverlaps.m.
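How randomOverlaps.m estimates chance overlaps is not described here,
so the following is only one plausible approach, sketched in Python for
illustration. It assumes that polony centres are uniformly distributed
over a rectangular field and that two polonies "overlap" when their
centres are closer than a given radius. The function names, parameters
and overlap rule are all assumptions.

```python
import math
import random


def expected_chance_overlaps(n1, n2, radius, width, height):
    """Closed-form estimate (ignores edge effects).

    Each polony in image 1 is hit by a given polony of image 2 with
    probability pi * radius^2 / area, independently of the others.
    """
    p_hit = math.pi * radius ** 2 / (width * height)
    return n1 * (1.0 - (1.0 - p_hit) ** n2)


def simulate_chance_overlaps(n1, n2, radius, width, height, trials=40, seed=0):
    """Monte Carlo estimate: mean number of image-1 polonies that have at
    least one image-2 polony within `radius`, with random placement."""
    rng = random.Random(seed)        # fixed seed keeps the sketch repeatable
    r2 = radius * radius
    total = 0
    for _ in range(trials):
        second = [(rng.uniform(0, width), rng.uniform(0, height))
                  for _ in range(n2)]
        for _ in range(n1):
            x, y = rng.uniform(0, width), rng.uniform(0, height)
            if any((x - u) ** 2 + (y - v) ** 2 < r2 for u, v in second):
                total += 1
    return total / trials
```

With 100 polonies per image, a radius of 20 pixels and a 1000 x 1000
pixel field, the closed-form estimate is a little under 12 chance
overlaps. An observed overlap count well above this background
suggests that the co-localization is real.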
Finally, you need to make a gel mask for each slide before doing image
analysis. Having a gel mask speeds up the analysis. Here are the
instructions for how to do it.