Primer Finder Overview

John Aach
Last updated: 07/27/98

General overview

Purpose of the PrimerFinder system

PrimerFinder generates primers for the pKO3 knockout system and derivatives as documented in

Link, A.J., Phillips, D. and Church, G.M. (1997) "Methods for generating precise deletions and insertions in the genome of wild-type Escherichia coli: Application to open reading frame characterization." J. Bacteriology 179: 6228-6237.

Information on the pKO3 method can also be found here.

The pKO3 methodology requires the generation of a plasmid insert that contains flanking sequence on both sides of the coding region, but in which the coding region is replaced by a short DNA tag. To create the insert requires four primers, N flank inner and outer primers (Ni and No) and C flank inner and outer primers (Ci and Co). The inner primers have complementary 5' extensions that constitute the replacement coding sequence. The outer primers have extensions that code for restriction sites. The following diagrams depict the situation:

Chromosomal priming sites

[Chromosomal priming sites]

Primer structure

[Primer structure]

Plasmid insert

[Plasmid insert]

Among the factors that must be taken into account in picking pKO3 primers are:

PrimerFinder takes all these and other factors into account in selecting pKO3 primers. The GenomeSequence database is used as the source of information on organism sequence, gene locations, sizes and topology of chromosomes, restriction sites in the genome, and unique oligo length information. Over 50 parameters are available to provide precise specifications for the selection process. They control not only PrimerFinder's basic processing, but also how the above factors are weighted in scoring and sorting candidate primer sets and how the results should be displayed.

PrimerFinder was designed around E. coli and currently selects primers only for that organism, but its processing is entirely general.

Overview of PrimerFinder processing

PrimerFinder primer selection proceeds through the following steps.

  1. Read and validate the parameter file. The parameter file provides overrides to default values built into the system. Details on parameters are given below.
  2. Get information on the specified gene and any specified restriction enzymes from the GenomeSequence database.
  3. Check for situations in which the gene is entirely contained in or contains another gene, putting out a warning or error depending on parameter specifications.
  4. Using parameters and information from the database, define the regions in each gene flank that will be searched for candidate primers and collect together most of the information that will be needed to evaluate them. This involves going through the following sequence of sub-steps for each gene flank.
  5. Construct sets of candidate primers from the inner and outer primer search regions of both flanking regions. Primers are picked by moving through all 3' end locations in a search region and, for any fixed 3' end, extending 5'-wards a base at a time starting from a minimum length. Chromosomal segment Tms are computed for each candidate primer or extension, with basewise extension stopping when a Tm is found that exceeds a parameter-determined limit. For each candidate primer, self-dimer and hairpin alignments are enumerated and dimer and hairpin Tms computed and checked against parameter-specified limits. Different Tm limits apply to dimers and hairpins that involve 3' ends compared to other dimers and hairpins. Self-dimer and hairpin Tms consider the entire primer including extensions for restriction sites and coding region replacements, not just their chromosomal segments. Chromosomal segments of primers are also checked for 3'-end uniqueness in the genome and for their potential generation of genomic PCR noise. Additional details are given below. Only primers with no self-dimers and hairpins exceeding parameter thresholds, and whose 3' ends are unique and free from the potential to generate genomic PCR noise according to parameter specifications, are accepted.
  6. Enumerate all combinations of inner and outer primers for each flank which are within parameter-specified flank size limits, and compute primer-dimer alignments and check associated Tms against parameter thresholds. Again, different thresholds are used for dimers involving 3' ends compared with others. Combinations of inner and outer primers are also examined for their potential to generate genomic PCR noise. Only primer pairs passing flank size, primer-dimer Tm, and genomic PCR noise threshold tests are accepted.
  7. Enumerate all combinations of primer tetrads consisting of a primer pair from each flank. Outer primers are checked for primer-dimers and tetrads with unacceptable primer-dimer Tms are rejected. (Inner primers are not cross-compared; the pKO3 methodology requires them to dimerize. Nor are primers from different flanks checked for genomic PCR noise, as no PCR reaction involving both flanks will be done in the presence of the genome.) Remaining primer tetrads are scored by the methodology described below.
  8. Sort primer tetrads according to computed score and display the results based on parameter-supplied values.
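
The candidate-generation step can be illustrated with a minimal Python sketch. It is not PrimerFinder's code (which is written in Perl): the function names, the tm_min and tm_max arguments, and the use of the simple Wallace rule in place of the nearest-neighbor Tm calculation are all assumptions made for illustration. Only top-strand primers are enumerated, and the dimer, hairpin, uniqueness, and genomic PCR noise checks are omitted.

```python
def wallace_tm(seq):
    """Rough stand-in Tm: 2 C per A/T plus 4 C per G/C (Wallace rule).

    This is NOT the nearest-neighbor method the text cites; it is used
    here only so that the sketch is self-contained.
    """
    s = seq.lower()
    return 2 * (s.count("a") + s.count("t")) + 4 * (s.count("g") + s.count("c"))


def candidate_primers(region, min_len, tm_min, tm_max, tm=wallace_tm):
    """Enumerate candidate primers in `region` (given 5'-to-3').

    For every possible 3' end, start at the minimum length and extend
    5'-wards one base at a time, stopping once the Tm exceeds tm_max.
    Returns (start, end, sequence, Tm) tuples with tm_min <= Tm <= tm_max.
    """
    found = []
    for end in range(min_len, len(region) + 1):      # end = index just past the 3' base
        for start in range(end - min_len, -1, -1):   # each step adds one 5' base
            primer = region[start:end]
            t = tm(primer)
            if t > tm_max:
                break                                # stop extending this 3' end
            if t >= tm_min:
                found.append((start, end, primer, t))
    return found


if __name__ == "__main__":
    for hit in candidate_primers("acgtacgtacgt", min_len=4, tm_min=14, tm_max=16):
        print(hit)
```

In the sketch, tm_min and tm_max play the role of the target Tm minus and plus the acceptable variance.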

Primer-dimer and hairpin analysis

The total number of possible primer-dimer alignments involving arbitrary numbers and sizes of bulge and interior loops between two primers of lengths m and n is potentially huge. Neglecting complementarity between aligned bases for the moment, there are C(m,s)*C(n,s) alignments in which the primers are aligned at exactly s bases along their lengths (where C(m,s) is the number of ways of choosing s of m bases), and therefore Sum C(m,s)*C(n,s) [s=1 to min(m,n)] total possible alignments. Complementarity can be taken into account by considering that only a fraction 4^-s of the s-base alignments would, on average, be paired at all s positions.
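
To give a feel for the size of this count, the following short Python sketch evaluates the sum directly. The function names are illustrative only. By the Vandermonde identity the total equals C(m+n, m) - 1, which the sketch uses as a cross-check.

```python
from math import comb


def total_alignments(m, n):
    """Sum of C(m,s)*C(n,s) for s = 1 .. min(m,n), ignoring complementarity."""
    return sum(comb(m, s) * comb(n, s) for s in range(1, min(m, n) + 1))


def expected_fully_paired(m, n):
    """Same sum, with each s-base alignment weighted by 4**-s, the chance
    that all s aligned positions are complementary if bases are equiprobable."""
    return sum(comb(m, s) * comb(n, s) * 4.0 ** -s
               for s in range(1, min(m, n) + 1))


if __name__ == "__main__":
    m, n = 20, 20
    total = total_alignments(m, n)
    assert total == comb(m + n, m) - 1   # Vandermonde identity
    print(total, expected_fully_paired(m, n))
```

For two 20-mers the total is on the order of 10^11, which is why a heuristic is needed.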

It should be unnecessary to consider all possible primer-dimer alignments when looking for maximal primer-dimer Tms, however, since large numbers or sizes of bulge and interior loops will lower Tms both by their positive entropies and enthalpies and by diminishing the numbers of paired and nearest-neighbor paired bases. A simple heuristic has been employed in PrimerFinder in an attempt to reduce the numbers of alignments examined to manageable levels, the key elements of which are as follows:

  1. The only primer-dimer alignments that are considered have a minimal region of exact homology except for the possibility of a single bulge nucleotide in one of the primer sequences. (Regions of exact homology with no bulge nucleotides are, of course, also considered.)
  2. Outside of the region of exact homology the primers are considered to be aligned along their lengths without bulges or loops of any kind. (Note that "aligned" does not imply "paired" in this context.)

The size of the minimal region of exact homology is specified by a parameter.

Example

Assuming that the first primer has sequence acgtgtcta and the second has sequence ggcaggaact (both 5'-to-3'), and the minimal region of exact homology is 3, the following represents a primer-dimer alignment that PrimerFinder will consider.

          
      acgt|gtc|ta       = primer 1 (5'-to-3')
         t|cag|gacgg    = primer 2 (3'-to-5')
            a           = primer 2 bulge nucleotide = a
         ^      ^       = duplex portion of alignment
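
The first element of the heuristic can be sketched in Python as follows. This is an illustration under assumptions, not PrimerFinder's implementation: the function names and the form of the returned hits are invented here, and only the search for the minimal region (with at most one bulge nucleotide) is shown, not the Tm calculation for the full alignment.

```python
PAIR = {"a": "t", "t": "a", "c": "g", "g": "c"}


def _paired(top, bottom):
    """True if every base of `top` pairs with the base beneath it in `bottom`."""
    return len(top) == len(bottom) and all(PAIR.get(x) == y for x, y in zip(top, bottom))


def seed_alignments(primer1, primer2, k):
    """Find regions of k paired bases, allowing one bulge nucleotide.

    Both primers are given 5'-to-3'. Each hit is (i, j, bulge): i indexes
    primer1, j indexes primer2 read 3'-to-5', and bulge is None or a tuple
    ("p1", index) / ("p2", index) naming the unpaired base.
    """
    top, bottom = primer1.lower(), primer2.lower()[::-1]
    hits = []
    # Regions with no bulge nucleotide.
    for i in range(len(top) - k + 1):
        for j in range(len(bottom) - k + 1):
            if _paired(top[i:i + k], bottom[j:j + k]):
                hits.append((i, j, None))
    # One bulge nucleotide in primer 2: skip one interior base of a (k+1)-window.
    for i in range(len(top) - k + 1):
        for j in range(len(bottom) - k):
            w = bottom[j:j + k + 1]
            for b in range(1, k):
                if _paired(top[i:i + k], w[:b] + w[b + 1:]):
                    hits.append((i, j, ("p2", j + b)))
    # One bulge nucleotide in primer 1.
    for i in range(len(top) - k):
        w = top[i:i + k + 1]
        for j in range(len(bottom) - k + 1):
            for b in range(1, k):
                if _paired(w[:b] + w[b + 1:], bottom[j:j + k]):
                    hits.append((i, j, ("p1", i + b)))
    return hits


if __name__ == "__main__":
    for hit in seed_alignments("acgtgtcta", "ggcaggaact", 3):
        print(hit)
```

For the two primers of the example, the hits include the gtc/cag region with a bulged a in primer 2 shown in the diagram, and no region of three paired bases exists without a bulge.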

Hairpin alignments are dealt with in the same way, except that the two sequences are both parts of the same primer. In fact, PrimerFinder derives all hairpin alignments for a primer from the primer-dimer alignments computed for its self-dimer. As with any simple heuristic, however, some potentially important cases will be missed. For instance, PrimerFinder will never see the alignment

         t     t       = two primer 1 bulge t nucleotides
      a|gcg|a|gcc|a    = primer 1 (5'-to-3')
       |cgg|a|cgg|a    = primer 2 (3'-to-5')
        ^       ^      = duplex portion of alignment

even though it would see either one of the alignments individually.

3' end involvement

PrimerFinder considers "3' end involvement" to be any situation where a minimum number of bases at the 3' end of a hairpin or of either primer of a primer-dimer is exactly paired against itself or the other primer. The region examined for 3' end involvement has no intrinsic connection with the region of homology which defines the primer-dimer or hairpin alignment as described above; the latter either may contain the 3' end or may not, but if it does, it will not be considered a case of 3' end involvement if the bulge base is within the 3' end homology region. The purpose of tracking 3' end involvement is that, where it obtains, primers may be consumed and made unusable in PCR reactions. Making the determination of 3' end involvement independent of the region of homology used to define primer-dimers worthy of consideration leads to a more conservative and thereby safer selection of primers. It means that any primers that stick together enough anywhere along their lengths, but also have 3' end involvement, are excluded, not only primers that actually stick together at their 3' ends.

The size of the region used to determine 3' end involvement is set by a parameter.

Examples

Assuming that the minimal region of homology for primer-dimer consideration is 3, and the minimal region of exact homology for 3' end involvement is 2, the following are examples of 3' end involvement. (Regions relevant to 3' end involvement are signified by parentheses.)

      a(cg)t|gtc|ta       = primer 1 (5'-to-3')
       (gc)t|cag|gacgg    = primer 2 (3'-to-5')
            a          

      a|(cg)a|atgta       = primer 1 (5'-to-3')
       |(gc)t|caagacgg    = primer 2 (3'-to-5')
           a          

       a|cga|atg(ct)      = primer 1 (5'-to-3')
       g|gct|caa(ga)cgg   = primer 2 (3'-to-5')
          a          

This is an example of a situation without 3' end involvement:

      a|(cg)a|atgta       = primer 1 (5'-to-3')
       |(gc)t|caagacgg    = primer 2 (3'-to-5')
         a          

The reason is that the bulge base 'a' of the region of primer-dimer homology is within the region examined for 3' end involvement.
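
A simplified check for 3' end involvement might look like the Python sketch below. The function name and its arguments are assumptions made for illustration. The sketch takes primer 2 already written 3'-to-5' with any bulge nucleotide removed, as in the diagrams above, and it does not implement the rule that a bulge base inside the 3' end region rules out involvement.

```python
PAIR = {"a": "t", "t": "a", "c": "g", "g": "c"}


def three_prime_involved(top, bottom, offset, n):
    """Report whether the last n bases at either 3' end are all paired.

    top    -- primer 1, written 5'-to-3'
    bottom -- primer 2, written 3'-to-5', bulge nucleotide removed
    offset -- index in `top` of the column where bottom[0] sits
    n      -- size of the region examined for 3' end involvement
    """
    def paired_at(col):
        b = col - offset
        return (0 <= col < len(top) and 0 <= b < len(bottom)
                and PAIR.get(top[col]) == bottom[b])

    # Primer 1's 3' end is the right-hand end of the top line.
    p1_end = all(paired_at(c) for c in range(len(top) - n, len(top)))
    # Primer 2's 3' end is the left-hand end of the bottom line.
    p2_end = all(paired_at(c) for c in range(offset, offset + n))
    return p1_end or p2_end


if __name__ == "__main__":
    # First and third examples above, with n = 2.
    print(three_prime_involved("acgtgtcta", "gctcaggacgg", 1, 2))
    print(three_prime_involved("acgaatgct", "ggctcaagacgg", 0, 2))
```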

Tm calculations

Tm calculations for primers against chromosomal DNA consider only primer chromosomal segments and use the formulation given in

SantaLucia, J., Allawi, H.T. and Seneviratne, P.A. (1996) "Improved Nearest-Neighbor Parameters for Predicting DNA Duplex Stability." Biochemistry 35: 3555-3562.

This method is also applied to hairpins and primer-dimers with the following modifications:

Genomic PCR noise

"Genomic PCR noise" is PrimerFinder's name for undesired PCR products generated by primers whose 3' ends hybridize to sites in the genome other than the intended site, and which are close enough and in the right orientation to form PCR products. PrimerFinder assesses the potential for genomic PCR noise by means of a GenomeSequence database table that contains the locations of all N-mers in the genome for certain sized N.

PrimerFinder's basic test is to see whether two sites corresponding to the last N-mer of two primers can be found anywhere in the genome within a parameter-specified window where the site with the lower chromosome location is on strand 2 and the other site is on strand 1. This configuration is required for the window to lead to a productive PCR reaction. This test is executed at two points in processing: first, when individual primers are generated, and second, when inner and outer primers from a flank are assembled into pairs. The first test discovers whether a candidate primer by itself may be associated with a productive PCR window, while the second test discovers whether candidate primer pairs for flank amplification may together also amplify unwanted DNA. Further testing of primers from both flanks is unnecessary because pKO3 protocols employ different PCR reactions for amplification of the two flanks, and because the flanks are assembled and themselves amplified only after other genomic DNA is removed. Thus, primers from opposite flanks are never used together in a PCR reaction in the presence of the rest of the genome.
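
The basic test can be sketched in Python as follows. The representation of N-mer sites as (location, strand) tuples is a hypothetical stand-in for rows of the GenomeSequence N-mer table, the function name is invented, and the circular topology of the chromosome is ignored for simplicity.

```python
def productive_windows(sites_x, sites_y, window):
    """Return the sizes of productive PCR windows, smallest first.

    sites_x, sites_y -- lists of (location, strand) for the genomic sites
                        matching the last N-mer of two primers; pass the
                        same list twice to test a single primer by itself.
    window           -- maximum window size to consider.

    A window is productive when the site at the lower location is on
    strand 2 and the site at the higher location is on strand 1.
    """
    sizes = set()
    for lower, upper in ((sites_x, sites_y), (sites_y, sites_x)):
        for loc_lo, strand_lo in lower:
            if strand_lo != 2:
                continue
            for loc_hi, strand_hi in upper:
                if strand_hi == 1 and 0 < loc_hi - loc_lo <= window:
                    sizes.add(loc_hi - loc_lo)
    return sorted(sizes)


if __name__ == "__main__":
    outer = [(100, 2), (5000, 1)]   # made-up sites for illustration
    inner = [(600, 1), (90, 2)]
    print(productive_windows(outer, inner, 1000))
```

An empty result means that no productive window was found within the specified size.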

In addition to the window size, the value of N may also be specified as a parameter. However, due to its large size the initial implementation of the table has been confined to N=9 for the E. coli K-12 MG1655 sequence. The value 9 was selected because it is the smallest value of N for which the likelihood of N-mers randomly generating PCR products of 1000 bases or less in this genome is <= 10%. More generally, if the size of the genome is assumed to be L=4639221 (the size of the M52 version of the sequence), and bases are assumed to be equiprobable, and the size of the window = K, the least value of N for which the probability of finding a productive PCR window is <= q is given by the expression N = (log(2) + log(L) + log(K) - log(q)) / (2 log(4)). Values of N for selected K and q are as follows:

Rows give the window size K; columns give the probability q.

      K \ q   0.05   0.1    0.15   0.2    0.25   0.3    0.35   0.4    0.45   0.5
      500     9.1    8.9    8.7    8.6    8.5    8.5    8.4    8.4    8.3    8.3
      1000    9.4    9.1    9.0    8.9    8.8    8.7    8.7    8.6    8.6    8.5
      1500    9.5    9.3    9.1    9.0    8.9    8.9    8.8    8.8    8.7    8.7
      2000    9.6    9.4    9.2    9.1    9.0    9.0    8.9    8.9    8.8    8.8
      2500    9.7    9.4    9.3    9.2    9.1    9.0    9.0    8.9    8.9    8.9
      3000    9.8    9.5    9.4    9.3    9.2    9.1    9.1    9.0    9.0    8.9
      3500    9.8    9.6    9.4    9.3    9.2    9.2    9.1    9.1    9.0    9.0
      4000    9.9    9.6    9.5    9.4    9.3    9.2    9.2    9.1    9.1    9.0
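
The expression for N can be evaluated with a few lines of Python, shown here as a sketch with illustrative function names. The second function is simply the same expression rearranged to give q for a chosen N, that is q = 2*L*K / 4^(2N).

```python
from math import log

GENOME_SIZE = 4639221  # L, the genome size quoted in the text


def min_nmer_length(q, window, genome_size=GENOME_SIZE):
    """N = (log(2) + log(L) + log(K) - log(q)) / (2 log(4))."""
    return (log(2) + log(genome_size) + log(window) - log(q)) / (2 * log(4))


def window_probability(n, window, genome_size=GENOME_SIZE):
    """The same relation solved for q: q = 2*L*K / 4**(2N)."""
    return 2 * genome_size * window / 4 ** (2 * n)


if __name__ == "__main__":
    for k in (500, 1000, 4000):
        print(k, [round(min_nmer_length(q, k), 1) for q in (0.05, 0.1, 0.5)])
```

For example, min_nmer_length(0.1, 1000) rounds to 9.1, matching the tabulated entry for K=1000 and q=0.1.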

Aside from excluding candidate primers and primer pairs from further consideration due to genomic PCR noise, PrimerFinder considers the size of the smallest PCR window for accepted primers and primer pairs that is greater than the exclusion threshold in scoring primer tetrads. The larger this minimum window, the better the score associated with the tetrad (more on this below).

The large size of the GenomeSequence table for N-mer locations and the potentially large number of candidate primers and primer pairs to test means that genomic PCR noise checking can take a substantial amount of processing time. If it is desirable to reduce processing time, a parameter option exists by which genomic PCR noise testing can simply be turned off.

 

Targets, thresholds, and overall scoring of candidate primer sets

PrimerFinder manages scoring at several levels: individual primer, primer pair, and primer tetrad. Since PrimerFinder's purpose is to find the best primer tetrads for knocking out genes using the pKO3 methodology, scoring at the primer and primer pair levels is designed to support scoring at the tetrad level. PrimerFinder also scores primers and primer sets along several dimensions. Conceptually, two kinds of scoring variable are implemented: targets and thresholds.

Targets represent values to which PrimerFinder will try to find a closest match. In general, targets are associated with a cutoff value on either side of the target, and primers and primer pairs which exceed either value are excluded from further consideration. Those which survive are used in primer tetrads, and the tetrads themselves are scored according to a deviation from the target value. The smaller the deviation, the smaller (better) the score. Mathematically, the deviation from target is the square root of the sum of the squares of the differences of the individual values from the target. (I.e., it measures how close the primer or primer pair values are to the target, not how close they are to their own mean.)

Thresholds are one-sided cutoff values which PrimerFinder will not allow to be exceeded. As with targets, primers and primer pairs with values exceeding this cutoff are excluded from further consideration. Those which survive are used in primer tetrads, and scored according to their distance from the cutoff. However, in this case, the score is made smaller (better) when the distance is larger. Mathematically, the distance is the smallest distance of any of the primer or primer pair variable values to the threshold, suitably normalized against the possible range of the variable. (I.e., the distance represents the worst case value of the variable for any of the primers or primer pairs in the tetrad.)

The difference between targets and thresholds can be summarized this way: With a target, you get a better score for being as close as possible to a stipulated target, whereas with a threshold, you get a better score for being as far away as possible on one side of a stipulated cutoff.
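
The two kinds of scoring variable can be written down in a few lines of Python. This is a sketch with illustrative function names; it assumes the values passed in have already survived the cutoffs, and normalization against the range of the variable is left out.

```python
from math import sqrt


def target_deviation(values, target):
    """Square root of the summed squared differences from the target."""
    return sqrt(sum((v - target) ** 2 for v in values))


def threshold_distance(values, threshold):
    """Smallest distance of any value below the one-sided cutoff,
    i.e. the worst case among the primers or primer pairs."""
    return min(threshold - v for v in values)


if __name__ == "__main__":
    print(target_deviation([58, 62], target=60))        # smaller is better
    print(threshold_distance([30, 42], threshold=45))   # larger is better
```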

Target and threshold values are generally established by PrimerFinder parameters. As targets are associated with a cutoff on either side, an additional parameter is generally required with a target value. This generally takes the form of a variable specifying an acceptable variance -- i.e., distance on either side of the target.

The following are the target and threshold variables tracked by PrimerFinder:

Targets

Thresholds

A few notes concerning these variables:

  1. How low one can successfully set the primer dimer Tm threshold depends on the restriction sites specified for the primers. If, e.g., NotI is specified as an outer primer restriction site, outer primers will all experience high Tm self dimers due to the presence of a palindromic GC-rich 8-mer NotI site at their 5' ends. To set the threshold too low will then eliminate all outer primers.
  2. Self-dimers such as the above all involve 5' ends, however, so setting a lower primer dimer Tm threshold for 3' end involvement is still possible and generally advisable.
  3. Separate variables and thresholds are set for hairpins vs primer dimers because these involve different Tm calculations. Among other things, primer dimer Tms are DNA concentration-dependent while hairpins are not.
  4. The theory behind the variable shortest unique oligo length at the 3' end of a primer is that this is an (admittedly crude) measure of the potential of the primer to hybridize to the wrong chromosomal location. In E. coli K-12 MG1655, for instance, there are no unique 7-mers in the genome. That means that for any primer, the last 7 bases will be capable of hybridizing to at least one site in the chromosome other than the intended primer target. In general, if the length of the shortest unique oligo at the 3' end of the chromosomal segment of a primer is n, then there will be at least 2 sites that the last n-1 bases may hybridize to, plus possibly more for the last n-2, etc. The smaller n is, then, the less potential for cross-hybridization, and ideally one would like n to be as small as possible. However, because shortest unique oligo length at the 3' end is not a variable that can be experimentally controlled, specification of a formal target or threshold cutoff is impractical. PrimerFinder treats it as a threshold with an infinite cutoff value. No primer is ever rejected on account of this variable, but primers are considered better the closer they come to having a value of 0 for shortest unique oligo length at the 3' end.

Overall scoring

As primer tetrads are collected, overall scores for each tetrad are computed by applying a parameter-specified weight to each target deviation and each closest-distance-from-threshold value and summing them up. Lower overall scores indicate better primer tetrads, higher ones worse, according to the weights specified.

It is at the point of overall scoring that threshold scores are normalized according to their respective ranges. Parameter variables are available to help define what the relevant ranges are for Tm and uniqueness thresholds. Primer dimer and hairpin Tm thresholds present an issue in that the minimal homology requirements for primer dimer alignments (see above) may result in many primer tetrads not having primer dimer or hairpin alignments. What Tms should be assigned for purposes of scoring the tetrad? The lower limit possible for a primer dimer Tm is, of course, -273 C, but it is unreasonable to assign this as the Tm for such tetrads. Simply because the tetrad's primers do not have the minimal homology alignment considered by PrimerFinder does not mean they have no primer dimers with Tms above absolute 0. For this reason, a parameter variable score_Tm_floor is used to set the lower limit of Tm thresholds. Any primer dimer or hairpin Tm variable without a value, or with a value below this floor, is set to the floor. The distance between the floor and the specified threshold is set as the range of the variable. A similar scheme is employed for shortest unique oligo length at the 3' end, and also for the minimum window size > the genomic PCR noise threshold.
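
The following Python sketch shows one way the pieces of overall scoring could fit together. It is an illustration under assumptions, not PrimerFinder's code. The function names are invented, and the conversion of a normalized threshold distance into a score term (1 minus distance over range, so that a larger distance gives a smaller term) is an assumption, since the text states only that larger distances score better. The weight names are taken from the parameter list; the numeric values are made up.

```python
def target_term(values, target, variance):
    """Deviation from target, divided by the acceptable variance."""
    deviation = sum((v - target) ** 2 for v in values) ** 0.5
    return deviation / variance


def tm_threshold_term(tms, threshold, floor):
    """Score term for a primer-dimer or hairpin Tm threshold.

    `tms` may contain None where no alignment was found; such entries,
    and any value below `floor`, are set to the floor (score_Tm_floor).
    The mapping 1 - distance/range is an ASSUMPTION of this sketch.
    """
    clamped = [floor if t is None or t < floor else t for t in tms]
    distance = min(threshold - t for t in clamped)   # worst case in the tetrad
    return 1.0 - distance / (threshold - floor)


def overall_score(terms, weights):
    """Weighted sum of the individual terms; lower is better."""
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    terms = {
        "score_weight_Tm_variance": target_term([58, 62, 60, 61], 60, 5),
        "score_weight_primerdimer_Tm": tm_threshold_term([None, 10, 35], 45, 20),
    }
    weights = {"score_weight_Tm_variance": 1.0, "score_weight_primerdimer_Tm": 2.0}
    print(overall_score(terms, weights))
```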

Running PrimerFinder

PrimerFinder may be run by simply selecting parameters from the Specify PrimerFinder Parameters form and clicking on the Submit button. Values shown in the Specify PrimerFinder Parameters form are system defaults. All that is required for PrimerFinder to run is that you specify a gene to be knocked out in the appropriate section of the form. Provision and adjustment of other parameters serve to tailor the selected primers to your specifications.

PrimerFinder takes several minutes to select primers. During that time you will receive status messages indicating PrimerFinder's place in processing. Expect to wait the longest periods of time while PrimerFinder is selecting primers and primer pairs, and while it is assembling primer tetrads. Times for primer and primer pair selection may be reduced by turning off genomic PCR noise checking.

Be careful about asking PrimerFinder to show large numbers or percentages of its primer tetrads. This can lead to the generation of very large amounts of data.

A few more parameters are available than are shown in the Specify PrimerFinder Parameters form. These are mostly for debugging or to constrain PrimerFinder's use of resources. A few of these are displayed for information purposes on the form but are not modifiable.

PrimerFinder implementation

PrimerFinder is written in Perl 5.0 and uses Sybperl against the GenomeSequence database. It runs on a DEC Alpha 3000.

PrimerFinder is a perl package- and object-oriented system and is based on a set of 28 packages and objects in addition to the main perl driver and CGI programs that support the web interface.

Objects used by the system are:

Non-object packages employed by the system are:

In addition, PrimerFinder employs two packages from the GenomeSequence query application:

Finally, the web interface to PrimerFinder consists mainly of two CGI scripts.

PrimerFinder parameters

Parameters are known by name codes that are used directly by the PrimerFinder system. These name codes are similar to the text labels used in the Specify PrimerFinder Parameters form but reflect a slightly different categorization than that used in the form. The categories from the form are used below. Text in parentheses by parameter name codes gives the corresponding text label from the Specify PrimerFinder Parameters form.

Database

These parameters are only used for debugging and are not presented on the Specify PrimerFinder Parameters form. They control the version of the database that PrimerFinder uses to produce output.

db_database

Name of the version of GenomeSequence that PrimerFinder will use.

db_userid

Userid used to access db_database.

db_password

Password used to access db_database.

Organism and sequence version

These parameters set the GenomeSequence organism and sequence version that will be used for primer selection. Currently only the GenBank M52 sequence version of E. coli K-12 strain MG1655 is available on the database.

db_organism (Organism)

Name of the organism whose sequence PrimerFinder will analyze from the GenomeSequence database.

db_seqversion (Sequence version)

Name of the version of the sequence PrimerFinder will analyze.

Gene to be knocked out

These parameters identify the gene for which PrimerFinder will select pKO3 knock-out primers. Any of these three parameters may be specified. If more than one is provided, they must refer to the same gene.

gene_name (Gene name)

Name of the gene to be knocked out, as given on the GenomeSequence database. Currently GenomeSequence only contains GenBank names for E. coli K-12 genes.

gene_accessno (Gene accession number)

Source database accession number of the gene to be knocked out, as given on the GenomeSequence database. Currently GenomeSequence only contains Blattner accession numbers ("b" numbers) for E. coli K-12 genes.

gene_no (GenomeSequence gene number)

GenomeSequence internal gene number of the gene to be knocked out.

Primer sequence characteristics

These parameters give information about sequence that PrimerFinder is to build into the 5' extensions of the primers.

primer_No_restrictsite (N outer restriction site)

Name of a restriction enzyme whose site is to be built into the N outer primer.

primer_No_restrictsite_sequence (N outer restriction site sequence)

Specifies restriction site sequence to be built into N outer primer for specified restriction enzyme. To be used only when the restriction enzyme has a consensus sequence with ambiguous bases. Otherwise, PrimerFinder can get the sequence from the GenomeSequence database.

primer_No_overhang (N outer primer overhang sequence)

Additional sequence that PrimerFinder will build in at the 5' extreme of the N outer primer to give room for restriction enzymes to operate. It is therefore mostly used with primer_No_restrictsite, which specifies the restriction site.

primer_Co_restrictsite (C outer restriction site)

Name of a restriction enzyme whose site is to be built into the C outer primer.

primer_Co_restrictsite_sequence (C outer restriction site sequence)

Specifies restriction site sequence to be built into C outer primer for specified restriction enzyme. To be used only when the restriction enzyme has a consensus sequence with ambiguous bases. Otherwise, PrimerFinder can get the sequence from the GenomeSequence database.

primer_Co_overhang (C outer primer overhang sequence)

Additional sequence that PrimerFinder will build in at the 5' extreme of the C outer primer to give room for restriction enzymes to operate. It is therefore mostly used with primer_Co_restrictsite, which specifies the restriction site.

Primer Tm and dimerization characteristics

These parameters set targets and thresholds for melting temperatures (Tms) of primer chromosomal segments, primer-dimers, and hairpins. See Targets, thresholds, and overall scoring of candidate primer sets for additional information on these parameters.

primer_Tm_target (Target Tm)

Target Tm of the 3' end of the primer that is to hybridize to chromosomal DNA in the gene flank. You can control the range of primer melting temperatures that PrimerFinder considers, which can accelerate your search for primers. If your range is too narrow, or if it disagrees with the length range that you specify, PrimerFinder will be limited in its ability to find primers.

primer_Tm_variance (Acceptable Tm variance)

The distance above or below primer_Tm_target that a primer must be within to be further considered by PrimerFinder.

primer_Tm_hairpin_max (Max acceptable Tm for hairpin)

The threshold for Tm of primer hairpins above which primers will be rejected.

primer_Tm_3p_hairpin_max (Max acceptable Tm for hairpin with 3' end involvement)

The threshold for Tm of primer hairpins with 3' end involvement above which primers will be rejected.

primer_Tm_primerdimer_max (Max acceptable Tm for primer-dimer)

The threshold for Tm of primer dimers between primers in a tetrad (excluding inner primers) above which primers will be rejected.

primer_Tm_3p_primerdimer_max (Max acceptable Tm for primer-dimer with 3' end involvement)

The threshold for Tm of primer dimers with 3' end involvement between primers in a tetrad (excluding inner primers) above which primers will be rejected.

Coding region replacement sequence specifications

These parameters define the sequence that will be programmed into the inner primers to replace the coding region of the gene to be knocked out, and also how this sequence will be managed.

replacement_sequence (Coding region replacement sequence)

The replacement sequence to be programmed into the inner primers. It will be used as-is in the C inner primer and its reverse complement in the N inner primer. The N inner primer generally contains the gene of interest's start codon and the C inner primer its stop codon. Therefore these do not have to be supplied if the replacement sequence is to code for a protein.

The replacement sequence may either be selected from the list or input in the input box. Use of the input box overrides the selection list.

keep_replacement_seq_in_frame (Keep replacement in frame)

A Yes/No indicator that instructs PrimerFinder as to whether replacement_sequence needs to be kept in frame in the knockout that will result from using PrimerFinder primers. When the value of this variable is "Y", PrimerFinder will create inner primers whose 3' ends incorporate entire codons of coding region DNA; i.e., the number of bases of coding region DNA will always be divisible by 3. If the value is "N", PrimerFinder will create primers that incorporate any number of bases of coding region DNA.

Because a "Y" value of this variable requires PrimerFinder to move through coding region DNA, it is also required that the variables primerloc_Ni_in_offset_max and primerloc_Ci_in_offset_max specify positive offsets. See descriptions of these variables for details.

Gene flank specifications

These parameters set targets and thresholds for gene flank sizes and other flank attributes. See Targets, thresholds, and overall scoring of candidate primer sets for additional information.

flank_size_target (Flank size target)

Target for the size of the N and C flanks of the gene to be knocked out that PrimerFinder primers will amplify. The flank size is the distance between the outermost end of an outer primer and the innermost end of an inner primer for a given flank.

flank_size_variance (Acceptable flank size variance)

The distance above or below flank_size_target that the flank size associated with a primer pair must be within to be further considered by PrimerFinder.

flank_restrictsite_avoid (Additional restriction sites to avoid in flank)

Restriction sites additional to any specified in primer_No_restrictsite and primer_Co_restrictsite that need to be avoided in the flank. Up to three may be specified.

Scoring weights and other scoring parameters

Score weights and other values that PrimerFinder uses to score primer tetrads. See Targets, thresholds, and overall scoring of candidate primer sets for additional information.

score_weight_Tm_variance (Weight applied to Tm variance)

Weight applied to the deviation of primer chromosomal Tms for all primers in a primer tetrad from primer_Tm_target. "Deviation" here means the square root of the sum of the squares of the differences between individual primer Tms and primer_Tm_target, divided by the specified maximum acceptable deviation from target, primer_Tm_variance. (Note that this is not equal to the standard deviation of the individual primer chromosomal Tms.)

score_weight_length_variance (Weight applied to length variance)

Weight applied to the deviation of primer chromosomal length for all primers in a primer tetrad from primer_length_target. "Deviation" here means the square root of the sum of the squares of the differences between individual primer lengths and primer_length_target, divided by the specified maximum acceptable deviation from target, primer_length_variance. (Note that this is not equal to the standard deviation of the individual primer chromosomal lengths).

score_weight_flanksize_variance (Weight applied to flank size variance)

Weight applied to the deviation of the N and C flank sizes associated with a primer tetrad from flank_size_target. "Deviation" here means the square root of the sum of the squares of the differences between N and C flank sizes and flank_size_target, divided by the specified maximum acceptable deviation from target, flank_size_variance. (Note that this is not equal to the standard deviation of the flank sizes.)

score_weight_primerdimer_Tm (Weight applied to Tm of most stable primer-dimer)

Weight applied to the maximum Tm of any primer dimer alignment for individual primers, inner and outer primers of a flank, and outer primers of both flanks, for the primers in a primer tetrad.

score_weight_primerdimer_Tm_3p (Weight applied to Tm of most stable primer-dimer with 3' end involvement)

Weight applied to the maximum Tm of any primer dimer alignment with 3' end involvement for individual primers, inner and outer primers of a flank, and outer primers of both flanks, for the primers in a primer tetrad.
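The set of primer combinations named in the two primer-dimer descriptions above can be enumerated as in the sketch below. It assumes that "individual primers" refers to each primer paired with another copy of itself. The dimer Tm function is passed in as a parameter; `toy_dimer_tm` is a placeholder with no thermodynamic meaning, used only so the example runs.

```python
PRIMER_NAMES = ("Ni", "No", "Ci", "Co")


def dimer_pairings():
    """Pairings named in the text: each primer with itself, inner with outer
    primer of each flank, and the outer primers of both flanks."""
    self_pairs = [(name, name) for name in PRIMER_NAMES]
    flank_pairs = [("Ni", "No"), ("Ci", "Co")]
    outer_pair = [("No", "Co")]
    return self_pairs + flank_pairs + outer_pair


def worst_dimer_tm(primers, dimer_tm):
    """Maximum Tm over all pairings; `dimer_tm(seq_a, seq_b)` is supplied by the caller."""
    return max(dimer_tm(primers[a], primers[b]) for a, b in dimer_pairings())


def toy_dimer_tm(seq_a, seq_b):
    # Placeholder only: NOT a real Tm calculation.
    return float(len(seq_a) + len(seq_b))


example_primers = {"Ni": "ACGTA", "No": "ACGTACG", "Ci": "ACGT", "Co": "ACGTACGTA"}
print(len(dimer_pairings()))                          # 7 pairings
print(worst_dimer_tm(example_primers, toy_dimer_tm))  # 18.0 with the placeholder
```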

score_weight_hairpin_Tm (Weight applied to Tm of most stable hairpin)

Weight applied to the maximum Tm of any hairpin alignment for the individual primers in a primer tetrad.

score_weight_hairpin_Tm_3p (Weight applied to Tm of most stable hairpin with 3' end involvement)

Weight applied to the maximum Tm of any hairpin alignment with 3' end involvement for the individual primers in a primer tetrad.

score_weight_primer_3p_uniqueoligolength (Weight applied to max 3' end unique oligo length)

Weight applied to the maximum of the shortest unique oligo lengths for the 3' end of all primers in a primer tetrad.

score_weight_min_genomic_PCR_noise_window (Weight applied to min genomic PCR noise window exceeding exclusion threshold)

Weight applied to the minimum productive PCR window exceeding the genomic PCR noise exclusion threshold primer_genomic_PCR_noise_window_size for the four individual primers and two flank primer pairs in a primer tetrad.

score_Tm_floor (Floor Tm)

Minimum value to be used in scoring Tms for primer dimers and hairpins.

score_uniqueness_ceiling (Unique oligo size ceiling)

Maximum value to be used in scoring the shortest unique oligo length at a primer 3' end.

score_min_genomic_PCR_noise_window_ceiling (Genomic PCR window size ceiling)

Maximum value to be used in scoring the smallest genomic PCR noise window for an individual primer or flank primer pair that is greater than primer_genomic_PCR_noise_window_size.
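One plausible reading of the floor and ceiling parameters above is that they clamp a raw value before it is weighted. The sketch below shows that reading; the function names and sample values are hypothetical, and PrimerFinder's actual treatment may differ.

```python
def apply_floor(raw_tm, score_tm_floor):
    """Tms below the floor are scored as if they equaled the floor."""
    return max(raw_tm, score_tm_floor)


def apply_ceiling(raw_value, ceiling):
    """Values above the ceiling are scored as if they equaled the ceiling."""
    return min(raw_value, ceiling)


# Hypothetical values: a weak hairpin with Tm 12.5 and a floor of 20.0,
# and a 3' unique oligo length of 31 with a ceiling of 25.
print(apply_floor(12.5, 20.0))   # 20.0
print(apply_ceiling(31, 25))     # 25
```

Under this reading, differences among very unstable dimers or hairpins, and among values already beyond the ceiling, do not influence the ranking of tetrads.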

Output display parameters

These parameters determine what selection of primer tetrads to display and other display options.

output_display_number, output_display_number_interpretation (Primer tetrads to display)

These two variables allow the specification of a number or a percentage of primer tetrads to display. Primer tetrads are always presented in order from best-scored to worst-scored. It is advised that only small numbers or percentages be selected here, as PrimerFinder routinely computes and scores over 2000 candidate primer tetrads.
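How the two display variables might combine is sketched below. The interpretation values "number" and "percent" are hypothetical labels (the actual codes are not given here), and rounding a percentage up to a whole tetrad is an assumption.

```python
import math


def select_for_display(tetrads_best_first, display_number, interpretation):
    """Return the leading slice of an already sorted (best-first) tetrad list."""
    total = len(tetrads_best_first)
    if interpretation == "number":
        count = int(display_number)
    elif interpretation == "percent":
        count = math.ceil(total * display_number / 100.0)
    else:
        raise ValueError("interpretation must be 'number' or 'percent'")
    count = max(0, min(count, total))
    return tetrads_best_first[:count]


# With 2000 scored tetrads, 5 percent still means 100 tetrads to read through.
scored = list(range(2000))  # stand-ins for tetrads, best first
print(len(select_for_display(scored, 5, "percent")))  # 100
print(len(select_for_display(scored, 10, "number")))  # 10
```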

output_display_parameters_summary (Display parameters summary)

A Yes/No indicator that instructs PrimerFinder to present a summary of the values of all parameters.

output_display_gene_summary (Display gene summary)

A Yes/No indicator that instructs PrimerFinder to present a summary of all information on the gene to be knocked out that was extracted from the GenomeSequence database.

output_display_score_details (Display scoring details)

A Yes/No indicator that instructs PrimerFinder to present details on the scoring of every primer tetrad displayed. Such details include the alignments of the primer dimers and hairpins that have worst Tm scores, deviations of primer Tms and flank sizes from parameter-specified targets, etc.

Primer search region specifications

These parameters control how far into the coding region of the gene to be knocked out PrimerFinder will look in picking inner primers. These values are offsets into the gene coding region, not base numbers; thus the value associated with the end of the first codon is 2 not 3, and of the end of the second codon is 5 and not 6. Primer search region specifications interact with the parameter keep_replacement_seq_in_frame, for if the latter is specified as "Y", PrimerFinder will incorporate whole codons into the inner primers and must therefore examine the coding region. Values used in this case should be from the sequence 2, 5, 8, ...
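The offset convention described above can be checked with a few lines of arithmetic. This is a small sketch with hypothetical helper names; it relies only on the stated rule that offsets count from 0 at the first base of the coding region, so that codon ends fall at 2, 5, 8, and so on.

```python
def codon_end_offset(codon_number):
    """Offset of the last base of the given codon (codons numbered from 1)."""
    if codon_number < 1:
        raise ValueError("codon_number must be >= 1")
    return 3 * codon_number - 1


def is_codon_end_offset(offset):
    """True for offsets in the sequence 2, 5, 8, ..."""
    return offset >= 2 and offset % 3 == 2


def largest_codon_end_at_or_below(offset_max):
    """Largest in-frame offset not exceeding offset_max, or None if there is none."""
    if offset_max < 2:
        return None
    return offset_max - ((offset_max - 2) % 3)


print([codon_end_offset(k) for k in (1, 2, 3)])  # [2, 5, 8]
print(largest_codon_end_at_or_below(10))         # 8
```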

These parameters interact with the parameters opt_ignore_N_overlap and opt_ignore_C_overlap when overlapping genes are detected, and also with the parameter keep_replacement_seq_in_frame. If an overlap is detected and PrimerFinder is instructed to ignore it, PrimerFinder will start at the gene terminus and proceed inward as far as primerloc_Ni_in_offset_max or primerloc_Ci_in_offset_max in picking primers. If PrimerFinder is instructed not to ignore the overlap, and keep_replacement_seq_in_frame is "Y", the offsets will be applied to the beginning of the last codon of the gene to be knocked out that is affected by the overlap. If PrimerFinder is not ignoring overlaps and keep_replacement_seq_in_frame is "N", PrimerFinder will start at the last (i.e., innermost) base of the overlap.

primerloc_Ni_in_offset_max (Max offset into gene of N inner primer)

Max offset into the gene coding region to be used in searching for N inner primers.

primerloc_Ci_in_offset_max (Max offset into gene of C inner primer)

Max offset into the gene coding region to be used in searching for C inner primers.

Genomic PCR noise parameters

These parameters specify variables that are used in genomic PCR noise processing.

primer_genomic_PCR_noise_Nmer_size (N-mer size to be used in genomic PCR noise checking)

Size of primer 3' N-mer to be tested for in genomic PCR noise.

primer_genomic_PCR_noise_window_size (PCR window size threshold in genomic PCR noise checking)

Spacing between primer 3' N-mer sites that is considered to potentially generate undesired genomic PCR noise.

opt_check_genomic_PCR_noise (Perform genomic PCR checking)

Yes/No indicator controlling genomic PCR noise processing. Turning it off can improve PrimerFinder performance.
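The roles of the N-mer size and the window threshold above can be illustrated with a toy search. This is a simplified sketch under several assumptions, not PrimerFinder's algorithm: it treats the genome as a linear string, checks only the orientation in which the first primer reads the given strand, requires the two sites not to overlap, treats windows at or below the threshold as noise, and does not exclude the intended priming sites. All names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Start indices of all (possibly overlapping) occurrences of pattern."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def noise_windows(genome, primer_a, primer_b, nmer_size):
    """Sizes of segments bounded by a site matching primer_a's 3' N-mer and a
    downstream site matching the reverse complement of primer_b's 3' N-mer."""
    forward_site = primer_a[-nmer_size:]
    reverse_site = reverse_complement(primer_b[-nmer_size:])
    windows = []
    for p in find_all(genome, forward_site):
        for q in find_all(genome, reverse_site):
            if q >= p + nmer_size:  # sites face each other without overlapping
                windows.append(q + nmer_size - p)
    return sorted(windows)


def windows_within_threshold(windows, window_size):
    """Windows small enough to be treated as potential PCR noise."""
    return [w for w in windows if w <= window_size]


toy_genome = "TT" + "ATCAC" + "T" * 10 + "ACTGC" + "TT"
windows = noise_windows(toy_genome, "GGGATCAC", "TTTGCAGT", nmer_size=5)
print(windows)                                # [20]
print(windows_within_threshold(windows, 10))  # []
```

A smaller N-mer size makes chance matches more frequent, and a larger window threshold flags more of them, so both settings affect how many primers survive this check.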

Primer Tm calculation parameters

These parameters specify variables that are used in calculating Tms. DNA concentration values are only important for primer chromosomal Tm and primer dimer computations.

primer_Tmcalc_Na_conc (Na molar concentration)

Molar concentration of Na+ that will be assumed in Tm calculations. Note that most Tm calculation formulas are based on Tm measurements made at Na+ concentrations of about 1 M, and that the formulas extrapolate to other concentrations. Specifying values for Na+ lower than about 0.2 M is not advised.

primer_Tmcalc_DNA_conc (DNA strand molar concentration)

Molar concentration of DNA strands assumed in Tm calculations.
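To show how the two concentrations typically enter a Tm estimate, the sketch below uses a widely used two-state form for non-self-complementary duplexes with a logarithmic salt correction. It is offered as a general illustration and is not confirmed to be the formula PrimerFinder uses; the function name and the enthalpy and entropy values are hypothetical.

```python
import math

GAS_CONSTANT = 1.987  # cal / (mol * K)


def duplex_tm_celsius(delta_h_cal, delta_s_cal, dna_conc_molar, na_conc_molar):
    """Tm = dH / (dS + R ln(C/4)) - 273.15 + 16.6 log10([Na+]).

    delta_h_cal is in cal/mol, delta_s_cal in cal/(mol K), concentrations in mol/L.
    """
    tm_kelvin = delta_h_cal / (delta_s_cal + GAS_CONSTANT * math.log(dna_conc_molar / 4.0))
    return tm_kelvin - 273.15 + 16.6 * math.log10(na_conc_molar)


# Hypothetical duplex thermodynamics; only the concentrations vary.
at_1_molar = duplex_tm_celsius(-150000.0, -400.0, 2.5e-7, 1.0)
at_tenth_molar = duplex_tm_celsius(-150000.0, -400.0, 2.5e-7, 0.1)
print(round(at_1_molar - at_tenth_molar, 6))  # 16.6 degrees lower at 0.1 M Na+
```

In this form the salt term vanishes at 1 M Na+, which matches the remark that formulas are anchored to measurements near that concentration and extrapolate from there.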

Primer processing parameters

These parameters control general features of PrimerFinder primer processing.

primer_length_target (Target primer length)

Target length for the chromosomal segment of the primers. Using this parameter and primer_length_variance, you can control the range of primer lengths that PrimerFinder considers, which can accelerate your search for primers. If your range is too narrow, or if it disagrees with the Tm range that you specify, PrimerFinder will be limited in its ability to find primers.

primer_length_variance (Acceptable primer length variance)

The distance above or below primer_length_target within which a primer must be to be further considered by PrimerFinder.
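The target and variance pair defines a symmetric window around the target. A brief sketch, assuming the bounds are inclusive (this is not stated explicitly) and using hypothetical function names and values:

```python
def within_target(value, target, variance):
    """True when value lies no more than `variance` above or below `target`."""
    return abs(value - target) <= variance


def candidate_lengths(primer_length_target, primer_length_variance):
    """All chromosomal segment lengths inside the window (at least 1 base)."""
    low = max(1, primer_length_target - primer_length_variance)
    high = primer_length_target + primer_length_variance
    return list(range(low, high + 1))


print(candidate_lengths(20, 3))   # [17, 18, 19, 20, 21, 22, 23]
print(within_target(24, 20, 3))   # False
```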

primer_min_primerdimer_homology_region (Min size homology region for primer-dimer analysis)

Size of the minimum homology region PrimerFinder will use in considering primer dimer or hairpin alignments. See Primer-dimer and hairpin analysis for further information.

primer_3p_min_exact_homology_region (Min size homology for 3' end involvement)

Size of the minimum homology region PrimerFinder will use in determining whether a primer dimer or hairpin alignment involves the 3' end of a primer. See Primer-dimer and hairpin analysis for further information.

Other processing options

A variety of other processing options are available in PrimerFinder. Most of these are Yes/No indicators.

opt_ignore_gene_containment (Ignore gene containment)

When "Y", instructs PrimerFinder to skip its checks for gene containment.

opt_ignore_N_overlap (Ignore gene overlaps at N terminus)

When "Y", instructs PrimerFinder to ignore any overlap it detects at the N terminus of the gene to be knocked out. This parameter interacts with primerloc_Ni_in_offset_max and keep_replacement_seq_in_frame.

opt_ignore_C_overlap (Ignore gene overlaps at C terminus)

When "Y", instructs PrimerFinder to ignore any overlap it detects at the C terminus of the gene to be knocked out. This parameter interacts with primerloc_Ci_in_offset_max and keep_replacement_seq_in_frame.

opt_restrictsite_padding (Amount of padding at restriction sites)

When PrimerFinder delimits gene flank boundaries due to discovery of a specified restriction site in the flank, it sets the end of the boundary by moving inward a certain number of bases from the site. This parameter sets that number.

opt_reject_primers_for_nonuniqueness (Reject primers with non-unique 3' ends)

When "Y", instructs PrimerFinder to reject any primer whose chromosomal segment is smaller than the length of the shortest unique oligo in the genome that ends at the primer 3' end. Thus, any primer accepted when this parameter is "Y" has enough chromosomal DNA to uniquely hybridize to the desired primer target site. When the parameter is "N", some primers may be able to hybridize at multiple places in the chromosome. Setting this parameter to "Y" is recommended; however, if primer_Tm_target is low, no acceptable primers may be found, since low Tms require shorter sequences.
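The rejection rule above can be illustrated on a toy sequence. The sketch below finds the shortest unique oligo by brute force rather than with a suffix array, and it searches only one strand of a linear string, which is a simplification. Function names and the sequence are hypothetical.

```python
def count_occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def shortest_unique_length_ending_at(genome, end_index):
    """Length of the shortest substring ending at end_index (inclusive, 0-based)
    that occurs exactly once in genome, or None if no such substring exists."""
    for length in range(1, end_index + 2):
        oligo = genome[end_index - length + 1 : end_index + 1]
        if count_occurrences(genome, oligo) == 1:
            return length
    return None


def reject_for_nonuniqueness(segment_length, shortest_unique_length):
    """Reject when the chromosomal segment is shorter than the shortest unique oligo."""
    return shortest_unique_length is None or segment_length < shortest_unique_length


toy_genome = "ACGTACGATT"
needed = shortest_unique_length_ending_at(toy_genome, 6)  # 3' end on the second G
print(needed)                                 # 4 ("TACG" is the shortest unique oligo)
print(reject_for_nonuniqueness(3, needed))    # True
print(reject_for_nonuniqueness(4, needed))    # False
```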

opt_print_rejection_statistics (Print statistics on primer rejection)

When "Y", PrimerFinder prints short statistical summaries of how many primers, primer pairs, and primer tetrads have been rejected for various reasons. These statistics may be useful in understanding what factors may be responsible for getting too many or too few primers, primer pairs, or tetrads. This can, in turn, suggest changes in parameters that might result in better yields.

gene_override_locstart (Override gene location start)

May be used to optionally override the GenomeSequence database value of the starting location of the gene on its chromosome. Note that locstart is the location of the leftmost terminus of the coding region of the gene, not necessarily its 5' terminus location. Locstart represents the 5' terminus location of a gene only for strand 1 genes. The value of this parameter, if supplied, must be > 0 and <= the length of the chromosome containing the gene. This means that gene locations cannot be changed in such a way as to overlap the conventional sequence start point for circular chromosomes.

gene_override_locend (Override gene location end)

May be used to optionally override the GenomeSequence database value of the ending location of the gene on its chromosome. Note that locend is the location of the rightmost terminus of the coding region of the gene, not necessarily its 3' terminus location. Locend represents the 3' terminus location of a gene only for strand 1 genes. The value of this parameter, if supplied, must be > 0 and <= the length of the chromosome containing the gene. This means that gene locations cannot be changed in such a way as to overlap the conventional sequence start point for circular chromosomes.
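The constraints on the two override parameters can be summarized in a short validation sketch. The function names are hypothetical. The check that locstart does not exceed locend is an inference from the statements that locstart is the leftmost terminus and that genes may not span the sequence start point. The strand label used for the opposite strand is also an assumption.

```python
def validate_gene_overrides(locstart, locend, chromosome_length):
    """Raise ValueError if a supplied override violates the stated rules.
    Either override may be None, meaning the database value is kept."""
    for name, value in (("gene_override_locstart", locstart),
                        ("gene_override_locend", locend)):
        if value is not None and not (0 < value <= chromosome_length):
            raise ValueError(f"{name} must be > 0 and <= {chromosome_length}")
    if locstart is not None and locend is not None and locstart > locend:
        # Inferred rule: leftmost terminus cannot lie to the right of the rightmost.
        raise ValueError("gene_override_locstart must not exceed gene_override_locend")


def five_prime_location(locstart, locend, strand):
    """locstart is the 5' terminus only for strand 1 genes; otherwise locend is."""
    return locstart if strand == 1 else locend


validate_gene_overrides(120, 980, chromosome_length=1000)  # accepted silently
print(five_prime_location(120, 980, strand=1))   # 120
print(five_prime_location(120, 980, strand=2))   # 980
```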

opt_conservative_primerdimer_hairpin_Tm_calc (Use conservative primer dimer and hairpin Tm calculation)

Omits consideration of internal loops in primer dimers and hairpins when computing Tms. This eliminates entropy penalties that raise free energy, thereby resulting in a conservative Tm calculation. In effect, only stacked base pairs and initiation factors are counted, along with the hairpin loop for hairpin calculations.

opt_trace

This parameter is not available in the web interface. It allows specification of kinds of trace reports that PrimerFinder can produce to help in diagnosis and debugging of problems. Use of opt_trace typically results in huge amounts of output, which is the primary reason it is not offered through the web interface.

PrimerFinder processing limits

These parameters set limits on the numbers of primers, primer pairs, and primer tetrads that PrimerFinder will process. PrimerFinder processing is memory and cycle intensive, and too many of any of these entities will cause abnormal termination or excessive compute times. These parameters are included here for documentation purposes. They may not be overridden through the web interface.

When a processing limit is exceeded, PrimerFinder puts out a warning message and proceeds with the primers, primer pairs, and primer tetrads it has accumulated. The result is that PrimerFinder will still find the optimal primer tetrads within the accumulated primer sets, but may not find the optimum over the entire range of possibilities determined by the parameter settings. If it is important to find this optimum, try running PrimerFinder multiple times with more restrictive parameters, varying parameters so as to cover the original range.

limit_number_primers (Limit on number of primers)

Maximum number of primers of any single type (i.e., N inner, N outer, C inner, and C outer) that PrimerFinder will consider.

limit_number_primerpairs (Limit on number of primer pairs)

Maximum number of primer pairs from either the N or the C flank that PrimerFinder will consider.

limit_number_primertetrads (Limit on number of primer tetrads)

Maximum number of primer tetrads that PrimerFinder will consider.
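The behavior on exceeding a limit (warn, then continue with what has been accumulated) can be sketched as follows. The function name and message wording are hypothetical, and the sketch is not taken from PrimerFinder itself.

```python
import warnings


def accumulate_with_limit(candidates, limit, label):
    """Collect candidates until the limit is reached; on overflow, warn and
    continue with the items gathered so far."""
    kept = []
    for item in candidates:
        if len(kept) >= limit:
            warnings.warn(f"{label} limit of {limit} exceeded; "
                          "proceeding with the accumulated set")
            break
        kept.append(item)
    return kept


# Ten hypothetical candidates against a limit of three.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = accumulate_with_limit(range(10), 3, "primer")
print(kept)          # [0, 1, 2]
print(len(caught))   # 1 warning issued
```

Because the retained set depends on the order in which candidates are generated, the best tetrad found is optimal only within that set, which is why narrower parameter ranges run separately can recover the overall optimum.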

Acknowledgments

I thank George Church and the members of his lab for advice and instruction on the overall design and details of PrimerFinder. Discussions with Dereth Phillips and Fritz Roth were of particular importance.

The suffix array program used to get shortest unique oligo lengths was written by Tim Chen, PhD, who helped me modify and run it. Quite an elegant and sophisticated program!

PrimerFinder is a replacement for an older primer picking system that was authored by Keith Robison, PhD (now at Millennium). Although PrimerFinder does not use any of the code from this system, its overall structure and many details were inspired by it. Another very nifty and sophisticated program!

For further information

Please contact

John Aach
Church Lab
Department of Genetics
Harvard Medical School
phone: 617-432-0061
fax: 617-432-3698

Copyright

Copyright (c) 1998 by John Aach and the President and Fellows of Harvard University