Inputs: a tab-delimited data file in which the first column is a SAGE tag, and
another tab-delimited file that contains the SAGE tag, EST number and description.
Output: a tab-delimited file. The first column is the SAGE tag from the data file. The second column holds the EST number(s)
and description(s) found for that tag in the second file, separated by spaces. These are followed by the data file's other
columns, except the last one. The last row of the data file is also dropped.
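A minimal Python sketch of the join described above, for illustration only (the original script's language and code are not shown). Assumptions: each annotation line is "tag<TAB>EST number<TAB>description"; several entries for one tag are joined in file order as "EST description" pairs separated by spaces (the exact layout of the second column is a guess, and the ordering rules for extra descriptions are not applied here); a tag with no match gets an empty second column. The names build_lookup, merge and to_tsv are hypothetical.

```python
def build_lookup(annotation_lines):
    """Map each SAGE tag to its 'EST description' entries, joined by spaces.

    Assumption: lines look like 'tag<TAB>EST number<TAB>description'.
    Entries for the same tag are kept in file order in this sketch.
    """
    lookup = {}
    for line in annotation_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        tag, est, desc = fields[0], fields[1], fields[2]
        entry = f"{est} {desc}"
        lookup[tag] = f"{lookup[tag]} {entry}" if tag in lookup else entry
    return lookup


def merge(data_lines, annotation_lines):
    """Build output rows: tag, annotation, then data columns minus the last."""
    lookup = build_lookup(annotation_lines)
    rows = [line.rstrip("\n").split("\t") for line in data_lines if line.strip()]
    out = []
    for row in rows[:-1]:  # the last row of the data file is dropped
        tag = row[0]
        # Empty second column when the tag is not in the annotation file (assumption).
        out.append([tag, lookup.get(tag, "")] + row[1:-1])
    return out


def to_tsv(rows):
    """Serialize rows as tab-delimited text, one row per line."""
    return "".join("\t".join(row) + "\n" for row in rows)
```

For example, with data lines "AAAA 1 2 3", "CCCC 4 5 6", "total 5 7 9" (tab-separated) and one annotation line "AAAA EST1 foo", merge returns two rows: AAAA with "EST1 foo" and columns 1 and 2, and CCCC with an empty annotation and columns 4 and 5; the "total" row is dropped.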
If a tag has multiple descriptions, each additional description is combined with the existing one as follows: if it starts
with "EST", it is appended; otherwise, if it contains "complete", "mRNA", or "CDS" (case-insensitive), it is prepended;
otherwise, it is appended.
Because the "EST" test comes first, a description that starts with "EST" but also contains "mRNA" is treated as an EST and appended.
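The combining rule can be sketched in Python as below. This is an illustration under stated assumptions, not the original script: the "EST" prefix test is taken as case-sensitive (the note only says the keyword test is case-insensitive), descriptions are joined with a single space, and the function name add_description is hypothetical.

```python
import re

# Keywords that move a description to the front; matched case-insensitively.
_PRIORITY = re.compile(r"complete|mrna|cds", re.IGNORECASE)


def add_description(current: str, extra: str) -> str:
    """Combine an additional description with the accumulated one."""
    if not current:
        return extra
    if extra.startswith("EST"):
        # Checked first, so an EST entry is appended even if it mentions mRNA.
        return f"{current} {extra}"
    if _PRIORITY.search(extra):
        return f"{extra} {current}"  # prepend complete/mRNA/CDS descriptions
    return f"{current} {extra}"
```

Folding the descriptions for one tag in file order, for instance with functools.reduce(add_description, descriptions, ""), applies the rule to each additional description in turn.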
1/16/2000 adnan@genetics.med.harvard.edu