fetch (fetch.pl), home (fetch: Free Extraction Tool for Computational Humanoid), CopyLeft(L)

Scientific (bioinformatics) software for Linux.

Version: 1.3

fetch is an easy-to-use, text-based utility for retrieving protein sequences from Swissprot. It is available as a Perl script and as a Linux binary compiled with the Perl compiler (if you want binaries for other operating systems built from the C code, ask us, but Linux is the fastest and it is free, so you would do better to get it for yourself for the future). Our usual charge for commercial use is $2,000,000,000 for any of the programs. However, if you support GNU-flavoured free software, it is free.

It can extract any sequences you want, as long as they are in a sequence database (primarily Swissprot). Swissprot is distributed as a file called 'seq.dat' (around 100 MB, Oct. 1996). To speed up retrieval, fetch creates an index file (seq.idx) of the entries.
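
The idea behind such an index can be sketched in a few lines of Python. This is only an illustration, not the code fetch.pl uses, and the real layout of seq.idx is not described here. The sketch assumes the standard Swissprot flat-file layout, in which every entry starts with an 'ID' line carrying the entry name and ends with a '//' line. The names build_index and read_entry and the two DEMO entries are made up for the example.

```python
import io

def build_index(handle):
    """Map each entry name to the byte offset of its 'ID' line.

    `handle` must be a binary file object positioned at the start of a
    Swissprot-style flat file.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"ID "):
            # The entry name is the second whitespace-separated field.
            index[line.split()[1].decode("ascii")] = offset
    return index

def read_entry(handle, index, name):
    """Seek straight to an entry and return its text, '//' line included."""
    handle.seek(index[name])
    lines = []
    for line in handle:
        lines.append(line)
        if line.startswith(b"//"):
            break
    return b"".join(lines).decode("ascii")

# Two made-up entries standing in for seq.dat.
sample = (
    b"ID   DEMO1_YEAST    STANDARD;      PRT;     3 AA.\n"
    b"SQ   SEQUENCE   3 AA;\n"
    b"     MKT\n"
    b"//\n"
    b"ID   DEMO2_HUMAN    STANDARD;      PRT;     4 AA.\n"
    b"SQ   SEQUENCE   4 AA;\n"
    b"     MGDV\n"
    b"//\n"
)
db = io.BytesIO(sample)
index = build_index(db)
entry = read_entry(db, index, "DEMO2_HUMAN")
print(entry)
```

With the offsets stored once, a lookup costs a single seek instead of a scan through the whole database file, which is where the speed-up comes from.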

You can create the index file with fetch itself (no helper program is needed) by running it with the -c option in any directory. As long as the index file is in the Swissprot database directory and is pointed to by the SWISS, SWDIR or SWINDEX environment setting, you can use fetch. You can also specify the files at the prompt. The Perl index file is much smaller than the binary index files generated by C programs, so it will have little effect on your disk space.
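
A lookup of this kind might be sketched as follows. The order in which the variables are consulted (SWINDEX, then SWISS, then SWDIR) and the rule that each may name either a directory or the file itself are assumptions made for the sketch; fetch.pl may well resolve them differently. The function name find_index is hypothetical.

```python
import os
import tempfile

def find_index(env, filename="seq.idx"):
    """Return the first existing index file named by the environment, or None.

    The lookup order SWINDEX, SWISS, SWDIR is an assumption of this sketch.
    Each variable may name a directory or the index file itself.
    """
    for var in ("SWINDEX", "SWISS", "SWDIR"):
        value = env.get(var)
        if not value:
            continue
        candidate = os.path.join(value, filename) if os.path.isdir(value) else value
        if os.path.isfile(candidate):
            return candidate
    return None

# Demonstration with a throwaway directory standing in for /usr/db/swiss.
with tempfile.TemporaryDirectory() as swiss_dir:
    open(os.path.join(swiss_dir, "seq.idx"), "w").close()
    found = find_index({"SWISS": swiss_dir})
    found_name = os.path.basename(found)
    missing = find_index({})

print(found_name, missing)
```

In real use the function would be called with os.environ instead of a hand-made dictionary.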

fetch can also make an index file for any fasta-format sequence database. In that case, set the FASTADB and FASTAINDEX environment settings to point to the files. These functions are only for speeding up sequence retrieval.
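
For a fasta database the same offset idea applies to the '>' header lines. The sketch below is an assumption about how such an index could work, not a description of the index fetch writes; index_fasta, read_fasta_record and the records seqA and seqB are made up for the illustration.

```python
import io

def index_fasta(handle):
    """Return {record name: byte offset of its '>' header line}."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no name
                index[fields[0].decode("ascii")] = offset
    return index

def read_fasta_record(handle, offset):
    """Return (header, sequence) for the record starting at `offset`."""
    handle.seek(offset)
    header = handle.readline().decode("ascii").rstrip("\n")
    residues = []
    while True:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break
        residues.append(line.decode("ascii").strip())
    return header, "".join(residues)

# A tiny made-up fasta database with two records.
fasta = io.BytesIO(b">seqA first demo record\nMKTAY\nIAKQR\n>seqB\nGDVEK\n")
fasta_index = index_fasta(fasta)
header, sequence = read_fasta_record(fasta, fasta_index["seqA"])
print(header, sequence)
```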

If you want to find some sequences in a proprietary fasta sequence database, you can give the files to fetch at the prompt, for example: fetch -af MY_FASTA.fa MY_FASTA.fa.idx. In fact, you do not need to specify the -f option, as fetch can determine the format automatically. Also, if you have already created the fasta index file with fetch -c MY_FASTA.fa, you can give either of the MY_FASTA files (index or sequence), as fetch will look in the directory where the first file sits.
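
One plausible way to do both things, guessing the format and finding the partner file, is sketched below. It is an illustration under stated assumptions: the format is guessed from the first line only ('>' for fasta, 'ID' for Swissprot), and the index is assumed to be named by appending '.idx' as in the MY_FASTA.fa example. The names guess_format and companion_files and the demo lines are hypothetical.

```python
def guess_format(first_line):
    """Guess a sequence file format from its first non-blank line."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("ID "):
        return "swissprot"
    return "unknown"

def companion_files(path, index_suffix=".idx"):
    """Return (sequence file, index file) given either of the two.

    Assumes the index is named by appending `index_suffix`, as in the
    MY_FASTA.fa / MY_FASTA.fa.idx example; other naming schemes would
    need extra rules.
    """
    if path.endswith(index_suffix):
        return path[: -len(index_suffix)], path
    return path, path + index_suffix

# Made-up first lines and paths for the demonstration.
kind_fasta = guess_format(">MY_SEQ hypothetical demo record")
kind_swiss = guess_format("ID   DEMO_YEAST     STANDARD;      PRT;   100 AA.")
pair_from_seq = companion_files("/data/MY_FASTA.fa")
pair_from_idx = companion_files("/data/MY_FASTA.fa.idx")
print(kind_fasta, kind_swiss, pair_from_seq, pair_from_idx)
```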

Example)

If you downloaded fetch.pl, you can rename it to fetch or whatever you like.
============= SETTING UP ==========================
Suppose your swissprot file is in /usr/db/swiss as 'seq.dat'
Your swissprot path ENV is set to SWISS=/usr/db/swiss
To make an index file for first-time use, run fetch -c /usr/db/swiss/seq.dat from any directory (if you are already in /usr/db/swiss, you can run fetch -c seq.dat instead; this is recommended).
The step above has now created 'seq.idx'.
You can copy 'seq.idx' to the Swissprot directory (/usr/db/swiss) or set the SWINDEX environment variable to whatever path you want to put seq.idx in (say, /usr/agb/db/temp/).
==============USE of FETCH=======================
Now the setup is finished. fetch requires only two files (seq.dat and seq.idx).
ex1) fetch -a HUMAN , to fetch all the Swissprot sequences from human (to STDOUT). This shows the normal full Swissprot entries.
ex2) fetch -a -f HUMAN , to fetch all the Swissprot sequences from human, but in fasta format (to STDOUT)
ex3) fetch *HUMAN, same as ex1)
ex4) fetch -f *HUMAN , same as ex2)
ex5) fetch YAKO_YEAST , to fetch one single sequence of YAKO_YEAST
ex6) fetch -f YAKO_YEAST , same as ex5, but in fasta format.
ex7) fetch -l *YEAST , to get the list of all names that have YEAST in them, in one-column format.
ex8) fetch -g YAKO_YEAST , same as ex5, but in GDF file format.
ex9) fetch -g *YEAST , guess what this should do :-)
ex10) fetch C*YEAST , fetches all the yeast sequences that have C in their names. This will fetch things like COXW_YEAST as well as YCW2_YEAST, ....
ex11) fetch -f C*YEAST , same as ex10, but fetches FASTA format sequences only.
ex12) fetch -af YEAST s=100 S=200 n=5 , fetches, in fasta format, the first 5 sequences that have YEAST in their names and are between 100 and 200 aa long.
NOTE: When you use a glob (*), there should be no files in the current directory whose names match the pattern. If there are files called xxxYEAST in your current directory and you search for *YEAST, the shell expands the pattern to those file names before fetch sees it, and you may get the wrong result. To get around this, put the pattern in single quotes (' ') to tell the Linux or UNIX shell that it is not a file name. So you can safely run fetch 'C*YEAST'.
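
The matching described in ex10, where C*YEAST picks up YCW2_YEAST as well as COXW_YEAST, behaves like an unanchored pattern search. A minimal Python sketch of that behaviour follows; it is inferred from the examples above rather than taken from fetch.pl, and the helper names pattern_to_regex and match_names are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Turn a fetch-style pattern into an unanchored regular expression.

    '*' matches any run of characters; everything else is literal.
    """
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(parts))

def match_names(pattern, names):
    """Return the names in which the pattern occurs anywhere."""
    regex = pattern_to_regex(pattern)
    return [name for name in names if regex.search(name)]

names = ["COXW_YEAST", "YCW2_YEAST", "YAKO_YEAST", "CYC_HUMAN"]
c_yeast = match_names("C*YEAST", names)
all_yeast = match_names("*YEAST", names)
print(c_yeast)
print(all_yeast)
```

Because the search is not anchored at the start of the name, the C may appear anywhere before YEAST, which is why both COXW_YEAST and YCW2_YEAST are selected while YAKO_YEAST is not.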

All the options:

-h : help
-c : create index file (seq.idx)
-f : fasta file format output
-l : list matched names in swiss
-g : output in GDF format
-a : all possible matches (globbing)
-af : all and fasta format output
-s : for specifying species (if you say HUMAN, it will fetch all human proteins, but if you say RAT, it will fetch ARATH as well; the -s option prevents this and fetches only RAT)
n= : number of sequences to fetch in fasta format output (e.g. n=100 at the prompt). This option automatically sets the -f option.
s= : the smallest sequence size (e.g. s=10, to get sequences of at least 10 aa). This option automatically sets the -f option.
S= : the largest sequence size (e.g. S=1000, to get sequences of less than 1000 aa). This option automatically sets the -f option.
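
How -s, s=, S= and n= combine can be sketched as a simple filter. This is an illustration only: it assumes the species code is the part of the entry name after the last underscore, and it treats both size bounds as inclusive, which the option descriptions above do not settle. The function names and the demo entries are made up.

```python
def species_of(entry_name):
    """Return the species code, i.e. the part after the last underscore."""
    return entry_name.rsplit("_", 1)[-1]

def select(entries, species=None, smallest=None, largest=None, limit=None):
    """Filter (name, sequence) pairs in the spirit of -s, s=, S= and n=.

    Treating the size bounds as inclusive is an assumption of this sketch.
    """
    chosen = []
    for name, sequence in entries:
        if limit is not None and len(chosen) >= limit:
            break
        if species is not None and species_of(name) != species:
            continue
        if smallest is not None and len(sequence) < smallest:
            continue
        if largest is not None and len(sequence) > largest:
            continue
        chosen.append((name, sequence))
    return chosen

# Made-up entries; only the names and lengths matter here.
entries = [
    ("AAA_RAT", "M" * 104),
    ("BBB_ARATH", "M" * 114),
    ("CCC_RAT", "M" * 608),
    ("DDD_RAT", "M" * 110),
]
rat_only = [name for name, _ in select(entries, species="RAT")]
sized = [name for name, _ in select(entries, smallest=100, largest=200, limit=2)]
print(rat_only)
print(sized)
```

Comparing the whole species code, rather than searching for RAT anywhere in the name, is what keeps BBB_ARATH out of the first result.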

Bug reports and enhancement requests are welcome (jhp20@cus.cam.ac.uk).

Download

My ftp site at LMB, or the CPAN author directory JONG

Other programs

Geanfammer

License Policy

CopyLeft (L), but I support GPL policy as well.

Jong Park,