Accessing Databases

The methods for accessing these databases are numerous and ever-changing. However, they fall into several major classes:

E-mail Servers

Using an E-mail server is easy. You send a formatted message to a special Internet E-mail address, and it returns the sequences (or search results). Major drawbacks are

Internet servers

A growing number of services allow access to sequence data over the Internet, using either standard or specialized tools.

Entrez

Entrez is a combined bibliographic, protein sequence, and nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI). A major advantage of Entrez is the interconnection between its component databases: you can move quickly from a sequence to its reference to another sequence. Another central concept in Entrez is neighboring, the grouping together of sequences and references by computed similarity scores. Entrez has sprouted several variants. First, the Entrez database can be accessed either on CD-ROM or over the Internet. Second, Entrez can be used through a custom graphical interface client, a World Wide Web browser, a command-line browser (CLEVER), or NCBI's toolkit, which is written in C.
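
As an illustration of moving from a sequence to its reference over the Internet, here is a minimal sketch in Python. It assumes NCBI's present-day E-utilities HTTP interface (efetch.fcgi and elink.fcgi), which this guide does not describe. The helper names efetch_url, elink_url, and fetch_text are made up for the example, and the record identifier is a placeholder you would replace with a real one.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the E-utilities base URL; not one of the access routes listed in this guide.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, record_id, rettype="fasta"):
    """Build a URL that asks Entrez for one record as plain text."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def elink_url(dbfrom, db, record_id):
    """Build a URL that asks Entrez for records in `db` linked to a record in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": record_id}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


def fetch_text(url, timeout=30):
    """Download the response body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    record_id = "PLACEHOLDER_ID"  # hypothetical; substitute a real identifier
    print(efetch_url("nucleotide", record_id))           # the sequence itself
    print(elink_url("nucleotide", "pubmed", record_id))  # its linked references
```

Building the URLs separately from the download step makes it possible to inspect a request before sending it; passing either URL to fetch_text would perform the actual retrieval.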

World Wide Web / Gopher

Many biosequence databases are available as hypertext on the World Wide Web, or as flat files from Gopher servers.

Local copy of database

You can, of course, maintain a local copy of a database. This is particularly advantageous for intensive users, as it keeps network speed from becoming the limiting factor. Most databases can be obtained by anonymous FTP; many can also be obtained on CD-ROM.
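
Retrieving a database file by anonymous FTP might look like the following sketch, which uses Python's standard ftplib module. The function names download_flat_file and mirror_database are invented for the example, and the host and paths in the usage note are hypothetical; consult each database's own distribution notes for the real locations.

```python
from ftplib import FTP


def download_flat_file(ftp, remote_path, local_path):
    """Retrieve one file in binary mode over an already-connected FTP session."""
    with open(local_path, "wb") as out:
        # retrbinary calls out.write once for each block of data received.
        ftp.retrbinary(f"RETR {remote_path}", out.write)
    return local_path


def mirror_database(host, remote_path, local_path):
    """Log in anonymously to `host` and download a single database file."""
    with FTP(host) as ftp:
        ftp.login()  # called with no arguments, this performs an anonymous login
        return download_flat_file(ftp, remote_path, local_path)
```

A call such as mirror_database("ftp.example.org", "pub/db/sequences.dat", "sequences.dat") would then fetch one file, where both the host and the path are placeholders. Keeping the transfer step separate from the login step lets several files be pulled over a single session.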

This document is intended to serve as a guide to using certain bioinformatics programs. It cannot be guaranteed to be free of errors or completely up to date. If you know of errors or other shortcomings in this document, please mail them to Keith Robison (Church Lab, HMS Genetics) at
KRobison@nucleus.harvard.edu