
This figure shows the relationship between a Web Service, different types of
data storage, web and client applications, and other Web Services. In this
model, a Web Service contains a collection of Data Readers and Data Writers for
reading and writing disparate types of data stored in different formats.
Examples of disparate types of data are arrays of data from microarrays and
spectral analyzers, gel images, pictures, sequences, annotations, protocols,
and journal articles. Applications access the data and information through a
common API (Application Programming Interface) that binds the appropriate
reader and writer. In this respect, a Web Service behaves like a computer
operating system, which uses device drivers to access disparate hardware
devices. In addition, Web Services can provide standard components containing
algorithms for analysis, filtering, correlation, and integration.
An important advantage of a Web Service is that when a component is updated,
or a new component is added, every application that uses the service benefits.
Because Web Services can be linked to other Web Services, and applications can
use multiple Web Services, each Web Service can be built for specific
functionality. For example, a Web Service could be dedicated to DNA
microarrays, protein arrays, mass spectral data, genomic databases, or
proteomic databases.
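The reader/writer model resembles a driver registry: the service keeps one reader and one writer per format and binds the right one when an application calls the common API. The following is a minimal in-process sketch under that assumption; `DataService`, `register_reader`, `read_tsv`, and `write_fasta` are hypothetical names, not part of any particular Web Service framework, and a real service would expose these calls over the network.

```python
from typing import Callable, Dict, List

# Hypothetical record type: field name -> value.
Record = Dict[str, str]


class DataService:
    """Sketch of a Web Service core that dispatches to format-specific
    readers and writers, much as an operating system dispatches to drivers."""

    def __init__(self) -> None:
        self._readers: Dict[str, Callable[[str], List[Record]]] = {}
        self._writers: Dict[str, Callable[[List[Record]], str]] = {}

    def register_reader(self, fmt: str, reader: Callable[[str], List[Record]]) -> None:
        # Adding or replacing a reader immediately benefits every caller.
        self._readers[fmt] = reader

    def register_writer(self, fmt: str, writer: Callable[[List[Record]], str]) -> None:
        self._writers[fmt] = writer

    def read(self, fmt: str, text: str) -> List[Record]:
        # The common API: callers name a format; the service binds the reader.
        if fmt not in self._readers:
            raise ValueError(f"no reader registered for {fmt!r}")
        return self._readers[fmt](text)

    def write(self, fmt: str, records: List[Record]) -> str:
        if fmt not in self._writers:
            raise ValueError(f"no writer registered for {fmt!r}")
        return self._writers[fmt](records)

    def convert(self, text: str, src: str, dst: str) -> str:
        return self.write(dst, self.read(src, text))


def read_tsv(text: str) -> List[Record]:
    """Hypothetical reader: tab-separated text whose first line is a header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    return [dict(zip(header, ln.split("\t"))) for ln in lines[1:]]


def write_fasta(records: List[Record]) -> str:
    """Hypothetical writer: expects 'id' and 'sequence' fields."""
    return "".join(f">{r['id']}\n{r['sequence']}\n" for r in records)


service = DataService()
service.register_reader("tsv", read_tsv)
service.register_writer("fasta", write_fasta)
# Placeholder identifiers and sequences, for illustration only.
print(service.convert("id\tsequence\nseq1\tMKFL\nseq2\tMCPR\n", "tsv", "fasta"), end="")
```

Registering a new reader is the code-level counterpart of adding a component: every caller of `convert` gains the new format without being changed.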
A Web Service can have public and private programming interfaces to enable or
restrict access to specific functionality for applications across the Internet
and intranets. Because these interfaces are reachable over the network, a web
page can, for example, display information retrieved through the APIs of
several Web Services.
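One way to picture public and private interfaces is an access check in front of each operation. This minimal sketch assumes a simple caller-identity model (a real deployment would rely on authentication and network policy); `WebService`, `expose`, `call`, `render_page`, and the operation names are hypothetical, and `render_page` stands in for a web page that combines public calls to two services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Endpoint:
    handler: Callable[..., str]
    public: bool  # True: any caller (e.g. across the Internet); False: trusted callers only


@dataclass
class WebService:
    name: str
    trusted_callers: Set[str] = field(default_factory=set)  # e.g. intranet clients
    endpoints: Dict[str, Endpoint] = field(default_factory=dict)

    def expose(self, op: str, handler: Callable[..., str], public: bool) -> None:
        self.endpoints[op] = Endpoint(handler, public)

    def call(self, caller: str, op: str, *args: str) -> str:
        endpoint = self.endpoints.get(op)
        if endpoint is None:
            raise KeyError(f"{self.name}: unknown operation {op!r}")
        if not endpoint.public and caller not in self.trusted_callers:
            raise PermissionError(f"{self.name}: {op!r} is private")
        return endpoint.handler(*args)


# Two illustrative services; operations and returned strings are placeholders.
arrays = WebService("microarray", trusted_callers={"lab-intranet"})
arrays.expose("summary", lambda gene: f"microarray summary for {gene}", public=True)
arrays.expose("raw_data", lambda gene: f"raw intensities for {gene}", public=False)

proteins = WebService("protein")
proteins.expose("annotation", lambda acc: f"annotation for {acc}", public=True)


def render_page(caller: str, gene: str, accession: str) -> str:
    """A web page assembled from the public APIs of several Web Services."""
    return "\n".join([
        arrays.call(caller, "summary", gene),
        proteins.call(caller, "annotation", accession),
    ])


print(render_page("anonymous", "geneX", "P21589"))
```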
Data can be stored in a variety of formats, such as relational databases
(local or distributed across a network), text files, or binary files. Text
files come in a variety of formats, including keyword-based formats such as
GenBank and Swiss-Prot, XML tag-structured formats, fixed-column formats, and
other structures (e.g., FASTA, or CSV (comma-separated values) and similar
formats in which fields are separated by a character such as a comma or a tab);
all of them can be viewed with a standard text editor. Binary files usually
have proprietary formats (e.g., mass spectra, images, pictures) and can only be
viewed with applications that know their format. Web Services may provide the
capability of reading text and binary files and transferring their content into
relational databases, where the standard database query language (SQL) can be
used for further access and for integration with other content.
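Transferring text-file content into a relational database can be sketched with Python's standard `csv` and `sqlite3` modules. This assumes small tab-separated files whose first line is a header; the table names, column names, and the `load_tsv` helper are hypothetical, and the values are taken from the Swiss-Prot and FASTA examples in this section. Once both files are loaded, one SQL join integrates them.

```python
import csv
import io
import sqlite3
from typing import List, Tuple

# Illustrative tab-separated files built from the examples in this section.
PROTEINS_TSV = (
    "accession\tdescription\tlength\n"
    "P15711\t104 KD MICRONEME-RHOPTRY ANTIGEN\t924\n"
    "P21589\t5'-NUCLEOTIDASE PRECURSOR\t574\n"
)
XREFS_TSV = (
    "accession\tdb\tdb_id\n"
    "P15711\tEMBL\tM29954\n"
    "P15711\tPIR\tA44945\n"
)


def load_tsv(conn: sqlite3.Connection, table: str, schema: str, text: str) -> int:
    """Create `table` with the given column schema and insert every data row
    of a tab-separated text (the header line is skipped).
    `table` and `schema` are interpolated into SQL, so they must come from
    trusted code; row values go through ? placeholders."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))[1:]
    conn.execute(f"CREATE TABLE {table} ({schema})")
    placeholders = ", ".join(["?"] * len(rows[0]))
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
# Declared column types matter: INTEGER affinity stores the text "924" as the
# number 924, so numeric comparisons in SQL behave as expected.
load_tsv(conn, "protein", "accession TEXT PRIMARY KEY, description TEXT, length INTEGER", PROTEINS_TSV)
load_tsv(conn, "xref", "accession TEXT, db TEXT, db_id TEXT", XREFS_TSV)

rows: List[Tuple[str, int, str, str]] = conn.execute(
    "SELECT p.accession, p.length, x.db, x.db_id "
    "FROM protein AS p JOIN xref AS x ON x.accession = p.accession "
    "WHERE p.length > 600 ORDER BY x.db"
).fetchall()
print(rows)
```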
The following table gives examples of text file formats used in
bioinformatics. Documentation describing the structure and content of each
format is available.
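Keyword-based text files (example: a Swiss-Prot entry)
Each line begins with a two-character line code (ID, AC, DT, DE, and so on),
followed by spaces and then the line's content, which has a well-defined
format for that code. The only exceptions are the sequence lines after the SQ
line, which carry no line code, and the // line that ends the entry.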
ID 104K_THEPA STANDARD; PRT; 924 AA.
AC P15711;
DT 01-APR-1990 (REL. 14, CREATED)
DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE)
DT 01-AUG-1992 (REL. 23, LAST ANNOTATION UPDATE)
DE 104 KD MICRONEME-RHOPTRY ANTIGEN.
OS THEILERIA PARVA.
OC EUKARYOTA; ALVEOLATA; APICOMPLEXA; PIROPLASMIDA; THEILERIIDAE;
OC THEILERIA.
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=MUGUGA;
RX MEDLINE; 90158697.
RA IAMS K.P., YOUNG J.R., NENE V., DESAI J., WEBSTER P.,
RA OLE-MOIYOI O.K., MUSOKE A.J.;
RT "Characterisation of the gene encoding a 104-kilodalton microneme-
RT rhoptry protein of Theileria parva.";
RL MOL. BIOCHEM. PARASITOL. 39:47-60(1990).
CC -!- SUBCELLULAR LOCATION: IN MICRONEME/RHOPTRY COMPLEXES.
CC -!- DEVELOPMENTAL STAGE: SPOROZOITE ANTIGEN.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC --------------------------------------------------------------------------
DR EMBL; M29954; G161866; -.
DR PIR; A44945; A44945.
KW ANTIGEN; SPOROZOITE; REPEAT.
FT DOMAIN 1 19 HYDROPHOBIC.
FT DOMAIN 905 924 HYDROPHOBIC.
SQ SEQUENCE 924 AA; 103625 MW; 4563AAA0 CRC32;
MKFLILLFNI LCLFPVLAAD NHGVGPQGAS GVDPITFDIN SNQTGPAFLT AVEMAGVKYL
QVQHGSNVNI HRLVEGNVVI WENASTPLYT GAIVTNNDGP YMAYVEVLGD PNLQFFIKSG
DAWVTLSEHE YLAKLQEIRQ AVHIESVFSL NMAFQLENNK YEVETHAKNG ANMVTFIPRN
GHICKMVYHK NVRIYKATGN DTVTSVVGFF RGLRLLLINV FSIDDNGMMS NRYFQHVDDK
YVPISQKNYE TGIVKLKDYK HAYHPVDLDI KDIDYTMFHL ADATYHEPCF KIIPNTGFCI
TKLFDGDQVL YESFNPLIHC INEVHIYDRN NGSIICLHLN YSPPSYKAYL VLKDTGWEAT
THPLLEEKIE ELQDQRACEL DVNFISDKDL YVAALTNADL NYTMVTPRPH RDVIRVSDGS
EVLWYYEGLD NFLVCAWIYV SDGVASLVHL RIKDRIPANN DIYVLKGDLY WTRITKIQFT
QEIKRLVKKS KKKLAPITEE DSDKHDEPPE GPGASGLPPK APGDKEGSEG HKGPSKGSDS
SKEGKKPGSG KKPGPAREHK PSKIPTLSKK PSGPKDPKHP RDPKEPRKSK SPRTASPTRR
PSPKLPQLSK LPKSTSPRSP PPPTRPSSPE RPEGTKIIKT SKPPSPKPPF DPSFKEKFYD
DYSKAASRSK ETKTTVVLDE SFESILKETL PETPGTPFTT PRPVPPKRPR TPESPFEPPK
DPDSPSTSPS EFFTPPESKR TRFHETPADT PLPDVTAELF KEPDVTAETK SPDEAMKRPR
SPSEYEDTSP GDYPSLPMKR HRLERLRLTT TEMETDPGRM AKDASGKPVK LKRSKSFDDL
TTVELAPEPK ASRIVVDDEG TEADDEETHP PEERQKTEVR RRRPPKKPSK SPRPSKPKKP
KKPDSAYIPS ILAILVVSLI VGIL
//
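Because every Swiss-Prot line starts with its line code, a reader can group lines by code and switch to sequence mode after the SQ line. The sketch below assumes the single-space layout shown in the example (it also works when the code is followed by several spaces); `parse_swissprot` is a hypothetical helper, and only a subset of the entry's annotation lines is included. As a consistency check, it compares the parsed sequence length with the length declared on the SQ line.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Subset of the entry above: selected annotation lines plus the full sequence.
ENTRY = """ID 104K_THEPA STANDARD; PRT; 924 AA.
AC P15711;
DE 104 KD MICRONEME-RHOPTRY ANTIGEN.
OS THEILERIA PARVA.
OC EUKARYOTA; ALVEOLATA; APICOMPLEXA; PIROPLASMIDA; THEILERIIDAE;
OC THEILERIA.
KW ANTIGEN; SPOROZOITE; REPEAT.
SQ SEQUENCE 924 AA; 103625 MW; 4563AAA0 CRC32;
MKFLILLFNI LCLFPVLAAD NHGVGPQGAS GVDPITFDIN SNQTGPAFLT AVEMAGVKYL
QVQHGSNVNI HRLVEGNVVI WENASTPLYT GAIVTNNDGP YMAYVEVLGD PNLQFFIKSG
DAWVTLSEHE YLAKLQEIRQ AVHIESVFSL NMAFQLENNK YEVETHAKNG ANMVTFIPRN
GHICKMVYHK NVRIYKATGN DTVTSVVGFF RGLRLLLINV FSIDDNGMMS NRYFQHVDDK
YVPISQKNYE TGIVKLKDYK HAYHPVDLDI KDIDYTMFHL ADATYHEPCF KIIPNTGFCI
TKLFDGDQVL YESFNPLIHC INEVHIYDRN NGSIICLHLN YSPPSYKAYL VLKDTGWEAT
THPLLEEKIE ELQDQRACEL DVNFISDKDL YVAALTNADL NYTMVTPRPH RDVIRVSDGS
EVLWYYEGLD NFLVCAWIYV SDGVASLVHL RIKDRIPANN DIYVLKGDLY WTRITKIQFT
QEIKRLVKKS KKKLAPITEE DSDKHDEPPE GPGASGLPPK APGDKEGSEG HKGPSKGSDS
SKEGKKPGSG KKPGPAREHK PSKIPTLSKK PSGPKDPKHP RDPKEPRKSK SPRTASPTRR
PSPKLPQLSK LPKSTSPRSP PPPTRPSSPE RPEGTKIIKT SKPPSPKPPF DPSFKEKFYD
DYSKAASRSK ETKTTVVLDE SFESILKETL PETPGTPFTT PRPVPPKRPR TPESPFEPPK
DPDSPSTSPS EFFTPPESKR TRFHETPADT PLPDVTAELF KEPDVTAETK SPDEAMKRPR
SPSEYEDTSP GDYPSLPMKR HRLERLRLTT TEMETDPGRM AKDASGKPVK LKRSKSFDDL
TTVELAPEPK ASRIVVDDEG TEADDEETHP PEERQKTEVR RRRPPKKPSK SPRPSKPKKP
KKPDSAYIPS ILAILVVSLI VGIL
//
"""


def parse_swissprot(text: str) -> Tuple[Dict[str, List[str]], str]:
    """Group one entry's lines by two-character line code and collect the
    sequence lines between the SQ line and the // terminator."""
    fields: Dict[str, List[str]] = defaultdict(list)
    chunks: List[str] = []
    in_sequence = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):
            break  # end of entry
        if in_sequence:
            chunks.append("".join(line.split()))  # drop spaces between 10-residue blocks
            continue
        code, content = line[:2], line[2:].strip()
        fields[code].append(content)
        if code == "SQ":
            in_sequence = True
    return dict(fields), "".join(chunks)


fields, sequence = parse_swissprot(ENTRY)
declared = int(fields["SQ"][0].split()[1])  # "SEQUENCE 924 AA; ..." -> 924
if len(sequence) != declared:
    raise ValueError("sequence length does not match the SQ line")
print(fields["ID"][0].split()[0], fields["AC"][0].rstrip(";"), declared)
```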
XML
Data are enclosed in paired beginning and ending tags: names in angle brackets
of the form <name>...</name>. The data between a beginning and an ending tag
can be text, numbers, or nested elements; binary data such as images and BLOBs
(Binary Large Objects) must first be encoded as text, for example with Base64.
<Elements>
<AtomicNbr>1</AtomicNbr>
<Element>Hydrogen</Element>
<Symbol>H</Symbol>
</Elements>
<Elements>
<AtomicNbr>2</AtomicNbr>
<Element>Helium</Element>
<Symbol>He</Symbol>
</Elements>
<Elements>
<AtomicNbr>3</AtomicNbr>
<Element>Lithium</Element>
<Symbol>Li</Symbol>
</Elements>
<Elements>
<AtomicNbr>4</AtomicNbr>
<Element>Beryllium</Element>
<Symbol>Be</Symbol>
</Elements>
<Elements>
<AtomicNbr>5</AtomicNbr>
<Element>Boron</Element>
<Symbol>B</Symbol>
</Elements>
...
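Python's standard `xml.etree.ElementTree` module can read the element records above. Two assumptions apply: the `<Elements>` records are wrapped in a hypothetical `<PeriodicTable>` root, because a well-formed XML document has exactly one root element, and binary data is embedded as Base64 text with a hypothetical `encoding` attribute marking it, which is one common way to carry BLOB content in XML.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Records from the example, wrapped in a hypothetical single root element.
DOC = """<PeriodicTable>
  <Elements><AtomicNbr>1</AtomicNbr><Element>Hydrogen</Element><Symbol>H</Symbol></Elements>
  <Elements><AtomicNbr>2</AtomicNbr><Element>Helium</Element><Symbol>He</Symbol></Elements>
  <Elements><AtomicNbr>3</AtomicNbr><Element>Lithium</Element><Symbol>Li</Symbol></Elements>
</PeriodicTable>"""


def parse_elements(xml_text: str) -> List[Tuple[int, str, str]]:
    """Return (atomic number, name, symbol) for each <Elements> record."""
    root = ET.fromstring(xml_text)
    return [
        (int(e.findtext("AtomicNbr")), e.findtext("Element"), e.findtext("Symbol"))
        for e in root.findall("Elements")
    ]


def embed_blob(tag: str, data: bytes) -> str:
    """Store binary data as Base64 text inside a new element."""
    node = ET.Element(tag, encoding="base64")  # attribute name is hypothetical
    node.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(node, encoding="unicode")


def extract_blob(xml_text: str) -> bytes:
    """Decode the Base64 text of a single element back into bytes."""
    return base64.b64decode(ET.fromstring(xml_text).text or "")


print(parse_elements(DOC))
```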
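Fixed column
Four fixed-width columns, each holding a defined kind of data. In this example
(a list of species codes), the columns are a mnemonic code, a kingdom code
(E = Eukaryota, V = Viruses), a taxonomy node number, and the names: N= gives
the official name, C= the common name, and S= a synonym. Continuation lines
(C= and S=) leave the first three columns blank.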
AAV2  V 010804: N=Adeno-associated virus 2
                C=AAV2
ABIMA E 003320: N=Abies magnifica
                C=Red fir
ABMVW V 010816: N=Abutilon mosaic virus (isolate West India)
ABRPR E 003816: N=Abrus precatorius
                C=Indian licorice
                S=Crab's eye
ABSGL E 004829: N=Absidia glauca
                C=Pin mould
ABUPI E 008989: N=Aburria pipile
                C=Common piping guan
                S=Trinidad piping guan
ABUTH E 003631: N=Abutilon theophrasti
                C=China jute
                S=Indian mallow
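A fixed-column file is parsed by slicing each line at known character offsets rather than splitting on whitespace, which would break names that contain spaces. The offsets below are an assumption modeled on an aligned version of the species-code example (a real file's documentation gives the authoritative positions), and `Species` and `parse_species` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed 0-based column slices:
#   [0:5] mnemonic code  [6:7] kingdom code  [8:14] taxonomy node  [16:] X=name
CODE, KINGDOM, NODE, NAMES = slice(0, 5), slice(6, 7), slice(8, 14), slice(16, None)


@dataclass
class Species:
    kingdom: str
    node: int
    names: Dict[str, List[str]] = field(default_factory=dict)  # "N"/"C"/"S" -> names


def parse_species(text: str) -> Dict[str, Species]:
    entries: Dict[str, Species] = {}
    current: Optional[Species] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        code = line[CODE].strip()
        if code:  # a new entry: the first three columns are filled in
            current = Species(kingdom=line[KINGDOM], node=int(line[NODE]))
            entries[code] = current
        if current is None:
            raise ValueError("continuation line before the first entry")
        tag, _, name = line[NAMES].partition("=")
        current.names.setdefault(tag, []).append(name)
    return entries


SAMPLE = (
    "AAV2  V 010804: N=Adeno-associated virus 2\n"
    "                C=AAV2\n"
    "ABRPR E 003816: N=Abrus precatorius\n"
    "                C=Indian licorice\n"
    "                S=Crab's eye\n"
)
species = parse_species(SAMPLE)
print(species["ABRPR"].names)
```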
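Other
FASTA: line 1 is a header that begins with ">" followed by an identifier and a
description; in this Swiss-Prot example, vertical bars separate the header
fields. Lines 2 to N contain the sequence, wrapped at a maximum line length
(80 characters in this example).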
>sp|P21589|5NTD|5'-NUCLEOTIDASE PRECURSOR (EC 3.1.3.5) (ECTO-NUCLEOTIDASE) (5'-NT) (CD73 ANTIGEN).
MCPRAARAPATLLLALGAVLWPAAGAWELTILHTNDVHSRLEQTSEDSSKCVNASRCMGGVARLFTKVQQIRRAEPNVLL
LDAGDQYQGTIWFTVYKGAEVAHFMNALRYDAMALGNHEFDNGVEGLIEPLLKEAKFPILSANIKAKGPLASQISGLYLP
YKVLPVGDEVVGIVGYTSKETPFLSNPGTNLVFEDEITALQPEVDKLKTLNVNKIIALGHSGFEMDKLIAQKVRGVDVVV
GGHSNTFLYTGNPPSKEVPAGKYPFIVTSDDGRKVPVVQAYAFGKYLGYLKIEFDERGNVISSHGNPILLNSSIPEDPSI
KADINKWRIKLDNYSTQELGKTIVYLDGSSQSCRFRECNMGNLICDAMINNNLRHTDEMFWNHVSMCILNGGGIRSPIDE
RNNGTITWENLAAVLPFGGTFDLVQLKGSTLKKAFEHSVHRYGQSTGEFLQVGGIHVVYDLSRKPGDRVVKLDVLCTKCR
VPSYDPLKMDEVYKVILPNFLANGGDGFQMIKDELLRHDSGDQDINVVSTYISKMKVIYPAVEGRIKFSTGSHCHGSFSL
IFLSLWAVIFVLYQ
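A FASTA reader only needs to recognize the ">" header line and concatenate the sequence lines that follow it; a writer rewraps the sequence at a chosen line length. This is a minimal sketch using the record above; `read_fasta` and `write_fasta` are hypothetical helpers, and splitting the header on vertical bars applies only to headers that follow the sp|accession|name|description convention shown here.

```python
from typing import Iterator, List, Optional, Tuple

FASTA_TEXT = (
    ">sp|P21589|5NTD|5'-NUCLEOTIDASE PRECURSOR (EC 3.1.3.5) (ECTO-NUCLEOTIDASE) "
    "(5'-NT) (CD73 ANTIGEN).\n"
    "MCPRAARAPATLLLALGAVLWPAAGAWELTILHTNDVHSRLEQTSEDSSKCVNASRCMGGVARLFTKVQQIRRAEPNVLL\n"
    "LDAGDQYQGTIWFTVYKGAEVAHFMNALRYDAMALGNHEFDNGVEGLIEPLLKEAKFPILSANIKAKGPLASQISGLYLP\n"
    "YKVLPVGDEVVGIVGYTSKETPFLSNPGTNLVFEDEITALQPEVDKLKTLNVNKIIALGHSGFEMDKLIAQKVRGVDVVV\n"
    "GGHSNTFLYTGNPPSKEVPAGKYPFIVTSDDGRKVPVVQAYAFGKYLGYLKIEFDERGNVISSHGNPILLNSSIPEDPSI\n"
    "KADINKWRIKLDNYSTQELGKTIVYLDGSSQSCRFRECNMGNLICDAMINNNLRHTDEMFWNHVSMCILNGGGIRSPIDE\n"
    "RNNGTITWENLAAVLPFGGTFDLVQLKGSTLKKAFEHSVHRYGQSTGEFLQVGGIHVVYDLSRKPGDRVVKLDVLCTKCR\n"
    "VPSYDPLKMDEVYKVILPNFLANGGDGFQMIKDELLRHDSGDQDINVVSTYISKMKVIYPAVEGRIKFSTGSHCHGSFSL\n"
    "IFLSLWAVIFVLYQ\n"
)


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines are joined without breaks."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record, wrapping the sequence at `width` characters per line."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header, *lines]) + "\n"


header, sequence = next(read_fasta(FASTA_TEXT))
db, accession, entry_name, description = header.split("|", 3)
print(accession, len(sequence))
print(write_fasta(header, sequence, width=60), end="")
```

Rewrapping at 60 instead of 80 characters changes only the layout; reading the output back yields the same header and sequence.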