In addition to the web interface to the HMMER software, we also provide access to it via RESTful web services. REST (or REpresentational State Transfer) refers to a style of building web services which makes it easy to interact programmatically with the site. A programmatic interface, commonly called an Application Programming Interface (API), allows users to write scripts or programs to access data, rather than having to rely on a browser to view a site. Below is a list of services provided, some examples of how to use the API and the supported data formats. This section should be used in conjunction with the search help page, which defines the parameters for modifying how HMMER performs the search.
A RESTful service sends and receives data over HTTP, the same protocol that is used by websites and browsers. As such, the services provided through a RESTful interface are identified and controlled using URLs. In the HMMER website we use the same URLs to provide both the standard HTML browser representation of your search results and the representation requested via REST in an alternative format, such as XML or JSON. To submit a phmmer search to our servers via your browser you visit the following URL:
http://hmmer.janelia.org/search/phmmer
When operating through a browser, the options in the form are serialized to a string that consists of a set of parameters and values. This string is then parsed by our servers to tailor the search according to your criteria. When submitting a request via REST, you will need to specify these parameters yourself. Those that control the search itself are described in detail in the search section.
The returned format is controlled by how the page is requested. This is done in one of two ways - if the Content-Type or Accept field is set in the HTTP header to one of the following:
Then JSON, YAML or XML will be returned. But do not worry too much, as the Content-Type is often set automatically based on how you formulate the request, i.e. if you send the form parameters as XML, you get an XML response. For completeness, your browser normally sets the Content-Type to one of the following:
If the Content-Type is set to be one of these, then an HTML response will be generated.
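As a rough illustration of requesting a particular response format, the following Python sketch builds (but does not send) two requests that differ only in their Accept header. The header values text/xml and application/json are the ones used in the curl and Java examples elsewhere on this page; the helper name build_request is purely illustrative.

```python
import urllib.request

SEARCH_URL = "http://hmmer.janelia.org/search/phmmer"


def build_request(url, accept):
    """Create (but do not send) a request asking for a given response format."""
    request = urllib.request.Request(url)
    # The Accept header tells the server which representation we would like back.
    request.add_header("Accept", accept)
    return request


xml_request = build_request(SEARCH_URL, "text/xml")
json_request = build_request(SEARCH_URL, "application/json")

print(xml_request.get_header("Accept"))
print(json_request.get_header("Accept"))
```

Nothing is sent over the network here; the requests would only be issued once passed to urllib.request.urlopen.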
One thing to note with a RESTful interface is how the URLs are sent to the server. HTTP has several different request methods.
The basic thing to remember with REST is that HTTP methods translate to different operations, so the following HTTP methods do the following:
All objects in REST have a URI, so every entity in the system has a unique URI. If an object needs to be updated, this is achieved by POSTing a document to that object's URI. If that object then needs to be removed, an HTTP DELETE request is issued to that object's URI. When a search is POST-ed to the server, it creates a new object, with a unique URI, that can then be queried.
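The mapping between HTTP methods and operations can be sketched in Python by constructing one request per operation. This is only an illustration under assumptions: the requests are built but never sent, the job URI reuses the example UUID and results path shown elsewhere on this page, the POST body is deliberately incomplete (a real search also needs a seq parameter), and whether a given URI accepts DELETE should be checked against the server.

```python
import urllib.request


def make_request(method, url, data=None):
    """Build an unsent HTTP request that uses the given method."""
    return urllib.request.Request(url, data=data, method=method)


# Assumed example URI of a job that was created earlier by a POST.
job_uri = ("http://hmmer.janelia.org/results/phmmer/"
           "4162F712-1DD2-11B2-B17E-C09EFE1DC403")

# POST creates a new search object on the server (body shortened for brevity).
create = make_request("POST", "http://hmmer.janelia.org/search/phmmer",
                      data=b"seqdb=pdb")
# GET reads the object identified by the URI.
fetch = make_request("GET", job_uri)
# DELETE removes the object identified by the URI.
remove = make_request("DELETE", job_uri)

for request in (create, fetch, remove):
    print(request.get_method(), request.full_url)
```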
curl
You do not need to be an expert programmer to retrieve data via our RESTful
interface. XML is a widely used machine-parsable format, and the following section demonstrates
a simple way of sending and retrieving XML using the Unix command line tool
curl. The following POSTs the request to the server (our server
configuration requires you to also unset the default value in the header for Expect,
-H 'Expect:'):
shell% curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F algo=phmmer -F seq='<test.seq' http://hmmer.janelia.org/search/phmmer
This should give the following response in XML:
<?xml version="1.0" encoding="UTF-8"?>
<opt>
<data name='results' resultSize='224339'>
<_internal highbit='370.5' lowbit='19.0' numberSig='242' offset='42280'>
<timings search='0.283351' unpack='0.176821' />
</_internal>
<hits
name='2abl_A'
acc='2abl_A'
bias='0.1'
desc='mol:protein length:163 ABL TYROSINE KINASE'
evalue='1.1e-110'
ndom='1'
nincluded='1'
nregions='1'
reported='1'
score='370.5'
species='Homo sapiens'
taxid='9606' >
<domains
aliL='163'
aliM='163'
aliN='163'
aliaseq='MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP'
alihmmfrom='1'
alihmmname='2abl_A'
alihmmto='163'
alimline='+gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap'
alimodel='lgpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap'
alippline='8*****************************************************************************************************************************************************************9'
alisqacc='2abl_A'
alisqdesc='mol:protein length:163 ABL TYROSINE KINASE'
alisqfrom='1'
alisqname='2abl_A'
alisqto='163'
bias='0.05'
bitscore='370.357543945312'
envsc='250.653518676758'
cevalue='4.21e-121'
ievalue='4.21e-121'
iali='1'
ienv='1'
is_included='1'
is_reported='1'
jali='163'
jenv='163'
/>
</hits>
.
.
.
</data>
</opt>
In this example, the sequence to be searched is in the file test.seq. When using curl the value of the parameter "seq" needs to be quoted so that its value is taken correctly from the file "test.seq". The other parameters can also be added directly to the URL, as a regular CGI-style parameter, if you prefer.
Most programming languages have the ability to send HTTP requests and receive HTTP responses. A Perl script to submit a search and receive the responses as XML might be as trivial as this:
In this case the LWP module recognises the content as being XML and sets the Content-Type for you. The response from
the server is identical to that obtained from using curl. Notice there are in fact two requests to the
server. The first posts the job to the server, the second then fetches the result. The location of the result is
specified in the response from the first request.
Although XML is just plain text and therefore human-readable, it's intended to be parsed into a data structure. Extending the Perl script above, we can add the ability to parse the XML using an external Perl module, XML::LibXML:
This script now prints out the name, description and E-value of all significant sequence hits for the given query sequence as a tab delimited file.
2abl_A	mol:protein length:163 ABL TYROSINE KINASE	1.1e-110
2fo0_A	mol:protein length:495 Proto-oncogene tyrosine-protein kinase ABL1 (	8.4e-109
1opk_A	mol:protein length:495 Proto-oncogene tyrosine-protein kinase ABL1	8.4e-109
1opl_A	mol:protein length:537 proto-oncogene tyrosine-protein kinase	9.7e-109
1ab2_A	mol:protein length:109 C-ABL TYROSINE KINASE SH2 DOMAIN	3.3e-62
3k2m_A	mol:protein length:112 Proto-oncogene tyrosine-protein kinase ABL1	3.1e-61
2ecd_A	mol:protein length:119 Tyrosine-protein kinase ABL2	6.5e-58
1abo_A	mol:protein length:62 ABL TYROSINE KINASE	1.1e-38
3eg1_A	mol:protein length:63 Proto-oncogene tyrosine-protein kinase ABL1	1.6e-38
3eg0_A	mol:protein length:63 Proto-oncogene tyrosine-protein kinase ABL1	1.7e-38
3eg3_A	mol:protein length:63 Proto-oncogene tyrosine-protein kinase ABL1	3.3e-38
1ju5_C	mol:protein length:61 Abl	8.4e-38
1bbz_A	mol:protein length:58 ABL TYROSINE KINASE	7.0e-36
2o88_A	mol:protein length:58 Proto-oncogene tyrosine-protein kinase ABL1	9.1e-35
1awo_A	mol:protein length:62 ABL TYROSINE KINASE	1.7e-34
......
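For readers who prefer Python, the same kind of extraction can be sketched with the standard library XML parser. This is not the Perl script referred to above; it is a minimal sketch that assumes the response has the shape shown in the curl example (hits elements carrying name, desc and evalue attributes). The embedded sample is a hand-trimmed stand-in for a real response, and no significance filtering is applied.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed sample in the shape of the XML response shown earlier.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<opt>
  <data name='results' resultSize='224339'>
    <hits name='2abl_A' acc='2abl_A'
          desc='mol:protein length:163 ABL TYROSINE KINASE'
          evalue='1.1e-110' />
    <hits name='1ab2_A' acc='1ab2_A'
          desc='mol:protein length:109 C-ABL TYROSINE KINASE SH2 DOMAIN'
          evalue='3.3e-62' />
  </data>
</opt>
"""


def hits_to_rows(xml_bytes):
    """Return a (name, description, E-value) tuple for every hits element."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for hit in root.iter("hits"):
        rows.append((hit.get("name"), hit.get("desc"), hit.get("evalue")))
    return rows


rows = hits_to_rows(SAMPLE_XML.encode("utf-8"))
for row in rows:
    # One tab-delimited line per sequence hit.
    print("\t".join(row))
```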
The two main input parameters to a phmmer search are a protein sequence and the target database, defined using the seq and seqdb parameters respectively. Other parameters for controlling the search are defined in the search section. If any of these other parameters are omitted, the default value for that parameter will be used.
Searches should be POST-ed to the following url:
http://hmmer.janelia.org/search/phmmer
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F seq='<test.seq' http://hmmer.janelia.org/search/phmmer
When using the website, we also perform a Pfam search by default. However, when using the API you will only be returned the phmmer results. To get Pfam search results, use the hmmscan interface.
Hmmscan also has two main parameters - a sequence and a profile HMM database - defined using the seq and hmmdb parameters respectively. We currently offer four profile HMM databases: Pfam, TIGRFAMs, Gene3D and Superfamily. When searching against the former two, the cut-offs can be defined by the user (other parameters for controlling the search are defined in the search section). With the latter two HMM databases, all cut-off parameters will be ignored and the HMM database default parameters will be used. This is because Gene3D and Superfamily both use their own post-processing mechanisms to define their domains, in addition to the hmmscan results.
Searches should be POST-ed to the following url:
http://hmmer.janelia.org/search/hmmscan
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F hmmdb=pfam -F seq='<test.seq' http://hmmer.janelia.org/search/hmmscan
The input to hmmsearch on the web is either a multiple sequence alignment or a hidden Markov model in HMMER3 format. We do not support HMMER2 format as these HMMs are not forward compatible with HMMER3. When uploading a multiple sequence alignment, an HMM is built on the server using hmmbuild with the default parameters.
Searches should be POST-ed to the following url:
http://hmmer.janelia.org/search/hmmsearch
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F seq='<test.ali' http://hmmer.janelia.org/search/hmmsearch
Commonly used parameters for hmmsearch are listed below.
| Parameter | Description | Accepted values | Example | Default/Without Parameter | Notes |
|---|---|---|---|---|---|
| seqdb | The sequence database to be searched against. | env_nr, nr, pdb, rp15, rp35, rp55, rp75, swissprot, unimes, uniprotkb, uniprotrefprot | seqdb=pdb | Required, there is no default. If absent an error will be returned. | The sequence database to search against. You cannot currently perform profile-profile HMM searches using HMMER; hmmdb is not an accepted parameter. |
| seq | The query sequence | A protein sequence alignment in Stockholm format, or an HMM | # STOCKHOLM 1.0 KLRVLGY.HNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPASRN.AEY KLRVLGYNHN.EWC.AQSKNGQGWVPSNYITPVNSIDKHSWYHGPVSRNAAEY // | Required, there is no default. If absent an error will be returned. | The STOCKHOLM format is a specific input format to HMMER. There are methods in bioperl that allow you to convert between different alignment formats. |
Jackhmmer is an iterative search algorithm that can be initiated with a sequence, multiple sequence alignment or profile HMM. The number of iterations to run can be supplied as an additional parameter, and the server will perform a succession of searches until the job has completed. Fetching the results is a little more complicated, as the search may finish before the requested number of iterations is reached if it converges.
Searches should be POST-ed to the following url:
http://hmmer.janelia.org/search/jackhmmer
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F iterations=5 -F seq='<test1.fa' http://hmmer.janelia.org/search/jackhmmer
Some commonly used parameters for jackhmmer are (see search for more details):
| Parameter | Description | Accepted values | Example | Default/Without Parameter | Notes |
|---|---|---|---|---|---|
| seqdb | The sequence database to be searched against. | env_nr, nr, pdb, rp15, rp35, rp55, rp75, swissprot, unimes, uniprotkb, uniprotrefprot | seqdb=pdb | Required, there is no default. If absent an error will be returned. | The sequence database to search against. You cannot currently perform profile-profile HMM searches using HMMER; hmmdb is not an accepted parameter. |
| seq | The query sequence, alignment or profile HMM | A protein sequence in FASTA format, multiple sequence alignment or HMMER3 profile HMM | | Required, there is no default. If absent an error will be returned. | See notes on phmmer/hmmsearch. |
| iterations | The number of iterations. | Integer between 1 and 5 | 5 | iterations=5 | The maximum number of searches that will be performed. If the search converges (no additional sequences found compared to the previous search), then the search will stop. |
In addition to the standard HMMER searches an uploaded sequence can be annotated to show signal peptide & transmembrane regions, disordered regions and coiled-coil regions.
Annotation requests should be POST-ed to the following urls:
http://hmmer.janelia.org/annotation/disorder
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seq='<test.fa' http://hmmer.janelia.org/annotation/disorder
http://hmmer.janelia.org/annotation/coils
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seq='<test.fa' http://hmmer.janelia.org/annotation/coils
http://hmmer.janelia.org/annotation/phobius
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seq='<test.fa' http://hmmer.janelia.org/annotation/phobius
Parameters used for annotations:
| Parameter | Description | Accepted values | Example | Default/Without Parameter |
|---|---|---|---|---|
| seq | The query sequence | A protein sequence in FASTA format | seq='<test.fa' | Required, there is no default. If absent an error will be returned. |
Annotation results can be fetched with a GET request using the UUID supplied in the POST response:
http://hmmer.janelia.org/annotation/<annotation-type>/UUID
e.g.
curl -H 'Expect:' -H 'Accept:text/xml' http://hmmer.janelia.org/annotation/phobius/4162F712-1DD2-11B2-B17E-C09EFE1DC403
Search results can be retrieved using the job identifier that is returned in your initial search response. The job identifier is a UUID (Universally Unique Identifier) in the following format: 4162F712-1DD2-11B2-B17E-C09EFE1DC403. Thus, to retrieve your job, you can use the following URL in a GET request:
http://hmmer.janelia.org/results/hmmscan/$your_uuid
e.g.
http://hmmer.janelia.org/results/hmmscan/4162F712-1DD2-11B2-B17E-C09EFE1DC403
This is one of the few services where the returned format can be modified using a parameter.
| Parameter | Description | Accepted values | Example | Default/Without Parameter | Notes |
|---|---|---|---|---|---|
| range | The range of the results to retrieve | Integer,Integer | range=1,100 | All results | The results are ordered by E-value and, as there can be thousands of matches to your query, it can be useful to retrieve a subset of results. The range is two unsigned, comma-separated integers. The first integer is expected to be less than the second integer. To retrieve one row, just fetch using a range where the two integers are the same value. If your first integer is in range and your second is out of range, the second integer will be modified to include all results. For example, if your result set contains only 300 results and a range of 1,1000 is requested, then you will get 300 results. If your starting integer is out of range, then no results will be returned. |
| ali | Return alignments. | true, 1 | ali=1 | No alignments will be returned | Sometimes you are not so interested in the alignment of the match to the query sequence. By default no alignments are returned, to keep results compact. |
| output | Modify the format that the results are returned in. | xml, json, text, yaml, html | output=text | | The format of the results can be modified by setting "output=$format". The same can be achieved by setting the "Accept" field in the HTTP header. If both the HTTP header and the parameter are set, we currently assume that the parameter is the desired format. |
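The parameters in the table above can be combined into a results URL with a few lines of Python. The sketch below also mimics, on the client side, how the range is described as behaving; this is only an interpretation of the table, not the server's code, and treating a start below 1 as out of range is an assumption. The helper names results_url and clip_range are illustrative.

```python
from urllib.parse import urlencode


def results_url(base, uuid, **params):
    """Append result-modifying parameters (range, ali, output) to a job URL."""
    # Keep the comma in 'range=1,100' readable instead of percent-encoding it.
    query = urlencode(params, safe=",")
    url = f"{base}/{uuid}"
    return f"{url}?{query}" if query else url


def clip_range(first, last, total):
    """Mimic the documented range rules; returns (first, last) or None."""
    if first < 1 or first > total:
        # A start position outside the result set yields no results.
        return None
    # An end position beyond the result set is pulled back to the last result.
    return (first, min(last, total))


url = results_url(
    "http://hmmer.janelia.org/results/phmmer",
    "CF5BCDA4-0C7E-11E0-AF4F-B1E277D6C7BA",
    output="text", ali=1, range="1,2",
)
print(url)
print(clip_range(1, 1000, 300))
print(clip_range(301, 400, 300))
```

When pasting such a URL into a shell command, remember to quote it, because the & character is otherwise interpreted by the shell.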
Once you have finished with a result, you can either leave it on the server for a week and we will delete it, or you can.
In the following section there are some different examples of using the API using Python and Java. An example Perl client can be found above. Code is also beginning to become available in some of the common Bio-software packages, such as BioJava.
The following piece of Python is a little more complex than the examples discussed previously. In this case, we submit a search to the server, but stop the HTTP handler from automatically following the redirection to the results page. Instead, a custom handler is defined that grabs the redirection URL, which is then modified by the addition of parameters so that just the first 10 matches are fetched in JSON format, rather than the whole response. This can be useful when the results are large and you want to paginate the response, or if you are only interested in the most significant sequence matches.
# This example is written for Python 2 (urllib2 does not exist in Python 3).
import urllib, urllib2
# install a custom handler to prevent following of redirects automatically.
class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # return the response headers instead of following the redirect
        return headers
opener = urllib2.build_opener(SmartRedirectHandler())
urllib2.install_opener(opener)
parameters = {
    'seqdb': 'pdb',
    'seq': '>Seq\nKLRVLGYHNGEWCEAQTKNGQGWVPSNYITPVNSLENSIDKHSWYHGPVSRNAAEY'
}
enc_params = urllib.urlencode(parameters)
# post the search request to the server
request = urllib2.Request('http://hmmer.janelia.org/search/phmmer', enc_params)
# get the url where the results can be fetched from
results_url = urllib2.urlopen(request).getheader('location')
# modify the range, format and presence of alignments in your results here
res_params = {
    'output': 'json',
    'range': '1,10'
}
# add the parameters to your request for the results
enc_res_params = urllib.urlencode(res_params)
modified_res_url = results_url + '?' + enc_res_params
# send a GET request to the server
results_request = urllib2.Request(modified_res_url)
data = urllib2.urlopen(results_request)
# print out the results
print data.read()
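The example above relies on urllib2, which is only available in Python 2. A roughly equivalent sketch for Python 3 is given below. It has not been run against the server and makes the same assumptions as the original (the POST is answered with a redirect whose Location header is an absolute URL); the class and function names are illustrative. In Python 3 the redirect is suppressed by returning None from redirect_request, which makes urllib raise an HTTPError carrying the response headers.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the redirect.
        return None


def build_search_request(url, seqdb, seq):
    """Build the POST request that submits the search."""
    body = urllib.parse.urlencode({"seqdb": seqdb, "seq": seq}).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def submit(request):
    """Send the search and return the URL where the results can be fetched."""
    opener = urllib.request.build_opener(NoRedirect())
    try:
        response = opener.open(request)
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307):
            return err.headers["Location"]
        raise
    return response.headers.get("Location")


def fetch_first_hits(results_url, first=1, last=10):
    """GET a slice of the results in JSON format and return it as text."""
    query = urllib.parse.urlencode({"output": "json", "range": f"{first},{last}"})
    with urllib.request.urlopen(results_url + "?" + query) as response:
        return response.read().decode("utf-8")


# Typical use (performs network requests, so it is left commented out):
# request = build_search_request(
#     "http://hmmer.janelia.org/search/phmmer", "pdb",
#     ">Seq\nKLRVLGYHNGEWCEAQTKNGQGWVPSNYITPVNSLENSIDKHSWYHGPVSRNAAEY")
# print(fetch_first_hits(submit(request)))
```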
The following is a very basic Java source file that, once compiled and executed, performs an hmmscan search. The response is returned in JSON format. With this two-stage POST and GET, you can POST the request in one format and get a response back in another by setting the Accept type. To get this example to work, you should save the code in a file called RESTClient.java. Then run the command 'javac RESTClient.java'. Assuming that this is successful and a file called RESTClient.class is produced, you can execute the class by running the command 'java RESTClient'.
import java.net.*;
import java.io.*;
public class RESTClient {
  public static void main(String[] args) {
    try {
      URL url = new URL("http://hmmer.janelia.org/search/hmmscan");
      HttpURLConnection connection = (HttpURLConnection) url.openConnection();
      connection.setDoOutput(true);
      connection.setDoInput(true);
      //Do not follow the redirect automatically; we read its Location header below
      connection.setInstanceFollowRedirects(false);
      connection.setRequestMethod("POST");
      connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
      connection.setRequestProperty("Accept", "application/json");
      //Add the database and the sequence. Add more options as you wish!
      String seq = ">seq\nEMGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPV" +
                   "NSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEG" +
                   "RVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP";
      //Both values are URL-encoded, as the sequence contains '>' and a newline
      String urlParameters = "hmmdb=" + URLEncoder.encode("pfam", "UTF-8") +
                             "&seq=" + URLEncoder.encode(seq, "UTF-8");
      connection.setRequestProperty("Content-Length",
          Integer.toString(urlParameters.getBytes().length));
      //Send request
      DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
      wr.writeBytes(urlParameters);
      wr.flush();
      wr.close();
      //Now get the redirect URL
      URL respUrl = new URL(connection.getHeaderField("Location"));
      HttpURLConnection connection2 = (HttpURLConnection) respUrl.openConnection();
      connection2.setRequestMethod("GET");
      connection2.setRequestProperty("Accept", "application/json");
      //Get the response and print it to the screen
      BufferedReader in = new BufferedReader(
          new InputStreamReader(connection2.getInputStream()));
      String inputLine;
      while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
      }
      in.close();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
A jackhmmer search is a multipart search. The following Perl code performs a series of requests to the server. The first POST request generates the job; the while loop then performs GET requests to get the job status, until the status of the job is done. The last request GETs the results of the last iteration, which are returned in JSON format.
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON;
#Get a new Web user agent.
my $ua = LWP::UserAgent->new;
$ua->timeout(60);
$ua->env_proxy;
#Set up a new JSON encoder/decoder
my $json = JSON->new->allow_nonref;
#-------------------------------------------------------------------------------
#Set up the job
#URL to query
my $rootUrl = "http://hmmer.janelia.org";
my $url = $rootUrl."/search/jackhmmer";
my $seq = ">2abl_A mol:protein length:163 ABL TYROSINE KINASE
MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHS
WYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAE
LVHHHSTVADGLITTLHYPAP";
my %content = (
  'algo'       => 'jackhmmer',
  'seq'        => $seq,
  'seqdb'      => 'pdb',
  'iterations' => 5,
);
#-------------------------------------------------------------------------------
#Now POST the request and generate the search job.
my $response = $ua->post(
  $url,
  'content-type' => 'application/json',
  Content        => $json->encode( \%content )
);
if($response->status_line ne "201 Created"){
  die "Failed to create job, got:".$response->status_line;
}
my $job = $json->decode( $response->content );
print "Generated job UUID:".$job->{job_id}."\n";
#Follow the redirection to the resource created for the job.
my $job_location = $response->header("location");
#Now poll the server until the job has finished
$response = $ua->get( $job_location, 'Accept' => 'application/json' );
my $max_retry = 50;
my $count = 1;
while ( $response->status_line eq '200 OK' ) {
  my $status = $json->decode( $response->content );
  print "Checking status ($count)......";
  if ( $status->{status} eq 'DONE' ) {
    print "Job done.\n";
    last;
  }
  elsif ( $status->{status} eq 'ERROR' ) {
    print "Job failed, exiting!\n";
    exit(1);
  }
  elsif ( $status->{status} eq 'RUN' or $status->{status} eq 'PEND' ) {
    my ($lastIteration) = $status->{result}->[-1]->{uuid} =~ /\.(\d+)/;
    print "Currently on iteration $lastIteration [$status->{status}].\n";
  }
  if ( $count > $max_retry ) {
    print "Jobs should have finished.....exiting\n";
    exit(1);
  }
  #Job still running, so give it a chance to complete.
  sleep(5);
  #Check again on the job status...
  $response = $ua->get( $job_location, 'Accept' => 'application/json' );
  $count++;
}
#Job should have finished, but we may have converged, so get the last job.
my $results = $json->decode( $response->content );
my $lastIteration = pop( @{ $results->{result} } );
#Now fetch the results of the last iteration
my $searchResult = $ua->get( $rootUrl."/results/score/".$lastIteration->{uuid}, 'Accept' => 'application/json' );
unless( $searchResult->status_line eq "200 OK"){
  die "Failed to get search results\n";
}
#Decode the content of the full set of results
$results = $json->decode( $searchResult->content );
print "Matched ".$results->{'results'}->{'stats'}->{'nincluded'}." sequences ($lastIteration->{uuid})!\n";
#Now do something more interesting with the results......
So far, the submission of batch searches via REST has not really been mentioned. This is because we do not anticipate this being very useful, as you can programmatically send sequence after sequence. However, a batch upload of sequences is possible for phmmer and hmmscan. The main difference is that instead of using the seq parameter, we use the file parameter. There is also a subtle difference in the way that the curl command is formulated. Rather than using a redirect (<), an @ symbol is used to force the content part of the request to be what is contained within the file, rather than being attached to the parameter.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F file='@batch.fasta' http://hmmer.janelia.org/search/phmmer
It is also possible to include an email address for notification of when the batch search has been processed. Again, not particularly useful for an API, but it may be useful for keeping track of a pipeline. To specify an email via the command line, simply use the parameter email and set this to a valid email address. All of the other phmmer or hmmscan search parameters apply to the batch search.
Using curl to fetch results is very easy (the URL is quoted so that the shell does not interpret the & characters):
curl -L -H 'Expect:' -H 'Accept:text/xml' 'http://hmmer.janelia.org/results/phmmer/CF5BCDA4-0C7E-11E0-AF4F-B1E277D6C7BA?output=text&ali=1&range=1,2'
In this case we want to fetch the first two hits, with their alignments as a textual output format.
phmmer results for job CF5BCDA4-0C7E-11E0-AF4F-B1E277D6C7BA:
Target Num-hits Bias Bit-Score E-value Tax-Id Species Description
================================================================================================================
2abl_A 1 0.1 370.5 1.1e-110 9606 Homo sapiens mol:protein length:163 ABL TYROSINE KINASE
----------------------------------------------------------------------------------------------------------------
Target-env-start Target-env-end Target-ali-start Target-ali-end Query-start Query-end E-value
----------------------------------------------------------------------------------------------------------------
1 163 1 163 1 163 4.21e-121
QUERY lgpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
MATCH +gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
PP 8*****************************************************************************************************************************************************************9
TARGET MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP
----------------------------------------------------------------------------------------------------------------
================================================================================================================
2fo0_A 1 0.1 364.3 8.4e-109 9606 Homo sapiens mol:protein length:495 Proto-oncogene tyrosine-protein kinase ABL1 (
----------------------------------------------------------------------------------------------------------------
Target-env-start Target-env-end Target-ali-start Target-ali-end Query-start Query-end E-value
----------------------------------------------------------------------------------------------------------------
33 195 34 195 2 163 4.15e-119
QUERY gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
MATCH gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
PP 9****************************************************************************************************************************************************************9
TARGET GPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP
----------------------------------------------------------------------------------------------------------------
================================================================================================================
Search Details
==============
Date Started: 2010-12-20 16:19:20
Cmd: phmmer -E 10 --domE 10 --incE 0.01 --incdomE 0.03 --mx BLOSUM62 --pextend 0.4 --popen 0.02 --seqdb 1
Database: pdb, downloaded on 2010-12-11
Query:
>2abl_A mol:protein length:163 ABL TYROSINE KINASE
MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQ
TKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLV
RESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELV
HHHSTVADGLITTLHYPAP
Stats
=====
page:1
nhits:322
...... 47
nreported:322
domZ:322
The results format is currently a large mass of data that can look a little complex at first. However, the data structure is fairly simple and is represented pictorially below:
In the following sections the contents of each part of the results data structure will be described. Parts of the data structure will be referred to as hashes (key, value pairs) or arrays, but depending on the type of response requested these will translate into different entities, for example elements and attributes for an XML response.
Only parts of the response actually deemed useful will be described.
| Key | Value |
|---|---|
| stats | The stats hash |
| hits | Array of sequence hashes |
| uuid | The unique job identifier |
| algo | The HMMER search algorithm |
| searchDB | The target search database |
| _internal | Hash containing some internal accounting |
The stats hash contains some brief summary statistics about the job.
| Key | Value |
|---|---|
| nhits | The number of hits found above reporting thresholds |
| Z | The number of sequences or models in the target database |
| domZ | The number of hits in the target database |
| nmodels | The number of models in this search |
| nincluded | The number of sequences or models scoring above the significance threshold |
| nreported | The number of sequences or models scoring above the reporting threshold |
The hits array contains one or more sequences. Only parts of the response actually deemed useful will be described. With the non-redundant databases, the redundant sequence information will also be included, but as the sequences are identical, the information about the hit is identical.
| Key | Value |
|---|---|
| name | Name of the target (sequence for phmmer/hmmsearch, HMM for hmmscan) |
| acc | Accession of the target |
| acc2 | Secondary accession of the target |
| id | Identifier of the target |
| desc | Description of the target |
| score | Bit score of the sequence (all domains, without correction) |
| pvalue | P-value of the score |
| evalue | E-value of the score |
| nregions | Number of regions evaluated |
| nenvelopes | Number of envelopes handed over for domain definition, null2, alignment, and scoring. |
| ndom | Total number of domains identified in this sequence |
| nreported | Number of domains satisfying reporting thresholding |
| nincluded | Number of domains satisfying inclusion thresholding |
| taxid | The NCBI taxonomy identifier of the target (if applicable) |
| species | The species name of the target (if applicable) |
| kg | The kingdom of life that the target belongs to - based on placing in the NCBI taxonomy tree (if applicable) |
| seqs | An array containing information about the 100% redundant sequences |
| pdbs | Array of pdb identifiers (with chain information) |
The domain or hit hash contains the details of the match, in particular the alignment between the query and the target.
| Key | Value |
|---|---|
| ienv | Envelope start position |
| jenv | Envelope end position |
| iali | Alignment start position |
| jali | Alignment end position |
| bias | null2 score contribution |
| oasc | Optimal alignment accuracy score |
| bitscore | Overall score in bits, null corrected, if this were the only domain in seq |
| cevalue | Conditional E-value based on the domain correction |
| ievalue | Independent E-value based on the domain correction |
| is_reported | 1 if domain meets reporting thresholds |
| is_included | 1 if domain meets inclusion thresholds |
| alimodel | Aligned query consensus sequence for phmmer and hmmsearch, target hmm for hmmscan |
| alimline | Match line indicating identities, conservation +'s, gaps |
| aliaseq | Aligned target sequence for phmmer and hmmsearch, query for hmmscan |
| alippline | Posterior probability annotation |
| alihmmname | Name of HMM (query sequence for phmmer, alignment for hmmsearch and target hmm for hmmscan) |
| alihmmacc | Accession of HMM |
| alihmmdesc | Description of HMM |
| alihmmfrom | Start position on HMM |
| alihmmto | End position on HMM |
| aliM | Length of model |
| alisqname | Name of target sequence (phmmer, hmmsearch) or query sequence (hmmscan) |
| alisqacc | Accession of sequence |
| alisqdesc | Description of sequence |
| alisqfrom | Start position on sequence |
| alisqto | End position on sequence |
| aliL | Length of sequence |
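The keys above can be read directly from a parsed response. The following Python sketch is illustrative only: the sample values are invented, the helper names (`included_domains`, `envelope_length`, `format_alignment`) are hypothetical, and it assumes the domain hashes have already been decoded into dictionaries, for instance with `json.loads`. Since the value types are not specified here, numeric fields are passed through `int()` defensively.

```python
# Hypothetical, hand-written domain hashes; real values come from the server.
example_domains = [
    {
        "ienv": 5, "jenv": 120, "iali": 8, "jali": 118,
        "is_included": 1,
        "alimodel": "mkvlAEG",
        "alimline": "mk+l EG",
        "aliaseq": "MKILSEG",
    },
    {
        "ienv": 200, "jenv": 230, "iali": 203, "jali": 228,
        "is_included": 0,
        "alimodel": "gtwa",
        "alimline": "g+ a",
        "aliaseq": "GSFA",
    },
]


def included_domains(domains):
    """Keep only the domain hashes that meet the inclusion thresholds."""
    return [d for d in domains if int(d["is_included"]) == 1]


def envelope_length(domain):
    """Length of the envelope, counting both end positions."""
    return int(domain["jenv"]) - int(domain["ienv"]) + 1


def format_alignment(domain):
    """Stack the model line, match line and sequence line for display."""
    return "\n".join(
        [domain["alimodel"], domain["alimline"], domain["aliaseq"]]
    )
```

Calling `format_alignment` on a domain gives a three-line block in the usual model/match/sequence layout, which is often enough for a quick visual check of a hit.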
For hmmscan results you will sometimes see some of the following additional fields, which are added during post-processing by database-specific methods.
| Key | Value |
|---|---|
| significant | Whether the match is deemed significant by the source database |
| segments | Sometimes a domain match can be broken into regions (or segments) during post-processing. |
| clan | Pfam specific, contains the clan accession that the model belongs to, if appropriate. |
| outcompeted | Whether this match has been outscored (or outcompeted) during post-processing |
| family | Superfamily specific, contains the superfamily family assignment information. |
One of the philosophies of a RESTful API is to also pass the appropriate HTTP status code in response to the query URL. Most of the time a 200 (success) status code will be received. However, there may be times when that is not the case. There is a complete list of HTTP codes here, but we have listed most of the status codes that may be returned and how they relate to what is actually going on at the server.
| HTTP status code | Status description | Notes |
|---|---|---|
| 200 | OK | The job has either been run or queued up successfully. In the former case, the body should contain the results, whereas in the latter it will contain your job identifier, which can be used to query or fetch the results in the future. |
| 201 | Created | The job has been created successfully. The response will contain content describing the job and/or a redirection to the created resource in the HTTP header. |
| 202 | Accepted | The job has been accepted by the search system and is either pending (waiting to be started) or running. After a short delay, your script should check for results again. |
| 302 | Found/Redirection | The request was found, but the client must take additional action to complete the request. Usually there is a redirection URL found in the response header. |
| 400 | Bad Request | Your job contained either invalid parameters or parameter values. The body of the response should contain information about which parameter or value failed and possibly the reason why it failed. If you continue to receive this in response to a request and cannot understand why it is failing, you should contact the help desk for assistance. |
| 410 | Gone | Your job was deleted from the search system. This may be because the time that we have been able to store the results has expired or that you have explicitly asked for the results to be deleted. |
| 500 | Internal server error | There was a problem with running your job, typically due to a problem with the back-end compute servers, rather than the job itself. The body of the response may contain an error message from the server. Contact the help desk for assistance with the problem. |
| 502 | Bad gateway | There was a problem scheduling or running the job. The job has failed and will not produce results. There is no need to check the status again. |
| 503 | Service unavailable | The body of the response may contain a message as to why the job has been put on hold. This may be due to site maintenance, database updates, queue overload or some other problem. This status is typically set by an administrator; if it persists for longer than a few hours, you should contact the help desk. |
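A script that submits jobs will usually want to turn these codes into a decision about what to do next. The Python sketch below is one possible way to do that, not part of the service itself: the action labels, the `next_action` and `poll` names, and the default delay are all hypothetical choices. The `fetch_status` argument is assumed to be a function you supply that performs the HTTP request and returns the integer status code.

```python
import time

# Action labels are our own shorthand for the notes in the table above.
ACTIONS = {
    200: "done",          # results or a job identifier are in the body
    201: "created",       # job created; look for a redirection header
    202: "wait",          # pending or running; check again later
    302: "redirect",      # follow the URL given in the response header
    400: "bad_request",   # fix the parameters before resubmitting
    410: "gone",          # results expired or were deleted
    500: "server_error",  # back-end problem; contact the help desk
    502: "failed",        # job failed; no point checking again
    503: "on_hold",       # service unavailable; try again much later
}


def next_action(status_code):
    """Map an HTTP status code to a coarse action label."""
    return ACTIONS.get(status_code, "unknown")


def poll(fetch_status, delay_seconds=5.0, max_attempts=20):
    """Call fetch_status() until the job is no longer pending or running.

    fetch_status is a caller-supplied function returning an HTTP status code.
    Returns the final action label, or "timed_out" if the job is still
    pending after max_attempts checks.
    """
    for _ in range(max_attempts):
        action = next_action(fetch_status())
        if action != "wait":
            return action
        time.sleep(delay_seconds)
    return "timed_out"
```

Passing the request logic in as a function keeps the polling loop independent of any particular HTTP library and makes it straightforward to exercise with canned status codes.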
The RESTful interface supports three commonly used, machine-readable formats: XML, JSON and YAML. In addition to these, we also provide HTML and text.
Which format is used is really down to personal choice. XML is widely used, with libraries available in many different languages. JSON is well suited to use with websites, where a server may make a call to a HMMER web service and pass the resulting JSON string back to the client/browser, where the HMMER result may be post-processed by JavaScript running on the client. YAML is a more recent data serialization format which, despite being readily parsed by software, is more human-readable than XML or JSON. The HTML responses are not really meant for anything other than a browser or command-line tools such as curl or wget. The text output is the best choice if you want to cut and paste results into a lab book.
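As noted earlier, the returned format is selected through the HTTP header. The Python sketch below shows one way a script might set the Accept field using only the standard library. The MIME types in `ACCEPT_TYPES` are assumptions based on common conventions and should be checked against the values the service actually lists. The `build_request` helper is hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request

# Assumed MIME types; confirm them against the list the service documents.
ACCEPT_TYPES = {
    "json": "application/json",
    "xml": "text/xml",
    "yaml": "application/yaml",
}


def build_request(url, fmt):
    """Build (but do not send) a request asking for the given format."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError("unsupported format: " + fmt)
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


# The phmmer URL is the one given at the top of this page.
request = build_request("http://hmmer.janelia.org/search/phmmer", "json")
```

The resulting object can be passed to `urllib.request.urlopen` once the search parameters have been added.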
We have tried to provide as many services as possible via REST. However, there are still a few things that we do not provide. For example, there is no way of generating a domain graphic or getting a graph of the distribution of hits. We cannot provide these via REST, as both are generated client-side using JavaScript libraries and the HTML5 canvas element. The RESTful services are also, naturally, restricted to the set of HMMER programs that are available via the website. If there is something that you think would be useful, please get in touch and we will consider it for inclusion.