API

Help Contents:
  1. Contacting us
  2. HMMER3 algorithms
  3. Supported target databases
  4. Search Parameters
  5. Results
  6. Application programming interface
    1. Basic concepts
      1. URLs
      2. Sending requests
      3. Retrieving data
    2. Available services
      1. POSTing phmmer searches
      2. POSTing hmmscan searches
      3. POSTing hmmsearch searches
      4. POSTing jackhmmer searches
      5. GETting results
      6. DELETE-ing results
    3. Examples
      1. phmmer
      2. hmmscan
      3. jackhmmer
      4. batch
      5. Fetching Results
    4. Result Format
    5. Other useful things to know
      1. Response codes
      2. Data formats
      3. Things we do not support

A RESTful approach

In addition to the web interface to the HMMER software, we also provide access to it via RESTful web services. REST (or REpresentational State Transfer) refers to a style of building web services which makes it easy to interact programmatically with the site. A programmatic interface, commonly called an Application Programming Interface (API), allows users to write scripts or programs to access data, rather than having to rely on a browser to view a site. Below is a list of the services provided, some examples of how to use the API, and the supported data formats. This section should be used in conjunction with the search help page, which defines the parameters for modifying how HMMER performs the search.

Basic concepts

URLs

A RESTful service sends and receives data over HTTP, the same protocol that is used by websites and browsers. As such, the services provided through a RESTful interface are identified and controlled using URLs. In the HMMER website we use the same URLs to provide both the standard HTML browser representation of your search results and the representation requested via REST in an alternative format, such as XML or JSON. To submit a phmmer search to our servers via your browser you visit the following URL:

http://hmmer.janelia.org/search/phmmer

When operating through a browser, the options in the form are serialized to a string that consists of a set of parameters and values. This string is then parsed by our servers to tailor the search according to your criteria. When submitting a request via REST, you will need to specify these parameters yourself. Those that control the search itself are described in detail in the search section.

The returned format is controlled by how the page is requested, through the HTTP header. If the Content-Type or Accept field is set to a JSON, YAML or XML media type (the examples in this document use text/xml and application/json), then JSON, YAML or XML will be returned. Do not worry too much about this, as the Content-Type is often set automatically based on how you formulate the request, i.e. if you send the form parameters as XML, you get an XML response. When fetching results, the output parameter described under Fetching results can be used instead.

For completeness, your browser normally sets the Content-Type to a form encoding such as application/x-www-form-urlencoded. If the Content-Type is set to one of these, and no other format is requested, then an HTML response will be generated.

One thing to note with a RESTful interface is how requests are sent to the server. HTTP has several different request methods, and the basic thing to remember with REST is that these methods translate to different operations. In the services described here, POST creates or updates an object, GET retrieves it, and DELETE removes it.

All objects in REST have a URI, so every entity in the system has a unique URI. If an object needs to be updated, this is achieved by POSTing a document to that object's URI. If that object then needs to be removed, an HTTP DELETE request is issued to that object's URI. When a search is POST-ed to the server, it creates a new object, with a unique URI, that can then be queried.


Sending requests

Example using curl

You do not need to be an expert programmer to retrieve data via our RESTful interface. XML is a widely used machine-parsable format, and the following section demonstrates a simple way of sending and retrieving XML using the Unix command line tool curl. The following POSTs the request to the server (our server configuration requires you to also unset the default value in the header for Expect, -H 'Expect:'):

shell% curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F algo=phmmer -F seq='<test.seq' http://hmmer.janelia.org/search/phmmer

should give the following response in XML:

<?xml version="1.0" encoding="UTF-8"?>
<opt>
  <data name='results' resultSize='224339'>
    <_internal highbit='370.5' lowbit='19.0' numberSig='242' offset='42280'>
      <timings search='0.283351' unpack='0.176821' />
    </_internal>
    <hits
    	name='2abl_A'
    	acc='2abl_A'
    	bias='0.1'
    	desc='mol:protein length:163  ABL TYROSINE KINASE'
    	evalue='1.1e-110'
    	ndom='1'
    	nincluded='1'
    	nregions='1'
    	reported='1'
    	score='370.5'
    	species='Homo sapiens'
    	taxid='9606' >
            <domains
                aliL='163'
                aliM='163'
                aliN='163'
                aliaseq='MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP'
                alihmmfrom='1'
                alihmmname='2abl_A'
                alihmmto='163'
                alimline='+gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap'
                alimodel='lgpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap'
                alippline='8*****************************************************************************************************************************************************************9'
                alisqacc='2abl_A'
                alisqdesc='mol:protein length:163  ABL TYROSINE KINASE'
                alisqfrom='1'
                alisqname='2abl_A'
                alisqto='163'
                bias='0.05'
                bitscore='370.357543945312'
                envsc='250.653518676758'
                cevalue='4.21e-121'
                ievalue='4.21e-121'
                iali='1'
                ienv='1'
                is_included='1'
                is_reported='1'
                jali='163'
                jenv='163'
            />
    </hits>
    .
    .
    .
  </data>
</opt>

In this example, the sequence to be searched is in the file test.seq. When using curl the value of the parameter "seq" needs to be quoted so that its value is taken correctly from the file "test.seq". The other parameters can also be added directly to the URL, as a regular CGI-style parameter, if you prefer.
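For readers who prefer Python, the curl command above can be approximated with the standard library. This is a minimal sketch rather than an official client: it assumes Python 3, assumes the server accepts form-encoded parameters (as the Java example in the Examples section does), and the function names are purely illustrative.

```python
import urllib.parse
import urllib.request

HOST = "http://hmmer.janelia.org"  # host used throughout this document


def build_phmmer_request(seq, seqdb="pdb"):
    """Build (but do not send) a POST request for a phmmer search."""
    # Form-encode the parameters; supplying data makes urllib use POST.
    body = urllib.parse.urlencode({"seqdb": seqdb, "seq": seq}).encode("ascii")
    return urllib.request.Request(
        HOST + "/search/phmmer",
        data=body,
        headers={"Accept": "text/xml"},
    )


def run_search(request, timeout=60):
    """Send the request and return the response body as text.

    urllib follows the redirect to the results page with a GET,
    much like the -L option of curl.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To run a search, read the sequence from a file and pass it through both functions, for example run_search(build_phmmer_request(open("test.seq").read())).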

Using a script

Most programming languages have the ability to send HTTP requests and receive HTTP responses. A Perl script to submit a search and receive the responses as XML might be as trivial as this:

Example
#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;
use XML::Simple;

#Get a new Web user agent.
my $ua = LWP::UserAgent->new;
$ua->timeout(20);
$ua->env_proxy;

my $host = "http://hmmer.janelia.org";
my $search = "/search/phmmer";

#Parameters
my  $seq = qq(>2abl_A mol:protein length:163  ABL TYROSINE KINASE
MGPSENDPNLFVALYDFVASGDNT
LSITKGEKLRVLGYNHNGEWCEAQ
TKNGQGWVPSNYITPVNSLEKHSW
YHGPVSRNAAEYLLSSGINGSFLV
RESESSPGQRSISLRYEGRVYHYR
INTASDGKLYVSSESRFNTLAELV
HHHSTVADGLITTLHYPAP);

my $seqdb = 'pdb';

#Make a hash to encode for the content.
my %content = ( 'seqdb' => $seqdb,
                'content'   => "<![CDATA[$seq]]>" );

#Convert the parameters to XML
my $xml = XMLout(\%content, NoEscape => 1);

#Now post it off
my $response = $ua->post( $host.$search, 'content-type' => 'text/xml', Content => $xml );

#By default, we should get redirected!
if($response->is_redirect){

  #Now make a second request, a GET this time, to get the results.
  $response =
  $ua->get($response->header("location"), 'Accept' => 'text/xml' );

  if($response->is_success){
    print $response->content;
  }else{
    print "Error with redirect GET:".$response->content;
    die $response->status_line;
  }
}else{
  die $response->status_line;
}

In this case the script sets the Content-Type of the request to text/xml explicitly. The response from the server is identical to that obtained from using curl. Notice there are in fact two requests to the server. The first posts the job to the server, the second then fetches the result. The location of the result is specified in the response from the first request.


Retrieving data

Although XML is just plain text and therefore human-readable, it's intended to be parsed into a data structure. Extending the Perl script above, we can add the ability to parse the XML using an external Perl module, XML::LibXML:

Example:
#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;
use XML::Simple;
use XML::LibXML;

#Get a new Web user agent.
my $ua = LWP::UserAgent->new;
$ua->timeout(20);
$ua->env_proxy;

my $host = "http://hmmer.janelia.org";
my $search = "/search/phmmer";

#Parameters
my  $seq = qq(>2abl_A mol:protein length:163  ABL TYROSINE KINASE
MGPSENDPNLFVALYDFVASGDNTLSITKGE
KLRVLGYNHNGEWCEAQTKNGQGWVPSNYIT
PVNSLEKHSWYHGPVSRNAAEYLLSSGINGS
FLVRESESSPGQRSISLRYEGRVYHYRINTA
SDGKLYVSSESRFNTLAELVHHHSTVADGLI
TTLHYPAP);

my $seqdb = 'pdb';

#Make a hash to encode for the content.
my %content = ( 'seqdb' => $seqdb,
                'content'   => "<![CDATA[$seq]]>" );

#Convert the parameters to XML
my $xml = XMLout(\%content, NoEscape => 1);

#Now post it off
my $response = $ua->post( $host.$search, 'content-type' => 'text/xml', Content => $xml );

die "error: failed to successfully POST request: " . $response->status_line . "\n"
  unless ($response->is_redirect);

#By default, we should get redirected!
$response =
  $ua->get($response->header("location"), 'Accept' => 'text/xml' );

die "error: failed to retrieve XML: " . $response->status_line . "\n"
  unless $response->is_success;


my $xmlRes = '';

$xmlRes .= $response->content;
my $xml_parser = XML::LibXML->new();
my $dom = $xml_parser->parse_string( $xmlRes );

my $root = $dom->documentElement();

my ( $entry ) = $root->getChildrenByTagName( 'data' );
my @hits  = $entry->getChildrenByTagName( 'hits' );

foreach my $hit (@hits){
  next if($hit->getAttribute( 'nincluded' ) == 0 );
  print $hit->getAttribute( 'name' )."\t".$hit->getAttribute( 'desc' )."\t".$hit->getAttribute( 'evalue' )."\n";
} 

This script now prints out the name, description and E-value of all significant sequence hits for the given query sequence as tab-delimited text.

2abl_A	mol:protein length:163  ABL TYROSINE KINASE	1.1e-110
2fo0_A	mol:protein length:495  Proto-oncogene tyrosine-protein kinase ABL1 (	8.4e-109
1opk_A	mol:protein length:495  Proto-oncogene tyrosine-protein kinase ABL1	8.4e-109
1opl_A	mol:protein length:537  proto-oncogene tyrosine-protein kinase	9.7e-109
1ab2_A	mol:protein length:109  C-ABL TYROSINE KINASE SH2 DOMAIN	3.3e-62
3k2m_A	mol:protein length:112  Proto-oncogene tyrosine-protein kinase ABL1	3.1e-61
2ecd_A	mol:protein length:119  Tyrosine-protein kinase ABL2	6.5e-58
1abo_A	mol:protein length:62  ABL TYROSINE KINASE	1.1e-38
3eg1_A	mol:protein length:63  Proto-oncogene tyrosine-protein kinase ABL1	1.6e-38
3eg0_A	mol:protein length:63  Proto-oncogene tyrosine-protein kinase ABL1	1.7e-38
3eg3_A	mol:protein length:63  Proto-oncogene tyrosine-protein kinase ABL1	3.3e-38
1ju5_C	mol:protein length:61  Abl	8.4e-38
1bbz_A	mol:protein length:58  ABL TYROSINE KINASE	7.0e-36
2o88_A	mol:protein length:58  Proto-oncogene tyrosine-protein kinase ABL1	9.1e-35
1awo_A	mol:protein length:62  ABL TYROSINE KINASE	1.7e-34
......
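The same filtering can be sketched in Python with the standard library XML parser. The sample document below is a hand-trimmed version of the response shown earlier, and its second hit is invented purely to show the nincluded filter; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# A heavily trimmed response in the shape shown earlier. The second hit is
# invented for illustration and is not real server output.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<opt>
  <data name='results'>
    <hits name='2abl_A' desc='mol:protein length:163  ABL TYROSINE KINASE'
          evalue='1.1e-110' nincluded='1' />
    <hits name='made_up' desc='hypothetical below-threshold hit'
          evalue='0.5' nincluded='0' />
  </data>
</opt>
"""


def significant_hits(xml_text):
    """Return (name, desc, evalue) for hits with at least one included domain."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.findall("./data/hits"):
        # Skip hits where no domain passed the inclusion threshold.
        if int(hit.get("nincluded", "0")) == 0:
            continue
        rows.append((hit.get("name"), hit.get("desc"), hit.get("evalue")))
    return rows


for row in significant_hits(SAMPLE):
    print("\t".join(row))
```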



Available Services

POSTing phmmer searches

The two main input parameters to a phmmer search are a protein sequence and the target database, defined using the seq and seqdb parameters respectively. Other parameters for controlling the search are defined in the search section. If any of these other parameters are omitted, the default value for that parameter will be used.

Searches should be POST-ed to the following url:

http://hmmer.janelia.org/search/phmmer
e.g.
 curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F seq='<test.seq' http://hmmer.janelia.org/search/phmmer

When using the website, we also perform a Pfam search by default. However, when using the API you will only be returned the phmmer results. To get Pfam search results, use the hmmscan interface.

POSTing hmmscan searches

Hmmscan also has two main parameters, a sequence and a profile HMM database, defined using the seq and hmmdb parameters respectively. We currently offer four profile HMM databases: Pfam, TIGRFAMs, Gene3D and Superfamily. When searching against the former two, the cut-offs can be defined by the user (other parameters for controlling the search are defined in the search section). With the latter two HMM databases, all cut-off parameters will be ignored and the HMM database default parameters will be used. This is because Gene3D and Superfamily both use their own post-processing mechanisms to define their domains, in addition to the hmmscan results.

Searches should be POST-ed to the following url:

http://hmmer.janelia.org/search/hmmscan
e.g.
 curl -L -H 'Expect:' -H 'Accept:text/xml' -F hmmdb=pfam -F seq='<test.seq' http://hmmer.janelia.org/search/hmmscan

POSTing hmmsearch searches

The input to hmmsearch on the web is either a multiple sequence alignment or a hidden Markov model in HMMER3 format. We do not support HMMER2 format as these HMMs are not forward compatible with HMMER3. When uploading a multiple sequence alignment, an HMM is built on the server using hmmbuild with the default parameters.

Searches should be POST-ed to the following url:

http://hmmer.janelia.org/search/hmmsearch
e.g.
 curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F seq='<test.ali' http://hmmer.janelia.org/search/hmmsearch

Commonly used parameters for hmmsearch are listed below.

Parameter: seqdb
  Description: The sequence database to be searched against.
  Accepted values: env_nr | nr | refseq | pdb | rp15 | rp35 | rp55 | rp75 | swissprot | unimes | uniprotkb | uniprotrefprot | pfamseq
  Example: seqdb=pdb
  Default/Without Parameter: Required, there is no default. If absent an error will be returned.
  Notes: The sequence database to search against. You cannot currently perform profile-profile HMM searches using HMMER; hmmdb is not an accepted parameter.

Parameter: seq
  Description: The query sequence
  Accepted values: A protein sequence alignment in Stockholm format, or an HMM
  Example:
    # STOCKHOLM 1.0
    KLRVLGY.HNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPASRN.AEY
    KLRVLGYNHN.EWC.AQSKNGQGWVPSNYITPVNSIDKHSWYHGPVSRNAAEY
    //
  Default/Without Parameter: Required, there is no default. If absent an error will be returned.
  Notes: The STOCKHOLM format is a specific input format to HMMER. There are methods in bioperl that allow you to convert between different alignment formats.
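If your alignment is in aligned FASTA format, a conversion to Stockholm can also be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a proper alignment library: it writes one "name sequence" line per record, ignores Stockholm annotation lines, and the function name and toy sequences are hypothetical.

```python
def aligned_fasta_to_stockholm(fasta_text):
    """Convert aligned FASTA text to a minimal Stockholm alignment."""
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else f"seq{len(records) + 1}"
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))

    # Every row of an alignment must have the same number of columns.
    if len({len(seq) for _, seq in records}) != 1:
        raise ValueError("sequences are missing or not all the same length")

    width = max(len(rec_name) for rec_name, _ in records)
    lines = ["# STOCKHOLM 1.0"]
    lines += [f"{rec_name.ljust(width)} {seq}" for rec_name, seq in records]
    lines.append("//")
    return "\n".join(lines) + "\n"


print(aligned_fasta_to_stockholm(">seq1\nKLRV.GY\n>seq2\nKLRVLGY\n"))
```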

POSTing jackhmmer searches

Jackhmmer is an iterative search algorithm that can be initiated with a sequence, multiple sequence alignment or profile HMM. The number of iterations to run can be supplied as an additional parameter; the server then performs a succession of searches until the job has completed. Fetching the results is a little more complicated, as the search may converge and finish before the requested number of iterations has been run.

Searches should be POST-ed to the following url:

http://hmmer.janelia.org/search/jackhmmer
e.g.
 curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F iterations=5 -F seq='<test1.fa' http://hmmer.janelia.org/search/jackhmmer

Some commonly used parameters for jackhmmer are (see search for more details):

Parameter: seqdb
  Description: The sequence database to be searched against.
  Accepted values: env_nr | nr | refseq | pdb | rp15 | rp35 | rp55 | rp75 | swissprot | unimes | uniprotkb | uniprotrefprot | pfamseq
  Example: seqdb=pdb
  Default/Without Parameter: Required, there is no default. If absent an error will be returned.
  Notes: The sequence database to search against. You cannot currently perform profile-profile HMM searches using HMMER; hmmdb is not an accepted parameter.

Parameter: seq
  Description: The query sequence, alignment or profile HMM
  Accepted values: A protein sequence in FASTA format, multiple sequence alignment or HMMER3 profile HMM
  Default/Without Parameter: Required, there is no default. If absent an error will be returned.
  Notes: See notes on phmmer/hmmsearch.

Parameter: iterations
  Description: The number of iterations.
  Accepted values: Integer between 1 and 5
  Example: iterations=5
  Default/Without Parameter: 5
  Notes: The maximum number of searches that will be performed. If the search converges (no additional sequences found compared to the previous search), then the search will stop.

POSTing annotation searches

In addition to the standard HMMER searches an uploaded sequence can be annotated to show signal peptide & transmembrane regions, disordered regions and coiled-coil regions.

Annotation requests should be POST-ed to the following urls:

Disorder

http://hmmer.janelia.org/annotation/disorder
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F  seq='<test.fa' http://hmmer.janelia.org/annotation/disorder

Coiled-coil

http://hmmer.janelia.org/annotation/coils
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F  seq='<test.fa' http://hmmer.janelia.org/annotation/coils

Transmembrane & Signal Peptides

http://hmmer.janelia.org/annotation/phobius
e.g.
curl -L -H 'Expect:' -H 'Accept:text/xml' -F  seq='<test.fa' http://hmmer.janelia.org/annotation/phobius

Parameters used for annotations:

Parameter: seq
  Description: The query sequence
  Accepted values: A protein sequence in FASTA format
  Example: seq='<test.fa'
  Default/Without Parameter: Required, there is no default. If absent an error will be returned.

Annotation results can be fetched with a GET request using the UUID supplied in the POST response:

http://hmmer.janelia.org/annotation/<annotation-type>/UUID
e.g.
curl -H 'Expect:' -H 'Accept:text/xml' http://hmmer.janelia.org/annotation/phobius/4162F712-1DD2-11B2-B17E-C09EFE1DC403

Fetching results

Search results can be retrieved using the job identifier that is returned in your initial search response. The job identifier is a UUID (Universally Unique Identifier) in a format such as 4162F712-1DD2-11B2-B17E-C09EFE1DC403. Thus, to retrieve your job, you can use the following URL in a GET request:

http://hmmer.janelia.org/results/hmmscan/$your_uuid
e.g.
http://hmmer.janelia.org/results/hmmscan/4162F712-1DD2-11B2-B17E-C09EFE1DC403

This is one of the few services where the returned format can be modified using a parameter.

Parameter: range
  Description: The range of the results to retrieve
  Accepted values: Integer,Integer
  Example: range=1,100
  Default/Without Parameter: All results
  Notes: The results are ordered by E-value and, as there can be thousands of matches to your query, it can be useful to retrieve a subset of results. The range is two unsigned, comma-separated integers. The first integer is expected to be less than the second integer. To retrieve one row, just fetch using a range where the two integers are the same value. If your first integer is in range and your second is out of range, the second integer will be modified to include all results, i.e. if your results set is only 300 in size and a range of 1,1000 is requested, then you will get 300 results. If your starting integer is out of range, then no results will be returned.

Parameter: ali
  Description: Return alignments.
  Accepted values: true | 1
  Example: ali=1
  Default/Without Parameter: No alignments will be returned
  Notes: Sometimes you are not so interested in the alignment of the match to the query sequence. By default no alignments are returned, to keep results compact.

Parameter: output
  Description: Modify the format that the results are returned in.
  Accepted values: xml | json | text | yaml
  Example: output=text
  Default/Without Parameter: html
  Notes: The format of the results can be modified by setting "output=$format". The same can be achieved by setting the "Accept" field in the HTTP header. If both the HTTP header and the parameter are set, we currently assume that the parameter is the desired format.
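The parameters above are ordinary query-string parameters, so a results URL can be assembled with a small helper. The following Python sketch is illustrative only: the function names are our own, and the second function merely mirrors the documented range behaviour under the assumption that rows are numbered from 1 and that both ends of the range are inclusive.

```python
import urllib.parse

FORMATS = ("xml", "json", "text", "yaml")


def results_url(base_url, output=None, ali=False, result_range=None):
    """Append the output, ali and range parameters to a results URL."""
    params = {}
    if output is not None:
        if output not in FORMATS:
            raise ValueError(f"unsupported output format: {output}")
        params["output"] = output
    if ali:
        params["ali"] = "1"
    if result_range is not None:
        first, last = result_range
        if first < 0 or last < first:
            raise ValueError("range must be two unsigned integers, first <= last")
        params["range"] = f"{first},{last}"
    if not params:
        return base_url
    # Keep the comma in the range value readable instead of percent-encoding it.
    return base_url + "?" + urllib.parse.urlencode(params, safe=",")


def rows_returned(total, first, last):
    """Number of rows the documented range rules yield for `total` results.

    Assumes 1-based, inclusive row numbering (an assumption, see above).
    """
    if first > total:
        return 0  # starting point out of range: nothing is returned
    return min(last, total) - first + 1  # the end is clipped to the result size


print(results_url("http://hmmer.janelia.org/results/hmmscan/4162F712-1DD2-11B2-B17E-C09EFE1DC403",
                  output="json", result_range=(1, 100)))
```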

Deleting results

Once you have finished with a result, you can either leave it on the server for a week and we will delete it, or you can delete it yourself by issuing an HTTP DELETE request to the result's URI.
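As described under Basic concepts, an object is removed by issuing an HTTP DELETE request to its URI. A minimal Python 3 sketch of such a request follows. It assumes that the URI to delete has the same form as the one used for fetching results; the function name is illustrative, and the server's response to a DELETE is not described here.

```python
import urllib.request


def build_delete_request(result_url):
    """Build (but do not send) an HTTP DELETE request for a result URI."""
    return urllib.request.Request(result_url, method="DELETE")


request = build_delete_request(
    "http://hmmer.janelia.org/results/hmmscan/4162F712-1DD2-11B2-B17E-C09EFE1DC403"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it to the server.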

Examples

In the following section there are some different examples of using the API from Python, Java and Perl. A simpler example Perl client can be found above. Code is also beginning to become available in some of the common Bio-software packages, such as BioJava.

Searching using phmmer

The following piece of Python is a little more complex than the examples discussed previously. In this case, we submit a search to the server, but stop the HTTP handler from automatically following the redirection to the results page. Instead, a custom handler is defined that grabs the redirection URL and modifies it by the addition of parameters such that it fetches just the first 10 matches in JSON format, rather than grabbing the whole response. This can be useful when the results are large and you want to paginate the response, or if you are only interested in the most significant sequence matches.

import urllib, urllib2

# install a custom handler to prevent following of redirects automatically.
class SmartRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        return headers
opener = urllib2.build_opener(SmartRedirectHandler())
urllib2.install_opener(opener)

parameters = {
              'seqdb':'pdb',
              'seq':'>Seq\nKLRVLGYHNGEWCEAQTKNGQGWVPSNYITPVNSLENSIDKHSWYHGPVSRNAAEY'
             }
enc_params = urllib.urlencode(parameters)

# post the search request to the server
request = urllib2.Request('http://hmmer.janelia.org/search/phmmer',enc_params)

#get the url where the results can be fetched from
results_url = urllib2.urlopen(request).getheader('location')

# modify the range, format and presence of alignments in your results here
res_params = {
              'output':'json',
              'range':'1,10'
             }

# add the parameters to your request for the results
enc_res_params = urllib.urlencode(res_params)
modified_res_url = results_url + '?' + enc_res_params

# send a GET request to the server
results_request = urllib2.Request(modified_res_url)
data = urllib2.urlopen(results_request)

# print out the results
print data.read()

Searching using hmmscan

The following is a very basic Java source file that, once compiled and executed, performs an hmmscan search. The response is returned in JSON format. With this two-stage POST and GET, you can POST the request in one format and get a response back in another by setting the Accept type. To get this example to work, you should save the code in a file called RESTClient.java. Then run the command 'javac RESTClient.java'. Assuming that this is successful and a file called RESTClient.class is produced, you can execute the class by running the command 'java RESTClient'.

import java.net.*;
import java.io.*;

public class RESTClient{
  public static void main(String[] args) {
    try {
        URL url = new URL("http://hmmer.janelia.org/search/hmmscan");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setDoOutput(true);
        connection.setDoInput(true);
        connection.setInstanceFollowRedirects(false);
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connection.setRequestProperty("Accept", "application/json");

        //Add the database and the sequence. Add more options as you wish!
        //Both values are URL-encoded: the sequence contains '>' and a newline.
        String seq = ">seq\nEMGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPV" +
        "NSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEG" +
        "RVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP";
        String urlParameters = "hmmdb=" + URLEncoder.encode("pfam", "UTF-8") +
        "&seq=" + URLEncoder.encode(seq, "UTF-8");

         connection.setRequestProperty("Content-Length", "" +
               Integer.toString(urlParameters.getBytes().length));


        //Send request
        DataOutputStream wr = new DataOutputStream (
                  connection.getOutputStream ());
        wr.writeBytes (urlParameters);
        wr.flush ();
        wr.close ();



        //Now get the redirect URL
        URL respUrl = new URL( connection.getHeaderField( "Location" ));
        HttpURLConnection connection2 = (HttpURLConnection) respUrl.openConnection();
        connection2.setRequestMethod("GET");
        connection2.setRequestProperty("Accept", "application/json");


        //Get the response and print it to the screen
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                connection2.getInputStream()));

        String inputLine;

        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();


    } catch(Exception e) {
        throw new RuntimeException(e);
    }
  }
}

Searching using jackhmmer

A jackhmmer search is a multipart search. The following Perl code performs a series of requests to the server. The first POST request generates the job; the while loop then performs GET requests to check the job status, until the status of the job is done. The last request GETs the results of the last iteration, which are returned in JSON format.

#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON;

#Get a new Web user agent.
my $ua = LWP::UserAgent->new;
$ua->timeout(60);
$ua->env_proxy;
#Set up a new JSON encoder/decoder
my $json = JSON->new->allow_nonref;

#-------------------------------------------------------------------------------
#Set up the job

#URL to query
my $rootUrl = "http://hmmer.janelia.org";
my $url = $rootUrl."/search/jackhmmer";

my $seq = ">2abl_A mol:protein length:163  ABL TYROSINE KINASE
MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHS
WYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAE
LVHHHSTVADGLITTLHYPAP";

my %content = (
  'algo'     => 'jackhmmer',
  'seq'      => $seq,
  'seqdb'    => 'pdb',
  iterations => 5,
);

#-------------------------------------------------------------------------------
#Now POST the request and generate the search job.
my $response = $ua->post(
  $url,
  'content-type' => 'application/json',
  Content        => $json->encode( \%content )
);

if($response->status_line ne "201 Created"){
  die "Failed to create job, got:".$response->status_line;
}

my $job = $json->decode( $response->content );
print "Generated job UUID:".$job->{job_id}."\n";

#Follow the redirection to the resource created for the job.
my $job_location = $response->header("location");
#Now poll the server until the job has finished
$response = $ua->get( $job_location, 'Accept' => 'application/json' );

my $max_retry = 50;
my $count     = 1;

while ( $response->status_line eq '200 OK' ) {
  my $status = $json->decode( $response->content );

  print "Checking status ($count)......";
  if ( $status->{status} eq 'DONE' ) {
    print "Job done.\n";
    last;
  }
  elsif ( $status->{status} eq 'ERROR' ) {
    print "Job failed, exiting!\n";
    exit(1);
  }
  elsif ( $status->{status} eq 'RUN' or $status->{status} eq 'PEND' ) {
    my ($lastIteration) = $status->{result}->[-1]->{uuid} =~ /\.(\d+)/;
    print "Currently on iteration $lastIteration [$status->{status}].\n";
  }

  if ( $count > $max_retry ) {
    print "Jobs should have finished.....exiting\n";
    exit(1);
  }
  #Job still running, so give it a chance to complete.
  sleep(5);
  #Check again on the job status...
  $response = $ua->get( $job_location, 'Accept' => 'application/json' );
  $count++;
}

#Job should have finished, but we may have converged, so get the last job.
my $results = $json->decode( $response->content );
my $lastIteration = pop( @{ $results->{result} } );
#Now fetch the results of the last iteration
my $searchResult = $ua->get( $rootUrl."/results/score/".$lastIteration->{uuid}, 'Accept' => 'application/json' );
unless( $searchResult->status_line eq "200 OK"){
  die "Failed to get search results\n";  
}

#Decode the content of the full set of results
$results = $json->decode( $searchResult->content );
print "Matched ".$results->{'results'}->{'stats'}->{'nincluded'}." sequences ($lastIteration->{uuid})!\n";
#Now do something more interesting with the results......
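The status handling in the Perl script can be expressed compactly in Python. The sketch below only interprets a status document and does not contact the server. The sample payload is hypothetical: its shape (a status field plus a result array whose uuid values end in ".N" for iteration N) is inferred from the Perl code above, not taken from real server output.

```python
import json
import re

# Hypothetical status document, shaped after the fields the Perl script reads.
SAMPLE_STATUS = """
{
  "status": "RUN",
  "result": [
    {"uuid": "4162F712-1DD2-11B2-B17E-C09EFE1DC403.1"},
    {"uuid": "4162F712-1DD2-11B2-B17E-C09EFE1DC403.2"}
  ]
}
"""


def describe_status(status):
    """Return (state, last_iteration) for a decoded status document.

    last_iteration is None when no iteration has been reported yet.
    """
    state = status["status"]
    results = status.get("result") or []
    iteration = None
    if results:
        # The iteration number is the numeric suffix after the dot.
        match = re.search(r"\.(\d+)", results[-1]["uuid"])
        if match:
            iteration = int(match.group(1))
    return state, iteration


state, iteration = describe_status(json.loads(SAMPLE_STATUS))
print(f"Job is {state}, last iteration seen: {iteration}")
```

A polling loop would call this after each GET and stop when the state is DONE or ERROR, as the Perl script does.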

Batch Searches

So far, the submission of batch searches via REST has not really been mentioned. This is because we do not anticipate this being so useful as you can programmatically send sequence after sequence. However, a batch upload of sequences is possible for phmmer and hmmscan. The main difference is that instead of using the seq parameter, we use the file parameter. There is also a subtle difference in the way that the curl command is formulated. Rather than using a redirect (<), a @ symbol is used to force the content part of the request to be what is contained within the file, rather than being attached to the parameter.


curl -L -H 'Expect:' -H 'Accept:text/xml' -F seqdb=pdb -F file='@batch.fasta' http://hmmer.janelia.org/search/phmmer

It is also possible to include an email address for notification of when the batch search has been processed. Again, not particularly useful for an API, but it may be useful for keeping track of a pipeline. To specify an email via the command line, simply use the parameter email and set this to a valid email address. All of the other phmmer or hmmscan search parameters apply to the batch search.
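Outside curl, a batch upload means sending the file as one part of a multipart/form-data request. The Python 3 sketch below builds such a request by hand using only the standard library. It is an illustration under assumptions: the helper names are our own, the upload filename and the application/octet-stream part type are arbitrary choices, and only the seqdb, file and email parameter names come from the text above.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Return (content_type, body) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n'
                "\r\n"
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The file part carries the raw file content, like curl's @ syntax.
    chunks.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n"
            "\r\n"
        ).encode("utf-8")
    )
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


def build_batch_request(fasta_bytes, seqdb="pdb", email=None):
    """Build (but do not send) a batch phmmer request."""
    fields = {"seqdb": seqdb}
    if email:
        fields["email"] = email
    content_type, body = encode_multipart(fields, "file", "batch.fasta", fasta_bytes)
    return urllib.request.Request(
        "http://hmmer.janelia.org/search/phmmer",
        data=body,
        headers={"Content-Type": content_type, "Accept": "text/xml"},
    )
```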

Fetching results

Using curl to fetch results is very easy:
curl -L -H 'Expect:' -H 'Accept:text/xml' 'http://hmmer.janelia.org/results/phmmer/CF5BCDA4-0C7E-11E0-AF4F-B1E277D6C7BA?output=text&ali=1&range=1,2'

In this case we want to fetch the first two hits, with their alignments as a textual output format.

phmmer results for job CF5BCDA4-0C7E-11E0-AF4F-B1E277D6C7BA:

Target Num-hits Bias Bit-Score E-value Tax-Id Species Description
================================================================================================================
2abl_A 1	0.1	370.5	1.1e-110	9606	Homo sapiens	mol:protein length:163  ABL TYROSINE KINASE
----------------------------------------------------------------------------------------------------------------
Target-env-start	Target-env-end	Target-ali-start	Target-ali-end	Query-start	Query-end	E-value
----------------------------------------------------------------------------------------------------------------
1	163	1	163	1	163	4.21e-121
QUERY  lgpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
MATCH  +gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
PP     8*****************************************************************************************************************************************************************9
TARGET MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP
----------------------------------------------------------------------------------------------------------------
================================================================================================================

2fo0_A 1	0.1	364.3	8.4e-109	9606	Homo sapiens	mol:protein length:495  Proto-oncogene tyrosine-protein kinase ABL1 (
----------------------------------------------------------------------------------------------------------------
Target-env-start	Target-env-end	Target-ali-start	Target-ali-end	Query-start	Query-end	E-value
----------------------------------------------------------------------------------------------------------------
33	195	34	195	2	163	4.15e-119
QUERY  gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
MATCH  gpsendpnlfvalydfvasgdntlsitkgeklrvlgynhngewceaqtkngqgwvpsnyitpvnslekhswyhgpvsrnaaeyllssgingsflvresesspgqrsislryegrvyhyrintasdgklyvssesrfntlaelvhhhstvadglittlhypap
PP     9****************************************************************************************************************************************************************9
TARGET GPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAP
----------------------------------------------------------------------------------------------------------------
================================================================================================================

Search Details
==============
Date Started: 2010-12-20 16:19:20
Cmd: phmmer -E 10 --domE 10 --incE 0.01 --incdomE 0.03 --mx BLOSUM62 --pextend 0.4 --popen 0.02 --seqdb 1
Database: pdb,  downloaded on 2010-12-11
Query:  >2abl_A mol:protein length:163  ABL TYROSINE
 KINASE
 MGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQ
 TKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLV
 RESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELV
 HHHSTVADGLITTLHYPAP

Stats
=====
page:1
nhits:322
......
47
nreported:322
domZ:322


Result Format

The results are currently returned as a large mass of data that can look a little complex at first. However, the data structure is fairly simple and is represented pictorially below:

[Figure: API data structure]

The following sections describe the contents of each part of the results data structure. Parts of the data structure are referred to as hashes (key, value pairs) or arrays, but these translate into different entities depending on the type of response requested, for example elements and attributes in an XML response.

'Results' Hash

Only parts of the response actually deemed useful will be described.

Key        Value
stats      The stats hash
hits       Array of sequence hashes
uuid       The unique job identifier
algo       The HMMER search algorithm
searchDB   The target search database
_internal  Hash containing some internal accounting

'Stats' Hash

The stats hash contains some brief summary statistics about the job.

Key        Value
nhits      The number of hits found above reporting thresholds
Z          The number of sequences or models in the target database
domZ       The number of hits in the target database
nmodels    The number of models in this search
nincluded  The number of sequences or models scoring above the significance threshold
nreported  The number of sequences or models scoring above the reporting threshold
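
As a rough illustration of how the results hash and its stats hash might be read from a JSON response, the sketch below parses a small hand-written document. The nhits, nreported and domZ values echo the Stats block of the text example above; the uuid and the nincluded value are placeholders made up for the example. It also assumes that the results hash is the top-level JSON object, which may not match the real response.

```python
import json

# Hand-written stand-in for a JSON response body; not real server output.
SAMPLE = """
{
  "uuid": "00000000-0000-0000-0000-000000000000",
  "algo": "phmmer",
  "searchDB": "pdb",
  "stats": {"nhits": 322, "nreported": 322, "nincluded": 300, "domZ": 322},
  "hits": []
}
"""


def summarise(results):
    """Build a one-line summary from the results hash and its stats hash."""
    stats = results.get("stats", {})
    return "{algo} vs {db}: {nhits} hits, {nrep} reported, {ninc} included".format(
        algo=results.get("algo", "?"),
        db=results.get("searchDB", "?"),
        nhits=stats.get("nhits", 0),
        nrep=stats.get("nreported", 0),
        ninc=stats.get("nincluded", 0),
    )


results = json.loads(SAMPLE)
summary = summarise(results)
print(summary)
```

Using .get() with defaults keeps the sketch tolerant of keys that are missing from a particular response.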

'Sequence' Hash

The hits array contains one or more sequences. Only parts of the response actually deemed useful will be described. With the non-redundant databases, the redundant sequence information will also be included, but as the sequences are identical, the information about the hit is identical.

Key         Value
name        Name of the target (sequence for phmmer/hmmsearch, HMM for hmmscan)
acc         Accession of the target
acc2        Secondary accession of the target
id          Identifier of the target
desc        Description of the target
score       Bit score of the sequence (all domains, without correction)
pvalue      P-value of the score
evalue      E-value of the score
nregions    Number of regions evaluated
nenvelopes  Number of envelopes handed over for domain definition, null2, alignment, and scoring
ndom        Total number of domains identified in this sequence
nreported   Number of domains satisfying reporting thresholding
nincluded   Number of domains satisfying inclusion thresholding
taxid       The NCBI taxonomy identifier of the target (if applicable)
species     The species name of the target (if applicable)
kg          The kingdom of life that the target belongs to, based on its placing in the NCBI taxonomy tree (if applicable)
seqs        An array containing information about the 100% redundant sequences
pdbs        Array of pdb identifiers (with chain information)
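
The sequence hashes can be filtered and ranked with a few lines of code. The following is a minimal sketch, assuming the hits array has already been decoded into a list of dictionaries; the helper name significant_hits is hypothetical, two of the three sample hits are invented for illustration, and the values are passed through float() because they may arrive as strings in some formats. The default cut-off of 0.01 mirrors the --incE value in the example command line above.

```python
def significant_hits(hits, max_evalue=0.01):
    """Return hits with an E-value at or below max_evalue, best hit first."""
    kept = [hit for hit in hits if float(hit["evalue"]) <= max_evalue]
    return sorted(kept, key=lambda hit: float(hit["evalue"]))


# Sample data: 2fo0_A echoes the text example; the other two are invented.
hits = [
    {"name": "hypothetical_weak", "score": "20.1", "evalue": "0.5"},
    {"name": "2fo0_A", "score": "364.3", "evalue": "8.4e-109",
     "taxid": "9606", "species": "Homo sapiens"},
    {"name": "hypothetical_mid", "score": "35.0", "evalue": "1e-5"},
]

names = [hit["name"] for hit in significant_hits(hits)]
print(names)
```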

'Domain' Hash

The domain or hit hash contains the details of the match, in particular the alignment between the query and the target.

Key          Value
ienv         Envelope start position
jenv         Envelope end position
iali         Alignment start position
jali         Alignment end position
bias         null2 score contribution
oasc         Optimal alignment accuracy score
bitscore     Overall score in bits, null corrected, if this were the only domain in seq
cevalue      Conditional E-value based on the domain correction
ievalue      Independent E-value based on the domain correction
is_reported  1 if domain meets reporting thresholds
is_included  1 if domain meets inclusion thresholds
alimodel     Aligned query consensus sequence for phmmer and hmmsearch, target hmm for hmmscan
alimline     Match line indicating identities, conservation +'s, gaps
aliaseq      Aligned target sequence for phmmer and hmmsearch, query for hmmscan
alippline    Posterior probability annotation
alihmmname   Name of HMM (query sequence for phmmer, alignment for hmmsearch and target hmm for hmmscan)
alihmmacc    Accession of HMM
alihmmdesc   Description of HMM
alihmmfrom   Start position on HMM
alihmmto     End position on HMM
aliM         Length of model
alisqname    Name of target sequence (phmmer, hmmsearch) or query sequence (hmmscan)
alisqacc     Accession of sequence
alisqdesc    Description of sequence
alisqfrom    Start position on sequence
alisqto      End position on sequence
aliL         Length of sequence
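
The four alignment strings in the domain hash line up column by column, so they can be stacked to reproduce the QUERY/MATCH/PP/TARGET layout of the text output. The sketch below assumes a phmmer result (for hmmscan the roles of alimodel and aliaseq are swapped, as noted in the table) and uses the first ten columns of the second alignment in the text example above as sample data. The function name format_alignment and the width parameter are hypothetical.

```python
def format_alignment(domain, width=60):
    """Stack the four alignment strings of a domain hash in blocks of `width` columns."""
    rows = [
        ("QUERY", domain["alimodel"]),
        ("MATCH", domain["alimline"]),
        ("PP", domain["alippline"]),
        ("TARGET", domain["aliaseq"]),
    ]
    lines = []
    for start in range(0, len(domain["alimodel"]), width):
        for label, text in rows:
            # Labels are padded to seven characters, as in the text output.
            lines.append("{:<7}{}".format(label, text[start:start + width]))
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip("\n")


# Truncated sample: only the first ten alignment columns are kept.
domain = {
    "alimodel": "gpsendpnlf",
    "alimline": "gpsendpnlf",
    "alippline": "9*********",
    "aliaseq": "GPSENDPNLF",
}

alignment_text = format_alignment(domain)
print(alignment_text)
```

Passing a smaller width wraps long alignments into several blocks, which is easier to read in a terminal than a single 160-column line.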

Sometimes for hmmscan results you will see some of these additional fields, which are added during the post-processing by database specific methods.

Key          Value
significant  Whether the match is deemed significant by the source database
segments     Sometimes a domain match can be broken into regions (or segments) during post-processing
clan         Pfam specific, contains the clan accession that the model belongs to, if appropriate
outcompeted  Whether this match has been outscored (or outcompeted) during post-processing
family       Superfamily specific, contains the superfamily family assignment information


Other useful things to know

Response codes

One of the philosophies of a RESTful API is to also pass the appropriate HTTP status code in response to the query URL. Most of the time a 200 (success) status code will be received. However, there may be times when that is not the case. There is a complete list of HTTP codes here, but we have listed most of the status codes that may be returned and how they relate to what is actually going on at the server.

HTTP status code  Status description     Notes
200               OK                     The job has either been run or queued up successfully. In the former case, the body should contain the results, whereas the latter will contain your job identifier that can be used to query/fetch the results in the future.
201               Created                The job has been created successfully. Response will contain either the content describing the job and/or a redirection to the created resource in the HTTP header.
202               Accepted               The job has been accepted by the search system and is either pending (waiting to be started) or running. After a short delay, your script should check for results again.
302               Found/Redirection      The request was found, but the client must take additional action to complete the request. Usually there is a redirection URL found in the response header.
400               Bad Request            Your job contained either invalid parameters or parameter values. The body of your response should contain information about which parameter or value failed and possibly the reason why it failed. If you continue to receive this in response to a request and cannot understand why it is failing, you should contact the help desk for assistance.
410               Gone                   Your job was deleted from the search system. This may be because the time that we have been able to store the results has expired or that you have explicitly asked for the results to be deleted.
500               Internal server error  There was a problem with running your job, typically due to a problem with the back-end compute servers, rather than the job itself. The body of the response may contain an error message from the server. Contact the help desk for assistance with the problem.
502               Bad gateway            There was a problem scheduling or running the job. The job has failed and will not produce results. There is no need to check the status again.
503               Service unavailable    The body of the response may contain a message as to why the job has been put on hold. This may be due to site maintenance, database updates, queue overload or if there is a problem. This status is set typically by an administrator and should this status code be present for longer than a few hours, you should contact the help desk.
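
A script can turn the table above into a simple decision rule: keep checking while the job is pending (202) and stop on anything else. The following is a minimal sketch of that idea. The action labels, the function name wait_for_results and its parameters are all hypothetical, and the status source is passed in as a callable so that the example runs without contacting any server.

```python
import time

# Action labels are hypothetical names chosen for this sketch.
ACTIONS = {
    200: "read_body",
    201: "follow_location",
    202: "poll_again",
    302: "follow_location",
    400: "fix_request",
    410: "gone",
    500: "contact_help_desk",
    502: "failed",
    503: "retry_later",
}


def wait_for_results(fetch_status, delay=5.0, max_checks=20, sleep=time.sleep):
    """Call fetch_status() until it returns something other than 202.

    Returns a (status, action) pair; the action is "timed_out" if the job
    is still pending after max_checks attempts.
    """
    status = None
    for _ in range(max_checks):
        status = fetch_status()
        action = ACTIONS.get(status, "unknown")
        if action != "poll_again":
            return status, action
        sleep(delay)
    return status, "timed_out"


# Offline demonstration: two "pending" replies followed by success.
replies = iter([202, 202, 200])
outcome = wait_for_results(lambda: next(replies), sleep=lambda seconds: None)
print(outcome)
```

In a real script, fetch_status would wrap the HTTP request for the job and return the response status code.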

Data formats

The RESTful interface supports three commonly used machine-readable formats: XML, JSON and YAML. In addition to these, we also provide HTML and text. Which format you use is largely a matter of personal choice. XML is widely used, with libraries in many different languages. JSON is well suited to websites, where a server may call a HMMER web service and pass the resulting JSON string back to the client/browser, where the HMMER result can be post-processed by JavaScript. YAML is a more recent markup language which, despite being readily parsed by software, is more human-readable than XML or JSON. The HTML responses are not really meant for anything other than a browser or command line tools such as curl or wget. The text output is the best choice if you want to cut and paste results into a lab book.
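
One common way for a client to ask a RESTful service for a particular format is the HTTP Accept header. The sketch below builds, but does not send, a request carrying such a header. Whether the HMMER server selects the format this way, and which media type strings it recognises (the YAML one in particular), are assumptions here rather than documented behaviour; the function name build_request is hypothetical.

```python
import urllib.request

# Conventional media types; the exact strings the server expects are assumed.
ACCEPT = {
    "json": "application/json",
    "xml": "text/xml",
    "yaml": "application/yaml",
    "html": "text/html",
    "text": "text/plain",
}


def build_request(url, fmt="json"):
    """Return an unsent urllib Request asking for `fmt` via the Accept header."""
    try:
        media_type = ACCEPT[fmt.lower()]
    except KeyError:
        raise ValueError("unsupported format: {!r}".format(fmt))
    return urllib.request.Request(url, headers={"Accept": media_type})


request = build_request("http://hmmer.janelia.org/search/phmmer", "xml")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```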


Things we do not support

We have tried to provide as many services as possible via REST. However, there are still a few things that we do not provide. For example, there is no way of generating a domain graphic or getting a graph of the distribution of hits. We cannot provide these via REST because both are generated client side using JavaScript libraries and the HTML5 canvas element. The RESTful services are also, naturally, restricted to the set of HMMER programs that are available via the website. If there is something that you think would be useful, please get in touch and we will consider it for inclusion.
