Querying for a summary and GenBank file with a search term
Programming Language: Perl
Modules required: Bio::DB::EUtilities
E-utilities are a very useful resource for retrieving data from NCBI databases.
This script retrieves a brief summary and the GenBank file for genes that are relevant to a search term.
There is also an option to limit the number of results returned (very handy).
Bio::DB::EUtilities is required to execute the search.
This method is much faster than using the Bio::DB::Query::GenBank module.
The summary does not include information such as references or sequences; if you need more than the summary provides, you can always get it from the GenBank file.
Chapter 10 of ‘Beginning Perl for Bioinformatics’ is a good, simple reference for extracting data from GenBank-format files. This resource is freely available here.
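To give a flavour of pulling references and sequences out of a downloaded .gb file with plain Perl, here is a minimal sketch. It assumes one record per file, splits the record at the ORIGIN line into annotation and sequence, and collects the TITLE field of each REFERENCE. The subroutine name parse_genbank and the sample record are hypothetical; for real work, BioPerl's Bio::SeqIO with -format => 'genbank' is more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one GenBank record into annotation and
# sequence, and collect the TITLE of each REFERENCE block.
# Assumes a single record per string, terminated by '//'.
sub parse_genbank {
    my ($record) = @_;

    # Everything before the ORIGIN line is annotation; after it is sequence.
    my ($annotation, $seq_block) = split /^ORIGIN.*\n/m, $record, 2;
    $seq_block = '' unless defined $seq_block;

    # Drop the terminating '//', then the position numbers and whitespace.
    $seq_block =~ s{^//.*}{}ms;
    $seq_block =~ s/[\s\d]//g;

    # A TITLE may wrap onto indented continuation lines; join them with spaces.
    my @titles;
    while ($annotation =~ /^ +TITLE +(.*(?:\n {5,}\S.*)*)/mg) {
        (my $title = $1) =~ s/\s*\n\s*/ /g;
        push @titles, $title;
    }

    return { sequence => lc $seq_block, titles => \@titles };
}

# Usage with a made-up record (in practice, read the contents of "$id.gb").
my $sample = <<'GB';
LOCUS       TEST0001                  20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical test record.
REFERENCE   1  (bases 1 to 20)
  AUTHORS   Doe,J.
  TITLE     A made-up title that wraps
            onto a second line
  JOURNAL   Unpublished
ORIGIN
        1 acgtacgtac gtacgtacgt
//
GB

my $parsed = parse_genbank($sample);
print "Sequence: $parsed->{sequence}\n";
print "Title: $_\n" for @{ $parsed->{titles} };
```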
Note that you can query for proteins as well: switch the database from 'nucleotide' to 'protein' (the script includes a commented-out 'protein' line for this).
Installing new modules: this can be easily done using cpan.
Here is the documentation for cpan.
Or, here is documentation for installing modules manually.
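Before running either script, it can help to confirm that the required modules are actually installed. This is a small sketch using only core Perl (require inside eval); the subroutine name check_modules is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether each named module can be loaded.
# Returns a hash ref mapping module name => 1 (loadable) or 0 (missing).
sub check_modules {
    my %status;
    for my $module (@_) {
        # require needs a file path, so turn Foo::Bar into Foo/Bar.pm
        (my $file = "$module.pm") =~ s{::}{/}g;
        $status{$module} = (eval { require $file; 1 }) ? 1 : 0;
    }
    return \%status;
}

my $status = check_modules('Bio::DB::EUtilities', 'LWP::Simple');
for my $module (sort keys %$status) {
    printf "%-20s %s\n", $module,
        $status->{$module} ? 'installed' : 'missing (install it with cpan)';
}
```

Any module reported as missing can then be installed with cpan as described above.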
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# search term
my $search_term = 'breast cancer';
# maximum number of results to retrieve
my $retmax = 10;
# database to search; to query proteins, swap which of these two lines is commented out
my $db = 'nucleotide';
# my $db = 'protein';
# full GenBank record for nucleotides, GenPept record for proteins
my $rettype = $db eq 'protein' ? 'gp' : 'gbwithparts';
# NCBI asks for a contact address; this is a placeholder, replace it with your own
my $email = 'your.name@example.com';

# first search for a list of GenBank IDs that match the search term
my $factory = Bio::DB::EUtilities->new(-eutil  => 'esearch',
                                       -db     => $db,
                                       -term   => $search_term,
                                       -retmax => $retmax,
                                       -email  => $email);

# list of GenBank IDs
my @ids = $factory->get_ids;
die "No results found for '$search_term'\n" unless @ids;

# loop through the list of IDs
foreach my $id (@ids) {
    # get a summary and print its details
    $factory->reset_parameters(-eutil => 'esummary',
                               -db    => $db,
                               -id    => $id,
                               -email => $email);
    my $ds = $factory->next_DocSum;
    next unless $ds;

    # print the summary in flattened mode, skipping items without content
    while (my $item = $ds->next_Item('flattened')) {
        my $content = $item->get_content;
        printf("%-20s:%s\n", $item->get_name, $content)
            if defined $content && length $content;
    }

    # download the full GenBank file, reusing the same factory
    $factory->reset_parameters(-eutil   => 'efetch',
                               -db      => $db,
                               -id      => $id,
                               -rettype => $rettype,
                               -email   => $email);
    $factory->get_Response(-file => "$id.gb");
}
```
Querying for articles from the PMC database with a search term
Programming Language: Perl
Modules required: LWP::Simple
This script retrieves the full text of articles (or the abstract, if the full text is not available) that are relevant to a search term from the PubMed Central (PMC) database.
The results are sorted in reverse chronological order.
There is also an option to limit the number of results returned (very handy).
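Both the search term and the result limit end up in the esearch query string, and the term must be URL-encoded (the space in 'breast cancer' is not valid in a URL). The following minimal core-Perl sketch builds that URL. The helper names url_encode and esearch_url are hypothetical, the encoder assumes plain ASCII input, and URI::Escape's uri_escape does the same job in practice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL of the NCBI E-utilities service.
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode every byte outside the RFC 3986 unreserved set
# (assumes ASCII input; wide characters would need encoding to UTF-8 first).
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Hypothetical helper: build an esearch URL for a database, term and result limit.
sub esearch_url {
    my ($db, $term, $retmax) = @_;
    return "$base/esearch.fcgi?db=" . url_encode($db)
         . '&retmax=' . int($retmax)
         . '&term=' . url_encode($term);
}

print esearch_url('pmc', 'breast cancer', 10), "\n";
```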
The output is in XML format.
An example of a Perl module for transforming XML files: XML::Simple
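For a quick look at what was downloaded, you could pull the article title out of each saved file. This is a rough core-Perl sketch using regular expressions. It assumes PMC's JATS-style markup, where the article's own title is an article-title element inside title-group, so titles cited in the reference list are skipped. The subroutine name pmc_titles is hypothetical, and for anything beyond a quick look a real XML parser (such as XML::Simple) is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the article titles found in a PMC XML string.
# Only titles inside <title-group> are taken; regex-based, so a shortcut, not a parser.
sub pmc_titles {
    my ($xml) = @_;
    my @titles;
    while ($xml =~ m{<title-group>.*?<article-title[^>]*>(.*?)</article-title>}gs) {
        my $title = $1;
        $title =~ s/<[^>]+>//g;      # drop inline markup such as <italic>
        $title =~ s/\s+/ /g;         # collapse line breaks and runs of spaces
        $title =~ s/^\s+|\s+$//g;    # trim
        push @titles, $title;
    }
    return @titles;
}

# Print the title(s) of each file named on the command line, e.g. pmc12345.xml
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;
    print "$file: $_\n" for pmc_titles($xml);
}
```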
LWP::Simple is required to execute the search.
Installing new modules: this can be easily done using cpan.
Here is the documentation for cpan.
Or, here is documentation for installing modules manually.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use URI::Escape qw(uri_escape);

# search term to find
my $search_term = 'breast cancer';
# maximum number of results to retrieve
my $retmax = 10;

# NCBI E-utilities base URL (fetching over HTTPS needs LWP::Protocol::https)
my $utils   = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';
my $db_name = 'pmc';

# submit the search (with the term URL-encoded) and retrieve the XML result
my $esearch_result = get($utils . '/esearch.fcgi?db=' . $db_name
                         . '&retmax=' . $retmax
                         . '&term=' . uri_escape($search_term));
die "esearch request failed\n" unless defined $esearch_result;

# paper IDs
my @ids = ($esearch_result =~ m|<Id>(\d+)</Id>|g);

# loop through all the IDs and fetch each paper
# (full text if available, otherwise the abstract)
foreach my $id (@ids) {
    my $efetch = $utils . '/efetch.fcgi?db=' . $db_name . '&id=' . $id;
    my $xml = get($efetch);
    unless (defined $xml) {
        warn "efetch failed for $id\n";
        next;
    }

    # write to an XML file named after the database and the paper ID
    my $outfile = "$db_name$id.xml";
    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!";
    print $out $xml;
    close $out;

    # stay under NCBI's limit of 3 requests per second without an API key
    sleep 1;
}
```