Book Review: Building Bioinformatics Solutions: with Perl, R and MySQL


Authors: Conrad Bessant, Ian Shadforth, Darren Oakley

I found this book really helpful during my MSc assignments, group project and thesis project. It gives a clear, informative introduction to each tool, then works towards developing advanced bioinformatics applications.

This book presents a well-written and complete overview of bioinformatics software development using Perl, R and MySQL.

The book is divided into 5 chapters. The first chapter begins with an interesting introduction on various bioinformatics resources and applications. This is followed by useful and descriptive chapters on MySQL, Perl, R and web programming.

A highlight of the book is its well-structured layout. Each chapter opens with an easy introduction, followed by concise explanations that build towards advanced bioinformatics applications, well-commented code examples, and accompanying online resources on its website.

As the title “Building Bioinformatics Solutions: with Perl, R and MySQL” suggests, the book covers its ground very well. It assumes no prior knowledge of software development, so it is also suitable for those with minimal programming experience.

In conclusion, this book is a very valuable resource for software development in bioinformatics, both for those who are already working in this domain and for those who want to. It is also appropriate as a general introduction to the area of bioinformatics for interested and literate non-specialists.



Bibliographic information from GoogleBooks
Title: Building bioinformatics solutions with Perl, R and MySQL
    Oxford biology
Authors: Conrad Bessant, Ian Shadforth, Darren Oakley
Publisher: Oxford University Press, 2009
ISBN: 0199230234, 9780199230235
Length: 241 pages
Subjects:
    Bioinformatics
    Business & Economics / Statistics
    Computers / Bioinformatics
    Computers / General
    Computers / Programming Languages / General
    Mathematics / Probability & Statistics / General
    Open source software
    Science / Applied Sciences

Google search patterns on ‘Sequencing’


Wondering why there has been no recent activity here? I have been busy with my MSc thesis <wipes sweat> But on the bright side, I have gathered a lot of articles on next-generation sequencing from my literature review. Blogs on NGS coming soon <watch this space>
Google Insights results for ‘Sequencing’ are looking good too <no surprises there ;)>



Querying for a summary and GenBank file with a search term


Programming Language: Perl
Modules required: Bio::DB::EUtilities

EUtilities is a very useful resource for retrieving data from the NCBI databases.

This script retrieves a brief summary and the GenBank file for genes that are relevant to a search term.

There is also an option to limit the number of results returned (very handy).

Bio::DB::EUtilities is required to execute the search.

This method is much faster than using the Bio::DB::Query::GenBank module.

If you require more information than the summary output, you can always get it from the GenBank file.
The summary does not provide information such as references and sequences.

Chapter 10 of ‘Beginning Perl for Bioinformatics’ is a good and simple reference for retrieving data from files in GenBank format. This resource is freely available here.
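
As a rough sketch of pulling extra details out of a GenBank file, the snippet below uses plain regular expressions on a record held in a string. The record is a made-up toy entry (TOY00001 is not a real accession), parse_genbank_record is a hypothetical helper name, and only single-line header fields are handled, so treat it as an illustration and not a complete parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy record, invented for illustration only (not real GenBank data).
my $record = <<'END_RECORD';
LOCUS       TOY00001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Toy sequence used only to illustrate parsing.
ACCESSION   TOY00001
ORIGIN
        1 atggccattg taatgggccg ctga
//
END_RECORD

# Split a GenBank record into a few annotation fields and the sequence.
sub parse_genbank_record {
    my ($text) = @_;
    my %entry;

    # Header fields start at the beginning of a line.
    # Only the first line of each field is captured here.
    ($entry{locus})      = $text =~ /^LOCUS\s+(\S+)/m;
    ($entry{definition}) = $text =~ /^DEFINITION\s+(.+)$/m;
    ($entry{accession})  = $text =~ /^ACCESSION\s+(\S+)/m;

    # The sequence lies between the ORIGIN line and the terminating "//".
    my ($seq) = $text =~ m{^ORIGIN.*?\n(.*?)^//}ms;
    $seq = '' unless defined $seq;
    $seq =~ s/[\s\d]//g;    # drop position numbers and whitespace
    $entry{sequence} = uc $seq;

    return \%entry;
}

my $entry = parse_genbank_record($record);
printf "%-12s: %s\n", $_, $entry->{$_} for qw(locus definition accession sequence);
```

To run this against a downloaded file, read the file contents into $record instead of using the inline text. For anything beyond a quick look, BioPerl's Bio::SeqIO with the 'genbank' format is likely the sturdier route.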

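Note that you can query for proteins as well: uncomment the -db => 'protein' line in the esearch call, comment out the -db => 'nucleotide' line below it, and set -db to 'protein' in the esummary and efetch calls too.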

Installing new modules: this can be easily done using cpan.
Here is the documentation for cpan.
Or, here is documentation for installing modules manually.

#!/usr/bin/perl -w
 
use Bio::DB::EUtilities; 
use strict;
 
#specify search term 
my $search_term = 'breast cancer';			
 
 
#maximum number of results to retrieve
my $retmax = 10;
 
#first search for a list of genbank ids that match your search term
my $factory = Bio::DB::EUtilities->new(-eutil => 'esearch',
#                           -db => 'protein',
                         -db => 'nucleotide',
                         -term => $search_term,
                         -retmax => $retmax);
 
#list of Genbank IDs
my @ids = $factory->get_ids;
 
#loop through the list of IDs
foreach my $id (@ids) {

	# get a summary and print details
	$factory->reset_parameters(-eutil => 'esummary',
	                           -db    => 'nucleotide',
	                           -id    => $id);

	# print the summary in flattened mode, skipping items with no content
	while (my $ds = $factory->next_DocSum) {
		while (my $item = $ds->next_Item('flattened')) {
			printf("%-20s:%s\n", $item->get_name, $item->get_content) if $item->get_content;
		}
	}

	# download the full genbank file to <id>.gb
	# (a separate object, so the search factory is not overwritten inside the loop)
	my $fetcher = Bio::DB::EUtilities->new(-eutil   => 'efetch',
	                                       -db      => 'nucleotide',
	                                       -id      => $id,
	                                       -rettype => 'gbwithparts');
	$fetcher->get_Response(-file => "$id.gb");

}

Querying for articles from the pmc database with a search term


Programming Language: Perl
Modules required: LWP::Simple

This script retrieves the full articles (or the abstract if the full article is not available) that are relevant to a search term from the PubMed Central (pmc) database.

The results are sorted in reverse chronological order.

There is also an option to limit the number of results returned (very handy).

The output is in xml format.

An example of a Perl module for transforming XML files: XML::Simple
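
As a small sketch of what XML::Simple can do with this kind of output, the snippet below turns an esearch-style reply into a Perl data structure and lists the IDs. The XML is a made-up sample with placeholder IDs (not real PMC records), and it assumes XML::Simple is installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin by default

# Made-up esearch-style reply; the IDs are placeholders, not real PMC records.
my $xml = <<'END_XML';
<?xml version="1.0" encoding="UTF-8"?>
<eSearchResult>
  <Count>2</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>1111111</Id>
    <Id>2222222</Id>
  </IdList>
</eSearchResult>
END_XML

# ForceArray keeps <Id> as a list even when only one ID comes back;
# KeyAttr => [] switches off XML::Simple's automatic array folding.
my $data = XMLin($xml, ForceArray => ['Id'], KeyAttr => []);

my @ids = @{ $data->{IdList}{Id} };
print "Found $data->{Count} record(s)\n";
print "ID: $_\n" for @ids;
```

The same XMLin call accepts a file name, so it could be pointed at one of the .xml files the script writes, although full-text article XML is much more deeply nested than this sample.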

LWP::Simple is required to execute the search.

Installing new modules: this can be easily done using cpan.
Here is the documentation for cpan.
Or, here is documentation for installing modules manually.

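#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use URI::Escape;   # part of the URI distribution, which LWP itself depends on

#search term to find
my $search_term = 'breast cancer';

#maximum number of results to retrieve
my $retmax = 10;

#NCBI now requires HTTPS for E-utilities, so LWP::Protocol::https must be installed
my $utils = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';
my $db_name = 'pmc';

# Submit the search and retrieve the XML based results
# (the search term is URL-escaped because it may contain spaces)
my $esearch_result = get( $utils . '/esearch.fcgi?db=' . $db_name . '&retmax=' . $retmax . '&term=' . uri_escape($search_term) );
die "esearch request failed\n" unless defined $esearch_result;

# paper IDs
my @ids = ($esearch_result =~ m|<Id>(\d+)</Id>|g);

#loop through all the ids
# get individual papers (if not, then abstracts)
foreach my $id (@ids) {

	#get all details for each paper - full text if available
	my $efetch = $utils . '/efetch.fcgi?db=' . $db_name . '&id=' . $id;
	my $paper = get($efetch);
	if (!defined $paper) {
		warn "efetch request failed for $id\n";
		next;
	}

	#prints out to a xml file (file name generated from database name and current paper ID)
	open(my $out, '>', "$db_name$id.xml") or die "Cannot write $db_name$id.xml: $!";
	print $out $paper;
	close $out;
}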

TIde – systematic identification of optimal drug targets


TIde is being developed to systematically run an automated scan and identify drug targets from a kinetic network SBML model. It is still under development, but if it does all that they say it will, we will have another addition to the short list of breakthroughs that have brought more answers than questions. Oh, don’t get me wrong, I am a big fan of the inquisitive mind, and all these unanswered questions are what keep this field as cutting-edge as it is.

But, it cannot be denied that amidst all this complexity, it is only simplicity that everyone finds interesting.

Anyway, back to TIde: it is an open source, platform-independent tool for investigating ordinary differential equation models. A systematic approach could not only accelerate drug research, but also save valuable time and money (resources that could be used elsewhere).

Good open source software is one of the best things to have happened to the programming world.

Resources:

BioMart


Until recently, I was only vaguely aware of how powerful a tool BioMart is, but after my lectures and tutorials on BioMart, I have to say, I have become a big fan. The documentation for installing the tool and creating the database is pretty good, and the huge community supporting new programmers is impressive. Plus the plugins and APIs…

This object-oriented data mining system lets one build complex queries against the underlying databases, and its web interface helps non-programmers (biologists) query for data with less effort. Plus, once you have defined what you want to download using BioMart, you can simply generate an XML file.

I have to accept that the tool isn’t flawless, but hey, it’s open source, we can clean it up ourselves. Plus, it should not be overlooked that BioMart has only been around for 6 years and there is still so much work being put into it – most of the major issues will be dealt with in the next major release, which is now being rewritten in Java.

I am quite excited to start some real work on BioMart…
