Interface to BioMart databases (e.g. Ensembl, COSMIC, Wormbase and Gramene)
Bioconductor version: Release (3.0)
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (http://www.biomart.org). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. Examples of BioMart databases are Ensembl, COSMIC, Uniprot, HGNC, Gramene, Wormbase and dbSNP mapped to Ensembl. These major databases give biomaRt users direct access to a diverse set of data and enable a wide range of powerful online queries from gene annotation to database mining.
Author: Steffen Durinck <durincks at gene.com>, Wolfgang Huber
Maintainer: Steffen Durinck <durincks at gene.com>
Citation (from within R, enter citation("biomaRt")):
Installation
To install this package, start R and enter:
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
Documentation
To view documentation for the version of this package installed in your system, start R and enter:
browseVignettes("biomaRt")
PDF, R Script: The biomaRt users guide
PDF Reference Manual
Details
biocViews: Annotation, Software
Version: 2.22.0
In Bioconductor since: BioC 1.6 (R-2.1) or earlier
License: Artistic-2.0
Depends: methods
Imports: utils, XML, RCurl, AnnotationDbi
Suggests: annotate
System Requirements:
URL:
Depends On Me
ChIPpeakAnno,
customProDB,
dagLogo,
domainsignatures,
DrugVsDisease,
Fletcher2013b,
genefu,
GenomeGraphs,
MineICA,
PSICQUIC,
Roleswitch,
Sushi,
VegaMC
Imports Me
affycoretools,
ArrayExpressHTS,
cobindR,
customProDB,
DEXSeq,
DOQTL,
easyRNASeq,
GenomicFeatures,
GOexpress,
Gviz,
HTSanalyzeR,
IdMappingRetrieval,
KEGGprofile,
MEDIPS,
metaseqR,
methyAnalysis,
oposSOM,
phenoTest,
R453Plus1Toolbox,
RNAither,
SeqGSEA
Suggests Me
BiocCaseStudies,
ccTutorial,
DEGreport,
GeneAnswers,
Genominator,
h5vc,
isobar,
leeBamViews,
massiR,
MineICA,
MiRaGE,
oneChannelGUI,
paxtoolsr,
Pbase,
piano,
Rcade,
RforProteomics,
RIPSeeker,
RnaSeqTutorial,
rTANDEM,
rTRM,
ShortRead,
SIM,
systemPipeR,
trackViewer
Package Archives
Follow Installation instructions to use this package in your R session.
Package Source: biomaRt_2.22.0.tar.gz
Windows Binary: biomaRt_2.22.0.zip (32- & 64-bit)
Mac OS X 10.6 (Snow Leopard): biomaRt_2.22.0.tgz
Mac OS X 10.9 (Mavericks): biomaRt_2.22.0.tgz
Browse/checkout source: (username/password: readonly)
Package Downloads Report: Download Stats
http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html
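Using biomaRt to retrieve online annotation
25 July 2013 | Programmer
Tags: Tutorial · Bioinformatics
A complete bioinformatics analysis usually includes an annotation step. In Bioconductor, the most convenient way to annotate is to use an annotation package. Annotation resources are not only distributed as packages, however: tools such as biomaRt can also fetch annotation data online, which gives us more up-to-date and richer annotation. So what is biomaRt, and how should we understand and use it?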
To understand biomaRt better, it helps to look first at the online service it is modelled on, BioMart (http://www.biomart.org). BioMart is free software that provides data services for biological research, in particular a packaged solution for bulk data download. It has many successful deployments. For example, the Ensembl database (http://www.ensembl.org/), maintained by the European Bioinformatics Institute (EBI), uses BioMart for its bulk download service, as do COSMIC, Uniprot, HGNC, Gramene, Wormbase and dbSNP.
We start by clicking the BioMart link in the navigation menu on the Ensembl home page, which opens the BioMart query page. A video tutorial is available through the Youku link at the bottom of that page.
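This page uses BioMart's default layout, which has three parts: the main menu, the navigation bar on the left, and the area on the right that displays information and the query forms. Working from top to bottom in the left-hand panel, first choose the data source (dataset), then the filters, and then the fields to return (attributes).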
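After that, click the Results button in the main menu to view the results. Every item selected under Attributes appears as a column in the result table. On this page we can also choose an output format and click the Go button to download the data.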
With that background, we can look at how to use the biomaRt package. Our task is to use biomaRt to map between gene symbols, Entrez IDs and Ensembl IDs. Here is the code:
> source("http://bioconductor.org/biocLite.R") # load the Bioconductor installer, which defines biocLite
> biocLite("biomaRt") # install the biomaRt package with Bioconductor's biocLite
> library("biomaRt") # load the biomaRt package
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> entrez <- c("673","7157","837")
> getBM(attributes=c("entrezgene","hgnc_symbol", "ensembl_gene_id", "affy_hg_u133_plus_2"),
+       filters = "entrezgene",
+       values = entrez,
+       mart = mart)
  entrezgene hgnc_symbol ensembl_gene_id affy_hg_u133_plus_2
1        673        BRAF ENSG00000157764         206044_s_at
2        673        BRAF ENSG00000157764           236402_at
3        673        BRAF ENSG00000157764           243829_at
4       7157        TP53 ENSG00000141510
5       7157        TP53 ENSG00000141510         211300_s_at
6       7157        TP53 ENSG00000141510           201746_at
7        837       CASP4 ENSG00000196954         209310_s_at
8        837       CASP4 ENSG00000196954
9        837       CASP4 ENSG00000196954           213596_at
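The result above has one row per Affymetrix probe, so each gene appears several times, and two rows have an empty probe field. The following is a minimal sketch of collapsing such a result to one row per gene with base R. The data frame `res` is typed in by hand from the output above so that the example runs without a network connection, and the helper name `collapse_probes` is hypothetical (it is not part of biomaRt).

```r
# Result of the getBM() call above, re-typed by hand so this runs offline.
res <- data.frame(
  entrezgene = c(673, 673, 673, 7157, 7157, 7157, 837, 837, 837),
  hgnc_symbol = rep(c("BRAF", "TP53", "CASP4"), each = 3),
  ensembl_gene_id = rep(c("ENSG00000157764", "ENSG00000141510",
                          "ENSG00000196954"), each = 3),
  affy_hg_u133_plus_2 = c("206044_s_at", "236402_at", "243829_at",
                          "", "211300_s_at", "201746_at",
                          "209310_s_at", "", "213596_at"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a biomaRt function): drop rows with an empty
# probe ID, then paste the remaining probes of each gene into one
# semicolon-separated string.
collapse_probes <- function(df) {
  df <- df[df$affy_hg_u133_plus_2 != "", ]
  aggregate(affy_hg_u133_plus_2 ~ entrezgene + hgnc_symbol + ensembl_gene_id,
            data = df,
            FUN = function(x) paste(x, collapse = ";"))
}

collapsed <- collapse_probes(res)
print(collapsed)
```

With a live connection, the same helper could be applied directly to the data frame returned by getBM, as long as the same four attributes were requested.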
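As the session above shows, using biomaRt takes only two steps: (1) specify the mart database, and (2) call getBM to fetch the annotation. That raises two questions. First, how do we know which servers exist and which databases each server offers? Second, how do we find the correct values for the attributes and filters arguments of getBM?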
For the first question, biomaRt provides two functions: listMarts and listDatasets.
> marts <- listMarts(); head(marts) # list the marts that are currently available
  biomart             version
1 ensembl             ENSEMBL GENES 72 (SANGER UK)
2 snp                 ENSEMBL VARIATION 72 (SANGER UK)
3 functional_genomics ENSEMBL REGULATION 72 (SANGER UK)
4 vega                VEGA 52 (SANGER UK)
5 fungi_mart_18       ENSEMBL FUNGI 18 (EBI UK)
6 fungi_variations_18 ENSEMBL FUNGI VARIATION 18 (EBI UK)
> ensembl <- useMart("ensembl") # use the ensembl mart
> datasets <- listDatasets(ensembl); datasets[1:10,] # list the datasets available in ensembl
   dataset                        description                                version
1  oanatinus_gene_ensembl         Ornithorhynchus anatinus genes (OANA5)     OANA5
2  tguttata_gene_ensembl          Taeniopygia guttata genes (taeGut3.2.4)    taeGut3.2.4
3  cporcellus_gene_ensembl        Cavia porcellus genes (cavPor3)            cavPor3
4  gaculeatus_gene_ensembl        Gasterosteus aculeatus genes (BROADS1)     BROADS1
5  lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3)         loxAfr3
6  itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2) spetri2
7  mlucifugus_gene_ensembl        Myotis lucifugus genes (myoLuc2)           myoLuc2
8  hsapiens_gene_ensembl          Homo sapiens genes (GRCh37.p11)            GRCh37.p11
9  choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1)        choHof1
10 csavignyi_gene_ensembl         Ciona savignyi genes (CSAV2.0)             CSAV2.0
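The dataset listing is an ordinary data frame, so finding the dataset for a given species is a matter of searching its description column. The sketch below assumes only the three columns shown above (dataset, description, version). A few rows are re-typed by hand so the example runs offline, and the helper name `find_dataset` is hypothetical.

```r
# A few rows of listDatasets(ensembl), re-typed by hand from the output above.
datasets <- data.frame(
  dataset = c("oanatinus_gene_ensembl",
              "hsapiens_gene_ensembl",
              "csavignyi_gene_ensembl"),
  description = c("Ornithorhynchus anatinus genes (OANA5)",
                  "Homo sapiens genes (GRCh37.p11)",
                  "Ciona savignyi genes (CSAV2.0)"),
  version = c("OANA5", "GRCh37.p11", "CSAV2.0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a biomaRt function): case-insensitive search of
# the description column, returning the matching dataset names.
find_dataset <- function(datasets, species) {
  hits <- grepl(species, datasets$description, ignore.case = TRUE)
  datasets$dataset[hits]
}

human <- find_dataset(datasets, "homo sapiens")
print(human)  # "hsapiens_gene_ensembl"
```

With a live connection, the helper would take the full data frame returned by listDatasets(ensembl), and the name it returns could be passed as the second argument of useMart.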
For the second question, biomaRt provides two more functions: listFilters and listAttributes.
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> filters <- listFilters(mart); filters[grepl("entrez", filters[,1]),]
    name            description
38  with_entrezgene with EntrezGene ID(s)
122 entrezgene      EntrezGene ID(s) [e.g. 100287163]
> attributes <- listAttributes(mart); attributes[grepl("^ensembl|hgnc", attributes[,1]), ]
     name                  description
1    ensembl_gene_id       Ensembl Gene ID
2    ensembl_transcript_id Ensembl Transcript ID
3    ensembl_peptide_id    Ensembl Protein ID
4    ensembl_exon_id       Ensembl Exon ID
51   hgnc_id               HGNC ID(s)
52   hgnc_symbol           HGNC symbol
53   hgnc_transcript_name  HGNC transcript name
134  ensembl_gene_id       Ensembl Gene ID
135  ensembl_transcript_id Ensembl Transcript ID
136  ensembl_peptide_id    Ensembl Protein ID
162  ensembl_exon_id       Ensembl Exon ID
165  ensembl_gene_id       Ensembl Gene ID
166  ensembl_transcript_id Ensembl Transcript ID
167  ensembl_peptide_id    Ensembl Protein ID
175  ensembl_gene_id       Ensembl Gene ID
176  ensembl_transcript_id Ensembl Transcript ID
177  ensembl_peptide_id    Ensembl Protein ID
1616 ensembl_gene_id       Ensembl Gene ID
1617 ensembl_transcript_id Ensembl Transcript ID
1618 ensembl_peptide_id    Ensembl Protein ID
1691 ensembl_gene_id       Ensembl Gene ID
1706 ensembl_transcript_id Ensembl Transcript ID
1707 ensembl_peptide_id    Ensembl Protein ID
1715 ensembl_exon_id       Ensembl Exon ID
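Because these listings are plain data frames with the names in the first column, they can also be used to check a query before it is sent, for instance to catch a misspelt attribute such as "ensemble_gene_id". The sketch below assumes only that layout. The data frame is a hand-typed subset of the listing above, and the helper name `missing_names` is hypothetical.

```r
# Hand-typed subset of listAttributes(mart), taken from the output above.
attributes <- data.frame(
  name = c("ensembl_gene_id", "ensembl_transcript_id", "hgnc_id", "hgnc_symbol"),
  description = c("Ensembl Gene ID", "Ensembl Transcript ID",
                  "HGNC ID(s)", "HGNC symbol"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a biomaRt function): return the requested names
# that do not occur in the first column of a listAttributes() or
# listFilters() result.
missing_names <- function(requested, available) {
  setdiff(requested, available[, 1])
}

bad <- missing_names(c("hgnc_symbol", "ensemble_gene_id"), attributes)
print(bad)  # "ensemble_gene_id"
```

An empty result means every requested name was found in the listing. Since the subset here is incomplete, a name reported as missing by this toy example may still exist in the full listing returned by the server.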
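The last question is when biomaRt is the right tool. Why would we think of it when doing annotation? Because any conversion between IDs, symbols and names during annotation is a candidate for biomaRt. More importantly, biomaRt also gives access to a great deal of other information, such as SNPs, alternative splicing, exons, introns, 5' UTRs and 3' UTRs. In principle, any data that can be stored in a database and queried with SQL can be served through BioMart and retrieved with biomaRt, so it is worth thinking broadly about what it can be used for.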
http://pgfe.umassmed.edu/ou/archives/3281