Identifying orthologous genes across species with OrthoMCL (installation + usage)


Orthologs are homologs separated by speciation events. Paralogs are homologs separated by duplication events. Detection of orthologs is becoming much more important with the rapid progress in genome sequencing.

OrthoMCL is a genome-scale algorithm for grouping orthologous protein sequences. It provides not only groups shared by two or more species/genomes, but also groups representing species-specific gene expansion families, so it serves as an important utility for automated eukaryotic genome annotation. OrthoMCL starts with reciprocal best hits within each genome as potential in-paralog/recent paralog pairs and reciprocal best hits across any two genomes as potential ortholog pairs. Related proteins are interlinked in a similarity graph. Then MCL (Markov Clustering algorithm, Van Dongen 2000; www.micans.org/mcl) is invoked to split mega-clusters. This process is analogous to the manual review in COG construction. MCL clustering is based on weights between each pair of proteins, so to correct for differences in evolutionary distance the weights are normalized before running MCL.
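The pairing and normalization steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OrthoMCL's implementation: the input is a hypothetical list of (query, subject, e-value) tuples with ids in the taxon|protein form, only cross-genome reciprocal best hits are computed (in-paralogs, co-orthologs and the percent-match cutoff are left out), the weight of a pair is taken as the mean -log10(e-value) of its two directions, and each weight is divided by the average weight of all pairs between the same two species.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical input: one (query, subject, evalue) tuple per query-subject pair,
# with protein ids in the OrthoMCL "taxon|protein" form.
Hit = tuple[str, str, float]


def species(prot_id: str) -> str:
    """Taxon code in front of the '|'."""
    return prot_id.split("|", 1)[0]


def species_pair(a: str, b: str) -> tuple[str, str]:
    """Order-independent key for the two species of a protein pair."""
    return tuple(sorted((species(a), species(b))))


def best_hits(hits: list[Hit]) -> dict[tuple[str, str], set[str]]:
    """For each (query, other species), keep the subject(s) with the lowest e-value."""
    best: dict[tuple[str, str], tuple[float, set[str]]] = {}
    for q, s, ev in hits:
        if species(q) == species(s):
            continue  # within-genome hits (in-paralogs) are not handled in this sketch
        key = (q, species(s))
        if key not in best or ev < best[key][0]:
            best[key] = (ev, {s})
        elif ev == best[key][0]:
            best[key][1].add(s)  # ties are all kept as best hits
    return {key: subjects for key, (_, subjects) in best.items()}


def reciprocal_best_pairs(hits: list[Hit]) -> set[tuple[str, str]]:
    """Cross-genome pairs in which each protein is the other's best hit."""
    bh = best_hits(hits)
    pairs = set()
    for (q, _), subjects in bh.items():
        for s in subjects:
            if q in bh.get((s, species(q)), set()):
                pairs.add(tuple(sorted((q, s))))
    return pairs


def normalized_weights(hits: list[Hit], pairs: set[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Mean -log10(evalue) of both directions, divided by the species-pair average."""
    ev = {(q, s): e for q, s, e in hits}
    raw = {}
    for a, b in pairs:
        # an e-value of 0 is capped at 1e-180 (an arbitrary choice for this sketch)
        raw[(a, b)] = mean(-math.log10(max(ev[x], 1e-180)) for x in ((a, b), (b, a)))
    by_species = defaultdict(list)
    for (a, b), w in raw.items():
        by_species[species_pair(a, b)].append(w)
    return {(a, b): w / mean(by_species[species_pair(a, b)]) for (a, b), w in raw.items()}


if __name__ == "__main__":
    demo = [("A|1", "B|1", 1e-50), ("B|1", "A|1", 1e-48),
            ("A|1", "B|2", 1e-10), ("B|2", "A|1", 1e-12),
            ("A|2", "B|2", 1e-30), ("B|2", "A|2", 1e-30)]
    pairs = reciprocal_best_pairs(demo)
    print(sorted(pairs))
    print(normalized_weights(demo, pairs))
```

After normalization the average weight of every species pair is 1, so closely and distantly related genome pairs contribute on a comparable scale to the graph that MCL clusters.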

OrthoMCL is similar to the INPARANOID algorithm (Remm, Storm et al. 2001), but is extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO (Lee, Sultana et al. 2002), and an analysis using EC numbers suggests a high degree of reliability (Li, Stoeckert et al. 2003).

In a recent assessment (Chen et al. 2007), seven widely used orthology detection algorithms, representing three kinds of strategies (phylogeny-based, evolutionary distance-based and BLAST-based), were evaluated using the statistical technique Latent Class Analysis (LCA). LCA is useful when large data sets are available but no gold standard exists. The results show an overall trade-off between sensitivity and specificity among these algorithms, with INPARANOID and OrthoMCL as the two best methods, both having False Positive (FP) and False Negative (FN) error rates below 20%.

Installation and usage

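  • Configure the environment variables once, in one place

    • Executables: create a directory ~/bin (mkdir -p ~/bin) and add its full path to PATH by adding the following line to ~/.bashrc:

      export PATH=${PATH}:~/bin

    • Perl modules: create a directory ~/perl5lib (mkdir -p ~/perl5lib) and add its full path to PERL5LIB by adding the following line to ~/.bashrc:

      export PERL5LIB=${PERL5LIB}:~/perl5lib/

    • Reload the configuration: source ~/.bashrc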
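After reloading ~/.bashrc, a quick way to confirm that both directories really are on the search paths is sketched below. The helper name missing_entries is hypothetical; it only compares the entries textually (expanding a leading ~ against the given home directory) and does not check that the directories exist.

```python
import os
from pathlib import Path


def _as_path(entry: str, home: str) -> Path:
    """Expand a leading '~' against the given home directory."""
    if entry == "~" or entry.startswith("~/"):
        entry = home + entry[1:]
    return Path(entry)  # Path() also drops a trailing '/'


def missing_entries(env: dict[str, str], home: str) -> list[str]:
    """Names of the variables that do not yet include ~/bin or ~/perl5lib."""
    wanted = {"PATH": Path(home, "bin"), "PERL5LIB": Path(home, "perl5lib")}
    missing = []
    for var, directory in wanted.items():
        # empty entries (e.g. from an unset PERL5LIB followed by ':') are ignored
        entries = {_as_path(p, home) for p in env.get(var, "").split(os.pathsep) if p}
        if directory not in entries:
            missing.append(var)
    return missing


if __name__ == "__main__":
    missing = missing_entries(dict(os.environ), str(Path.home()))
    print("missing:", ", ".join(missing) if missing else "nothing, PATH and PERL5LIB look fine")
```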

  • Install MCL

    wget http://www.micans.org/mcl/src/mcl-latest.tar.gz
    tar xvzf mcl-latest.tar.gz
    cd mcl-*/    # the archive may unpack into a versioned directory (mcl-<version>) rather than mcl-latest
    ./configure --prefix=`pwd`/../mcl_bin
    make
    make install
    ln -s `pwd`/../mcl_bin/bin/* ~/bin/
  • Install OrthoMCL

    wget http://orthomcl.org/common/downloads/software/v2.0/orthomclSoftware-v2.0.9.tar.gz
    tar xvzf orthomclSoftware-v2.0.9.tar.gz
    cd orthomclSoftware-v2.0.9
    ln -s `pwd`/bin/* ~/bin/
    ln -s `pwd`/lib/perl/* ~/perl5lib
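Once the symlinks are in place, the MCL and OrthoMCL commands should be found through PATH. The sketch below checks that with shutil.which; the list of names is my own selection of the executables used in a standard OrthoMCL v2.0 run, so extend it if your workflow calls others. It does not cover PERL5LIB, which the OrthoMCL scripts also need.

```python
import shutil

# Executables expected in ~/bin after the symlink steps above
# (my own selection of the commands a standard OrthoMCL v2.0 run uses).
REQUIRED = [
    "mcl",
    "orthomclInstallSchema",
    "orthomclFilterFasta",
    "orthomclBlastParser",
    "orthomclLoadBlast",
    "orthomclPairs",
    "orthomclDumpPairsFiles",
    "orthomclMclToGroups",
]


def missing_tools(names=REQUIRED, path=None):
    """Return the commands that cannot be found (path=None means search $PATH)."""
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```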
  • Configure the MySQL database

    • Install MySQL:

      yum install mysql mysql-server

    • Set the password of the MySQL root user: log in with mysql -uroot, then enter the following SQL statements at the mysql prompt, one after the other:

      SET PASSWORD=PASSWORD("passwd");

      FLUSH PRIVILEGES;

    • Change the database directory. OrthoMCL needs a lot of storage space while it runs and my root partition did not have enough, so I had to move the database directory; if your root partition has enough space, this part is unnecessary.

      • Check the MySQL service: service mysqld status

      • Stop the MySQL service: service mysqld stop

      • Move the database directory to the target location:

        mkdir ~/mysql; chown mysql:mysql ~/mysql
        mv /var/lib/mysql/* ~/mysql/

      • In /etc/my.cnf, change datadir to ~/mysql.

    • Edit the /etc/my.cnf configuration file:

      [mysqld]
      # replace ~/mysql with the full (absolute) path of the new data directory
      datadir=~/mysql

      #[OPTIMIZATION]
      #Set this value to 50% of available RAM if your environment permits.
      myisam_sort_buffer_size=60G

      #[OPTIMIZATION]
      #This value should be at least 50% of free hard drive space. Use
      #caution if setting it to 100% of free space however. Your hard disk
      #may fill up!
      myisam_max_sort_file_size=200G

      #[OPTIMIZATION]
      #Our default of 2G is probably fine for this value. Change this value
      #only if you are using a machine with little resources available.
      read_buffer_size=2G

    • Start the MySQL service: service mysqld start

      • If it fails to start, look for error messages in the log file /var/log/mysqld.log.

      • A typical error is: /usr/libexec/mysqld: Can't change dir to [Error code 13] (error code 13 means permission denied).

      • In that case, make sure every parent directory of datadir has the x (execute) permission.

      • If it still does not start, run setenforce 0 in a terminal to switch SELinux to permissive mode.

    • Create the user and the database:

      • Create a database named orthomcl: CREATE DATABASE orthomcl;

      • Create a user orthomcl with password 152108 that has all the privileges OrthoMCL needs on the orthomcl database:

        GRANT SELECT,INSERT,UPDATE,DELETE,CREATE VIEW,
        CREATE,INDEX,DROP on orthomcl.* TO 'orthomcl'@'localhost'
        IDENTIFIED BY '152108';

        FLUSH PRIVILEGES;

    • CentOS 7 uses MariaDB instead of MySQL, but all commands work the same way (skip this part if it does not apply to you):

      yum install mariadb mariadb-server
      systemctl start mariadb      ==> start mariadb
      systemctl enable mariadb     ==> start automatically at boot
      mysql_secure_installation    ==> set the root password and related options
      mysql -uroot -pPASSWD        ==> test the login
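The error code 13 case above usually comes down to a parent directory of datadir that the mysql user cannot enter. A minimal sketch for finding such directories, assuming the mysql user is neither the owner nor a group member of them, so that only the "others" execute bit matters:

```python
import stat
from pathlib import Path


def others_can_search(mode: int) -> bool:
    """True if the 'others' execute (search) bit is set in an st_mode value."""
    return bool(mode & stat.S_IXOTH)


def blocking_parents(datadir: str) -> list[str]:
    """Parent directories of datadir that 'others' (e.g. the mysql user) cannot enter."""
    path = Path(datadir).expanduser().resolve()
    return [str(p) for p in path.parents if not others_can_search(p.stat().st_mode)]


if __name__ == "__main__":
    import sys

    for d in blocking_parents(sys.argv[1] if len(sys.argv) > 1 else "~/mysql"):
        print("no o+x permission:", d)
```

Directories it reports can be opened up with chmod o+x; home directories created with mode 700 are the usual suspects.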
  • Configure the OrthoMCL working files

    • Create a directory (~/orthmcl_work) to store the OrthoMCL configuration file.

    • Copy orthomclSoftware-v2.0.9/doc/OrthoMCLEngine/Main/orthomcl.config.template to ~/orthmcl_work and rename it orthomcl.config.

    • Change its contents to:

      # this config assumes a mysql database named 'orthomcl'.
      # Adjust according to your situation.
      dbVendor=mysql
      #Database name: orthomcl
      dbConnectString=dbi:mysql:orthomcl
      #Database username
      dbLogin=orthomcl
      #Database password
      dbPassword=152108
      # Change strings as you like
      similarSequencesTable=SimilarSequences
      orthologTable=Ortholog
      inParalogTable=InParalog
      coOrthologTable=CoOrtholog
      #Standards
      interTaxonMatchView=InterTaxonMatch
      percentMatchCutoff=50
      evalueExponentCutoff=-5
      oracleIndexTblSpc=NONE

    • Create the database tables:

      orthomclInstallSchema orthomcl.config inst_schema.log species
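Because a typo in orthomcl.config only surfaces when orthomclInstallSchema or a later step connects to the database, a small check of the file can save time. The sketch below is hypothetical: it reads the key=value lines, skips "#" comments, and compares them with the keys of the template shown above; it does not test the database connection itself.

```python
# Keys of the orthomcl.config template shown above.
REQUIRED_KEYS = {
    "dbVendor", "dbConnectString", "dbLogin", "dbPassword",
    "similarSequencesTable", "orthologTable", "inParalogTable", "coOrthologTable",
    "interTaxonMatchView", "percentMatchCutoff", "evalueExponentCutoff", "oracleIndexTblSpc",
}


def parse_config(text: str) -> dict[str, str]:
    """Read key=value lines, skipping blank lines and '#' comments."""
    cfg = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value, got {line!r}")
        cfg[key.strip()] = value.strip()
    return cfg


def check_config(cfg: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    present = set(cfg)
    problems = [f"missing {key}" for key in sorted(REQUIRED_KEYS - present)]
    problems += [f"empty {key}" for key in sorted(REQUIRED_KEYS & present) if not cfg[key]]
    if cfg.get("dbVendor") == "mysql" and not cfg.get("dbConnectString", "").startswith("dbi:mysql:"):
        problems.append("dbConnectString should start with dbi:mysql: for dbVendor=mysql")
    for key in ("percentMatchCutoff", "evalueExponentCutoff"):
        value = cfg.get(key, "")
        if value and not value.lstrip("-").isdigit():
            problems.append(f"{key} should be an integer, got {value!r}")
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:
        print("\n".join(check_config(parse_config(fh.read()))) or "config looks fine")
```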

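  • Create the OrthoMCL input files

    • OrthoMCL takes FASTA files as input. Each sequence name must have the form >taxoncode|unique_prot_id, i.e. two fields separated by a vertical bar (|): the first is a species code of 3 to 4 letters, the second a unique ID for the protein sequence.

    • Usually one representative protein sequence is chosen per gene.

    • Give all these files the same suffix, .fasta, and store them in a single folder, orthlMCL (this folder must contain nothing but FASTA files, otherwise orthomclBlastParser reports an error).

    • Filter the sequences, allowing a minimum protein length of 10 and a maximum proportion of stop codons of 20%; the output is goodProteins.fasta by default:

      orthomclFilterFasta orthlMCL 10 20

    • Merge goodProteins.fasta with the protein sequences from the OrthoMCL database to obtain orthoMCL.fa.

    • Usually we prepare the protein sequences of the species under study and of several closely related or representative species ourselves, so there is no need to merge them with the OrthoMCL database sequences; goodProteins.fasta can be used directly as orthoMCL.fa.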
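The header format and the filtering criteria above can be checked with a short Python sketch. It is not orthomclFilterFasta itself: the 3 to 4 letter taxon code follows the description above, counting "*" characters as stop codons is an assumption about how stops appear in the protein files, and sequences are simply split into good and poor lists.

```python
import re
from typing import Iterable, Iterator

# OrthoMCL-style header: >taxoncode|unique_prot_id (taxon code of 3 or 4 letters)
HEADER = re.compile(r"^>[A-Za-z]{3,4}\|\S+")


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_proteins(records, min_length=10, max_percent_stops=20):
    """Split records into (good, poor), roughly mirroring `orthomclFilterFasta <dir> 10 20`."""
    good, poor = [], []
    for header, seq in records:
        if not HEADER.match(header):
            raise ValueError(f"header not in >taxon|id form: {header}")
        # assumption: a stop codon appears as '*' in the protein sequence
        percent_stops = 100 * seq.count("*") / len(seq) if seq else 100
        ok = len(seq) >= min_length and percent_stops <= max_percent_stops
        (good if ok else poor).append((header, seq))
    return good, poor


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:
        good, poor = filter_proteins(read_fasta(fh))
    print(f"{len(good)} good, {len(poor)} poor")
```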

  • Run BLAST on the sequences

    makeblastdb -in orthoMCL.fa -dbtype prot -title orthomcl \
       -out orthomcl -logfile orthomcl.log
    # orthomclBlastParser reads plain tabular output, so use -outfmt 6 (no comment lines)
    blastp -db orthomcl -query goodProteins.fasta -seg yes \
       -out orthomcl.blastout -evalue 1e-5 -outfmt 6 -num_threads 70
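Before handing the results to orthomclBlastParser, it can be useful to inspect the tabular BLAST output directly. The sketch below reads the 12 default columns of -outfmt 6 (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore), skips the "#" comment lines that -outfmt 7 adds, and keeps the highest-scoring HSP per query-subject pair. It is only an inspection helper under those assumptions and not a replacement for orthomclBlastParser.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_blast_tab(lines: Iterable[str]) -> dict[tuple[str, str], Hit]:
    """Best-scoring HSP per (query, subject) from BLAST+ tabular output."""
    best: dict[tuple[str, str], Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and -outfmt 7 comment lines
        f = line.rstrip("\n").split("\t")
        # default column order: qseqid sseqid pident length mismatch gapopen
        #                       qstart qend sstart send evalue bitscore
        hit = Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))
        key = (hit.query, hit.subject)
        if key not in best or hit.bitscore > best[key].bitscore:
            best[key] = hit
    return best


if __name__ == "__main__":
    with open("orthomcl.blastout") as fh:
        hits = parse_blast_tab(fh)
    print(len(hits), "query-subject pairs")
```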
  • The remaining steps are omitted here; they are all combined into a single bash script.

  • The combined analysis script orthoMcl.sh:

  Usage:

 /MPATHB/self/NGS/orthoMcl.sh options

 Function:

 This script is used to perform orthoMcl analysis using MySql, MCL and
 orthomcl.

 Before running this script, one must have one mysql database and a
 mysql user which can perform operation on this database.


 OPTIONS:
     -d    Mysql database name (using user_name as prefix to avoid
         duplication) [Necessary]
     -u    Mysql database username [Necessary]
     -p    Mysql database password [Necessary]
     -s    Target species of this analysis
         (Any representing string is OK, the shorter the better)
         [Necessary]
     -D    A directory containing FASTA files for all proteins.
         [Necessary]
     -S    Sequences downloaded from orthMCL website.
         [Optional, not used anymore]
     -t    Number of threads for blast. [Default 50]
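For orientation, the steps that follow BLAST in a standard OrthoMCL v2.0 run, in the order given in the OrthoMCL user guide, are sketched below as a small Python driver. This is not the author's orthoMcl.sh: it assumes the schema is installed and orthomcl.config is in place, the file names, the MCL inflation value 1.5 and the group prefix C with start id 10000 are assumptions, and the commands are only printed unless dry_run=False.

```python
import subprocess


def orthomcl_steps(config="orthomcl.config", fasta_dir="orthlMCL",
                   blastout="orthomcl.blastout", inflation=1.5,
                   prefix="C", start_id=10000):
    """Shell commands for the post-BLAST OrthoMCL steps (file names are assumptions)."""
    return [
        # convert BLAST hits into the similarSequences table format
        f"orthomclBlastParser {blastout} {fasta_dir} > similarSequences.txt",
        # load them into the MySQL database described by the config file
        f"orthomclLoadBlast {config} similarSequences.txt",
        # find ortholog, in-paralog and co-ortholog pairs
        f"orthomclPairs {config} orthomcl_pairs.log cleanup=no",
        # write mclInput (and the pairs/ directory) to the current directory
        f"orthomclDumpPairsFiles {config}",
        # Markov clustering of the weighted pair graph
        f"mcl mclInput --abc -I {inflation} -o mclOutput",
        # name the clusters prefix+number, e.g. C10000, C10001, ...
        f"orthomclMclToGroups {prefix} {start_id} < mclOutput > groups.txt",
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("+", cmd)
        if not dry_run:
            # shell=True because the commands use redirections
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run(orthomcl_steps())
```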
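  • parseOrthoMclResult.py parses the OrthoMCL output, mainly the groups.xls file:

    • Builds a matrix with the number of genes each species has in each gene cluster.

    • Extracts genes that are single-copy in all species and passes them to orthoMclPhyloGenetic.py for evolutionary analysis.

    • Extracts gene clusters specific to a given species.

    • Extracts gene clusters shared by several species and specific to them relative to the other species.

    • Extracts gene families specifically expanded or lost in a given species.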

    • parseOrthoMclResult.py

      Program description:
          This is designed to parse orthmcl results.

          Input file format:
          cluster_name<colon><any blank>spe1<vertical_line>prot1<any blank>spe2<verticial_line>prot2<any blank>.....
          C10000: Aco|Aco000153.1 Aco|Aco004369.1 Aco|Aco010005.1
          C10001: Aco|Aco000153.1 Cla|Cla004369.1 Dec|Dec010005.1

          Tasks:
          1. Get a matrix showing the number of proteins in each cluster.
          2. Extract single gene clusters and their sequences in all given
          species. In the output nucleotide file, ending stop codon (TAA,
          TAG, TGA) will be removed for compatible with
          `translatorx_vLocal.pl` and `trimal`.
          3. Extract species specific clusters for given species.
          4. Extract gene-expansion clusters for given species.
          5. Extract multiple-species specific clusters.

      Usage: parseOrthoMclResult.py -i file

      Options:
        -h, --help            show this help message and exit
        -i FILEIN, --input-file=FILEIN
                              Output of `orthomclMclToGroups`.
        -t MAIN_SPE, --target-species=MAIN_SPE
                              Specify the `species` name used for extracting species
                              specific clusters or specially expanded clusters.
        -E EXCLUDE_WHEN_READING, --exclude-all=EXCLUDE_WHEN_READING
                              Comma or blank separated strings representing species
                              excluded when reading in the result. It will affect
                              all tasks. Default including all species.
        -e EXCLUDE_SINGLE_CONSERVE, --exclude-2=EXCLUDE_SINGLE_CONSERVE
                              Comma or blank separated strings representing species
                              should not be considered when performing task <2>.
                              Default including all species.
        -s SPECIFIC_MULTIPLE, --specific-multiple-5=SPECIFIC_MULTIPLE
                              Comma or blank separated strings representing multiple
                              species used for task <5>. Default muting task 5.
        -P DIR_PROT, --directory-prot=DIR_PROT
                              Directory containing all protein sequences used for
                              `orthoMcl.sh`. All sequences have a suffix `.fasta`.
        -N DIR_NUCL, --directory-nucl=DIR_NUCL
                              Directory containing all nucleotide sequences used for
                              `orthoMcl.sh`. All sequences have a suffix `.fasta`.
        -o OUTP, --output-prefix=OUTP
                              Prefix for output files.
        -v, --verbose         Show process information
        -d, --debug           Debug the program
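Tasks 1 to 3 of the help text above can be sketched as follows, using the input format it documents (cluster name, a colon, then blank-separated species|protein entries). This is a hypothetical re-implementation for illustration, not the author's parseOrthoMclResult.py: the species codes are passed in explicitly and must list every species in the run, task 2 returns cluster ids only (no sequences), and a cluster counts as species-specific when all of its members come from the target species.

```python
from collections import Counter
from typing import Iterable


def parse_groups(lines: Iterable[str]) -> dict[str, list[tuple[str, str]]]:
    """Map cluster name -> list of (species, protein) from orthomclMclToGroups output."""
    clusters = {}
    for line in lines:
        if not line.strip():
            continue
        name, _, members = line.partition(":")
        clusters[name.strip()] = [tuple(m.split("|", 1)) for m in members.split()]
    return clusters


def count_matrix(clusters, species):
    """Task 1: per cluster, the number of proteins from each species (in `species` order)."""
    matrix = {}
    for name, members in clusters.items():
        counts = Counter(sp for sp, _ in members)
        matrix[name] = [counts.get(sp, 0) for sp in species]
    return matrix


def single_copy_clusters(matrix):
    """Task 2 (cluster ids only): exactly one protein in every species."""
    return [name for name, row in matrix.items() if all(n == 1 for n in row)]


def species_specific_clusters(matrix, species, target):
    """Task 3: clusters whose proteins all come from `target`."""
    i = species.index(target)
    return [name for name, row in matrix.items() if row[i] > 0 and sum(row) == row[i]]


if __name__ == "__main__":
    import sys

    species = sys.argv[2].split(",")  # e.g. Aco,Cla,Dec (all species in the run)
    with open(sys.argv[1]) as fh:
        matrix = count_matrix(parse_groups(fh), species)
    print("cluster", *species, sep="\t")
    for name, row in matrix.items():
        print(name, *row, sep="\t")
```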


