Pyrosequencing for Metagenomics: accessing and organizing raw data
Giuseppe D’Auria, FISABIO, Valencia
Norwich, 08-12 September 2014

  • Practice workflow

    We will start from a single SFF (Standard Flowgram Format) file containing a metagenome and a metatranscriptome experiment, each labelled by one of two MIDs (Multiplex Identifiers).

  • Extracting MIDs

    Extract fasta and quality files belonging to each dataset:

    1. SFF: extract the FASTA file and the FASTA quality (Qual) file.
    2. Identify MIDs and separate the FASTA and FASTA quality files: bin_fasta_on_mid_primers.pl (inputs: FASTA file, FASTA Qual, mid_fasta_file).

    http://sourceforge.net/projects/mira-assembler/files/MIRA/

  • Extract fasta and quality files belonging to each dataset

    Open the terminal.

    out_midi_CCAACC: Metagenome
    out_midi_CGCCAT: Metatranscriptome
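To make the MID separation step concrete, here is a minimal Python sketch of binning reads by the tag found at their 5' end. It is not the logic of bin_fasta_on_mid_primers.pl, which the slides do not show; the exact-prefix matching, the function names, and the "unassigned" bin are assumptions for illustration. Only the two MID sequences come from the output file names above.

```python
from collections import defaultdict

# MID tags taken from the output file names on the slide.
MIDS = {"CCAACC": "metagenome", "CGCCAT": "metatranscriptome"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def bin_by_mid(records, mids=MIDS):
    """Group reads by the MID at their 5' end and trim the tag (hypothetical helper)."""
    bins = defaultdict(list)
    for header, seq in records:
        for tag, label in mids.items():
            # Assumption: exact match at the very start of the read.
            if seq.upper().startswith(tag):
                bins[label].append((header, seq[len(tag):]))
                break
        else:
            bins["unassigned"].append((header, seq))
    return dict(bins)


example = [">r1", "CCAACCACGTACGT", ">r2", "CGCCATTTGACA", ">r3", "GGGGGGAC"]
bins = bin_by_mid(read_fasta(example))
```

A real demultiplexer would usually also tolerate sequencing errors in the tag and split the quality file in parallel; both are left out here.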

  • Mapping and recruitment graph

    Open the terminal.

    http://mummer.sourceforge.net/

  • Assembly protocol via MIRA

    embo@embo-VirtualBox:~/data/project2/metage$ # Linking metagenome file for assembly
    embo@embo-VirtualBox:~/data/project2/metage$ ln -s out_midi_CCAACC.fasta metage_in.454.fasta
    embo@embo-VirtualBox:~/data/project2/metage$ ln -s out_midi_CCAACC.fasta.qual metage_in.454.fasta.qual
    embo@embo-VirtualBox:~/data/project2/metage$ ln -s ../dataset2.xml metage_traceinfo_in.454.xml

    embo@embo-VirtualBox:~/data/project2/metage$ # Start denovo assembly
    embo@embo-VirtualBox:~/data/project2/metage$ mira --project=metage --job=denovo,genome,draft,454 454_SETTINGS -LR:ft=fasta

    embo@embo-VirtualBox:~/data/project2/metage$ # Goto results folder
    embo@embo-VirtualBox:~/data/project2/metage$ cd metage_assembly
    embo@embo-VirtualBox:~/data/project2/metage/metage_assembly$ cd metage_d_results

    embo@embo-VirtualBox:~/data/project2/metage/metage_assembly/metage_d_results$ # Take a look at the results
    embo@embo-VirtualBox:~/data/project2/metage/metage_assembly/metage_d_results$ tablet metage_out.ace &
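Besides browsing the assembly in Tablet, a quick numeric summary of contig lengths can help when taking a look at the results. The sketch below computes contig lengths and N50 from FASTA lines. It assumes a contig FASTA file is available in the results folder; the slides only show the .ace file, so any file name you pass in is your own choice, and the function names are illustrative.

```python
def contig_lengths(fasta_lines):
    """Return the length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


example = [">c1", "ACGT", "ACGT", ">c2", "ACGTAC", ">c3", "ACGT", ">c4", "AC"]
lengths = contig_lengths(example)
summary = {"contigs": len(lengths), "total_bases": sum(lengths), "n50": n50(lengths)}
```

To run it on a real file, pass an open file handle instead of the example list, for instance `contig_lengths(open(path))` with `path` pointing at your contig FASTA.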

  • Searching for rRNAs

    embo@embo-VirtualBox:~/data/project2/metage/metage_assembly/metage_d_results$ cd ../../../
    embo@embo-VirtualBox:~/data/project2$ cd metatra

    embo@embo-VirtualBox:~/data/project2/metatra$ # Link needed files
    embo@embo-VirtualBox:~/data/project2/metatra$ ln -s out_midi_CGCCAT.fasta metatra.fas

    embo@embo-VirtualBox:~/data/project2/metatra$ # Searching for 16S sequences
    embo@embo-VirtualBox:~/data/project2/metatra$ rna_hmm3.py -i metatra.fas -m ssu -o metatra_16S -L ../../References/hmm3

    embo@embo-VirtualBox:~/data/project2/metatra$ # Extract 16S sequences from the 16S table
    embo@embo-VirtualBox:~/data/project2/metatra$ extract_sequences_by_list.pl -f metatra.fas -t metatra_16S -c 0 -o -d 1
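The extraction step pulls out of a FASTA file the records whose IDs are listed in a table. The following Python sketch shows that idea only; it is not the code of extract_sequences_by_list.pl. Reading `-c 0` as "IDs are in the first column" is a guess, the `-d` option is left uninterpreted, and the function names and the `skip` parameter are hypothetical.

```python
def ids_from_table(table_lines, column=0, skip=0):
    """Collect IDs from one whitespace-separated column.

    Lines starting with '#' and the first `skip` lines are ignored
    (both conventions are assumptions, not taken from the slides).
    """
    ids = set()
    for i, line in enumerate(table_lines):
        if i < skip or not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) > column:
            ids.add(fields[column])
    return ids


def extract_records(fasta_lines, wanted):
    """Return the FASTA lines of records whose ID (first word of the header) is wanted."""
    out, keep = [], False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted
        if keep:
            out.append(line)
    return out


table = ["#seq\tmethod\tfeature", "r1\trna_hmm3\trRNA", "r3\trna_hmm3\trRNA"]
fasta = [">r1 sample read", "ACGT", ">r2", "GGGG", ">r3", "TTTT", "AAAA"]
wanted = ids_from_table(table)
selected = extract_records(fasta, wanted)
```

Holding the IDs in a set keeps the lookup cheap even when the FASTA file contains many thousands of reads.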

  • Clustering

    embo@embo-VirtualBox:~/data/project2/metatra$ # Filtering out chimeras
    embo@embo-VirtualBox:~/data/project2/metatra$ #ChimeraSlayer.pl --query_FASTA 16S.list.fasta

    embo@embo-VirtualBox:~/data/project2/metatra$ # Clustering 16S sequences
    embo@embo-VirtualBox:~/data/project2/metatra$ cdhit -i 16S.list.fasta -o 16Sc90s90 -c 0.9 -s 0.9 -bak 1
    embo@embo-VirtualBox:~/data/project2/metatra$ cd-hit_translate.pl 16Sc90s90.bak.clstr > 16S.tab
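To see what a clustering result contains, the sketch below parses cluster listings in the usual CD-HIT `.clstr` layout (a `>Cluster N` line followed by one member per line, with `*` marking the representative). It targets the standard `.clstr` file; the `.bak.clstr` file used on the slide may be laid out differently, and cd-hit_translate.pl is not reproduced here. The function name and the returned dictionary shape are illustrative choices.

```python
import re

# Member line, e.g. "0<TAB>1450nt, >r1... *" or "1<TAB>1398nt, >r7... at +/93.25%"
MEMBER = re.compile(r"^(\d+)\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.\s+(.*)$")


def parse_clstr(lines):
    """Return {cluster_id: {"representative": id, "members": [(id, length), ...]}}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = {"representative": None, "members": []}
        elif current is not None:
            match = MEMBER.match(line)
            if not match:
                continue  # skip lines that do not look like member entries
            _, length, seq_id, status = match.groups()
            clusters[current]["members"].append((seq_id, int(length)))
            if status == "*":
                clusters[current]["representative"] = seq_id
    return clusters


example = [
    ">Cluster 0",
    "0\t1450nt, >r1... *",
    "1\t1398nt, >r7... at +/93.25%",
    ">Cluster 1",
    "0\t1502nt, >r4... *",
]
clusters = parse_clstr(example)
sizes = {cid: len(c["members"]) for cid, c in clusters.items()}
```

The `sizes` dictionary gives the number of reads per cluster, which is the kind of per-cluster count a downstream abundance table would start from.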

  • Annotate 16S rRNA

    http://rdp.cme.msu.edu/index.jsp

    embo@embo-VirtualBox:~/data/project2/metatra$ # 16S assignment by RDP_classifier
    embo@embo-VirtualBox:~/data/project2/metatra$ java -jar ~/Software/rdp_classifier_2.2/rdp_classifier-2.2.jar -q 16S.remain.fasta -o 16S_rdp -f fixrank
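Once the classifier has written its table, each line can be reduced to a lineage that stops at the first poorly supported rank. The sketch below assumes a tab-separated line in which every rank label (domain, phylum, ...) is preceded by the taxon name and followed by a confidence value; check this against your own 16S_rdp file before relying on it. The 0.8 cutoff, the function name, and the example line are illustrative assumptions, not values from the slides.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus")


def parse_fixrank_line(line, min_conf=0.8):
    """Return (sequence_id, [(rank, taxon, confidence), ...]).

    The lineage is cut at the first rank whose confidence is below min_conf.
    """
    fields = line.rstrip("\n").split("\t")
    seq_id = fields[0]
    lineage = []
    # Locate rank labels instead of relying on fixed column positions.
    for i in range(2, len(fields) - 1):
        if fields[i] in RANKS:
            name = fields[i - 1].strip('"')
            try:
                conf = float(fields[i + 1])
            except ValueError:
                continue
            if conf < min_conf:
                break
            lineage.append((fields[i], name, conf))
    return seq_id, lineage


example = "r1\t\tBacteria\tdomain\t1.0\tFirmicutes\tphylum\t0.95\tBacilli\tclass\t0.6"
seq_id, lineage = parse_fixrank_line(example)
```

Searching for the rank labels rather than hard-coding column offsets keeps the parser tolerant of an extra or missing orientation column.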

  • Searching for tRNAs

    embo@embo-VirtualBox:~/data/project2/metatra$ # Searching for tRNAs
    embo@embo-VirtualBox:~/data/project2/metatra$ tRNAscan-SE -B 16S.remain.fasta > tRNAs.tab

    embo@embo-VirtualBox:~/data/project2/metatra$ # Extract tRNAs sequences from the tRNAs table
    embo@embo-VirtualBox:~/data/project2/metatra$ extract_sequences_by_list.pl -f 16S.remain.fasta -t tRNAs.tab -c 0 -o tRNAs -d 1

  • Running out of physical limits

  • For INTREPID and BRAVE people

    http://www.perl.org/

  • Perl is a scripting language widely used for system administration and programming on the World Wide Web.

    It originated in the UNIX community and has a strong UNIX slant, but usage on Windows has grown rapidly.

    ActivePerl is a quality-assured binary distribution of Perl for popular UNIX platforms and Windows.

    perl (small 'p') is the program used to interpret the Perl language.

  • For INTREPID and BRAVE people II

    http://www.r-project.org/

    R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

  • Thank you again for your attention.