
Page 1:

Clean up sequences with multiple >GI numbers when downloaded from NCBI BLAST website

Example of one sequence with the duplication (the clean-up for a phylo tree will not work on it):

>gi|565476349|ref|XP_006295815.1| hypothetical protein CARUB_v10024941mg [Capsella rubella] >gi|482564523|gb|EOA28713.1| hypothetical protein CARUB_v10024941mg [Capsella rubella]
MASEQARRDNKVTEREVKVEKDRVPKMTSHFESMADKGKDSEMQRHQTEGGDTQFVSLSDKGSNMPVSDEGEGETKMKRT
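If the sequences have already been downloaded with merged definition lines like the example above, the extra titles can also be trimmed with a short script. The following is only a minimal sketch, not part of the original slides: the function name `keep_first_title` is hypothetical, and it assumes that additional titles always start with a `>` (or a Ctrl-A character) inside the header line and that no genuine title contains a `>`.

```python
import re


def keep_first_title(fasta_text):
    """Return FASTA text in which every header keeps only its first title.

    Assumption: merged BLAST nr headers separate the extra titles with
    " >" (or with a Ctrl-A character, \\x01) inside a single header line.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Drop the leading ">" and cut at the next title separator.
            first = re.split(r"\s*>|\x01", line[1:], maxsplit=1)[0]
            line = ">" + first.rstrip()
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    example = (
        ">gi|565476349|ref|XP_006295815.1| hypothetical protein "
        "CARUB_v10024941mg [Capsella rubella] "
        ">gi|482564523|gb|EOA28713.1| hypothetical protein "
        "CARUB_v10024941mg [Capsella rubella]\n"
        "MASEQARRDNKVTEREVKVEKDRVPKMTSHFESMADKGKDSEMQRHQTEGGDTQFVSLSDKGSNMPVSDEGEGETKMKRT\n"
    )
    print(keep_first_title(example), end="")
```

Sequence lines are passed through unchanged; only lines starting with `>` are rewritten.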

These are the non-redundant definition lines from the BLAST database. Try downloading the sequences through the GenPept link instead. This takes you to the Entrez system, and when you follow the link to FASTA there, you get only the first title. In the Descriptions section, select the sequences you want, then click the GenPept link at the top. Once in the Protein system, use the "Send to" menu to download the sequences in FASTA format.
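The same Entrez download can be approximated without the web interface, through the NCBI E-utilities `efetch` service. This is a swapped-in alternative to the "Send to" menu described above, not the method shown on the slides. The sketch below assumes the accessions are already known; the helper names `efetch_fasta_url` and `fetch_fasta` are hypothetical, and whether each returned record carries a single title should be checked against the actual output.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions):
    """Build an efetch URL requesting protein records as FASTA text."""
    params = {
        "db": "protein",              # the Entrez Protein database
        "id": ",".join(accessions),   # comma-separated accessions
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accessions):
    """Download the records (requires network access)."""
    with urlopen(efetch_fasta_url(accessions)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Accession taken from the example header on the first slide.
    print(efetch_fasta_url(["XP_006295815.1"]))
```

For more than a handful of sequences, NCBI asks that requests be rate-limited, so a large BLAST hit list is better fetched in batches.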

The next slides point to the places where those links are:

Page 2:

After you select all, here is the link to GenPept

Page 3:

Click on "Send to" a file, and the sequences will be downloaded to the computer.

Page 4:

Send in FASTA format.