NERC Biomolecular Analysis Facility - Sheffield

1 06/11/11 How to run STRUCTURE in the ICEBERG platform.doc

NERC Biomolecular Analysis Facility - Sheffield

APPENDIX. How to run STRUCTURE in the ICEBERG platform

by Filipa Martins (24th November 2011)

Disclaimer:

We have provided this protocol as a courtesy, in the hope that this will help you in your research. Although we have taken great care to record the protocol accurately, we do not accept any responsibility for its accuracy or for any costs or damages that you might suffer as a consequence of mistakes in this protocol. If you use this protocol it is implicit that you accept the conditions of this disclaimer.

The following document represents one approach of many possible ones.

These guides are meant to provide you with a quick introduction to various aspects of data analysis. They are not meant to replace the programme manuals and you still need to think about what the most appropriate way might be for analysing your particular data. For a full and complete understanding, you should also read the original publication associated with each programme.

Please acknowledge NERC Biomolecular Analysis Facility - Sheffield and the Natural Environment Research Council when using this protocol.

Example acknowledgement:

"The laboratory work was assisted by the protocol(s) provided by the NERC Biomolecular Analysis Facility - Sheffield, which is funded by the Natural Environment Research Council, UK."


Prepare your input files for ICEBERG

To run STRUCTURE in the ICEBERG platform you need the following batch of files:

1- Your GENOTYPE data file in STRUCTURE (.str) data format (genotype.txt)

You can use, for example, the CONVERT software to transform data in GENEPOP format (.txt, .dat or .gen) into STRUCTURE format (.str).

2- MAINPARAMS file (mainparams.fil)

The default settings for the data file are: the input file contains individual labels; the input file contains a population identifier; the value given to missing genotype data is -9; and the data file contains a row of marker names. The default settings for the program are a burn-in period of 500,000 and 1,000,000 MCMC reps after burn-in. These settings can be changed by opening the file in WordPad.

3- EXTRAPARAMS file (extraparams.fil)

The main default settings are: allele frequencies are correlated among populations; a different value of Fst is assumed for each subpopulation; the admixture model is used; and no prior population information is used to assist clustering. These settings can be changed by opening the file in WordPad.

4- Submission files for each K (sub_K1.sh, sub_K2.sh, sub_K3.sh, sub_K4.sh, ...)

You need one submission file for each number of clusters (K) you want to analyse. In sub_K1.sh you can find the following command-line options (highlighted):

-t, number of replicates/runs of the respective K
-K, maximum number of populations
-L, number of loci
-N, number of individuals
-i, input file name
-o, output file name and location

These options need to be altered according to the name, content and location of your genotype data file in ICEBERG. File and program settings in the submission files (.sh) override any corresponding settings in mainparams.fil or extraparams.fil.

5- Likelihood summary submission file (get_results_K15.sh)

This file summarises the likelihood values of every run and K in the file likelihood.txt. The default commands summarise 10 runs for each K from 1 to 15. If you are running more than 10 replicates or analysing more than 15 clusters, you need to update the submission file to include the likelihood values of the additional runs and/or clusters.

6- STRUCTURE application for Linux (structure.exe)

The files mainparams.fil and extraparams.fil, the set of submission files (sub_K1.sh to sub_K15.sh and get_results_K15.sh) and the STRUCTURE application file for Linux (structure.exe) can be found at F:\BO\TAB\bo4smgf\PROTOCOLS FOR DATA ANALYSIS - NBAF – KEEP\ . The file genotype.txt is given as an example of a genotype data file.
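As a rough illustration of what the per-K submission files described above might contain, the following shell sketch generates sub_K1.sh to sub_K15.sh. The template files supplied by the facility are not reproduced in this protocol, so the layout is an assumption: it treats -t as a Grid Engine array-task directive (one task per replicate) and passes -K, -L, -N, -i and -o on the STRUCTURE command line. The work directory, the number of loci (10) and the number of individuals (100) are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: writes one submission file per K into the current directory.
# WORKDIR, NLOCI and NIND are hypothetical values; replace them with your own.
WORKDIR="/data/bo1fmm/STRUCTURE"
NLOCI=10      # -L, number of loci (placeholder)
NIND=100      # -N, number of individuals (placeholder)
NRUNS=10      # -t, number of replicate runs per K
MAXK=15       # highest K to analyse

k=1
while [ "$k" -le "$MAXK" ]; do
    cat > "sub_K${k}.sh" <<EOF
#!/bin/sh
#\$ -cwd
#\$ -t 1-${NRUNS}
# One array task per replicate; \$SGE_TASK_ID is the run number.
./structure.exe -m mainparams.fil -e extraparams.fil \\
    -K ${k} -L ${NLOCI} -N ${NIND} \\
    -i ${WORKDIR}/genotype.txt \\
    -o ${WORKDIR}/K${k}_\${SGE_TASK_ID}
EOF
    k=$((k + 1))
done
```

With this layout the first run of K = 1 writes to K1_1, to which STRUCTURE appends "_f" for the final output file. Compare the generated files with the facility's templates before submitting anything.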


Getting started in the ICEBERG platform

A) OPEN Exceed (installation disc provided by CICS)

Programs - Open Text Exceed 14 - Exceed Tools – Xstart

Note: You must first have your Exceed password synchronized with your account password; go to https://www.shef.ac.uk/cics/password.

Insert your User ID and Password (provided by CICS) and the Host and Command settings shown in the figure above, then start the session. bo1fmm will be used as the User ID in this protocol. A new command window will open:

Note: Entries are case sensitive.


B) OPEN WinSCP (downloaded from the web)

Programs - WinSCP – WinSCP

If you have never created an account, select New and insert your User name and the Host name shown in the figure below, then select Login.

Insert your Password to get access to your WinSCP session area:


On the left side of the window you can browse for your input files, while on the right side you will create folders and import, from the left side, the input files to be submitted to ICEBERG.


Using STRUCTURE in the ICEBERG platform

From here on, you will be working with an ICEBERG window (opened with Xstart) and WinSCP simultaneously.

In the ICEBERG window, go to your main work area in the data folder, identified by your account username (bo1fmm in this example), by typing cd(space)/data/bo1fmm/ and then open a new work "node" window by typing qsh. The new window will be your main ICEBERG window.

In the WinSCP window, go to the root folder using the right side of the window and confirm that your account folder (bo1fmm here) is present in the data folder. Refresh the folder if needed.


In the WinSCP window, create a new folder by clicking Create Directory at the bottom of the window and typing the new folder name (STRUCTURE here).

Once inside your new folder, you can search for your files using the left side of the window.


Copy all your files to the new folder by clicking Copy at the bottom of the window.


Open each submission file by double-clicking it and confirm that the directory location written in it is the same as the one to which your files are being copied. If needed, change the setting of the -o option (highlighted) to match the directory location (arrow).


You need to make the STRUCTURE application executable by typing chmod(space)777(space)structure.exe and refreshing the window (confirm that the permissions of the application have changed from rw-r--r-- to rwxrwxrwx).
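The permission change can also be checked from the command line. The sketch below creates an empty stand-in file when structure.exe is absent, purely so that it can be tried anywhere; on ICEBERG you would run only the chmod and ls lines inside your STRUCTURE folder. Note that chmod 777 grants read, write and execute rights to every user; chmod u+x would be a narrower alternative if only you need to run the program.

```shell
# Stand-in for the real binary, only so this sketch is self-contained.
[ -e structure.exe ] || : > structure.exe

chmod 777 structure.exe
ls -l structure.exe        # the permissions column should start with -rwxrwxrwx

# Optional scripted check: -x is true when the file is executable.
if [ -x structure.exe ]; then
    echo "structure.exe is executable"
fi
```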

Now you're ready to start submitting your STRUCTURE jobs to ICEBERG. To submit the job for the first K (sub_K1.sh) you need to type qsub(space)sub_K1.sh. Do the same for the other Ks.
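Rather than typing one qsub command per K, the submissions can be looped. This is a sketch that assumes the submission files are named sub_K1.sh to sub_K15.sh and sit in the current directory; submit_all is a hypothetical helper name. Passing "echo qsub" as the second argument gives a dry run that only prints the commands.

```shell
# Submit sub_K1.sh ... sub_K<maxk>.sh using the given submit command.
submit_all() {
    maxk=$1
    submit=${2:-qsub}
    k=1
    while [ "$k" -le "$maxk" ]; do
        if [ -f "sub_K${k}.sh" ]; then
            # Left unquoted on purpose so "echo qsub" splits into two words.
            $submit "sub_K${k}.sh"
        else
            echo "sub_K${k}.sh not found, skipping" >&2
        fi
        k=$((k + 1))
    done
}

# Dry run first: prints the commands instead of submitting.
submit_all 15 "echo qsub"
# Real submission on ICEBERG:
# submit_all 15
```

File names are case sensitive, so a file saved as sub_k1.sh would be reported as missing.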


Check the status of your runs by typing qstat. Running jobs are indicated by an r, while queued jobs are indicated by qw.
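To get a quick count of running and queued jobs, the state column of the qstat listing can be tallied. This sketch assumes the usual Grid Engine layout, in which two header lines are followed by one line per job with the state in the fifth column; check this against your own qstat output before relying on it. count_states is a hypothetical helper name.

```shell
# Tally job states (r, qw, ...) from qstat-style text on standard input.
count_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# On ICEBERG:
#   qstat -u "$USER" | count_states
```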

After all the jobs have finished you will have a final output file for each run and K, identified by an "_f" at the end (e.g. K1_1_f is the final output for the first run of the first K).

Next, you can summarise the likelihood values across all runs by typing qsub(space)get_results_K15.sh.
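The contents of get_results_K15.sh are not shown in this protocol, but the kind of summary it produces can be sketched as follows. The sketch assumes that each final output file (named K, the value of K, an underscore, the run number and "_f") contains STRUCTURE's "Estimated Ln Prob of Data" line. It writes one tab-separated row per run (K, run, likelihood) to likelihood.txt; this column layout and the helper name summarise_likelihoods are assumptions and may differ from the facility's script.

```shell
# Collect Ln P(D) from every K<k>_<run>_f file in the current directory.
summarise_likelihoods() {
    : > likelihood.txt
    for f in K*_*_f; do
        [ -f "$f" ] || continue
        name=${f%_f}              # e.g. K3_7
        k=${name%%_*}; k=${k#K}   # e.g. 3
        run=${name#*_}            # e.g. 7
        lnp=$(grep "Estimated Ln Prob of Data" "$f" | sed 's/.*= *//')
        printf '%s\t%s\t%s\n' "$k" "$run" "$lnp" >> likelihood.txt
    done
    # Order numerically by K, then by run.
    sort -n -k1,1 -k2,2 likelihood.txt -o likelihood.txt
}

summarise_likelihoods
```

Because every K and run found in the folder is picked up by the file-name pattern, this version would not need editing when more than 10 replicates or more than 15 clusters are analysed.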

Copy the content of the likelihood.txt file into an Excel sheet and plot the likelihood values as shown below.


As you can see, this is not an ideal likelihood plot, since the number of genetic clusters in the data cannot be inferred from it. In this case, the Evanno correction should be applied (see NBAF Protocol 3 for more details).