

Overcoming the Challenges in Data Independent Acquisition (DIA) via High Resolution Accurate Mass Orbitrap Based Mass Spectrometer

Yue Xuan,1 Jan Muntel,2 Sebastian T. Berger,2 Andreas FR Huhmer,3 Hanno Steen,2 Thomas Moehring1

1 Thermo Fisher Scientific, Bremen, Germany; 2 Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3 Thermo Fisher Scientific, San Jose, CA

Poster Note 64405


Overview

Purpose: Use the Thermo Scientific™ Q Exactive™ HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry overcomes DIA challenges and enables large-scale urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. To generate a spectral library, all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively map the urinary proteomes.

Results: We developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies with high throughput (less than 4 days for 87 biological sample measurements) and excellent reproducibility (median CV of 7%).

Introduction

Data independent acquisition (DIA) strategies promise comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation windows (usually >10 Da) of DIA experiments co-isolate and co-fragment multiple peptides, which produces highly complex MS/MS spectra and makes DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of the spectral library, the mass accuracy of the data, and the technical variance in quantitation all play important roles. In this work, we use the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry overcomes these DIA challenges and enables large-scale urine proteomics studies.

Methods

Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. In total, samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 μm × 0.5 mm, Reprosil C18, 3 μm; analytical column: Nano cHiPLC column, 75 μm × 15 cm, Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 μL of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm × 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm × 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 μL of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of the spectral library: percentage of peptides and proteins that are detected in all three replicates, in two out of three replicates, or in a single replicate.

FIGURE 3. Reproducibility evaluation with three replicates: technical reproducibility of DDA and DIA quantification based on three replicates (Figure 3A); number of identifications in relation to technical replicates (Figure 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in two out of three replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library (Spec Lib 06). This finding underscores the importance of using a sample-specific spectral library.
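The replicate breakdown described above (detected in 3 of 3, 2 of 3, or 1 of 3 runs) can be sketched in a few lines of Python. This is only an illustration of the counting step, not the tool used for the poster; the function name `detection_breakdown` and the example identifiers are hypothetical.

```python
from collections import Counter


def detection_breakdown(replicates):
    """Percentage of identifiers found in exactly n, n-1, ..., 1 replicates.

    `replicates` is a list of collections, one per replicate run, each
    holding the peptide or protein identifiers detected in that run.
    """
    seen_in = Counter()
    for ids in replicates:
        seen_in.update(set(ids))  # count each identifier once per replicate
    total = len(seen_in)
    if total == 0:
        raise ValueError("no identifiers detected in any replicate")
    by_frequency = Counter(seen_in.values())
    n = len(replicates)
    return {
        f"{k}of{n}": 100.0 * by_frequency.get(k, 0) / total
        for k in range(n, 0, -1)
    }


# Hypothetical identifiers, for illustration only.
runs = [{"A", "B", "C", "D"}, {"A", "B", "C"}, {"A", "B", "E"}]
print(detection_breakdown(runs))  # {'3of3': 40.0, '2of3': 20.0, '1of3': 40.0}
```

A library that yields a larger "3of3" share for the same raw files supports more reproducible detection, which is the comparison summarized in Figure 2.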

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF, using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, about twofold better than the DDA data (15.7% and 16.7%, respectively; Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those of the DDA data (Figure 3B). Our DIA method thus delivers highly reproducible and precise quantification at the peptide level.
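As a rough illustration of how a median CV like the one reported above can be obtained, the sketch below computes the coefficient of variation per peptide across replicate intensities and then takes the median. It assumes the sample standard deviation (n-1 denominator); the poster does not state which estimator was used, and the function names and intensity values are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(quant_table):
    """Median CV over all analytes that have at least two positive-mean values.

    `quant_table` maps an analyte (peptide or protein) to its list of
    replicate intensities.
    """
    cvs = [
        cv_percent(values)
        for values in quant_table.values()
        if len(values) >= 2 and mean(values) > 0
    ]
    return median(cvs)


# Hypothetical triplicate intensities, for illustration only.
example = {
    "peptide_1": [100.0, 110.0, 90.0],
    "peptide_2": [200.0, 200.0, 200.0],
    "peptide_3": [50.0, 60.0, 70.0],
}
print(round(median_cv(example), 1))  # 10.0
```

The median is used rather than the mean because a few low-intensity analytes with very high CVs would otherwise dominate the summary.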

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate fragment ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ±2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with a 10 ppm mass tolerance were the interferences removed; therefore, 10 ppm mass accuracy is the minimum requirement for accurate peptide detection.
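To make the effect of the tolerance concrete, the following sketch converts a ppm tolerance into an m/z extraction window, assuming the tolerance is applied symmetrically (±ppm) around the theoretical fragment m/z. The helper names are hypothetical; the example uses the 623.315 m/z y5+ transition of LVGYLDR from Figure 4.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric +/- ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the observed m/z falls inside the extraction window."""
    low, high = ppm_window(theoretical_mz, tol_ppm)
    return low <= observed_mz <= high


for tol in (50, 20, 10):
    low, high = ppm_window(623.315, tol)
    print(f"{tol} ppm: {low:.4f} - {high:.4f}")
# 50 ppm: 623.2838 - 623.3462
# 20 ppm: 623.3025 - 623.3275
# 10 ppm: 623.3088 - 623.3212
```

At 50 ppm the window is about 0.062 m/z wide, while at 10 ppm it narrows to about 0.012 m/z, so a co-eluting fragment that differs by 0.015 m/z is extracted at 50 ppm but excluded at 10 ppm.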

TABLE 1. Spectral libraries

Library Name   Protein Groups   Peptides   Fragment Ions   Instrument / Source
Spec Lib 01    2,077            15,470     109,064         Q Exactive MS; 23 DDA files; urine
Spec Lib 02    1,436            9,569      65,880          TripleTOF 5600; 23 DDA files; urine
Spec Lib 03    2,226            16,985     123,087         Combined from Spec Lib 01 and 02; urine
Spec Lib 04    14,158           149,420    2,832,306       TripleTOF 5600; see Rosenberger et al., 2014 (1)
Spec Lib 05    1,869            40,902     925,156         TripleTOF 5600; subset of Human-14,000
Spec Lib 06    2,575            19,854     144,643         87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files; urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
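The variable-width window scheme above can be checked with a short sketch: 20 × 20 + 2 × 40 + 2 × 60 = 600 m/z, which matches the 400-1,000 Th range. The code assumes the windows are contiguous, non-overlapping, and start at 400 m/z; the poster does not state whether any overlap was used, and `build_dia_windows` is a hypothetical helper.

```python
def build_dia_windows(start_mz=400.0, scheme=((20, 20.0), (2, 40.0), (2, 60.0))):
    """Build contiguous DIA isolation windows as (lower, upper) m/z pairs.

    `scheme` lists (number_of_windows, isolation_width) groups in
    acquisition order. Contiguity without overlap is an assumption.
    """
    windows = []
    lower = start_mz
    for count, width in scheme:
        for _ in range(count):
            windows.append((lower, lower + width))
            lower += width
    return windows


windows = build_dia_windows()
print(len(windows))  # 24
print(windows[0])    # (400.0, 420.0)
print(windows[-1])   # (940.0, 1000.0)

# Isolation window centers, as they would be entered in an inclusion list.
centers = [(low + high) / 2.0 for low, high in windows]
```

Wider windows at the high-m/z end are a common choice because fewer peptide precursors fall in that region, so the cycle time can be kept short without much additional co-fragmentation.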

FIGURE 4. Ion chromatograms of multiple transitions of peptide LVGYLDR (protein P07602), extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ±2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA Workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (range 848 to 1,720) (Figure 5A) and 5,714 peptides (range 3,172 to 8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. In contrast, in the DDA output only 7% of the proteins were identified in more than 95% of the samples, and 60% were identified in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
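For readers unfamiliar with the Mann-Whitney U test used above, here is a minimal pure-Python sketch. It uses average ranks for ties and a two-sided normal approximation without tie or continuity correction; this is an assumption for illustration and not necessarily how Perseus computes the p value. For very small groups an exact test would be preferable. All function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u = min(u_a, n1 * n2 - u_a)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical protein intensities in one diagnosis group versus all others.
u_stat, p_value = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u_stat)  # 0.0
```

In the study design described above, `group_a` would hold one protein's intensities in the samples of a single diagnosis and `group_b` its intensities in all remaining samples, repeated for every protein.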

[Figure 2 panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); stacked bars show the percentage detected in 3 of 3, 2 of 3, and 1 of 3 replicates.]

[Figure 3A panels: Coefficient of Variance (%) versus Log10 Intensity at the protein and peptide level. Protein level: DDA (peptide peak areas) median CV 16.7%, DIA (fragment ion peak areas) median CV 8.1%. Peptide level: DDA (peptide peak areas) median CV 15.7%, DIA (fragment ion peak areas) median CV 6.7%.]

[Figure 3B panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); x-axis: Replicate (1 to 3); series: DDA, DDA with matching, and DIA, each with its ID increase.]

Data Analysis

Database searches were performed in MaxQuant software, version 1.5.0.0, directly on the RAW and WIFF files using the human UniProt database. Settings: trypsin with up to 2 missed cleavages; mass tolerances of 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). The FDR was set to 1% at the peptide and protein level.

All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). Settings: dynamic score refinement and MS1 scoring enabled; interference correction and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equal to an FDR of 1% at the peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the Mann-Whitney U test (a non-parametric test), and proteins with a p value below 0.01 were considered significantly changed. The annotation of biological processes was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
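The protein roll-up and imputation steps described above can be sketched as follows. This is an illustration under stated assumptions, not the Spectronaut or Perseus implementation: the input is taken to be simple (sample, protein, peptide intensity) rows, and a protein missing in a sample receives the lowest intensity observed for that protein in any other sample. All names and values are hypothetical.

```python
from collections import defaultdict


def sum_peptides_to_proteins(rows):
    """Sum peptide intensities per protein and sample.

    `rows` is an iterable of (sample, protein, peptide_intensity) tuples.
    Returns {protein: {sample: summed_intensity}}.
    """
    table = defaultdict(lambda: defaultdict(float))
    for sample, protein, intensity in rows:
        table[protein][sample] += intensity
    return table


def impute_with_protein_minimum(table, samples):
    """Fill samples lacking a protein with that protein's minimum observed value.

    Returns {protein: [intensity for each sample, in the order of `samples`]}.
    """
    completed = {}
    for protein, per_sample in table.items():
        floor = min(per_sample.values())
        completed[protein] = [per_sample.get(sample, floor) for sample in samples]
    return completed


# Hypothetical rows, for illustration only.
rows = [
    ("S1", "P1", 10.0),
    ("S1", "P1", 5.0),
    ("S2", "P1", 7.0),
    ("S1", "P2", 3.0),  # P2 is not detected in S2
]
matrix = impute_with_protein_minimum(sum_peptides_to_proteins(rows), ["S1", "S2"])
print(matrix)  # {'P1': [15.0, 7.0], 'P2': [3.0, 3.0]}
```

Minimum-value imputation treats a missing value as "at or below the lowest level seen", which is a conservative choice for a rank-based test such as the Mann-Whitney U test.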

Biomarker Discovery

To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB), an intracellular thiol proteinase inhibitor, is present at increased levels in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e., the genitourinary tract.
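The AUROC used above has a simple interpretation: it is the probability that a randomly chosen case sample shows a higher marker intensity than a randomly chosen control sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is a generic illustration, not the software used for Figure 6, and the intensity values are hypothetical.

```python
def auroc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker intensities, for illustration only.
print(auroc([5.0, 6.0], [1.0, 2.0]))                      # 1.0
print(round(auroc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5]), 3))  # 0.889
```

An AUROC of 0.5 corresponds to a marker with no discriminating power, and 1.0 to perfect separation of the two groups.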

[Figure 4 panels: 50 ppm, 20 ppm, 10 ppm. x-axis: Time (min), RT 18.93 to 24.07; y-axis: Relative Abundance. Extracted transitions: y5+ (623.315 m/z), y6+ (722.383 m/z), y6+ -H3PO4 (624.406 m/z), y5++ (312.161 m/z), y3+ -NH3 (386.203 m/z), y4+ (566.293 m/z). Extraction windows for the y5+ transition: 623.2838-623.3462 m/z at 50 ppm, 623.3025-623.3275 m/z at 20 ppm, and 623.3088-623.3212 m/z at 10 ppm.]

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

[Figure 5 panels: 5A, Identified Protein Groups (y-axis: Protein Groups); 5B, Identified Peptides (y-axis: Unique Peptides); 5C, Sample Coverage for DDA and DIA (x-axis: Protein Sample Coverage %, y-axis: Proteins); 5D, Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]

[Figure 6 panels: ROC curves (x-axis: 1-Specificity, y-axis: Sensitivity) and marker intensities per group (OC, UTI, Con, MA, GE, Pain). Markers: Ovarian Cyst (OC), CYTB, AUROC = 0.91; Urinary Tract Infection (UTI), PERM (labeled PERE on the intensity plot), AUROC = 0.968; Constipation (Con), UROK, AUROC = 0.928; Mesenteric Adenitis (MA), LEG3, AUROC = 0.872; Gastroenteritis (GE), KPYR, AUROC = 0.76.]

[Figure 1 workflow: Urinary samples (87) from 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); digestion; DIA (Q Exactive HF MS) with the optimized DIA method (average 8-10 data points per LC peak, 9 s @ FWHM; average cycle time 2 s); search against database using Spectronaut (FDR 1%) with the comprehensive urinary spectral library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF 5600 runs; 19,854 unique peptides; 144,643 transitions); scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus with u-test (non-parametric test); KEGG pathway enrichment (p value). Inset: example extracted ion chromatogram, RT 20.66 to 21.64 min, m/z 506.2709-506.2811.]


Conclusion

We successfully established a DIA workflow based on high resolution accurate mass Orbitrap technology that enables large-scale urine proteomics studies.

Even without any prefractionation of the samples, our DIA workflow detected close to 2,500 proteins (2,456 in total), which is close to complete coverage of the urinary proteome.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.

The high number of proteins quantified per sample in a single DIA experiment (on average 1,300) provided insights into the host cell response in a UTI, based on changes in the urinary proteome, as well as into the formation of an ovarian cyst.

We envision that the comprehensiveness and short analysis time per sample will allow this DIA workflow to be applied to synchronous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Sci Data, 2014, 1:140031, DOI: 10.1038/sdata.2014.31

2. Zheng, J. et al., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.

Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).

Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..

Methods

Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 µm x 0.5 mm, Reprosil C18, 3 µm; analytical column: Nano cHiPLC column, 75 µm x 15 cm, Reprosil C18, 3 µm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 µm x 2 cm, C18, 3 µm) and an analytical column (PepMap RSLC, 75 µm x 25 cm, C18, 2 µm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a randomly chosen subset of 23 samples was analyzed on a TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical reproducibility of DDA and DIA quantification based on three replicates (Figure 3A). Number of identifications in relation to technical replicates (Figure 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
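The replicate-detection percentages described above can be sketched as follows. This is a minimal illustration, assuming each replicate is available as a set of peptide (or protein) identifiers; the function name and the toy identifiers are hypothetical and are not taken from the study data.

```python
from collections import Counter


def detection_reproducibility(replicates):
    """Percentage of identifiers detected in exactly 3, 2, or 1 of three replicates."""
    per_identifier = Counter()
    for detected in replicates:
        # set() guards against an identifier being listed twice in one run
        per_identifier.update(set(detected))
    total = len(per_identifier)
    if total == 0:
        raise ValueError("no identifiers detected in any replicate")
    seen_in = Counter(per_identifier.values())
    return {k: 100.0 * seen_in.get(k, 0) / total for k in (3, 2, 1)}


# Toy example with hypothetical peptide identifiers
runs = [
    {"PEPA", "PEPB", "PEPC", "PEPD"},
    {"PEPA", "PEPB", "PEPC"},
    {"PEPA", "PEPB", "PEPE"},
]
shares = detection_reproducibility(runs)
print(shares)
```

The keys 3, 2, and 1 correspond to the "3of3", "2of3", and "1of3" categories of Figure 2.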

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, about twofold lower than for the DDA data (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those obtained from the DDA data (Figure 3B). Our DIA method thus provides highly reproducible and precise quantification at the peptide level.
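For readers who want to reproduce this kind of metric, the median CV over technical replicates can be computed as sketched below. This is an illustration only: it assumes a table of peak areas per peptide across three replicates, uses the sample standard deviation, and the peak areas shown are invented for the example.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation / mean."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


def median_cv(peak_areas):
    """Median CV across all analytes; peak_areas maps analyte -> replicate areas."""
    return statistics.median([cv_percent(v) for v in peak_areas.values()])


# Hypothetical peak areas for three peptides measured in three replicates
peak_areas = {
    "PEPA": [100.0, 110.0, 90.0],
    "PEPB": [200.0, 200.0, 200.0],
    "PEPC": [50.0, 55.0, 45.0],
}
result = median_cv(peak_areas)
print(f"median CV = {result:.1f}%")
```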

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The wide isolation windows (>10 Th) used in typical DIA experiments produce complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. Separating the analyte of interest from interferences requires highly accurate ion masses. We applied three mass tolerances in the data analysis (50 ppm, 20 ppm, and 10 ppm); one example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at 10 ppm were the interferences removed. A mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
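The effect of the mass tolerance on the extraction window can be illustrated with a short calculation. The sketch below assumes a symmetric window of ± tolerance around the target m/z; the y5+ value of 623.315 m/z is the one shown in Figure 4, while the interfering ion at 623.330 m/z is a hypothetical value chosen for illustration.

```python
def ppm_window(mz, tolerance_ppm):
    """Lower and upper m/z bounds of a symmetric extraction window."""
    delta = mz * tolerance_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, target_mz, tolerance_ppm):
    """True if the observed ion lies within tolerance_ppm of the target m/z."""
    error_ppm = abs(observed_mz - target_mz) / target_mz * 1e6
    return error_ppm <= tolerance_ppm


target = 623.315  # y5+ of LVGYLDR (Figure 4)
for tol in (50, 20, 10):
    low, high = ppm_window(target, tol)
    print(f"{tol:>2} ppm: {low:.4f} - {high:.4f} m/z")

# Hypothetical interfering ion about 24 ppm away from the target
interferent = 623.330
for tol in (50, 20, 10):
    print(tol, "ppm keeps interferent:", within_tolerance(interferent, target, tol))
```

At 50 ppm the window is about 0.062 m/z wide, whereas at 10 ppm it narrows to about 0.012 m/z, so an ion roughly 24 ppm away is extracted together with the target at 50 ppm but excluded at 20 ppm and 10 ppm.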

TABLE 1. Spectral libraries

Library Name | Protein Groups | Peptides | Fragment Ions | Instrument | Source
Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine
Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine
Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined | Spec Lib 01 and 02, urine
Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | see Rosenberger et al., 2014 (1)
Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | subset of Human-14,000
Spec Lib 06 | 2,575 | 19,854 | 144,643 | Q Exactive MS + TripleTOF 5600 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files, urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
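The variable-width window scheme can be checked with a few lines of code. The sketch below assumes contiguous, non-overlapping windows starting at 400 m/z; the poster does not state whether adjacent windows overlap, so this layout is an assumption, and the function name is hypothetical.

```python
def build_dia_windows(start_mz, widths):
    """Return contiguous (lower, upper) isolation windows for the given widths."""
    windows = []
    lower = start_mz
    for width in widths:
        upper = lower + width
        windows.append((lower, upper))
        lower = upper
    return windows


# 20 windows of 20 m/z, then 2 windows of 40 m/z, then 2 windows of 60 m/z
widths = [20] * 20 + [40] * 2 + [60] * 2
windows = build_dia_windows(400, widths)

print(len(windows), "windows covering", windows[0][0], "to", windows[-1][1], "m/z")
for lower, upper in windows[-4:]:
    # Window centre and width, as they would be entered in an inclusion list
    print(f"centre {(lower + upper) / 2:.1f} m/z, width {upper - lower} m/z")
```

The 24 widths sum to 600 m/z, which matches the stated 400-1,000 Th range.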

FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. By contrast, in the DDA output only 7% of the proteins were identified in more than 95% of the samples, and 60% were identified in fewer than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
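The throughput and coverage figures above follow from simple arithmetic, sketched here. The calculation assumes that the 87 runs are acquired back to back at the stated 50 min total run time, with no blanks, QC injections, or idle time in between; that scheduling is an assumption, not something stated in the poster.

```python
def acquisition_days(n_samples, minutes_per_run):
    """Instrument time in days for n_samples consecutive runs."""
    return n_samples * minutes_per_run / (60 * 24)


def coverage_percent(detected, library_size):
    """Share of library proteins that were detected, in percent."""
    return 100.0 * detected / library_size


days = acquisition_days(87, 50)          # 87 samples, 50 min total run time each
coverage = coverage_percent(2456, 2575)  # detected proteins vs. Spec Lib 06

print(f"instrument time: {days:.2f} days")
print(f"library coverage: {coverage:.1f}%")
```

Under these assumptions the pure acquisition time is about 3 days, consistent with the reported "less than 4 days".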

[Figure 2 panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); bar segments: detected in 3 of 3, 2 of 3, and 1 of 3 replicates.]

[Figure 3B panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins) versus Replicate (1, 2, 3); series: DDA, DDA - matching, and DIA, each with ID increase.]

[Figure 3A panels: Coefficient of Variance (%) versus Log10 Intensity. Protein level: DDA (peptide peak areas) median CV 16.7%; DIA (fragment ion peak areas) median CV 8.1%. Peptide level: DDA (peptide peak areas) median CV 15.7%; DIA (fragment ion peak areas) median CV 6.7%.]

Data Analysis

Database searches were performed in MaxQuant software, version 1.5.0.0, directly on the RAW and WIFF files using the human UniProt database. Search settings: trypsin with up to 2 missed cleavages; mass tolerances of 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as a dynamic modification (+15.995 Da) and carbamidomethylation of C as a static modification (+57.021 Da). The FDR was set to 1% at the peptide and protein level.

All DIA data were analyzed directly in Spectronaut 7.0 (Biognosys) with the following settings: dynamic score refinement and MS1 scoring enabled; interference correction and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equivalent to an FDR of 1% at the peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the U test (non-parametric test), and proteins with a p value below 0.01 were considered significantly changed. Biological processes were annotated with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
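The three processing steps described above (summing peptide intensities, minimum imputation, and the U test) can be sketched in plain Python. This is not the Spectronaut or Perseus implementation: the input layout is hypothetical, and the p value uses a normal approximation without tie or continuity correction, which is an assumption made to keep the example self-contained.

```python
import math


def protein_intensities(peptide_rows):
    """Sum peptide intensities per protein and sample.

    peptide_rows: iterable of (protein, sample, intensity) tuples.
    """
    totals = {}
    for protein, sample, intensity in peptide_rows:
        by_sample = totals.setdefault(protein, {})
        by_sample[sample] = by_sample.get(sample, 0.0) + intensity
    return totals


def impute_minimum(totals, samples):
    """Fill samples without a value with the minimum observed for that protein."""
    filled = {}
    for protein, by_sample in totals.items():
        floor = min(by_sample.values())
        filled[protein] = [by_sample.get(s, floor) for s in samples]
    return filled


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a and a two-sided normal-approximation p value."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    n1, n2 = len(group_a), len(group_b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical peptide-level rows: (protein, sample, intensity)
rows = [("P1", "s1", 10.0), ("P1", "s1", 5.0), ("P1", "s2", 7.0), ("P1", "s3", 20.0)]
filled = impute_minimum(protein_intensities(rows), ["s1", "s2", "s3", "s4"])
print(filled)  # s4 has no measurement and receives the protein minimum

u_stat, p_value = mann_whitney_u([5, 6, 7, 8], [1, 2, 3, 4])
print(u_stat, round(p_value, 4))
```

For small groups, an exact U test such as the one offered by statistics packages would be preferable to the normal approximation used here.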

Biomarker Discovery

To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB), an intracellular thiol proteinase inhibitor, is increased in ovarian cyst (OC) samples with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e., the genitourinary tract.
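The AUROC of a single marker can be computed directly from the intensities in the two groups, as sketched below. The sketch uses the rank-based definition (the probability that a randomly chosen case exceeds a randomly chosen control, with ties counted as one half) and also shows a plain Bonferroni correction. The intensities and the number of tests are hypothetical; the number of tests used in the study is not stated here.

```python
def auroc(case_values, control_values):
    """Area under the ROC curve for a marker that is increased in cases."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical log10 intensities of one marker in cases and in all other samples
cases = [5.1, 4.8, 6.0, 3.9]
controls = [3.0, 4.0, 3.5, 5.0, 2.8]
area = auroc(cases, controls)
print(f"AUROC = {area:.2f}")

# Hypothetical number of tested proteins
print(bonferroni(0.001, 20))
```

An AUROC of 0.5 corresponds to no discrimination between the groups and 1.0 to perfect separation.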

[Figure 4 panels: 50 ppm, 20 ppm, 10 ppm. Axes: Relative Abundance versus Time (min), RT 18.93 - 24.07. Transitions: [y5+] – 623.315 m/z; [y6+] – 722.383 m/z; [y6+ -H3PO4] – 624.406 m/z; [y5++] – 312.161 m/z; [y3+ -NH3] – 386.203 m/z; [y4+] – 566.293 m/z.]

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

[Figure 5 panels: 5A, Identified Protein Groups (y-axis: Protein Groups); 5B, Identified Peptides (y-axis: Unique Peptides); 5C, Sample Coverage, DDA versus DIA (Proteins versus Protein Sample Coverage %); 5D, Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]

[Figure 6 panels: ROC curves (Sensitivity versus 1-Specificity) and marker intensities on a log scale across the groups OC, UTI, Con, MA, GE, and Pain. Ovarian Cyst (OC) marker: CYTB, AUROC = 0.91. UTI marker: PERM (panel label: PERE), AUROC = 0.968. Constipation (Con) marker: UROK, AUROC = 0.928. Mesenteric Adenitis (MA) marker: LEG3, AUROC = 0.872. Gastroenteritis (GE) marker: KPYR, AUROC = 0.76.]

[Figure 1 workflow: Urinary samples (87) from 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); Digestion; DIA (Q Exactive HF MS) with the optimized DIA method (average 8-10 data points per LC peak, 9 s @ FWHM; average cycle time 2 s); search against the Comprehensive Urinary Spectral Library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF™ 5600 runs; 19,854 unique peptides; 144,643 transitions) using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, U test (non-parametric test); KEGG pathway enrichment (p value scale 0.001, 0.05, 1). Example extracted ion chromatogram: RT 20.66 - 21.64 min, m/z 506.2709-506.2811.]


Conclusion

We successfully established a DIA workflow based on high-resolution, accurate-mass Orbitrap technology that enables large-scale urine proteomics studies.

Even without any prefractionation of the samples, our DIA workflow detected 2,456 proteins (95% of the spectral library), which is close to complete coverage of the urinary proteome.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and focus on the analysis of biological samples.

The high number of proteins quantified per single DIA experiment (on average 1,300 per sample) provided insights into the host cell response in a UTI and into the formation of an ovarian cyst, based on changes in the urinary proteome.

We envision that the comprehensiveness and low analysis time per sample will allow this DIA workflow to be applied in synchronous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Sci Data, 2014, 1:140031. DOI: 10.1038/sdata.2014.31

2. Zheng, J., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.

Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).

Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..

Methods Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally a subset of randomly chosen 23 samples was analyzed on a commercial TripleTOF TM 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with following settings: MS1 mass range 400-1,000 Th with 250 ms acc. time; MS2 mass range 100-1,700 Th with 50 ms acc. time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2-5. Dynamic exclusion set to 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility Of DDA And DIA Quantification Based On Three Replicates (Fig 3A). Number of Identification in Relation to Technical Replicates ( Fig 3B)

Results Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library compromised 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates or in only one single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library .

High Reproducible Peptide Detection and Quantitation with DIA experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the peptide/protein quantification of the DIA data was only 6.7% and 8.1%, which is twofold better than from the DDA data (Figure 3A). The number of detected peptides (~5,220) and proteins (1,120) with a single DIA experiment are almost twofold as the DDA data (Figure 3B). Our DIA method show a highly reproducible and precise quantification on peptide level.

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With ± 2.5 mins retention time window and 50 ppm mass tolerance for extracting the multiple transitions of a peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with 10 ppm mass accuracy, the interferences are removed from the spectrum, therefore, 10 ppm mass accuracy is minimum requirement for an accurate detection of peptides .

TABLE 1. Spectral libraries

Library Name Protein Groups Peptides Fragment ions Instrument Source

Spec Lib 01 2,077 15,470 109,064 Q Exactive MS 23 DDA files

Urine

Spec Lib 02 1,436 9,569 65,880 TripleTOF 5600 23 DDA files

Urine

Spec Lib 03 2,226 16,985 123,087 Combined from Spec Lib 01 and 02

Urine

Spec Lib 04 14,158 149,420 2,832,306 TripleTOF 5600 see Rosenberger et al., 2014 (1)

Spec Lib 05 1,869 40,902 925,156 TripleTOF 5600 subset of Human-14,000

Spec Lib 06 2,575 19,854 144,643 87 Q Exactive MS DDA files + 23 TripleToF 5600

DDA files

Urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24DIA MS/MS scans to covering the mass range 400 -1000 TH. Full scan with a resolution 30,000 @ 200 Th; AGC target – 3e6, maximal IT – 50ms; mass range 400-1,000 Th; followed by DIA scans with resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60m/z for the last two DIA scans; NCE 30; target value 1e6, maximal injection time set to “auto”, which automatically calculates the maximal injection time based on the detection time to allow the mass spectrometer always operating in the parallel ion filling and detection mode.

FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein (P07602) are extracted with 50 ppm, 20 ppm, and 10 ppm. Retention time window is ± 2.5 mins. 10 ppm mass accuracy is the minimum requirement for an accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The establishment of the DIA workflow (Figure 1) was applied to a urinary study compromising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format we were able to analyze them in less than 4 days by our optimized DIA workflow. In average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers are contrasted by the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples and 58 in the gastroenteritis samples (Figure 5D).

73

77

71

26

61

69

12 10

13 22 15

13 15 13

16

52

24 18

0

1000

2000

3000

4000

5000

Iden

tifie

d Pe

ptid

es

Peptide Level

3of3 2of3 1of3

80

81

75

22

62

75

8 8 10

18

12 9 12 12

15

60

25 15

0

400

800

1200

1600

Iden

tifie

d Pr

otei

ns

Protein Level

3B

1 2 3 0

1000

2000

3000

4000

5000

6000

7000

Replicate

Iden

tifie

d Pe

ptid

es

Peptide Level 3A

1 2 3 0

200

400

600

800

1000

1200

1400

Replicate

Iden

tifie

d Pr

otei

ns

Protein Level

DDA ID increase DDA DDA - matching ID increase DDA - matching DIA ID increase DIA

6 8 10

Median CV 16.7%

20

0

40

60

80

100

120

140

Protein Level

DDA – Peptide Peak Areas

Log10 Protein Intensity

Coe

ffici

ent O

f Var

ianc

e %

2 4 6 8

Median CV

8.1% 20

0

40

60

80

100

120

140 DIA – Fragment Ion Peak Areas

Log10 Protein Intensity

10 7 9

Median CV 15.7%

20

0

40

60

80

100

120

140

6 8 7 4 6 8

Median CV

6.7% 20

0

40

60

80

100

120

140

3 5

Peptide Level

DIA – Fragment Ion Peak Areas

DDA – Peptide Peak Areas

Log10 Peptide Intensity

Coe

ffici

ent O

f Var

ianc

e %

Log10 Peptide Intensity

Data Analysis

Database search in MaxQuant software, version 1.5.0.0 was performed directly with the RAW- and WIFF files using the human UniProt database. Trypsin with up to 2 missed cleavages; mass tolerances set to 20 ppm for the first search and 4.5 ppm for the second search for the Q-Exactive data and 0.1 Da for the first search and 0.01 for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% on peptide and protein level.

All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). dynamic score refinement and MS1 scoring – enabled; interference correction and cross run normalization (total peak area) – enabled; All results were filtered by a Q value of 0.01 (equals a FDR of 1% on peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6 and missing values imputed by the minimum value for each protein. Significance of protein abundance changes were calculated using the u-test (non-parametric test) and protein with a p value below 0.01 were considered to be significantly changed. The annotation of biological process was done with the DAVID online tool using the comprehensive urinary spectral library- Spec Lib 06 in Table 1).

Biomarker Discovery

To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB), an intracellular thiol proteinase inhibitor, is increased in ovarian cyst (OC) samples with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e. the genitourinary tract.
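As an illustration of the AUROC metric used here, the sketch below computes it as the fraction of (case, control) pairs in which the case has the higher marker intensity, with ties counted as one half. This is equivalent to the Mann-Whitney U statistic divided by the number of pairs. The function name and the intensity values are hypothetical and are not data from this study.

```python
def auroc(positives, negatives):
    """Area under the ROC curve via pairwise comparison.

    Returns the probability that a randomly chosen positive (case)
    has a higher value than a randomly chosen negative (control);
    ties contribute 0.5.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical marker intensities (arbitrary units).
cases = [5.0, 6.0, 7.0, 3.0]
controls = [1.0, 2.0, 4.0, 3.0]
print(auroc(cases, controls))
```

The pairwise form is quadratic in the number of samples, which is unproblematic for cohorts of this size; a rank-based implementation would be preferable for much larger data sets.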

[Figure 4 panels: extracted ion chromatograms (relative abundance versus time, RT 18.93 - 24.07 min, run DIA_B9) at 50 ppm, 20 ppm, and 10 ppm mass tolerance. Transitions: [y5+] 623.315 m/z; [y6+] 722.383 m/z; [y6+ -H3PO4] 624.406 m/z; [y5++] 312.161 m/z; [y3+ -NH3] 386.203 m/z; [y4+] 566.293 m/z. Extraction windows for [y5+]: 623.2838-623.3462 m/z (50 ppm), 623.3025-623.3275 m/z (20 ppm), 623.3088-623.3212 m/z (10 ppm).]

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

[Figure 5 panels. 5A: identified protein groups. 5B: identified unique peptides. 5C: sample coverage of proteins, DDA versus DIA (protein sample coverage %). 5D: significantly changed proteins (Mann-Whitney U test, p < 0.05). Groups: pain control group, ovarian cyst, urinary tract infection, constipation, mesenteric adenitis, gastroenteritis.]

[Figure 6 panels: ROC curves (sensitivity versus 1-specificity) and marker intensities across the groups OC, UTI, Con, MA, GE, and Pain. Ovarian cyst (OC) marker CYTB, AUROC = 0.91; UTI marker PERM (panel label: PERE), AUROC = 0.968; constipation (Con) marker UROK, AUROC = 0.928; mesenteric adenitis (MA) marker LEG3, AUROC = 0.872; gastroenteritis (GE) marker KPYR, AUROC = 0.76.]

[Figure 1 elements: urinary samples (87) from ER patients in six groups (constipation, mesenteric adenitis, gastroenteritis, pain control group, ovarian cyst, urinary tract infection); digestion; DIA (Q Exactive HF MS) with the optimized DIA method (average 8-10 data points per LC peak, 9 s at FWHM; average cycle time 2 s); comprehensive urinary spectral library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF 5600 runs; 19,854 unique peptides; 144,643 transitions); search against the library using Spectronaut (FDR 1%); scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, u-test (non-parametric test); KEGG pathway enrichment (p value scale 0.001, 0.05, 1). Inset: extracted ion chromatogram, m/z 506.2709-506.2811, RT 20.66 - 21.64 min, run DIA_B9.]


Conclusion

We successfully established a DIA workflow based on high resolution accurate mass Orbitrap technology that enables large-scale urine proteomics studies.

Even without any prefractionation of the samples, our DIA workflow detected nearly 2,500 proteins (2,456 in total), which is close to complete coverage of the urinary proteome.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.

The high number of proteins quantified per sample in a single DIA experiment (1,300 on average) provided insights into the host cell response in a UTI, based on changes in the urinary proteome, as well as into the formation of an ovarian cyst.

We envision that the comprehensiveness and short analysis time per sample will allow this DIA workflow to be applied to simultaneous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Sci. Data, 2014, 1, 140031. DOI: 10.1038/sdata.2014.31

2. Zheng, J., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009


Methods

Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. In total, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 μm x 0.5 mm, Reprosil C18, 3 μm; analytical column: Nano cHiPLC column, 75 μm x 15 cm, Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for urinary biomarker discovery.

FIGURE 2. Influence of the spectral library: percentage of peptides and proteins detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.

FIGURE 3. Reproducibility evaluation with three replicates: technical reproducibility of DDA and DIA quantification based on three replicates (Fig 3A); number of identifications in relation to technical replicates (Fig 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised about 2,600 protein groups and 20,000 peptides (2,575 and 19,854, respectively; Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF, using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, roughly half that of the DDA data (15.7% and 16.7%, respectively) (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those of the DDA data (Figure 3B). Our DIA method thus shows highly reproducible and precise quantification at the peptide level.
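The median CV reported above can be reproduced in principle with a few lines of Python. The sketch below assumes the CV is the sample standard deviation (n - 1 denominator) divided by the mean of the replicate intensities; the poster does not state which estimator was used, and the analyte names and intensities are invented for illustration.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variance in percent for one analyte across replicates."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(replicates_by_analyte):
    """Median CV over all analytes (peptides or proteins)."""
    return median(cv_percent(v) for v in replicates_by_analyte.values())


# Hypothetical peak areas from three technical replicates.
areas = {
    "pepA": [100.0, 110.0, 90.0],
    "pepB": [200.0, 200.0, 200.0],
    "pepC": [50.0, 60.0, 70.0],
}
print(median_cv(areas))
```

Using the median rather than the mean keeps a handful of poorly quantified, low-intensity analytes from dominating the summary value.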

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a retention time window of ± 2.5 min and a mass tolerance of 50 ppm for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at 10 ppm are the interferences removed from the extracted chromatograms; a mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
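To make the effect of the mass tolerance concrete, the following sketch converts a ppm tolerance into an m/z extraction window. The function name is hypothetical; the example uses the [y5+] fragment of LVGYLDR at 623.315 m/z, taken from the Figure 4 legend.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z extraction window for a +/- ppm tolerance."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


# Window width shrinks in proportion to the tolerance.
for tolerance in (50, 20, 10):
    low, high = ppm_window(623.315, tolerance)
    print(f"{tolerance} ppm: {low:.4f}-{high:.4f} m/z")
```

At 50 ppm the window is about 0.062 m/z wide for this fragment, compared with about 0.012 m/z at 10 ppm, which is why fewer co-fragmented ions fall inside the narrower window.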

TABLE 1. Spectral libraries

| Library Name | Protein Groups | Peptides | Fragment Ions | Instrument | Source |
|---|---|---|---|---|---|
| Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine |
| Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine |
| Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine |
| Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | See Rosenberger et al., 2014 (1) |
| Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | Subset of Human-14,000 |
| Spec Lib 06 | 2,575 | 19,854 | 144,643 | Q Exactive MS and TripleTOF 5600 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files, urine |

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60 m/z for the last 2 DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which automatically calculates the maximal injection time based on the detection time so that the mass spectrometer always operates in the parallel ion filling and detection mode.
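The isolation scheme above can be checked with a short sketch that lays out the 24 windows and confirms that they span 400-1,000 m/z. It assumes contiguous, non-overlapping windows starting at 400 m/z; the poster does not state whether adjacent windows overlap, and the function name is hypothetical.

```python
def dia_windows(start, widths):
    """Build contiguous (low, high) isolation windows from a list of widths."""
    windows = []
    low = start
    for width in widths:
        windows.append((low, low + width))
        low += width
    return windows


# 20 windows of 20 m/z, then 2 of 40 m/z, then 2 of 60 m/z.
widths = [20] * 20 + [40] * 2 + [60] * 2
windows = dia_windows(400, widths)
print(len(windows), windows[0], windows[-1])
```

Narrow windows are placed where the scheme starts (the low m/z range) and wider ones at the high end, so the total of 20 x 20 + 2 x 40 + 2 x 60 = 600 m/z matches the 400-1,000 m/z scan range exactly.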

FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR (protein P07602), extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848 - 1,720) (Figure 5A) and 5,714 peptides (3,172 - 8,231) (Figure 5B) per sample. In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p < 0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
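The throughput and coverage figures in this paragraph follow from simple arithmetic, sketched below. The calculation assumes the 87 runs are acquired back to back at the stated total run time of 50 min per sample, with no blanks, QC injections, or idle time; those details are not given in the poster.

```python
samples = 87
run_minutes = 50  # total run time per sample, including loading and washing

# Pure instrument time for all samples, in days.
total_days = samples * run_minutes / 60 / 24

# Share of the spectral library (2,575 protein groups) detected by DIA.
coverage_percent = 100 * 2456 / 2575

print(f"{total_days:.2f} days, {coverage_percent:.1f}% library coverage")
```

Under these assumptions the acquisition takes about 3.0 days, consistent with the stated "less than 4 days", and the detected proteins correspond to about 95% of the library.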

[Figure 2 panels: identified peptides (peptide level) and identified proteins (protein level) for the six spectral libraries, split into detection in 3 of 3, 2 of 3, and 1 of 3 replicates. Bar labels in % (in plotted order). Peptide level: 3 of 3: 73, 77, 71, 26, 61, 69; 2 of 3: 12, 10, 13, 22, 15, 13; 1 of 3: 15, 13, 16, 52, 24, 18. Protein level: 3 of 3: 80, 81, 75, 22, 62, 75; 2 of 3: 8, 8, 10, 18, 12, 9; 1 of 3: 12, 12, 15, 60, 25, 15.]

[Figure 3B panels: identified peptides (peptide level) and identified proteins (protein level) versus replicate (1, 2, 3).]


Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA

Conclusion We successfully established a DIA workflow, which enables large scope of urine

proteomics studies based on high resolution accurate mass Orbitrap technology.

Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.

The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.

The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.

We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.

References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31

2. Zheng, J. BMC genomics 2013,14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.

Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).

Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..

Methods Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 μm x 0.5 mm, Reprosil C18, 3 μm; analytical column: Nano cHiPLC column, 75 μm x 15 cm, Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
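The replicate categories used for Figure 2 can be tallied with a short script. The following is a minimal Python sketch, not the authors' analysis code: the function name `replicate_detection_shares` and the toy identifiers are hypothetical, and it assumes each replicate is available as a set of detected peptide (or protein) identifiers.

```python
from collections import Counter


def replicate_detection_shares(replicates):
    """Return the percentage of identifiers seen in 1, 2, ... n replicates.

    `replicates` is a list of sets, one set of detected identifiers per
    replicate run (hypothetical input format).
    """
    # Count in how many replicates each identifier was detected.
    hits = Counter()
    for detected in replicates:
        hits.update(set(detected))

    total = len(hits)
    if total == 0:
        return {k: 0.0 for k in range(1, len(replicates) + 1)}

    # Group identifiers by their replicate count and convert to percentages.
    by_count = Counter(hits.values())
    return {
        k: 100.0 * by_count.get(k, 0) / total
        for k in range(1, len(replicates) + 1)
    }


# Toy example with made-up identifiers (not data from this study).
runs = [
    {"PEPA", "PEPB", "PEPC", "PEPD"},
    {"PEPA", "PEPB", "PEPC"},
    {"PEPA", "PEPB", "PEPE"},
]
shares = replicate_detection_shares(runs)
print(shares)
```

The returned dictionary is keyed by the number of replicates in which an identifier was found, so key 3 corresponds to the "3of3" category.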

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To assess how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CVs of the DIA quantification were only 6.7% at the peptide level and 8.1% at the protein level, about half of those obtained from the DDA data (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those of the DDA data (Figure 3B). Our DIA method thus provides highly reproducible and precise quantification at the peptide level.
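For readers who want to reproduce this kind of summary, the following minimal Python sketch computes a median CV across analytes from technical replicates. It is an illustration only: the function names and the toy intensities are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not state how the CV was computed.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


def median_cv(quant_table):
    """Median CV across analytes.

    `quant_table` maps an analyte name to its intensities in the
    technical replicates (hypothetical input format).
    """
    return statistics.median(cv_percent(v) for v in quant_table.values())


# Made-up intensities for three technical replicates (not study data).
toy = {
    "peptide_1": [90.0, 100.0, 110.0],
    "peptide_2": [190.0, 200.0, 210.0],
    "peptide_3": [95.0, 100.0, 105.0],
}
print(round(median_cv(toy), 1))
```

The median is used rather than the mean because a few low-intensity analytes with very large CVs would otherwise dominate the summary.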

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at a 10 ppm mass tolerance were the interferences removed from the extracted ion chromatograms. Therefore, 10 ppm mass accuracy is the minimum requirement for accurate peptide detection.
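To make the tolerances concrete, the following minimal Python sketch converts a ppm tolerance into an m/z extraction window. The function name `ppm_window` is hypothetical, and the symmetric window (center ± center × ppm × 1e-6) is an assumption about how the extraction limits are defined. The example m/z of 623.315 is the y5+ fragment of LVGYLDR listed for Figure 4.

```python
def ppm_window(mz, tol_ppm):
    """Return (low, high) m/z limits for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6  # absolute half-width in m/z units
    return mz - delta, mz + delta


# y5+ fragment of LVGYLDR (m/z taken from the Figure 4 legend).
for tol in (50, 20, 10):
    low, high = ppm_window(623.315, tol)
    print(f"{tol} ppm: {low:.4f}-{high:.4f} (width {high - low:.4f})")
```

Under this assumption, the computed limits agree with the extraction ranges reported for this fragment in Figure 4 (for example 623.2838-623.3462 at 50 ppm). The window narrows from about 0.062 m/z at 50 ppm to about 0.012 m/z at 10 ppm, which is why fewer co-eluting interferences fall inside it.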

TABLE 1. Spectral libraries

Library Name   Protein Groups   Peptides   Fragment Ions   Instrument / Source
Spec Lib 01    2,077            15,470     109,064         Q Exactive MS; 23 DDA files; urine
Spec Lib 02    1,436            9,569      65,880          TripleTOF 5600; 23 DDA files; urine
Spec Lib 03    2,226            16,985     123,087         Combined from Spec Lib 01 and 02; urine
Spec Lib 04    14,158           149,420    2,832,306       TripleTOF 5600; see Rosenberger et al., 2014 (1)
Spec Lib 05    1,869            40,902     925,156         TripleTOF 5600; subset of Human-14,000
Spec Lib 06    2,575            19,854     144,643         87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files; urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. The full scan was acquired with a resolution of 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. It was followed by DIA scans with a resolution of 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; maximal injection time set to "auto", which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
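The window scheme described above can be written out explicitly. The following minimal Python sketch builds the 24 isolation windows and shows that they tile the 400-1,000 Th range. The function name `build_dia_windows` is hypothetical, and the windows are assumed to be contiguous and non-overlapping, which the source does not state.

```python
def build_dia_windows(start_mz, scheme):
    """Build contiguous isolation windows.

    `scheme` is a list of (number_of_windows, width) pairs. Windows are
    assumed to be contiguous and non-overlapping (an assumption; the
    source does not describe window overlaps).
    Returns a list of (low, high, center) tuples.
    """
    windows = []
    low = start_mz
    for count, width in scheme:
        for _ in range(count):
            high = low + width
            windows.append((low, high, (low + high) / 2))
            low = high
    return windows


# 20 x 20 m/z, then 2 x 40 m/z, then 2 x 60 m/z, starting at 400 Th.
windows = build_dia_windows(400.0, [(20, 20.0), (2, 40.0), (2, 60.0)])
print(len(windows), windows[0], windows[-1])
```

The widths add up to 20 × 20 + 2 × 40 + 2 × 60 = 600 m/z, which matches the 400-1,000 Th range. The wider windows sit at the high m/z end of the range.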

FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (range 848-1,720) (Figure 5A) and 5,714 peptides (range 3,172-8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. In contrast, in the DDA output only 7% of the proteins were identified in more than 95% of the samples, and 60% were identified in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).

[Figure 2 graphic. Panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); bar segments show the percentage detected in 3 of 3, 2 of 3, and 1 of 3 replicates.]

[Figure 3A graphic. Coefficient of Variance (%) plotted against Log10 Peptide Intensity and Log10 Protein Intensity for DDA (Peptide Peak Areas) and DIA (Fragment Ion Peak Areas). Median CV labels: 15.7% (DDA) and 6.7% (DIA) at the peptide level; 16.7% (DDA) and 8.1% (DIA) at the protein level.]

[Figure 3B graphic. Identified Peptides and Identified Proteins per Replicate (1-3); series: DDA, DDA - matching, and DIA, each with ID increase.]

Data Analysis

Database searches were performed in MaxQuant software, version 1.5.0.0, directly with the RAW and WIFF files using the human UniProt database. Search settings were: trypsin with up to 2 missed cleavages; mass tolerances of 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% at the peptide and protein level.

All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys) with the following settings: dynamic score refinement and MS1 scoring enabled; interference correction and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equal to an FDR of 1% at the peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed by the minimum value for each protein. The significance of protein abundance changes was calculated using the U test (non-parametric test), and proteins with a p value below 0.01 were considered to be significantly changed. The annotation of biological processes was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
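The summation and imputation steps can be illustrated with a small script. The following minimal Python sketch is not the Spectronaut or Perseus implementation: the function names and the (protein, sample, intensity) input format are hypothetical, and "minimum value for each protein" is interpreted here as the lowest observed intensity of that protein across samples, which is an assumption.

```python
from collections import defaultdict


def sum_peptides_to_proteins(peptide_rows):
    """Sum peptide intensities per (protein, sample).

    `peptide_rows` is an iterable of (protein, sample, intensity) tuples,
    a hypothetical simplification of a peptide-level export.
    """
    table = defaultdict(dict)
    for protein, sample, intensity in peptide_rows:
        table[protein][sample] = table[protein].get(sample, 0.0) + intensity
    return table


def impute_minimum(table, samples):
    """Fill missing samples with the minimum observed value of that protein."""
    completed = {}
    for protein, observed in table.items():
        floor = min(observed.values())
        completed[protein] = {s: observed.get(s, floor) for s in samples}
    return completed


# Made-up peptide intensities (not study data).
rows = [
    ("P1", "S1", 10.0), ("P1", "S1", 5.0), ("P1", "S2", 8.0),
    ("P2", "S1", 3.0),
]
proteins = impute_minimum(sum_peptides_to_proteins(rows), ["S1", "S2"])
print(proteins)
```

In this toy input, protein P2 has no value in sample S2, so it receives the protein's own minimum (3.0) rather than a global constant.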

Biomarker Discovery

To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. Its level is increased in ovarian cyst (OC) samples with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e., the genitourinary tract.
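As an illustration of the AUROC metric, the following minimal Python sketch computes it by pairwise comparison of cases and controls, which is equivalent to the Mann-Whitney U statistic divided by the number of case-control pairs. The source does not state how the AUROC values were computed, so this is a sketch under assumptions: the function name `auroc` and the toy intensities are hypothetical, and a higher intensity is assumed to indicate the condition.

```python
def auroc(case_values, control_values):
    """AUROC by pairwise comparison (Mann-Whitney U divided by n1 * n2).

    Returns the probability that a randomly chosen case has a higher
    value than a randomly chosen control; ties count as one half.
    """
    if not case_values or not control_values:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Made-up protein intensities (not study data).
cases = [5.0, 7.0, 9.0]
controls = [1.0, 4.0, 6.0, 2.0]
print(auroc(cases, controls))
```

A value of 0.5 corresponds to no separation between the groups and 1.0 to perfect separation. The pairwise loop is quadratic in the group sizes, which is not a concern for cohorts of this size.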

[Figure 4 graphic. Three panels (50 ppm, 20 ppm, 10 ppm); x-axis: Time (min), RT 18.93-24.07; y-axis: Relative Abundance. Transitions: [y5+] 623.315 m/z; [y6+] 722.383 m/z; [y6+ -H3PO4] 624.406 m/z; [y5++] 312.161 m/z; [y3+ -NH3] 386.203 m/z; [y4+] 566.293 m/z. Extraction range for [y5+]: 623.2838-623.3462 (50 ppm), 623.3025-623.3275 (20 ppm), 623.3088-623.3212 (10 ppm).]

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

[Figure 5 graphic. 5A: Identified Protein Groups per sample; 5B: Identified Peptides (unique peptides) per sample; 5C: Sample Coverage (number of proteins vs. protein sample coverage %, DDA and DIA); 5D: Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]

[Figure 6 graphic. ROC curves (Sensitivity vs. 1-Specificity) and marker intensity per group (OC, UTI, Con, MA, GE, Pain) for one marker per condition: Ovarian Cyst (OC), CYTB, AUROC = 0.91; Urinary Tract Infection (UTI), PERM (panel label: PERE), AUROC = 0.968; Constipation (Con), UROK, AUROC = 0.928; Mesenteric Adenitis (MA), LEG3, AUROC = 0.872; Gastroenteritis (GE), KPYR, AUROC = 0.76.]

[Figure 1 graphic. Workflow: Urinary Samples (87) from 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); Digestion; DIA (Q Exactive HF MS); search against the Comprehensive Urinary Spectral Library (2,575 protein groups, 19,854 unique peptides, 144,643 transitions; 87 Q Exactive MS and 23 TripleTOF 5600 runs) using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, U test (non-parametric test); KEGG Pathway Enrichment (p value). Optimized DIA method: on average 8-10 data points per LC peak (9 s @ FWHM), average cycle time 2 s.]


Conclusion

We successfully established a DIA workflow that enables large-scale urine proteomics studies based on high resolution accurate mass Orbitrap technology.

Even without any prefractionation of the samples, we detected nearly 2,500 proteins (2,456 in total), which is close to complete coverage of the urinary proteome by our DIA workflow.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.

The high number of quantified proteins per sample (on average 1,300 per single DIA experiment) provided insights into the host cell response in a UTI, based on changes in the urinary proteome, as well as into the formation of an ovarian cyst.

We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Scientific Data, 2014, 1:140031. DOI: 10.1038/sdata.2014.31

2. Zheng, J., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.

Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).

Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..

Methods Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally a subset of randomly chosen 23 samples was analyzed on a commercial TripleTOF TM 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with following settings: MS1 mass range 400-1,000 Th with 250 ms acc. time; MS2 mass range 100-1,700 Th with 50 ms acc. time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2-5. Dynamic exclusion set to 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility Of DDA And DIA Quantification Based On Three Replicates (Fig 3A). Number of Identification in Relation to Technical Replicates ( Fig 3B)

Results Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library compromised 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates or in only one single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library .

High Reproducible Peptide Detection and Quantitation with DIA experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the peptide/protein quantification of the DIA data was only 6.7% and 8.1%, which is twofold better than from the DDA data (Figure 3A). The number of detected peptides (~5,220) and proteins (1,120) with a single DIA experiment are almost twofold as the DDA data (Figure 3B). Our DIA method show a highly reproducible and precise quantification on peptide level.

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With ± 2.5 mins retention time window and 50 ppm mass tolerance for extracting the multiple transitions of a peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with 10 ppm mass accuracy, the interferences are removed from the spectrum, therefore, 10 ppm mass accuracy is minimum requirement for an accurate detection of peptides .

TABLE 1. Spectral libraries

Library Name Protein Groups Peptides Fragment ions Instrument Source

Spec Lib 01 2,077 15,470 109,064 Q Exactive MS 23 DDA files

Urine

Spec Lib 02 1,436 9,569 65,880 TripleTOF 5600 23 DDA files

Urine

Spec Lib 03 2,226 16,985 123,087 Combined from Spec Lib 01 and 02

Urine

Spec Lib 04 14,158 149,420 2,832,306 TripleTOF 5600 see Rosenberger et al., 2014 (1)

Spec Lib 05 1,869 40,902 925,156 TripleTOF 5600 subset of Human-14,000

Spec Lib 06 2,575 19,854 144,643 87 Q Exactive MS DDA files + 23 TripleToF 5600

DDA files

Urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24DIA MS/MS scans to covering the mass range 400 -1000 TH. Full scan with a resolution 30,000 @ 200 Th; AGC target – 3e6, maximal IT – 50ms; mass range 400-1,000 Th; followed by DIA scans with resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60m/z for the last two DIA scans; NCE 30; target value 1e6, maximal injection time set to “auto”, which automatically calculates the maximal injection time based on the detection time to allow the mass spectrometer always operating in the parallel ion filling and detection mode.

FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein (P07602) are extracted with 50 ppm, 20 ppm, and 10 ppm. Retention time window is ± 2.5 mins. 10 ppm mass accuracy is the minimum requirement for an accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The establishment of the DIA workflow (Figure 1) was applied to a urinary study compromising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format we were able to analyze them in less than 4 days by our optimized DIA workflow. In average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers are contrasted by the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples and 58 in the gastroenteritis samples (Figure 5D).

73

77

71

26

61

69

12 10

13 22 15

13 15 13

16

52

24 18

0

1000

2000

3000

4000

5000

Iden

tifie

d Pe

ptid

es

Peptide Level

3of3 2of3 1of3

80

81

75

22

62

75

8 8 10

18

12 9 12 12

15

60

25 15

0

400

800

1200

1600

Iden

tifie

d Pr

otei

ns

Protein Level

3B

1 2 3 0

1000

2000

3000

4000

5000

6000

7000

Replicate

Iden

tifie

d Pe

ptid

es

Peptide Level 3A

1 2 3 0

200

400

600

800

1000

1200

1400

Replicate

Iden

tifie

d Pr

otei

ns

Protein Level

DDA ID increase DDA DDA - matching ID increase DDA - matching DIA ID increase DIA

6 8 10

Median CV 16.7%

20

0

40

60

80

100

120

140

Protein Level

DDA – Peptide Peak Areas

Log10 Protein Intensity

Coe

ffici

ent O

f Var

ianc

e %

2 4 6 8

Median CV

8.1% 20

0

40

60

80

100

120

140 DIA – Fragment Ion Peak Areas

Log10 Protein Intensity

10 7 9

Median CV 15.7%

20

0

40

60

80

100

120

140

6 8 7 4 6 8

Median CV

6.7% 20

0

40

60

80

100

120

140

3 5

Peptide Level

DIA – Fragment Ion Peak Areas

DDA – Peptide Peak Areas

Log10 Peptide Intensity

Coe

ffici

ent O

f Var

ianc

e %

Log10 Peptide Intensity

Data Analysis

Database search in MaxQuant software, version 1.5.0.0 was performed directly with the RAW- and WIFF files using the human UniProt database. Trypsin with up to 2 missed cleavages; mass tolerances set to 20 ppm for the first search and 4.5 ppm for the second search for the Q-Exactive data and 0.1 Da for the first search and 0.01 for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% on peptide and protein level.

All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). dynamic score refinement and MS1 scoring – enabled; interference correction and cross run normalization (total peak area) – enabled; All results were filtered by a Q value of 0.01 (equals a FDR of 1% on peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6 and missing values imputed by the minimum value for each protein. Significance of protein abundance changes were calculated using the u-test (non-parametric test) and protein with a p value below 0.01 were considered to be significantly changed. The annotation of biological process was done with the DAVID online tool using the comprehensive urinary spectral library- Spec Lib 06 in Table 1).

Biomarker Discovery

To find the best suitable biomarker candidates we focused on the ten highest significantly changed proteins for each of the five conditions. The performance of the biomarker candidates has been assessed by calculating the area under the receiver-operating characteristic (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. It is increased level in Ovarian Cyst (OC) with a pValue equals to 1.3e-5 (Bonferroni-corrected: p=0.027), and AUROC is 0.91. Cystatin-B has been identified as biomarker candidate in the context of malignant growths in ovaries (3) and bladder (4), i.e. genitourinary tract.

Figure 4 panels (plot residue): extracted fragment ion chromatograms of peptide LVGYLDR from raw file DIA_B9 in three panels, labeled 50 ppm, 20 ppm, and 10 ppm. x axis: Time (min), RT 18.93 - 24.07; y axis: Relative Abundance (0 - 100).

Traces: [y5+] 623.315 m/z; [y6+] 722.383 m/z; [y6+ -H3PO4] 624.406 m/z; [y5++] 312.161 m/z; [y3+ -NH3] 386.203 m/z; [y4+] 566.293 m/z.

Extraction range for [y5+] as printed in the trace headers: 623.2838-623.3462 m/z (50 ppm), 623.3025-623.3275 m/z (20 ppm), 623.3088-623.3212 m/z (10 ppm).

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

Figure 5 panels (plot residue): 5A Identified Protein Groups (y axis: Protein Groups); 5B Identified Peptides (y axis: Unique Peptides); 5C Sample Coverage, DDA vs. DIA (x axis: Protein Sample Coverage %, y axis: Proteins); 5D Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.

Figure 6 panels (plot residue): ROC curves (Sensitivity vs. 1-Specificity) and per-group marker intensities on a log scale (groups: OC, UTI, Con, MA, GE, Pain).
Ovarian Cyst (OC) marker: CYTB, AUROC=0.91
Urinary Tract Infection (UTI) marker: PERM (panel label: PERE), AUROC=0.968
Constipation (Con) marker: UROK, AUROC=0.928
Mesenteric Adenitis (MA) marker: LEG3, AUROC=0.872
Gastroenteritis (GE) marker: KPYR, AUROC=0.76

Figure 1 boxes (workflow residue):
Urinary Samples (87): 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection)
Digestion
DIA (Q Exactive HF MS). Optimized DIA method: average 8-10 data points per LC peak (9 s @ FWHM); average cycle time 2 s
Comprehensive Urinary Spectral Library: 2,575 protein groups (87 Q Exactive MS and 23 TripleTOF 5600 runs); 19,854 unique peptides; 144,643 transitions
Search against database using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut
Validated peptides/proteins from 87 samples (number of proteins/peptides per sample, Patient 1 to Patient N)
Perseus: u-test (non-parametric test)
KEGG Pathway Enrichment p value (color scale 0.001, 0.05, 1)

Example chromatogram (plot residue): RT 20.66 - 21.64 min, base peak m/z 506.2709-506.2811, NL 2.35E7, raw file DIA_B9; x axis: Time (min), y axis: Relative Abundance.


Conclusion

We successfully established a DIA workflow, based on high resolution accurate mass Orbitrap technology, that enables large-scale urine proteomics studies.

Even without any prefractionation of the samples, our DIA workflow detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.

The high number of proteins quantified per sample in a single DIA experiment (on average 1,300) provided insights, based on changes in the urinary proteome, into the host cell response in a UTI as well as into the formation of an ovarian cyst.

We envision that the comprehensiveness and low analysis time per sample will allow this DIA workflow to be applied to synchronous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Sci Data, 2014, 1:140031. DOI: 10.1038/sdata.2014.31

2. Zheng, J., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009


Methods

Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm; analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 μl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 μl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a subset of 23 randomly chosen samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
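
The replicate breakdown plotted in Figure 2 can be reproduced from per-replicate identification lists along the following lines. This is a minimal sketch; the function name and the toy peptide identifiers are hypothetical, and each replicate is assumed to be available as a collection of detected peptide (or protein) identifiers.

```python
from collections import Counter


def detection_breakdown(replicates):
    """Percentage of identifications seen in n of n, n-1 of n, ..., 1 of n replicates."""
    seen = Counter()
    for rep in replicates:
        seen.update(set(rep))  # count each identifier once per replicate
    total = len(seen)
    if total == 0:
        return {}
    histogram = Counter(seen.values())
    n_reps = len(replicates)
    return {k: 100.0 * histogram.get(k, 0) / total for k in range(n_reps, 0, -1)}


# Toy example with five peptides across three replicates.
reps = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B", "E"},
]
breakdown = detection_breakdown(reps)
print(breakdown)  # keys: number of replicates in which an identifier was seen
```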

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times each with a DDA method and the DIA method on the Q Exactive HF, using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CVs of peptide and protein quantification in the DIA data were only 6.7% and 8.1%, respectively, roughly twofold lower than in the DDA data (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twofold those of the DDA data (Figure 3B). Our DIA method shows highly reproducible and precise quantification at the peptide level.
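
For reference, a median CV across replicate injections can be computed as sketched below. The sketch assumes that intensities are available as one list of replicate values per peptide or protein and uses the sample standard deviation; the function names and values are illustrative, and the poster does not state exactly how its CVs were computed.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(intensity_table):
    """Median CV over all analytes; intensity_table maps analyte -> replicate values."""
    return median(cv_percent(v) for v in intensity_table.values())


# Toy intensities from three technical replicates (made-up numbers).
table = {
    "pep1": [90.0, 100.0, 110.0],   # CV 10%
    "pep2": [100.0, 100.0, 100.0],  # CV 0%
    "pep3": [80.0, 100.0, 120.0],   # CV 20%
}
print(median_cv(table))
```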

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at 10 ppm are the interferences removed; therefore, 10 ppm mass accuracy is the minimum requirement for accurate peptide detection.
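
To make the tolerances concrete, the sketch below converts a ppm tolerance into an m/z extraction window. The helper name is hypothetical. For the [y5+] fragment at 623.315 m/z the result agrees with the extraction ranges printed in the Figure 4 trace headers (623.2838-623.3462 at 50 ppm, 623.3088-623.3212 at 10 ppm).

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z extraction window for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


for tol in (50.0, 20.0, 10.0):
    low, high = ppm_window(623.315, tol)
    print(f"{tol:>4.0f} ppm: {low:.4f} - {high:.4f} (width {high - low:.4f} Th)")
```

Going from 50 ppm to 10 ppm narrows the window fivefold, from about 0.062 Th to about 0.012 Th at this m/z, which is what excludes the neighboring interfering ions.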

TABLE 1. Spectral libraries

Library Name  Protein Groups  Peptides  Fragment ions  Instrument                        Source
Spec Lib 01   2,077           15,470    109,064        Q Exactive MS                     23 DDA files (urine)
Spec Lib 02   1,436           9,569     65,880         TripleTOF 5600                    23 DDA files (urine)
Spec Lib 03   2,226           16,985    123,087        Combined from Spec Lib 01 and 02  urine
Spec Lib 04   14,158          149,420   2,832,306      TripleTOF 5600                    see Rosenberger et al., 2014 (1)
Spec Lib 05   1,869           40,902    925,156        TripleTOF 5600                    subset of Human-14,000
Spec Lib 06   2,575           19,854    144,643        Q Exactive MS + TripleTOF 5600    87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files (urine)

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in the parallel ion filling and detection mode.
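
The isolation scheme can be checked with a short sketch: 20 windows of 20 Th, two of 40 Th, and two of 60 Th add up to 600 Th, which is exactly the 400-1,000 Th range. The code assumes contiguous, non-overlapping windows laid out in ascending m/z order starting at 400 Th; the poster does not state window overlaps or the exact boundaries, and the function name is hypothetical.

```python
def dia_windows(start=400.0, scheme=((20, 20.0), (2, 40.0), (2, 60.0))):
    """Build (low, high) isolation windows from (count, width) pairs.

    Assumes contiguous windows without overlap, in ascending m/z order.
    """
    windows = []
    low = start
    for count, width in scheme:
        for _ in range(count):
            windows.append((low, low + width))
            low += width
    return windows


windows = dia_windows()
print(len(windows), windows[0], windows[-1])
for low, high in windows[-4:]:
    print(f"center {0.5 * (low + high):.1f} m/z, width {high - low:.0f} Th")
```

Wider windows at the high m/z end, where fewer precursors elute, keep the number of MS/MS scans per cycle at 24.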

FIGURE 4. Ion chromatograms of multiple transitions of peptide LVGYLDR of protein P07602, extracted with mass tolerances of 50 ppm, 20 ppm, and 10 ppm. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides (3,172 – 8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
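
The throughput and coverage figures can be verified with simple arithmetic. The sketch below assumes back-to-back injections at the 50 min total run time given in the Liquid Chromatography section, with no blanks, QC runs, or idle time in between; those overheads are not specified in the poster.

```python
n_samples = 87
run_minutes = 50          # total run time per sample, incl. loading and washing
library_proteins = 2575   # Spec Lib 06 protein groups
detected_proteins = 2456  # proteins detected across all DIA runs

total_days = n_samples * run_minutes / (60 * 24)
coverage_percent = 100.0 * detected_proteins / library_proteins

print(f"pure acquisition time: {total_days:.2f} days")
print(f"library coverage: {coverage_percent:.1f}%")
```

About three days of pure acquisition time leaves some margin within the stated "less than 4 days".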

Figure 2 panels (plot residue): Peptide Level (y axis: Identified Peptides) and Protein Level (y axis: Identified Proteins); legend: 3of3, 2of3, 1of3. Bar labels (%) for the six spectral libraries, in plotted order (library assignment not recoverable from the extraction):
Peptide Level  3of3: 73, 77, 71, 26, 61, 69; 2of3: 12, 10, 13, 22, 15, 13; 1of3: 15, 13, 16, 52, 24, 18
Protein Level  3of3: 80, 81, 75, 22, 62, 75; 2of3: 8, 8, 10, 18, 12, 9; 1of3: 12, 12, 15, 60, 25, 15

Figure 3B panels (plot residue): Peptide Level (y axis: Identified Peptides) and Protein Level (y axis: Identified Proteins) vs. Replicate (1, 2, 3); legend: DDA, ID increase DDA, DDA - matching, ID increase DDA - matching, DIA, ID increase DIA.

Figure 3A panels (plot residue): Coefficient of Variance % vs. Log10 Protein Intensity (Protein Level) and vs. Log10 Peptide Intensity (Peptide Level), for DDA – Peptide Peak Areas and DIA – Fragment Ion Peak Areas. Median CV values printed in the panels: 16.7% and 15.7% (DDA); 8.1% (protein level) and 6.7% (peptide level) (DIA).



Page 3: Overcoming the challenges in Data Independent Acquisition ... · To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times

PN64405-EN 0615S

Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA

Conclusion We successfully established a DIA workflow, which enables large scope of urine

proteomics studies based on high resolution accurate mass Orbitrap technology.

Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.

The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.

The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.

We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.

References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31

2. Zheng, J. BMC genomics 2013,14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.

Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).

Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..

Methods Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a randomly chosen subset of 23 samples was analyzed on a TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (2,575 and 19,854, respectively; Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library (Spec Lib 06). This finding underscores the importance of using a sample-specific spectral library.
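The replicate breakdown described above can be sketched in a few lines. This is only an illustration of the counting logic, not the analysis code used for the poster; the function name `detection_breakdown` and the peptide identifiers are hypothetical.

```python
from collections import Counter


def detection_breakdown(replicates):
    """Percentage of identifiers seen in exactly k replicates (k = n ... 1).

    `replicates` is a list of collections of peptide or protein identifiers,
    one collection per replicate run.
    """
    seen_in = Counter()
    for ids in replicates:
        seen_in.update(set(ids))  # count each identifier once per replicate
    total = len(seen_in)
    tally = Counter(seen_in.values())
    n = len(replicates)
    if total == 0:
        return {k: 0.0 for k in range(n, 0, -1)}
    return {k: 100.0 * tally.get(k, 0) / total for k in range(n, 0, -1)}


# Hypothetical identifiers, for illustration only.
rep1 = {"PEPA", "PEPB", "PEPC", "PEPD"}
rep2 = {"PEPA", "PEPB", "PEPC"}
rep3 = {"PEPA", "PEPB", "PEPE"}
print(detection_breakdown([rep1, rep2, rep3]))
```

The keys 3, 2, and 1 correspond to the "3of3", "2of3", and "1of3" categories of Figure 2.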

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To assess how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and three times with the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, roughly half the corresponding DDA values (15.7% and 16.7%; Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those obtained with DDA (Figure 3B). Our DIA method therefore provides highly reproducible and precise quantification at the peptide and protein level.
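As a reminder of how a median CV is obtained from replicate measurements, here is a minimal sketch. It assumes the sample standard deviation (n - 1 denominator), which the poster does not specify; the function names and intensities are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(intensities):
    """Median CV over all analytes measured in at least two replicates.

    `intensities` maps an analyte identifier to its replicate intensities.
    """
    cvs = [
        cv_percent(vals)
        for vals in intensities.values()
        if len(vals) >= 2 and mean(vals) > 0
    ]
    return median(cvs)


# Hypothetical peak areas from three technical replicates.
example = {
    "PEPA": [100.0, 110.0, 90.0],
    "PEPB": [200.0, 200.0, 200.0],
    "PEPC": [50.0, 55.0, 45.0],
}
print(f"median CV: {median_cv(example):.1f}%")
```

Reporting the median rather than the mean keeps a few poorly quantified, low-intensity analytes from dominating the summary.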

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate mass measurement of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with a 10 ppm mass tolerance were the interferences removed from the extracted chromatograms. A mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
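To make the tolerance settings concrete, the following sketch converts a ppm tolerance into an m/z extraction window. The helper names `ppm_window` and `within_tolerance` are hypothetical; the fragment m/z of 623.315 is the y5+ transition of LVGYLDR listed with Figure 4.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    low, high = ppm_window(theoretical_mz, tol_ppm)
    return low <= observed_mz <= high


for tol in (50, 20, 10):
    low, high = ppm_window(623.315, tol)
    print(f"{tol} ppm: {low:.4f}-{high:.4f}")
```

For 623.315 m/z the window narrows from about ±0.031 at 50 ppm to about ±0.006 at 10 ppm, which agrees with the extraction ranges printed in the Figure 4 traces (623.2838-623.3462, 623.3025-623.3275, and 623.3088-623.3212).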

TABLE 1. Spectral libraries

| Library Name | Protein Groups | Peptides | Fragment Ions | Instrument | Source |
|---|---|---|---|---|---|
| Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine |
| Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine |
| Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine |
| Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | See Rosenberger et al., 2014 (1) |
| Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | Subset of Human-14,000 |
| Spec Lib 06 | 2,575 | 19,854 | 144,643 | Q Exactive MS + TripleTOF 5600 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files, urine |

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. The full scan was acquired with a resolution of 30,000 @ 200 Th, AGC target 3e6, maximal IT 50 ms, and mass range 400-1,000 Th. It was followed by DIA scans with a resolution of 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; and maximal injection time set to “auto”. The “auto” setting calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.

FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with mass tolerances of 50 ppm, 20 ppm, and 10 ppm. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.
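The window scheme can be checked with a short sketch. It assumes that the 24 isolation windows are contiguous, non-overlapping, and ordered by increasing m/z starting at 400, which the method description implies but does not state; `build_windows` is a hypothetical helper, not part of any instrument software.

```python
def build_windows(start_mz, widths):
    """Lay out contiguous isolation windows as (low, high) m/z pairs."""
    windows = []
    low = start_mz
    for width in widths:
        windows.append((low, low + width))
        low += width
    return windows


# 20 windows of 20 m/z, then 2 of 40 m/z, then 2 of 60 m/z.
widths = [20] * 20 + [40] * 2 + [60] * 2
windows = build_windows(400, widths)

print(len(windows), "windows from", windows[0][0], "to", windows[-1][1])
for low, high in windows[-4:]:
    print(f"{low}-{high} m/z (center {(low + high) / 2})")
```

Under these assumptions the widths sum to 400 + 80 + 120 = 600 m/z, so the 24 windows exactly tile the 400-1,000 range, with the wider windows placed at high m/z where fewer precursors are typically expected.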

Comprehensive Urinary Proteome Coverage by DIA workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (range 848 to 1,720) (Figure 5A) and 5,714 peptides (range 3,172 to 8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
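The throughput and coverage figures can be reproduced with simple arithmetic. The sketch below assumes back-to-back injections at the 50 min total run time given in the Methods, with no blanks, QC runs, or idle time, so it is a lower bound on instrument time rather than the actual schedule; the function names are hypothetical.

```python
def batch_days(n_samples, run_time_min):
    """Instrument time in days for n_samples back-to-back runs."""
    return n_samples * run_time_min / 60 / 24


def coverage_percent(detected, library_size):
    """Share of library entries that were detected, in percent."""
    return 100.0 * detected / library_size


days = batch_days(87, 50)  # 87 samples, 50 min each
coverage = coverage_percent(2456, 2575)  # detected proteins vs. Spec Lib 06
print(f"{days:.2f} days of instrument time, {coverage:.1f}% library coverage")
```

The result is about 3 days of pure acquisition time, consistent with the stated "less than 4 days", and about 95% coverage of the 2,575 protein groups in Spec Lib 06.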

[Figure 2 plot: bar charts of identified peptides (Peptide Level) and identified proteins (Protein Level) for the spectral libraries, split into identifications found in 3 of 3, 2 of 3, and 1 of 3 replicates.]

[Figure 3 plots: 3A, coefficient of variance (%) versus log10 intensity at the protein and peptide level. Protein level: median CV 16.7% for DDA (peptide peak areas) and 8.1% for DIA (fragment ion peak areas). Peptide level: median CV 15.7% for DDA and 6.7% for DIA. 3B, identified peptides and identified proteins versus replicate number for DDA, DDA with matching, and DIA, including the ID increase per replicate.]

Data Analysis

The database search was performed in MaxQuant software, version 1.5.0.0, directly with the RAW and WIFF files using the human UniProt database. Trypsin was specified as the enzyme with up to 2 missed cleavages. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as a dynamic modification (+15.995 Da) and carbamidomethylation of C as a static modification (+57.021 Da). The FDR was set to 1% at the peptide and protein level.

All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). Dynamic score refinement, MS1 scoring, interference correction, and cross-run normalization (total peak area) were enabled. All results were filtered by a Q value of 0.01 (equivalent to an FDR of 1% at the peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the u-test (non-parametric test), and proteins with a p value below 0.01 were considered significantly changed. The annotation of biological processes was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
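The protein roll-up and imputation steps can be illustrated with a minimal sketch. It does not reproduce the Spectronaut or Perseus implementations; in particular, it assumes that "minimum value for each protein" means the lowest intensity observed for that protein across all samples. The function names and the input rows are hypothetical.

```python
def protein_intensities(peptide_rows):
    """Sum peptide intensities per protein and sample.

    `peptide_rows` is an iterable of (protein, sample, intensity) tuples.
    Returns {protein: {sample: summed intensity}}.
    """
    totals = {}
    for protein, sample, intensity in peptide_rows:
        by_sample = totals.setdefault(protein, {})
        by_sample[sample] = by_sample.get(sample, 0.0) + intensity
    return totals


def impute_minimum(totals, samples):
    """Fill missing samples with the protein's own minimum observed intensity."""
    filled = {}
    for protein, by_sample in totals.items():
        floor = min(by_sample.values())
        filled[protein] = {s: by_sample.get(s, floor) for s in samples}
    return filled


# Hypothetical peptide-level rows: (protein, sample, intensity).
rows = [
    ("P1", "S1", 10.0),
    ("P1", "S1", 5.0),
    ("P1", "S2", 7.0),
    ("P2", "S1", 3.0),
]
table = impute_minimum(protein_intensities(rows), ["S1", "S2"])
print(table)
```

Minimum-value imputation treats a missing protein as present at its lowest observed level, which is a conservative choice when missing values mostly reflect low abundance.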

Biomarker Discovery

To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver-operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. Its level is increased in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e. the genitourinary tract.
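For readers less familiar with these statistics, the sketch below shows one common way to compute an AUROC and a Bonferroni-corrected p value. It is not the code used for Figure 6; the function names, the intensities, and the number of tests are hypothetical, and the poster does not state how many tests entered the correction.

```python
def auroc(cases, controls):
    """AUROC as the probability that a case scores higher than a control.

    This equals the Mann-Whitney U statistic divided by
    len(cases) * len(controls); ties count as one half.
    """
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical marker intensities in the disease group and in all other samples.
disease = [5.0, 6.0, 7.0]
others = [1.0, 2.0, 6.0]
print(f"AUROC = {auroc(disease, others):.3f}")
print(f"corrected p = {bonferroni(0.01, 5):.3f}")
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect separation. For the Cystatin-B example, the ratio of the corrected to the raw p value (0.027 / 1.3e-5) would correspond to roughly 2,000 comparisons if a plain Bonferroni correction was applied.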

[Figure 4 plots: three panels (50 ppm, 20 ppm, 10 ppm) of relative abundance versus time (min) over RT 18.93 to 24.07, showing the extracted ion chromatograms of the LVGYLDR transitions [y5+] 623.315 m/z, [y6+] 722.383 m/z, [y6+ -H3PO4] 624.406 m/z, [y5++] 312.161 m/z, [y3+ -NH3] 386.203 m/z, and [y4+] 566.293 m/z. Example extraction ranges for [y5+]: 623.2838-623.3462 (50 ppm), 623.3025-623.3275 (20 ppm), 623.3088-623.3212 (10 ppm).]

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

[Figure 5 plots: 5A, identified protein groups per sample; 5B, identified unique peptides per sample; 5C, protein sample coverage (%) for DDA and DIA; 5D, number of significantly changed proteins (Mann-Whitney U test, p<0.05) for the pain control, ovarian cyst, urinary tract infection, constipation, mesenteric adenitis, and gastroenteritis groups.]

[Figure 6 plots: ROC curves (sensitivity versus 1-specificity) and per-group intensity plots (OC, UTI, Con, MA, GE, Pain) for the top marker of each condition: ovarian cyst (OC), CYTB, AUROC = 0.91; urinary tract infection (UTI), PERM (the plot panel is labeled PERE), AUROC = 0.968; constipation (Con), UROK, AUROC = 0.928; mesenteric adenitis (MA), LEG3, AUROC = 0.872; gastroenteritis (GE), KPYR, AUROC = 0.76.]

[Figure 1 diagram: urinary samples (87 ER patients in six groups: constipation, mesenteric adenitis, gastroenteritis, pain control group, ovarian cyst, urinary tract infection) are digested and measured by DIA on the Q Exactive HF MS. Optimized DIA method: on average 8-10 data points per LC peak (9 s @ FWHM), average cycle time 2 s; an example extracted ion chromatogram is shown (m/z 506.2709-506.2811, RT 20.66 to 21.64 min). The data are searched with Spectronaut (FDR 1%) against the comprehensive urinary spectral library (2,575 protein groups; 19,854 unique peptides; 144,643 transitions; 87 Q Exactive MS and 23 TripleTOF 5600™ runs). Scoring and Q value calculation by Spectronaut yield validated peptides/proteins from 87 samples, which are tested in Perseus (u-test, non-parametric test) and used for KEGG pathway enrichment (p value scale 0.001, 0.05, 1).]

Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA

Conclusion

We successfully established a DIA workflow that enables large-scale urine proteomics studies based on high-resolution, accurate-mass Orbitrap technology.

Even without any prefractionation of the samples, our DIA workflow detected nearly 2,500 proteins (2,456), which is close to complete coverage of the urinary proteome.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.

The high number of proteins quantified in a single DIA experiment (on average 1,300 per sample) provided insights into the host cell response in a UTI, as well as into the formation of an ovarian cyst, based on changes in the urinary proteome.

We envision that the comprehensiveness and low analysis time per sample will allow this DIA workflow to be applied to synchronous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Scientific Data, 2014, 1, 140031. DOI: 10.1038/sdata.2014.31

2. Zheng, J., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview

Purpose: Use the Thermo Scientific™ Q Exactive™ HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high-resolution, accurate-mass spectrometry overcomes DIA challenges and enables large-scale urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. To generate a spectral library, all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively map the urinary proteomes.

Results: In this study, we developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies with high throughput (less than 4 days for 87 biological sample measurements) and excellent reproducibility (median CV of 7%).

Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..

Methods Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally a subset of randomly chosen 23 samples was analyzed on a commercial TripleTOF TM 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with following settings: MS1 mass range 400-1,000 Th with 250 ms acc. time; MS2 mass range 100-1,700 Th with 50 ms acc. time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2-5. Dynamic exclusion set to 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility Of DDA And DIA Quantification Based On Three Replicates (Fig 3A). Number of Identification in Relation to Technical Replicates ( Fig 3B)

Results Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library compromised 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates or in only one single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library .

High Reproducible Peptide Detection and Quantitation with DIA experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the peptide/protein quantification of the DIA data was only 6.7% and 8.1%, which is twofold better than from the DDA data (Figure 3A). The number of detected peptides (~5,220) and proteins (1,120) with a single DIA experiment are almost twofold as the DDA data (Figure 3B). Our DIA method show a highly reproducible and precise quantification on peptide level.

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With ± 2.5 mins retention time window and 50 ppm mass tolerance for extracting the multiple transitions of a peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with 10 ppm mass accuracy, the interferences are removed from the spectrum, therefore, 10 ppm mass accuracy is minimum requirement for an accurate detection of peptides .

TABLE 1. Spectral libraries

Library Name Protein Groups Peptides Fragment ions Instrument Source

Spec Lib 01 2,077 15,470 109,064 Q Exactive MS 23 DDA files

Urine

Spec Lib 02 1,436 9,569 65,880 TripleTOF 5600 23 DDA files

Urine

Spec Lib 03 2,226 16,985 123,087 Combined from Spec Lib 01 and 02

Urine

Spec Lib 04 14,158 149,420 2,832,306 TripleTOF 5600 see Rosenberger et al., 2014 (1)

Spec Lib 05 1,869 40,902 925,156 TripleTOF 5600 subset of Human-14,000

Spec Lib 06 2,575 19,854 144,643 87 Q Exactive MS DDA files + 23 TripleToF 5600

DDA files

Urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24DIA MS/MS scans to covering the mass range 400 -1000 TH. Full scan with a resolution 30,000 @ 200 Th; AGC target – 3e6, maximal IT – 50ms; mass range 400-1,000 Th; followed by DIA scans with resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60m/z for the last two DIA scans; NCE 30; target value 1e6, maximal injection time set to “auto”, which automatically calculates the maximal injection time based on the detection time to allow the mass spectrometer always operating in the parallel ion filling and detection mode.

FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein (P07602) are extracted with 50 ppm, 20 ppm, and 10 ppm. Retention time window is ± 2.5 mins. 10 ppm mass accuracy is the minimum requirement for an accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The establishment of the DIA workflow (Figure 1) was applied to a urinary study compromising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format we were able to analyze them in less than 4 days by our optimized DIA workflow. In average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers are contrasted by the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples and 58 in the gastroenteritis samples (Figure 5D).

73

77

71

26

61

69

12 10

13 22 15

13 15 13

16

52

24 18

0

1000

2000

3000

4000

5000

Iden

tifie

d Pe

ptid

es

Peptide Level

3of3 2of3 1of3

80

81

75

22

62

75

8 8 10

18

12 9 12 12

15

60

25 15

0

400

800

1200

1600

Iden

tifie

d Pr

otei

ns

Protein Level

3B

1 2 3 0

1000

2000

3000

4000

5000

6000

7000

Replicate

Iden

tifie

d Pe

ptid

es

Peptide Level 3A

1 2 3 0

200

400

600

800

1000

1200

1400

Replicate

Iden

tifie

d Pr

otei

ns

Protein Level

DDA ID increase DDA DDA - matching ID increase DDA - matching DIA ID increase DIA

6 8 10

Median CV 16.7%

20

0

40

60

80

100

120

140

Protein Level

DDA – Peptide Peak Areas

Log10 Protein Intensity

Coe

ffici

ent O

f Var

ianc

e %

2 4 6 8

Median CV

8.1% 20

0

40

60

80

100

120

140 DIA – Fragment Ion Peak Areas

Log10 Protein Intensity

10 7 9

Median CV 15.7%

20

0

40

60

80

100

120

140

6 8 7 4 6 8

Median CV

6.7% 20

0

40

60

80

100

120

140

3 5

Peptide Level

DIA – Fragment Ion Peak Areas

DDA – Peptide Peak Areas

Log10 Peptide Intensity

Coe

ffici

ent O

f Var

ianc

e %

Log10 Peptide Intensity

Data Analysis

Database search in MaxQuant software, version 1.5.0.0 was performed directly with the RAW- and WIFF files using the human UniProt database. Trypsin with up to 2 missed cleavages; mass tolerances set to 20 ppm for the first search and 4.5 ppm for the second search for the Q-Exactive data and 0.1 Da for the first search and 0.01 for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% on peptide and protein level.

All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). dynamic score refinement and MS1 scoring – enabled; interference correction and cross run normalization (total peak area) – enabled; All results were filtered by a Q value of 0.01 (equals a FDR of 1% on peptide level).

Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6 and missing values imputed by the minimum value for each protein. Significance of protein abundance changes were calculated using the u-test (non-parametric test) and protein with a p value below 0.01 were considered to be significantly changed. The annotation of biological process was done with the DAVID online tool using the comprehensive urinary spectral library- Spec Lib 06 in Table 1).

Biomarker Discovery

To find the best suitable biomarker candidates we focused on the ten highest significantly changed proteins for each of the five conditions. The performance of the biomarker candidates has been assessed by calculating the area under the receiver-operating characteristic (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. It is increased level in Ovarian Cyst (OC) with a pValue equals to 1.3e-5 (Bonferroni-corrected: p=0.027), and AUROC is 0.91. Cystatin-B has been identified as biomarker candidate in the context of malignant growths in ovaries (3) and bladder (4), i.e. genitourinary tract.

[Figure 4 plot area: extracted ion chromatograms of the LVGYLDR transitions in three panels (50 ppm, 20 ppm, and 10 ppm extraction tolerance). x-axis: Time (min), RT 18.93 to 24.07; y-axis: Relative Abundance (0 to 100). Traces: y5+ (623.315 m/z), y6+ (722.383 m/z), y6+ -H3PO4 (624.406 m/z), y5++ (312.161 m/z), y3+ -NH3 (386.203 m/z), y4+ (566.293 m/z). Scan filter: FTMS + p NSI Full ms2, scan range 200.00-1305.00, raw file DIA_B9.]

Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.

[Figure 5 plot area: 5A, Identified Protein Groups per sample (y-axis: Protein Groups, 0 to 1,800); 5B, Identified Peptides per sample (y-axis: Unique Peptides, 0 to 9,000); 5C, Sample Coverage, DDA vs. DIA (axes: Protein Sample Coverage % and Proteins, 0 to 600); 5D, Significantly Changed Proteins (Mann-Whitney U test, p<0.05; y-axis 0 to 800). Sample groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]

[Figure 6 plot area: for each marker, one ROC curve (x-axis: 1-Specificity, y-axis: Sensitivity) and one log-scale intensity plot across the groups OC, UTI, Con, MA, GE, and Pain. Ovarian Cyst (OC): CYTB, AUROC = 0.91. Urinary Tract Infection (UTI): PERM (intensity panel labeled PERE), AUROC = 0.968. Constipation (Con): UROK, AUROC = 0.928. Mesenteric Adenitis (MA): LEG3, AUROC = 0.872. Gastroenteritis (GE): KPYR, AUROC = 0.76.]

[Figure 1 workflow diagram. Workflow elements: 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); Urinary Samples (87); Digestion; DIA (Q Exactive HF MS); search against database using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, u-test (non-parametric test); KEGG pathway enrichment p value. Comprehensive Urinary Spectral Library: 2,575 protein groups (87 Q Exactive MS and 23 TripleTOF 5600 runs), 19,854 unique peptides, 144,643 transitions. Optimized DIA method: on average 8-10 data points per LC peak (9 s at FWHM), average cycle time 2 s. Inset chromatogram: RT 20.66 to 21.64 min, base peak m/z 506.2709-506.2811, raw file DIA_B9.]

Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer

Yue Xuan¹; Jan Muntel²; Sebastian T. Berger²; Andreas FR Huhmer³; Hanno Steen²; Thomas Moehring¹

¹Thermo Fisher Scientific, Bremen, Germany; ²Departments of Pathology, Boston Children’s Hospital, Boston, MA; ³Thermo Fisher Scientific, San Jose, CA

Conclusion

We successfully established a DIA workflow based on high resolution accurate mass Orbitrap technology that enables large-scale urine proteomics studies.

Even without any prefractionation of the samples, our DIA workflow detected about 2,500 proteins (2,456 across the 87 samples), which is close to complete coverage of the urinary proteome.

The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.

The high number of quantified proteins per sample (on average 1,300 per single DIA experiment) provided insights into the host cell response in a UTI, based on changes in the urinary proteome, as well as into the formation of an ovarian cyst.

We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation, which requires the analysis of hundreds of samples.

References

1. Rosenberger, G. et al., Scientific Data, 2014, 1:140031. DOI: 10.1038/sdata.2014.31

2. Zheng, J., BMC Genomics, 2013, 14, 777

3. Takaya et al., Int J Oncol, 2015

4. Feldman et al., Clin Cancer Res, 2009

Overview

Purpose: Utilize the Thermo Scientific™ Q Exactive™ HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry is employed to overcome DIA challenges and enable large-scale urine proteomics studies.

Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. To generate a spectral library, all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively map the urinary proteomes.

Results: In this study, we developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies with high throughput (less than 4 days for 87 biological sample measurements) and excellent reproducibility (median CV of 7%).

Introduction

Data independent acquisition (DIA) strategies promise comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10 Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and making DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of the spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry is employed to overcome these DIA challenges and enable large-scale urine proteomics studies.

Methods

Sample Preparation

Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.

Liquid Chromatography

For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 μm x 0.5 mm, Reprosil C18, 3 μm; analytical column: Nano cHiPLC column, 75 μm x 15 cm, Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 μl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.

DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 μl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.

Mass Spectrometry

DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.

Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.

FIGURE 1. The established DIA workflow for Urinary Biomarker discovery

FIGURE 2. Influence of the Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.

FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).

Results

Selecting the Most Appropriate Spectral Library

To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.

The comprehensive urinary library comprised 2,575 protein groups and 19,854 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).

We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only one replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.

Highly Reproducible Peptide Detection and Quantitation with DIA Experiments

To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF, using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).

With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, roughly half of the corresponding DDA values (15.7% and 16.7%; Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those from the DDA data (Figure 3B). Our DIA method thus shows highly reproducible and precise quantification.
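The median CV reported above can be computed as sketched below. This is a minimal illustration under stated assumptions: the CV is taken as sample standard deviation over mean, only analytes quantified in all replicates are included, and the table layout and values are made up. The poster does not specify these details.

```python
from statistics import mean, median, stdev

def cv_percent(values):
    """Coefficient of variation (sample SD / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)

def median_cv(replicate_table):
    """Median CV over analytes quantified in every replicate.

    replicate_table: dict mapping an analyte name to its intensities
    in the technical replicates (None marks a missing value).
    """
    cvs = [
        cv_percent(vals)
        for vals in replicate_table.values()
        if all(v is not None for v in vals)
    ]
    return median(cvs)

# Made-up intensities for three replicates, for illustration only.
table = {
    "pep_A": [100.0, 110.0, 90.0],
    "pep_B": [200.0, 200.0, 200.0],
    "pep_C": [50.0, None, 55.0],   # skipped: missing in one replicate
    "pep_D": [10.0, 12.0, 14.0],
}
print(round(median_cv(table), 1))
```

Using the median rather than the mean keeps a few poorly quantified, low-intensity analytes from dominating the summary value.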

FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow

FIGURE 6. Top Biomarker Discovery by using DIA workflow

The Power of Mass Accuracy

The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at 10 ppm were the interferences removed from the extracted ion chromatograms; a mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
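To make the tolerances concrete, the sketch below converts a ppm tolerance into an m/z extraction window, assuming a symmetric window around the theoretical fragment m/z. The function name is hypothetical; the fragment m/z values are those listed for the LVGYLDR transitions in Figure 4.

```python
def ppm_window(mz, tolerance_ppm):
    """Return (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tolerance_ppm * 1e-6
    return mz - delta, mz + delta

# Fragment m/z values as listed for the LVGYLDR transitions in Figure 4.
transitions = {"y5+": 623.315, "y6+": 722.383, "y4+": 566.293}

for name, mz in transitions.items():
    for ppm in (50, 20, 10):
        low, high = ppm_window(mz, ppm)
        print(f"{name} {ppm:>2} ppm: {low:.4f}-{high:.4f}")
```

For the y5+ fragment, the window shrinks from about 0.062 m/z wide at 50 ppm to about 0.012 m/z wide at 10 ppm, which is why fewer co-eluting interferences fall inside it.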

TABLE 1. Spectral libraries

Library Name | Protein Groups | Peptides | Fragment ions | Instrument | Source
Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine
Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine
Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine
Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | See Rosenberger et al., 2014 (1)
Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | Subset of Human-14,000
Spec Lib 06 | 2,575 | 19,854 | 144,643 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files | Urine

DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. The full scan was acquired with a resolution of 30,000 @ 200 Th, AGC target 3e6, maximal IT 50 ms, and mass range 400-1,000 Th. It was followed by DIA scans with a resolution of 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; and maximal injection time set to “auto”, which calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
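The isolation scheme can be checked with a short sketch that lays out the 24 windows end to end. Contiguous, non-overlapping placement starting at 400 Th is an assumption made here; the method text gives only the widths and counts. The function name is hypothetical.

```python
def build_windows(start_mz, widths_and_counts):
    """Build contiguous (low, high) isolation windows.

    widths_and_counts: list of (width, count) pairs applied in order.
    Contiguous, non-overlapping placement is an assumption; the method
    text gives only widths and counts.
    """
    windows = []
    low = start_mz
    for width, count in widths_and_counts:
        for _ in range(count):
            windows.append((low, low + width))
            low += width
    return windows

# 20 windows of 20 m/z, then 2 of 40 m/z, then 2 of 60 m/z, from 400 Th.
scheme = build_windows(400.0, [(20.0, 20), (40.0, 2), (60.0, 2)])

print(len(scheme))             # DIA MS/MS scans per cycle
print(scheme[0], scheme[-1])   # first and last isolation window
```

Under this assumption the widths add up to 600 m/z (20 x 20 + 2 x 40 + 2 x 60), which matches the stated 400-1,000 Th range.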

FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein P07602 are extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.

Comprehensive Urinary Proteome Coverage by DIA workflow

The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (range 848-1,720) (Figure 5A) and 5,714 peptides (range 3,172-8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
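The throughput and coverage figures above can be cross-checked with simple arithmetic. The sketch assumes back-to-back runs with no idle time between injections and uses the 50 min total run time given for the DIA experiments and the 2,575 protein groups of Spec Lib 06; the variable names are chosen for this example.

```python
RUN_TIME_MIN = 50               # total run time per sample, incl. loading and washing
N_SAMPLES = 87
DETECTED_PROTEINS = 2456
LIBRARY_PROTEIN_GROUPS = 2575   # Spec Lib 06

# Assumes runs are queued back to back with no idle time.
total_days = N_SAMPLES * RUN_TIME_MIN / 60 / 24
library_coverage = 100 * DETECTED_PROTEINS / LIBRARY_PROTEIN_GROUPS

print(f"{total_days:.2f} days of instrument time")
print(f"{library_coverage:.1f}% of library protein groups")
```

Both results are consistent with the stated "less than 4 days" and "95% of the proteins in the spectral library".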

[Figure 2 plot area: stacked bars for the six spectral libraries showing the share of identifications detected in 3 of 3, 2 of 3, and 1 of 3 replicates, with percentage labels on the bars. Panels: Peptide Level (y-axis: Identified Peptides, 0 to 5,000) and Protein Level (y-axis: Identified Proteins, 0 to 1,600).]

[Figure 3 plot area. 3A: Coefficient of Variance (%) versus Log10 intensity for DDA (peptide peak areas) and DIA (fragment ion peak areas). Protein level: median CV 16.7% (DDA) and 8.1% (DIA). Peptide level: median CV 15.7% (DDA) and 6.7% (DIA). 3B: Identified Peptides and Identified Proteins versus Replicate (1 to 3) for DDA, DDA - matching, and DIA, each with the ID increase per added replicate.]

Data Analysis

Database searches were performed in MaxQuant software, version 1.5.0.0, directly with the RAW and WIFF files against the human UniProt database. Trypsin was specified as the enzyme with up to 2 missed cleavages. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the second search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as a dynamic modification (+15.995 Da) and carbamidomethylation of C as a static modification (+57.021 Da). The FDR was set to 1% at the peptide and protein levels.

All DIA data were analyzed directly in Spectronaut 7.0 (Biognosys) with the following options enabled: dynamic score refinement, MS1 scoring, interference correction, and cross-run normalization (total peak area). All results were filtered by a Q value of 0.01 (equal to an FDR of 1% at the peptide level).



Africa +43 1 333 50 34 0; Australia +61 3 9757 4300; Austria +43 810 282 206; Belgium +32 53 73 42 41; Canada +1 800 530 8447; China 800 810 5118 (free call domestic), 400 650 5118

Denmark +45 70 23 62 60; Europe-Other +43 1 333 50 34 0; Finland +358 10 3292 200; France +33 1 60 92 48 00; Germany +49 6103 408 1014; India +91 22 6742 9494; Italy +39 02 950 591

Japan +81 45 453 9100; Korea +82 2 3420 8600; Latin America +1 561 688 8700; Middle East +43 1 333 50 34 0; Netherlands +31 76 579 55 55; New Zealand +64 9 980 6700; Norway +46 8 556 468 00

Russia/CIS +43 1 333 50 34 0; Singapore +65 6289 1190; Spain +34 914 845 965; Sweden +46 8 556 468 00; Switzerland +41 61 716 77 00; UK +44 1442 233555; USA +1 800 532 4752

www.thermoscientific.com

©2015 Thermo Fisher Scientific Inc. All rights reserved. ISO is a trademark of the International Standards Organization. Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is presented as an example of the capabilities of Thermo Fisher Scientific products. It is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others. Specifications, terms and pricing are subject to change. Not all products are available in all countries. Please consult your local sales representative for details.

Thermo Fisher Scientific, San Jose, CA USA is ISO 13485 Certified.