18
Chromes from Chromatin: Sonification of the Epigenome 1 2 Davide Cittaro 1 , Dejan Lazarevic 1 , Paolo Provero 2 3 4 Author affiliations 5 1: Center for Translational Genomics and Bioinformatics, San Raffaele Hospital- via Olgettina 58, 20138 6 Milano, Italy 7 2: Department of Molecular Biotechnology and Life Sciences, University of Turin – via Nizza 52, 10126 8 Torino, Italy 9 10 Corresponding author 11 Davide Cittaro; Center for Translational Genomics and Bioinformatics, San Raffaele Hospital- via 12 olgettina 58, 20138 Milano, Italy; telephone: +39 02 26439137; email: [email protected] 13 14 Keywords 15 Sonification; Chromatin; Music; Information content 16 17 . CC-BY-NC-ND 4.0 International license was not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which this version posted January 21, 2016. . https://doi.org/10.1101/037523 doi: bioRxiv preprint

Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

ChromesfromChromatin:SonificationoftheEpigenome1

2

DavideCittaro1,DejanLazarevic1,PaoloProvero23

4

Authoraffiliations5

1:CenterforTranslationalGenomicsandBioinformatics,SanRaffaeleHospital-viaOlgettina58,201386

Milano,Italy7

2:DepartmentofMolecularBiotechnologyandLifeSciences,UniversityofTurin–viaNizza52,101268

Torino,Italy9

10

Correspondingauthor11

Davide Cittaro; Center for Translational Genomics and Bioinformatics, San Raffaele Hospital- via12

olgettina58,20138Milano,Italy;telephone:+390226439137;email:[email protected]

14

Keywords15

Sonification;Chromatin;Music;Informationcontent16

17

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 2: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

Abstract1

Theepigeneticmodificationsareorganized inpatternsdeterminingthe functionalpropertiesof the2

underlyinggenome.Suchpatterns,typicallymeasuredbyChIP-seqassaysofhistonemodifications,can3

becombinedandtranslatedintomusicalscores,summarizingmultiplesignalsintoasinglewaveform.4

As music is recognized as a universal way to convey meaningful information (1), we wanted to5

investigateproperties ofmusic obtainedby sonificationof ChIP-seqdata.We show that themusic6

produced by such quantitative signals is perceived by human listener as more pleasant than that7

producedfromrandomizedsignals.Moreover,thewaveformcanbeanalyzedtopredictphenotypic8

properties,suchasdifferentialgeneexpression.9

10

SignificanceStatement11

Music is recognized as universal way to communicate emotions and,more in general, meaningful12

information. Various sources of information can be translated into music or sounds, mostly for13

recreationalpurposes.Ithasbeenshownthathumanearcanclassifyinformationencodedintosounds.14

Quantitativegenomicfeatures,andinparticularepigeneticmarks,dorepresentfunctionalinformation15

thatisexploitedbycellstodrivebiologicalprocesses.Wetestamethodtotranslatesuchinformation16

intomusicandwestudysomepropertiesofthesonificatedchromatinmarks.Weshowthatnotonly17

musicalrepresentationofepigeneticmarkshasintrinsicmusicality,butalsothatdifferencesinmusical18

representationofgenomiclocireflectdifferencesoftheRNAlevelsoftheunderlyinggenes.19

20

Introduction21

Sonificationistheprocessofconvertingdataintosound.Sonificationitselfhasalong,yetpunctuated,22

storyofapplicationsinmolecularbiology,severalalgorithmstotranslateDNA(2)orproteinsequences23

(3, 4) tomusical scores have been proposed. The sameprinciples have also been extended to the24

analysisofcomplexdata(5)showingthat,all inall,sonificationcanbeusedtodescribeandclassify25

data.Indeed,theverysameproceduresmayalsobeappliedforrecreationalpurposes.26

27

OneofthelimitationsofsonificationofactualDNAandproteinsequencesistheirintrinsicconservative28

nature.Assuming thedifferences in two individual genomes are, on average, onenucleotideevery29

kilobase(6),thecorrespondingmusicalscoreswouldhavelittledifferences.30

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 3: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

Onthecontrary,dynamicrangestypicaloftranscriptomicandepigenomicdatamayprovidearicher1

sourceforsonification.2

3

In thisworkwedescribeanapproach toconvertChIP-seqsignals,and inprincipleanyquantitative4

genomicfeature,intoamusicalscore.Westartedworkingonourapproachforamusementmainly,and5

werealizedthatthesonificatedchromatinsignalweresurprisinglyharmonious.Wethentriedtoassess6

somepropertiesofthemusictrackswewereabletogenerate.Weshowthattheemergingsoundsare7

notrandomandinsteadappearmoremelodiousandtunefulthanmusicgeneratedfromrandomized8

notes.WealsoshowthatdifferentChIP-seqsignalscanbecombinedintoasinglemusicaltrackand9

that tracks representing different conditions can be compared allowing for the prediction of10

differentiallyexpressedgenes.11

Examples of sonification for various genomic loci are available at https://soundcloud.com/davide-12

cittaro/sets/k56213

14

Definitions15

MIDI: MIDI (Musical Instrument Digital Interface) is a standard that describes protocols for data16

exchangeamongavarietyofdigitalmusicalinstruments,computersandrelateddevices.MIDIformat17

encodes information about note notation, pitch, velocity and other parameters controlling note18

execution(e.g.volumeandsignalsforsynchronization).19

MIDIfileformat:abinaryformatrepresentingMIDIdatainahierarchicalsetofobjects.Atthetopof20

hierarchythereisaPattern,whichcontainsalistofTracks.ATrackisalistofMIDIevents,encodingfor21

noteproperties.MIDIeventshappenatspecifictime,whichisalwaysrelativetothestartofthetrack.22

MIDIResolution:resolutionsetsthenumberoftimesthestatusbyteissentforaquarternote.The23

highertheresolution,themorenaturalthesoundisperceived.ResolutionisthenumberofTicksper24

quarternote.AtaspecificresolutionR,TickdurationinmicrosecondsTisrelatedtotempo(expressed25

inBeatsperMinute,BPM)bythefollowingequation26

27

! = 60%&'(28

29

30

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 4: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

Results1

Approach2

InordertotranslateasingleChIP-seqsignaltracktomusicwebinthesignaloveraspecifiedgenomic3

interval (i.e. chrom:start-end) into fixed-size windows (e.g. 300 bp) and note duration will be4

proportionaltothesizeofsuchwindows.AswearedealingwithMIDIstandard,welettheuserspecify5

track resolution and the number of ticks per window (see definitions); the combination of these6

parametersdefinesthedurationofasinglenote.Thedefaultparametersassociateabinof300bpwith7

onequaver(1/8note).8

9

Inordertodefinethenotepitch,wetakethelogarithmoftheaverageintensityoftheChIP-seqsignal10

in a genomicbin. The sounding rangeof thewhole signal is discretized in apredefinednumberof11

semitones.Atdefaultparameters,therangeisbinnedinto52semitones,coveringfouroctaves.Inorder12

tointroducepauses,thelowestbinofthesignalrangerepresentsarest.Iftwoconsecutivenotesor13

restsfallinthesamebin,wemergetheminonenotedoublingitsduration.14

15

Usingthisapproach,anyChIP-seqsignalcanbemappedtoachromaticscale.Weimplementedthe16

possibilitytomapasignalonadifferentscale(major,minor,pentatonic…);tothisend,intensitybin17

boundariesaremergedaccordingtothedefinitionofaspecificscale(Figure1).MIDItracksproduced18

in thisway canbe then imported intoa sequencer softwarewhere they canbe furtherprocessed,19

settingtempoandtimesignature.20

21

Musicproducedfromchromatinmarksisnotperceivedasarandompattern22

Inordertotestwhethersonificationofchromatinmarksareperceivedasrandompatterns,weselected23

tengenomicregionsandgeneratedcorrespondingtracksbasedonthefollowinghistonemodifications:24

H3K27me3,H3K27ac,H3K9ac,H3K36me3,H3K4me1,H3K4me2,H3K4me3,H3K9me325

(SupportingAudiofilesS1.1toS10.1).Forthesameregions,werandomizedgenomicsignalatbase26

and bin level (Supporting Audio files S1.2 to S10.2).When data are randomized at base level, the27

average intensity isuniformsacross thebins, resulting ina repeatednote (datanot shown); this is28

largelyexpectedasChIP-seqsignalsaredistributedonthegenomeaccordingtoaPoissonlaw(7)or,29

moreprecisely,toaNegativeBinomiallaw(8).30

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 5: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

1

Randomizationatbinlevel,instead,equalstoshufflingnotesduringtheexecution.Weadministrated2

aquestionnairetoasetofvolunteers(n=8)notpreviouslytestedforeducationinmusic.Volunteers3

wereaskedtolistentoeachpairoforiginal/randomtrackandchoosewhichtracktheyfeltwasmore4

appealing.Trackorderwasrandomizedwhentestingdifferentvolunteers.Notably,inthemajorityof5

thecases(62/80)themusicgeneratedfromgenomicsignalwithoutrandomizationwasjudgedmore6

appealing.Resultsaresignificant toaFisher-exact test (p=1.95e-3), suggesting thatgenomic signals7

contain informationthatcanberecognizedbyhumanear.Thenumberofcorrectanswersforeach8

volunteerrangedfrom5to10,withamedianvalueof8.9

10

Differencesinmusicaltracksreflectdifferencesingeneexpression11

Onceweassessedtheexistenceofmusicalpatterningenomicssignals,wewerekeentoexploreifthis12

kindofinformationcouldbeexploitedtoidentifybiologicalfeaturesofsamples.Sincetheepigenetic13

DNAmodificationsreflectedbyhistonemarksinfluencegeneexpression(9),wetestedifdifferencesin14

musicaltracksgeneratedfromvariousChIP-seqsignalsreflectsdifferencesingeneexpressionofthe15

corresponding loci. To this end, we downloaded ChIP-seq marks (H3K27me3, H3K27ac, H3K9ac,16

H3K36me3,H3K4me1,H3K4me2,H3K4me3,H3K9me3,Pol2b)andRNA-seqdataforK562andNHEK17

celllinesfromtheENCODEproject(10).ForeachRefSeqlocusweconvertedChIP-seqsignalstomusic18

withfixedparameters(seemethods).RNA-seqdatawereusedtoidentifygenesthataredifferentially19

expressedbetweenthetwocelllines,underap-value<0.01and|logFC|>1,accordingtorecentSEQC20

recommendations(11).21

22

Acommonwaytoclassifymusicisbasedonsummarizationoftrackfeaturesafterspectralanalysis(12,23

13).SuchapproachinvolvesthesummarizationoftrackasMel-FrequencyCepstralCoefficients(MFCC)24

thataresubsequentlyclusteredusingGaussianMixtureModels(GMM).Adistancebetweentrackscan25

thenbedefinedasdescribedin(14),whouseditasaclassifierformusicalgenres.26

27

Wetestedifasimilarapproachcouldbeusedtodevelopapredictorofdifferentialexpressionbased28

onthedistancebetweenmusicaltracksgeneratedfromtwocelllines.29

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 6: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

Wedefinedadistancebetweensongsasdescribedinmethodsandweoptimizedtheparametersusing1

asatrainingsetthe250geneswiththemostsignificantdifferentialexpressionp-valueandasmany2

genes with the least significant p-value according to RNA-seq (Figure 2). We found that optimal3

performanceisatMFCC=30andGMM=10,withanAUC=0.609.4

5

We summarized tracks representing all RefSeq genes using such parameters, we then compared6

distanceswithdifferentialexpressionperformingaROCanalysis.Ourresultsindicatethatdifferences7

ininformationcontainedinmusicalrepresentationofchromatinsignalsmaybelinkedtodifferential8

expression,althoughpowerofpredictionislimited(AUC=0.5184,p=1.4597e-03).9

10

Similaritybetweenmusicaltracksoverlapssimilarbiologicalproperties11

Asadditional issuewewantedtoassess ifsimilaritiesbetweenmusicalrepresentationofchromatin12

status may be linked to the biology of the underlying genes. To this end, we calculated pairwise13

distancesforallregionsusingparametersidentifiedaboveonK562cellline.Hierarchicalclusteringof14

the distance matrix identifies eight major clusters (Figure 3, left). We performed Gene Ontology15

Enrichmentanalysisoneachcluster,here representedaswordcloudof significant terms (Figure3,16

center,supplementaryTableS2);wefoundthatdifferentclustersarelinkedtogenesshowingdifferent17

biologicalproperties.Forexample,someclusters(6,7and8)werelinkedtoregulationofcellcycle,18

otherswerelinkedtometabolicprocesses(2and5)orvesicletransport(3and4).Wealsoevaluated19

thedistributionofexpression(expressedaslog(RPKM))oftheunderlyinggenes(Figure3,right);we20

foundthatregionsclusteredbythedistancebetweenmusicaltracksbroadlyreflectsgroupsofgenes21

withdifferentlevelofexpression,spottingclustersofhigherexpression(cluster5)orlowerexpression22

(clusers2and3);assessmentofstatisticalsignificanceofdifferencesindistributionofgeneexpression23

valuesamongclustersispresentedinTable1.24

25

Discussion26

Chromatin shape and genome function are governed, among several factors, by the coordinated27

organizationofepigeneticmarks (15).Modificationsof suchmarksaredynamicandare fine-tuned28

duringthelifeofacelloranorganism.Analysisofhistonemodifications,aswellastranscriptionfactors29

andotherproteinsbindingDNA,byChIP-seqalreadydescribedpatternsofenrichmentthatarespecific30

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 7: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

totheirrelativefunction(16,17).Analysisofcombinatorialpatternsofhistonemodificationsalready1

unveileditspotentialinunderstandingfunctionalpropertiesofthegenome(18,19)andthecross-talk2

amongmultiplechromatinmarks(20).3

4

Weshow,inthiswork,thattheinformationcarriedbymultiplehistonemodificationscanbecaughtin5

ahuman-friendlywaybytranslatingChIP-seqsignalsintomusicalscores.Althoughtheinvestigationof6

thepsychologicalfactorsthatunderlietunefulperceptionofsonificatedgenomicsignalsisoutofthe7

scopeofthismanuscript,ourresultssuggestthathumanhearingisabletoperceivepatternsconveying8

informationencodedinChIP-seqdataanalyzedandtodistinguishfromrandomnoise.9

10

Weautomatedtheanalysisofdifferencesbetweenmusicaltracksusinganestablishedmethodbased11

on summarization of spectral data. By this approach, we investigated the possible link between12

differencesinwayschromatinsoundsandphenotypicfeatures.Ourresultssuggestthatdifferencesin13

transcript levels can be predicted by the differences of sonificated genomic regions, although14

performancesofsuchapproachare limited.Wereasonedthatmany factorsmayexplainsuchpoor15

results:firstofallthereisavastspaceofparametersthatcanbetunedtocreateasinglemusicaltrack16

andwestilllackmethodstoexploreitefficiently.Inaddition,theMelscaleusedtosummarizeaudio17

signalhasbeendevelopedtomatchhumancapabilitiestoperceivesound(21),henceitmaynotbe18

optimalforthecomparisonofthetracksgeneratedinthiswork.19

20

Ithasalreadybeenshownthatitispossibletopredictlevelsofgeneexpressionstartingfromchromatin21

states, although themethod used to perform chromatin segmentation has a large impact on such22

predictions(22).Inthisworkwefoundthatdifferencesinchromatin-derivedmusicreflects,tosome23

extent,differencesinlevelofexpressionofunderlyinggenesandtheirrelatedbiology.24

25

Toconclude,althoughwecannotadvocatetheusageofmusicalanalysisasuniversaltooltoanalyze26

biologicaldatayet,weconfirmthatquantitativefeaturesonthegenomearepatternedandcontain27

information,hencecanbeconvertedintosoundsthatareperceivedasmusical.Welimitedouranalysis28

onspecificchromatinmodifications,butinprincipleanyquantitativegenomicfeaturecanbeconverted29

andintegratedintoamusicaltrack.Thechoiceofparametersandinstrumentshasbeenstandardized30

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 8: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

fortheanalysispresented,forillustrativepurposeweshowthatdifferentsignalsfromthesameregion1

canbecombinedusingdifferent instruments(https://soundcloud.com/davide-cittaro/random-locus-2

blues) and signals fromdifferent genomic regions can bemerged (https://soundcloud.com/davide-3

cittaro/non-homologous-end-joining).4

5

6

MaterialsandMethods7

8

SonificationofENCODEChIP-seqdata9

Rawdata for variousmodificationswere downloaded from (GSE26320). Read tagswere aligned to10

humangenome(hg19)usingbwaaligner(23).Alignmentswereconvertedtobigwigtracks(24)after11

filteringforduplicatesandqualityscorehigherthan15.12

Inordertodefineregionstobeconvertedtomusicscores,weselectedintervalsaroundRefSeqgene13

definition,from1kbupstreamofTSSto2kbdownstreamofTES.ChIP-seqsignalwerefirstlyconverted14

toMIDIusingcustomscripts(https://bitbucket.org/dawe/enconcert)accordingtoparametersdefined15

inTable2.MIDItracksbelongingtothesameregionfromthesamesampleweremergedintoasingle16

MIDIfile,convertedtoWAVformatusingtimiditysoftware(http://timidity.sourceforge.net),withthe17

exceptionoftrackspresentedassupplementaryinformationwhichhavebeenprocessedwithApple18

GarageBandsoftware.19

20

ComparisonofWAVtracks21

In order to compare four samples for each converted genomic region, we extracted MFCC using22

python_speech_featureslibrary(https://github.com/jameslyons/python_speech_features).Selected23

componentswerethenclusteredusingGaussianMixtureModels,implementedinscikit-learnpython24

library0.15.2 (http://scikit-learn.org).Distancebetween two trackswereevaluatedusingHausdorff25

distance(H)betweenGMMclusters.Briefly,wefirstcalculateallpairwisedistancesbetweenGMM26

clustersusingBhattacharyyadistance(B)formultivariatenormaldistributionsas27

28

& = 18 +, − +. /'0. +, − +. + 12 345

'6, 6.

29

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 9: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

1

where2

3

' = 6, + 6.2 4

5

then,asGMMarenotordered,wetaketheHausdorffdistance(H)asthemaximumbetweentherow-6

wiseandcolumn-wiseminimumofthepairwisedistancesbetweentwoGMMsets.ROCanalysison7

musicdistanceswasperformedoverthevalueofD,definedas8

9

7 = 345 1 + 8 − 345 1 + : 10

11

where12

13

: = ; <562>, <562@ + ; A;B<>,A;B<@2 14

15

istheaverageofdistancesbetweenreplicatesand16

17

8 = ; <562>, A;B<> + ; <562>, A;B<@ + ; <562@, A;B<> + ; <562@, A;B<@4 18

19

istheaverageofpairwisedistancesamongdifferentcelllines.20

21

Assessmentofdifferentiallyexpressedgenes22

RNA-seq tagswere aligned to reference genomeusing STAR aligner (25). Read counts overRefSeq23

intervalswereextractedusingbedtools.DiscretecountswerenormalizedwithTMM(26),differential24

gene expressionwas evaluated using the voom function implemented in limma (27)with a simple25

contrastbetweentwocelllines.Geneswereconsidereddifferentiallyexpressedunderap-valuelower26

than0.01andabsolutelogarithmFoldChangehigherthan1.27

28

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 10: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

Clusteranalysis1

Clusteranalysiswasperformedonreplicate1ofK562dataset.WecalculatedallpairwiseHausdorff2

distances among genomic loci as defined above. Data were clustered using the Ward method.3

EnrichmentanalysiswasperformedusingEnrichr (28).Wordcloudswerecreatedwithworld_cloud4

pythonpackage(https://github.com/amueller/word_cloud)usingtextdescriptionofontologieshaving5

positiveEnrichrcombinedscore.DifferentialexpressionamongclusterswasevaluatedusingMann-6

WhitneyU-test.7

8

Acknowledgements9

Authorswouldliketothankallcollaboratorsandrelativeswhokindlysacrificedtheirtimetolistento10

musicgeneratedwhilethisworkwasdeveloped.11

12

13

References14

15161. JuslinPN(2013)Whatdoesmusicexpress?Basicemotionsandbeyond.FrontPsychol4:596.17

2. OhnoS,OhnoM(1986)Theallpervasiveprincipleofrepetitiousrecurrencegovernsnotonly18codingsequenceconstructionbutalsohumanendeavorinmusicalcomposition.19Immunogenetics24(2):71–78.20

3. KingRD,AngusCG(1996)PM--proteinmusic.ComputApplBiosci12(3):251–252.21

4. TakahashiR,MillerJH(2007)Conversionofamino-acidsequenceinproteinstoclassicalmusic:22searchforauditorypatterns.GenomeBiol8(5):405.23

5. LarsenP,GilbertJ(2013)MicrobialBebop:CreatingMusicfromComplexDynamicsinMicrobial24Ecology.PLoSONE8(3):e58119.25

6. ConsortiumT1GP,etal.(2013)Anintegratedmapofgeneticvariationfrom1,092human26genomes.Nature490(7422):56–65.27

7. ZhangY,etal.(2008)Model-basedanalysisofChIP-Seq(MACS).GenomeBiol9(9):R137.28

8. RobinsonMD,McCarthyDJ,SmythGK(2009)edgeR:aBioconductorpackagefordifferential29expressionanalysisofdigitalgeneexpressiondata.doi:10.1093/bioinformatics/btp616.30

9. KouzaridesT(2007)ChromatinModificationsandTheirFunction.Cell128(4):693–705.31

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 11: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

10. ENCODEProjectConsortium,ConsortiumTEP(2012)AnintegratedencyclopediaofDNA1elementsinthehumangenome.Nature489(7414):57–74.2

11. SEQC/MAQC-IIIConsortium,SEQC/MAQC-IIIConsortium(2014)Acomprehensiveassessment3ofRNA-seqaccuracy,reproducibilityandinformationcontentbytheSequencingQuality4ControlConsortium.doi:10.1038/nbt.2957.5

12. LoganB(2000)MelFrequencyCepstralCoefficientsforMusicModeling.ISMIR.6

13. DoplerM,SchedlM,PohleT,KneesP(2008)AccessingMusicCollectionsViaRepresentative7ClusterPrototypesinaHierarchicalOrganizationScheme.ISMIR.8

14. AucouturierJJ,PachetF,SandlerM(2005)“ThewayitSounds”:timbremodelsforanalysisand9retrievalofmusicsignals.IEEETransMultimedia7(6):1028–1035.10

15. ZhouVW,GorenA,BernsteinBE(2011)Chartinghistonemodificationsandthefunctional11organizationofmammaliangenomes.NatRevGenet12(1):7–18.12

16. BarskiA,etal.(2007)High-resolutionprofilingofhistonemethylationsinthehumangenome.13Cell129(4):823–837.14

17. ValouevA,etal.(2008)Genome-wideanalysisoftranscriptionfactorbindingsitesbasedon15ChIP-Seqdata.NatMeth5(9):829–834.16

18. ErnstJ,KellisM(2012)ChromHMM:automatingchromatin-statediscoveryand17characterization.NatMeth9(3):215–216.18

19. SongJ,ChenKC(2015)Spectacle:fastchromatinstateannotationusingspectrallearning.19GenomeBiol16(1):33.20

20. PernerJ,LasserreJ,KinkleyS,VingronM,ChungHR(2014)Inferenceofinteractionsbetween21chromatinmodifiersandhistonemodifications:fromChIP-Seqdatatochromatin-signaling.22NucleicAcidsRes42(22):13689–13695.23

21. StevensSS,VolkmannJ,NewmanEB(1937)AScalefortheMeasurementofthePsychological24MagnitudePitch.TheJournaloftheAcousticalSocietyofAmerica8(3):185–190.25

22. HoffmanMM,etal.(2013)IntegrativeannotationofchromatinelementsfromENCODEdata.26NucleicAcidsRes41(2):827–841.27

23. LiH,DurbinR(2010)Fastandaccuratelong-readalignmentwithBurrows-Wheelertransform.2826(5):589–595.29

24. KentWJ,ZweigAS,BarberG,HinrichsAS,KarolchikD(2010)BigWigandBigBed:enabling30browsingoflargedistributeddatasets.doi:10.1093/bioinformatics/btq351.31

25. DobinA,etal.(2012)STAR:ultrafastuniversalRNA-seqaligner.29(1):15–21.32

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 12: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

26. RobinsonMD,OshlackA(2010)Ascalingnormalizationmethodfordifferentialexpression1analysisofRNA-seqdata.GenomeBiol11(3):R25.2

27. RitchieME,etal.(2015)limmapowersdifferentialexpressionanalysesforRNA-sequencingand3microarraystudies.NucleicAcidsRes.doi:10.1093/nar/gkv007.4

28. ChenEY,etal.(2013)Enrichr:interactiveandcollaborativeHTML5genelistenrichment5analysistool.BMCBioinformatics14:128.6

78

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 13: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

FigureLegends1

Figure1:Graphicalrepresentationotheapproachusedtotransformquantitativesignalstomusic.ChIP-2

seqvalues(H3K4me3intheexample)arebinnedinfixed-sizeintervalsoverthegenome.Eachinterval3

correspondstoa1/8note.Averagevaluesoflog-transformofreadcountsineachgenomicbin(red4

lines)arematchedintopredefinednumberofsemitones(chromaticscale).Notesmaybemappedtoa5

specified scale (major andminor scales areexemplified in the figure). Consecutiveequal notes are6

mergedinsinglenotewithdoubleduration.Valuesfallinginthefirstbinareconsideredrests.7

8

Figure2:Evaluationofdifferentcombinationofparametersinpredictingdifferentialgeneexpression9

on a gold-standard subset of 500 genes. Each square corresponds to a number ofMel-Frequency10

CepstralCoefficients(MFCC)usedtosummarizesignalandanumberofcentersforGaussianMixture11

Model(GMM).ColorsaregivenbythecorrespondingAreaUndertheCurve(AUC)12

Figure3:Hierarchicalclusteringofgenomicregionsidentifies8mainclusters(left).Eachclusterbroadly13

correspondstospecificbiologicalpropertiesaccordingtoGeneOntologyenrichedterms(middle).Level14

ofexpressionofgenesincludedineachclustershowspecificdistributions(right).15

16

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 14: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7 Cluster8

Cluster1 2.635e-23 3.548e-18 8.127e-37 8.627e-125 2.859e-83 3.231e-71 3.169e-63

Cluster2 1.068e-01 2.008e-01 4.005e-01 2.801e-01 7.907e-02 9.467e-02

Cluster3 2.880e-01 2.225e-02 1.399e-01 3.910e-01 3.745e-01

Cluster4 4.159e-02 3.025e-01 2.614e-01 2.821e-01

Cluster5 5.098e-04 3.898e-09 7.586e-07

Cluster6 1.259e-02 2.866e-02

Cluster7 4.545e-01

Table1p-valuesofMann-WhitneyU-testfordifferencesindistributionofexpressionbetweenclusters.Testsshowingsignificantdifference1(p≤0.05)arepresentedinboldface.2

3

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 15: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

1Antibody Scale Octave Key TickSize BinSize

H3K27me3 minor 4 B 600 400

H3K27ac minor 3 B 900 600

H3K9ac minor 3 B 1200 800

H3K36me3 minor 4 B 300 200

H3K4me1 minor 3 B 1200 800

H3K4me2 minor 3 B 300 200

H3K4me3 minor 3 B 300 200

H3K9me3 minor 4 B 300 200

Pol2 minor 4 B 600 400

Table2–ParametersusedtoconvertdifferentChIP-seqsignalsintocorrespondingmusicaltrakcs2

34

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 16: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 17: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint

Page 18: Chromes from Chromatin: Sonification of the …...2016/01/21  · 21 Introduction 22 Sonification is the process of converting data into sound. Sonification itself has a long, yet

.CC-BY-NC-ND 4.0 International licensewas not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (whichthis version posted January 21, 2016. . https://doi.org/10.1101/037523doi: bioRxiv preprint