7/27/2019 2011.How to Compare One Million Images
http://slidepdf.com/reader/full/2011how-to-compare-one-million-images 1/40
How to Compare One Million Images?
Dr. Lev Manovich,
Professor, Visual Arts Department, University of California, San Diego (UCSD).
Director, Software Studies Initiative (softwarestudies.com) at the California Institute for Telecommunications and Information Technology (Calit2).
Dr. Jeremy Douglass,
Post-doctoral Researcher, Software Studies Initiative.
Tara Zepel,
Ph.D. candidate, Art History, Theory, & Criticism, UCSD.
Exploring one million manga pages on the 287-megapixel HIPerSpace (The Highly Interactive Parallelized Display Space) at Calit2, San Diego. HIPerSpace offers a resolution of 35,840 x 8,000 pixels (287 megapixels) on a display wall 31.8 feet wide and 7.5 feet tall, made from seventy 30-inch monitors.
INTRODUCTION
The description of the joint NEH/NSF Digging into Data competition (2009), organized by the Office of Digital Humanities at the National Endowment for the Humanities (the U.S. federal agency that funds humanities research), opened with these questions: “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data—far more than they could read in a lifetime—what does that mean for research?” A year later, an article in the New York Times (November 16, 2010) stated: “The next big idea in language, history and the arts? Data.”
Digitized archives of historical documents certainly represent a jump in scale compared with the traditionally small corpora used by humanists. Researchers and critics interested in contemporary culture, however, face an even larger challenge. With the notable exception of Google Books, the size of digital historical archives pales in comparison to the quantity of digital media created by contemporary cultural producers and users: designs, motion graphics, websites, blogs, YouTube videos, Flickr photos, Facebook postings, Twitter messages, and other kinds of professional and participatory media. This quantitative change is at least as important as the other fundamental effects of the political, technological, and social processes that started after the end of the Cold War (for instance, free long-distance multimedia communication). In an earlier article I described this in the following way:
The exponential growth of a number of both non-professional and professional media producers over the last decade has created a fundamentally new cultural situation and a challenge to our normal ways of tracking and studying culture. Hundreds of millions of people are routinely creating and sharing cultural content - blogs, photos, videos, online comments and discussions, etc. At the same time, the rapid growth of professional educational and cultural institutions in many newly globalized countries along with the instant availability of cultural news over the web and ubiquity of media and design software has also dramatically increased the number of culture professionals who participate in global cultural production and discussions. Before, cultural theorists and historians could generate theories and histories based on small data sets (for instance, “Italian Renaissance,” “classical Hollywood cinema,” “post-modernism,” etc.) But how can we track “global digital cultures,” with their billions of cultural objects, and hundreds of millions of contributors? Before you could write about culture by following what was going on in a small number of world capitals and schools. But how can we follow the developments in tens of thousands of cities and educational institutions?
(Manovich, Cultural Analytics, 2009).
The availability of large digitized collections of humanities data certainly makes a case for humanists to use computational tools. The rise of social media and the globalization of professional culture, however, leave us no other choice. But how can we explore patterns and relations between sets of photographs, designs, or videos, which may number in the hundreds of thousands, millions, or billions? (By summer 2010, Facebook contained 48 billion photos; deviantArt.com, the leading site for non-professional art, housed 100 million submissions; coroflot.com, a site used by professional designers, had 200,000 portfolios.)
In 2007 we set up a new lab called the Software Studies Initiative (http://www.softwarestudies.com) at the University of California, San Diego (UCSD) and the California Institute for Telecommunications and Information Technology (Calit2) to address these challenges. We have developed a number of methods and techniques for the analysis and visualization of large sets of images, video, and interactive visual media. This article describes our key method, which consists of two parts: 1) automatic digital image analysis that generates numerical descriptions of various visual characteristics of the images; 2) visualizations that show the complete image set organized by these dimensions.
We have already successfully applied this method to research many types of visual media: comics, web comics, video games, virtual worlds, video, films, cartoons, motion graphics, print magazines, paintings, and photographs. Examples include 167,000 images from the Art Now Flickr group, 100 hours of Kingdom Hearts video game play, and 20,000 pages of Science and Popular Science magazines (1872-1922). (For more details and other projects, see the Digital Humanities section of our lab blog, http://www.softwarestudies.com; you can also find our visualizations on YouTube and on Flickr at http://www.flickr.com/photos/culturevis/collections/.)
To test how this method would work with larger datasets, in the fall of 2009 we downloaded the complete runs of 883 different manga series from OneManga.com, the most popular website for “scanlations.” (“Scanlation” refers to manga publications that are digitized and translated by fans.) For each series we obtained all available pages, along with the user-assigned tags indicating its genres and intended audiences (see Douglass, Huber, Manovich, Understanding Scanlation).
The resulting dataset contains 1,074,790 manga pages. Each page is a JPEG image; the average image resolution is 850 x 1150 pixels. We used our custom digital image processing software, running on supercomputers at the National Energy Research Scientific Computing Center (NERSC), to measure a number of visual features of every page (shading, presence of fine details, texture properties). (“Feature” is a common term in digital image processing; it refers to a numerical description of some image property, such as the average grayscale value or the number of lines. For readers familiar with computational text analysis: image features are structurally similar to text features; they offer a compact description of the data.)
In this article we use the challenge of working with this set of one million manga pages for two purposes. It motivates the need for a computational approach to exploring collections of this size, and it lets us explain our particular method, which combines digital image analysis with a novel visualization technique.
Any reflection on culture begins with an informal comparison between multiple objects in order to understand their similarities and differences. Consider the range of graphical techniques used today by manga artists across thousands of manga books, millions of pages in these books, and tens of millions of individual panels. For our analysis to reflect this range properly, we need to be able to examine details of individual images and to find patterns of difference and similarity across large numbers of images. To do this, we need a mechanism that would allow us to compare sets of images of any size precisely, from a few dozen to millions. We discuss how our method, which combines automatic digital image analysis and media visualization, addresses these requirements.
HOW TO COMPARE ONE MILLION IMAGES?
Today, a typical publication in the humanities is based on the detailed examination of a small number of artifacts (which, depending on the field, can be literary texts, TV programs, films, video games, etc.). Of course, this does not mean that the author considered these artifacts only in isolation. Usually the detailed analysis of these artifacts is performed against a larger horizon: knowledge of the wider cultural field(s), acquired both directly (for instance, by watching films) and indirectly (for instance, by reading publications in film studies). But how reliable or complete is this background knowledge? To continue with the example of cinema, IMDb (www.imdb.com) contains information on over half a million films produced since the beginnings of cinema. How many of them have academic film scholars and film critics actually seen? (The same database lists 1,137,074 TV episodes as of summer, 2001; see IMDb Database Statistics.)
The fact that using tiny samples has been the default method of the humanities until now does not mean that we should keep using it by default. If Google can analyze billions of web pages and over a trillion links several times each day, we should be able to do better than simply consider a handful
of artifacts and generalize from them, even if we don't have the same resources. The key reason for the huge infrastructure maintained by Google is the need to index the web in real time. Cultural analysis does not have this requirement, so we should be able to analyze large cultural datasets with much smaller computing resources. Indeed, computer scientists who study online social media today typically capture and analyze dozens or hundreds of millions of separate objects (photos, Twitter updates, blog posts, etc.) with very small resources. (See Cha, Kwak, Rodriguez, Ahn, Moon, I Tube, You Tube, Everybody Tubes; Crandall, Backstrom, Huttenlocher, Kleinberg, Mapping the World's Photos; Kwak, Lee, Park, Moon, What is Twitter?)
Having at our disposal very large cultural datasets that can be analyzed automatically with computers and explored using interactive interfaces and visualization opens up exciting new possibilities. These possibilities can potentially transform our understanding of cultures in many important ways. Instead of being fuzzy and blurred, our horizon (knowledge of a cultural field as a whole) can become razor sharp and, at the same time, acquire a new depth (being able to sort and cluster millions of artifacts along multiple dimensions). This would enrich our understanding of any single artifact, because we would see it in relation to precisely delineated larger patterns. It would also allow us to make more confident statements about the field at large.

Perhaps most importantly, it will erase the distinction between the precision of “close reading” and the imprecision of a “zoomed out” view. That is, it will erase the gap between a detailed understanding of a few works and the very approximate ideas about the field as a whole that we normally form by mentally interpolating between the small number of facts and artifacts we have studied. It will also erase the separation between “close reading” (detailed analysis of small parts of texts) and Franco Moretti's “distant reading” as it is commonly understood, which is not the same as how Moretti himself defines it (see his Conjectures on World Literature, 2000). In this common understanding, distant reading means the analysis of large-scale patterns in the development of entire genres, the literary production of whole countries, and the like, using a whole novel as the atom of analysis (for instance, counting the number of novels in different genres published over a historical period). Rather than choosing one scale of analysis, we would be able to move easily between all of them at will, observing patterns at any scale.
Any automatic computational analysis of large samples of human culture will have many limitations of its own, and therefore it will not replace human intuition and experience. However, while we should keep these various limitations in mind, the opportunities it offers are still immense. For example, having access to a million manga pages should allow us, in principle, to map fairly reliably the full spectrum of graphical possibilities used by contemporary commercial Japanese manga artists. Such a mapping would also allow us to do several other things:
- to understand which manga series are most typical and which are most unique stylistically;
- to find all the series where the graphical language changes significantly over time (today all the top series have been running for a number of years);
- to investigate whether shorter and longer series have different patterns;
- to separate the artists who significantly vary their graphical languages from series to series from the artists who do not.
We would also be able to take any hypothesis or observation we may make while informally looking through a small sample of images and see whether it holds across our whole set. For instance, we may observe that manga aimed at different genders and age groups have distinct graphical languages.
But how can we accomplish this in practice? How can we possibly compare one million manga pages?
WHAT EYES CAN'T SEE
1. Let us start by selecting only two pages from our manga image set and examining them directly, without any software tools. We will take these pages from the One Piece and Vampire Knight series. The first is one of the best-selling and top-rated shounen (teenage boys) manga; the second is among the top shoujo (teenage girls) manga.
Left image: sample page from Vampire Knight. Right image: sample page from One Piece. According to the OneManga.com June 2010 list of the top 50 series titles accessed by site users, One Piece was no. 2, while Vampire Knight was no. 13. According to www.icv2.com, during Q3 2010 One Piece was no. 2 in Japan, and Vampire Knight was no. 4. The difference in Vampire Knight's ranking is likely to reflect different proportions of male and female manga readers inside and outside of Japan.
We can certainly note many kinds of stylistic distinctions by comparing these two pages. For instance, we may observe three things about the page from Vampire Knight:
- dramatic diagonal angles created by both panel divisions and lines inside the panels;
- the full palette of grey tones, from white to grey to black;
- large black letters representing sounds, which contribute additional visual energy.
In contrast, the page from One Piece has very little shading, its lines are curved, and its panels are neat rectangles. However, how do we know whether these stylistic distinctions hold for all of the 10,562 One Piece pages and 1,423 Vampire Knight pages that we have available? Similarly, if we want to talk about the graphical style of an artist who may have produced tens of thousands of pages across dozens of titles, would selecting and examining a few dozen pages be sufficient? And what if we want to compare all the shounen and shoujo manga in our dataset? How many pages should we pull out from each of these categories to reach authoritative conclusions about possible differences in their graphical languages?
Here is another example. Let's say we want to compare One Piece with another highly popular shounen (teenage boys) manga: Naruto. Here are two sample pages from the two series.
Left image: sample page from Naruto. Right image: sample page from One Piece.
We can notice certain differences in graphical style between these pages, but how typical are these differences across all of the 10,562 One Piece pages and 8,037 Naruto pages we have available? In the first example, the styles varied dramatically. Here the differences are more subtle, which makes it even more problematic to generalize what we see in these two pages to the complete series.
10,461 scanlation pages from One Piece that were available on OneManga.com in the fall of 2009, organized by sequence of publication (left to right, top to bottom). This composite image (we call such images “montages”) includes special pages inserted by scanlation groups (some of them appear as dark black squares when you
look at a small version of the visualization). Note: To fit all pages in a rectangular grid, some of the pages were cropped.
If we examine only a small number of images drawn at random from a much larger image set, we can't be certain that our observations will hold across the complete set. Take our one million page collection, which is itself only a sample of all manga being published commercially: 100 pages is only about 0.01% of all available pages (100 / 1,074,790). We have no reason to believe that whatever we observe in these 100 pages is typical of our manga set as a whole.
Thus, the first problem with using our native perception to compare images in order to notice differences and similarities is that this approach does not scale. Examining a small selection of manga pages does not qualify us to make general statements about the graphical style of “best-selling manga,” “shounen manga,” or even a single long-running series such as Hajime no Ippo (15,978 pages in our dataset).
Examining only a small sample of a larger image set also prevents us from understanding detailed patterns of change and evolution. To illustrate this, we pulled out three pages from the best-selling One Piece manga series, taken from chapters 5, 200, and 400. The series started in 1997; approximately 600 chapters were published by the end of 2010, with new chapters appearing weekly. Thus, approximately eight years passed between chapter 5 and chapter 400, and during this period the artists created over 7,000 pages. (Our data source did not have exact publication dates for every chapter, which is why we had to estimate them in this way.) As we can immediately see by comparing these three pages, the graphical language of One Piece apparently changed significantly during these eight years. But how did it change? Was it a gradual transition, a series of abrupt changes, or some other temporal pattern? Unless we have a mechanism to compare a much larger number of pages, we can't answer this question.
Sample pages from the One Piece manga series, drawn from the 5th, 200th, and 400th chapters.
In this example, it would be beneficial to pull out a larger set of sample pages using some systematic procedure. For example, we can select every 100th page to get a better idea of how the series' visual language changes over time. This approach can also be applied to comparing different manga series. For instance, we could take every 200th page of Naruto and every 200th page of One Piece. (Naruto is the most popular manga series around the world today.) Since our dataset contains approximately 10,000 pages for each of these series, we would end up with 50 pages for each. We could then examine these 100 pages in order to describe the differences between the styles of the
two series. Or, if our series are not very long, we can use a different procedure: select one page from each chapter of a series, and then use such sets of pages to compare series to each other. However, such an approach will fail if the style varies within each chapter, because the pages chosen by our sampling procedure may not properly reflect all these variations. This is a fundamental problem with the idea of selecting a number of “representative pages” and working only with this smaller set.
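Both sampling procedures can be written down in a few lines of Python. The sketch below assumes that each series is already available as an ordered list of pages, with plain integers standing in for page images. The function names `systematic_sample` and `one_page_per_chapter` are hypothetical and do not come from our software. Writing the procedures out also makes the problem visible: both still discard almost every page.

```python
def systematic_sample(pages, step, start=0):
    """Return every `step`-th page, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return pages[start::step]


def one_page_per_chapter(chapters, position=0):
    """Pick the page at `position` from each non-empty chapter.

    `chapters` is a list of chapters, each a list of pages in reading order.
    Chapters shorter than `position` contribute their last page instead.
    """
    return [chapter[min(position, len(chapter) - 1)] for chapter in chapters if chapter]


# Hypothetical stand-ins for two series of roughly 10,000 pages each.
series_a = list(range(10_000))
series_b = list(range(10_000))

sample_a = systematic_sample(series_a, 200)
sample_b = systematic_sample(series_b, 200)
print(len(sample_a), len(sample_b))  # 50 50
```

The slice `pages[start::step]` keeps the sample in publication order, so the sampled pages could still be laid out chronologically to look for change over time.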
For instance, consider all the pages from one of Abara's chapters (see illustration below). Which page in each chapter best represents its style? Whichever page we select to stand in for the whole chapter, it will not adequately convey the stylistic variations across all the pages. Some pages consist mostly of black and white areas, with little texture; other pages have lots of shading and use mostly grey; still others combine panels in both styles.
Of course, if we are using a computer to browse our image set, we are no longer limited to pulling out a small sample of images for close examination. Many kinds of software provide options to browse through lots of images: modern operating systems (OS) such as Windows, Mac OS, iOS, and Android; image organizers such as Google's Picasa and Apple's iPhoto; and web photo-sharing services such as Photobucket and Flickr. So if we, for example, add all 10,461 One Piece pages to iPhoto, we can pan and zoom in and out at will, quickly moving between individual images and the whole set. This should help us notice visual differences and affinities across large sets of images beyond those picked up by a close analysis of a small sample.

Unfortunately, the display modes offered by existing consumer software and web services are rather inflexible. Typically, the only modes available are a slide show and an image grid. Moreover, the images can usually be sorted by only a few parameters, such as upload date or name, and the user can't easily change the order in which images are displayed. To display images in a different order, you first have to assign new keyword(s) to every image. This prevents spontaneous discovery of interesting patterns in an image set. Instead of reorganizing images along different dimensions and quickly trying different possibilities, a user would have to know in advance the exact order in which to display the images.
Of course, if our image set has some very obvious patterns (let's say, it consists of images in only three distinct styles), these limited browsing modes and fixed orders would still be sufficient, and we would easily notice these patterns. But such cases are the exception rather than the norm. (While Picasso worked in a number of dramatically different styles, he is not typical.)
An alternative to examining a set of images informally is to systematically describe each image using a number of terms, and then analyze the distributions of these terms. This alternative applies whether we are looking closely at a few images or using software to browse through many. In the humanities, this process is called “annotation.” A researcher defines a dictionary of descriptive terms and then tags all images (or film shots, transitions between comic frames, or any other visual objects). The parallel to the popular practice of tagging media objects on social media sites (think of tags in Flickr), or of adding keywords to one's blog posts, is obvious. However, while users are free to add any keywords they want, researchers in academic studies typically employ “closed vocabularies,” where the set of terms is defined beforehand. Once all images are annotated, we can look at all the images that have particular tags, and we can plot and compare the tag frequencies and other statistical descriptions. For instance, if we annotate manga pages using a set of tags describing visual style, we can then compare how often each stylistic feature was used in shounen vs. shoujo pages.
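The Python sketch below shows what comparing tag frequencies across audience categories amounts to computationally. The closed vocabulary, the tag names, and the four annotated pages are all invented for illustration. A real study would define its own vocabulary before annotation begins, and the check against `VOCABULARY` enforces that closed-vocabulary rule.

```python
from collections import Counter

# Hypothetical closed vocabulary, fixed before any page is annotated.
VOCABULARY = {"heavy_shading", "diagonal_panels", "sound_effects", "sparse_black"}

# Hypothetical annotations: (audience category, set of style tags) for each page.
annotations = [
    ("shounen", {"sparse_black"}),
    ("shounen", {"sparse_black", "sound_effects"}),
    ("shoujo", {"heavy_shading", "diagonal_panels"}),
    ("shoujo", {"heavy_shading", "sound_effects"}),
]


def tag_frequencies(annotations, category):
    """Share of pages in `category` that carry each vocabulary tag."""
    pages = [tags for cat, tags in annotations if cat == category]
    if not pages:
        return {}
    unknown = set().union(*pages) - VOCABULARY
    if unknown:
        raise ValueError(f"tags outside the closed vocabulary: {sorted(unknown)}")
    counts = Counter(tag for tags in pages for tag in tags)
    return {tag: counts[tag] / len(pages) for tag in sorted(VOCABULARY)}


for category in ("shounen", "shoujo"):
    print(category, tag_frequencies(annotations, category))
```

Frequencies are reported as shares of pages rather than raw counts, so that categories with different numbers of annotated pages can be compared directly.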
Barry Salt pioneered the use of this method to study visual media. He annotated all shots in the first 30 minutes of a few hundred 20th-century feature films using a number of characteristics: shot scale, camera movement, and angle of shot. Salt used a small number of categories for each characteristic. For example, possible camera movement types were pan, tilt, pan with tilt, track, etc. (Barry Salt's Database). He also recorded shot duration (Salt, The Statistical Style Analysis; Film Style and Technology). Salt then used descriptive statistics and graphs to analyze this data. In his very influential book Understanding Comics (1993), Scott McCloud employed a similar
method to compare the visual language of Japanese manga and comics from the West. He annotated types of transitions between frames in a number of manga and comic books, and then used histograms to explore the data visually.
Communication and media studies have a similar method called “content analysis.” Humanists are usually concerned with the works of a particular author or authors. Communication researchers, by contrast, typically employ content analysis to analyze representations in mass media and, more recently, in user-generated content. For this reason, they determine their samples more carefully. They also employ multiple people to “code” media material (“coding” is the content analysis term for annotation), and then calculate the degree of agreement between the results of different coders. Here are a couple of recent applications of this method. Herring, Scheidt, Kouper, and Wright analyzed the content of 457 randomly selected blogs collected at roughly six-month intervals during 2003-2004 “to assess the extent to which the characteristics of blogs themselves remained stable or changed during this active period.” Williams, Martins, Consalvo, and Ivory analyzed characters in 150 video games; a total of 8,572 characters were coded to “answer questions about their representations of gender, race and age in comparison to the US population.”
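The degree of agreement between coders is usually reported with a chance-corrected statistic. The Python sketch below computes Cohen's kappa, one widely used measure for two coders who labeled the same items. The studies cited above may have used a different statistic. The camera-movement labels for five shots are invented, echoing the categories Salt used.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of
    matching labels and p_e is the agreement expected if each coder assigned
    labels at random with their own label frequencies.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both coders use a single identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by two coders to the same five film shots.
coder_a = ["pan", "tilt", "pan", "track", "pan"]
coder_b = ["pan", "tilt", "tilt", "track", "pan"]
print(round(cohens_kappa(coder_a, coder_b), 4))  # 0.6875
```

Here the coders agree on 4 of 5 shots (p_o = 0.8), but 0.36 agreement would be expected by chance alone, so kappa is noticeably lower than the raw agreement.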
This method is more powerful than informal examination of media, but it suffers from the same problem: it does not scale to massive datasets. McCloud, for instance, annotated only a small number of comic books and manga titles. Would he obtain the same results with a much larger set, such as our collection of one million manga pages? And even if his results were confirmed, what about all the possible exceptions? To find them, we would need to tag every page.
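Let us say we have a trained viewer who can examine a manga page and select the relevant tags. Suppose this viewer spends 1 minute on each page and works 8 hours a day with no days off. Annotating one million pages would then take 1,000,000 minutes, or about 2,083 working days: almost 6 years.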
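The Python sketch below spells out the arithmetic behind this estimate. The “almost 6 years” figure assumes that the viewer works every day of the year. The alternative of about 250 working days per year is our own added assumption, included only to show how sensitive the estimate is to working schedules.

```python
def annotation_years(pages, minutes_per_page, hours_per_day, days_per_year):
    """Years needed for one person to annotate `pages` at a steady pace."""
    total_minutes = pages * minutes_per_page
    working_days = total_minutes / (hours_per_day * 60)
    return working_days / days_per_year


every_day = annotation_years(1_000_000, 1, 8, 365)  # no days off
work_week = annotation_years(1_000_000, 1, 8, 250)  # assumed ~250 working days per year
print(f"{every_day:.1f} years, {work_week:.1f} years")  # 5.7 years, 8.3 years
```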
(Recently, it has become possible to use crowdsourcing to speed up this process. We cannot expect every person to make the same judgments about visual form or to use tags in the same way. For this reason, researchers use statistical techniques to calculate consistency between the judgments of all participating workers and to disregard low-quality results. However, this approach has a fundamental limitation: like any other attempt to describe images using natural languages, it is much better at capturing image content than form. We discuss this in more detail below.)
Summary: When we examine only a small subset of a large image set, our sample may not be representative of the whole set; it may not properly reflect all variations in a set; and we may not be able to study gradual changes over time.
2. So what if we assume that our dataset contains not a million images, but only a hundred? Even if the problem of scale goes away, is it sufficient to use only our eyes? No. The second problem with using our eyes is that we are not very good at registering subtle differences between images. If you are confronted with a small number of images that have substantial differences, your brain can easily organize these images according to their visual differences. (Here I am not concerned with differences in content, which are easy to see, but with differences in visual language.) Organizing images in this way includes several tasks:
- separating them into groups;
- ranking them according to one or more kinds of visual characteristics;
- noticing the outliers (images that stand out from the rest).
For instance, we have no problem distinguishing between paintings created after 1925 by Piet Mondrian and Theo van Doesburg. Mondrian used exclusively horizontal and vertical line orientations, while van Doesburg also used diagonals. These properties clearly mark Mondrian's and van Doesburg's paintings.
Left: Piet Mondrian. Composition II in Red, Blue and Yellow. 1930.
Right: Theo van Doesburg. Simultaneous Counter-Composition. 1929.
But with larger numbers of images that have smaller differences, we can no longer perform these tasks. The following example illustrates this problem. The first composite image contains all pages from a single chapter of Abara by Tsutomu Nihei. The second composite image contains all pages from a single chapter of BioMega by the same artist. Do Abara and BioMega share the same style, or do they have subtle but important differences (besides the size of the panels)? For example, does one title have more stylistic variety than the other? Which page in each title is the most unusual stylistically? Even with this small number of manga pages, these questions are already impossible to answer.
A chapter from Abara manga by Tsutomu Nihei.
A chapter from BioMega manga by Tsutomu Nihei.
Summary: even with a small image sample, we may not be able to notice small visual differences between images.
3. Abara and BioMega have only a few hundred pages each. Of the 883 manga series in our collection, 297 series contain over 1,000 pages, and a number of series have more than 10,000 pages each. If we have difficulty comparing only a few dozen pages from Abara and BioMega, how can we possibly do this with manga series that are much longer?
Annotation and content analysis methods will not help here. To use them, we need enough tags to describe the visual characteristics of manga comprehensively. However, creating a vocabulary that we can use to label all types of visual differences in manga (or in any other form of visual culture) is problematic in itself. We do not have enough words in our natural languages to adequately describe the visual characteristics of even a single manga image, let alone all other kinds of human-created images. Consider the sample page from Vampire Knight (the image on the left in the first illustration above). Can we describe all the variations in background textures in its four panels? Or the differences between the rendering of hair in each of these panels?
This is the third problem with studying visual art, visual culture, and visual media using traditional humanities approaches. Whatever methodologies and theories are employed in a given case, all of them use one representational system (a natural language) to describe another (images). But as the last example shows, we will not be able to give names to all the variations of textures, compositions, lines, and shapes used even in a single chapter of Abara, let alone in one million manga pages. We can proceed with traditional approaches as long as we limit ourselves to discussing
manga iconography and other distinct visual elements that have standardized shapes and meanings: water drops signifying stress, throbbing veins signifying anger, and so forth. But if we actually want to start discussing the range of graphical and compositional possibilities used across manga, we need a new kind of instrument. This fundamental limitation applies to all other visual forms developed by human beings, be they paintings, drawings, photographs, graphic designs, web pages, visual interfaces, animations, etc.
Summary: natural languages do not allow us to properly describe all visual characteristics of images, or to name all their possible variations.
METHOD: DIGITAL IMAGE PROCESSING + VISUALIZATION
To address these challenges, we developed a set of methods and techniques that together we call Cultural Analytics. The key idea of Cultural Analytics is the use of visualization to explore large sets of images and video. These visualizations can use existing metadata (for example, publication dates or author names) and also new metadata added by researchers via annotation or coding. However, as we have already discussed, adding tags or other annotations manually has serious limitations. Our natural visual system can't notice subtle visual differences across a large number of images. And our natural languages do not have terms to describe all the visual characteristics of images, or to name their possible variations.
To overcome these limitations, our core method uses digital image processing and a new type of visualization. This section describes the method, and the following sections apply it to progressively larger numbers of images drawn from our one million manga dataset. (For a description of our other visualization methods, see Manovich, Media Visualization.)
The method involves two steps:
1. We use digital image processing to automatically measure a number of visual characteristics (i.e., features) of our images. In this process, visual qualities are mapped into numbers. (In computer science, this step is often referred to as “feature extraction.”) For example, when grey tones are measured on a 0-255 scale, black is represented as 0, white as 255, and 50% grey as 127.5. Dimensions that can be measured include contrast, presence of texture and fine details, number of lines and their curvature, number and type of edges, size and positions of shapes, and so on. In the case of color images, we can also measure the colors of all pixels (hue, saturation, brightness), determine the most frequently used colors, and calculate various image statistics for the R, G, and B color components separately.
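The Python sketch below illustrates this step for color images, using only the standard library to turn a list of RGB pixels into a few numbers. The BT.601 luma weights used for the grayscale conversion are one common convention. They are an assumption here, not a description of the software used in this study. Hue is left out because it is an angle and cannot be averaged like the other values.

```python
import colorsys
from collections import Counter
from statistics import mean


def to_gray(r, g, b):
    """Grayscale value on the 0-255 scale using BT.601 luma weights (an assumption)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def color_features(pixels):
    """Summarize a non-empty list of (R, G, B) pixels whose channels are 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    return {
        "mean_gray": mean(to_gray(*p) for p in pixels),
        "mean_R": mean(p[0] for p in pixels),
        "mean_G": mean(p[1] for p in pixels),
        "mean_B": mean(p[2] for p in pixels),
        "mean_saturation": mean(s for _, s, _ in hsv),  # 0-1 scale
        "mean_brightness": mean(v for _, _, v in hsv),  # 0-1 scale
        "most_common_color": Counter(pixels).most_common(1)[0][0],
    }


# Hypothetical four-pixel image: black, white, white, pure red.
print(color_features([(0, 0, 0), (255, 255, 255), (255, 255, 255), (255, 0, 0)]))
```

An image of equal parts pure black and pure white gets a mean grayscale value of 127.5, which corresponds to the 50% grey of the text.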
Typically, such measurements produce many numbers for each visual dimension of an image. For instance, every pixel has its own grayscale value. If we measure the orientations of all lines in an image, we end up with a separate number for every line. To compare multiple images with each other along particular dimensions, it is convenient to use the average of the measurements on each dimension.
For example, if we are interested in grayscale values, we sum the values of all pixels and divide the total by the number of pixels. In the case of line orientations, we similarly add the angles of all lines and divide the total by the number of lines.
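The two averages just described take only a few lines of Python. In this sketch the image is a hypothetical list of rows of 0-255 values. The line angles are assumed to come from a separate line-detection step, which is not shown. One caution applies: plain averaging of angles works only when orientations do not wrap around. Lines at 1° and 179° are both nearly horizontal, yet their arithmetic mean is 90°.

```python
def mean_gray(image):
    """Average grayscale value: sum of all pixel values / number of pixels."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def mean_line_orientation(angles_deg):
    """Average orientation: sum of all line angles / number of lines."""
    return sum(angles_deg) / len(angles_deg)


tiny_page = [[0, 255],
             [255, 255]]  # hypothetical 2 x 2 image: one black pixel, three white
print(mean_gray(tiny_page))                # 191.25
print(mean_line_orientation([0, 45, 90]))  # 45.0
```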
Besides this simple average, called the mean, we can also use other types of descriptive statistics to summarize image characteristics. These include other measures of central tendency (median, mode, etc.) and measures of dispersion (variance, standard deviation, etc.).
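Python's standard `statistics` module provides all of these measures. The sketch below applies them to a hypothetical list of pixel values. It uses the population forms of variance and standard deviation (`pvariance`, `pstdev`), on the assumption that every pixel of a page is measured rather than a sample of pixels.

```python
import statistics


def describe(values):
    """Central tendency and dispersion for a list of pixel values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


print(describe([0, 0, 128, 255]))
```

For these four values, the mean (95.75), the median (64.0), and the mode (0) all differ. This illustrates why several measures of central tendency can be worth recording for the same page.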
Here are examples of such statistics, calculated for the sample manga pages that appeared in the previous section.
Left: a page from Vampire Knight. Mean: 164.68. Standard deviation: 102.25. Middle: a page from One Piece. Mean: 214.58. Standard deviation: 75.65.
Right: a page from Naruto. Mean: 199.96. Standard deviation: 90.92.
Both mean and standard deviation are calculated using the greyscale values of all pixels in a page. The Vampire Knight page (left) has the lowest average grayscale value because of its shaded areas, thick black contours and type, and the black area at the bottom. The One Piece page (middle) has the highest average grayscale value because it uses black quite sparingly. The standard deviation values are reversed: the Vampire Knight page has the highest, while the One Piece page has the lowest. The Vampire Knight page has lots of black, gray, and white, and this pushes up its standard deviation. The One Piece page has lots of white and just a little black, and this keeps its standard deviation low.
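This relationship between tonal range and standard deviation can be checked on two synthetic 100-pixel “pages.” They are invented values, not measurements of the actual Vampire Knight or One Piece pages. One page is mostly white with a little black; the other mixes black, mid-grey, and white in roughly equal amounts.

```python
from statistics import mean, pstdev

# Synthetic 100-pixel "pages" (hypothetical values, not real measurements).
mostly_white = [255] * 90 + [0] * 10              # lots of white, a little black
mixed_tones = [0] * 33 + [128] * 34 + [255] * 33  # black, grey, and white

for name, page in (("mostly_white", mostly_white), ("mixed_tones", mixed_tones)):
    print(f"{name}: mean={mean(page):.2f}, std={pstdev(page):.2f}")
# mostly_white: mean=229.50, std=76.50
# mixed_tones: mean=127.67, std=103.58
```

As with the real pages, the brighter page has the higher mean but the lower standard deviation. The page spread across the whole tonal range has the higher standard deviation.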
2. We create 2D visualizations that position the images according to their feature values. For example, we may use the horizontal dimension (X-axis) to represent the grayscale mean and the vertical dimension (Y-axis) to represent the grayscale standard deviation. The image features calculated by software in step 1 become image coordinates in a 2D space. In this way, the differences between images along a particular visual dimension are translated into their positions in space, something that the human visual system can read very well.
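In the Python sketch below, each image's two feature values are rescaled to pixel positions on a canvas. The canvas size and the feature ranges are assumptions. The mean uses 0-255, and the spread uses 0-127.5, the largest possible standard deviation for 0-255 values. Drawing each image's thumbnail at its position, which is what our visualizations show, would require an imaging library and is not shown here.

```python
def to_canvas(points, width, height, x_range, y_range):
    """Map (x_feature, y_feature) pairs to integer pixel positions.

    Canvas y grows downward, so it is flipped: larger feature values
    end up nearer the top of the canvas.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    positions = []
    for fx, fy in points:
        px = (fx - x_min) / (x_max - x_min) * width
        py = height - (fy - y_min) / (y_max - y_min) * height
        positions.append((round(px), round(py)))
    return positions


# (grayscale mean, grayscale standard deviation) for the three sample pages above.
pages = [(164.68, 102.25), (214.58, 75.65), (199.96, 90.92)]
print(to_canvas(pages, 1000, 500, x_range=(0, 255), y_range=(0, 127.5)))
```

The ranges are fixed to the theoretical limits of each feature rather than to the data's minimum and maximum. This way, plots of different image sets share the same coordinate system and can be compared directly.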
To illustrate this, we visualize the three manga pages from the previous example using their mean and standard deviation features.
A plot of sample pages from Vampire Knight, One Piece, and Naruto. The mean and standard deviation of a page are used as its X and Y coordinates.
(Both mean and standard deviation are calculated using greyscale values of all pixels in a page.)
This plot uses only two very basic features. Therefore we can't expect it to capture every visual difference between these three images. Still, even with only two features, the spatial positions of the images and the distances between them in 2D space reflect our sense of their overall visual differences well: the One Piece and Naruto pages are close together; the Vampire Knight page is further away.
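The claim that the positions reflect overall visual differences can be checked with a little arithmetic. The sketch below uses the mean and standard deviation values from the caption above as (X, Y) coordinates; measuring closeness with plain Euclidean distance is our assumption, since the text does not name a distance measure.

```python
import math

# (mean, standard deviation) of greyscale values, from the caption above.
pages = {
    "Vampire Knight": (164.68, 102.25),
    "One Piece": (214.58, 75.65),
    "Naruto": (199.96, 90.92),
}


def distance(a, b):
    """Euclidean distance between two pages in the 2D feature space."""
    (x1, y1), (x2, y2) = pages[a], pages[b]
    return math.hypot(x2 - x1, y2 - y1)


for a, b in [("One Piece", "Naruto"),
             ("Vampire Knight", "Naruto"),
             ("Vampire Knight", "One Piece")]:
    print(f"{a} <-> {b}: {distance(a, b):.1f}")
```

The One Piece and Naruto pages come out about 21.1 units apart, Vampire Knight and Naruto about 37.1, and Vampire Knight and One Piece about 56.5, matching the visual impression of the plot. Both axes are measured in greyscale levels here, so combining them in one distance is reasonable; features on very different scales would first need normalizing.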
Measuring visual features and then mapping these features onto the X and Y axes allows us to separate the overall perceived difference into separate dimensions. In this plot, we use the average and the spread of grayscale values (i.e., mean and standard deviation), but many other dimensions can also be used. This process is not unlike how human visual perception functions. The human visual system analyzes visual input separately in terms of different characteristics: contrast, texture, shape, color, and motion. Most psychologists and neuroscientists believe that the brain combines this information to arrive at perceptual wholes. Various theories have been proposed to explain the details of this process. An influential theory of attention developed by Anne Treisman and Garry Gelade suggests that different features analyzed at the early stages of perception are “bound together” into consciously experienced wholes. Another theory, by L. Ward, proposes a neural mechanism responsible for binding the features which code shape, motion, color, depth, and other perceptual aspects. Some of the features we measure, such as Gabor filters, are thought to be the exact equivalents of the features analyzed by the brain; others can be understood as equivalent to combinations of multiple features computed by the brain.
As we discussed earlier, when the brain is confronted with a number of very similar images, or a very large number of images, it can no longer compute these differences reliably. When we measure features in image sets and visualize images according to the feature values, we essentially augment human perception; that is, we scale up its capacity to judge visual differences.
The combination of digital image analysis and visualization makes it possible to bypass a problem which has haunted visual semiotics in particular, and all linguistic descriptions of the visual in general: the inability of language to adequately represent all the variations which images can contain. For instance, even if we use hundreds of words for colors, images can show millions of color variations. And color is not the worst case; for other dimensions, such as texture or line character, the terms provided by natural languages are much more limited.
In other words, our senses are able to register a much larger set of values on any analog dimension (loudness, pitch, grayscale, color, motion, orientation, size, etc.) than our languages have words for. This makes sense because language developed much later in evolution, to supplement the senses. Language divides the continuous world into larger discrete categories that make possible abstract reasoning, metaphors, and other unique capacities. It is not designed to map the wealth of our sensory experience exactly into another representational system.
So if we can't rely on natural language to capture what our senses can register, and we can't rely on the senses because, as we discussed in the previous section, they are not able to register very subtle differences between images or other cultural artifacts, how can we talk about visual culture and visual media?
Our approach is to use visualization as a new descriptive system. In other words, we describe images with images. In doing this, we take advantage of the ability of images to register subtle differences on any visual dimension.
Note that our method does not imply that we are getting rid of discrete categories. Rather, instead of being limited to the few provided by language, we can now define as many as we need.
For example, let's say that we want to describe grayscale levels in an image. We use software to read pixel values from an image file and calculate an average value. This average is then used to position the image in the visualization along the X- or Y-axis.
Common 8-bit and 24-bit image formats such as JPEG and PNG use 256 discrete values to represent grayscale levels. This gives us 256 separate categories for grayscale values. These categories do not have distinct names. But they work: they allow us to compare multiple images in terms of their grayscale values.
We are not limited to 256 categories; if we want, we can use 1,000, 10,000, or any other number. How does this work? When we calculate the average of all integer grayscale pixel values, we get a real number. For instance, if our image contains four pixels with grayscale values 103, 106, 121, and 112, the average of these values is (103 + 106 + 121 + 112) / 4 = 110.5. If we round such averages to one decimal place, we will have roughly 256 x 10 = 2,560 distinct categories. If we keep two decimal places, we will have roughly 25,600 distinct categories. In reality, we don't need to go that high, since human perception can't even see the difference between two adjacent gray levels (for instance, 127 and 128) on the 0-255 scale.
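The arithmetic behind this argument can be reproduced in a few lines of Python. For the four pixel values 103, 106, 121, and 112, the exact average is 110.5. The helper name `category_count` is hypothetical; it counts the exact number of rounded levels between 0 and 255 (255 x 10^d + 1), which is why it comes out slightly below the round figures of 2,560 and 25,600.

```python
pixels = [103, 106, 121, 112]
average = sum(pixels) / len(pixels)  # (103 + 106 + 121 + 112) / 4
print(average)  # 110.5


def category_count(decimals, max_level=255):
    """Number of distinct values an average in [0, max_level] can take
    once it is rounded to the given number of decimal places."""
    return max_level * 10 ** decimals + 1


for d in (0, 1, 2):
    print(d, "decimal places ->", category_count(d), "categories")
```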
While we utilize a variety of visualization techniques, the key technique used by our method is the scatter plot, i.e., a 2D visual representation that uses Cartesian coordinates to display two sets of numerical values describing data. In our case, each element of a data set is an image, and the two values that determine its position in a plot are two measured visual qualities (features), such as average grayscale and standard deviation.
Equally often we use line graphs where the X-axis represents the dates the images were created (or their positions in a narrative sequence such as a comic book), and the Y-axis represents some measured value (such as average saturation).
Along with regular scatter plots and line graphs, we also use a new visualization technique which we call an image plot. A normal scatter plot and a line graph display the data as points and lines. An image plot superimposes images over the data points in a graph.
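At the core of an image plot is a coordinate transform: each image's two feature values are mapped from data units to canvas pixels, and a thumbnail is drawn at that point. The following Python sketch illustrates only that geometry under stated assumptions. The function names `to_canvas` and `thumbnail_box`, the canvas size, the axis ranges, the thumbnail size, and centring the thumbnail on the point are all hypothetical and are not taken from the ImagePlot software.

```python
def to_canvas(value, lo, hi, size):
    """Linearly map a data value in [lo, hi] to a pixel offset in [0, size]."""
    return (value - lo) / (hi - lo) * size


def thumbnail_box(x_value, y_value, x_range, y_range,
                  canvas=(1000, 1000), thumb=(40, 60)):
    """Return (left, top, right, bottom) of a thumbnail centred on a data point.

    Image rows grow downward, so the Y coordinate is flipped to make
    larger feature values appear higher on the canvas.
    """
    width, height = canvas
    tw, th = thumb
    cx = to_canvas(x_value, *x_range, width)
    cy = height - to_canvas(y_value, *y_range, height)
    left, top = round(cx - tw / 2), round(cy - th / 2)
    return (left, top, left + tw, top + th)


# The One Piece sample page (mean 214.58, standard deviation 75.65),
# plotted on assumed axes of 0-255 (mean) and 0-128 (standard deviation).
print(thumbnail_box(214.58, 75.65, (0, 255), (0, 128)))
```

Drawing each page's thumbnail into its box, one after another, produces an image plot; where boxes overlap, images drawn later cover those drawn earlier.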
The following visualizations of 127 Piet Mondrian paintings illustrate the difference between a scatter plot and an image plot.
127 paintings by Piet Mondrian (created between 1905 and 1917) visualized as a scatter plot (left) and as an image plot (right).
X-axis = brightness median. Y-axis = saturation median.
(Technical details: We also use more advanced visualization techniques such as scatter plot matrices and parallel coordinates, and multivariate data analysis techniques such as PCA, cluster analysis, and so on. However, since the concepts of a multi-dimensional feature space and dimension reduction are more abstract, in this chapter all our examples are 2D visualizations where each axis corresponds to a single feature, such as mean grayscale value, or to metadata which was collected along with the data, such as the position of a page within the sequence of all pages in a manga title. In general, we prefer to use single features for the X and Y axes if their graph reveals interesting patterns and if their meaning is easy to explain; in contrast, it is often difficult to interpret the dimensions in a graph which uses PCA or other multivariate methods.)
While the technique we call “image plots” has already been described in a number of articles in computer science (see Peters, MultiMatch), it has not been made available in any graphing or visualization application. Our lab developed software to create image plots; we use this software in all our projects, and also distribute it as open source (Software Studies Initiative, ImagePlot). The software runs on regular Windows, Mac, and Linux desktops and laptops. Working with GravityLab at the California Institute for Telecommunication and Information (Calit2), we also developed an interactive application that can generate image plots containing thousands of individual images in real time. The application runs on scalable tiled displays such as HIPerSpace (The Highly Interactive Parallelized Display Space), which offers a resolution of 35,840 x 8,000 pixels (287 megapixels) on a display wall 31.8 feet wide and 7.5 feet tall made from 70 30-inch monitors (Yamaoka, Manovich, Douglass, Kuester, Cultural Analytics in Large-Scale Visualization Environments). In this article, all image plots are made with ImagePlot software; scatter plots and line graphs are made with ImagePlot and Mondrian (free data visualization software).
In the next sections of this article, we will show how our method can be used to compare image sets ranging from a few hundred to one million images.
COMPARING ABARA AND NOISE (474 PAGES)
Having introduced our method (visualizing images as a scatter plot according to quantitative descriptions of their visual properties, or features, measured with digital image processing techniques), let's now apply this method to our manga data set.
To make our visualization examples easier to follow, we will use the same two visual features in most of the examples below. The first feature is the standard deviation of greyscale values of all pixels in an image. Standard deviation is a commonly used measure of variability. It shows how much the data is dispersed around the average. If an image has a big range of greyscale values, it will have a large standard deviation. If an image's greyscale values cluster closely around a single level, its standard deviation will be small.
The second feature is entropy. In information theory, the concept of entropy was developed by Claude E. Shannon in his famous 1948 paper “A Mathematical Theory of Communication.” Entropy describes the degree of uncertainty in the data, i.e., how difficult or how easy it is to predict unknown data values given the values we already know. If an image consists of a few monochrome areas, its entropy will be low. In contrast, if an image has lots of texture and detail, and its colors (or greyscale values in the case of a black and white image) vary significantly from place to place, its entropy will be high.
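A common way to compute this feature for a greyscale image is the Shannon entropy of its 256-bin histogram, H = -Σ p_i log2 p_i, where p_i is the fraction of pixels with grey level i. The Python sketch below assumes that histogram-based definition; the text does not spell out the exact formula used by the project's software, and the function name `greyscale_entropy` is hypothetical. Note that a histogram-based measure ignores where pixels sit on the page: it grows with the number of grey levels that occur in substantial proportions.

```python
import math
from collections import Counter


def greyscale_entropy(pixels):
    """Shannon entropy (in bits) of the histogram of 8-bit greyscale values.

    0 bits for a single flat tone; at most log2(256) = 8 bits when all
    256 levels occur equally often.
    """
    counts = Counter(pixels)
    n = len(pixels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


flat = [255] * 100                    # one monochrome area
two_tone = [0] * 50 + [255] * 50      # black and white in equal parts
varied = list(range(256))             # every grey level exactly once
for name, img in [("flat", flat), ("two-tone", two_tone), ("varied", varied)]:
    print(name, round(greyscale_entropy(img), 3))
```

The three toy images give 0, 1, and 8 bits respectively, so for 8-bit images the entropy axis is bounded by 8.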
In the following examples, we will map standard deviation values to the X-axis, and entropy values to the Y-axis. We will start by creating image plots of Abara and NOISE pages. The first title has 291 pages; the second has 183 pages. (This count includes all pages which were available for these titles on onemanga.com, including title and credit pages.)
Abara pages. Artist: Tsutomu Nihei.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.
Abara pages (left) and NOISE pages (right). Artist: Tsutomu Nihei. X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.
These visualizations of the two titles side by side make it easy to answer the questions we asked earlier. Does one title have more stylistic variety than the other? Yes, Abara's style varies much more than the style of NOISE: the points in the left plot are more dispersed than the points in the right plot. Which page in each title is the most unusual stylistically? Each plot reveals a number of outliers, i.e., points that stand out from the rest. (Of course, we should keep in mind that the two measurements we are using in these plots, standard deviation and entropy, capture only some dimensions of a visual style. If we use other features, different pages may stand out as outliers.)
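Outliers of this kind can also be flagged automatically, for example by standardizing each feature (z-scores) and marking pages that lie far from the centre of the point cloud. The Python sketch below is an illustration under stated assumptions: the threshold of 2.0, the function name `find_outliers`, and the sample (standard deviation, entropy) values are hypothetical, and the text above identifies outliers visually rather than with a formula.

```python
import math
import statistics


def find_outliers(points, threshold=2.0):
    """Return indices of points whose standardized distance from the
    centroid exceeds `threshold`.

    Each axis is converted to z-scores first, so that standard deviation
    (roughly 0-128) and entropy (roughly 0-8) contribute on equal terms.
    """
    xs, ys = zip(*points)
    mx, sx = statistics.mean(xs), statistics.pstdev(xs)
    my, sy = statistics.mean(ys), statistics.pstdev(ys)
    outliers = []
    for i, (x, y) in enumerate(points):
        d = math.hypot((x - mx) / sx, (y - my) / sy)
        if d > threshold:
            outliers.append(i)
    return outliers


# Hypothetical (standard deviation, entropy) pairs for eight pages.
pages = [(90, 5.0), (92, 5.2), (88, 4.9), (91, 5.1),
         (89, 5.0), (93, 5.3), (90, 5.1), (40, 2.0)]
print(find_outliers(pages))  # [7]
```

As the text cautions, the result depends entirely on which features are used: a different feature pair could single out different pages.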
VISUALIZING TEMPORAL CHANGES IN ABARA AND NOISE
Manga is a sequential art. To understand if and how visual style in a title varies over the sequence of its chapters and individual pages, we can generate image plots where the X-axis represents a page's position in the sequence, and the Y-axis uses some visual feature. If we do this for two titles, we can then compare how they vary over time in relation to this feature.
When we make image plots using this approach, they can become very long. For example, let's say we want to graph a sequence of 10,000 images and make each image 100 pixels wide. Such a visualization will be 10,000 x 100 = 1,000,000 pixels wide. To address this problem, our visualizations of feature films represent each shot by a single frame. You can find examples of such plots we made for whole films in our Flickr gallery. For shorter manga series, we will use both image plots, which show all pages, and line graphs, which represent each page by a point. The former are easier to read; the latter are sometimes more effective in revealing patterns. For longer titles, we will use line graphs, since image plots would be very long. (A line graph does not have to use lines to connect the data. The difference between a line graph and a scatter plot is that the former assumes that the data values mapped onto the X-axis are separated by the same interval, i.e., 1, 2, 3, ...; a scatter plot does not make this assumption.)
First, we compare Abara (318 pages) and NOISE (140 pages) using the simplest visual feature: the mean of grayscale values of all pixels in a page (Y-axis). The mean value indicates the relative proportion of white, black, and grey areas in a page. We numbered all pages starting from the cover page and ending with the last page of the series. (We removed extra pages inserted by fans who scanned and translated these series; we also removed “bonus” chapters.) The pages are positioned left to right using this linear sequence (X-axis).
In Japan, a manga series is first published in magazines which may come out weekly, bi-weekly, or monthly, and which contain new chapters for a number of series. Later, a number of already published chapters of a successful series are printed in a separate volume (tankōbon). Series translated into other languages are published in a similar format (a number of chapters collected in a single book). Global fans who read scanlated manga on the web can also go through a sequence of chapters all at once. A visualization which arranges all pages of a short series such as Abara or NOISE assumes this reading mode.
Abara (318 consecutive pages in 11 chapters). X-axis = page position in the series. Y-axis = grayscale mean.
NOISE (140 consecutive pages in 6 chapters). X-axis = page position in the series. Y-axis = grayscale mean.
The two visualizations use the same scale for the X-axis, and this is why the first graph (Abara, 318 pages) is more than twice as long as the second (NOISE, 140 pages). The points corresponding to individual pages are connected by lines in order to make the patterns easier to see.
Comparing the two series created by the same artist side by side shows similarities and differences in their graphical development. Abara's graph gradually goes up, in almost linear fashion, until the last few pages. The NOISE graph also goes up slightly until the middle of chapter 5, and then goes down. However, its average grayscale levels never rise as much as those of Abara.
To help us understand the graphical patterns in each chapter, graphic designer Ong Kian Peng (Multimodal Analysis Lab, National University of Singapore) visualized the sequence of NOISE pages as a bar chart. Each chapter is marked in its own color; the chapters are also separated by horizontal intervals. The graph also omits the chapter covers to focus on the patterns in the regular narrative pages.
NOISE. X-axis = page position in the series. Y-axis = grayscale mean. Each chapter is shown in a separate color.
Chapters are separated by white gaps; the title pages are omitted.
The next two visualizations compare the stylistic development of Abara and NOISE on the low detail/texture to high detail/texture dimension, as represented by the entropy feature. (Here the plot of NOISE is scaled to the same width as the plot of Abara to make the patterns more visible.)
Abara pages.
X-axis = page's sequential position in the title (left to right).
Y-axis = entropy calculated over greyscale values of all pixels in a page.
NOISE pages.
X-axis = page's sequential position in the title (left to right).
Y-axis = entropy calculated over greyscale values of all pixels in a page.
In both titles, entropy values change over time. The pattern of these changes in NOISE can be described as a slight linear shift downward. The pattern in Abara is more complex. It can be characterized as a horizontal line, followed by a curve which first goes down and then goes up. The two temporal patterns also have an interesting structural similarity. In each graph, the range between the top and bottom points (i.e., the difference between their values on the Y-axis) gradually increases. This means that stylistically the pages are at first pretty similar, but become more varied over time. (Again, keep in mind that we are describing only one stylistic dimension.)
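The observation that the spread between top and bottom points grows over time can be quantified by splitting the page sequence into consecutive blocks and measuring the range (maximum minus minimum) of the feature within each block. A minimal Python sketch; the function name `block_ranges`, the block size, and the entropy series are hypothetical illustration values, not data from Abara or NOISE.

```python
def block_ranges(values, block_size):
    """Split a sequence into consecutive blocks and return the range
    (max - min) of the values inside each block. A final, shorter block
    is kept if the length is not a multiple of block_size."""
    ranges = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        ranges.append(max(block) - min(block))
    return ranges


# Hypothetical entropy values for 12 consecutive pages.
entropy = [5.0, 5.1, 5.0, 5.2,
           4.8, 5.4, 4.9, 5.5,
           4.2, 5.9, 4.5, 6.1]
print([round(r, 2) for r in block_ranges(entropy, 4)])  # [0.2, 0.7, 1.9]
```

An increasing sequence of block ranges corresponds to the widening band visible in both graphs.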
To illustrate the interesting temporal patterns that can be revealed using such line graphs of feature values, we will go outside of our one million manga pages for the next example. We will look at the webcomic Freakangels. Our data set consists of 342 consecutive pages published over 15 months (2008-2009) in six-page chapters appearing weekly. As in the previous graphs, we use the Y-axis to represent a single visual feature, and reserve the X-axis for the pages' publication order.
In the following visualization, the Y position of each page is determined by the mean (average) of the greyscale values of all pixels in the page.
342 consecutive pages of the webcomic Freakangels (2008-2009).
X-axis = page publication order.
Y-axis = mean of greyscale values of all pixels in a page. Greyscale value = (R value + G value + B value) / 3.
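The greyscale formula in this caption is a plain average of the three color channels. A Python sketch of computing a page's mean with that formula, assuming the page is given as a list of (R, G, B) tuples with 8-bit channels; the function names and the four-pixel page are hypothetical. Many image libraries use a weighted formula instead (Pillow's `convert("L")`, for instance, applies the ITU-R 601-2 luma transform), so values produced by other tools may differ somewhat.

```python
def greyscale(rgb):
    """Greyscale value as defined in the caption: (R + G + B) / 3."""
    r, g, b = rgb
    return (r + g + b) / 3


def page_mean(pixels):
    """Mean greyscale value over all (R, G, B) pixels of a page."""
    return sum(greyscale(p) for p in pixels) / len(pixels)


# Hypothetical four-pixel "page": white, black, pure red, mid grey.
page = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (128, 128, 128)]
print(page_mean(page))  # 117.0
```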
A close-up of the visualization of the webcomic Freakangels.
Despite the weekly intervals that separate the 6-page episodes of Freakangels, the visualization shows that the visual values of the pages are remarkably consistent. For the larger part of the publication period, the changes in average grayscale values follow a smooth curve (the same applies to hue and saturation if we plot them). While the overall change from light to dark images corresponds to the development of the story from day to night, the fact that the grayscale values shift very gradually and systematically over many months is a genuine discovery. Visualization reveals this unexpected pattern and allows us to see the exact shape of the curve.
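A smooth trend of this kind is easier to see, and to describe numerically, if the per-page means are smoothed with a moving average. The sketch below uses a centred moving average; the function name, the window length of 5 pages, and the sample values are hypothetical, and the text does not say that any smoothing was applied to the Freakangels graph.

```python
def moving_average(values, window=5):
    """Centred moving average; the window shrinks at the sequence ends
    so the output has the same length as the input."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical page means drifting from light to dark with page-to-page noise.
means = [200, 206, 198, 195, 199, 188, 190, 181, 184, 176]
print([round(v, 1) for v in moving_average(means)])
```

A wider window suppresses more of the page-to-page noise but also flattens genuine short-term changes, so the window length is a judgment call that should be reported alongside the graph.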
COMPARING VAMPIRE KNIGHT AND ONE PIECE SAMPLES (2,744 PAGES)
Abara and NOISE are quite short: 291 pages and 183 pages, respectively. How does our method scale for longer manga series such as Vampire Knight (57 chapters; 1423 pages) and One Piece (563 chapters; 9745 pages)?
Vampire Knight publication started in January 2005; One Piece started much earlier, on August 4, 1997. This explains the differences in the numbers of chapters and total pages in our download. To make the comparison more meaningful, we will use only a part of the One Piece data set: chapters 481-563, which contain 1321 pages. As in the earlier example, we will visualize these two sets of pages according to standard deviation (X-axis) and entropy (Y-axis). We graph the data using scatter plots; each page is represented by a point.
1423 Vampire Knight pages (left) and 1321 One Piece pages (right). X-axis = standard deviation. Y-axis = entropy.
In each graph, the start and end values of the X-axis and Y-axis are set as follows:
Std: min = 0.000000, max = 126.602400 (largest possible value which can be observed).
Entropy: min = 0.000000, max = 7.962000 (largest possible value which can be observed).
We can notice that the center of the points in the left graph is higher than the center of the points in the right graph. Recall that the Y-axis corresponds to the low texture/detail to high texture/detail dimension. This confirms what we can see in the two sample pages: the Vampire Knight page has more shading and detail than the One Piece page. However, since each set of points also extends significantly along the Y-axis, it is clear that we were lucky in our choice. We could easily have selected different pages that would lead us to the opposite idea about the graphic difference between the two sets.
To confirm our observation about the vertical difference between the centers of the two sets of points, we calculate the actual averages of the data values projected onto the Y-axis (entropy).
Vampire Knight: mean of entropy measurements of 1423 pages: 5.1.
One Piece: mean of entropy measurements of 1321 pages: 5.6.
(The averages are rounded to one decimal place.)
To put this difference in perspective, it is useful to know that the maximum possible value of the entropy measurement of a single image is 7.962. This means that the difference represents 6.4% of the total possible range.
1423 Vampire Knight pages. X-axis = standard deviation. Y-axis = entropy.
1430 One Piece pages. X-axis = standard deviation. Y-axis = entropy.
COMPARING NARUTO AND ONE PIECE (17,782 PAGES)
When we put two sample pages from the Naruto and One Piece manga series side by side, we noticed that the differences in their graphical styles were more subtle than between the sample pages from Vampire Knight and One Piece. Can we better understand these more subtle differences using our method?
It is also interesting to compare Naruto (1999-) and One Piece (1997-) because they are among the most popular manga series among global readers. These titles are rated no. 1 and no. 3 among OneManga.com global readers (OneManga). When we downloaded the pages from OneManga.com in Fall 2009, the first series had been published continuously for 10 years, and the second for 12 years. Accordingly, our download contained 8037 Naruto pages and 9745 One Piece pages.
The following plots compare these two sets of pages.
Left: 8037 Naruto pages. Right: 9745 One Piece pages. Each page is represented by a point.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.
8037 Naruto pages. X-axis = standard deviation. Y-axis = entropy.
9745 One Piece pages. X-axis = standard deviation. Y-axis = entropy.
Projecting a large number of pages from the two series into the same coordinate space helps us to better understand the similarities and the differences between their graphical styles. The visualizations show that along the two visual dimensions used, the distinctions between the languages of the two series are quantitative rather than qualitative. That is, the “point cloud” of Naruto pages significantly overlaps with the “point cloud” of One Piece pages along both the horizontal and the vertical axis.
At the same time, the differences between them are larger than a casual examination of only two pages implies. The visualizations reveal that both series cover a large range of graphical possibilities: from simple black and white pages with minimal detail and texture (lower part of each visualization) to highly detailed and textured ones (top part). But the center of the One Piece point cloud is slightly higher than the center of the Naruto point cloud. This means that One Piece has more pages with more texture and detail than Naruto.
The visualizations also reveal significant differences in the graphical variability of the two series. Naruto's “point cloud” is much more dispersed than the One Piece “point cloud” along both the horizontal and the vertical dimension. This indicates that Naruto's visual language is more diverse than the visual language of One Piece. (We already saw a similar difference when we compared Abara and NOISE, but now we are seeing it in a much larger data set.)
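Which of two point clouds is more dispersed can be summarized with the standard deviation of the points along each axis. A minimal Python sketch; the function name `spread` is hypothetical, and the two small clouds of (standard deviation, entropy) pairs are invented stand-ins, not measurements from Naruto or One Piece.

```python
import statistics


def spread(points):
    """Population standard deviation of a 2D point cloud along X and Y."""
    xs, ys = zip(*points)
    return statistics.pstdev(xs), statistics.pstdev(ys)


# Hypothetical (standard deviation, entropy) values for a few pages.
cloud_a = [(60, 4.0), (110, 6.5), (85, 3.5), (100, 7.0), (70, 5.5)]
cloud_b = [(80, 5.2), (90, 5.8), (85, 5.5), (88, 5.9), (82, 5.1)]

for name, cloud in [("A", cloud_a), ("B", cloud_b)]:
    sx, sy = spread(cloud)
    print(f"cloud {name}: X spread {sx:.1f}, Y spread {sy:.2f}")
```

Cloud A has the larger spread on both axes, which is the numerical counterpart of saying its point cloud is "more dispersed" in a plot.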
We can also examine the stylistic development of these long series over their time of publication in the same way we did for the much shorter Abara and NOISE. The following graph plots 9745 One Piece pages left to right in order of publication; the vertical position is determined by the page's grayscale mean. Below the graph are the three sample pages which we already referred to earlier.
9745 One Piece pages (562 chapters).
X-axis = page position in publication order (left to right).
Y-axis = mean of greyscale values of all pixels in a page.
Sample pages from the One Piece manga series drawn from the 5th, 200th, and 400th chapters.
Since we are now dealing with thousands of pages in 562 weekly chapters published over 12 years, we can discuss temporal patterns at a number of scales. On the scale of years, One Piece mean values gradually drift over the whole time period. Within this overall, almost linear pattern, we see periodic rises and falls that reverse direction anywhere between 7 and 13 months. Thus, we get the answer to the question we asked earlier when we compared three sample pages drawn from the 5th, 200th, and 400th chapters: how does the visual language of One Piece change over time?
The visualization also reveals patterns that may not be visible if we use only a small number of pages. The three pages we looked at earlier missed the periodic drops in grayscale values we can see when we plot all pages. The dips correspond to the flashback parts of the narrative, which place the panels over a black background, as can be seen in the close-up of the earlier montage of all One Piece pages.
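Dips like these can also be located automatically by comparing each page with the typical value of its neighbours. The sketch below flags pages whose greyscale mean falls well below the median of a surrounding window. The function name `find_dips`, the window of 7 pages, the drop threshold of 40 grey levels, and the series are all hypothetical; detecting a dip this way does not by itself establish that it is a flashback, which still requires looking at the pages.

```python
import statistics


def find_dips(values, window=7, drop=40):
    """Return indices of values at least `drop` below the median of the
    surrounding window (the window is truncated at the sequence ends)."""
    half = window // 2
    dips = []
    for i, v in enumerate(values):
        neighbourhood = values[max(0, i - half):i + half + 1]
        if statistics.median(neighbourhood) - v >= drop:
            dips.append(i)
    return dips


# Hypothetical page means: mostly light pages with two dark runs.
means = [210, 205, 212, 208, 120, 115, 209, 211, 207, 213, 118, 206, 210]
print(find_dips(means))  # [4, 5, 10]
```

Using the median rather than the mean keeps the dark pages themselves from dragging down the reference level; a dark run longer than half the window would still need a wider window.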
A close-up of the complete set of One Piece pages arranged in a grid in publication order (left to right, top to bottom).
VISUALIZING COMPLETE MANGA IMAGE SET (1,074,790 PAGES)
We can now finally provide an answer to the question in the chapter's title: how to compare one million images? Using the same measurements and axis assignments (X-axis = standard deviation, Y-axis = entropy) as we did in the plots of individual titles and series, we visualize our complete set of one million pages. (Of course, we can also organize this image set in many other ways using many other measurements; this is just one possible mapping.)
One million manga pages organized according to selected visual characteristics.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.
Top image: complete visualization.
Middle image: close-up of the top part.
Bottom image: close-up of the bottom left corner.
Notes:
1) Some of the pages, such as all covers, are in color. In order to fit all images into a single large image (the original is 44,000 x 44,000 pixels, scaled to 10,000 x 10,000 for posting to Flickr), we rendered everything in grayscale.
2) Because pages are rendered on top of each other, you don't actually see one million distinct pages; rather, the visualization shows the distribution of all pages, with typical examples appearing on top.
One million pages cover the space of graphical possibilities more broadly and with more density than Naruto or One Piece alone. In between the four graphical extremes corresponding to the left, right, top, and bottom edges of the pages “cloud,” we find every possible intermediate graphic variation. This suggests that manga's graphic language should be understood as a continuous variable.
This, in turn, suggests that the very concept of style as it is normally used may become problematic when we consider very large cultural data sets. The concept assumes that we can partition a set of works into a small number of discrete categories. However, if we find a very large set of variations with very small differences between them (as in this case of one million manga pages), it is no longer possible to use this concept. Instead, it is better to use visualization and mathematical descriptions to characterize the space of possible and realized variations.
To better understand the distribution of our data set within the space of all graphical possibilities, we can render the data from the last visualization using points. Such a scatter plot is not as easy to read as an image plot; however, it is better at showing the shape of the pages' distribution. The plot shows that the distribution follows a bell-curve-like pattern: a single dense cluster with a gradual fall-off to the sides. The parts of the plot which remain black represent the graphical possibilities not realized in our manga sample: images which are almost completely white (lower right corner), and images which have large areas of black and small areas of white (left third of the plot).
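The shape of such a distribution, including its empty regions, can be summarized by binning the points into a coarse 2D grid and counting pages per cell. A Python sketch under stated assumptions: the function name `density_grid` and the 4 x 4 grid are hypothetical, the axis maxima reuse the values quoted earlier in this chapter for standard deviation (126.6024) and entropy (7.962), and the sample points are invented.

```python
def density_grid(points, x_max=126.6024, y_max=7.962, bins=4):
    """Count points per cell of a bins x bins grid over [0, x_max] x [0, y_max].

    Row 0 is the bottom of the plot (low entropy). Values equal to the
    maximum are placed in the last cell.
    """
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        col = min(int(x / x_max * bins), bins - 1)
        row = min(int(y / y_max * bins), bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical (standard deviation, entropy) values for a handful of pages.
pages = [(80, 5.0), (85, 5.5), (90, 4.8), (75, 5.2), (100, 6.5), (60, 3.0)]
grid = density_grid(pages)
for row in reversed(grid):  # print the top row (high entropy) first
    print(row)
empty = sum(cell == 0 for row in grid for cell in row)
print("empty cells:", empty)  # 13
```

With one million pages and a much finer grid, the dense central cells and the zero-count cells play the role of the bright cluster and the black, unrealized regions described above.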
One million manga pages rendered as points.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.
The fact that digital image processing and visualization of the one million manga pages data set make us question the most basic concepts of the humanities and cultural criticism is at least as important as any particular discoveries we can make about this data set. It illustrates how computational analysis of massive cultural data sets has the potential to transform our theoretical and methodological paradigms for studying culture.
DEFAMILIARISATION WITH COMPUTERS
Alexander Rodchenko. Pine Trees in Pushkin Park. 1927. (Gelatin silver print.)
Our methodology relies on standard techniques of digital image analysis which started to be developed as early as the second half of the 1950s and are now everywhere: in digital cameras, image editing software such as Photoshop, automated factories, medical imaging, and all science fields which use images as sources of data (from astronomy to biology). However, when we adopt these techniques as tools for cultural research, we should be clear about how they analyze images and what it means in general to see through computer “eyes.” Since this chapter is focused on motivating and explaining our method in general terms, we will make only one observation. When we look at images normally, we experience all their visual dimensions at once. When we separate these dimensions using digital image analysis and visualization, we break this gestalt experience. Being able to examine a set of images along a single visual dimension is a powerful mechanism of defamiliarization (“ostranenie”), a device for seeing what we could not have noticed previously. If avant-garde photographers, designers, and filmmakers of the 1920s such as Rodchenko, Moholy-Nagy, Eisenstein, and Vertov were defamiliarizing the standard perception of visible reality using diagonal framing and unusual points of view, now we can use software to defamiliarize our perceptions of visual and media cultures.
ACKNOWLEDGMENTS
Software Studies Initiative research reported in this article was made possible by the generous support provided by the California Institute for Telecommunication and Information (Calit2), the Center for Research in Computing and the Arts at UCSD (CRCA), and the Chancellor's Office of the University of California, San Diego (UCSD). The development of the custom digital image analysis software and the processing of the manga image set on supercomputers at the National Energy Research Scientific Computing Center (NERSC) was funded by a 2009 NEH Humanities High Performance Computing Award, “Visualizing Patterns in Databases of Cultural Images and Video.” The development of the ImagePlot software was funded by an NEH 2010-2011 Digital Startup Level II grant, “Interactive Visualization of Image Collections for Humanities Research.”
CREDITS
The visualizations and the data analysis and visualization tools developed by the Software Studies Initiative are the result of systematic collaborations between the key lab members: Lev Manovich, Jeremy Douglass, William Huber, and Tara Zepel. HyperSpace software for interactive media visualization was developed by So Yamaoka (Gravitylab, Calit2). Batch image processing tools were developed by Sunsern Cheamanunkul (PhD student, Computer Science and Engineering Department, UCSD).
REFERENCES
Salt, Barry. “The Statistical Style Analysis of Motion Pictures.” Film Quarterly 28.1 (1974): 13-22.
Salt, Barry. Film Style and Technology: History and Analysis. London: Starword, 1992.
Cha, Meeyoung, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. “I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System.” 2007 ACM Internet Measurement Conference. Web. July 17, 2011. <http://an.kaist.ac.kr/traces/papers/imc131-cha.pdf>.
Cinemetrics.lv. Barry Salt's Database. Web. August 11, 2011. <http://www.cinemetrics.lv/satltdb.php-cm>.
Crandall, David J., Lars Backstrom, Daniel Huttenlocher, and Jon Kleinberg. “Mapping the World's Photos.” 18th International Conference on World Wide Web, 2009. Web. July 17, 2011. <http://www.cs.cornell.edu/~dph/papers/photomap-www09.pdf>.
Douglass, Jeremy, William Huber, and Lev Manovich. “Understanding Scanlation: How to Read One Million Fan-Translated Manga Pages.” Image and Narrative 12.1 (2011): 190-228. Brussels. Web. <http://lab.softwarestudies.com/2010/11/one-million-manga-pages.html>.
“Gabor Filter.” Wikipedia. Web. August 11, 2011. <http://en.wikipedia.org/wiki/Gabor_filter>.
Herring, Susan, Lois Ann Scheidt, Inna Kouper, and Elijah Wright. “A Longitudinal Content Analysis of Weblogs: 2003-2004.” Blogging, Citizenship and the Future of Media. London: Routledge, 2006.
IMDb. IMDb Database Statistics. Web. July 17, 2011. <http://www.imdb.com/stats>.
Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. “What is Twitter, a Social Network or a News Media?” 19th International Conference on World Wide Web, 2010. Web. July 17, 2011. <an.kaist.ac.kr/~haewoon/papers/2010-www-twitter.pdf>.
Software Studies Initiative. ImagePlot. Open source software for visualization of large image and video collections. 2011. Web. <http://lab.softwarestudies.com/p/imageplot.html>.
McCloud, Scott. Understanding Comics: The Invisible Art. Kitchen Sink Press, 1993.
Manovich, Lev. “Cultural Analytics: Visualizing Cultural Patterns in the Era of ‘More Media.’” Domus, Spring 2009. Web. <http://lab.softwarestudies.com/2009/06/publications.html#CA_Publications>.
Manovich, Lev. “Media Visualization: Visual Techniques for Exploring Large Media Collections.” 2010. Media Studies Futures, ed. Kelly Gates. Blackwell, forthcoming 2012. Web. July 17, 2011. <http://manovich.net/DOCS/media_visualization.2011.pdf>.
Moretti, Franco. “Conjectures on World Literature.” New Left Review 1 (2000): 55-67. Web. July 17, 2011. <http://www.newleftreview.org/A2094>.
OneManga. onemanga.com top 50 manga list. Web. August 12, 2011. <http://www.onemanga.com/directory/top/>.
Peters, Carol, ed. MultiMatch: Technology-enhanced Learning and Access to Cultural Heritage. D1.1.3: State of the Art report section. 2006. Web. July 17, 2011. <http://www.multimatch.eu/docs/publicdels/D1.1.2.pdf>.
Treisman, Anne, and Garry Gelade. “A Feature-Integration Theory of Attention.” Cognitive Psychology 12.1 (1980): 97-136.
Ward, L. M. “Synchronous Neural Oscillations and Cognitive Processes.” Trends in Cognitive Sciences 7.12 (December 2003): 553-559.
Williams, D., N. Martins, M. Consalvo, and J. Ivory. “The Virtual Census: Representations of Gender, Race and Age in Video Games.” New Media & Society 11 (2009): 815-834.
Yamaoka, S., L. Manovich, J. Douglass, and F. Kuester. “Cultural Analytics in Large-Scale Visualization Environments.” Submitted to Computer Graphics Forum, 2011.