Big Data Exam | ETH Zürich | February 9, 2017 | Answer sheet Name: ____________________________ Legi-Number: ______________________ 1. A □ B □ C □ D □ 2. A □ B □ C □ D □ 3. A □ B □ C □ D □ 4. A □ B □ C □ D □ 5. A □ B □ C □ D □ 6. A □ B □ C □ D □ 7. A □ B □ C □ D □ 8. A □ B □ C □ D □ 9. A □ B □ C □ D □ 10. A □ B □ C □ D □ 11. A □ B □ C □ D □ 12. A □ B □ C □ D □ 13. A □ B □ C □ D □ 14. A □ B □ C □ D □ 15. A □ B □ C □ D □ 16. A □ B □ C □ D □ 17. A □ B □ C □ D □ 18. A □ B □ C □ D □ 19. A □ B □ C □ D □ 20. A □ B □ C □ D □ 21. A □ B □ C □ D □ 22. A □ B □ C □ D □ 23. A □ B □ C □ D □ 24. A □ B □ C □ D □ 25. A □ B □ C □ D □ 26. A □ B □ C □ D □ 27. A □ B □ C □ D □ 28. A □ B □ C □ D □ 29. A □ B □ C □ D □ 30. A □ B □ C □ D □ 31. A □ B □ C □ D □ 32. A □ B □ C □ D □ 33. A □ B □ C □ D □ 34. A □ B □ C □ D □ 35. A □ B □ C □ D □ 36. A □ B □ C □ D □ 37. A □ B □ C □ D □ 38. A □ B □ C □ D □ 39. A □ B □ C □ D □ 40. A □ B □ C □ D □
41. A □ B □ C □ D □ 42. A □ B □ C □ D □ 43. A □ B □ C □ D □ 44. A □ B □ C □ D □ 45. A □ B □ C □ D □ 46. A □ B □ C □ D □ 47. A □ B □ C □ D □ 48. A □ B □ C □ D □ 49. A □ B □ C □ D □ 50. A □ B □ C □ D □ 51. A □ B □ C □ D □ 52. A □ B □ C □ D □ 53. A □ B □ C □ D □ 54. A □ B □ C □ D □ 55. A □ B □ C □ D □ 56. A □ B □ C □ D □ 57. A □ B □ C □ D □ 58. A □ B □ C □ D □ 59. A □ B □ C □ D □ 60. A □ B □ C □ D □ 61. A □ B □ C □ D □ 62. A □ B □ C □ D □ 63. A □ B □ C □ D □ 64. A □ B □ C □ D □ 65. A □ B □ C □ D □ 66. A □ B □ C □ D □ 67. A □ B □ C □ D □ 68. A □ B □ C □ D □ 69. A □ B □ C □ D □ 70. A □ B □ C □ D □ 71. A □ B □ C □ D □ 72. A □ B □ C □ D □ 73. A □ B □ C □ D □ 74. A □ B □ C □ D □ 75. A □ B □ C □ D □ 76. A □ B □ C □ D □ 77. A □ B □ C □ D □ 78. A □ B □ C □ D □ 79. A □ B □ C □ D □ 80. A □ B □ C □ D □
81. A □ B □ C □ D □ 82. A □ B □ C □ D □ 83. A □ B □ C □ D □ 84. A □ B □ C □ D □ 85. A □ B □ C □ D □ 86. A □ B □ C □ D □ 87. A □ B □ C □ D □ 88. A □ B □ C □ D □ 89. A □ B □ C □ D □ 90. A □ B □ C □ D □ 91. A □ B □ C □ D □ 92. A □ B □ C □ D □ 93. A □ B □ C □ D □ 94. A □ B □ C □ D □ 95. A □ B □ C □ D □ 96. A □ B □ C □ D □ 97. A □ B □ C □ D □ 98. A □ B □ C □ D □ 99. A □ B □ C □ D □ 100. A □ B □ C □ D □ 101. A □ B □ C □ D □ 102. A □ B □ C □ D □ 103. A □ B □ C □ D □ 104. A □ B □ C □ D □ 105. A □ B □ C □ D □ 106. A □ B □ C □ D □ 107. A □ B □ C □ D □ 108. A □ B □ C □ D □ 109. A □ B □ C □ D □ 110. A □ B □ C □ D □ 111. A □ B □ C □ D □
Big Data Final Exam
9. February 2017
Department of Computer Science ETH Zürich
Rules (please read carefully)
• You have 90 minutes for the exam.
• Please write your name and Legi number on the answer sheet now, but do not start reading or answering any questions until the proctors tell you to start.
• Any answers or marks on pages other than the answer sheet will be completely ignored, even if correct.
• This is a multiple-choice examination. Each question comes with four proposed answers (A, B, C and D), among which exactly one is correct. Picking the correct answer for a question will give you one point.
• The exam consists of 111 questions. The maximum number of points that can be achieved is 111.
• There is no penalty for incorrect answers. We thus encourage trying to answer every question.
• Please write your answers on the answer sheet by completely filling the relevant square (A, B, C or D) for each question. Do not use any other way to select an answer (ticking, crossing, etc.). For example, picking A: 0. A ■ B □ C □ D □
• If you pick several answers for a question, no point will be granted for this question: A ■ B □ C ■ D □
• If you picked an answer and then would like to change it, make it very clear with an additional circle around the newly chosen answer. Any ambiguity will be handled with no point granted. For example, for changing from A to C: 0. A ■ B □ C ■ D □, with a circle drawn around C.
• Only use a black or a blue pen. DO NOT use a pencil. DO NOT use red and green pens.
1. When was the relational algebra invented?
A. In the early 1970s
B. In the early 1990s
C. In the early 2000s
D. In the early 1980s
2. Which one of these database paradigms is not considered to be NoSQL?
A. Triple stores
B. Wide column stores
C. Relational databases
D. Document stores
3. Which one of these is not one of the three Vs of Big Data?
A. Volume
B. Variability
C. Velocity
D. Variety
4. What is data totality?
A. It is when data is contained on a single machine
B. It is when a dataset contains all relevant data (e.g., all hotels on the planet)
C. It is a made-up expression
D. It is when data is aggregated (default OLAP dimensions)
5. How many Megabytes (MB) are there in a Petabyte (PB)?
A. 100,000
B. 1,000,000,000,000
C. 1,000,000,000
D. 1,000,000,000,000,000
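The unit conversion asked for in question 5 can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) prefixes, where 1 MB is 10^6 bytes and 1 PB is 10^15 bytes; with binary prefixes (MiB, PiB) the ratio would be 2^30 instead. The constant and variable names are illustrative only.

```python
# Assumption: decimal (SI) prefixes, as commonly used for storage sizes.
BYTES_PER_MB = 10**6
BYTES_PER_PB = 10**15

# Number of megabytes that fit in one petabyte.
mb_per_pb = BYTES_PER_PB // BYTES_PER_MB
print(f"{mb_per_pb:,} MB per PB")
```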
6. Why has querying data in parallel become increasingly necessary?
A. Because latency decreased faster than throughput increased
B. Because storage capacity increased faster than latency decreased
C. Because storage capacity increased faster than throughput
D. Because throughput increased faster than latency decreased
7. Why has batch processing become increasingly necessary?
A. Because throughput increased faster than latency decreased
B. Because storage capacity increased faster than latency decreased
C. Because storage capacity increased faster than throughput
D. Because latency decreased faster than throughput increased
8. Three of these concepts play similar roles in a database, but not the fourth one. Spot the odd one out.
A. Field
B. Document
C. Record
D. Row
9. Which one of these concepts is not commonly used as a synonym for "field"?
A. Primary key
B. Attribute
C. Property
D. Column
10. What does not define a relation over sets (underlying relational databases)?
A. A black box that returns a Boolean for any tuple built with an element from each one of these sets.
B. A subset of the Cartesian product of the sets
C. The intersection of the sets
D. A symmetric generalization of the concept of function, in which elements in the first set can be related to several elements in other sets.
11. Which one of these relational algebra operations is considered to be the most expensive?
A. Select
B. Count (without grouping)
C. Join
D. Project
Consider the following tables.
Table 1
Student ID  Name      Firstname  Lecture ID  Lecture name
1           Einstein  Albert     A           Introduction to relativity
2           Turing    Alan       A           Introduction to relativity
1           Einstein  Albert     B           Completeness
2           Turing    Alan       C           Databases
Table 2
ID  Name      Firstname  Lectures
1   Einstein  Albert     ID: A   Name: Introduction to relativity
                         ID: B   Name: Completeness
2   Turing    Alan       ID: A   Name: Introduction to relativity
                         ID: C   Name: Databases
12. Which one of these tables is not in any normal form?
A. Table 1 only
B. Table 2 only
C. Table 1 and table 2
D. None
13. Which one of these tables is in first normal form?
A. Table 1 only
B. Table 2 only
C. Table 1 and table 2
D. None
14. Which one of these tables is in second normal form?
A. Table 1 only
B. Table 2 only
C. Table 1 and table 2
D. None
15. What is the most compelling reason for denormalizing data?
A. So that the schema takes less space
B. So that queries are faster
C. So that the data takes less space
D. So that queries are shorter
16. Which one of these queries performs a selection?
A. SELECT first, last FROM people
B. SELECT * WHERE zipcode = 8092 FROM people
C. SELECT last, COUNT(first) FROM people GROUP BY last
D. All of the three
17. Which part of a query language deals with schemas?
A. Data Definition Language (DDL)
B. CRUD
C. Data Manipulation Language (DML)
D. FLWOR
18. In the ACID properties and the CAP theorem, C for consistency means the same kind of consistency in both ACID and CAP. True or false?
A. It depends on the size of the database
B. True
C. It depends on the underlying paradigm
D. False
19. What does P mean in CAP?
A. Partition tolerance
B. Partition availability
C. Partition consistency
D. Partition failure
20. What is the common name for databases with a write-intensive workload, regardless of the paradigm?
A. OLAP
B. ROLTP
C. OLTP
D. ROLAP
21. Which one of the following is not a storage technology?
A. S3
B. MapReduce
C. A local file system
D. HDFS
22. An encoding is a function that...
A. Maps characters to characters
B. Maps characters to bits
C. Maps bits to bits
D. Maps a data model to characters
23. Which one of the following is not a syntax (in the sense of data syntax)?
A. JSON
B. XML
C. ASCII
D. XBRL
24. What is, among these numbers, the maximum number of files that typically fits on a file system (local or HDFS)?
A. 10,000
B. 1,000,000
C. 100,000
D. 10,000,000,000
25. What is the basic principle behind the use of hardware for large amounts of data?
A. Buy regular servers, but with very large and more expensive hard disks
B. Buy servers with very high-end CPU and regular hard disks
C. Buy servers with hundreds of Terabytes of RAM and use it instead of a disk to store files
D. Buy lots of cheap, commodity hardware
26. Which is more sustainable for managing nodes in a cluster as larger amounts of data are involved?
A. Neither scaling out nor scaling up helps
B. Scaling out
C. Scaling up and scaling out are identical
D. Scaling up
27. "You can have a second computer once you've shown you know how to use the first one." This quote by Paul Barham is very wise, but what does it mean for a data scientist?
A. Cloud providers require passing an exam before they let a user scale up Hadoop to several nodes in a cluster.
B. This is irrelevant to Big Data.
C. All nodes in a data cluster are the same.
D. You should first try to make all data fit on a single machine by writing better code, before moving to a cluster with multiple nodes.
28. Which one of these statistics is closest to what is commonly done in data centers in a single cluster?
A. Up to 1,000,000 machines with each 100 cores and 10 TB of RAM
B. Up to 10,000 machines with each 32 cores and 100 GB of RAM
C. Up to 1000 machines with each 1 core and 100 MB of RAM
D. Up to 10 machines with each 10000 cores and 10 PB of RAM
29. What are rack units?
A. This is a standardized unit of measurement for the height of nodes in a rack.
B. This is a unit of CPU performance that is tuned for Big Data applications.
C. These are server nodes.
D. This is a synonym for racks.
30. Which one of these is not hardware commonly found in racks used for HDFS?
A. RAID controllers
B. Storage nodes
C. Network switch nodes
D. Server nodes
31. How does one identify an object in S3?
A. With a file system path.
B. With a bucket name and an object name.
C. With a data center address and a geographic region.
D. With a UUID.
32. Amazon documents the availability of S3 as 99.99%. How much time in a year can they afford to be down?
A. 5.2 minutes
B. 52 minutes
C. 52 hours
D. 52 seconds
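The downtime budget implied by an availability figure, as in question 32, is simple arithmetic: the unavailable fraction multiplied by the length of a year. The sketch below assumes a non-leap year of 365 days; the function name `allowed_downtime_minutes` is hypothetical and chosen only for illustration.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget, in minutes, for an availability ratio in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


downtime = allowed_downtime_minutes(0.9999)
print(f"{downtime:.2f} minutes per year")
```

Each additional "nine" of availability divides the budget by ten, which is why availability is usually quoted as a count of nines.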
33. Which one of these is not an HTTP method used in REST APIs?
A. POST
B. REMOVE
C. GET
D. PUT
34. Why is it common practice to replicate data across different regions of the globe?
A. To maximize latency.
B. To make sure no node failures happen at all.
C. To be resilient to node failures.
D. To be resilient in case of a natural catastrophe.
35. Which has the lowest latency?
A. A local MongoDB point query
B. Getting an object from Amazon S3 over the Atlantic
C. They both have the same order of magnitude of latency
D. Latency is irrelevant to retrieving data, it only matters for hosting websites.
36. Which of these settings can HDFS not handle well?
A. Billions of files
B. A few terabyte-sized files
C. A few petabyte-sized files
D. Millions of files
37. Which one of these reasons is most compelling to demonstrate the importance of managing node failures when querying large datasets?
A. Because a node never fails alone: the whole rack then fails
B. Because we can
C. Because in large-size clusters, it is almost guaranteed that some nodes will fail from time to time.
D. Because failure recovery comes for free using RAID storage in clusters
38. Which one of these features does HDFS not support efficiently?
A. Replication
B. Random access to a part of a file
C. Splitting a file across nodes
D. Downloading a medium-sized file to a local folder
39. Which was invented first: HDFS (Hadoop) or GFS (Google)?
A. GFS
B. Both were invented at the same time
C. HDFS
D. They are the same technology, so this question does not make sense
40. What is a typical block size for HDFS?
A. 500 kB
B. 4 kB
C. 1 TB
D. 128 MB
41. Which architecture does an HDFS cluster use to connect the nodes on a logical level?
A. A randomly generated topology
B. Master-slave
C. Peer-to-peer
D. Snow-flake
42. On which node(s) is the file namespace stored in HDFS?
A. On the NameNode
B. On a separate ZooKeeper service
C. On all DataNodes
D. On the client
43. How does a client fetch the contents (bits) of an HDFS file, in general?
A. Via the NameNode
B. Via a dedicated streaming service that hides the complexity of the architecture
C. Directly to and from DataNodes
D. With direct access to the nodes' local file system
44. How does the DataNode protocol work?
A. The NameNode always initiates the connection, and the DataNodes only answer
B. The DataNode always initiates the connection, and the NameNode only answers
C. The client initiates the connection to the DataNode
D. Both the NameNode and the DataNode may initiate the connection
45. What is the default and commonly used placement strategy for replicating blocks in HDFS? The first replica is on the same node as the client if possible, the second replica is on a node on a different rack, and the third replica is...
A. In a different data center.
B. On the same node as the second replica.
C. On a different node on the same rack as the second replica.
D. On a different node on the same rack as the first replica.
46. In HDFS, why is the second replica, by default, on a node on a different rack?
A. Because it was designed arbitrarily that way and nobody changed it.
B. Because it reduces latency.
C. Because it avoids a concentration of blocks on the same rack.
D. Because it is resilient against a natural catastrophe.
47. Which one of these is not persisted on NameNodes?
A. The file namespace
B. The mapping from files to blocks
C. The mapping from blocks to DataNodes
D. They all are persisted
48. What is the most compelling reason why, in HDFS, there are backup NameNodes that can take over in case of a failure?
A. Because it provides added security.
B. Because it makes HDFS more resilient against data loss.
C. Because it would be too much to handle for a single node.
D. Because it can take as much as 30 minutes to restart a failed NameNode.
49. Column stores have a data model based on...
A. Graphs
B. Cubes
C. Trees
D. Tables
50. Which one of these technologies is not a column store?
A. neo4j
B. Google BigTable
C. Cassandra
D. Apache HBase
51. How do wide column stores like HBase avoid joins?
A. By de-normalizing tables and storing together what is accessed together
B. They do not, they natively and efficiently support joins
C. With index-free adjacency
D. By using tree structures to embed pre-computed joins
52. What do relational databases and a wide column store like HBase not have in common?
A. Rows
B. Projection and selection
C. Column families
D. Columns
53. What are regions in HBase?
A. A synonym for column families, which are spread across machines
B. A synonym for column families, which are spread across geographic locations over several continents
C. A group of contiguous rows, identified with an inclusive minimum and an exclusive maximum
D. A group of contiguous rows, identified with an exclusive minimum and an inclusive maximum
54. When HBase is used with HDFS, RegionServers are often co-located with...
A. NameNodes
B. DataNodes
C. Secondary NameNodes
D. Backup NameNodes
55. In HBase, what is the granularity of the physical storage layer (HFile) with respect to the data model?
A. An HFile stores a column family within a region
B. An HFile stores one row
C. An HFile stores a whole region
D. An HFile stores a whole column family
56. HBase can run on HDFS, which has high latency. How come HBase still has low latency, allowing real-time queries?
A. Low latency applies to files, but not to tables because of the data independence paradigm
B. Queries are pre-compiled and optimized
C. The RegionServers store data redundantly and locally on RAID
D. The data is held in memory on the RegionServers (MemStore, cache)
57. Imagine an HBase table with a history of read and write queries. Now an unanticipated read query comes in. Where does HBase need to retrieve matching data (KeyValues) from?
A. Both the memory and the file system
B. The CPU cache only
C. The file system only
D. The memory only
58. What is a Bloom filter?
A. It is a data structure that can tell with certainty if an element is or is not in a set
B. It is a data structure that can tell with a risk of error if an element is or is not in a set
C. It is a data structure that can tell with certainty that an element is in a set, but with a risk of error that an element is not in a set.
D. It is a data structure that can tell with a risk of error that an element is in a set, and with certainty that an element is not in a set.
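To make the behaviour of a Bloom filter concrete, here is a minimal sketch in Python. It is not how HBase implements its filters: the class name `BloomFilter`, the bit-array size, the number of hash functions and the use of salted SHA-256 are all assumptions picked for readability. A lookup sets and tests a few bit positions per item, so items that were added are always reported, while items that were never added are usually, but not always, rejected.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter over strings (hypothetical parameters)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "possibly added".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for key in ["row-1", "row-2", "row-3"]:
    bf.add(key)

# Added keys are never reported as absent.
print(bf.might_contain("row-2"))
```

A larger bit array or a better-tuned number of hash functions lowers the false-positive rate at the cost of memory or lookup time.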
59. In MapReduce, the shuffling phase takes place...
A. Before and right after mapping
B. Before mapping
C. After mapping and before reducing
D. After reducing
60. From a data model perspective and in general, MapReduce manipulates, as input and output:
A. Bags of key-value pairs
B. Bags of graph edges
C. Bags of individual values
D. Bags of trees
61. In MapReduce, why is it necessary to sort and partition the output of the mappers before it can be processed by the reducers?
A. Because it makes it faster
B. Because all values with the same key must be processed by the same reducer
C. Because the data is not replicated
D. Because the data is not sharded into splits until after the mapping phase is completed
62. Which one of these input formats cannot be processed by MapReduce at all?
A. Unstructured lines of text
B. Tables
C. Key-value pairs
D. None of them: all three formats are in fact supported by MapReduce
63. Which one of these criteria is not required for the reduce function to be also used as a combiner function in MapReduce?
A. The function must map key-value pairs to key-value pairs
B. The function must be commutative
C. The function must be associative
D. The function must be idempotent
64. Can it be that the boundaries of a MapReduce split do not match the boundaries of the underlying HDFS blocks?
A. Yes, and this is solved at the software (Java) level by starting reading from the next block, even if it is stored on a different node.
B. Yes, but this is rare and in this case MapReduce cannot run efficiently and the data may need to be split again.
C. No, this is a requirement
D. No, it cannot happen because of the way splits are constructed from blocks.
65. What happens in MapReduce if the produced pairs exceed some percentage of the buffer in memory?
A. The pairs are transferred to the nearest mapping node
B. The pairs are spilled to the local disk
C. The pairs are forcibly sent to the reducer
D. The job fails and must be restarted with smaller splits.
66. What issue of MapReduce 1 does the new YARN-based MapReduce 2 not solve?
A. The utilization of task slots had to be statically determined in advance.
B. A job had to have a map-shuffle-reduce pattern, so that some queries needed several jobs.
C. The JobTracker was a bottleneck.
D. The JobTracker had too many, diverse responsibilities on its plate.
67. In YARN, what is the name of the container that is assigned the responsibility to handle all other containers for an application?
A. ApplicationMaster
B. ResourceManager
C. NodeManager
D. Scheduler
68. Which of the following is not a standard scheduling strategy for YARN?
A. Capacity Scheduler
B. Fair Scheduler
C. FIFO Scheduler
D. Latency Scheduler
69. What algorithm involves applying this formula until it converges?
$$R_{i+1} = \left(0.85\,M + 0.15\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}\right) R_i$$
A. PageRank
B. A join
C. Capacity Scheduling
D. Table denormalization
70. What triggers the evaluation of RDDs in Spark?
A. A transformation
B. A creation
C. An action
D. All of the three
71. What does the following Spark query output?
    val rdd1 = sc.parallelize(
      List("1,2,3,4,5,6", "3,4,5,6,7,8", "6,7,8,9,10")
    )
    val rdd2 = rdd1.flatMap( value => value.split(",") )
    rdd2.countByValue()
A. (1,1), (1,2), (2,3), (2,4), (2,5), (3,6), (2,7), (1,8), (1,9), (1,10)
B. 1, 1, 2, 2, 2, 3, 2, 1, 1, 1
C. (1,1), (2,1), (3,2), (4,2), (5,2), (6,3), (7,2), (8,1), (9,1), (10,1)
D. 1, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 8, 6, 7, 8, 9, 10
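For readers without a Spark installation, the two steps of the query in question 71 can be emulated on a single machine. The sketch below is plain Python, not Spark: `chain.from_iterable` stands in for `flatMap` and `collections.Counter` stands in for `countByValue`, which in Spark is an action that returns a map from each distinct value to its count. The variable names are illustrative.

```python
from collections import Counter
from itertools import chain

rows = ["1,2,3,4,5,6", "3,4,5,6,7,8", "6,7,8,9,10"]

# flatMap: split every row, then flatten the pieces into one sequence.
values = list(chain.from_iterable(row.split(",") for row in rows))

# countByValue: map each distinct value to its number of occurrences.
counts = Counter(values)

# Print (value, count) pairs ordered by the numeric value of the key.
for value, count in sorted(counts.items(), key=lambda kv: int(kv[0])):
    print((value, count))
```

Note that the values stay strings throughout, since `split` does not convert them to numbers.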
72. Which of the following is not a transformation?
A. flatMap
B. intersection
C. filter
D. count
73. Which Spark transformation is most suitable to emulate a relational projection?
A. Reduce
B. Map
C. Sample
D. Filter
74. What is a stage in Spark?
A. It's the physical implementation of an RDD
B. It's a group of tasks with the same kind (all creations, all transformations or all actions)
C. It's a synonym for transformation
D. It's a group of transformations that does not need shuffling, so that the same group of machines can be reused to save network bandwidth
75. A two-way join involves a lot of shuffling from both its inputs. How can this most likely be optimized in general, for a single job execution?
A. By adopting the partition criterion in another shuffle happening earlier in one of the inputs.
B. It cannot, joins are expensive and require shuffling on both sides anyway.
C. By using stages.
D. By persisting both input RDDs.
    /* Under an Apache 2.0 License from Spark */
    val lines = spark.read.textFile(args(0)).rdd
    val links = lines.map{ s =>
      val parts = s.split("\\s+")
      (parts(0), parts(1))
    }.distinct().groupByKey().cache()
    var ranks = links.mapValues(v => 1.0)
    val contribs = links.join(ranks).values.flatMap{ case (urls, rank) =>
      val size = urls.size
      urls.map(url => (url, rank / size))
    }
    ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
    val output = ranks.collect()
    output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))
76. Which one of the following graphs depicts the DAG of RDDs for the above PageRank job, including their most accurate and likely grouping in stages?
[Answer options A to D are DAG diagrams that are not reproduced in this text.]
77. Which is the most correct association between these data shapes and syntaxes?
A. Tables: CSV, Trees: RDF, Graphs: XML, Cubes: XBRL
B. Tables: XBRL, Trees: RDF, Graphs: XML, Cubes: CSV
C. Tables: CSV, Trees: XML, Graphs: RDF, Cubes: XBRL
D. Tables: XBRL, Trees: XML, Graphs: RDF, Cubes: CSV
78. Are semi-structured documents in any normal form?
A. Yes, in the first normal form only.
B. Yes, in the third normal form.
C. No.
D. Yes, in the second normal form but not in the third.
79. What is the most adequate adjective to describe a document that successfully conforms to the XML specification?
A. Well-formed
B. Correct
C. Valid
D. Any of the above
80. Which one of these documents is well-formed?
A. <Exam xmlns="http://www.ethz.ch/BigData"> <Questions> <Question id="xml"> Which one of these documents in well-formed (including namespace-well-formedness)? <Answers> <1stAnswer>The answer A</1stAnswer> <2ndAnswer>The answer B</2ndAnswer> <3ndAnswer>The answer C</3rdAnswer> <4thAnswer>The answer D</4stAnswer> </Answers> </Question> </Questions> </Exam>
B. <Exam xmlns="http://www.ethz.ch/BigData"> <Questions> <Question id="xml"> <Title>Which one of these documents in well-formed (including namespace-well-formedness)?<Title> <Answers> <1stAnswer>The answer A</1stAnswer> <2ndAnswer>The answer B</2ndAnswer> <3ndAnswer>The answer C</3rdAnswer> <4thAnswer>The answer D</4stAnswer> </Answers> </Question> </Questions> </Exam>
C. <Exam xmlns="http://www.ethz.ch/BigData"> <Questions> <Question id="xml"> <Title>Which one of these documents in well-formed (including namespace-well-formedness)?<Title> <Answers> <Answer>The answer A</Answer> <Answer>The answer B</Answer> <Answer>The answer C</Answer> <Answer>The answer D</Answer> </Answers> </Question> </Questions> </Exam>
D. <Exam xmlns="http://www.ethz.ch/BigData"> <Questions> <Question id="xml"> Which one of these documents in well-formed (including namespace-well-formedness)? <Answers> <Answer>The answer A</Answer> <Answer>The answer B</Answer> <Answer>The answer C</Answer> <Answer>The answer D</Answer> </Answers> </Question> </Questions> </Exam>
81. Which one of these statements is not true?
A. An XML document must contain exactly one top-level element.
B. A top-level XML element must contain an attribute called "doctype".
C. An XML document may contain a DOCTYPE even if there is no internal DTD subset.
D. An XML document may contain any number of comments at the top-level.
82. Which is the most adequate data structure to describe the attributes of an XML element?
A. A list of pairs
B. A multiset
C. A set
D. An associative array
83. How must an ampersand (&) be escaped in element content, in XML?
A. &amp;
B. &ampersand;
C. %amp;
D. %ampersand;
84. Which one of these types is not supported natively by JSON?
A. String
B. Numbers
C. QNames
D. Arrays
85. Which of the following is not a data model for XML documents?
A. XPath and XQuery Data Model (XDM)
B. XML Information Set (Infoset)
C. Post-Schema-Validation Infoset (PSVI)
D. All three are XML data models
86. Which of these statements holds for the XPath and XQuery Data Model?
A. Sequences can be nested, a sequence of one item is not the same as that item
B. Sequences are always flat, a sequence of one item is identified with that item
C. Sequences can be nested, a sequence of one item is identified with that item
D. Sequences are always flat, a sequence of one item is not the same as that item
87. Which one of the following is not an atomic type in XML?
A. Element node
B. String
C. Date
D. Integer
88. Complete this table with type cardinalities in XQuery:
Cardinality  Symbol  Example
(i)          "*"     xs:integer*
(ii)         "+"     node()+
(iii)        "?"     xs:boolean?
(iv)         none    xs:string
A. (i) One or more (ii) Zero or more (iii) Zero or one (iv) One
B. (i) Zero or more (ii) One or more (iii) Zero or one (iv) One
C. (i) One or more (ii) Zero or more (iii) Zero (iv) Zero or one
D. (i) Zero or more (ii) One or more (iii) Zero (iv) Zero or one
89. Can an XML document be valid against a schema if it is not well-formed?
A. Yes, it can happen.
B. It depends on the schema.
C. No, validity and well-formedness is the one and same thing.
D. No, never, but a document can be well-formed and not valid.
90. Which one of the following documents is valid against the following schema?
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="foo">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="element" type="xs:date"/>
            <xs:element name="other-element" type="xs:date" minOccurs="0"/>
            <xs:element name="bar" type="xs:boolean" maxOccurs="2"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
A. <foo> <element>2017-02-09</element> <bar>true</bar> <bar>false</bar> </foo>
B. <foo> <element>today</element> <other-element>true</other-element> <other-element>string</other-element> <bar>false</bar> </foo>
C. <foo> <element>2017-02-09</element> <other-element>true</other-element> <other-element>string</other-element> <bar>false</bar> </foo>
D. <foo> <element>today</element> <bar>true</bar> <bar>false</bar> </foo>
Given is the following document stored as "books.xml".
    <!DOCTYPE books>
    <books>
      <book year="1998"> <title>Digital Fortress</title> <author>Dan Brown</author> </book>
      <book year="2001"> <title>Deception Point</title> <author>Dan Brown</author> </book>
      <book year="2000"> <title>Angels and Demons</title> <author>Dan Brown</author> </book>
      <book year="2003"> <title>The Da Vinci Code</title> <author>Dan Brown</author> </book>
      <book year="2009"> <title>The Lost Symbol</title> <author>Dan Brown</author> </book>
      <book year="2013"> <title>Inferno</title> <author>Dan Brown</author> </book>
      <book year="2017"> <title>Origin</title> <author>Dan Brown</author> </book>
      <book year="2999"> <title>Timeline</title> <author>Michael Crichton</author> </book>
      <book year="2002"> <title>Prey</title> <author>Michael Crichton</author> </book>
      <book year="2004"> <title>State of fear</title> <author>Michael Crichton</author> </book>
      <book year="2006"> <title>Next</title> <author>Michael Crichton</author> </book>
      <book year="2009"> <title>Pirate Latitudes</title> <author>Michael Crichton</author> </book>
    </books>
91. What does the following query output?
    (for $book in doc("books.xml")/books/book
     group by $year := $book/@year
     let $count := count($book)
     order by $count descending
     return $year)[1]
A. 2
B. <book year="2009"><title>Pirate Latitudes</title><author>Michael Crichton</author></book>
C. 2009
D. 2999
92. Which XPath query outputs the one book in "books.xml" with a mistaken publication date?
A. doc("books.xml")/books/book[xs:integer(@year) gt 2100]
B. doc("books.xml")/@year/data()[. gt 2100]
C. doc("books.xml")/books/book/@year/data()[. gt 2100]
D. doc("books.xml")[xs:integer(@year/data()) gt 2100]
93. Which lines of the following schema need to be corrected for it to successfully validate the instance "books.xml"?
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="books">
        <xs:complexType> (1)
          <xs:sequence>
            <xs:element name="book" maxOccurs="unbounded"> (2)
              <xs:complexType> (1)
                <xs:sequence>
                  <xs:element name="title" type="xs:string"/> (3)
                  <xs:element name="author" type="xs:string"/>
                </xs:sequence>
                <xs:attribute name="year" type="xs:date"/> (4)
              </xs:complexType> (1)
            </xs:element> (2)
          </xs:sequence>
        </xs:complexType> (1)
      </xs:element>
    </xs:schema>
A. The one marked with (4)
B. All those marked with (1)
C. All those marked with (2)
D. The one marked with (3)
94. Which one of those is not a built-in function in XQuery, i.e., which of those can be used as a keyword without parentheses?
A. sum
B. not
C. true
D. and
95. Which one of these statements does not hold for both value comparisons (eq, ne, etc.) and general comparisons (=, <, etc.)?
A. Both are transitive
B. Both take sequences of items as left-hand-side and right-hand-side inputs
C. Both can throw an error in case of mismatching types
D. Both can output true, false or throw an error.
96. Which of the following is not a character encoding?
A. ISO Latin 1
B. ASCII
C. XML
D. UTF-8
97. Which one of these mappings from tree-shaped data, such as XML, to relational tables, always requires prior knowledge about the data such as field names and structural layout?
A. Schema-based shredding
B. Tree encoding
C. Edge shredding
D. None of them requires prior knowledge
98. Assume a very large MongoDB collection named "books" that has a tree index
    { "Title": 1, "Year": -1}
Which one of these queries cannot be executed efficiently using this index?
A. db.books.find({"Title": "Inferno"})
B. db.books.find({"Year": 2016})
C. db.books.find({"Title": "Inferno", "Year": 2016, "Author": "Dan Brown"})
D. db.books.find({"Year": 2016, "Title": "Inferno"})
99. In MongoDB, does each replica set contain all of the data in a collection?
A. Never.
B. Always.
C. It depends on the schema of the collection.
D. Not if there is more than one replica set.
100. What best describes a write concern in MongoDB?
A. It is the protocol for writing data to MongoDB, which is a blocking call until at least a certain number of replicas have acknowledged that the data was written.
B. It is the requirement that any query writing to a collection must follow the schema of this collection.
C. It is the situation in which MongoDB starts being emotional about too large an amount of queries sent in too little time.
D. It is the risk that data gets lost in case a server goes down.
101. Which of these node labelling schemes for trees is efficient for inserting new nodes in the middle of existing siblings?
A. ORDPATH IDs
B. Integer IDs
C. Dewey IDs
D. They are all efficient for insertions.
102. Which is not a way of avoiding expensive joins?
A. Pre-computing a join in advance if it will be needed often
B. Index-free adjacency, with foreign key references stored as direct pointers in memory
C. Denormalizing the data
D. Storing the data in third normal form
103. Labeled property graphs, on a logical level, are made of nodes, properties, labels and...
A. nothing else
B. edges
C. tables
D. matrices
104. What is this syntax?
    @prefix buildings: <http://www.ethz.ch/buildings#> .
    @prefix lectures: <http://www.ethz.ch/lectures#> .
    @prefix exams: <http://www.ethz.ch/exams#> .
    @prefix properties: <http://www.ethz.ch/properties#> .
    exam:big-data properties:is-located buildings:HIL;
        properties:has-attendence 146.
    lectures:big-data properties:is-located buildings:HG, buildings:CAB.
A. Turtle
B. XML
C. JSON
D. XBRL
105. Which languages are most appropriate for querying graph databases?
A. XQuery, JSONiq
B. SQL
C. Cypher, SPARQL
D. Java
106. Which technologies do most likely not use a master-slave architecture?
A. A classic relational database of the 1990s supporting ROLAP
B. A distributed file system from the 2000s like HDFS
C. A document store from the 2010s
D. A graph database like neo4j
107. Which best describes how the graph structure is stored internally in neo4j?
A. With a lot of double-linked lists
B. With a lot of binary encodings
C. With a lot of tables
D. With a lot of matrices
108. Which of these properties least qualifies OLAP, as opposed to OLTP?
A. Redundancy
B. Fully, sub-second interactive
C. Lots of reads, few to no writes
D. Analysis over big chunks of data
109. What does ETL mean?
A. Extract, transform, load
B. Extract, transfer, load
C. Export, transform, load
D. Export, transfer, load
110. What are the primary actions on a data cube?
A. Navigate
B. Select and project
C. Join and group
D. Slice and dice
111. How are tree and DAG structures stored in the XBRL syntax?
A. Using JSON's native tree structure
B. Using XBRL's native tree structure
C. Using flat lists of nodes and edges (XLink)
D. There are no tree structures in XBRL, only data cubes