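Big Data Exam | ETH Zürich | February 9, 2017 | Answer sheet

Name: ____________________________
Legi-Number: ______________________

1. A □ B □ C □ D □   2. A □ B □ C □ D □   3. A □ B □ C □ D □   4. A □ B □ C □ D □   5. A □ B □ C □ D □   6. A □ B □ C □ D □   7. A □ B □ C □ D □   8. A □ B □ C □ D □   9. A □ B □ C □ D □   10. A □ B □ C □ D □
11. A □ B □ C □ D □   12. A □ B □ C □ D □   13. A □ B □ C □ D □   14. A □ B □ C □ D □   15. A □ B □ C □ D □   16. A □ B □ C □ D □   17. A □ B □ C □ D □   18. A □ B □ C □ D □   19. A □ B □ C □ D □   20. A □ B □ C □ D □
21. A □ B □ C □ D □   22. A □ B □ C □ D □   23. A □ B □ C □ D □   24. A □ B □ C □ D □   25. A □ B □ C □ D □   26. A □ B □ C □ D □   27. A □ B □ C □ D □   28. A □ B □ C □ D □   29. A □ B □ C □ D □   30. A □ B □ C □ D □
31. A □ B □ C □ D □   32. A □ B □ C □ D □   33. A □ B □ C □ D □   34. A □ B □ C □ D □   35. A □ B □ C □ D □   36. A □ B □ C □ D □   37. A □ B □ C □ D □   38. A □ B □ C □ D □   39. A □ B □ C □ D □   40. A □ B □ C □ D □
41. A □ B □ C □ D □   42. A □ B □ C □ D □   43. A □ B □ C □ D □   44. A □ B □ C □ D □   45. A □ B □ C □ D □   46. A □ B □ C □ D □   47. A □ B □ C □ D □   48. A □ B □ C □ D □   49. A □ B □ C □ D □   50. A □ B □ C □ D □
51. A □ B □ C □ D □   52. A □ B □ C □ D □   53. A □ B □ C □ D □   54. A □ B □ C □ D □   55. A □ B □ C □ D □   56. A □ B □ C □ D □   57. A □ B □ C □ D □   58. A □ B □ C □ D □   59. A □ B □ C □ D □   60. A □ B □ C □ D □
61. A □ B □ C □ D □   62. A □ B □ C □ D □   63. A □ B □ C □ D □   64. A □ B □ C □ D □   65. A □ B □ C □ D □   66. A □ B □ C □ D □   67. A □ B □ C □ D □   68. A □ B □ C □ D □   69. A □ B □ C □ D □   70. A □ B □ C □ D □
71. A □ B □ C □ D □   72. A □ B □ C □ D □   73. A □ B □ C □ D □   74. A □ B □ C □ D □   75. A □ B □ C □ D □   76. A □ B □ C □ D □   77. A □ B □ C □ D □   78. A □ B □ C □ D □   79. A □ B □ C □ D □   80. A □ B □ C □ D □
81. A □ B □ C □ D □   82. A □ B □ C □ D □   83. A □ B □ C □ D □   84. A □ B □ C □ D □   85. A □ B □ C □ D □   86. A □ B □ C □ D □   87. A □ B □ C □ D □   88. A □ B □ C □ D □   89. A □ B □ C □ D □   90. A □ B □ C □ D □
91. A □ B □ C □ D □   92. A □ B □ C □ D □   93. A □ B □ C □ D □   94. A □ B □ C □ D □   95. A □ B □ C □ D □   96. A □ B □ C □ D □   97. A □ B □ C □ D □   98. A □ B □ C □ D □   99. A □ B □ C □ D □   100. A □ B □ C □ D □
101. A □ B □ C □ D □   102. A □ B □ C □ D □   103. A □ B □ C □ D □   104. A □ B □ C □ D □   105. A □ B □ C □ D □   106. A □ B □ C □ D □   107. A □ B □ C □ D □   108. A □ B □ C □ D □   109. A □ B □ C □ D □   110. A □ B □ C □ D □
111. A □ B □ C □ D □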



Big Data Final Exam

9. February 2017

Department of Computer Science ETH Zürich

Rules (please read carefully)

• You have 90 minutes for the exam.

• Please write your name and Legi number on the answer sheet now, but do not start reading or answering any questions until the proctors tell you to start.

• Any answers or marks on pages other than the answer sheet will be completely ignored, even if correct.

• This is a multiple-choice examination. Each question comes with four proposed answers (A, B, C and D), among which exactly one is correct. Picking the correct answer for a question will give you one point.

• The exam consists of 111 questions. The maximum number of points that can be achieved is 111.

• There is no penalty for incorrect answers. We thus encourage you to answer every question.

• Please write your answers on the answer sheet by completely filling the relevant square (A, B, C or D) for each question. Do not use any other way to select an answer (ticking, crossing, etc.). For example, picking A:
  0. A ■ B □ C □ D □

• If you pick several answers for a question, no point will be granted for this question. For example:
  0. A ■ B □ C ■ D □

• If you picked an answer and then would like to change it, make it very clear with an additional circle around the newly chosen answer. Any ambiguity will be handled with no point granted. For example, for changing from A to C (with a circle drawn around C):
  0. A ■ B □ C ■ D □

• Only use a black or a blue pen. DO NOT use a pencil. DO NOT use red or green pens.

1. When was the relational algebra invented?
A. In the early 1970s
B. In the early 1990s
C. In the early 2000s
D. In the early 1980s

2. Which one of these database paradigms is not considered to be NoSQL?
A. Triple stores
B. Wide column stores
C. Relational databases
D. Document stores

3. Which one of these is not one of the three Vs of Big Data?
A. Volume
B. Variability
C. Velocity
D. Variety

4. What is data totality?
A. It is when data is contained on a single machine
B. It is when a data set contains all relevant data (e.g., all hotels on the planet)
C. It is a made-up expression
D. It is when data is aggregated (default OLAP dimensions)

5. How many Megabytes (MB) are there in a Petabyte (PB)?
A. 100,000
B. 1,000,000,000,000
C. 1,000,000,000
D. 1,000,000,000,000,000
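The unit conversion in question 5 is a ratio of powers of ten. The following minimal Python check assumes decimal SI units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes), not the binary MiB/PiB units:

```python
# Decimal (SI) units assumed: 1 MB = 10**6 bytes, 1 PB = 10**15 bytes.
BYTES_PER_MB = 10**6
BYTES_PER_PB = 10**15

mb_per_pb = BYTES_PER_PB // BYTES_PER_MB
print(f"{mb_per_pb:,} MB in one PB")  # 1,000,000,000 MB in one PB
```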

6. Why has querying data in parallel become increasingly necessary?
A. Because latency decreased faster than throughput increased
B. Because storage capacity increased faster than latency decreased
C. Because storage capacity increased faster than throughput
D. Because throughput increased faster than latency decreased

7. Why has batch processing become increasingly necessary?
A. Because throughput increased faster than latency decreased
B. Because storage capacity increased faster than latency decreased
C. Because storage capacity increased faster than throughput
D. Because latency decreased faster than throughput increased

8. Three of these concepts play similar roles in a database, but not the fourth one. Spot the odd one out.
A. Field
B. Document
C. Record
D. Row

9. Which one of these concepts is not commonly used as a synonym for "field"?
A. Primary key
B. Attribute
C. Property
D. Column

10. What does not define a relation over sets (underlying relational databases)?
A. A black box that returns a Boolean for any tuple built with an element from each one of these sets.
B. A subset of the Cartesian product of the sets
C. The intersection of the sets
D. A symmetric generalization of the concept of function, in which elements in the first set can be related to several elements in other sets.

11. Which one of these relational algebra operations is considered to be the most expensive?
A. Select
B. Count (without grouping)
C. Join
D. Project

Consider the following tables.

Table 1

StudentID | Name     | Firstname | LectureID | Lecturename
1         | Einstein | Albert    | A         | Introduction to relativity
2         | Turing   | Alan      | A         | Introduction to relativity
1         | Einstein | Albert    | B         | Completeness
2         | Turing   | Alan      | C         | Databases

Table 2

ID | Name     | Firstname | Lectures
1  | Einstein | Albert    | ID: A, Name: Introduction to relativity
   |          |           | ID: B, Name: Completeness
2  | Turing   | Alan      | ID: A, Name: Introduction to relativity
   |          |           | ID: C, Name: Databases

12. Which one of these tables is not in any normal form?
A. Table 1 only
B. Table 2 only
C. Table 1 and table 2
D. None

13. Which one of these tables is in first normal form?
A. Table 1 only
B. Table 2 only
C. Table 1 and table 2
D. None

14. Which one of these tables is in second normal form?
A. Table 1 only
B. Table 2 only
C. Table 1 and table 2
D. None
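Table 2 nests a collection of lectures inside a single field, which first normal form forbids. Unnesting that field (one output row per student and lecture) yields atomic values, and for this data it reproduces the rows of Table 1 up to row order. The sketch below represents the tables as Python lists of dicts and tuples; that representation and the `unnest` helper are illustrative assumptions.

```python
# Table 2 from the exam, with the nested "Lectures" field kept as a list of dicts.
table2 = [
    {"ID": 1, "Name": "Einstein", "Firstname": "Albert",
     "Lectures": [{"ID": "A", "Name": "Introduction to relativity"},
                  {"ID": "B", "Name": "Completeness"}]},
    {"ID": 2, "Name": "Turing", "Firstname": "Alan",
     "Lectures": [{"ID": "A", "Name": "Introduction to relativity"},
                  {"ID": "C", "Name": "Databases"}]},
]

def unnest(rows):
    """Flatten the nested Lectures field: one flat tuple per (student, lecture)."""
    flat = []
    for row in rows:
        for lecture in row["Lectures"]:
            flat.append((row["ID"], row["Name"], row["Firstname"],
                         lecture["ID"], lecture["Name"]))
    return flat

for flat_row in unnest(table2):
    print(flat_row)
```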

15. What is the most compelling reason for denormalizing data?
A. So that the schema takes less space
B. So that queries are faster
C. So that the data takes less space
D. So that queries are shorter

16. Which one of these queries performs a selection?
A. SELECT first, last FROM people
B. SELECT * FROM people WHERE zipcode = 8092
C. SELECT last, COUNT(first) FROM people GROUP BY last
D. All of the three

17. Which part of a query language deals with schemas?
A. Data Definition Language (DDL)
B. CRUD
C. Data Manipulation Language (DML)
D. FLWOR

18. In the ACID properties and the CAP theorem, C for consistency means the same kind of consistency in both ACID and CAP. True or false?
A. It depends on the size of the database
B. True
C. It depends on the underlying paradigm
D. False

19. What does P mean in CAP?
A. Partition tolerance
B. Partition availability
C. Partition consistency
D. Partition failure

20. What is the common name for databases with a write-intensive workload, regardless of the paradigm?
A. OLAP
B. ROLTP
C. OLTP
D. ROLAP

21. Which one of the following is not a storage technology?
A. S3
B. MapReduce
C. A local file system
D. HDFS

22. An encoding is a function that...
A. Maps characters to characters
B. Maps characters to bits
C. Maps bits to bits
D. Maps a data model to characters
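Question 22 treats an encoding as a mapping from characters to bits. Python's built-in `str.encode` makes this concrete: the same character maps to different byte sequences under UTF-8 and ISO Latin 1, and ASCII has no code for it at all. The sample string "Zürich" is chosen only for illustration.

```python
text = "Zürich"

utf8 = text.encode("utf-8")        # 'ü' becomes two bytes: 0xC3 0xBC
latin1 = text.encode("latin-1")    # 'ü' becomes one byte: 0xFC
print(utf8)                        # b'Z\xc3\xbcrich'
print(latin1)                      # b'Z\xfcrich'

# The bits behind 'ü' in UTF-8.
print(" ".join(f"{byte:08b}" for byte in utf8[1:3]))  # 11000011 10111100

# ASCII has no code for 'ü', so encoding fails.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable in ASCII:", err.object[err.start:err.end])
```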

23. Which one of the following is not a syntax (in the sense of data syntax)?
A. JSON
B. XML
C. ASCII
D. XBRL

24. What is, among these numbers, the maximum number of files that typically fit on a file system (local or HDFS)?
A. 10,000
B. 1,000,000
C. 100,000
D. 10,000,000,000

25. What is the basic principle behind the use of hardware for large amounts of data?
A. Buy regular servers, but with very large and more expensive hard disks
B. Buy servers with very high-end CPU and regular hard disks
C. Buy servers with hundreds of Terabytes of RAM and use it instead of a disk to store files
D. Buy lots of cheap, commodity hardware

26. Which is more sustainable for managing nodes in a cluster as larger amounts of data are involved?
A. Neither scaling out nor scaling up helps
B. Scaling out
C. Scaling up and scaling out are identical
D. Scaling up

27. "You can have a second computer once you’ve shown you know how to use the first one." This quote by Paul Barham is very wise, but what does it mean for a data scientist?
A. Cloud providers require passing an exam before they let a user scale up Hadoop to several nodes in a cluster.
B. This is irrelevant to Big Data.
C. All nodes in a data cluster are the same.
D. You should first try to make all data fit on a single machine by writing better code, before moving to a cluster with multiple nodes.

28. Which one of these statistics is closest to what is commonly done in data centers in a single cluster?
A. Up to 1,000,000 machines, each with 100 cores and 10 TB of RAM
B. Up to 10,000 machines, each with 32 cores and 100 GB of RAM
C. Up to 1000 machines, each with 1 core and 100 MB of RAM
D. Up to 10 machines, each with 10000 cores and 10 PB of RAM

29. What are rack units?
A. This is a standardized unit of measurement for the height of nodes in a rack.
B. This is a unit of CPU performance that is tuned for Big Data applications.
C. These are server nodes.
D. This is a synonym for racks.

30. Which one of these is not hardware commonly found in racks used for HDFS?
A. RAID controllers
B. Storage nodes
C. Network switch nodes
D. Server nodes

31. How does one identify an object in S3?
A. With a file system path.
B. With a bucket name and an object name.
C. With a data center address and a geographic region.
D. With a UUID.

32. Amazon documents the availability of S3 as 99.99%. How much time in a year can they afford to be down?
A. 5.2 minutes
B. 52 minutes
C. 52 hours
D. 52 seconds
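The downtime budget in question 32 is the unavailable fraction multiplied by the length of a year. This Python sketch assumes a 365-day year and treats the 99.99% figure as applying uniformly over that year:

```python
# Downtime budget implied by an availability figure (365-day year assumed).
availability = 0.9999
minutes_per_year = 365 * 24 * 60          # 525,600 minutes

downtime_minutes = (1 - availability) * minutes_per_year
print(f"{downtime_minutes:.1f} minutes per year")  # 52.6 minutes per year
```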

33. Which one of these is not an HTTP method used in REST APIs?
A. POST
B. REMOVE
C. GET
D. PUT

34. Why is it common practice to replicate data across different regions of the globe?
A. To maximize latency.
B. To make sure no node failures happen at all.
C. To be resilient to node failures.
D. To be resilient in case of a natural catastrophe.

35. Which has the lowest latency?
A. A local MongoDB point query
B. Getting an object from Amazon S3 over the Atlantic
C. They both have the same order of magnitude of latency
D. Latency is irrelevant to retrieving data, it only matters for hosting websites.

36. Which of these settings can HDFS not handle well?
A. Billions of files
B. A few terabyte-sized files
C. A few petabyte-sized files
D. Millions of files

37. Which one of these reasons is most compelling to demonstrate the importance of managing node failures when querying large datasets?
A. Because a node never fails alone: the whole rack then fails
B. Because we can
C. Because in large-size clusters, it is almost guaranteed that some nodes will fail from time to time.
D. Because failure recovery comes for free using RAID storage in clusters

38. Which one of these features does HDFS not support efficiently?
A. Replication
B. Random access to a part of a file
C. Splitting a file across nodes
D. Downloading a medium-sized file to a local folder

39. Which was invented first: HDFS (Hadoop) or GFS (Google)?
A. GFS
B. Both were invented at the same time
C. HDFS
D. They are the same technology, so this question does not make sense

40. What is a typical block size for HDFS?
A. 500 kB
B. 4 kB
C. 1 TB
D. 128 MB

41. Which architecture does an HDFS cluster use to connect the nodes on a logical level?
A. A randomly generated topology
B. Master-slave
C. Peer-to-peer
D. Snow-flake

42. On which node(s) is the file namespace stored in HDFS?
A. On the NameNode
B. On a separate ZooKeeper service
C. On all DataNodes
D. On the client

43. How does a client fetch the contents (bits) of an HDFS file, in general?
A. Via the NameNode
B. Via a dedicated streaming service that hides the complexity of the architecture
C. Directly to and from DataNodes
D. With direct access to the nodes' local file system

44. How does the DataNode protocol work?
A. The NameNode always initiates the connection, and the DataNodes only answer
B. The DataNode always initiates the connection, and the NameNode only answers
C. The client initiates the connection to the DataNode
D. Both the NameNode and the DataNode may initiate the connection

45. What is the default and commonly used placement strategy for replicating blocks in HDFS? The first replica is on the same node as the client if possible, the second replica is on a node on a different rack, and the third replica is...
A. In a different data center.
B. On the same node as the second replica.
C. On a different node on the same rack as the second replica.
D. On a different node on the same rack as the first replica.

46. In HDFS, why is the second replica, by default, on a node on a different rack?
A. Because it was designed arbitrarily that way and nobody changed it.
B. Because it reduces latency.
C. Because it avoids a concentration of blocks on the same rack.
D. Because it is resilient against a natural catastrophe.

47. Which one of these is not persisted on NameNodes?
A. The file namespace
B. The mapping from files to blocks
C. The mapping from blocks to DataNodes
D. They all are persisted

48. What is the most compelling reason why, in HDFS, there are backup NameNodes that can take over in case of a failure?
A. Because it provides added security.
B. Because it makes HDFS more resilient against data loss.
C. Because it would be too much to handle for a single node.
D. Because it can take as much as 30 minutes to restart a failed NameNode.

49. Column stores have a data model based on...
A. Graphs
B. Cubes
C. Trees
D. Tables

50. Which one of these technologies is not a column store?
A. neo4j
B. Google BigTable
C. Cassandra
D. Apache HBase

51. How do wide column stores like HBase avoid joins?
A. By de-normalizing tables and storing together what is accessed together
B. They do not, they natively and efficiently support joins
C. With index-free adjacency
D. By using tree structures to embed pre-computed joins

52. What do relational databases and a wide column store like HBase not have in common?
A. Rows
B. Projection and selection
C. Column families
D. Columns

53. What are regions in HBase?
A. A synonym for column families, which are spread across machines
B. A synonym for column families, which are spread across geographic locations over several continents
C. A group of contiguous rows, identified with an inclusive minimum and an exclusive maximum
D. A group of contiguous rows, identified with an exclusive minimum and an inclusive maximum

54. When HBase is used with HDFS, RegionServers are often co-located with...
A. NameNodes
B. DataNodes
C. Secondary NameNodes
D. Backup NameNodes

55. In HBase, what is the granularity of the physical storage layer (HFile) with respect to the data model?
A. An HFile stores a column family within a region
B. An HFile stores one row
C. An HFile stores a whole region
D. An HFile stores a whole column family

56. HBase can run on HDFS, which has high latency. How come HBase still has low latency, allowing real-time queries?
A. Low latency applies to files, but not to tables because of the data independence paradigm
B. Queries are pre-compiled and optimized
C. The RegionServers store data redundantly and locally on RAID
D. The data is held in memory on the RegionServers (MemStore, cache)

57. Imagine an HBase table with a history of read and write queries. Now an unanticipated read query comes in. Where does HBase need to retrieve matching data (KeyValues) from?
A. Both the memory and the file system
B. The CPU cache only
C. The file system only
D. The memory only

58. What is a Bloom filter?
A. It is a data structure that can tell with certainty if an element is or is not in a set
B. It is a data structure that can tell with a risk of error if an element is or is not in a set
C. It is a data structure that can tell with certainty that an element is in a set, but with a risk of error that an element is not in a set.
D. It is a data structure that can tell with a risk of error that an element is in a set, and with certainty that an element is not in a set.
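A Bloom filter is a bit array plus k hash functions: adding an element sets k bits, and a lookup checks whether all k bits are set. This is the kind of structure HBase can consult to skip HFiles that cannot contain a given row. The minimal Python sketch below assumes its own hashing scheme (k salted SHA-256 digests); the class and method names are illustrative, not HBase's API.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array of size m and k hash functions."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False: the item was certainly never added.
        # True: the item was probably added (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1024, k=3)
for row_key in ["row-1", "row-2"]:
    bf.add(row_key)
print(bf.might_contain("row-1"))  # True: added items are always reported
```

False positives arise because different items can set overlapping bits; false negatives cannot, since bits are never cleared.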

59. In MapReduce, the shuffling phase takes place...
A. Before and right after mapping
B. Before mapping
C. After mapping and before reducing
D. After reducing

60. From a data model perspective and in general, MapReduce manipulates, as input and output:
A. Bags of key-value pairs
B. Bags of graph edges
C. Bags of individual values
D. Bags of trees

61. In MapReduce, why is it necessary to sort and partition the output of the mappers before it can be processed by the reducers?
A. Because it makes it faster
B. Because all values with the same key must be processed by the same reducer
C. Because the data is not replicated
D. Because the data is not sharded into splits until after the mapping phase is completed

62. Which one of these input formats cannot be processed by MapReduce at all?
A. Unstructured lines of text
B. Tables
C. Key-value pairs
D. None of them: all three formats are in fact supported by MapReduce

63. Which one of these criteria is not required for the reduce function to be also used as a combiner function in MapReduce?
A. The function must map key-value pairs to key-value pairs
B. The function must be commutative
C. The function must be associative
D. The function must be idempotent
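The phases behind questions 59 to 63 can be traced in a toy, single-process word count: map, optional combine on each mapper, shuffle (partition by key, then sort), and reduce. This is a sketch under stated assumptions: the function names and the hash partitioner are illustrative, not Hadoop's API, and "splits" are plain lists of lines.

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    """Mapper: emit (word, 1) for every word of an input line."""
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    """Reducer: sum the counts. Sum is commutative and associative, and its
    input and output are both key-value pairs, so it can also act as a combiner."""
    return (key, sum(values))

def group_by_key(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_job(splits, num_reducers=2, use_combiner=True):
    # Map phase: one mapper per input split.
    mapper_outputs = []
    for split in splits:
        pairs = list(chain.from_iterable(map_fn(line) for line in split))
        if use_combiner:
            # Combine phase: pre-aggregate locally, before anything is shuffled.
            pairs = [reduce_fn(k, vs) for k, vs in group_by_key(pairs).items()]
        mapper_outputs.append(pairs)

    # Shuffle phase: partition by key hash, so every value of a given key
    # reaches the same reducer, then sort each partition by key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in chain.from_iterable(mapper_outputs):
        partitions[hash(key) % num_reducers].append((key, value))

    # Reduce phase: one reducer per partition.
    result = {}
    for partition in partitions:
        for key, values in group_by_key(sorted(partition)).items():
            k, total = reduce_fn(key, values)
            result[k] = total
    return result

splits = [["big data big", "data"], ["big deal"]]
print(dict(sorted(run_job(splits).items())))  # {'big': 3, 'data': 2, 'deal': 1}
```

Running the job with `use_combiner=False` gives the same result; the combiner only reduces how many pairs cross the shuffle.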

64. Can it be that the boundaries of a MapReduce split do not match the boundaries of the underlying HDFS blocks?
A. Yes, and this is solved at the software (Java) level by starting to read from the next block, even if it is stored on a different node.
B. Yes, but this is rare and in this case MapReduce cannot run efficiently and the data may need to be split again.
C. No, this is a requirement
D. No, it cannot happen because of the way splits are constructed from blocks.

65. What happens in MapReduce if the produced pairs exceed some percentage of the buffer in memory?
A. The pairs are transferred to the nearest mapping node
B. The pairs are spilled to the local disk
C. The pairs are forcibly sent to the reducer
D. The job fails and must be restarted with smaller splits.

66. What issue of MapReduce 1 does the new YARN-based MapReduce 2 not solve?
A. The utilization of task slots had to be statically determined in advance.
B. A job had to have a map-shuffle-reduce pattern, so that some queries needed several jobs.
C. The JobTracker was a bottleneck.
D. The JobTracker had too many, diverse responsibilities on its plate.

67. In YARN, what is the name of the container that is assigned the responsibility to handle all other containers for an application?
A. ApplicationMaster
B. ResourceManager
C. NodeManager
D. Scheduler

68. Which of the following is not a standard scheduling strategy for YARN?
A. Capacity Scheduler
B. Fair Scheduler
C. FIFO Scheduler
D. Latency Scheduler

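69. What algorithm involves applying this formula until it converges?

$$
r_{k+1} = \left( 0.85\,A + 0.15 \cdot \frac{1}{n} \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix} \right) r_k
$$

A. PageRank
B. A join
C. Capacity Scheduling
D. Table denormalization

70. What triggers the evaluation of RDDs in Spark?
A. A transformation
B. A creation
C. An action
D. All of the three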
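The formula in question 69 is a damped power iteration: each step mixes following a link (weight 0.85) with jumping to a uniformly random page (weight 0.15). The pure-Python sketch below makes several assumptions: A is the column-stochastic link matrix (entry 1/outdegree(i) for a link from page i to page j), ranks are normalized to sum to 1, and there are no dangling pages. The function `pagerank` and its parameters are illustrative. The Spark job before question 76 uses the unnormalized variant 0.15 + 0.85 · (sum of contributions) instead.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- damping * A r + (1 - damping) / n * (all-ones) r until r converges.

    links maps each page to the pages it links to. Assumptions for this sketch:
    every page has at least one outgoing link (no dangling pages) and every
    link target also appears as a key, so the ranks keep summing to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from the uniform vector
    for _ in range(max_iter):
        # Teleportation term: 0.15 * (1/n) * (all-ones matrix) * r, with sum(r) == 1.
        new_rank = {p: (1 - damping) / n for p in pages}
        # Link term: each page splits its rank evenly among its outgoing links.
        for page, targets in links.items():
            share = rank[page] / len(targets)
            for target in targets:
                new_rank[target] += damping * share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:                           # converged
            break
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

In this small graph, page c receives links from both other pages and ends up with the highest rank.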

71. What does the following Spark query output?

val rdd1 = sc.parallelize(
  List("1,2,3,4,5,6", "3,4,5,6,7,8", "6,7,8,9,10")
)
val rdd2 = rdd1.flatMap(value => value.split(","))
rdd2.countByValue()

A. (1,1),(1,2),(2,3),(2,4),(2,5),(3,6),(2,7),(1,8),(1,9),(1,10)
B. 1,1,2,2,2,3,2,1,1,1
C. (1,1),(2,1),(3,2),(4,2),(5,2),(6,3),(7,2),(8,1),(9,1),(10,1)
D. 1,2,3,4,5,6,3,4,5,6,7,8,6,7,8,9,10

72. Which of the following is not a transformation?
A. flatMap
B. intersection
C. filter
D. count

73. Which Spark transformation is most suitable to emulate a relational projection?
A. Reduce
B. Map
C. Sample
D. Filter
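Questions 70 and 72 rest on lazy evaluation: transformations only describe a computation, and an action forces it to run. Python generator expressions behave similarly and serve here purely as an analogy; this is not Spark's API, and the names `parse` and `calls` are illustrative.

```python
calls = []

def parse(value):
    calls.append(value)            # records when the work is actually done
    return int(value)

data = ["1", "2", "3"]

# Like transformations: generator expressions only describe the pipeline.
parsed = (parse(v) for v in data)
evens = (x for x in parsed if x % 2 == 0)
print(calls)                       # [] : nothing has been computed yet

# Like an action: consuming the pipeline forces the evaluation.
result = list(evens)
print(result)                      # [2]
print(calls)                       # ['1', '2', '3']
```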

74. What is a stage in Spark?
A. It's the physical implementation of an RDD
B. It's a group of tasks of the same kind (all creations, all transformations or all actions)
C. It's a synonym for transformation
D. It's a group of transformations that does not need shuffling, so that the same group of machines can be reused to save network bandwidth

75. A two-way join involves a lot of shuffling from both its inputs. How can this most likely be optimized in general, for a single job execution?
A. By adopting the partition criterion in another shuffle happening earlier in one of the inputs.
B. It cannot, joins are expensive and require shuffling on both sides anyway.
C. By using stages.
D. By persisting both input RDDs.

/* Under an Apache 2.0 License from Spark */
val lines = spark.read.textFile(args(0)).rdd
val links = lines.map{ s =>
  val parts = s.split("\\s+")
  (parts(0), parts(1))
}.distinct().groupByKey().cache()
var ranks = links.mapValues(v => 1.0)
val contribs = links.join(ranks).values.flatMap{ case (urls, rank) =>
  val size = urls.size
  urls.map(url => (url, rank / size))
}
ranks = contribs.reduceByKey(_ + _).mapValues(0.15 + 0.85 * _)
val output = ranks.collect()
output.foreach(tup => println(tup._1 + " has rank: " + tup._2 + "."))

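76. Which one of the following graphs depicts the DAG of RDDs for the above PageRank job, including their most accurate and likely grouping in stages?
A. [diagram not reproduced]
B. [diagram not reproduced]
C. [diagram not reproduced]
D. [diagram not reproduced]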

77. Which is the most correct association between these data shapes and syntaxes?
A. Tables: CSV; Trees: RDF; Graphs: XML; Cubes: XBRL
B. Tables: XBRL; Trees: RDF; Graphs: XML; Cubes: CSV
C. Tables: CSV; Trees: XML; Graphs: RDF; Cubes: XBRL
D. Tables: XBRL; Trees: XML; Graphs: RDF; Cubes: CSV

78. Are semi-structured documents in any normal form?
A. Yes, in the first normal form only.
B. Yes, in the third normal form.
C. No.
D. Yes, in the second normal form but not in the third.

79. What is the most adequate adjective to describe a document that successfully conforms to the XML specification?
A. Well-formed
B. Correct
C. Valid
D. Any of the above

80. Which one of these documents is well-formed?

A.
<Exam xmlns="http://www.ethz.ch/BigData">
  <Questions>
    <Question id="xml">
      Which one of these documents is well-formed (including namespace-well-formedness)?
      <Answers>
        <1stAnswer>The answer A</1stAnswer>
        <2ndAnswer>The answer B</2ndAnswer>
        <3ndAnswer>The answer C</3rdAnswer>
        <4thAnswer>The answer D</4stAnswer>
      </Answers>
    </Question>
  </Questions>
</Exam>

B.
<Exam xmlns="http://www.ethz.ch/BigData">
  <Questions>
    <Question id="xml">
      <Title>Which one of these documents is well-formed (including namespace-well-formedness)?<Title>
      <Answers>
        <1stAnswer>The answer A</1stAnswer>
        <2ndAnswer>The answer B</2ndAnswer>
        <3ndAnswer>The answer C</3rdAnswer>
        <4thAnswer>The answer D</4stAnswer>
      </Answers>
    </Question>
  </Questions>
</Exam>

C.
<Exam xmlns="http://www.ethz.ch/BigData">
  <Questions>
    <Question id="xml">
      <Title>Which one of these documents is well-formed (including namespace-well-formedness)?<Title>
      <Answers>
        <Answer>The answer A</Answer>
        <Answer>The answer B</Answer>
        <Answer>The answer C</Answer>
        <Answer>The answer D</Answer>
      </Answers>
    </Question>
  </Questions>
</Exam>

D.
<Exam xmlns="http://www.ethz.ch/BigData">
  <Questions>
    <Question id="xml">
      Which one of these documents is well-formed (including namespace-well-formedness)?
      <Answers>
        <Answer>The answer A</Answer>
        <Answer>The answer B</Answer>
        <Answer>The answer C</Answer>
        <Answer>The answer D</Answer>
      </Answers>
    </Question>
  </Questions>
</Exam>
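Well-formedness, as asked in question 80, is exactly what a non-validating XML parser checks. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` on malformed input; it does not check validity against any schema. The short fragments are illustrative, each mirroring one kind of mistake in the options.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the parser accepts the text as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "<Answers><Answer>The answer A</Answer></Answers>": True,
    "<Answers><1stAnswer>The answer A</1stAnswer></Answers>": False,  # name starts with a digit
    "<Title>A question?<Title>": False,                              # element never closed
    "<Answer>The answer C</Answers>": False,                         # mismatched end tag
}
for text, expected in samples.items():
    print(is_well_formed(text), text)
```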

81. Which one of these statements is not true?
A. An XML document must contain exactly one top-level element.
B. A top-level XML element must contain an attribute called "doctype".
C. An XML document may contain a DOCTYPE even if there is no internal DTD subset.
D. An XML document may contain any number of comments at the top-level.

82. Which is the most adequate data structure to describe the attributes of an XML element?
A. A list of pairs
B. A multiset
C. A set
D. An associative array

83. How must an ampersand (&) be escaped in element content, in XML?
A. &amp;
B. &ampersand;
C. %amp;
D. %ampersand;
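Escaping in element content, as in question 83, can be checked with Python's standard `xml.sax.saxutils.escape`, which by default replaces `&`, `<` and `>` with entity references; `unescape` reverses it. The sample string is illustrative.

```python
from xml.sax.saxutils import escape, unescape

content = "Tom & Jerry <3"
escaped = escape(content)      # replaces &, < and > with entity references
print(escaped)                 # Tom &amp; Jerry &lt;3
print(unescape(escaped))       # Tom & Jerry <3
```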

84. Which one of these types is not supported natively by JSON?
A. String
B. Numbers
C. QNames
D. Arrays

85. Which of the following is not a data model for XML documents?
A. XPath and XQuery Data Model (XDM)
B. XML Information Set (Infoset)
C. Post-Schema-Validation Infoset (PSVI)
D. All three are XML data models

86. Which of these statements holds for the XPath and XQuery Data Model?
A. Sequences can be nested, a sequence of one item is not the same as that item
B. Sequences are always flat, a sequence of one item is identified with that item
C. Sequences can be nested, a sequence of one item is identified with that item
D. Sequences are always flat, a sequence of one item is not the same as that item

87. Which one of the following is not an atomic type in XML?
A. Element node
B. String
C. Date
D. Integer

88. Complete this table with type cardinalities in XQuery:

Cardinality | Symbol | Example
(i)         | "*"    | xs:integer*
(ii)        | "+"    | node()+
(iii)       | "?"    | xs:boolean?
(iv)        | none   | xs:string

A. (i) One or more (ii) Zero or more (iii) Zero or one (iv) One
B. (i) Zero or more (ii) One or more (iii) Zero or one (iv) One
C. (i) One or more (ii) Zero or more (iii) Zero (iv) Zero or one
D. (i) Zero or more (ii) One or more (iii) Zero (iv) Zero or one

89. Can an XML document be valid against a schema if it is not well-formed?
A. Yes, it can happen.
B. It depends on the schema.
C. No, validity and well-formedness are one and the same thing.
D. No, never, but a document can be well-formed and not valid.

90. Which one of the following documents is valid against the following schema?

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="element" type="xs:date"/>
        <xs:element name="other-element" type="xs:date" minOccurs="0"/>
        <xs:element name="bar" type="xs:boolean" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

A.
<foo>
  <element>2017-02-09</element>
  <bar>true</bar>
  <bar>false</bar>
</foo>

B.
<foo>
  <element>today</element>
  <other-element>true</other-element>
  <other-element>string</other-element>
  <bar>false</bar>
</foo>

C.
<foo>
  <element>2017-02-09</element>
  <other-element>true</other-element>
  <other-element>string</other-element>
  <bar>false</bar>
</foo>

D.
<foo>
  <element>today</element>
  <bar>true</bar>
  <bar>false</bar>
</foo>

The following document is stored as "books.xml".

<!DOCTYPE books>
<books>
  <book year="1998">
    <title>Digital Fortress</title>
    <author>Dan Brown</author>
  </book>
  <book year="2001">
    <title>Deception Point</title>
    <author>Dan Brown</author>
  </book>
  <book year="2000">
    <title>Angels and Demons</title>
    <author>Dan Brown</author>
  </book>
  <book year="2003">
    <title>The Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book year="2009">
    <title>The Lost Symbol</title>
    <author>Dan Brown</author>
  </book>
  <book year="2013">
    <title>Inferno</title>
    <author>Dan Brown</author>
  </book>
  <book year="2017">
    <title>Origin</title>
    <author>Dan Brown</author>
  </book>
  <book year="2999">
    <title>Timeline</title>
    <author>Michael Crichton</author>
  </book>
  <book year="2002">
    <title>Prey</title>
    <author>Michael Crichton</author>
  </book>
  <book year="2004">
    <title>State of fear</title>
    <author>Michael Crichton</author>
  </book>
  <book year="2006">
    <title>Next</title>
    <author>Michael Crichton</author>
  </book>
  <book year="2009">
    <title>Pirate Latitudes</title>
    <author>Michael Crichton</author>
  </book>
</books>

91. What does the following query output?

(
  for $book in doc("books.xml")/books/book
  group by $year := $book/@year
  let $count := count($book)
  order by $count descending
  return $year
)[1]

A. 2
B. <book year="2009"><title>Pirate Latitudes</title><author>Michael Crichton</author></book>
C. 2009
D. 2999
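The grouping query in question 91 can be emulated in Python to trace its steps: collect the year attributes, count books per year, sort by count descending and keep the first group. This is an emulation, not an XQuery engine; the embedded XML is an abridged copy of "books.xml" that keeps only the year attributes (titles and authors are omitted).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The year attributes of books.xml, abridged (titles and authors omitted).
BOOKS = """<books>
  <book year="1998"/> <book year="2001"/> <book year="2000"/> <book year="2003"/>
  <book year="2009"/> <book year="2013"/> <book year="2017"/> <book year="2999"/>
  <book year="2002"/> <book year="2004"/> <book year="2006"/> <book year="2009"/>
</books>"""

root = ET.fromstring(BOOKS)
years = [book.get("year") for book in root.findall("book")]

# group by $year, let $count := count($book), order by $count descending, [1]
top_year, top_count = Counter(years).most_common(1)[0]
print(top_year, top_count)                       # 2009 2

# The implausible publication year that question 92 asks about.
outliers = [y for y in years if int(y) > 2100]
print(outliers)                                  # ['2999']
```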

92. Which XPath query outputs the one book in "books.xml" with a mistaken publication date?
A. doc("books.xml")/books/book[xs:integer(@year) gt 2100]
B. doc("books.xml")/@year/data()[. gt 2100]
C. doc("books.xml")/books/book/@year/data()[. gt 2100]
D. doc("books.xml")[xs:integer(@year/data()) gt 2100]

93. Which lines of the following schema need to be corrected for it to successfully validate the instance "books.xml"?

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="books">
    <xs:complexType>                                        (1)
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">      (2)
          <xs:complexType>                                  (1)
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>   (3)
              <xs:element name="author" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="year" type="xs:date"/>      (4)
          </xs:complexType>                                 (1)
        </xs:element>                                       (2)
      </xs:sequence>
    </xs:complexType>                                       (1)
  </xs:element>
</xs:schema>

A. The one marked with (4)
B. All those marked with (1)
C. All those marked with (2)
D. The one marked with (3)

94. Which one of those is not a built-in function in XQuery, i.e., which of those can be used as a keyword without parentheses?
A. sum
B. not
C. true
D. and

95. Which one of these statements does not hold for both value comparisons (eq, ne, etc.) and general comparisons (=, <, etc.)?
A. Both are transitive
B. Both take sequences of items as left-hand-side and right-hand-side inputs
C. Both can throw an error in case of mismatching types
D. Both can output true, false or throw an error.
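The difference probed in question 95 comes from the existential semantics of general comparisons: `=` is true if some item on the left equals some item on the right, which breaks transitivity. The Python sketch below models XQuery sequences as tuples and is simplified: it ignores atomization, type promotion and untyped-value casting. The function names are illustrative.

```python
def general_eq(left, right):
    """General comparison '=': true if SOME item on the left equals SOME item on the right."""
    return any(a == b for a in left for b in right)

def value_eq(left, right):
    """Value comparison 'eq' (simplified): an empty operand yields the empty
    sequence (None here); more than one item on either side is a type error."""
    if not left or not right:
        return None
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq expects at most one item per operand")
    return left[0] == right[0]

x, y, z = (1, 2), (2, 3), (3, 4)
print(general_eq(x, y), general_eq(y, z), general_eq(x, z))  # True True False
print(value_eq((2,), (2,)))                                     # True
```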

96. Which of the following is not a character encoding?
A. ISO Latin 1
B. ASCII
C. XML
D. UTF-8

97. Which one of these mappings from tree-shaped data, such as XML, to relational tables always requires prior knowledge about the data, such as field names and structural layout?
A. Schema-based shredding
B. Tree encoding
C. Edge shredding
D. None of them requires prior knowledge

98. Assume a very large MongoDB collection named "books" that has a tree index

{ "Title": 1, "Year": -1 }

Which one of these queries cannot be executed efficiently using this index?
A. db.books.find({"Title": "Inferno"})
B. db.books.find({"Year": 2016})
C. db.books.find({"Title": "Inferno", "Year": 2016, "Author": "Dan Brown"})
D. db.books.find({"Year": 2016, "Title": "Inferno"})
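Question 98 turns on the index prefix rule for compound tree indexes: an index sorted on (Title, Year) can narrow a search only for queries that constrain a leading prefix of its fields, regardless of the order in which fields appear in the query document. The Python sketch below is a simplified model (equality predicates only); it is not MongoDB's query planner, and `index_prefix_used` is a hypothetical helper.

```python
def index_prefix_used(index_keys, query_fields):
    """Longest prefix of a compound index whose fields all appear in the query.

    Simplified model: equality predicates only; field order inside the query
    document does not matter. An empty result means no usable prefix.
    """
    prefix = []
    for key in index_keys:
        if key not in query_fields:
            break
        prefix.append(key)
    return prefix

index = ["Title", "Year"]   # the index { "Title": 1, "Year": -1 }
queries = {
    "A": {"Title": "Inferno"},
    "B": {"Year": 2016},
    "C": {"Title": "Inferno", "Year": 2016, "Author": "Dan Brown"},
    "D": {"Year": 2016, "Title": "Inferno"},
}
for name, query in queries.items():
    print(name, index_prefix_used(index, query) or "no usable prefix")
```

For query C, the extra Author condition would be checked on the documents found through the (Title, Year) prefix.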

99. In MongoDB, does each replica set contain all of the data in a collection?
A. Never.
B. Always.
C. It depends on the schema of the collection.
D. Not if there is more than one replica set.

100. What best describes a write concern in MongoDB?
A. It is the protocol for writing data to MongoDB, which is a blocking call until at least a certain number of replicas have acknowledged that the data was written.
B. It is the requirement that any query writing to a collection must follow the schema of this collection.
C. It is the situation in which MongoDB starts being emotional about too large an amount of queries sent in too little time.
D. It is the risk that data gets lost in case a server goes down.

101. Which of these node labelling schemes for trees is efficient for inserting new nodes in the middle of existing siblings?
A. ORDPATH IDs
B. Integer IDs
C. Dewey IDs
D. They are all efficient for insertions.

102. Which is not a way of avoiding expensive joins?
A. Pre-computing a join in advance if it will be needed often
B. Index-free adjacency, with foreign key references stored as direct pointers in memory
C. Denormalizing the data
D. Storing the data in third normal form

103. Labeled property graphs, on a logical level, are made of nodes, properties, labels and...
A. nothing else
B. edges
C. tables
D. matrices

104. What is this syntax?

@prefix buildings: <http://www.ethz.ch/buildings#> .
@prefix lectures: <http://www.ethz.ch/lectures#> .
@prefix exams: <http://www.ethz.ch/exams#> .
@prefix properties: <http://www.ethz.ch/properties#> .

exams:big-data properties:is-located buildings:HIL;
               properties:has-attendence 146.
lectures:big-data properties:is-located buildings:HG, buildings:CAB.

A. Turtle
B. XML
C. JSON
D. XBRL

105. Which languages are most appropriate for querying graph databases?
A. XQuery, JSONiq
B. SQL
C. Cypher, SPARQL
D. Java

106. Which technology most likely does not use a master-slave architecture?
A. A classic relational database of the 1990s supporting ROLAP
B. A distributed file system from the 2000s like HDFS
C. A document store from the 2010s
D. A graph database like neo4j

107. Which best describes how the graph structure is stored internally in neo4j?
A. With a lot of double-linked lists
B. With a lot of binary encodings
C. With a lot of tables
D. With a lot of matrices

108. Which of these properties least qualifies OLAP, as opposed to OLTP?
A. Redundancy
B. Fully, sub-second interactive
C. Lots of reads, few to no writes
D. Analysis over big chunks of data

109. What does ETL mean?
A. Extract, transform, load
B. Extract, transfer, load
C. Export, transform, load
D. Export, transfer, load

110. What are the primary actions on a data cube?
A. Navigate
B. Select and project
C. Join and group
D. Slice and dice

111. How are tree and DAG structures stored in the XBRL syntax?
A. Using JSON's native tree structure
B. Using XBRL's native tree structure
C. Using flat lists of nodes and edges (XLink)
D. There are no tree structures in XBRL, only data cubes
