
Juniata College, faculty.juniata.edu/rhodes/ida/exams/exam1f13key.docx


IT 241 Information Discovery Fall 2013 Exam 1 Page 1

Thursday, Sept. 26, 2013

Name _______Key______________________

[12 pts] 1. Below is one of the visualization pipelines from the text.

For each of the following concepts, list the pipeline stage or interaction/action with which the concept is best associated.

Handling missing values ____data transform_______

Normalizing values _____data transform_______

Relational database tables ______raw data_____________

A CSV file ________raw data or data tables____

A scatterplot ______visual structure_________________

Selecting datapoints in a parallel coordinates graph _____viz or view transformations____________

2. Name four dimensions of data displayed in Minard’s map of Napoleon’s march to Moscow and back (see back page). [4 pts]

_____army size__________ ___________direction_________________

______location_______________ _______temperature and date____________

3. A graphic can be classified as an exploratory visualization, an explanatory visualization or an example of visual art. [8 pts]

a. Explain the difference between exploratory and explanatory visualization.

Exploratory: the viz just displays the data without trying to bias the viewer.

Explanatory: the viz displays data with an attempt to convince the user about some aspect or interpretation of the information.

b. Classify each of these visualizations as one of the three.

Nightingale’s rose petal diagram of the causes of death in the army _____explanatory___________

A scatterplot of car horsepower versus gas efficiency ________exploratory but also explanatory__________

The map of the internet (see last page) ______visual art____________

A matrix of scatterplots from the US census ________exploratory____________


[20 pts] 4. Data coding

a. The value 1010 0101 in binary is ___165__ in decimal and its corresponding hexadecimal digits are _A5__.

Decimal 42 converted to binary is __101010___ .

If the 8 bit ASCII codes in decimal for “A” and “a” are 65 and 97, respectively, then the decimal ASCII codes for the string “Bad” are ____66 97 100_____.

If we want to store 42 unique values, we would need at least __6_ bits to represent those values.
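The conversions in part (a) can be checked with Python's built-ins. This is a minimal sketch; the helper name `bits_needed` is an illustrative assumption and does not come from the course materials.

```python
# Binary 1010 0101 to decimal, then to hexadecimal digits.
value = int("10100101", 2)
hex_digits = format(value, "X")

# Decimal 42 to binary.
binary_42 = format(42, "b")

# Decimal ASCII codes for each character of "Bad".
codes = [ord(ch) for ch in "Bad"]


def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can label n_values distinct values."""
    # The largest label is n_values - 1; its bit length is the answer.
    return max(1, (n_values - 1).bit_length())


print(value, hex_digits, binary_42, codes, bits_needed(42))
```

Five bits give only 32 distinct codes, which is why 42 values need a sixth bit.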

b. An 800 x 500 pixel color image coded in RGB (+ 1 alpha byte) format requires 800*500*4 = 1,600,000 bytes.

c. Compression techniques are used for images, audio and video. What does it mean if the compression is “lossy”?

After the compression, information is lost, so the original image or sound cannot be exactly reproduced.

d. A 20 second stereo sound clip (2 channels) sampled at 48000 samples per second with a 16 bit depth will result in storing ____20*2*48000*16/8 = 3,840,000____ bytes.
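The uncompressed storage sizes in parts (b) and (d) follow directly from the dimensions. The sketch below reproduces that arithmetic; the function names `image_bytes` and `audio_bytes` are illustrative assumptions.

```python
def image_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed image size: one byte each for R, G, B and alpha by default."""
    return width * height * bytes_per_pixel


def audio_bytes(seconds: int, channels: int, sample_rate: int, bit_depth: int) -> int:
    """Uncompressed audio size: total samples times bits per sample, in bytes."""
    return seconds * channels * sample_rate * bit_depth // 8


print(image_bytes(800, 500))          # part (b)
print(audio_bytes(20, 2, 48000, 16))  # part (d)
```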

e. What are the colors for these RGB hexadecimal encodings?

000000 = ____black_____ 555555 = _______grey_________

0000FF = ____blue________ 00FF00 = ______green_________
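Each pair of hexadecimal digits in these codes is one 0-255 channel value, in the order red, green, blue. A minimal sketch of the decoding, where `hex_to_rgb` is a hypothetical helper name:

```python
def hex_to_rgb(code: str) -> tuple:
    """Split a 6-digit hex color into (red, green, blue) integers in 0..255."""
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


for code in ("000000", "555555", "0000FF", "00FF00"):
    print(code, hex_to_rgb(code))
```

Equal values in all three channels, as in 555555, give a shade of grey; a single channel at FF with the others at 00 gives a pure primary.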

f. If your data is simply a table of data with rows and columns, then what editable file structure is appropriate?

___CSV_ (choose from XML, CSV, BMP, XLS)

If your data contains elements that have sub-elements, what editable data file structure would be appropriate?

______XML_____ (XML, CSV, BMP, XLS)

[16 pts] 5. Plot on the number line with small circles this set of 10 univariate numbers {45, 55, 63, 67, 68, 74, 80, 84, 90, 95} then superimpose a Tukey box plot representing the median and 25th and 75th percentiles.

┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼
40   45   50   55   60   65   70   75   80   85   90   95

What is the mean of these data? _72.1___

If a standard deviation spans the middle 70% of the data above and below the mean, estimate its value __~20___

If we normalize this data set to fall between 0 and 100, then these 4 values are recoded as…

45-> __0___, 55-> ___20_____, 90->___90___ and 95-> __100____

Answer plot (x = data point, | = 25th percentile, median and 75th percentile marks):

x x | x x x | x x x | x x
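The numeric answers to question 5 can be checked with Python's standard `statistics` module. This is a sketch under stated assumptions: `min_max` is an illustrative helper, and quartile conventions vary, so the quartiles printed here (the module's default "exclusive" method) need not match a hand-drawn box plot exactly. The sample standard deviation computed this way comes out near 15.7, so the key's ~20 is best read as a rough visual estimate.

```python
import statistics

data = [45, 55, 63, 67, 68, 74, 80, 84, 90, 95]

mean = statistics.mean(data)
median = statistics.median(data)
sample_sd = statistics.stdev(data)           # divides by n - 1
quartiles = statistics.quantiles(data, n=4)  # [Q1, median, Q3], "exclusive" method


def min_max(x, lo, hi, new_lo=0.0, new_hi=100.0):
    """Linearly rescale x from the range [lo, hi] to [new_lo, new_hi]."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)


lo, hi = min(data), max(data)
recoded = {x: min_max(x, lo, hi) for x in (45, 55, 90, 95)}

print(mean, median, round(sample_sd, 2), quartiles)
print(recoded)
```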


6. We saw the following relational database SQL query in class. Fill in the blanks below regarding the query. [4 pts]

SELECT S.lastName, S.firstName, S.major
FROM Student S, Enroll E
WHERE E.grade<='B' AND E.stuId=S.stuId

Describe the result set of this query, i.e., what rows of data would come from the application of the query?

List of student names and majors of those who have received the grade of B or A in any course (A is less than B).
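The behavior of this query can be tried against an in-memory SQLite database. The sketch below is illustrative only: the table layouts beyond the columns named in the query, the `course` column, and all sample rows are made-up assumptions, not the class database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (stuId TEXT PRIMARY KEY, lastName TEXT, firstName TEXT, major TEXT);
CREATE TABLE Enroll (stuId TEXT, course TEXT, grade TEXT);
-- Hypothetical rows for illustration only.
INSERT INTO Student VALUES
    ('S1', 'Smith', 'Tom', 'History'),
    ('S2', 'Lee', 'Ann', 'Math'),
    ('S3', 'Diaz', 'Eva', 'IT');
INSERT INTO Enroll VALUES
    ('S1', 'ART103', 'A'),
    ('S1', 'HST205', 'C'),
    ('S2', 'MTH101', 'B'),
    ('S2', 'CSC110', 'A'),
    ('S3', 'IT241', 'D');
""")

# Grades are compared as strings, so 'A' <= 'B' is true and 'C' <= 'B' is false.
rows = conn.execute(
    "SELECT S.lastName, S.firstName, S.major "
    "FROM Student S, Enroll E "
    "WHERE E.grade <= 'B' AND E.stuId = S.stuId"
).fetchall()

for row in sorted(rows):
    print(row)
```

With these sample rows, a student who earned an A or B in two courses appears twice, because the query returns one row per qualifying enrollment and does not use DISTINCT.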

7. Describe three possible ways to handle missing data in a data set. [6 pts]

a. Delete the data observation

b. Code the missing data with a sentinel or flag value

c. Substitute the mean of the column OR a similar observation’s value
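The three strategies above can be sketched on a single column in plain Python. The readings, the use of `None` to mark a missing value, and the sentinel -999.0 are all illustrative assumptions.

```python
import statistics

# Made-up column of readings; None marks a missing observation.
readings = [12.0, None, 15.0, 9.0, None, 14.0]

# (a) Delete the observations that have a missing value.
dropped = [x for x in readings if x is not None]

# (b) Code missing values with a sentinel (flag) value outside the valid range.
SENTINEL = -999.0
flagged = [SENTINEL if x is None else x for x in readings]

# (c) Substitute the mean of the observed values in the column.
col_mean = statistics.mean(dropped)
imputed = [col_mean if x is None else x for x in readings]

print(dropped)
print(flagged)
print(imputed)
```

A sentinel must be excluded before computing statistics; otherwise it is averaged in as if it were real data.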

8. The following attributes are found in a daily weather data set in various towns or counties in some state or province. Associate the best descriptor for each attribute. If you do not understand the meaning of the attribute, please ask for clarification.

Choices: Nominal-Categorical (Cat), Nominal-Arbitrary (no categories) (Arb), Ordinal Continuous (Cont), Ordinal Discrete (Disc), Spatial/geographic (Geo), Temporal/Time (Time)

[8 pts]

Date ____Time____

Day of Week (Mon, Tue, etc) ____Cat____

Latitude-Longitude ____Geo____

County/town name ____Arb____

High temperature in F ____Cont____

Rainfall in mm ____Cont____

Number of highway fatalities ____Disc____

Prominent cloud type (choice of 6 types + ‘clear’) ____Cat____


9. True/false miscellany.

[22 pts]

__T/F__ Statistics can be computed only if the data columns are ordinal.

__ T __ Nominal-categorical can be converted to ordinal when the nominal data is ordered.

__ T __ Nominal-categorical unordered can be converted to a series of binary attributes.

__ F __ A Likert scale is unordered.

__ F __ Calculating a mode statistic can be directly applied to continuous data.

__ F __ Bins with the ranges [80,90) and [90,100] put the value 90 into the [80,90) bin.

__ F __ Frequency counts are appropriate only for data in discrete ordinal form.

__ T __ The correlation statistic is a value in the range [-1,1]

__ T __ A correlation of 0.9 is one that says the two attributes have value pairs that agree roughly when relatively high or low.

__ F __ A correlation close to zero means that both attributes can be ignored.

__ F __ Linear regression attempts to fit a line to the data that maximizes the y-distance between the line and the data points.

__ F __ Linear regression is limited to 1 dependent variable and 1 independent variable.

__ F __ Attributes normally correspond to rows and tuples correspond to columns in relational databases.

__ T __ Relational databases relate data by common values between tables.

__ T __ CSV files should have the same number of values in each row.

__ F __ CSV files may not use the first row as column names.

__ F __ Since commas are used as separators in CSV, you cannot use commas in the data like “Name, Jr.”

__ T __ Spaces and tabs in CSV files are ignored unless used for word separation.

__ T __ XML looks like HTML (web page sources) except that we can define our own tags.

___ T _ Data can be represented in multiple ways in XML format.

__ T __ JSON is an alternative data file format to XML and CSV formats.

___ T _ Word document files (.doc, .docx) contain much binary data.
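Two of the statements above, the half-open bins and commas inside CSV data, are easy to check in code. The sketch below uses Python's standard `csv` module; the sample text and the helper name `bin_label` are illustrative assumptions.

```python
import csv
import io


def bin_label(x):
    """Assign x to [80,90) or [90,100]; a round bracket excludes its endpoint."""
    if 80 <= x < 90:
        return "[80,90)"
    if 90 <= x <= 100:
        return "[90,100]"
    return None


# A field that contains a comma is wrapped in double quotes in the file.
text = 'name,score\n"King, Jr.",90\n'
rows = list(csv.reader(io.StringIO(text)))

print(bin_label(90))
print(rows)
```

The value 90 falls in [90,100], and the quoted field is read back as one value even though it contains a comma.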
