Chapter 2 Data Collection


Before any data are collected, you need to carefully define the question and develop operational definitions!

Explicitly define the scope of the inferences including limitations.

Stopping distances of bike - smooth vs treaded tires
– On asphalt? Dry? Wet?
– Which brands of smooth tires? Which type of brake?

• There is a tradeoff between more precise answers to narrower questions and less precise answers to more general questions.

2.1 General Principles in the Collection of Engineering Data

2.1.1 Measurement

"An engineer planning a study ought to ensure that data on relevant variables will be collected by well-trained people using measurement equipment of known and adequate quality."

"Training technicians has to be taken seriously."

• Biases, intentional or unintentional, are to be avoided.

• Measurements can be made blind, without personnel knowing what condition is being tested.
– Medical experiments often have patients and doctors blind to the medication given.

• Other techniques for ensuring fair play (such as randomization, blocking) are discussed later.

2.1.3 Recording

Develop
• Documented protocols
• Recording forms

Include documentation explicitly on the recording forms

- Ambient temperature, unusual events
- Put documentation into a permanent computer database as 'meta-data'
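One possible way to keep such documentation attached to each observation is sketched below. The field names (`run_id`, `operator`, `ambient_temp_c`, `unusual_events`), the CSV layout, and the sample values are illustrative assumptions, not part of the course material.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class RunRecord:
    """One row of a hypothetical recording form, metadata included."""
    run_id: int
    operator: str
    response: float
    ambient_temp_c: float      # documentation recorded with the measurement
    unusual_events: str = ""   # free-text note, empty if nothing unusual


def write_records(records, stream):
    """Write the records as CSV with a header row taken from the form fields."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(RunRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Placeholder values only; a real study would write to a permanent file or database.
buffer = io.StringIO()
write_records([RunRecord(1, "tech_1", 12.4, 21.5, "door left open")], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the metadata in the same row as the response means it cannot be separated from the measurement later.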

2.2 Sampling in Enumerative Studies

Simple random sample
Put all part numbers in a box and draw n of them at random.

In Excel
List each Part # in one column and a Random # in the next.
• Sort by random #.
• Pick first n rows for sampled parts.
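The spreadsheet recipe above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the part numbers, and the seed are assumptions made for the example.

```python
import random


def simple_random_sample(part_numbers, n, seed=None):
    """Attach a random number to each part, sort by it, keep the first n parts."""
    if n > len(part_numbers):
        raise ValueError("sample size exceeds the number of parts")
    rng = random.Random(seed)
    keyed = [(rng.random(), part) for part in part_numbers]  # the 'Random #' column
    keyed.sort(key=lambda pair: pair[0])                     # sort by random number
    return [part for _, part in keyed[:n]]                   # first n rows


parts = [f"P{i:03d}" for i in range(1, 51)]  # hypothetical part numbers P001..P050
sample = simple_random_sample(parts, n=5, seed=1)
print(sample)
```

Fixing the seed makes the draw reproducible, which can be useful when the sampling step has to be documented.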

Stratified random sample
Split the parts to be inventoried into strata
– Big expense parts
– Small expense parts

Stratification assures adequate sampling of subcategories and potentially gives more precise estimates.
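A stratified random sample can be sketched as a simple random sample drawn separately within each stratum. The stratum names, part labels, and per-stratum sample sizes below are made up for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, n_per_stratum[name])
        for name, units in strata.items()
    }


# Hypothetical inventory: few big expense parts, many small expense parts.
strata = {
    "big_expense": [f"B{i:02d}" for i in range(1, 11)],
    "small_expense": [f"S{i:03d}" for i in range(1, 201)],
}
picked = stratified_sample(strata, {"big_expense": 4, "small_expense": 10}, seed=2)
print(picked)
```

With only 14 parts drawn from the pooled list of 210, a plain simple random sample could easily contain no big expense parts at all; sampling within strata guarantees that 4 are included.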

Advantage of random sampling

• Assures objectivity
• Insurance against biases, intentional or unintentional
• Allows quantification of potential error via probability

2.3 Principles for Effective Experimentation

2.3.1 Taxonomy of Variables

Response variable - System output of interest
- Compression strength of taconite
- Strength of glued boards

Managed variable - Set by the experimenters
- Experimental variable - Set at different levels
  - Three levels of temperature for gluing
- Controlled variable - Kept at the same level
  - Use 3 glues but all at the same temperature

Freezing effect on glue bond
- Experimental variable - freezing temperature
- Controlled variable - drying time, wood type, drying temp

2.3.2 Handling Extraneous Variables

An extraneous variable is one that can influence the response but is not of primary interest.

– Stopping times of bicycles with treaded and smooth tires. The particular rider affects stopping times.
– Strength of glued wood. The moisture content of the wood can affect the strength.

Sometimes the extraneous variable is observed, like rider, and sometimes it's unobserved, like moisture. Sometimes the extraneous variable is even unanticipated.

Inattention to extraneous variables can add noise to the comparisons or confuse (confound) the experimental results.
– We are interested in comparing types of golf clubs. If we use golf balls in various conditions, the variability due to golf ball condition adds noise to the system and makes it harder to measure effects precisely. Other extraneous variables include golfer, temperature, wind speed, golfer fatigue, etc.
– If glue 1 is set on a humid day and glue 2 is set on a dry day, observed differences could be due to glue type or humidity effects. Here glue and humidity effects are completely confounded, that is, confused with each other.

Strategies for reducing effects of extraneous variables

– Controlling variables
– Blocking
– Randomization

Controlling a variable means keeping it at the same level.
– Glue all boards at a nearly fixed temperature.
– Have one rider for all runs of smooth and treaded tires.
– Use new golf balls of the same type.

A block of experimental units, experimental times, experimental conditions, etc. is a homogeneous group of experimental units within which different levels of primary experimental variables can be applied and compared in a relatively uniform environment.

Blocking is a very important concept. There will be exam questions about this concept.

– Have each rider use a treaded and a smooth tire bike. Each rider is a 'block'. A block with 2 treatment levels is a paired design.
– For comparing 3 glues, take 10 boards and cut each board into thirds. Use each glue on one part of each board. The boards are blocks.
– Most often each treatment is replicated once in each block.

• Randomization is insurance against biases that might otherwise occur.
– Each rider will ride bikes twice, once with treaded and once with smooth tires. We don't want all smooth-tire runs done first. We could randomize (flip a coin) to decide the order.
– If we have 30 small boards for gluing
  • We could randomly assign 10 boards to each glue. This is a completely randomized design.
  • If there are some obvious differences between the boards, it may help to divide the boards into 10 blocks of 3 boards. Within each block randomly assign one board to each glue. This is a randomized block design.
– The order of gluing the 30 boards would also be randomized (and possibly blocked) to guard against having one glue done earlier in the day.
  • Blocking often provides better insurance.
  • Unblocked randomizing can end up with more of one glue earlier in the day.
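The coin flip for each rider's tire order can be sketched as below. The rider labels and the seed are hypothetical; shuffling a two-item list is used here as a stand-in for flipping a coin.

```python
import random


def randomize_tire_order(riders, seed=None):
    """For each rider (block), pick at random which tire type is ridden first."""
    rng = random.Random(seed)
    orders = {}
    for rider in riders:
        order = ["treaded", "smooth"]
        rng.shuffle(order)  # equivalent to a fair coin flip for two treatments
        orders[rider] = order
    return orders


riders = ["rider_A", "rider_B", "rider_C", "rider_D"]  # hypothetical riders
run_orders = randomize_tire_order(riders, seed=3)
for rider, order in run_orders.items():
    print(rider, "->", order)
```

Every rider still rides both tire types, so the paired (blocked) structure is preserved; only the order within each rider is left to chance.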

• Blocks are set up before units are assigned to treatments.

• If we hit 10 golf balls with a titanium driver, these 10 balls are not a block.

• This is a common mistake by students on exams.

Both randomization and blocking are like insurance policies.

• In some cases not having the insurance won't hurt.
• Other times not having the insurance can hurt big time.
• The cost and hassle of randomization and potentially blocking isn't very big.
  – Usually randomization is worth the cost.
  – Infrequently randomization is not worth the cost.

• But think carefully about whether there are potential pitfalls to not randomizing.

• Bouncing balls on wood and cement surfaces.

2.3.3 Comparative Study

A comparative study compares treatments, for example comparing 2 glues.
• Even when investigating a particular new treatment, it's best to do a comparative study with the old glue.
• If we only use the new glue on a batch of boards and compare the strengths to historical board strengths, it could be that the new boards are different from the historical boards. Any observed difference could be due to glue effects or due to changes in the boards.

In medical studies it's standard to include some patients who receive the old drug or no drug for a head-to-head comparison with the new drug. The patients getting no drug are a 'control' group. This is another use of the term 'control'.

2.3.4 Replication

• Replication means carrying through the whole process of adjusting values for the supervised variables, making an experimental 'run', and observing the results of that run – more than once.

• "Simply re-measuring an experimental unit does not amount to real replication." Likewise, runs made without resetting the entire process are not true replicates. See example 9, page 45.

• Example 10: Making one of each of 2 designs of paper planes and retesting the 2 planes does not accomplish independent replications of the designs. If we only make 2 planes, we don't know if the two planes are more different than we would find by making 2 planes from the same design.

2.4 Some Common Experimental Plans

2.4.1 Completely Randomized Designs

• In a completely randomized design all units or runs are put into a single hat and randomly assigned to the treatments.

• Number the 30 boards. Pick 10 numbers (boards) for each glue.
  – Put the board numbers 1-30 in column 1 of Excel.
  – Put random numbers into column 2.
  – Sort by the random column 2.
  – Assign the board numbers in
    • rows 1-10 to glue 1
    • rows 11-20 to glue 2
    • rows 21-30 to glue 3
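The same assignment can be sketched in Python. Shuffling the board numbers plays the role of sorting by the random column; the function name `completely_randomized_design` and the seed are assumptions for this example.

```python
import random


def completely_randomized_design(units, treatments, seed=None):
    """Shuffle all units, then give equal-sized consecutive chunks to each treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units cannot be split evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                      # like sorting by a random column
    per_treatment = len(units) // len(treatments)
    return {
        treatment: sorted(shuffled[i * per_treatment:(i + 1) * per_treatment])
        for i, treatment in enumerate(treatments)
    }


boards = list(range(1, 31))  # boards numbered 1-30
assignment = completely_randomized_design(boards, ["glue 1", "glue 2", "glue 3"], seed=4)
for glue, assigned in assignment.items():
    print(glue, assigned)
```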

2.4.2. Randomized Complete Block Design

• Units are broken into hopefully homogeneous blocks, and treatments are randomized to units within each block.

  – Form 10 sets of 3 similar boards in each set (block).
  – Within each set (block) assign 1 board randomly to each glue.

• Most commonly each treatment is replicated once in each block. Example 12 is unusual in this regard.
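A randomized complete block design for the glue example can be sketched as follows. The grouping of boards into blocks is taken as given: here boards 1-3 form block 1, boards 4-6 form block 2, and so on, which is purely a placeholder for whatever similarity criterion is actually used to form the blocks.

```python
import random


def randomized_complete_block_design(blocks, treatments, seed=None):
    """Within each block, assign each treatment to exactly one unit at random."""
    rng = random.Random(seed)
    plan = {}
    for block_id, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block_id} needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)                    # separate randomization per block
        plan[block_id] = dict(zip(units, order))
    return plan


# Hypothetical blocks, formed BEFORE any board is assigned to a glue.
blocks = {b: [3 * (b - 1) + 1, 3 * (b - 1) + 2, 3 * (b - 1) + 3] for b in range(1, 11)}
glues = ["glue 1", "glue 2", "glue 3"]
plan = randomized_complete_block_design(blocks, glues, seed=5)
for block_id, mapping in plan.items():
    print(block_id, mapping)
```

Unlike the completely randomized design, every block here is guaranteed to contain each glue exactly once, so glue comparisons are made among similar boards.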

2.5 Preparing to Collect Engineering Data

Read the book.

Problem Definition
• Step 1: Identify the problem.
• Step 2: Understand the context of the problem.
• Step 3: State in precise terms the objective and scope of the study.

Study Definition
• Step 4: Identify the response variable(s) and appropriate instrumentation.
• Step 5: Identify possible factors influencing responses.
• Step 6: Decide whether (and if so how) to manage factors likely to affect the responses.
• Step 7: Develop a detailed data collection protocol and time table for the first phase.

Physical Preparation
• Step 8: Assign responsibility for careful supervision.
• Step 9: Identify technicians and provide necessary instruction in objectives and methods.
• Step 10: Prepare data collection forms and/or equipment.
• Step 11: Do a dry run of analysis on fictitious data.
• Step 12: Write up a 'best guess' prediction of results.

See the text for more details.

Some Study Questions

• What advantages does an experimental study have compared to an observational study?

• What is the difference between a population and a sample?
• Give an example of multivariate data.
• Managed variables are either experimental or controlled variables. What is a controlled variable?
• What is an extraneous variable?
• What are the 3 strategies for reducing effects of extraneous variables?

• What is a “block”?

• Blocks are set up B_____ units are assigned to treatments. Fill in the blank.

• What is the potential advantage to the randomized block design versus a completely randomized design?

• Give an example where 2 measurements are not separate, independent replicates.