Hardware Testing and Benchmarking Methodology


8/14/2019 Hardware Testing and Benchmarking Methodology

On many hardware review websites, explanations of the methodology used for testing and the amount of data displayed in the review are lacking in quality and quantity. I'll attempt to show what needs to be addressed to improve the quality of these reviews, what you should find in high-quality reviews, and how to spot misleading information.

Qualitative vs. Quantitative

When a reviewer states "Product X is the fastest/quietest/best value," this is a qualitative statement. The reviewer is using statements that are unquantifiable; that is, they can't be put into numbers. This is bad because these statements can't be tested or proven true, and therefore they mislead the reader.

However, these statements can be quantified. The reviewer might say "Product X is the quietest because it measured N dB(A) at such-and-such speed compared to Product Y (see Table N)." This may not be as flashy or as eloquent as a simple "This is the best" statement, but with correct testing methodology to back up that sentence, it's actually telling you something about the item, rather than showing off the reviewer's writing style.

Variable Control

Testing can be completely ruined unless the number of variables involved is reduced to the absolute minimum. For experiments, the ideal is to have only one variable. For example, voltage, ambient temperature, and airflow are controlled, while the temperature varies. In the real world having only one variable is not always possible, but the good experimenter attempts to reduce the effects of anything that might affect the test. This might be accomplished by disconnecting or turning off any other attachments to a computer, or by removing unnecessary parts (such as a sound card or RAID array) while testing a video card. When variables aren't kept in check, accuracy may be reduced, although the perceived precision may be high (see below).

Accuracy vs. Precision

This probably brings back vague memories of high school physics or chemistry for most readers, so a quick refresher: if a result is accurate, it is close to the correct value; if a result is precise, it is repeatable and exact. Using a target as a metaphor, the bull's-eye is the "correct" value, and the farther away from the center you are, the less "correct" the value is.

This relates to hardware testing somewhat abstractly, but the general idea is that testing conditions can allow for a high degree of precision (±0.234 degrees Celsius, for example) yet be completely wrong if not all variables are controlled (see above). On the flip side, the results can be correct but not precise, yet because of a large sample size the mean of the values obtained is close to the bull's-eye. Thus, precision does not have to be too great if the sample size is large enough. For hardware testers, this means that the most precise instrumentation available should be used, along with the greatest number of samples obtainable within a given time period. An accurate hardware review finds the data to agree closely with control values for the test and/or the hypothesized or expected results for the test.
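The target metaphor can be made concrete with a short sketch. The instruments and readings below are hypothetical: instrument A is precise but inaccurate (a tight cluster far from the truth), instrument B is accurate but imprecise (scattered readings centered on the truth). Comparing each set's mean against a known control value separates the two properties:

```python
import statistics

TRUE_TEMP = 60.0  # known control value, degrees Celsius

# Instrument A: precise but inaccurate (tight cluster, off-center)
a = [62.1, 62.0, 62.2, 62.1, 62.0]
# Instrument B: accurate but imprecise (scattered around the true value)
b = [59.0, 61.2, 60.3, 58.8, 60.7]

for name, readings in (("A", a), ("B", b)):
    accuracy_err = abs(statistics.mean(readings) - TRUE_TEMP)  # closeness to truth
    precision = statistics.stdev(readings)                     # spread (repeatability)
    print(name, round(accuracy_err, 2), round(precision, 2))
```

Instrument A's small spread looks impressive, but its mean is roughly two degrees off the control value; instrument B's noisy readings nevertheless average out to the truth, which is the "large sample size" effect described above.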

Significant Digits

The basic idea here is that you don't keep more digits than you measured. For example, when using a ruler, you write down a number which takes into account the smallest divisions on the stick (such as millimeters) and then estimate the next digit. Thus, for a ruler with millimeter graduations a possible measurement might be 12.3 mm. This last-digit approximation isn't usually applicable to hardware testing, as most measuring is digital rather than analogue, as with the ruler.

Also, when plugging raw numbers into equations or averaging data, the calculator will sometimes give more digits than are significant. The calculator may give the mean as 2.2222222... but you only measured three


significant digits in your data (2.21, 2.18, 2.34...), so the mean is correctly stated as 2.22.
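As a sketch of that rounding step (the `round_sig` helper and the readings are hypothetical, and a real data set would include more points than shown here), a calculated mean can be trimmed back to the number of significant digits actually measured:

```python
import math
import statistics

def round_sig(x, sig):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0.0
    return round(x, sig - int(math.floor(math.log10(abs(x)))) - 1)

readings = [2.21, 2.18, 2.34]        # each measured to three significant digits
raw_mean = statistics.mean(readings)  # calculator-style mean: 2.2433333...
print(round_sig(raw_mean, 3))         # keep only three significant digits: 2.24
```

The `log10` term locates the leading digit, so the same helper works for values of any magnitude (e.g. 0.012345 at two significant digits becomes 0.012).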

When a reviewer doesn't keep the correct number of significant digits (most often by increasing the number of digits), it can suggest that the data is more or less precise than it actually is.

Repeatability of Results

The repeatability of a test is the most telling gauge of how well an experiment has been done. In hardware testing, if the reviewer and a third party follow the guidelines on this page and get the same results, the test results are repeatable. This goes along with accuracy and precision: as more people repeat a test and their results land near the earlier ones, it demonstrates the experiment's precision. If these new results are also near the control value (the "bull's-eye"), the experiment is also accurate.

Sample Size

The more times a test is done under the correct conditions, the more likely it is that spurious results (small mistakes in each individual test) will not adversely affect the final averages (mean, median, mode, etc.). Each instance of a product/item should be tested multiple times, and it is preferable to have more than one of the items to perform tests on, as some examples may perform better or worse than others.
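As an illustrative sketch (the run times below are hypothetical), repeating a benchmark several times shows how a single spurious run skews the mean while the median stays put, which is exactly why multiple runs per item matter:

```python
import statistics

# Five repeated benchmark runs, frames per second; the last run is spurious
# (say, a background process kicked in mid-test).
runs = [60.1, 59.8, 60.0, 60.2, 72.5]

print(round(statistics.mean(runs), 2))  # 62.52 -- dragged up by the outlier
print(statistics.median(runs))          # 60.1  -- robust to the single bad run
```

With more clean runs in the sample, even the mean converges back toward the typical value, so a larger sample size dampens the effect of any one mistake.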

Standard Deviation & Standard Error

The standard deviation of a population is a measure of the spread of the data's values. Standard error is the standard deviation divided by the square root of the number of samples (it is sometimes expressed as a percentage of the mean). Thus, standard error takes into account the number of samples in the experiment (the more the better).
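Under those definitions, a minimal sketch of computing both statistics (the noise readings below are hypothetical) might look like:

```python
import math
import statistics

samples = [60.1, 59.8, 60.0, 60.2, 59.9]  # hypothetical noise readings, dB(A)

sd = statistics.stdev(samples)            # sample standard deviation
se = sd / math.sqrt(len(samples))         # standard error of the mean

print(round(sd, 3))  # 0.158
print(round(se, 3))  # 0.071
```

Because of the square-root-of-n divisor, the standard error shrinks as more samples are added, which is why reporting it rewards larger sample sizes.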

I'm oversimplifying, but a small standard deviation or standard error means that most of the values recorded during testing are within a small range of the arithmetic mean of the data. Basically, the smaller the standard deviation or standard error, the more the numbers are to be trusted; and if standard error is used in place of standard deviation, the reviewer is unashamed of the sample size of the testing.

Human Error

During an individual run of testing, small errors (e.g. the tester forgets to remove a variable) are to be expected. With a number of tests and peer review (repeatability of results), these human errors can be reduced or eliminated.

Bias

Bias, a form of human error, is perhaps the biggest problem in hardware reviewing. Many reviewers are given their review items for free and are lavished with praise and help from the companies being reviewed.

Also, test items may be hand-picked by the company supplying them to be high performers, which distorts results. (This assumes that the reviewer is a third party to begin with, and not paid by the manufacturer of the product.) The result is that reviews may be unfairly weighted toward companies with more money to spend and more products to give away to willing reviewers.