
RELIABILITY OF SENSORY ASSESSORS: ISSUES OF RETENTION AND LEARNING

JANNA BITNES1,2,3, ØYDIS UELAND1, PER MØLLER2 and MAGNI MARTENS1,2

1 Matforsk AS
Norwegian Food Research Institute
N-1430 Ås, Norway

2 University of Copenhagen
Faculty of Life Sciences
DK-1958 Frederiksberg C, Denmark

Accepted for Publication March 4, 2008

ABSTRACT

The objective of this study was to investigate whether the sensory performance of assessors in a sensory panel may be retained after years of retirement from sensory work. The approach was to register previous performance in basic tastes identification tests and to compare with their present performance in the same tests, as well as in judging complex foods. The results showed that the identified performance was within the response curve of the performance over several years. However, there was no clear relationship between the performances in identification and in judging foods. The retired assessors who had been away from sensory work for several years showed lower performance than the active assessors did, but with larger variability within the group. The retired assessors performed better than the active assessors concerning the evaluation of bitter.

PRACTICAL APPLICATIONS

The results from this study provide knowledge about how time without sensory training for a panelist might challenge the expertise obtained through many years as an assessor in a sensory panel. Our findings showed that the sensory panelists were reliable despite several years of retirement. However, this was dependent on individual differences, attribute, product and method in question. This will have practical application in the food industry, where sensory panels vary according to the needs and available resources of the companies. The results open the way for the use of sensory assessors at infrequent intervals, but at the same time, they remind us that the assessors’ sensory abilities might change for each product, attribute and sensory task.

3 Corresponding author. TEL: 47-90-60-23-12; FAX: 47-22-15-24-57; EMAIL: janna.bitnes@nortura.no

Journal of Sensory Studies 23 (2008) 852–870. All Rights Reserved. © 2008, The Author(s). Journal compilation © 2008, Wiley Periodicals, Inc.

INTRODUCTION

The main issue raised in the present study is the reliability of sensory assessors, with special focus on how intervals without practice affect sensory performance. Research questions addressed are: (1) the effect of several years of retirement from sensory work on performance; and (2) the ability to regain former performance level.

A sensory judgment is an outcome of a number of physical, physiological and psychological processes. Frijters (1993) distinguished between three processing stages in single stimulus judgment: first, a physical function transforms the physical stimulus into a sensation; second, the sensation is encoded and stored into working memory, where it may be accessed for judgmental processing; and third, the values of the coded sensations are transferred into a response. Cognitive factors mainly interfere during encoding and response formation.

Recent theories on memory describe separate explicit and implicit memory systems (Kéri 2003). Explicit memory includes conscious and controlled intentional processes, such as remembering a particular taste, whereas implicit memory refers to automatic, nonintentional, noncontrolled learning without conscious awareness. One might suggest that learning in a sensory panel concerns both processes. In Degel et al. (2001), implicit learning was also called “phenomenally unconscious” because subjects were not informed about the existence of the relations, let alone instructed to search for them, even if the test situation may require intentional retrieval in some cases. Furthermore, implicit memory, in contrast to implicit learning, refers to “situations in which effects of previous experiences can be observed despite the fact that the participants are not instructed to relate their current performance to a learning episode” (Buchner and Wippic 1998). Therefore, implicit memory is also described as phenomenally unconscious.

The detection and identification of food items, as well as raising expectations about their taste, is usually more concerned with memory about the product than with the product itself. Köster (2006) raised several questions on memory function in normal everyday life: “Does a stimulus never come to a tabula rasa, but always drops in on our expectations and can therefore not always be perceived the same way?” and “Does our brain remain at rest when all input is according to expectation?” In everyday life, people seldom learn intentionally. We learn incidentally, unintentionally and uncontrollably, and explicit retrieval is almost never demanded. Consequently, we have to train people extensively to make reliable sensory assessors. In this respect, it is of interest to investigate what happens with a retention interval over several years with absence of explicit learning tasks.

Köster (2006) summarized recent studies on incidental learning and memory for taste, texture and flavor, and showed that learning and memory were dependent on sensory attributes and products. For instance, incidental learning and memory were found to be better for bitterness and sourness, stimuli that represented a potential danger, than for, e.g., sweetness. In Møller et al. (2004), young people were found to be better than elderly people in intentional learning (but not in incidental learning). The authors suggested that this difference was not due to better encoding of the stimuli by the young, but to the fact that they were better at rejecting the distractors.

Routinely training a sensory panel has been found to maintain the reliability of sensory assessors. In a study by Bitnes et al. (2007a), training seemed to counterbalance the loss in basic taste sensitivity with ageing shown for one-stimulus solutions. In that study, the effect of ageing, experience and exposure on taste identification performance, measured as correctly identified sweet, salty, sour and bitter taste stimuli, was studied annually from 1976 to 2003. A corollary of this study would be to investigate what happens when an assessor has a break in the sensory training. Is it like riding a bike? Will the sensory abilities re-emerge and be retained? When is it too late? All these questions will be of practical importance in the food industry, where sensory assessors perform sensory analyses at infrequent intervals.

In the present study, we aimed to investigate the effect of several years of retirement from sensory work on performance, and the ability to regain former performance level.

MATERIALS AND METHODS

Subjects

The subjects investigated in the present study formed two groups: 12 active assessors and eight retired assessors. The active assessors served as a control group. All subjects were female.

Active Assessors. Twelve active assessors of the sensory panel at the Norwegian Food Research Institute, Matforsk AS, participated in the study. The institute has been the basis of many research and industrial programs, and the panel has regularly performed sensory evaluations of all types of food products. The members of the panel were selected and trained according to guidelines in the International Standard (ISO 1993). Sensory experience of the assessors varied between 1 and 18 years, while age of the assessors varied between 33 and 62 years (Table 1). The assessors met the following criteria: healthy, not taking any prescribed medicine, nonsmoking, not pregnant, not allergic to food and not wearing dentures.

Retired Assessors. Eight retired assessors of the sensory panel at Matforsk AS participated in the experimental group. The assessors ranged in age and in years since they last conducted sensory analysis (Table 1). Age varied between 63 and 80 years, and years since they last conducted sensory analysis (retention time) varied between 3 and 21 years. Prior to retirement, the assessors’ experience varied between 5 and 23 years. Four of the retired assessors used medication on a regular basis and two of them were smokers.

TABLE 1.
THE ACTIVE AND THE RETIRED ASSESSORS (A)

Group     A    Years of sensory   Age today   Retention (years since
               experience         (years)     last sensory test)
Active     1    6                 59           0
           2    1                 33           0
           3   14                 58           0
           4    2                 55           0
           5    2                 39           0
           6   10                 56           0
           7   16                 61           0
           8   18                 58           0
           9   10                 46           0
          10    2                 35           0
          11   16                 48           0
          12   14                 62           0
Retired   13    9                 76          11
          14    8                 68          19
          15    5                 63          21
          16   23                 69           3
          17   14                 79          12
          18    7                 75          19
          19    8                 80          18
          20   21                 67           2

Stimuli

The products evaluated increased in complexity from pure solutions of sucrose, sodium chloride, citric acid and caffeine in water, to the complex food tomato soup containing different levels of the same substances.

Simple Solutions of the Basic Tastes in Water. Sucrose, sodium chloride, citric acid and caffeine/quinine dissolved in water were each prepared in three different concentrations to provide the basic taste stimuli of sweet, salty, sour and bitter (ISO 1991). The solutions were 2, 4 and 6 g/L sucrose; 0.1, 0.3 and 0.6 g/L sodium chloride; 0.05, 0.10 and 0.15 g/L citric acid; and 0.00015, 0.0003 and 0.0005 g/L quinine hydrochloride, or 0.06, 0.14 and 0.27 g/L caffeine. Concentration ranges have regularly been evaluated by Matforsk AS and were considered adequate. From the year 1995, the samples containing quinine were replaced by caffeine because quinine was claimed to cause allergic reactions. In addition, one sample of tap water without any added compounds was presented in each session, giving a total of 13 different samples. Water for each of the tests was tapped at the same time of the test session and used both for making solutions and as water sample in the test. The tap water consisted of filtered water that had regularly been tested for off-flavors. The solutions were prepared the day before testing and kept refrigerated at 4 °C.

Complex Solutions. Mixtures of the basic tastes were presented in tomato soup. Five different concentrations of sucrose, sodium chloride, citric acid and caffeine were added to a basic recipe of tomato soup, giving 20 samples. The solutions were 0, 19, 30, 41 and 60 g/L sucrose; 0, 0.8, 1.9, 3.0 and 6.0 g/L sodium chloride; and 0, 0.35, 0.50, 0.65 and 0.90 g/L of citric acid and of caffeine, respectively. The tomato soup samples consisted of 20% tomato purée (Heinz, Pittsburgh, PA), water and the respective stimulus. The solutions were mixed, heated and chilled the day before testing and kept refrigerated at 4 °C.

Experiments

The assessors were tested for their ability in basic taste identification and judging foods. The methods used were an identification test of the basic tastes in water solution and a quantification test of the basic tastes in tomato soup.

Identification Test. The assessors identified the taste qualities sweet, salty, sour and bitter at near threshold levels in water. Samples (25 mL) were presented in tasteless plastic cups and served in the sensory laboratory (ISO 1988) at room temperature (20 ± 2 °C) in random order within each test session, and the order of presentation was identical for all subjects. There were three replicates of each sample randomly divided into three sessions, one served before and two served 1 h after lunch. Before the first session, the assessors received five reference samples of the strongest concentration of sucrose, citric acid, sodium chloride, caffeine and water. They were instructed to taste the samples and try to remember what they tasted like. Then they received 13 samples, each with a three-digit random number. Changing of judgments was not allowed, and, above all, they were specifically advised against trying to look for a serving order. The assessors responded with one of the five choices: sweet, salty, sour, bitter or water. The chance of guessing all samples correctly was very low (0.82 × 10⁻⁷%, i.e., a probability of 1 out of 5, 13 times in a row). The measure of the identification performance was therefore not adjusted with regard to the selection probability. Before the sessions and after each sample, the assessors rinsed with water and expectorated in order to clean the mouth.
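The guessing probability quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each of the 13 samples in a session is guessed independently and uniformly among the five response options; the variable names are illustrative and not taken from the study.

```python
from fractions import Fraction

N_CHOICES = 5    # sweet, salty, sour, bitter or water
N_SAMPLES = 13   # samples served in one session

# Probability of guessing every sample in a session correctly,
# assuming independent, uniformly random guesses (an assumption).
p_all_correct = Fraction(1, N_CHOICES) ** N_SAMPLES
percent_all_correct = float(p_all_correct) * 100

print(p_all_correct)                   # exact fraction
print(f"{percent_all_correct:.2e} %")  # the same value as a percentage
```

Using `Fraction` keeps the probability exact before it is converted to a percentage, which avoids rounding questions for such a small number.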

In addition to the identification experiment conducted in the present study, historical data of the same experiments from 1976 to 2003 have been investigated to study how the tasting abilities have endured over the years. In this period, the panel had a stable composition of 10–12 members, continuously replacing persons who retired or left the panel (Bitnes et al. 2007a).

Quantification Test. The assessors scored the taste intensities of sweet, salty, sour and bitter in tomato soup. The procedure was conducted with a continuous unstructured line scale anchored with the terms “no intensity” on the left and “high intensity” on the right (ISO 1985). Water and unsalted crackers were available for rinsing palates between tasting samples. Samples of tomato soup (25 mL) were presented in tasteless plastic cups and served in the sensory laboratory (ISO 1988) at room temperature (20 ± 2 °C) in a random order within each test session, and the order of presentation was identical for all subjects. The assessors concentrated on separate stimulus dimensions in each session. In four sessions over 2 days, they scored the intensities of sweet, salty, sour and bitter, respectively. Before lunch on the first day, they scored the intensity of sweet for five samples in two replicates randomized within two sessions. After lunch, they scored the intensity of salty for five samples in two replicates randomized within two sessions. On day 2, scoring of sour and bitter followed the same procedure. Before each session, they received two samples: one control without added stimulus, and one with the strongest concentration of the component. In the tasting session, the assessors scored the intensities of five samples, each marked with a three-digit random number.

The retired assessors registered their evaluations manually on paper forms as they were used to, while the active assessors registered their evaluations into a computer system (Compusense five Release 4.2 by Compusense Inc., Guelph, ON, Canada).


Data Analysis

Identification Test. Averages over three replicates and three concentration levels (low, medium and high concentrations plus water) for the five substances of sweet, salty, sour and bitter tastes plus water showed the identification ability of the basic tastes in water solution. In addition, the correct identification averages were collected under presumably identical conditions over the period the individual assessors were members of the panel at Matforsk. The resulting data for each assessor could therefore show the retention of the tasting abilities after different numbers of years’ absence.
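The averaging described above amounts to counting correct identifications over all presented samples. The following is a minimal sketch, assuming each trial is recorded as a (presented taste, response) pair; the function name and the toy trials are hypothetical and are not data from the study.

```python
from collections import defaultdict

def percent_correct(trials):
    """Return (overall percent correct, percent correct per presented taste).

    trials: iterable of (presented, response) pairs, e.g. ("sweet", "water").
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for presented, response in trials:
        totals[presented] += 1
        if presented == response:
            hits[presented] += 1
    per_taste = {t: 100.0 * hits[t] / totals[t] for t in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return overall, per_taste

# Made-up trials for illustration only.
example_trials = [
    ("sweet", "sweet"),
    ("sweet", "water"),
    ("bitter", "bitter"),
    ("water", "water"),
]
overall, per_taste = percent_correct(example_trials)
print(overall, per_taste)
```

In the study's design, one assessor's list would hold 39 trials per test (13 samples in each of three sessions).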

Quantification Test. The sensory data from the descriptive analysis of tomato soup were subjected to one-way analysis of variance (ANOVA) using Panel Check Version 1.2.1 (http://www.matforsk.no/panelcheck by Matforsk AS, Ås, Norway) and using Minitab Release 14.2 (copyright 1972–2005 by Minitab Inc., State College, PA). Panel Check is a free software package that utilizes simple graphical techniques to investigate the performance of sensory assessors and the validity of the sensory terms and replicates (Tomic et al. 2005). Two plots were studied to investigate the performance of the assessors in the present study: (1) the line plot visualizes the scores of each judge; and (2) the P*MSE plot visualizes ANOVA results in a graphical plot. Here, the assessors’ ability to detect differences between samples is plotted against their repeatability using the P values and MSE values from the ANOVA calculations. In the present study, only an ANOVA table (Table 2) shows the results from the Panel Check program. A t-test was performed in Minitab on the data to compare identification abilities before and after years of retention time.
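To make the F and MSE values concrete, the sketch below computes a one-way ANOVA for a single assessor and a single taste quality. It assumes that the factor is the soup sample (five concentration levels, two replicate scores each); the paper does not spell out the exact model used by Panel Check, and the scores below are invented for illustration.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA for a list of groups (one list of replicate scores per sample).

    Returns (F, MSE, df_between, df_within). F is NaN when there is no
    within-sample variation, in which case the ratio is undefined.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    mse = ss_within / df_within                 # repeatability: lower is better
    ms_between = ss_between / df_between        # discrimination between samples
    f_value = ms_between / mse if mse > 0 else float("nan")
    return f_value, mse, df_between, df_within

# Hypothetical intensity scores: five samples, two replicates each.
scores = [[1.0, 1.2], [3.0, 2.8], [5.1, 4.9], [7.0, 7.2], [8.9, 9.1]]
f_value, mse, df_b, df_w = one_way_anova(scores)
print(f_value, mse, df_b, df_w)
```

A large F combined with a small MSE corresponds to an assessor who both separates the samples and repeats her own scores. The P value would follow from the F distribution with the returned degrees of freedom, which is left out here to keep the sketch within the standard library.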

The sensory data from the descriptive analysis of tomato soup were also subjected to partial least squares regression (PLSR) using The Unscrambler Version 9.6 (CAMO ASA, Oslo, Norway, 1986–2004) to estimate the performance of the assessors and the validity of the sensory terms and replicates. ANOVA-PLSR (APLSR) was performed on the raw data from the descriptive test with four attributes and two replicates. The 0/1 design variable constituted the x-matrix while the sensory variables constituted the y-matrix. The assessor level and the replicate level differences were determined and removed. The mean-level differences between the assessors were typically caused by level variation in use of the sensory scales, and differences between replicates were typically caused by drift from one replicate to another because of changes in use of the sensory scales with time. A second APLSR was performed on the corrected data to investigate the remaining systematic structure in the data. The remaining variances caused by patterns of systematic differences between the samples, the signal (total variance before modeling), and the residual variance, the noise (the mean MSE for each assessor, cross-validated APLSR after modeling optimal number of principal components), were calculated (Martens et al. 1998; Thybo and Martens 2000; Kreutzmann et al. 2006; Bitnes et al. 2007b). The signal was then compared to the corresponding noise, where a higher signal than noise indicated a higher profiling performance.
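The level-correction step described above (removing assessor-level and replicate-level differences before the second APLSR) can be illustrated with plain mean-centering. This is a simplified sketch under stated assumptions, not the APLSR procedure of The Unscrambler: it assumes a balanced design and treats the level effects as simple mean offsets. The function name and the toy scores are hypothetical.

```python
from statistics import mean

def remove_level_effects(scores):
    """Subtract assessor and replicate mean offsets from every score.

    scores: dict mapping (assessor, replicate, sample) -> intensity score.
    Assumes a balanced design (every assessor scores every sample in
    every replicate).
    """
    grand = mean(scores.values())
    assessors = {a for a, _, _ in scores}
    replicates = {r for _, r, _ in scores}
    # Offset of each assessor's mean level from the grand mean (scale-use level).
    a_off = {a: mean(v for (aa, _, _), v in scores.items() if aa == a) - grand
             for a in assessors}
    # Offset of each replicate's mean level from the grand mean (drift over time).
    r_off = {r: mean(v for (_, rr, _), v in scores.items() if rr == r) - grand
             for r in replicates}
    return {key: v - a_off[key[0]] - r_off[key[1]] for key, v in scores.items()}

# Toy data: assessor "B" scores 2 units higher, replicate 2 drifts up by 1 unit.
base = [1.0, 5.0, 9.0]
raw = {(a, r, s): base[s] + (2.0 if a == "B" else 0.0) + (1.0 if r == 2 else 0.0)
       for a in ("A", "B") for r in (1, 2) for s in range(3)}
corrected = remove_level_effects(raw)
print(corrected[("A", 1, 0)], corrected[("B", 2, 0)])
```

After the correction, the two assessors and the two replicates agree on every sample in this toy example, so whatever variation remains reflects differences between samples (signal) or unsystematic scatter (noise).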

RESULTS

The assessors, both the active and the retired assessors, showed both high and low performance for the two different tests and for the different taste stimuli within the particular test.

Identification Test

Figure 1 summarizes percent correct identification of the basic tastes in water solution averaged across sessions 1, 2 and 3. The active assessors (in white, 1–12) showed higher identification abilities than the retired assessors (in black, 13–20). Mean correct identifications were 79.8% by the active assessors, and 66.3% by the retired assessors. While 11 out of 12 active assessors reached 75% correct identifications, only three out of seven of the retired assessors reached 75%. There were larger differences between performances by the individuals in the group of elderly subjects than in the group of young subjects.

TABLE 2.
ONE-WAY ANOVA SHOWING F, P AND MSE VALUE FOR FOUR TASTE QUALITIES FOR EACH ASSESSOR (A)

             Sweet                Salty                Sour                 Bitter
Group    A          F P     MSE          F P     MSE          F P     MSE          F P     MSE
Active   1  19,711.00 *** 0.001   2,190.44 *** 0.009      28.49 *** 0.650       1.91 NS  6.319
         2      11.41 **  1.490      25.45 **  0.892      18.96 **  0.428       2.94 NS  3.764
         3   2,451.75 *** 0.008      54.48 *** 0.186       3.94 NS  1.751       2.81 NS  2.867
         4   4,126.80 *** 0.005   5,216.00 *** 0.004       5.14 *   3.282       1.54 NS  7.521
         5     166.30 *** 0.114     174.08 *** 0.099       5.53 *   2.257       3.52 NS  3.709
         6     461.89 *** 0.031      81.27 *** 0.174       1.73 NS  3.049       1.52 NS  5.380
         7     118.85 *** 0.165      12.67 **  1.335       0.98 NS  6.283       4.02 NS  2.971
         8      16.13 **  1.119      56.69 *** 0.306       3.25 NS  3.499       4.31 NS  2.866
         9          – –       –          – –       –       2.92 NS  4.800       1.62 NS  6.464
        10      68.27 *** 0.266      56.78 *** 0.282      11.18 **  1.045       1.61 NS  6.967
        11     319.15 *** 0.065          – –       –      32.30 *** 0.462       2.40 NS  5.537
        12     453.52 *** 0.043   1,381.57 *** 0.015       0.98 NS  7.298       3.19 NS  3.842
Retired 13       8.50 *   2.084       0.79 NS  5.534       0.58 NS  7.210       2.61 NS  3.184
        14      13.76 **  0.616       1.66 NS  3.142       1.45 NS  3.557       1.15 NS  1.954
        15     250.92 *** 0.083      24.29 **  0.794       6.53 *   2.551      17.54 **  1.090
        16      23.21 **  0.835       8.62 *   1.633       6.37 *   1.998       1.89 NS  3.437
        17     257.06 *** 0.054      14.41 **  0.837      12.45 **  0.655      34.97 *** 0.323
        18       2.08 NS  3.140       1.40 NS  1.183       0.31 NS  2.887       5.76 *   1.369
        19       0.48 NS  9.836      26.34 *** 0.636       0.54 NS  9.290       0.96 NS  6.774
        20     422.32 *** 0.046       6.20 *   2.755       3.83 NS  3.294     115.23 *** 0.173

* P < 0.05, ** P < 0.01, *** P < 0.001. NS, not significant; –, missing due to no difference between the scores.

Figure 2 summarizes the average percent correct identifications for each identification test that each of the (1) active and (2) retired assessors has conducted during their years of sensory work in the panel. The data show large variations in number of tests and years of participation, age range of the individual assessor, and within the response curvature for each assessor. Mean percent correct identifications across all the years each assessor had participated in the test varied between 61.1% and 84.1% (one assessor at 32%), including all the assessors. The assessors who had a history of retirement prior to the last identification test did in some cases show a decline in the tasting abilities; for instance, see assessors 14, 15 and 17. However, mean performance after retirement was within the response curvature for each assessor. Only one assessor, assessor 15, showed a significantly lower performance after the retirement (t-test comparing mean values of performance before and after years of retirement, P < 0.01).

FIG. 1. PERCENT CORRECT IDENTIFICATIONS OF SWEET, SALTY, SOUR AND BITTER TASTES ADDED TO WATER SOLUTIONS PLUS WATER WITHOUT ANY ADDED STIMULUS FOR ACTIVE (WHITE BARS, 1–12) AND RETIRED (BLACK BARS, 13–20) ASSESSORS

Panels: Sweet, Salty, Sour, Bitter, Water. Vertical axis: percent correct identifications (0–100). Group means: active 79.8%, retired 66.3%. The results from assessor 5 are from 6 months earlier. No results from assessor 19 from that time.

FIG. 2. PERCENT CORRECT IDENTIFICATIONS OF THE BASIC TASTES

(A) Active (white dots, 1–12) and (B) retired (black dots, 13–20) assessors shown for each test during the years of sensory work. Broken line and grey dots indicate the tests where the retired assessors were tested together with the active assessors after several years of retirement from sensory work. One panel per assessor (A01–A20), each annotated with the assessor's mean. Horizontal axis: test number. Vertical axis: percent correct identifications (0–100).

Quantification Test

A better performance by the active assessors was shown by small differences between the replicates in Fig. 3, and by a higher signal/noise ratio and variables lying further away from the target line (signal = noise) than for the retired assessors in Fig. 4. The latter means that the signal is larger than the noise for the active assessors. They had a better ability to discriminate between different products.

Comparing the active assessors and the retired assessors in Fig. 3 shows differences in repeatability dependent on taste qualities. The retired assessors seemed to have most difficulties judging the salty and sour taste qualities, while the active assessors had most difficulties judging the bitter and sour taste qualities. The graphs in Fig. 3 also show that the variation among the retired assessors was much larger than for the active assessors. In Fig. 4, some of the retired assessors had a higher noise than signal (14, 18). However, looking at the retired assessors separately, some of them have the same tasting abilities as the active assessors (15, 17, 20).

ANOVA confirmed that the active assessors performed better than the retired assessors (Table 2), shown by higher F values and a larger number of significant variables for the active assessors compared with the retired assessors.

FIG. 3. SENSORY PERFORMANCE IN JUDGING TOMATO SOUP, BY ACTIVE (WHITE BARS) AND RETIRED (BLACK BARS) ASSESSORS COMPARING INTENSITY SCORES FOR THE SWEET, SALTY, SOUR AND BITTER TASTANTS, REPLICATE 1 AND 2

Panels: Sweet, Salty, Sour, Bitter. Horizontal axis: concentration (Level 1 to Level 5). Vertical axis: taste intensity (1–9).

Differences in the sensory performances were dependent on the taste qualities. Comparing active and retired assessors in Fig. 3 shows that the assessors vary in their ability to repeat their judgments (SDs) and in their ability to score the intensities and rank the samples. One tendency was that the retired assessors perceived the lowest concentration levels as more intense, and the highest concentration levels as less intense, than the active assessors did. All assessors display large standard deviations for the sour taste, but only a few of the retired assessors have large SDs for sweet taste. The active assessors show a better ability to rank the salty samples than the retired assessors do. Here, the standard deviation is smaller for the active assessors. However, the retired assessors show a better ability to rank the bitter samples.

Figure 5 also shows that differences in each assessor’s sensory abilities depend on the taste quality in question. Sweet seems to be the taste quality easiest to perceive and give intensity scores, while sour seems to be the hardest one. Large differences between active and retired assessors were evident. On the one hand, the active assessors achieved a much higher signal/noise score for salty than the retired assessors, while on the other hand, the retired assessors achieved higher scores for the bitter taste. Furthermore, large differences in the performances of the assessors can be observed. Assessors who performed well for one stimulus were not always successful for others. However, three retired assessors (13, 14 and 18) performed worse than the others for all the taste qualities, and three others (15, 17 and 20) performed as well as the active assessors.

FIG. 4. SENSORY PERFORMANCE IN JUDGING TOMATO SOUP, BY ACTIVE (WHITE DOTS, 1–12) AND RETIRED (BLACK DOTS, 13–20) ASSESSORS COMPARING SIGNAL TO NOISE

Axes: signal and noise (0–10). Points are labeled by assessor number (1–20) and “All”.

The variance analyses, ANOVA (Table 2), showed that especially bitter taste was difficult to evaluate for the active assessors. None of them could discriminate between the samples significantly. Sour showed many assessors with no significant values.

Identification Test Versus Quantification Test

Some relationships could be seen comparing Fig. 1, showing the ability to identify the basic tastes in water solutions, with Table 2 and Figs. 3–5, showing the sensory performance judging tomato soup. Two retired assessors (13, 14) who were among the assessors with the lowest signal/noise ratio, i.e., who had the lowest judging ability for foods, also had a decline (although not significant) in the identification ability for the basic tastes after the retention time. Only sweet taste of tomato soup was significant for these two assessors. One other retired assessor (18) who was less able in judging the food product showed a high performance in the identification task, and others who performed excellently in judging the food product performed less well identifying the basic tastes (15, 17). Other assessors (16, 20) performed well at both tests. However, Fig. 2 shows that one of the assessors (19) who had very low ability in judging the food product also had a very low performance in the identification task in general.

FIG. 5. SENSORY PERFORMANCE IN JUDGING TOMATO SOUP, BY ACTIVE (WHITE DOTS, 1–12) AND RETIRED (BLACK DOTS, 13–20) ASSESSORS COMPARING SIGNAL TO NOISE FOR THE SWEET, SALTY, SOUR AND BITTER TASTANTS

Panels: Sweet, Salty, Sour, Bitter. Axes: signal and noise (0–10). Points are labeled by assessor number (1–20) and “All”.

DISCUSSION

The objective of this study was to investigate whether sensory performance may be retained after years of retirement from sensory work by looking at performances, previous and present, in an identification test, and to compare these with the present performance in judging complex foods.

In the present study, the assessors’ performances were within the maximum and minimum values of their curve, even though some of the performances showed a decrease after several years of retirement from sensory work. However, only one assessor showed a significant decrease in performance after years of retirement. Perhaps coming back to work after so many years is like “a bad day at work” (some initial difficulties are expected in any job).

Another study that refers to the same identification test as the present paper is that of Bitnes et al. (2007a). In that study, the performance of the sensory assessors was reliable despite their ageing. Ageing did not explain any significant decrease in the number of correct identifications. The study showed that experience and exposure could counterbalance the performance impairment generally observed with ageing.

In the food industry, sensory panels vary according to the needs and available resources of the companies, ranging from full-time employed panelists to company workers who are called to participate at infrequent intervals. A panel leader might expect training to involve from 10 h for a relatively simple product (Stone and Sidel 2004) to 120 h or more for a more complex product (Meilgaard et al. 1999). This is almost never achievable in food companies. However, Chambers et al. (2004) suggested that only limited training might be necessary to find differences among products.

In the present study, the retired assessors showed a lower performance than the active assessors did, but some of the retired assessors performed as well as the active assessors. Our expectation was that the performance would be lower for all the retired assessors, because quantification of taste intensity requires extensive training prior to the test, especially since the retired assessors had been away from sensory work for several years, six of them for over 10 years.


According to Murray et al. (2001), product-specific training improves the performance of descriptive panels to a higher degree than training with external standards. The active assessors in the present study did not have any product-specific training prior to the evaluation of tomato soup. The idea was that they should generalize their prior knowledge of the attributes and their definitions to this “new” product. The retired assessors should also generalize their prior knowledge from many years ago before they retired. In this setting, the assessors are at a somewhat equal level: neither group had product-specific training prior to the evaluation of tomato soup. Schifferstein (1996) claimed that the learning effect in discrimination and identification testing probably evolved from the fact that subjects gradually learn to pay attention to a specific part of a stimulus. In the present case, where the assessors have been trained to focus attention on a stimulus percept, they might also unconsciously focus attention in the same way even though they have retired from sensory work.

Another surprising result in the present study was that the retired assessors showed higher performances for bitter compared with the active assessors. Other studies show that incidental learning and memory are better for bitterness and sourness than, e.g., sweetness (Köster et al. 2004a,b; Köster 2006). Explanations proposed by the authors suggest that taste memory is tuned to detect novel and potentially dangerous stimuli rather than to remember features of earlier experienced stimuli with great precision. One might hypothesize that the retired assessors of the present study are less concerned with the sensory task, leading them to depend more on their intuition than the active assessors. In a preliminary study by Köster et al. (2004b), elderly subjects performed as well as the young, and in some cases better, when investigating incidental learning with well-known foods where the purpose of the test was hidden. Intentional learning, where the subjects were confronted directly with the task, was believed to be more prominent in early stages of life, e.g., school and university periods, and could therefore explain the poor performance by the elderly. This finding can be related to the present finding, showing the reliability of sensory assessors despite ageing and retention period, and the fact that they are relatively unfamiliar with the testing situation. In the present study, the assessors are not asked to remember sensory attributes, either in an incidental or in an intentional task, but their implicit memory may play a part in their judgments of them.

Studies show that specific product knowledge is more favorable than knowledge about the attributes or the method (Murray et al. 2001; Bitnes et al. 2007b). These findings contribute to the explanation that sensory learning evolves from verbal learning and focus of attention rather than perceptual learning. One explanation proposed by Hughson and Boakes (2002) is the use of the vocabulary and information about the product varieties that experts employ in such tasks. One might therefore question whether the high performances of some of the retired assessors in the present study were a result of the development of perceptual abilities, or of an ongoing focus on sensory features owing to the fact that they once were assessors.

The present study could not find a systematic relationship between sensory performance in identifying single stimulus solutions and judging foods. This finding is in accordance with current literature (Rousset-Akrim et al. 1995; Meilgaard et al. 1999; Stone and Sidel 2004; Bitnes et al. in press), where there is no meaningful relationship between sensitivity to aqueous solutions of sweet, salty, sour and bitter and performance in the evaluation of complex foods. However, Lesschaeve and Issanchou (1996) observed some significant relationships between the screening selection procedure and odor profiling performance. The olfactory memory, estimated through a Recognition Memory for Odor test, had a strong influence on the discrimination power of the panelists.

The main issue raised in the present study was the reliability of sensory assessors. The present study confirms previous findings showing that specifically selected and trained assessors are relatively reliable. The assessors in the present study retained much of the expertise they had prior to retirement. With statistical tools to remove variation caused by level differences, e.g., in the use of the sensory scales, one might get relevant results that show the assessors’ ability to discriminate between the products and not their ability to use the scale properly.
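One simple way to remove level differences can be sketched as follows. This is only an assumed illustration (centering each assessor's scores on that assessor's own mean), not necessarily the statistical tool the authors have in mind; the function name, assessor labels and scores are hypothetical.

```python
from statistics import mean


def remove_level_differences(scores):
    """Center each assessor's scores on that assessor's own mean.

    scores maps an assessor label to a dict of product -> intensity score.
    """
    centered = {}
    for assessor, by_product in scores.items():
        level = mean(by_product.values())  # the assessor's personal scale level
        centered[assessor] = {p: s - level for p, s in by_product.items()}
    return centered


# Hypothetical scores: assessor_2 uses a higher part of the scale than
# assessor_1 but separates the three products in exactly the same way.
scores = {
    "assessor_1": {"soup_A": 2.0, "soup_B": 4.0, "soup_C": 6.0},
    "assessor_2": {"soup_A": 5.0, "soup_B": 7.0, "soup_C": 9.0},
}
centered = remove_level_differences(scores)
```

After centering, the two hypothetical assessors give identical profiles, so what remains reflects how they rank and space the products rather than where on the scale they place them.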

However, as in all sensory work, performance is dependent on the individual assessor, the product, the attribute and the method in question. The profiling of tomato soup in the present study might not be at all typical of the assessors’ sensory performance in judging foods. Further research is needed to confirm the findings of this study.

REFERENCES

BITNES, J., MARTENS, H., UELAND, Ø. and MARTENS, M. 2007a. Longitudinal study of taste identification of sensory panellists: Effect of ageing, experience and exposure. Food Qual. Prefer. 18, 230–241.

BITNES, J., RØDBOTTEN, M., LEA, P., UELAND, Ø. and MARTENS, M. 2007b. Effect of product knowledge on profiling performance comparing various sensory laboratories. J. Sensory Studies 22, 66–80.

BITNES, J., UELAND, Ø., MØLLER, P. and MARTENS, M. Reliability of sensory assessors: Issues of complexity. J. Sensory Studies. In press.

BUCHNER, A. and WIPPIC, W. 1998. Differences and commonalties between implicit learning and implicit memory. In Handbook of Implicit Learning (M.A. Stadler and P.A. Frensch, eds.) pp. 3–46, Sage Publications, Thousand Oaks, CA.


CHAMBERS, D.H., ALLISON, A.A. and CHAMBERS, E. 2004. Training effects on performance of descriptive panelists. J. Sensory Studies 19, 486–499.

DEGEL, J., PIPER, D. and KÖSTER, E.P. 2001. Implicit learning and implicit memory for odors: The influence of odor identification and retention time. Chem. Senses 26, 267–280.

FRIJTERS, J.E.R. 1993. Functional measurement in the study of mixture percepts. Chem. Senses 18, 93–100.

HUGHSON, A.L. and BOAKES, R.A. 2002. The knowing nose: The role of knowledge in wine expertise. Food Qual. Prefer. 13, 463–472.

ISO. 1985. Sensory analysis: Methodology. Flavour profile methods. In ISO-standard 6564, The International Organization for Standardization, Geneva, Switzerland.

ISO. 1988. Sensory analysis: General guidance for the design of test rooms. In ISO-standard 8589, The International Organization for Standardization, Geneva, Switzerland.

ISO. 1991. Sensory analysis: Methodology. Method for investigating sensitivity of taste. In ISO-standard 3972, The International Organization for Standardization, Geneva, Switzerland.

ISO. 1993. Sensory analysis: Methodology. General guidance for the selection, training and monitoring of assessors, Part 1: Selected assessors. In ISO-standard 8586-1, The International Organization for Standardization, Geneva, Switzerland.

KÉRI, S. 2003. The cognitive neuroscience of category learning. Brain Res. Rev. 43, 85–109.

KÖSTER, E.P. 2006. Memory for food and food expectations: A special case? Food Qual. Prefer. 17, 3–5.

KÖSTER, M.A., PRESCOTT, J. and KÖSTER, E.P. 2004a. Incidental learning and memory for three basic tastes in food. Chem. Senses 29, 441–453.

KÖSTER, E.P., MØLLER, P. and MARTENS, M. 2004b. The role of memory in the food experiences of young and elderly persons. In Healthy Ageing pp. 64–72, EU Fifth Framework Programme QLK1-CT-1999-00010, Final report, WP4.

KREUTZMANN, S., THYBO, A.K. and BREDIE, W.L.P. 2006. Training of a sensory panel and profiling of winter hardy and coloured carrot genotypes. Food Qual. Prefer. 18, 482–489.

LESSCHAEVE, I. and ISSANCHOU, S. 1996. Could selection tests detect the future performance of descriptive panellists? Food Qual. Prefer. 7, 177–183.

MARTENS, H., BREDIE, W., BYRNE, D., BØKNÆS, N., HYLDIG, G., JACOBSEN, C., JENSEN, H.S., JØRGENSEN, B., THYBO, A.K., WARM, K. ET AL. 1998. Analysis of sensory assessors by bilinear modelling. Proceedings of the Fourth Sensometrics Meeting, p. 136, August 6–8, Copenhagen, Denmark.

MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Selection and training of panel members. In Sensory Evaluation Techniques (M. Meilgaard, G.V. Civille and B.T. Carr, eds.) 142 pp., CRC Press, FL.

MØLLER, P., WULFF, C. and KÖSTER, E.P. 2004. Do age differences in odour memory depend on differences in verbal memory? Neuroreport 15, 915–917.

MURRAY, J.M., DELAHUNTY, C.M. and BAXTER, I.A. 2001. Descriptive sensory analysis: Past, present and future. Food Res. Intern. 34(6), 461–471.

ROUSSET-AKRIM, S., MARTIN, J., PILANDON, C. and TOURAILLE, C. 1995. Research of selective tests for discerning “an efficient assessor” in texture profiling. J. Sensory Studies 10, 217–237.

SCHIFFERSTEIN, H.N.J. 1996. Cognitive factors affecting taste intensity judgments. Food Qual. Prefer. 7, 167–175.

STONE, H. and SIDEL, J.L. 2004. The organization and operation of a sensory evaluation program. In Sensory Evaluation Practices (H. Stone and J.L. Sidel, eds.) pp. 21–67, Food Science and Technology: International series, Elsevier Academic Press, London, UK.

THYBO, A.K. and MARTENS, H. 2000. Analysis of sensory assessors in profiling of potatoes by multivariate modelling. Food Qual. Prefer. 11, 283–288.

TOMIC, O., NILSEN, A.N., MARTENS, M. and NÆS, T. 2005. Visualization of sensory profiling data for performance monitoring. Lebenson. Wiss. Technol. 40, 262–269.
