Bayesian Unidimensional IRT Models: Graded Response Model

Overview

This SAS Web Example demonstrates how to fit graded response models by using the MCMC procedure. The graded response model is used to model ordered polytomous data. The "Analysis" section presents a brief mathematical description of the model. The "Example" section analyzes an instrument by using the MCMC procedure.

Initially, the PROC MCMC model specification is written with prior knowledge of both the number of items and the number of categories per item. This prior knowledge is hard-coded into the PROC MCMC model specification. In other words, the program can be used only for instruments with a specific number of items and for items with a specific number of categories. The purpose of the initial example is to illustrate the basic anatomy of a graded response model as specified in PROC MCMC.

The example is then extended to demonstrate how you can use the SAS macro language to generalize the PROC MCMC model specification so that you can reuse your SAS program for instruments that contain any number of items and any number of categories per item. As a result, what begins as a lengthy model specification is reduced to just a few lines of SAS code.

The SAS source code for this example is available as an attachment in a text file. In Adobe Acrobat, right-click the icon and select Save Embedded File to Disk. You can also double-click the icon to open the file immediately.

Analysis

In unidimensional item response theory (IRT) models, an instrument (test) consists of a number of items (questions) that require responses that are to be chosen from a predetermined number of categories (options). The purpose of the instrument is to measure a single latent trait of the test subjects. The latent trait is assumed to be measurable and to have a range that encompasses the real line. An individual's location within this range, $\theta$, is assumed to be a continuous random variable.

When there are only two response categories, you can use binary response models to analyze the data. See the web example "Bayesian IRT Models: Unidimensional Binary Models" for a discussion of these models and how to implement them by using PROC MCMC. When there are more than two categories and the categories are ordered, meaning that some responses indicate more (or less) of the latent trait being measured, you can use an extension of the binary models known as a graded response model to analyze the data.¹ The purpose of the graded response model is to enable you to estimate the probability that a test subject will choose a particular response for each item, to

¹ There are other models for polytomous data besides the graded response model, such as the partial credit model and the generalized partial credit model.

Bayesian Unidimensional IRT Models: Graded Response Model · 2016-06-24



data graded;
   input person item1 item2 item3 @@;
   datalines;
1 3 2 2 2 1 1 2 3 2 2 3 4 2 2 4 5 2 2 2 6 3 2 3 7 3 2 2 8 3 2 2
9 3 2 2 10 1 2 2 11 2 3 3 12 2 2 2 13 2 1 2 14 2 2 2 15 2 2 3 16 2 3 3
17 3 2 3 18 2 1 2 19 1 2 2 20 2 1 3 21 2 3 3 22 1 1 2 23 2 1 2 24 3 2 3
25 2 2 3 26 2 2 3 27 3 2 2 28 3 2 4 29 2 3 2 30 2 1 2 31 3 2 3 32 3 2 3
33 2 2 2 34 1 1 2 35 2 2 3 36 2 3 3 37 2 2 4 38 1 1 1 39 3 3 3 40 2 2 4
41 3 1 2 42 3 4 3 43 3 3 3 44 1 2 2 45 3 3 3 46 3 2 3 47 1 1 2 48 3 2 3
49 2 3 3 50 2 2 2 51 2 3 3 52 2 2 2 53 2 1 2 54 2 4 3 55 3 2 2 56 2 2 3
57 2 3 3 58 3 2 2 59 2 2 2 60 2 2 3 61 2 2 3 62 1 2 2 63 2 1 2 64 1 1 2
65 2 2 3 66 1 2 2 67 2 3 3 68 1 3 2 69 3 4 4 70 2 3 2 71 1 1 1 72 2 2 2
73 3 3 3 74 3 1 2 75 2 2 2 76 2 1 2 77 3 3 5 78 3 2 3 79 2 3 4 80 2 3 3
81 3 3 2 82 3 2 2 83 3 2 3 84 1 2 3 85 1 2 1 86 2 1 2 87 2 2 2 88 3 2 3
89 3 4 3 90 2 3 3 91 3 3 2 92 3 2 3 93 3 2 3 94 2 3 3 95 3 3 2 96 3 1 3
97 2 1 2 98 1 1 2 99 2 3 3 100 2 2 4 101 3 2 3 102 2 2 2 103 3 2 3 104 1 2 2
105 1 3 3 106 3 2 3 107 2 2 3 108 3 2 2 109 2 2 2 110 2 1 2 111 2 2 2 112 3 2 3
113 2 1 2 114 2 1 2 115 2 2 2 116 1 2 2 117 3 2 3 118 2 2 3 119 2 2 2 120 3 2 3
121 1 3 3 122 3 3 4 123 1 2 2 124 1 2 1 125 2 2 4 126 2 2 3 127 2 1 3 128 1 2 3
129 1 2 2 130 2 3 3 131 2 2 3 132 2 2 3 133 3 2 2 134 3 3 3 135 2 2 2 136 3 2 2
137 2 2 3 138 2 1 2 139 2 1 2 140 3 2 3 141 2 1 2 142 2 2 3 143 2 2 2 144 2 2 2
145 2 2 2 146 1 2 3 147 3 2 3 148 2 2 2 149 2 2 2 150 3 2 3 151 2 1 3 152 2 2 2
153 2 2 2 154 3 3 3 155 2 2 2 156 3 3 3 157 2 2 2 158 2 2 2 159 3 3 4 160 2 2 2
161 2 2 3 162 3 2 3 163 3 3 3 164 2 2 3 165 2 2 2 166 2 4 3 167 3 2 3 168 1 2 2
169 2 3 3 170 3 2 2 171 2 2 3 172 2 2 2 173 2 1 2 174 3 2 3 175 2 3 3 176 3 2 3
177 2 2 3 178 3 2 3 179 1 2 2 180 2 2 2 181 3 3 3 182 2 2 2 183 2 1 2 184 3 2 3
185 3 2 3 186 2 2 3 187 3 2 3 188 2 1 2 189 2 3 2 190 2 2 2 191 1 1 1 192 3 2 3
193 2 2 3 194 2 2 2 195 2 2 2 196 2 4 3 197 2 2 2 198 2 2 3 199 2 1 2 200 2 3 2
201 2 2 3 202 3 1 2 203 2 2 2 204 3 3 3 205 3 2 2 206 2 2 3 207 2 2 3 208 3 3 3
209 2 2 3 210 2 2 3 211 2 2 2 212 2 2 3 213 2 2 3 214 1 1 2 215 1 1 2 216 2 2 3
217 2 2 2 218 2 2 2 219 3 2 3 220 2 1 2 221 3 3 4 222 3 2 3 223 3 2 3 224 2 3 3
225 2 2 2 226 1 1 2 227 3 1 3 228 1 2 2 229 2 2 2 230 1 2 2 231 3 3 4 232 3 2 2
233 1 2 2 234 2 2 2 235 2 2 3 236 2 2 2 237 1 2 2 238 2 2 2 239 3 2 3 240 2 2 2
241 2 2 2 242 3 3 4 243 2 2 2 244 2 2 3 245 3 1 2 246 3 3 3 247 2 2 2 248 2 2 3
249 3 1 2 250 1 2 2 251 3 3 3 252 1 1 2 253 2 2 2 254 2 2 3 255 3 3 3 256 2 2 3
257 3 2 3 258 2 3 3 259 3 2 3 260 3 3 3 261 3 2 3 262 2 3 2 263 2 1 2 264 1 2 2
265 2 2 2 266 3 3 3 267 3 3 3 268 2 2 3 269 3 2 3 270 2 3 2 271 1 2 1 272 2 2 2
273 3 2 3 274 2 2 3 275 3 2 2 276 2 2 3 277 2 2 3 278 2 2 2 279 3 2 2 280 2 3 3
281 2 3 4 282 1 1 2 283 2 1 2 284 2 2 3 285 2 2 3 286 3 2 2 287 2 2 3 288 2 2 3
289 2 2 4 290 1 3 4 291 2 1 2 292 2 1 3 293 2 2 2 294 2 3 3 295 2 2 2 296 3 2 2
297 2 1 2 298 1 2 3 299 3 3 4 300 3 3 3 301 3 2 3 302 3 2 2 303 1 2 2 304 2 2 2
305 2 2 3 306 2 3 3 307 1 2 3 308 3 3 3 309 3 2 3 310 2 2 2 311 3 2 2 312 3 4 3
313 1 3 3 314 2 3 3 315 2 2 3 316 3 3 3 317 3 1 2 318 2 2 2 319 2 3 3 320 2 2 3
321 2 3 2 322 2 2 3 323 3 3 3 324 1 3 3 325 2 1 1 326 2 2 2 327 3 1 2 328 2 3 3
329 3 2 3 330 1 3 3 331 3 2 3 332 3 2 4 333 3 3 4 334 1 2 3 335 1 1 2 336 2 1 2
337 3 3 4 338 1 2 2 339 2 3 3 340 3 3 3 341 2 2 2 342 3 2 3 343 1 1 2 344 2 2 3
345 2 1 2 346 2 2 3 347 3 3 3 348 2 2 3 349 1 2 2 350 2 2 3 351 3 2 2 352 2 1 2
353 1 2 3 354 2 2 3 355 2 1 2 356 1 1 1 357 2 2 3 358 2 2 2 359 2 2 3 360 2 1 2
361 3 2 2 362 2 2 3 363 3 4 5 364 2 3 3 365 2 3 2 366 1 3 2 367 3 3 3 368 2 1 2
369 2 2 3 370 3 2 2 371 1 2 2 372 2 3 3 373 1 2 2 374 1 2 2 375 2 2 2 376 2 1 1
377 1 3 3 378 3 3 3 379 2 2 2 380 2 1 2 381 1 2 2 382 2 1 2 383 3 2 3 384 2 2 2
385 2 2 2 386 3 1 3 387 2 2 2 388 2 3 3 389 3 2 3 390 2 2 3 391 3 2 3 392 2 1 2
393 1 2 2 394 2 2 2 395 3 3 4 396 3 3 2 397 2 1 1 398 2 2 3 399 2 3 3 400 3 2 3
401 2 2 2 402 2 2 2 403 2 1 2 404 2 2 3 405 1 2 2 406 3 1 3 407 1 1 2 408 3 2 3
409 1 2 2 410 2 2 2 411 3 2 3 412 2 1 2 413 3 2 3 414 2 2 3 415 3 2 3 416 1 2 2
417 2 2 3 418 3 3 3 419 2 1 2 420 1 2 2 421 2 2 3 422 1 1 2 423 3 2 3 424 2 3 3
425 2 3 3 426 2 2 3 427 2 2 2 428 2 2 2 429 2 3 2 430 2 3 3 431 1 1 2 432 2 3 2
433 1 1 2 434 2 1 2 435 2 2 2 436 2 3 3 437 3 2 3 438 1 3 3 439 1 2 2 440 2 3 3
441 2 3 3 442 3 3 3 443 2 1 2 444 2 1 1 445 2 2 2 446 2 2 2 447 3 3 2 448 3 2 3
449 2 1 2 450 2 2 1 451 2 3 4 452 2 2 2 453 2 3 3 454 3 2 3 455 3 3 3 456 3 2 3
457 1 1 1 458 2 2 3 459 2 1 2 460 1 1 2 461 2 1 2 462 2 3 3 463 2 2 2 464 3 2 2
465 3 1 2 466 2 1 2 467 2 1 2 468 3 3 3 469 3 2 3 470 2 2 3 471 1 1 2 472 3 2 2
473 1 1 2 474 3 3 3 475 3 2 3 476 3 3 3 477 2 2 2 478 2 2 2 479 3 2 3 480 1 1 1
481 3 2 3 482 2 2 3 483 2 3 3 484 3 2 3 485 3 3 3 486 2 2 3 487 2 2 2 488 3 2 2
489 2 3 2 490 1 2 2 491 2 2 2 492 3 2 2 493 2 2 2 494 2 2 2 495 2 2 2 496 3 2 3
497 3 3 3 498 3 1 2 499 2 1 2 500 3 1 2 501 3 3 4 502 3 3 3 503 2 1 3 504 1 1 2
505 2 3 4 506 2 2 3 507 3 1 2 508 2 1 2 509 2 1 1 510 2 2 3 511 3 3 3 512 3 3 3
513 2 2 3 514 1 2 3 515 3 3 3 516 2 3 3 517 3 2 2 518 2 2 2 519 2 1 3 520 3 2 3
521 3 2 3 522 3 2 2 523 2 1 3 524 3 3 3 525 2 1 2 526 3 3 4 527 3 2 3 528 2 2 2
529 2 1 2 530 1 1 2 531 3 2 3 532 2 2 2 533 2 2 2 534 1 2 2 535 3 3 3 536 1 2 3
537 3 3 4 538 2 1 2 539 2 2 2 540 2 3 5 541 2 2 2 542 2 2 4 543 3 2 3 544 3 1 1
545 2 1 2 546 2 2 3 547 3 1 2 548 3 2 3 549 2 1 2 550 3 2 3 551 2 2 2 552 2 3 3
553 3 3 4 554 1 2 2 555 2 2 2 556 2 2 2 557 3 2 2 558 1 2 2 559 2 2 2 560 3 3 3
561 1 2 2 562 3 2 3 563 2 2 2 564 2 1 2 565 2 3 4 566 2 1 2 567 2 3 3 568 2 2 3
569 3 2 3 570 1 2 2 571 2 2 3 572 2 2 3 573 2 2 2 574 2 3 3 575 2 2 2 576 2 2 2
577 2 1 2 578 3 2 3 579 3 3 3 580 3 3 2 581 2 2 2 582 3 2 3 583 2 2 2 584 2 1 1
585 3 3 3 586 3 3 3 587 2 2 3 588 2 1 2 589 3 1 3 590 2 2 3 591 3 2 3 592 3 3 3
593 2 2 2 594 2 1 3 595 2 1 3 596 2 2 2 597 3 2 3 598 3 3 5 599 2 1 2 600 1 1 2
601 2 1 2 602 3 3 3 603 1 1 2 604 2 4 4 605 3 2 3 606 2 3 3 607 1 2 3 608 2 2 2
609 3 2 3 610 1 1 2 611 2 2 4 612 3 3 3 613 2 2 2 614 2 2 2 615 1 2 1 616 3 2 2
617 2 2 3 618 3 2 3 619 2 1 2 620 3 1 2 621 2 2 2 622 1 2 3 623 3 3 4 624 1 1 2
625 3 2 2 626 2 2 3 627 2 2 3 628 2 2 2 629 2 1 2 630 3 3 4 631 1 1 2 632 2 3 3
633 2 2 3 634 1 2 3 635 2 3 3 636 3 1 2 637 3 2 2 638 1 2 2 639 2 2 2 640 2 1 2
641 3 2 3 642 3 2 2 643 2 2 3 644 1 1 2 645 2 2 3 646 2 2 3 647 2 2 2 648 2 2 3
649 2 3 3 650 2 2 3 651 2 2 2 652 2 2 2 653 1 1 2 654 3 1 2 655 3 3 3 656 2 1 3
657 2 2 3 658 1 2 2 659 2 2 2 660 1 1 2 661 3 2 3 662 2 3 3 663 2 3 3 664 2 2 3
665 2 2 3 666 3 3 3 667 1 3 2 668 3 2 3 669 2 2 3 670 2 3 2 671 3 2 3 672 1 2 2
673 3 2 2 674 2 1 2 675 2 2 2 676 1 1 2 677 3 2 3 678 2 1 1 679 1 1 2 680 1 1 2
681 2 2 3 682 2 2 4 683 3 2 2 684 1 1 2 685 2 1 2 686 3 2 3 687 3 2 3 688 3 3 2
689 2 2 2 690 3 2 3 691 3 3 3 692 1 2 2 693 2 2 2 694 2 2 2 695 2 2 2 696 1 2 3
697 2 2 3 698 3 2 2 699 3 2 2 700 2 2 3 701 3 1 2 702 2 2 2 703 2 2 3 704 3 2 2
705 2 2 3 706 2 3 3 707 1 3 3 708 2 2 3 709 2 3 3 710 2 2 3 711 1 1 2 712 1 1 2
713 1 1 2 714 1 1 1 715 1 2 2 716 2 2 2 717 2 3 3 718 1 2 3 719 2 2 3 720 2 2 3
721 3 3 3 722 1 3 2 723 2 2 2 724 2 1 2 725 2 2 2 726 2 2 2 727 3 2 3 728 2 2 3
729 3 2 4 730 1 1 2 731 2 1 2 732 1 2 3 733 2 1 2 734 3 3 3 735 3 2 3 736 2 2 3
737 1 1 3 738 2 2 2 739 3 3 2 740 3 2 3 741 2 1 2 742 2 2 3 743 2 1 2 744 2 2 2
745 1 2 2 746 1 1 2 747 3 2 2 748 2 2 3 749 3 2 3 750 2 3 3 751 2 3 3 752 3 2 3
753 2 1 1 754 2 1 2 755 3 3 3 756 2 2 2 757 2 3 3 758 3 3 5 759 2 2 3 760 1 2 2
761 3 2 3 762 1 1 2 763 1 2 2 764 2 3 3 765 2 2 2 766 2 2 2 767 2 2 2 768 1 2 3
769 2 1 1 770 2 1 2 771 3 2 3 772 2 3 3 773 3 3 3 774 2 2 2 775 2 3 3 776 1 1 2
777 3 2 3 778 2 2 2 779 3 2 3 780 2 3 5 781 2 2 2 782 2 2 3 783 3 3 4 784 2 2 2
785 1 1 1 786 2 3 3 787 2 1 2 788 2 1 2 789 2 2 1 790 2 3 3 791 2 2 3 792 2 2 3
793 2 2 2 794 3 3 4 795 3 2 2 796 3 2 3 797 2 3 3 798 3 2 2 799 2 2 3 800 1 3 3
801 2 1 2 802 2 3 3 803 2 3 3 804 3 2 3 805 3 2 2 806 3 3 4 807 1 2 3 808 3 3 3
809 2 1 2 810 1 2 3 811 3 3 3 812 3 2 3 813 2 3 2 814 1 2 2 815 3 3 3 816 3 2 2
817 2 3 3 818 2 1 2 819 2 2 3 820 2 2 2 821 3 1 3 822 2 2 3 823 3 2 3 824 1 2 3
825 1 2 3 826 3 2 3 827 2 2 2 828 2 2 3 829 2 1 2 830 2 2 2 831 2 2 2 832 2 2 2
833 3 1 2 834 3 3 4 835 2 1 2 836 3 1 2 837 2 2 2 838 2 1 2 839 3 1 1 840 1 2 3
841 2 2 3 842 2 2 3 843 2 1 2 844 2 2 2 845 2 2 3 846 2 2 2 847 2 1 2 848 2 2 2
849 2 2 3 850 2 3 4 851 1 2 2 852 2 3 3 853 3 2 3 854 2 2 3 855 3 2 3 856 2 1 2
857 3 3 3 858 1 1 2 859 2 2 2 860 3 4 3 861 2 2 2 862 3 1 3 863 3 3 3 864 3 2 3
865 3 2 3 866 2 2 2 867 2 2 3 868 3 2 3 869 1 2 2 870 1 2 2 871 2 2 3 872 2 1 2
873 3 3 3 874 3 1 1 875 2 3 3 876 1 3 2 877 2 2 3 878 2 2 3 879 3 2 3 880 3 2 2
881 2 1 2 882 2 2 2 883 3 3 3 884 2 2 2 885 2 3 3 886 2 2 2 887 3 2 3 888 3 2 3
889 2 1 2 890 3 4 4 891 2 4 3 892 3 1 2 893 2 3 3 894 3 2 2 895 2 2 2 896 3 3 4
897 2 1 2 898 2 3 3 899 3 2 2 900 3 3 2 901 2 2 3 902 2 1 1 903 3 2 3 904 2 2 2
905 3 3 3 906 3 2 3 907 2 3 3 908 2 2 2 909 3 3 3 910 2 1 2 911 2 2 2 912 2 2 2
913 1 1 2 914 2 1 2 915 3 3 3 916 3 3 3 917 2 2 3 918 3 2 3 919 3 3 4 920 3 2 2
921 2 1 2 922 2 3 2 923 3 2 4 924 2 1 3 925 3 2 2 926 1 2 2 927 1 2 2 928 2 1 2
929 3 2 3 930 3 2 3 931 3 3 3 932 2 2 2 933 1 3 3 934 1 2 1 935 2 2 2 936 2 3 3
937 3 3 3 938 2 2 2 939 1 1 2 940 2 3 4 941 2 4 4 942 2 2 2 943 2 3 3 944 2 3 2
945 2 1 2 946 2 2 2 947 3 3 3 948 2 3 5 949 1 1 2 950 3 3 3 951 2 3 3 952 2 2 1
953 1 1 2 954 3 2 2 955 1 2 2 956 2 3 2 957 1 1 3 958 2 3 3 959 1 2 2 960 2 2 3
961 1 2 2 962 3 2 3 963 3 2 3 964 2 2 2 965 2 2 2 966 2 2 2 967 3 3 5 968 2 2 2
969 1 1 2 970 2 1 2 971 2 3 2 972 2 2 3 973 3 2 2 974 3 3 3 975 2 1 2 976 2 2 1
977 2 2 3 978 2 2 2 979 2 2 3 980 2 3 3 981 1 1 1 982 3 3 4 983 3 2 2 984 2 2 2
985 2 2 2 986 3 3 3 987 2 2 2 988 3 3 4 989 2 2 2 990 3 2 3 991 3 3 3 992 3 3 4
993 2 1 2 994 3 1 2 995 1 2 2 996 2 2 2 997 3 3 3 998 2 2 3 999 2 2 3 1000 3 2 4
;

/**********************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     DIMENSIONS
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  DATE:     July 11, 2105
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     DATA    - name of data set with the instrument to be evaluated
 *
 *     VARLIST - names of variables that contain the subjects' item responses
 *
 *  DESCRIPTION:
 *     The logic of the macro assumes that your data set is in wide form, meaning
 *     that each row of the data set contains all the item responses for a single
 *     subject. The macro computes and saves the number of items in the global
 *     macro variable N. The names of the variables that contain the item
 *     responses are saved in global macro variables named Item1, Item2, ..., ItemN.
 *     The macro saves the number of categories in each item in the global macro
 *     variables Dim1, Dim2, ..., DimN. The computation for the number of categories
 *     per item assumes that every category is represented in the response data set.
 *     If your data do not satisfy this condition, you need to either modify the
 *     DIMENSIONS macro to accommodate missing categories or manually create the
 *     macro variables Dim1, Dim2, ..., DimN.
 *
 **********************************************************************************/

%macro dimensions(data=, varlist=);
   options nonotes;
   ods select none;

   /* One row per observed level of each item variable */
   proc summary noprint completetypes data=&data;
      class &varlist;
      output out=temp;
      ways 1;
   run;

   /* Count the observed levels of each item.                        */
   /* Note: KEEP= retains only variables whose names begin with ITEM. */
   proc means data=temp(drop=_:) n;
      output out=freq(keep=_STAT_ item: where=(_STAT_="N"));
   run;

   /* One row per item: _NAME_ holds the item name, COL1 the category count */
   proc transpose data=freq out=temp;
   run;

   /* Number of items = number of rows in the transposed data set */
   %global n;
   data _null_;
      %let dsid=%sysfunc(open(temp));
      %let n=%sysfunc(attrn(&dsid,nobs));
      %let rc=%sysfunc(close(&dsid));
   run;

   %do i = 1 %to &n;
      %global dim&i;
      %global item&i;
   %end;

   /* Store each item name and its category count in macro variables */
   data _null_;
      retain i 1;
      set temp;
      if _N_= i then do;
         call symput('item'||left(i),_NAME_);
         call symput('dim'||left(i),COL1);
      end;
      i+1;
   run;
   ods select all;
%mend dimensions;

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     PARMS
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     SCALE - optional argument that increases or decreases the distance
 *             between starting values
 *
 *  DESCRIPTION:
 *
 *  The %PARMS macro writes a separate PARMS statement for the discrimination
 *  parameters and assigns a starting value of 1 to each parameter. Then it writes
 *  a separate PARMS statement for each category boundary location parameter. The
 *  starting values that are assigned are equally spaced and centered around 0.
 *  The macro supports an optional scale argument that enables you to increase or
 *  decrease the distance between starting values. The default value for the scale
 *  parameter is 1; specifying a value greater than 1 increases the size of the
 *  interval between starting values; specifying a value less than 1 decreases the
 *  size of the interval between starting values.
 *
 *  The %PARMS macro also writes a copy of the SAS statements that it generates to
 *  the SAS log. This enables you to see exactly how the parameters are blocked and
 *  what starting values are being specified. More importantly, if you want to
 *  change the way the parameters are blocked, or if you need greater flexibility
 *  in specifying starting values than the macro provides, you can copy the PARMS
 *  statements from the SAS log, paste them into your PROC MCMC program, and make
 *  changes to the PARMS statements directly rather than modifying the %PARMS macro.
 *
 ************************************************************************************/

%macro parms(scale=);
   options nonotes;
   %if &scale eq %then %do;
      %let scale=1;
   %end;
   %let alpha=;
   %do i = 1 %to &n;
      %let delta&i=;
   %end;
   %do i = 1 %to &n;
      %let alpha = &alpha alpha&i 1 ;
      %do j = 2 %to &&dim&i;
         /* Even number of categories: half-unit spacing; odd: unit spacing */
         %if %sysevalf(&&dim&i/2)=%sysevalf(%sysfunc(int(&&dim&i/2))) %then %do;
            %let delta&i=&&delta&i delta&i&j %sysevalf(((-(&&dim&i/2)+(&j-1))/2)*&scale);
         %end;
         %else %do;
            %let delta&i=&&delta&i delta&i&j %sysevalf((-(&&dim&i/2)+(&j-1))*&scale);
         %end;
      %end;
   %end;
   %do i = 1 %to &n;
      parms %str(&&delta&i);
      %put parms &&delta&i%str(;);
   %end;
   parms %str(&alpha);
   %put parms &alpha%str(;);
%mend parms;

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     DELTA
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     MEAN - specifies the common means of the prior distributions of the
 *            category boundary location parameters
 *     VAR  - specifies the common variances of the prior distributions of
 *            the category boundary location parameters
 *
 *  DESCRIPTION:
 *
 *  The %DELTA macro automates the writing of the block of PRIOR statements for the
 *  category boundary location parameters. The macro also writes the entire block of
 *  statements that it generates to the SAS log. That way, if you want more
 *  flexibility in generating the PRIOR statements than the macro provides, you can
 *  copy the statements from the SAS log and use them as a starting point.
 *
 ************************************************************************************/

%macro delta(MEAN=, VAR=);
   %do i = 1 %to &n;
      %do j = 2 %to &&dim&i;
         %let k=%eval(&j-1);
         %if &j=2 %then %do;
            prior delta&i&j: ~normal(&mean, var=&var);
            %put prior delta&i&j: ~normal(&mean, var=&var)%str(;);
         %end;
         %else %do;
            /* Each later boundary is bounded below by the preceding boundary */
            prior delta&i&j: ~normal(&mean, var=&var, lower=delta&i&k);
            %put prior delta&i&j: ~normal(&mean, var=&var, lower=delta&i&k)%str(;);
         %end;
      %end;
   %end;
%mend delta;

/**************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     LOOPS
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS: none
 *
 *  DESCRIPTION:
 *
 *  The %LOOPS macro automates the process of writing the programming statements
 *  for the graded response model by using a nested loop; the outer loop is indexed
 *  by the number of items, and the inner loop is indexed by the number of categories
 *  per item. The information that the %LOOPS macro needs is supplied by the global
 *  macro variables that are created by the %DIMENSIONS macro, so no user input is
 *  required. The %LOOPS macro also writes the programming statements that it
 *  generates to the SAS log. If you want to experiment with the programming
 *  statements, you can copy the statements that the macro generates from the SAS
 *  log and use them as a starting point. A separate MODEL statement is generated
 *  for each item. The MODEL statements are generated in a separate loop within the
 *  %LOOPS macro so that they are written out as a block in the SAS log.
 *
 **************************************************************************************/

%macro loops;
   %do i=1 %to &n;
      /* Cumulative probabilities: category k or higher */
      array cp&i[&&dim&i];
      %put array cp&i[%left(&&dim&i)]%str(;);
      cp&i[1]=1;
      %put cp&i[1]=1%str(;);
      %do k=2 %to &&dim&i;
         cp&i[&k]=logistic(alpha&i*(theta-delta&i&k));
         %put cp&i[&k]=logistic(alpha&i*(theta-delta&i&k))%str(;);
      %end;
      /* Marginal probabilities: differences of adjacent cumulative probabilities */
      array p&i[%eval(&&dim&i)];
      %put array p&i[%eval(&&dim&i)]%str(;);
      p&i[1]=1-cp&i[2];
      %put p&i[1]=1-cp&i[2]%str(;);
      %do k=2 %to %eval(&&dim&i-1);
         p&i[&k]=cp&i[&k]-cp&i[%eval(&k+1)];
         %put p&i[&k]=cp&i[&k]-cp&i[%eval(&k+1)]%str(;);
      %end;
      p&i[%eval(&&dim&i)]=cp&i[%eval(&&dim&i)];
      %put p&i[%eval(&&dim&i)]=cp&i[%eval(&&dim&i)]%str(;);
   %end;
   /* One MODEL statement per item, written as a block */
   %do i=1 %to &n;
      model &&item&i ~ table(p&i) nooutpost;
      %put model %trim(&&item&i) ~ table(p&i) nooutpost%str(;);
   %end;
%mend loops;

%dimensions(data=graded, varlist=item1-item3)

ods output PostSumInt=PostSumInt;
proc mcmc data=graded nmc=80000 outpost=outpost seed=10000 nthreads=-1;
   random theta~normal(0, var=1) subject=person nooutpost;
   %parms(scale=1)
   prior alpha: ~normal(1, var=12);
   %delta(mean=0, var=12)
   %loops
run;

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     PLOTS
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     DATA - specifies the name of the data set that contains the MCMC
 *            procedure's posterior summaries and intervals table.
 *     OUT  - specifies the name of the output data set that the macro
 *            generates.
 *
 *  DESCRIPTION:
 *
 *  The %PLOTS macro generates a data set that is suitable for producing the CBC,
 *  ORF, OIC, IIC, and TIC plots. The macro saves the cumulative probabilities in
 *  variables that are named with the prefix CP. The marginal probabilities are
 *  saved in variables that are named with the prefix P. The category information
 *  functions are saved in variables that are named with the prefix CI. The item
 *  information functions are saved in variables that are named with the prefix I.
 *  The test information function is saved in the variable Info.
 *
 ************************************************************************************/

%macro plots(DATA=, OUT=);
   options nonotes;
   /* One row that holds the posterior mean of every parameter */
   proc transpose data=&data(keep=Parameter Mean) out=parms(drop=_NAME_);
      ID Parameter;
   run;
   data &out;
      set parms;
      array alpha{&n} alpha1-alpha&n;
      array i{&n} i1-i&n;
      %do l=1 %to &n;
         %let q = %eval(&&dim&l-1);
         %let r= %eval(&&dim&l);
         array delta&l{&q} delta&l.2-delta&l&r;
         array cp&l{&r};
         array p&l{&r};
         array ci&l{&r};
      %end;
      do theta=-10 to 10 by .25;
         %do l=1 %to &n;
            %let q = %eval(&&dim&l-1);
            %let r= %eval(&&dim&l);
            /* Cumulative probabilities */
            cp&l[1]=1;
            %do k=2 %to &r;
               cp&l[&k]=logistic(alpha&l*(theta-delta&l&k));
               label cp&l&k= %sysfunc(trim(&&item&l))": category &k";
            %end;
            /* Marginal probabilities */
            %do k=1 %to &q;
               p&l[&k]=cp&l[&k]-cp&l[%eval(&k+1)];
               label p&l&k= %sysfunc(trim(&&item&l))": category &k";
            %end;
            p&l[&r]=cp&l[&r];
            label p&l&r= %sysfunc(trim(&&item&l))": category &r";
            /* Category information and item information */
            i[&l]=0;
            %do k=1 %to &r;
               ci&l&k=alpha[&l]**2*p&l[&k]*(1-p&l[&k]);
               label ci&l&k= %sysfunc(trim(&&item&l))": category &k";
               i[&l] + ci&l&k;
               label i&l= %sysfunc(trim(&&item&l));
            %end;
         %end;
         /* Test information */
         info=0;
         %do l=1 %to &n;
            info = info + i[&l];
         %end;
         output;
      end;
   run;
%mend plots;

%plots(data=PostSumInt, out=plots)

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     CBC
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     DATA - specifies the name of the output data set that you specify in the
 *            OUT= argument when you invoke the %PLOTS macro.
 *
 *  DESCRIPTION:
 *
 *  The %CBC macro plots the category boundary curves from a graded response model.
 *  The macro loops through the items and the categories for each item and uses
 *  PROC SGPLOT to generate a CBC plot for each item. It uses the global macro
 *  variables that are created by the %DIMENSIONS macro as the parameters for the
 *  loops and to create the titles for the plots.
 *
 ************************************************************************************/

%macro cbc(DATA=);
   options nonotes;
   title "Category Boundary Curves";
   %do i=1 %to &n;
      proc sgplot data=&data;
         title2 "&&item&i";
         %do j=2 %to &&dim&i;
            series x=theta y=cp&i&j;
         %end;
         yaxis label="Probability";
         xaxis label="Trait ((*ESC*){unicode theta})";
         refline .5 / axis=y;
      run;
   %end;
   title;
%mend cbc;

%cbc(data=plots)

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     ORF
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     DATA - specifies the name of the output data set that you specify in the
 *            OUT= argument when you invoke the %PLOTS macro.
 *
 *  DESCRIPTION:
 *
 *  The %ORF macro plots the option response functions from a graded response model.
 *  The macro loops through the items and the categories for each item and uses
 *  PROC SGPLOT to generate an ORF plot for each item. It uses the global macro
 *  variables that are created by the %DIMENSIONS macro as the parameters for the
 *  loops and to create the titles for the plots.
 *
 ************************************************************************************/

%macro orf(DATA=);
   options nonotes;
   title "Option Response Functions";
   %do i=1 %to &n;
      proc sgplot data=&data;
         title2 "&&item&i";
         %do j=1 %to &&dim&i;
            series x=theta y=p&i&j;
         %end;
         yaxis label="Probability";
         xaxis label="Trait ((*ESC*){unicode theta})";
         refline .5 / axis=y;
      run;
   %end;
   title;
%mend orf;

%orf(data=plots)

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     IIC
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     DATA - specifies the name of the output data set that you specify in the
 *            OUT= argument when you invoke the %PLOTS macro.
 *
 *  DESCRIPTION:
 *
 *  The %IIC macro plots the option information curves and an item information curve
 *  for each item. The macro loops through the items and the categories for each
 *  item and uses PROC SGPLOT to generate an OIC plot for each option and an IIC
 *  plot for each item. It uses the global macro variables that are created by the
 *  %DIMENSIONS macro as the parameters for the loops and to create the titles for
 *  the plots.
 *
 ************************************************************************************/

%macro iic(DATA=);
   options nonotes;
   title "Category & Item Information Curves";
   %do i=1 %to &n;
      proc sgplot data=&data;
         title2 "&&item&i";
         %do j=1 %to &&dim&i;
            series x=theta y=ci&i&j;
         %end;
         series x=theta y=i&i;
         yaxis label="Information";
         xaxis label="Trait ((*ESC*){unicode theta})";
      run;
   %end;
   title;
%mend iic;

%iic(data=plots)

/************************************************************************************
 *  S A S   W E B   E X A M P L E   M A C R O
 *
 *  NAME:     TIC
 *  TITLE:    Bayesian Unidimensional IRT Models: Graded Response Model
 *  PRODUCT:  STAT
 *  PROCS:    MCMC
 *  DATE:     July 11, 2105
 *  SUPPORT:  Allen McDowell
 *
 *  ARGUMENTS:
 *
 *     DATA - specifies the name of the output data set that you specify in the
 *            OUT= argument when you invoke the %PLOTS macro.
 *
 *  DESCRIPTION:
 *
 *  The %TIC macro plots the test information curve.
 *
 ************************************************************************************/

%macro tic(DATA=);
   options nonotes;
   proc sgplot data=&data;
      title "Test Information Curve";
      series x=theta y=info;
      yaxis label="Information";
      xaxis label="Trait ((*ESC*){unicode theta})";
   run;
   title;
%mend tic;

%tic(data=plots)


estimate the levels of the latent traits of the test subjects, and to evaluate how well the items, individually and collectively, measure the test subject's latent trait.

The graded response model specifies the cumulative probability of scoring in, or selecting, each of K categories or higher as

$$P^*_{jk}(\theta) = \frac{e^{\alpha_j(\theta - \delta_{jk})}}{1 + e^{\alpha_j(\theta - \delta_{jk})}}$$

where $\theta$ is the latent trait, $\alpha_j$ is the discrimination parameter for item j, and $\delta_{jk}$ is the category boundary location for the kth category of item j. By definition, the probability of responding in the lowest category or higher is 1, and the probability of responding in category K+1 or higher is 0. A plot of the graded response model's cumulative probabilities as a function of $\theta$, often referred to as a category boundary curve (CBC), has the shape of an ogive.² The point of inflection of a category boundary curve is located at $\delta_{jk}$, and the probability of obtaining a category score k or higher is 0.50 at $\delta_{jk}$. The slopes of the boundary curves at $\delta_{jk}$ are proportional to $\alpha_j$.
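As a minimal sketch of the cumulative probability formula, the following DATA step evaluates the category boundary curves of a single three-category item on a grid of trait values. The parameter values (alpha=1.5, boundaries at -1.0 and 0.5) and the data set name cbc_demo are hypothetical and chosen only for illustration; they are not estimates from the example data.

```sas
/* Illustrative only: parameter values below are assumptions, not estimates */
data cbc_demo;
   alpha  = 1.5;    /* discrimination parameter (assumed)            */
   delta2 = -1.0;   /* boundary location for category 2 (assumed)    */
   delta3 = 0.5;    /* boundary location for category 3 (assumed)    */
   do theta = -3 to 3 by 0.5;
      cp1 = 1;                                  /* lowest category or higher */
      cp2 = logistic(alpha*(theta - delta2));   /* category 2 or higher      */
      cp3 = logistic(alpha*(theta - delta3));   /* category 3 or higher      */
      output;
   end;
run;

proc print data=cbc_demo noobs;
   var theta cp1 cp2 cp3;
run;
```

In the printed table, cp2 equals 0.5 in the row where theta is -1.0 and cp3 equals 0.5 in the row where theta is 0.5, which matches the statement that each boundary curve crosses 0.50 at its boundary location.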

If you are familiar with item response theory models for binary responses, you will undoubtedly recognize that the equation for the graded response model’s cumulative probability is identical to the equation for the marginal probability from a two-parameter logistic (2PL) model. In fact, you can think of the graded response model as the successive application of the 2PL model to an ordered series of bifurcated responses (De Ayala 2009, chapter 7).

To compute the marginal probability, $p_{jk}$, of selecting the $k$th category of item $j$, you take the difference between the cumulative probabilities for adjacent categories:

$$p_{jk} = P^{*}_{jk} - P^{*}_{j,k+1}$$

A plot of $p_{jk}$ as a function of $\theta$ is known as an option response function (ORF).³
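As a minimal sketch of the two formulas above, the following DATA step computes the cumulative and marginal probabilities for a single three-category item at one value of the latent trait. The values of Alpha, Delta2, Delta3, and Theta are hypothetical and chosen only for illustration; they are not estimates from this example.

```sas
/* Sketch: cumulative (cp) and marginal (p) probabilities for one  */
/* three-category item. All parameter values are hypothetical.     */
data _null_;
   alpha  = 1.5;     /* discrimination parameter (assumed)         */
   delta2 = -1;      /* boundary location for category 2 (assumed) */
   delta3 = 1;       /* boundary location for category 3 (assumed) */
   theta  = 0;       /* latent trait value (assumed)               */

   array cp[3];      /* cp1-cp3: P(category k or higher)           */
   array p[3];       /* p1-p3:   P(category k)                     */

   cp[1] = 1;        /* lowest category or higher                  */
   cp[2] = logistic(alpha*(theta - delta2));
   cp[3] = logistic(alpha*(theta - delta3));

   /* Differences of adjacent cumulative probabilities */
   p[1] = cp[1] - cp[2];
   p[2] = cp[2] - cp[3];
   p[3] = cp[3];     /* P(category 4 or higher) is 0 by definition */

   total = sum(of p[*]);   /* marginal probabilities sum to 1      */
   put cp2= cp3= / p1= p2= p3= / total=;
run;
```

Because the two assumed boundary locations are symmetric about the assumed value of Theta, the lowest and highest categories receive the same probability in this sketch.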

After you fit a graded response model, you can use the parameter estimates to compute the amount of information that is provided by each response category. The option information function (OIF) for each graded response option is the negative of the expected value of the second derivative of the log-likelihood function and is computed as follows:

$$I_{jk}(\theta) = \left( -\frac{\partial^{2} \ln(p_{jk})}{\partial \theta^{2}} \right) p_{jk} = \alpha_j^{2} \, p_{jk} (1 - p_{jk})$$

An item’s information function is the sum of the option information functions:

$$I_j(\theta) = \sum_{k=0}^{K} I_{jk}(\theta)$$

2 Category boundary curves are also referred to in the item response theory literature as cumulative probability curves, category characteristic curves, or boundary characteristic curves (De Ayala 2009, chapter 7).

3 Option response functions are also referred to in the item response theory literature as category probability curves, category response functions, operating characteristic curves, or option characteristic curves (De Ayala 2009, chapter 7).


Similarly, the instrument’s total information function is the sum of the item information functions:

$$I(\theta) = \sum_{j=1}^{J} I_j(\theta)$$
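The information formulas can be sketched in a short DATA step. The code below evaluates the option information and the item information for one hypothetical three-category item over a grid of latent trait values, using the closed-form expression given above. The parameter values and the names Info, OI1-OI3, and Item_Info are assumptions made for this illustration, and the categories are indexed from 1 to 3 here rather than from 0.

```sas
/* Sketch: option and item information for one hypothetical item. */
data info;
   alpha  = 1.5;     /* discrimination parameter (assumed)         */
   delta2 = -1;      /* boundary location for category 2 (assumed) */
   delta3 = 1;       /* boundary location for category 3 (assumed) */

   array cp[3];      /* cumulative probabilities                   */
   array p[3];       /* marginal probabilities                     */
   array oi[3];      /* option information, oi1-oi3                */

   do theta = -4 to 4 by 0.5;
      cp[1] = 1;
      cp[2] = logistic(alpha*(theta - delta2));
      cp[3] = logistic(alpha*(theta - delta3));
      p[1] = cp[1] - cp[2];
      p[2] = cp[2] - cp[3];
      p[3] = cp[3];

      item_info = 0;
      do k = 1 to 3;
         /* alpha**2 * p * (1 - p), as in the formula in the text */
         oi[k] = alpha**2 * p[k] * (1 - p[k]);
         item_info = item_info + oi[k];
      end;
      output;
   end;
   keep theta oi1-oi3 item_info;
run;

proc print data=info noobs;
run;
```

With only one item in the sketch, the test information is the same as the item information; for an instrument with several items you would add the item totals together.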

Bayesian Estimation

Bayesian estimation requires that you specify the likelihood function of the response variable and specify prior distributions for the unknown model parameters. The likelihood for the graded response model is just the probability distribution function of a categorical distribution. To specify the likelihood in PROC MCMC, you use a MODEL statement and the table distribution.

The unknown parameters are $\theta$, $\alpha_j$, and $\delta_{jk}$. Unless you have specific prior information about these distributions, it is common practice to specify a standard normal distribution for $\theta$ and diffuse prior distributions for the $\alpha_j$ and $\delta_{jk}$ parameters. In this example, $\theta$ is treated as a random effect that is indexed by test subject, and it is assigned a standard normal prior distribution. The $\alpha_j$ and $\delta_{jk}$ parameters have theoretical ranges that encompass the real line. It is common practice to assign diffuse normal, truncated normal, or lognormal distributions to the $\alpha_j$ parameters. For each of the $J$ items, the $\delta_{jk}$ parameters must satisfy the following order constraint: $\delta_{j2} < \delta_{j3} < \cdots < \delta_{j,K-1} < \delta_{j,K}$. There are several strategies that you can use to impose these order constraints on the prior distributions. In the example that follows, the order constraints are imposed by specifying truncated normal distributions as the priors, with $\delta_{j,k}$ being specified as the lower truncation boundary for the prior distribution of $\delta_{j,k+1}$.

Example

This example fits a graded response model to a hypothetical instrument that has three items. The first item has three categories, the second item has four categories, and the third item has five categories. The following DATA step reads the data set Graded. The variables Item1, Item2, and Item3 record the responses to the three items on the instrument, and the variable Person indexes the test subjects.

    data graded;
       input person item1 item2 item3 @@;
       datalines;
    1 3 2 2 2 1 1 2 3 2 2 3 4 2 2 4 5 2 2 2 6 3 2 3 7 3 2 2 8 3 2 2
    9 3 2 2 10 1 2 2 11 2 3 3 12 2 2 2 13 2 1 2 14 2 2 2 15 2 2 3 16 2 3 3
    17 3 2 3 18 2 1 2 19 1 2 2 20 2 1 3 21 2 3 3 22 1 1 2 23 2 1 2 24 3 2 3
    25 2 2 3 26 2 2 3 27 3 2 2 28 3 2 4 29 2 3 2 30 2 1 2 31 3 2 3 32 3 2 3

    ... more lines ...

    977 2 2 3 978 2 2 2 979 2 2 3 980 2 3 3 981 1 1 1 982 3 3 4 983 3 2 2 984 2 2 2
    985 2 2 2 986 3 3 3 987 2 2 2 988 3 3 4 989 2 2 2 990 3 2 3 991 3 3 3 992 3 3 4
    993 2 1 2 994 3 1 2 995 1 2 2 996 2 2 2 997 3 3 3 998 2 2 3 999 2 2 3 1000 3 2 4
    ;


The following six elements are essential to a PROC MCMC specification for a graded response model:

• PROC MCMC statement

• RANDOM statement for $\theta$

• PARMS statements for $\alpha_j$ and $\delta_{jk}$

• PRIOR statements for $\alpha_j$ and $\delta_{jk}$

• programming statements that compute the cumulative and marginal probabilities

• MODEL statements for each item

The model specification in PROC MCMC is highly dependent on the number of items contained in the instrument and the number of categories per item. The following statements specify the graded response model for the Graded data set:

    ods graphics on;
    ods output PostSumInt=PostSumInt;
    proc mcmc data=graded nmc=80000 outpost=outpost seed=10000 nthreads=-1;
       random theta~normal(0, var=1) subject=person nooutpost;
       parms alpha1 1 alpha2 1 alpha3 1;
       parms delta12 -1 delta13 1;
       parms delta22 -1 delta23 0 delta24 1;
       parms delta32 -1 delta33 -.5 delta34 .5 delta35 1;
       prior alpha: ~normal(1, var=12);
       prior delta12: ~normal(0, var=12);
       prior delta13: ~normal(0, var=12, lower=delta12);
       prior delta22: ~normal(0, var=12);
       prior delta23: ~normal(0, var=12, lower=delta22);
       prior delta24: ~normal(0, var=12, lower=delta23);
       prior delta32: ~normal(0, var=12);
       prior delta33: ~normal(0, var=12, lower=delta32);
       prior delta34: ~normal(0, var=12, lower=delta33);
       prior delta35: ~normal(0, var=12, lower=delta34);
       array cp1[3]; array cp2[4]; array cp3[5];
       array p1[3]; array p2[4]; array p3[5];
       cp1[1]=1;
       cp1[2]=logistic(alpha1*(theta-delta12));
       cp1[3]=logistic(alpha1*(theta-delta13));
       cp2[1]=1;
       cp2[2]=logistic(alpha2*(theta-delta22));
       cp2[3]=logistic(alpha2*(theta-delta23));
       cp2[4]=logistic(alpha2*(theta-delta24));
       cp3[1]=1;
       cp3[2]=logistic(alpha3*(theta-delta32));
       cp3[3]=logistic(alpha3*(theta-delta33));
       cp3[4]=logistic(alpha3*(theta-delta34));
       cp3[5]=logistic(alpha3*(theta-delta35));
       p1[1]=1-cp1[2];
       p1[2]=cp1[2]-cp1[3];
       p1[3]=cp1[3];
       p2[1]=1-cp2[2];
       p2[2]=cp2[2]-cp2[3];
       p2[3]=cp2[3]-cp2[4];
       p2[4]=cp2[4];
       p3[1]=1-cp3[2];
       p3[2]=cp3[2]-cp3[3];
       p3[3]=cp3[3]-cp3[4];
       p3[4]=cp3[4]-cp3[5];
       p3[5]=cp3[5];
       model item1 ~ table(p1);
       model item2 ~ table(p2);
       model item3 ~ table(p3);
    run;

The ODS OUTPUT statement saves the posterior summaries and intervals table to the data set PostSumInt. The contents of PostSumInt are used later to generate CBC, ORF, item information curve (IIC), and test information curve (TIC) plots.

The NMC= option in the PROC MCMC statement specifies 80,000 samples. In general, the Markov chains for the graded response model’s parameters tend to be highly autocorrelated. You might need to specify larger samples than you would for many other types of models to obtain a reasonable effective sample size. The OUTPOST= option in the PROC MCMC statement saves the MCMC samples in a data set named Outpost. The NTHREADS=-1 option sets the number of available threads to the number of hyperthreaded cores available on your system. The SEED= option sets the seed for the pseudorandom number generator and ensures reproducibility.

The RANDOM statement specifies the prior distribution for $\theta$ as a standard normal distribution. The SUBJECT= option specifies that the variable Person identifies the subjects. The NOOUTPOST option suppresses the output of the posterior samples of the $\theta$ random-effects parameters to the OUTPOST= data set; this reduces the execution time. However, if you want to perform analysis on the posterior samples of $\theta$, you can omit this option.

The four PARMS statements declare the parameters that are to be estimated, allocate them to four blocks, and assign starting values. Experimentation indicates that the graded response model can be fairly sensitive to the starting values that you assign. Specifically, the starting values for the $\delta_{jk}$ parameters must satisfy the order constraints and should not be heavily skewed. Assigning values that are evenly and symmetrically spaced about the mean of the prior distribution seems to work well.

The ten PRIOR statements assign the prior distributions for the $\alpha_j$ and $\delta_{jk}$ parameters. All the $\alpha_j$ parameters are assigned a diffuse normal prior with a mean of 1. Some modelers use prior distributions that restrict the $\alpha_j$ to be nonnegative. The parameters $\delta_{12}$, $\delta_{22}$, and $\delta_{32}$ are assigned diffuse normal priors with means equal to 0. The remaining $\delta_{jk}$ parameters are assigned diffuse, truncated normal distributions with means equal to 0 and lower truncation boundaries equal to $\delta_{j,k-1}$.

There are six ARRAY statements. The first three arrays (CP1, CP2, and CP3) will be populated with the cumulative probabilities of the categories for each of the three items; the last three arrays (P1, P2, and P3) will be populated with the marginal probabilities of the categories for each of the three items.

The 24 programming statements that follow compute the cumulative and marginal probabilities (12 assignments for the cumulative probabilities and 12 for the marginal probabilities).

Finally, there are three MODEL statements, one for each item. Each MODEL statement specifies that the response variable has a categorical (table) distribution. The TABLE function in PROC MCMC requires that you specify the name of an array as its only argument. The appropriate arrays are the marginal probability arrays P1, P2, and P3.

When you run PROC MCMC, you should check the various diagnostic plots and statistics to verify that the Markov chains have converged. The results of a simulation study indicate that relatively slow mixing and high autocorrelation are common characteristics of the graded response model. A variety of parameter transformations were tried, but they yielded little or no improvement in either the mixing or the degree of autocorrelation. Neither slow mixing nor autocorrelation produces bias in the parameter estimates, so your only real concern is to ensure that the nominal sample size is large enough to produce an effective sample size sufficient for statistical inference.
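One way to check the effective sample size after the fact is to postprocess the Outpost data set. The following statements are a sketch that assumes the preceding PROC MCMC step has run and that your SAS/STAT release provides the %ESS and %TADPLOT postprocessing autocall macros; the choice of parameters to examine is only an illustration.

```sas
/* Sketch: postprocess the posterior samples saved in Outpost.   */
/* Assumes the %ESS and %TADPLOT autocall macros are available.  */

/* Effective sample sizes for the discrimination parameters */
%ess(data=outpost, var=alpha1 alpha2 alpha3)

/* Trace, autocorrelation, and density plots for the same parameters */
%tadplot(data=outpost, var=alpha1 alpha2 alpha3)
```

If the effective sample sizes are small relative to the 80,000 nominal samples, increasing the NMC= value is the remedy that the discussion above suggests.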

Output 1 shows the posterior summaries and intervals table for the graded response model. The estimates of the discrimination parameters, $\alpha_j$, indicate that item 3 does a better job of discriminating between respondents than items 1 or 2, and item 2 does a better job than item 1. The estimates of the category boundary locations, $\delta_{jk}$, are the levels of the latent trait $\theta$ at which the probability of obtaining a category score $k$ or higher is 0.50. For example, the estimate for $\delta_{12}$ is -2.11 and indicates that a person with a latent trait of that level has a 50% chance of responding in category 2 or higher for item 1.

Output 1 Posterior Summaries and Intervals

The MCMC Procedure

Posterior Summaries and Intervals

    Parameter       N      Mean   Standard     95% HPD Interval
                                  Deviation
    alpha1      80000    0.9460     0.0923     0.7674    1.1299
    alpha2      80000    1.8640     0.2038     1.4939    2.2574
    alpha3      80000    4.5788     1.0907     2.7483    6.7309
    delta12     80000   -2.1145     0.1942    -2.4998   -1.7491
    delta13     80000    0.9612     0.1125     0.7431    1.1886
    delta22     80000   -1.1614     0.0853    -1.3291   -0.9959
    delta23     80000    0.9903     0.0767     0.8439    1.1439
    delta24     80000    3.1914     0.2541     2.7133    3.7121
    delta32     80000   -2.0286     0.1156    -2.2616   -1.8140
    delta33     80000   -0.0127     0.0421    -0.0973    0.0645
    delta34     80000    1.7071     0.0943     1.5198    1.8889
    delta35     80000    2.6141     0.1742     2.2797    2.9607
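As a worked illustration of how to read Output 1, you can plug the posterior means for item 1 into the cumulative and marginal probability formulas. The choice of $\theta = 0$ is an assumption made only for this illustration, and the results are rounded to three decimal places.

```latex
\begin{aligned}
P^{*}_{12}(0) &= \frac{1}{1 + e^{-0.9460\,(0 - (-2.1145))}} \approx 0.881 \\
P^{*}_{13}(0) &= \frac{1}{1 + e^{-0.9460\,(0 - 0.9612)}} \approx 0.287 \\
p_{11} &= 1 - P^{*}_{12}(0) \approx 0.119 \\
p_{12} &= P^{*}_{12}(0) - P^{*}_{13}(0) \approx 0.594 \\
p_{13} &= P^{*}_{13}(0) \approx 0.287
\end{aligned}
```

Under these point estimates, a test subject at the center of the latent trait distribution is most likely to respond in category 2 of item 1, and the three marginal probabilities sum to 1.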

Simplifying and Generalizing the Graded Response Model Specification by Using the SAS Macro Language

For many types of models, after you write out an example, you can reuse the SAS statements with other data sets by just substituting a new data set name and perhaps a new variable list. However, in the case of the graded response model, the model syntax is highly dependent on the number of items in the instrument and the number of categories per item. For example, if you have an instrument with four items, you cannot use the SAS statements that have been presented thus far and just substitute a new data set name and variable list. You would have to write additional PARMS, PRIOR, MODEL, and programming statements and perhaps modify some of the existing statements. The exact number and form of these additional statements depend on the number of categories in each of the four items. Having to write a new SAS program for every model can become tedious. However, you can automate much of the process of writing the syntax for a graded response model and for producing the CBC, ORF, IIC, and TIC plots by using the SAS macro language. The remainder of this example presents a few simple macros to get you started.

Gathering Preliminary Information about the Data

As you begin writing macros to automate the process of writing PROC MCMC syntax, you will discover that you require access to certain characteristics of the instrument that you want to analyze. Specifically, you need the following information:

• the number of items

• the names of the variables that contain the subjects’ responses to the items

• the number of categories in each item

The following SAS statements create a macro named %DIMENSIONS that gathers this information and saves it in global macro variables. The macro has two required arguments. You use the DATA= argument to specify the name of the data set that contains the instrument to be analyzed. You use the VARLIST= argument to provide a list of the names of the variables in the data set that contain the subjects’ item responses. The logic of the macro assumes that your data set is in wide form, meaning that each row of the data set contains all the item responses for a single subject. The macro computes and saves the number of items in the global macro variable N. The names of the variables that contain the item responses are saved in global macro variables named Item1, Item2, ..., Item&N. Finally, the macro saves the number of categories in each item in the global macro variables Dim1, Dim2, ..., Dim&N. The computation for the number of categories per item assumes that every category is represented in the response data set. If your data do not satisfy this condition, you need to either modify the %DIMENSIONS macro to accommodate missing categories or manually create the macro variables Dim1, Dim2, ..., Dim&N.

    %macro dimensions(data=, varlist=);
       options nonotes;
       ods select none;
       proc summary noprint completetypes data=&data;
          class &varlist;
          output out=temp;
          ways 1;
       run;

       proc means data=temp(drop=_:) n;
          output out=freq(keep=_STAT_ item: where=(_STAT_="N"));
       run;

       proc transpose data=freq out=temp;
       run;

       %global n;
       data _null_;
          %let dsid=%sysfunc(open(temp));
          %let n=%sysfunc(attrn(&dsid,nobs));
          %let rc=%sysfunc(close(&dsid));
       run;

       %do i = 1 %to &n;
          %global dim&i;
          %global item&i;
       %end;

       data _null_;
          retain i 1;
          set temp;
          if _N_= i then do;
             call symput('item'||left(i),_NAME_);
             call symput('dim'||left(i),COL1);
          end;
          i+1;
       run;
       ods select all;
    %mend dimensions;

In the preceding example, the data set is named Graded, and there are three item response variables, named Item1, Item2, and Item3. To use the %DIMENSIONS macro, you submit the following statement:

    %dimensions(data=graded, varlist=item1-item3)

All the macros that are described in the following sections use the global macro variables that are created by the %DIMENSIONS macro, so you must execute %DIMENSIONS before you can use any of the other macros.
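Before you rely on these global macro variables, it can help to confirm what the macro stored. The following statements are a small sketch that assumes %DIMENSIONS has already been run on the Graded data set with three item variables; they write the values to the SAS log by using %PUT statements and the %TRIM and %LEFT autocall macros to remove padding blanks.

```sas
/* Sketch: inspect the global macro variables that %DIMENSIONS creates. */
/* Assumes %dimensions(data=graded, varlist=item1-item3) was submitted. */
%put Number of items: &n;
%put %trim(&item1) has %left(&dim1) categories;
%put %trim(&item2) has %left(&dim2) categories;
%put %trim(&item3) has %left(&dim3) categories;
```

If every category of every item appears in the data, the category counts written to the log should match the three, four, and five categories described for the hypothetical instrument.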

Automating the PARMS Statements

Recall that the PROC MCMC specification of the graded response model includes the following block of PARMS statements:

    parms alpha1 1 alpha2 1 alpha3 1;
    parms delta12 -1 delta13 1;
    parms delta22 -1 delta23 0 delta24 1;
    parms delta32 -1 delta33 -.5 delta34 .5 delta35 1;

The PARMS statements determine the blocking of the parameters for the sampling algorithm and enable you to optionally specify starting values for the parameters. You could write a single PARMS statement and put all the parameters in a single block, but experiments with graded response models indicate that this almost always results in inferior mixing compared to placing the parameters in multiple blocks. The strategy that is used in the previous example and pursued in the following macro is to put all the $\alpha_j$ parameters in a separate block and to place the $\delta_{jk}$ for each item in a separate block. Thus, if you have N items, you need N + 1 PARMS statements. As mentioned previously, experimentation also shows that providing reasonable starting values for the parameters seems to be a necessity for the graded response model.

The following statements create a SAS macro named %PARMS that uses the information that is collected when you execute the %DIMENSIONS macro and automatically generates PARMS statements for a graded response model:

    %macro parms(scale=);
       options nonotes;
       %if &scale eq %then %do;
          %let scale=1;
       %end;
       %let alpha=;
       %do i = 1 %to &n;
          %let delta&i=;
       %end;
       %do i = 1 %to &n;
          %let alpha = &alpha alpha&i 1 ;
          %do j = 2 %to &&dim&i;
             %if %sysevalf(&&dim&i/2)=%sysevalf(%sysfunc(int(&&dim&i/2))) %then %do;
                %let delta&i=&&delta&i delta&i&j %sysevalf(((-(&&dim&i/2)+(&j-1))/2)*&scale);
             %end;
             %else %do;
                %let delta&i=&&delta&i delta&i&j %sysevalf((-(&&dim&i/2)+(&j-1))*&scale);
             %end;
          %end;
       %end;
       %do i = 1 %to &n;
          parms %str(&&delta&i);
          %put parms &&delta&i%str(;);
       %end;
       parms %str(&alpha);
       %put parms &alpha%str(;);
    %mend parms;

The %PARMS macro writes a separate PARMS statement for the $\alpha_j$ parameters and assigns a starting value of 1 to each parameter. It also writes a separate PARMS statement for each item that specifies the $\delta_{jk}$ parameters for that item. The starting values that are assigned are equally spaced and centered around 0. The macro supports an optional SCALE= argument that enables you to increase or decrease the distance between starting values. The default value for the scale parameter is 1; specifying a value greater than 1 increases the size of the interval between starting values (increases the spread); specifying a value less than 1 decreases the size of the interval between starting values (decreases the spread).

The %PARMS macro also writes a copy of the SAS statements that it generates to the SAS log. This enables you to see exactly how the parameters are blocked and what starting values are being specified. More importantly, if you want to change the way the parameters are blocked, or if you need greater flexibility in specifying starting values than the macro provides, you can copy the PARMS statements from the SAS log, paste them into your PROC MCMC program, and make changes to the PARMS statements directly rather than modifying the %PARMS macro.

    To use the %PARMS macro, you submit the following statement:

    %parms
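To see what the macro produces, you can trace its starting-value formulas by hand. The following worked example is a sketch that assumes the default scale of 1 and the category counts of the Graded data set (3, 4, and 5). The symbols $s_{jk}$ (starting value for $\delta_{jk}$), $K_j$ (number of categories in item $j$), and $c$ (the scale argument) are introduced here only for illustration.

```latex
s_{jk} =
\begin{cases}
\left(k - 1 - \tfrac{K_j}{2}\right) c, & K_j \text{ odd} \\[4pt]
\tfrac{1}{2}\left(k - 1 - \tfrac{K_j}{2}\right) c, & K_j \text{ even}
\end{cases}
\qquad k = 2, \ldots, K_j

\begin{aligned}
K_1 = 3:&\quad s_{12} = -0.5,\; s_{13} = 0.5 \\
K_2 = 4:&\quad s_{22} = -0.5,\; s_{23} = 0,\; s_{24} = 0.5 \\
K_3 = 5:&\quad s_{32} = -1.5,\; s_{33} = -0.5,\; s_{34} = 0.5,\; s_{35} = 1.5
\end{aligned}
```

These values are evenly spaced and centered at 0, but they are not the same as the starting values in the hand-coded PARMS statements shown earlier, so the two specifications start their chains from different points.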


Automating the PRIOR Statements

The preceding example includes the following block of PRIOR statements for the $\alpha_j$ and $\delta_{jk}$ parameters:

    prior alpha: ~normal(1, var=12);
    prior delta12: ~normal(0, var=12);
    prior delta13: ~normal(0, var=12, lower=delta12);
    prior delta22: ~normal(0, var=12);
    prior delta23: ~normal(0, var=12, lower=delta22);
    prior delta24: ~normal(0, var=12, lower=delta23);
    prior delta32: ~normal(0, var=12);
    prior delta33: ~normal(0, var=12, lower=delta32);
    prior delta34: ~normal(0, var=12, lower=delta33);
    prior delta35: ~normal(0, var=12, lower=delta34);

There is a single PRIOR statement for all the $\alpha_j$ parameters, so no automation is needed. However, because of the order constraints that must be imposed on the $\delta_{jk}$ parameters, you need a separate PRIOR statement for each $\delta_{jk}$. The following statements create the macro %DELTA, which automates the writing of the block of PRIOR statements for the $\delta_{jk}$ parameters:

    %macro delta(MEAN=, VAR=);
       %do i = 1 %to &n;
          %do j = 2 %to &&dim&i;
             %let k=%eval(&j-1);
             %if &j=2 %then %do;
                prior delta&i&j: ~normal(&mean, var=&var);
                %put prior delta&i&j: ~normal(&mean, var=&var)%str(;);
             %end;
             %else %do;
                prior delta&i&j: ~normal(&mean, var=&var, lower=delta&i&k);
                %put prior delta&i&j: ~normal(&mean, var=&var, lower=delta&i&k)%str(;);
             %end;
          %end;
       %end;
    %mend delta;

The %DELTA macro has two required arguments, MEAN= and VAR=, which specify the common means and variances, respectively, of the prior distributions of the $\delta_{jk}$ parameters. The lower truncation boundaries of the $\delta_{jk}$ parameters are automatically generated by the macro. The macro also writes the entire block of statements that it generates to the SAS log. That way, if you want more flexibility in generating the PRIOR statements than the macro provides, you can copy the statements from the SAS log and use them as a starting point.

To use the %DELTA macro, you submit the following statement (but supplying any values that you want for the two arguments):

    %delta(mean=0, var=12)


Automating the Programming and MODEL Statements

The computations in the programming statements are fairly straightforward, but the number of computations required entirely depends on the number of items and the number of categories per item. Automating the process of writing the programming statements is again just a matter of writing a nested loop; the outer loop is indexed by the number of items, and the inner loop is indexed by the number of categories per item. The information that the %LOOPS macro needs is supplied by the global macro variables that are created by the %DIMENSIONS macro, so no user input is required. The %LOOPS macro also writes the programming statements that it generates to the SAS log, so again, if you want to experiment with the programming statements, you can copy the statements that the %LOOPS macro generates from the SAS log and use them as a starting point. A separate MODEL statement is generated for each item. The MODEL statements are generated in a separate loop within the %LOOPS macro so that they are written out as a block in the SAS log. The following statements create the %LOOPS macro:

    %macro loops;
       %do i=1 %to &n;
          array cp&i[&&dim&i];
          %put array cp&i[%left(&&dim&i)]%str(;);
          cp&i[1]=1;
          %put cp&i[1]=1%str(;);
          %do k=2 %to &&dim&i;
             cp&i[&k]=logistic(alpha&i*(theta-delta&i&k));
             %put cp&i[&k]=logistic(alpha&i*(theta-delta&i&k))%str(;);
          %end;
          array p&i[%eval(&&dim&i)];
          %put array p&i[%eval(&&dim&i)]%str(;);
          p&i[1]=1-cp&i[2];
          %put p&i[1]=1-cp&i[2]%str(;);
          %do k=2 %to %eval(&&dim&i-1);
             p&i[&k]=cp&i[&k]-cp&i[%eval(&k+1)];
             %put p&i[&k]=cp&i[&k]-cp&i[%eval(&k+1)]%str(;);
          %end;
          p&i[%eval(&&dim&i)]=cp&i[%eval(&&dim&i)];
          %put p&i[%eval(&&dim&i)]=cp&i[%eval(&&dim&i)]%str(;);
       %end;
       %do i=1 %to &n;
          model &&item&i ~ table(p&i) nooutpost;
          %put model %trim(&&item&i) ~ table(p&i) nooutpost%str(;);
       %end;
    %mend loops;

The Simplified PROC MCMC Specification

The following is what the specification for a graded response model looks like when you use PROC MCMC and the %DIMENSIONS, %PARMS, %DELTA, and %LOOPS macros:

    %dimensions(data=graded, varlist=item1-item3)
    ods output PostSumInt=PostSumInt;
    proc mcmc data=graded nmc=80000 outpost=outpost seed=10000 nthreads=-1;
       random theta~normal(0, var=1) subject=person nooutpost;
       %parms(scale=1)
       prior alpha: ~normal(1, var=12);
       %delta(mean=0, var=12)
       %loops
    run;

Generating Diagnostic Plots

To produce CBC, ORF, OIC, IIC, and TIC plots, you use the means of the posterior distributions that are saved in the data set PostSumInt to compute the following quantities:

• the cumulative probability of scoring in or selecting each of the $K_j$ categories or higher (for all items) over a range of values of $\theta$ (CBC)

• the marginal probability of scoring in or selecting the $k$th category of item $j$ (for all categories of all items) over a range of values of $\theta$ (ORF)

• the option information functions for each category of each item over a range of values of $\theta$ (OIC)

• the sum of the option information functions for each item (IIC)

• the sum of all the item information functions (TIC)

Creating the PLOTS Data Set

The following statements create a macro, %PLOTS, that generates a data set that is suitable for producing the CBC, ORF, OIC, IIC, and TIC plots.

    %macro plots(DATA=, OUT=);
       options nonotes;
       proc transpose data=&data(keep=Parameter Mean) out=parms(drop=_NAME_);
          ID Parameter;
       run;
       data &out;
          set parms;
          array alpha{&n} alpha1-alpha&n;
          array i{&n} i1-i&n;
          %do l=1 %to &n;
             %let q = %eval(&&dim&l-1);
             %let r= %eval(&&dim&l);
             array delta&l{&q} delta&l.2-delta&l&r;
             array cp&l{&r};
             array p&l{&r};
             array ci&l{&r};
          %end;
          do theta=-10 to 10 by .25;
             %do l=1 %to &n;
                %let q = %eval(&&dim&l-1);
                %let r= %eval(&&dim&l);
                cp&l[1]=1;
                %do k=2 %to &r;
                   cp&l[&k]=logistic(alpha&l*(theta-delta&l&k));
                   label cp&l&k= %sysfunc(trim(&&item&l))": category &k";
                %end;
                %do k=1 %to &q;
                   p&l[&k]=cp&l[&k]-cp&l[%eval(&k+1)];
                   label p&l&k= %sysfunc(trim(&&item&l))": category &k";
                %end;
                p&l[&r]=cp&l[&r];
                label p&l&r= %sysfunc(trim(&&item&l))": category &r";
                i[&l]=0;
                %do k=1 %to &r;
                   ci&l&k=alpha[&l]**2*p&l[&k]*(1-p&l[&k]);
                   label ci&l&k= %sysfunc(trim(&&item&l))": category &k";
                   i[&l] + ci&l&k;
                   label i&l= %sysfunc(trim(&&item&l));
                %end;
             %end;
             info=0;
             %do l=1 %to &n;
                info = info + i[&l];
             %end;
             output;
          end;
       run;
    %mend plots;

The %PLOTS macro has two required arguments. The DATA= argument specifies the name of the data set that contains the MCMC procedure’s posterior summaries and intervals table. You use an ODS OUTPUT statement to create this data set when you fit the model by using PROC MCMC. The OUT= argument specifies the name of the output data set that the macro generates. The %PLOTS macro saves the cumulative probabilities in the variables CP11, ..., CP1&Dim1, ..., CP&N1, ..., CP&N&Dim&N. The marginal probabilities are saved in the variables P11, ..., P1&Dim1, ..., P&N1, ..., P&N&Dim&N. The category information functions are saved in the variables CI11, ..., CI1&Dim1, ..., CI&N1, ..., CI&N&Dim&N. The item information functions are saved in the variables I1, ..., I&N, and the test information function is saved in the variable Info. You invoke the macro by submitting the following statement (but supplying any data set names that you want for the two arguments):

    %plots(data=PostSumInt, out=plots)
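Once the output data set exists, you can query it directly as well as plot it. The following statements are a sketch that assumes the preceding %PLOTS call created the data set Plots; they list a few rows and then report the grid value of the latent trait at which the test information is largest.

```sas
/* Sketch: inspect the data set that %PLOTS creates (assumed name: Plots). */

/* First few grid points and the corresponding test information */
proc print data=plots(obs=5) noobs;
   var theta info;
run;

/* Grid value of theta at which the test information peaks */
proc sql;
   select theta, info
      from plots
      having info = max(info);
quit;
```

Because the grid runs from -10 to 10 in steps of 0.25, the reported peak location is accurate only to the grid spacing.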


Plotting Category Boundary Curves

The following SAS statements create the macro %CBC, which plots the category boundary curves. The macro has one required argument, DATA=, which specifies the name of the output data set that you specify in the OUT= argument when you invoke the %PLOTS macro. The %CBC macro loops through the items and the categories for each item and uses PROC SGPLOT to generate a CBC plot for each item. It uses the global macro variables that are created by the %DIMENSIONS macro as the parameters for the loops and to create the titles for the plots.

    %macro cbc(DATA=);
       options nonotes;
       title "Category Boundary Curves";
       %do i=1 %to &n;
          proc sgplot data=&data;
             title2 "&&item&i";
             %do j=2 %to &&dim&i;
                series x=theta y=cp&i&j;
             %end;
             yaxis label="Probability";
             xaxis label="Trait ((*ESC*){unicode theta})";
             refline .5 / axis=y;
          run;
       %end;
       title;
    %mend cbc;

You invoke the macro by submitting the following statement (but supplying any data set name that you want for the input data set):

    %cbc(data=plots)

Figure 1 displays the resulting CBC plots for the three items, which show the cumulative probability of scoring in or selecting each of the $K_j$ categories or higher (for all items) over a range of values of $\theta$.

    Figure 1 CBC Plots

Plotting Option Response Functions

The following SAS statements create the macro %ORF, which plots the option response functions. The macro has one required argument, DATA=, which specifies the name of the output data set that you specify in the OUT= argument when you invoke the %PLOTS macro. The %ORF macro loops through the items and the categories for each item and uses PROC SGPLOT to generate an ORF plot for each item. It uses the global macro variables that are created by the %DIMENSIONS macro as the parameters for the loops and to create the titles for the plots.

    %macro orf(DATA=);
       options nonotes;
       title "Option Response Functions";
       %do i=1 %to &n;
          proc sgplot data=&data;
             title2 "&&item&i";
             %do j=1 %to &&dim&i;
                series x=theta y=p&i&j;
             %end;
             yaxis label="Probability";
             xaxis label="Trait ((*ESC*){unicode theta})";
             refline .5 / axis=y;
          run;
       %end;
       title;
    %mend orf;

You invoke the macro by submitting the following statement (but supplying any data set name that you want for the input data set):

    %orf(data=plots)

Figure 2 displays the resulting ORF plots for the three items, which show the marginal probabilities of scoring in or selecting the $k$th category of item $j$ over a range of values of $\theta$.

    Figure 2 ORF Plots


Plotting Option and Item Information Curves

The following SAS statements create the macro %IIC, which plots the option information curves and an item information curve for each item. The macro has one required argument, DATA=, which specifies the name of the output data set that you specify in the OUT= argument when you invoke the %PLOTS macro. The %IIC macro loops through the items and the categories for each item and uses PROC SGPLOT to generate an OIC plot for each option and an IIC for each item. It uses the global macro variables that are created by the %DIMENSIONS macro as the parameters for the loops and to create the titles for the plots.

    %macro iic(DATA=);
       options nonotes;
       title "Category & Item Information Curves";
       %do i=1 %to &n;
          proc sgplot data=&data;
             title2 "&&item&i";
             %do j=1 %to &&dim&i;
                series x=theta y=ci&i&j;
             %end;
             series x=theta y=i&i;
             yaxis label="Information";
             xaxis label="Trait ((*ESC*){unicode theta})";
          run;
       %end;
       title;
    %mend iic;

You invoke the macro by submitting the following statement (but supplying any data set name that you want for the input data set):

    %iic(data=plots)

Figure 3 displays the resulting category information curve (CIC) and IIC plots for the three items. The CIC plots, which are the option information curves (OIC) described earlier, display the option information functions for each category of each item over a range of values of $\theta$. The IIC plots display the sum of the option information functions for each item.

    Figure 3 CIC and IIC Plots

Plotting the Test Information Curve

The following SAS statements create the macro %TIC, which plots the test information curve. The macro has one required argument, DATA=, which specifies the name of the output data set that you specify in the OUT= argument when you invoke the %PLOTS macro. The %TIC macro does exactly what the manual version does; its only virtue is to eliminate the need to copy and paste the original manually generated program.

    %macro tic(DATA=);
       options nonotes;
       proc sgplot data=&data;
          title "Test Information Curve";
          series x=theta y=info;
          yaxis label="Information";
          xaxis label="Trait ((*ESC*){unicode theta})";
       run;
       title;
    %mend tic;

You invoke the macro by submitting the following statement (but supplying any data set name that you want for the input data set):

    %tic(data=plots)

    Figure 4 displays the resulting TIC plot, which displays the sum of all the item information functions.

    Figure 4 Test Information Curve

    References

    De Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. New York: Guilford Press.