A methodology is presented for assessing the reliability of an ordinally scaled index and is illustrated with data from a clinical trial in which gingival inflammation was assessed with the PMGI index, independently, by five examiners. One examiner was experienced; the other four were newly trained. All subjects were evaluated by each examiner initially and at the end of the study period. The reliability of the average score per subject, the maximum score per subject, and the percentage of affected sites per subject is estimated by the intraclass correlation coefficient. Procedures are presented that use various forms of the weighted kappa statistic to dissect patterns of examiner agreement for specific sites, types of sites, all sites, and the individual components and categories of the index. It is shown how these procedures can be useful for training and calibrating multiple examiners who will use such an index in a clinical study, so that adequate reliability levels can be realized.
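As a rough illustration of the two quantities named above, and not the paper's own estimators (the abstract does not specify which ICC model or kappa weighting scheme was used), the sketch below assumes a one-way random-effects intraclass correlation computed from a subjects-by-examiners matrix of per-subject summary scores, and a standard weighted kappa (linear or quadratic weights) between two examiners scoring the same sites on an ordinal scale. Function names and the data layout are illustrative assumptions.

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects ICC for an (n subjects) x (k examiners) matrix
    of per-subject summary scores, e.g. mean PMGI score per subject.
    Assumed model; the paper's exact ICC formulation is not stated here."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    subject_means = scores.mean(axis=1)
    # Between-subjects and within-subjects mean squares from one-way ANOVA
    ms_between = k * ((subject_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def weighted_kappa(rater_a, rater_b, n_categories, weights="quadratic"):
    """Weighted kappa between two examiners scoring the same sites on an
    ordinal scale with categories 0..n_categories-1.
    Weighting scheme is an assumption; the paper uses 'various forms'."""
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    # Observed joint proportion table
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    # Expected table under independent marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    # Disagreement weights: zero on the diagonal, increasing with distance
    idx = np.arange(n_categories)
    diff = np.abs(idx[:, None] - idx[None, :])
    if weights == "quadratic":
        w = (diff / (n_categories - 1)) ** 2
    else:  # linear
        w = diff / (n_categories - 1)
    return 1.0 - (w * observed).sum() / (w * expected).sum()
```

For example, `icc_oneway` could be applied to a matrix whose rows are subjects and whose columns are the five examiners' average scores per subject, while `weighted_kappa` could be applied site-by-site to each pairing of a newly trained examiner with the experienced examiner to locate where agreement breaks down.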