1. Poor test-retest reliability has led some researchers to suggest that PDs are less stable than previously believed. An alternative hypothesis is that the assessment instruments overemphasize transitory behavioral symptoms (e.g., self-cutting in borderline patients) and underemphasize underlying personality processes that are much more stable over time (e.g., emotional dysregulation and self-hatred in borderline patients).

2. One way the Q-Sort method reduces measurement error is by ensuring that raters are "calibrated" with one another. Consider the situation with rating scales, where raters can use any value as often as they wish. Inevitably, certain raters will tend toward extreme values (e.g., values of 0 and 7 on a 0-7 scale) whereas others will tend toward middle values (e.g., values of 4 and 5). Thus, the ratings reflect not only the characteristics of the patients but also the calibration of the raters. The Q-Sort method, with its fixed distribution, eliminates this kind of measurement error, because all clinicians must assign each value the same number of times. If use of a standard item set gives clinicians a common vocabulary, use of a fixed distribution can be said to give them a common "grammar" (Block, 1978).
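
The fixed-distribution constraint can be illustrated with a minimal Python sketch. The function name and the toy distribution below are hypothetical and chosen only for illustration; they are not the actual SWAP-200 distribution or scoring software.

```python
from collections import Counter

def follows_fixed_distribution(ratings, required_counts):
    """Return True if each scale value is used exactly the required number of times.

    ratings: the scale values one rater assigned, one per item.
    required_counts: dict mapping scale value -> number of items that must
        receive that value (hypothetical; supplied by the caller).
    """
    observed = Counter(ratings)
    values = set(observed) | set(required_counts)
    return all(observed.get(v, 0) == required_counts.get(v, 0) for v in values)

# Toy distribution for a six-item sort (an assumption, not the SWAP-200 one):
# three items scored 0, two scored 1, one scored 2.
toy_distribution = {0: 3, 1: 2, 2: 1}

calibrated_rater = [0, 1, 0, 2, 1, 0]   # uses every value the required number of times
extreme_rater = [2, 2, 0, 2, 0, 0]      # overuses the top value
```

Under this constraint, two raters can differ only in which items receive which values, never in how often they use each value.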

3. The material presented in this section is adapted from Lingiardi, Shedler, & Gazzillo (2006). See the original publication for a more complete description of the case, treatment methods, and findings.

4. Averaging across raters enhances the reliability of the resulting scores.
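
One common way to quantify this gain is the Spearman-Brown formula, sketched below in Python. It assumes the raters are interchangeable (parallel) and that the reliability of a single rater is known; the function name is hypothetical, and the note itself does not specify which formula was used.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean of n_raters parallel raters.

    Assumes all raters are equally reliable and interchangeable.
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return (n_raters * r) / (1 + (n_raters - 1) * r)
```

For example, if a single rater has a reliability of 0.5, the formula projects a reliability of about 0.67 for the average of two raters.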

5. The relatively low thresholds reflect the fact that the reference sample consisted of patients with PD diagnoses. Thus, a T-score of 50 indicates "average" functioning among patients with PD diagnoses, and a T-score of 60 represents an elevation of one standard deviation relative to other patients with PD diagnoses.
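
The T-score metric described here follows the standard definition (mean of 50, standard deviation of 10 in the reference sample). A minimal Python sketch, with hypothetical function and parameter names and made-up reference values:

```python
def t_score(raw_score, reference_mean, reference_sd):
    """Convert a raw score to a T-score relative to a reference sample.

    reference_mean and reference_sd describe the reference sample
    (here, patients with PD diagnoses); the values passed in any example
    are illustrative assumptions, not published norms.
    """
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return 50.0 + 10.0 * (raw_score - reference_mean) / reference_sd
```

With an illustrative reference mean of 2.0 and standard deviation of 0.5, a raw score of 2.0 maps to T = 50 and a raw score of 2.5 maps to T = 60.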

6. The material presented here is adapted from Westen & Weinberger (2004).

7. The material in this section is adapted from Shedler & Westen (2004a).

8. The reliability of a composite or aggregate personality description is measured by coefficient alpha, which reflects the intercorrelations among the patients (columns of data) included in the aggregate description. The logic is identical to computing the reliability of a psychometric scale, except that patients are treated as scale "items" (columns in the data file) and SWAP-200 items are treated as cases (rows in the data file). See note 4 for additional details.
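
The computation can be sketched in Python with NumPy, using the standard coefficient alpha formula and the data layout described above (rows are SWAP-200 items, columns are patients). The function name and the small arrays are hypothetical and for illustration only.

```python
import numpy as np

def coefficient_alpha(data):
    """Coefficient alpha with patients as scale "items".

    data: 2-D array; rows = SWAP-200 items (treated as cases),
          columns = patients (treated as scale "items").
    """
    data = np.asarray(data, dtype=float)
    n_patients = data.shape[1]
    if n_patients < 2:
        raise ValueError("at least two patients (columns) are required")
    # Variance of each patient's scores across the SWAP-200 items.
    patient_variances = data.var(axis=0, ddof=1)
    # Variance of the summed (aggregate) score for each SWAP-200 item.
    total_variance = data.sum(axis=1).var(ddof=1)
    return (n_patients / (n_patients - 1)) * (
        1 - patient_variances.sum() / total_variance
    )

# Toy data: three "items" (rows) scored for two "patients" (columns).
identical_profiles = [[0, 0], [1, 1], [2, 2]]
similar_profiles = [[1, 2], [2, 1], [3, 3]]
```

When the patients' profiles are identical, alpha equals 1; as the profiles diverge, the intercorrelations among columns fall and alpha falls with them.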

