Step 19: Validity and Reliability Studies

Validation studies must be conducted before a simulator can be implemented in the curriculum of training programs and considered for use as an assessment tool. Validation results of a training simulator should be reproducible.

Validation protocols should be even more rigorous if the simulator is to be used as a tool for trainee assessment, as the stakes are much higher for assessment tools. When used for assessment, data could be used to discipline existing practitioners, to prevent others from entering the field, or to bar them from performing certain tasks.

Though the cognitive science community has a much more sophisticated grasp of validation measures, Litwin has provided a translation for application to medical surveys, while Gallagher et al. (35) have provided a translation for use with surgical simulators.

When considering the validation of a simulator, a number of validity criteria should be considered:

1. Face validity: A type of validity assessed by having experts review the contents of a test to see whether it seems appropriate. It is a very gross, subjective type of validation and is usually used only when building the simulator.

2. Content validity: An estimate of the validity of a testing instrument based on a detailed examination of the contents of the test items. It is also subjective in nature. It is obtained by asking experts to review each item to determine whether it is appropriate to the task intended to be trained. The overall cohesiveness of the simulator is assessed to determine whether it contains the realism, steps, and skills that are used in a given procedure.

3. Construct validity: A set of procedures to evaluate a testing instrument based on the degree to which the test items identify the quality, ability, or trait it was designed to measure. This is a much more complex measure of validity and there are many constructs that can be examined. The most basic example is the ability of an assessment tool to differentiate experts and novices performing a given task.

4. Concurrent validity: An evaluation of the relationship between scores on the test and scores on another instrument purporting to measure the same construct. It can be thought of as a subset of construct validity. It is achieved by testing a simulator against gold standard methods of training. One problem with this method in surgical skills training is that very often no gold standard exists.

5. Discriminant validity: An evaluation that reflects the extent to which the scores generated by the assessment tool actually correlate with factors with which they should correlate. Another subset of construct validity, it is a sophisticated analysis of correlations. One example is a simulator's ability to differentiate ability levels within a group with similar experience, such as discriminating the abilities of all the residents in postgraduate year 1.

6. Predictive validity: The extent to which the scores on a test are predictive of actual performance. Predictive validity is probably the most important validation measure to be considered given our training dilemma. It predicts who will and who will not perform actual surgical tasks well.
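As a concrete illustration of the most basic construct-validity check described above, whether simulator scores separate experts from novices, the following Python sketch computes Cohen's d, one common effect-size measure. All scores and group sizes here are invented for illustration; a real validation study would also apply a significance test.

```python
# Sketch of a construct-validity check: do simulator scores
# differentiate experts from novices? Scores are hypothetical.
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Standardized difference between two group means (pooled SD)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled

experts = [88, 92, 85, 90, 87]   # hypothetical simulator scores
novices = [60, 65, 58, 70, 62]

print(round(cohens_d(experts, novices), 1))  # → 6.6 (a very large effect)
```

A large, well-separated effect like this would support the claim that the tool measures the ability it was designed to measure.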

A reliable simulator measures something in a reproducible manner. Standards of acceptable reliability depend on the purpose of the test and the cost of misclassification. Usually expressed as a value between 0 and 1, it represents the proportion of the variability in scores attributable to true differences between subjects (36,37). Wanzel et al. (38) used a reliability of > 0.8, as there was a high cost of misclassification because they were using their tool for assessment.
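The definition above, reliability as the proportion of score variability attributable to true differences between subjects, can be sketched as a simple variance ratio. This is a simplified intraclass-correlation-style estimate with invented data, not a full ICC model; each subject is scored twice.

```python
# Sketch: reliability as the proportion of total score variance
# explained by between-subject differences. Data are hypothetical.
from statistics import mean

def reliability(scores_by_subject):
    """Between-subject variance as a fraction of total variance."""
    all_scores = [s for subject in scores_by_subject for s in subject]
    grand = mean(all_scores)
    total = sum((s - grand) ** 2 for s in all_scores)
    between = sum(
        len(subject) * (mean(subject) - grand) ** 2
        for subject in scores_by_subject
    )
    return between / total

# Two ratings per subject; subjects differ far more than repeat ratings do.
print(round(reliability([[8, 9], [4, 5], [6, 6]]), 2))  # → 0.94
```

A value of 0.94 would clear the > 0.8 threshold that Wanzel et al. used for a high-stakes assessment tool.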

One common pitfall in validation studies in the surgical sciences has been to use the device being used for training (the intervention) as the very same tool used for evaluation pre- and post-training. Such a design merely shows that training on an instrument leads to improvement on that instrument and offers no tangible evidence of translation to clinical skills.

For subjective evaluation, the agreement between two different observers is known as inter-rater reliability and agreement between observations on the same subject on two separate occasions is known as intra-rater reliability.
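One common way to quantify the inter-rater agreement just described, for categorical ratings, is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The pass/fail ratings below are invented for illustration.

```python
# Sketch: inter-rater reliability for categorical ratings via
# Cohen's kappa. The two raters' judgments are hypothetical.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "pass", "fail", "fail", "fail", "pass"]

print(round(cohens_kappa(a, b), 2))  # → 0.67
```

The same statistic applied to one observer's ratings on two occasions would estimate intra-rater reliability.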

For virtual reality simulators that generate objective data, this is less of an issue, but validation of these instruments has been, and should continue to be, correlated with reliable subjective evaluation.

It is the author's belief that, in order to be implemented into training programs, a simulator should at least achieve reliable face, content, and construct validity.

Successful licensing depends on enlisting the help of an academic institution's department of technology transfer and licensing. A relationship between developers and technology transfer officials should be established at the onset of development so that plans can be made to protect future intellectual property. It is therefore important to follow university protocol when demonstrating the simulator to the public; otherwise, commercialization potential might be forfeited.

One useful measurement to obtain once a simulation tool has been embedded in the curriculum for some time is what is called a training transfer ratio (Fig. 11). This is defined as the number of simulated case hours equivalent to one operative case hour. Such a ratio may not be quite so simple, as simulated cases may be more useful for less experienced trainees than for those with experience. The ratio truly represents the derivative, or slope, of the learning curve at any given point during training, which is just that, a "curve." In the aviation industry, this ratio has been estimated at one half hour of simulated time to one hour logged in a real plane.

FIGURE 11 ■ Training transfer ratio: hypothetical learning curve for a medical procedure (operating room hours vs. simulation hours). In this hypothetical example, the slope of the line at any point would represent the training transfer ratio (TTR). For a novice who has logged five simulation hours, this may be equivalent to five operating room hours (TTR = 1.0), as they may get more benefit from the simulator, while someone who has logged 15 operating room hours in this case would have a TTR = 0.40. Clearly, the value of simulation at different levels of training is unknown and would need to be worked out for each skill task.
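The training transfer ratio arithmetic described above, the slope of the learning curve between measurements, can be sketched in a few lines of Python. The learning curve below is invented to roughly match the hypothetical figures quoted in the text (TTR near 1.0 for a novice, falling with experience); real values would have to be measured for each skill task.

```python
# Sketch: estimating the training transfer ratio (TTR) as the slope
# of a learning curve. The curve below is hypothetical.
def ttr(sim_hours, or_equivalent_hours):
    """TTR between consecutive measurements: operating-room-equivalent
    hours gained per simulation hour logged."""
    ratios = []
    for i in range(1, len(sim_hours)):
        gain = or_equivalent_hours[i] - or_equivalent_hours[i - 1]
        cost = sim_hours[i] - sim_hours[i - 1]
        ratios.append(gain / cost)
    return ratios

# Hypothetical data: early simulation hours transfer ~1:1, later less.
sim = [0, 5, 10, 15, 20]
or_eq = [0.0, 5.0, 8.0, 10.0, 11.0]

print(ttr(sim, or_eq))  # → [1.0, 0.6, 0.4, 0.2]
```

The aviation estimate quoted above (one half hour of simulation per real flight hour) corresponds to a constant TTR of 2.0 in this notation, i.e., each simulated hour buying two real-time-equivalent hours of proficiency.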
