DevChakraborty.com

JAFROC


JAFROC is intended for the analysis of human-observer free-response data (Chakraborty et al. 2004). JAFROC software is available on this website. It is very important that the FROC data be collected properly; see recommendations.

Details
The figure of merit A1J is defined as the probability that lesions (including unmarked lesions) are rated higher than non-lesion localization (NL) marks on normal images. It is estimated by the non-parametric Mann-Whitney-Wilcoxon U-statistic applied to the lesion ratings and the ROC-equivalent ratings of normal images (the rating of the highest-rated NL mark on each normal image). Unmarked lesions and unmarked normal images are formally assigned the "negative infinity" rating (in the software "negative infinity" is -2000). The figure of merit is defined by
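$$A_{1J} = \frac{1}{N_N N_L} \sum_{i=1}^{N_N} \sum_{j=1}^{N_L} \psi(X_i, Y_j), \qquad \psi(X, Y) = \begin{cases} 1 & \text{if } Y > X \\ 0.5 & \text{if } Y = X \\ 0 & \text{if } Y < X \end{cases}$$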
Here X_i is the ROC-equivalent rating for normal case i, Y_j is the rating for the jth lesion, N_N is the number of normal images and N_L is the number of lesions (the subscript L here denotes lesions and should not be confused with the abbreviation NL for non-lesion localization marks).

The significance of observed inter-modality differences in A1J is determined by jackknifing cases and analyzing the pseudo-value matrix with a mixed-model analysis of variance (ANOVA), as in DBM-MRMC analysis (Dorfman et al. 1992). Analysis was performed on the binned data (≤ 6 bins) for human observers and on the quasi-continuous data for CAD.

The reason for the notation A1J is that this figure of merit is the non-parametric estimate of the area under the alternative free-response receiver operating characteristic (AFROC) curve, which was originally denoted A1 (Chakraborty 1989; Chakraborty et al. 1990), except that in JAFROC only normal images are used to estimate the x-coordinate (FPF) of the AFROC curve.
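As an illustration of the definition above, the following is a minimal Python sketch of the Mann-Whitney-Wilcoxon estimate of A1J. It is not the JAFROC software's code: the function name `a1j`, the constant name `UNMARKED`, and the ratings are hypothetical, and ties are assumed to be scored as 0.5, as is usual for this statistic.

```python
import numpy as np

# Stand-in for the "negative infinity" rating; -2000 is the value the text
# says the software uses. The constant name is an assumption of this sketch.
UNMARKED = -2000.0

def a1j(normal_ratings, lesion_ratings):
    """Mann-Whitney-Wilcoxon estimate of A1J.

    normal_ratings: ROC-equivalent rating X_i, one per normal image.
    lesion_ratings: rating Y_j, one per lesion (unmarked lesions included).
    """
    x = np.asarray(normal_ratings, dtype=float)
    y = np.asarray(lesion_ratings, dtype=float)
    diff = y[None, :] - x[:, None]          # all (normal image, lesion) pairs
    psi = (diff > 0) + 0.5 * (diff == 0)    # 1 if Y > X, 0.5 if tied, else 0
    return float(psi.mean())                # average over N_N * N_L pairs

# Hypothetical ratings: 4 normal images (two unmarked), 5 lesions (one unmarked).
normals = [UNMARKED, 1, UNMARKED, 2]
lesions = [3, 2, UNMARKED, 4, 1]
print(a1j(normals, lesions))  # → 0.75
```

Note that an unmarked lesion compared with an unmarked normal image is a tie and contributes 0.5 to the sum.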

The original figure-of-merit definition (Chakraborty et al. 2004) included a weighting factor for each lesion, which allowed for the possibility that detection of different lesions in an image might have different clinical consequences. It was subsequently found (Chakraborty 2006) that for unequal weighting this definition failed the null hypothesis (NH) test when jackknifing was used for significance testing. Correct NH behavior, i.e., a rejection rate equal to the significance level of the test to within sampling variability, is a test of the validity of the analysis, since it means that the variability of the figure-of-merit difference is being estimated correctly. An alternate figure-of-merit definition was proposed which involved a weighted average of the ratings for each abnormal image (Chakraborty 2006). This had the correct NH behavior, but since the number of comparisons is smaller the statistical power is lower. The version of JAFROC (2.0) implementing lesion weighting has been withdrawn. A version employing the original figure-of-merit definition but using bootstrapping for significance testing is in preparation; it should correct the NH behavior without sacrificing statistical power.
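The phrase "to within sampling variability" can be made concrete with a small sketch. Assuming the NH simulations are independent, the number of rejections is binomial, so the observed rejection rate can be compared with the significance level using a normal-approximation interval. The function name, the simulation counts, and the choice of a 95% interval are assumptions made for illustration; they are not taken from the cited papers.

```python
import math

def nh_rejection_rate_ok(n_rejections, n_simulations, alpha=0.05, z=1.96):
    """Check whether an observed NH rejection rate is consistent with alpha.

    Uses the binomial standard error sqrt(alpha * (1 - alpha) / n) and a
    normal-approximation interval of +/- z standard errors around alpha.
    """
    rate = n_rejections / n_simulations
    se = math.sqrt(alpha * (1.0 - alpha) / n_simulations)
    return rate, abs(rate - alpha) <= z * se

# Hypothetical outcomes of 2000 NH simulations tested at alpha = 0.05.
print(nh_rejection_rate_ok(104, 2000))  # rate 0.052: consistent with alpha
print(nh_rejection_rate_ok(150, 2000))  # rate 0.075: too many rejections
```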

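Users of JAFROC are sometimes surprised that a modality (for example, mod-A) on which the observer marks some of the lesions without marking any normal image does not yield a perfect figure of merit (A1J = 1), and conversely that a modality (for example, mod-B) on which the observer marks some of the normal images and does not mark any of the lesions does not yield a zero figure of merit (A1J = 0). In fact it is observed that 0 < (A1J)mod-B < (A1J)mod-A < 1. The reason is that unmarked lesions and unmarked normal images share the "negative infinity" rating, so each comparison between them is a tie that contributes 0.5 to the Mann-Whitney-Wilcoxon statistic. On mod-A the unmarked lesions tie with the normal images, which pulls the figure of merit below unity; on mod-B the lesions tie with the unmarked normal images, which lifts it above zero. The observer who marks only lesions is obviously better than the observer who marks only normal images. If the observer marked every lesion and did not mark any normal image, the figure of merit would be unity, and if the observer marked every normal image and did not mark any lesion, the figure of merit would be zero.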
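The two scenarios can be worked through numerically. The sketch below uses hypothetical case counts (50 normal images, 40 lesions) and hypothetical marking fractions, and assumes ties are scored as 0.5; the function `a1j` is an illustrative helper, not the JAFROC software's code.

```python
import numpy as np

UNMARKED = -2000.0  # "negative infinity" rating, as in the software

def a1j(normal_ratings, lesion_ratings):
    """Mann-Whitney-Wilcoxon estimate of A1J, with ties scored as 0.5."""
    x = np.asarray(normal_ratings, dtype=float)
    y = np.asarray(lesion_ratings, dtype=float)
    diff = y[None, :] - x[:, None]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

n_normals, n_lesions = 50, 40  # hypothetical data set size

# mod-A: 30 of the 40 lesions are marked (rating 1); no normal image is marked.
normals_a = np.full(n_normals, UNMARKED)
lesions_a = np.array([1.0] * 30 + [UNMARKED] * 10)

# mod-B: 20 of the 50 normal images are marked (rating 1); no lesion is marked.
normals_b = np.array([1.0] * 20 + [UNMARKED] * 30)
lesions_b = np.full(n_lesions, UNMARKED)

fom_a = a1j(normals_a, lesions_a)
fom_b = a1j(normals_b, lesions_b)
print(fom_a)  # → 0.875
print(fom_b)  # → 0.3
```

With a fraction f of lesions marked on mod-A, the result is f + 0.5(1 - f) = 0.5(1 + f); with a fraction g of normal images marked on mod-B, it is 0.5(1 - g). Both lie strictly between 0 and 1 unless f = 1 or g = 1.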

JAFROC has been extensively validated using a simulator that included inter-image and intra-image correlations of the ratings (Chakraborty et al. 2004). In JAFROC analysis, when a case is jackknifed all mark-ratings on the case are removed from the analysis, and each case yields one pseudo-value. No assumptions regarding the correlations are made (however, ANOVA assumes that the pseudo-values are independent and normally distributed; see below). The simulation testing confirmed that the method had the correct NH behavior even in the presence of strong intra-image correlations.

The simulator used in that testing assumed a constant total number of suspicious regions per image, denoted T in (Chakraborty et al. 2004). This did not account for the expected randomness (i.e., case dependence) of these numbers. It also assumed that all lesions were considered for marking, i.e., identified as suspicious regions, which implies ν = 1 and LLF = 1 at the end-point; this is generally not observed to be the case. More recently JAFROC has been re-validated using a search-model-based simulator which accounts for these factors and correlations (Chakraborty et al. 2008). Since the jackknife pseudo-values are only asymptotically independent and normally distributed, and therefore do not strictly satisfy the ANOVA assumptions, the bootstrap method is generally regarded as more reliable. A future release of JAFROC software will use bootstrapping for significance testing.
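The case-level jackknife described above can be sketched as follows for a single reader and a single modality. This is an illustration under stated assumptions, not the JAFROC implementation: the pseudo-value is taken to be the standard jackknife form N·θ − (N − 1)·θ(k), where N is the total number of cases and θ(k) is the figure of merit with case k removed; the function names and data are hypothetical; and the subsequent ANOVA step is not shown.

```python
import numpy as np

UNMARKED = -2000.0  # "negative infinity" rating, as in the software

def a1j(normal_ratings, lesion_ratings):
    """Mann-Whitney-Wilcoxon estimate of A1J, with ties scored as 0.5."""
    x = np.asarray(normal_ratings, dtype=float)
    y = np.asarray(lesion_ratings, dtype=float)
    diff = y[None, :] - x[:, None]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def jackknife_pseudovalues(normal_ratings, abnormal_cases):
    """Return (A1J, pseudo-values), one pseudo-value per case.

    normal_ratings: ROC-equivalent rating of each normal image.
    abnormal_cases: one list of lesion ratings per abnormal image.
    Needs at least two normal and two abnormal images.
    """
    normals = list(normal_ratings)
    cases = [list(c) for c in abnormal_cases]

    def flatten(case_list):
        return [r for case in case_list for r in case]

    n_cases = len(normals) + len(cases)
    theta = a1j(normals, flatten(cases))
    pseudo = []
    for k in range(len(normals)):   # remove one normal image at a time
        theta_k = a1j(normals[:k] + normals[k + 1:], flatten(cases))
        pseudo.append(n_cases * theta - (n_cases - 1) * theta_k)
    for k in range(len(cases)):     # remove one abnormal image: all its lesions go
        theta_k = a1j(normals, flatten(cases[:k] + cases[k + 1:]))
        pseudo.append(n_cases * theta - (n_cases - 1) * theta_k)
    return theta, np.array(pseudo)

# Hypothetical data: 4 normal images and 3 abnormal images holding 5 lesions.
theta, pv = jackknife_pseudovalues(
    [UNMARKED, 1, UNMARKED, 2],
    [[3, UNMARKED], [2], [4, 1]],
)
print(theta, pv)
```

Removing an abnormal image removes every lesion rating on it at once, which is how the jackknife respects intra-image correlations without modeling them.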

D. P. Chakraborty, "Maximum Likelihood analysis of free-response receiver operating characteristic (FROC) data," Med. Phys. 16 (4), 561-568 (1989).
D. P. Chakraborty and L. H. L. Winter, "Free-Response Methodology: Alternate Analysis and a New Observer-Performance Experiment," Radiology 174, 873-881 (1990).
D. D. Dorfman, K. S. Berbaum and C. E. Metz, "Receiver operating characteristic rating analysis: Generalization to the Population of Readers and Patients with the Jackknife method," Invest. Radiol. 27 (9), 723-731 (1992).
D. P. Chakraborty and K. S. Berbaum, "Observer studies involving detection and localization: Modeling, analysis and validation," Med. Phys. 31 (8), 2313-2330 (2004).
D. P. Chakraborty, "Analysis of location specific observer performance data: validated extensions of the jackknife free-response (JAFROC) method," Acad. Radiol. 13 (10), 1187-1193 (2006).
D. P. Chakraborty and H. J. Yoon, "Investigation of methods for analyzing location specific observer performance data," Proc. SPIE Medical Imaging 2008: Image Perception, Observer Performance, and Technology Assessment 6917 (2008).



Dev P. Chakraborty, Ph. D. | 2103 Noble Ct, Murrysville PA 15668 | ©2005 DevChakraborty.com