Figure 3. A, Pattern of behavioral performance for the pooled human and the pooled monkey. Each 24 × 24 matrix summarizes confusions of all two-way tasks: the color of bin (i,j) indicates the unbiased performance (d′) of the binary recognition task with objects i and j. Objects have been reordered based on a hierarchical clustering of human confusion patterns to highlight structure in the matrix. The two confusion patterns are qualitatively similar. For example, (camel, dog) and (tank, truck) are two frequently confused object pairs in both monkeys and humans. B, Comparison of d′ estimates of all 276 tasks (mean ± SE as estimated by bootstrap, 100 resamples) of the pooled human with those of the pooled monkey (top) and a low-level pixel representation (bottom). C, Quantification of consistency as noise-adjusted correlation of d′ vectors. The pooled monkey shows patterns of confusions that are highly correlated with pooled human subject confusion patterns (consistency of pooled monkey, 0.78). Importantly, low-level visual representations do not share these confusion patterns (pixels, 0.37; V1+, 0.52). Furthermore, a state-of-the-art deep convolutional neural network representation was highly predictive of human confusion patterns (CNN2013, 0.86), in contrast to an alternative model of the ventral stream (HMAX, 0.55). The dashed lines indicate thresholds at p = 0.1 and p = 0.05 for consistency with the gold-standard pooled human, estimated from pairs of individual human subjects. D, Comparison of d′ estimates of all 276 tasks (mean ± SE as estimated by bootstrap, 100 resamples) between the two monkeys.
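The quantities named in this caption can be illustrated with a minimal sketch. It assumes the standard signal-detection definition d′ = Z(hit rate) − Z(false-alarm rate) and a common form of noise adjustment in which the raw correlation is divided by the geometric mean of the two reliabilities; the caption does not spell out either formula, so both are assumptions here. The function names and the clipping constant are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, clip=1e-3):
    """Assumed definition: d' = Z(hit rate) - Z(false-alarm rate).

    Rates are clipped away from 0 and 1 so the inverse normal CDF stays finite.
    The clip value is an illustrative choice, not taken from the source.
    """
    z = NormalDist().inv_cdf
    h = min(max(hit_rate, clip), 1 - clip)
    f = min(max(false_alarm_rate, clip), 1 - clip)
    return z(h) - z(f)


def pairwise_tasks(n_objects=24):
    """All unordered object pairs (i, j) with i < j, one per binary task."""
    return list(combinations(range(n_objects), 2))


def noise_adjusted_correlation(raw_r, reliability_a, reliability_b):
    """Assumed form: raw correlation over the geometric mean of reliabilities."""
    return raw_r / sqrt(reliability_a * reliability_b)
```

With 24 objects, `pairwise_tasks()` yields 24 × 23 / 2 = 276 pairs, matching the number of tasks reported in panels B and D.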