Sample stimuli

[Ten sample stimulus images, sample 0 to sample 9]

How to use

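from brainscore_vision import load_benchmark

# my_model: the model under test, wrapped as a Brain-Score BrainModel
benchmark = load_benchmark("Baker2022frankenstein-accuracy_delta")
score = benchmark(my_model)  # the model's score on this benchmark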

Model scores

Rank    Score
1       .983
2       .974
3       .972
4       .957
5       .940
6       .939
7       .929
8       .929
9       .929
10      .899
11      .897
12      .889
13      .887
14      .874
15      .873
16      .867
17      .867
18      .865
19      .862
20      .858
21      .855
22      .854
23      .850
24      .844
25      .841
26      .824
27      .806
28      .803
29      .801
30      .799
31      .798
32      .793
33      .775
34      .770
35      .767
36      .767
37      .766
38      .766
39      .761
40      .754
41      .741
42      .741
43      .731
44      .731
45      .730
46      .727
47      .726
48      .724
49      .717
50      .715
51      .714
52      .713
53      .712
54      .698
55      .696
56      .692
57      .681
58      .680
59      .671
60      .671
61      .666
62      .666
63      .662
64      .658
65      .658
66      .647
67      .645
68      .644
69      .641
70      .627
71      .623
72      .617
73      .608
74      .607
75      .599
76      .578
77      .576
78      .572
79      .568
80      .568
81      .567
82      .558
83      .552
84      .545
85      .539
86      .536
87      .527
88      .523
89      .519
90      .503
91      .503
92      .496
93      .494
94      .487
95      .485
96      .485
97      .465
98      .464
99      .463
100     .463
101     .462
102     .457
103     .454
104     .453
105     .448
106     .448
107     .448
108     .435
109     .429
110     .427
111     .420
112     .418
113     .410
114     .398
115     .395
116     .392
117     .378
118     .374
119     .372
120     .365
121     .365
122     .362
123     .343
124     .339
125     .335
126     .335
127     .335
128     .334
129     .314
130     .313
131     .310
132     .308
133     .305
134     .301
135     .288
136     .286
137     .284
138     .282
139     .278
140     .276
141     .270
142     .270
143     .257
144     .232
145     .232
146     .223
147     .220
148     .220
149     .212
150     .204
151     .201
152     .160
153     .158
154     .156
155     .156
156     .155
157     .144
158     .142
159     .135
160     .134
161     .131
162     .119
163     .111
164     .098
165     .096
166     .089
167     .088
168     .042
169     .038
170     .028
171     .007
172     .006
173     .003
174     .002
175-207 .000 (each)
208-302 no score shown

Benchmark bibtex

@article{BAKER2022104913,
  title = {Deep learning models fail to capture the configural nature of human shape perception},
  journal = {iScience},
  volume = {25},
  number = {9},
  pages = {104913},
  year = {2022},
  issn = {2589-0042},
  doi = {10.1016/j.isci.2022.104913},
  url = {https://www.sciencedirect.com/science/article/pii/S2589004222011853},
  author = {Nicholas Baker and James H. Elder},
  keywords = {Biological sciences, Neuroscience, Sensory neuroscience},
  abstract = {A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.}
}

Ceiling

0.85

All scores listed above are expressed relative to this ceiling.
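As an illustration only: if "relative to this ceiling" means the raw score is divided by the ceiling (an assumption; this page does not state the exact normalization), a leaderboard value can be converted back to an approximate raw score. The helper names `ceil_score` and `raw_from_ceiled` below are hypothetical, not part of the Brain-Score API.

```python
# Hypothetical helpers. ASSUMPTION: ceiled score = raw score / ceiling,
# with no clipping; the page only says scores are relative to the ceiling.
CEILING = 0.85  # ceiling reported for Baker2022frankenstein-accuracy_delta


def ceil_score(raw: float, ceiling: float = CEILING) -> float:
    """Normalize a raw score by the benchmark ceiling (assumed simple division)."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw / ceiling


def raw_from_ceiled(ceiled: float, ceiling: float = CEILING) -> float:
    """Invert ceil_score to recover the assumed raw score."""
    return ceiled * ceiling


if __name__ == "__main__":
    # Top leaderboard entry (.983) mapped back to a raw score under the assumption.
    print(f"{raw_from_ceiled(0.983):.3f}")
```

Under this assumption the top entry (.983) corresponds to a raw score of roughly 0.836, and a raw score equal to the ceiling maps to 1.0.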

Data: Baker2022frankenstein

Metric: accuracy_delta
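The page names the metric only as accuracy_delta. Based on the manipulation described in the abstract (intact animal silhouettes versus a variant that disrupts configuration while preserving local features), one plausible reading is the drop in classification accuracy from intact to configuration-disrupted stimuli. The sketch below computes that quantity for hypothetical prediction lists; the function names are illustrative, and how Brain-Score compares model and human deltas to produce the reported score is not described here, so this is not the benchmark's implementation.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of trials where the predicted category matches the true one."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def accuracy_delta(
    whole_preds: Sequence[str],
    whole_labels: Sequence[str],
    frank_preds: Sequence[str],
    frank_labels: Sequence[str],
) -> float:
    """Hypothetical: accuracy on intact silhouettes minus accuracy on
    configuration-disrupted ones.

    A positive value means performance dropped when the configuration was
    disrupted, as the abstract reports for human observers.
    """
    return accuracy(whole_preds, whole_labels) - accuracy(frank_preds, frank_labels)


if __name__ == "__main__":
    labels = ["bear", "cat", "dog", "owl"]
    # All intact silhouettes correct, half of the disrupted ones correct.
    print(accuracy_delta(labels, labels, ["bear", "cat", "cat", "cat"], labels))
```

In the example, accuracy falls from 1.0 on intact stimuli to 0.5 on disrupted stimuli, giving a delta of 0.5; an observer insensitive to configuration would show a delta near zero.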