Sample stimuli

[10 sample stimulus images: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark
benchmark = load_benchmark("Baker2022frankenstein-accuracy_delta")
score = benchmark(my_model)

Model scores

[Leaderboard table with columns Rank, Model, and Score: 297 ranked models. Ceiled scores range from .983 (rank 1) down to .000; entries at ranks 204–297 are listed without scores. Model names are omitted here.]

Benchmark bibtex

@article{BAKER2022104913,
        title = {Deep learning models fail to capture the configural nature of human shape perception},
        journal = {iScience},
        volume = {25},
        number = {9},
        pages = {104913},
        year = {2022},
        issn = {2589-0042},
        doi = {10.1016/j.isci.2022.104913},
        url = {https://www.sciencedirect.com/science/article/pii/S2589004222011853},
        author = {Nicholas Baker and James H. Elder},
        keywords = {Biological sciences, Neuroscience, Sensory neuroscience},
        abstract = {A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.}
}

Ceiling

0.85.

Note that scores are relative to this ceiling.
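As a minimal sketch of what "relative to this ceiling" means (assuming the common Brain-Score convention of dividing a raw score by the benchmark ceiling; the exact normalization used by this benchmark may differ):

```python
# Sketch of ceiling normalization (assumed convention, not the
# benchmark's verbatim implementation).
CEILING = 0.85  # ceiling reported for this benchmark

def ceil_score(raw_score: float, ceiling: float = CEILING) -> float:
    """Normalize a raw score by the ceiling, capped at 1.0."""
    return min(raw_score / ceiling, 1.0)

print(ceil_score(0.85))  # a raw score at the ceiling maps to 1.0
```

Under this convention, a model matching the ceiling exactly would receive a ceiled score of 1.0.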

Data: Baker2022frankenstein

Metric: accuracy_delta
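The paper's key finding is that disrupting object configuration ("frankenstein" silhouettes) hurts human accuracy but barely affects DCNNs. A hedged sketch of an accuracy-delta comparison in that spirit (illustrative function names and numbers; not the benchmark's actual code):

```python
# Illustrative sketch of an accuracy-delta comparison (assumption:
# the metric contrasts a model's accuracy drop from whole to
# "frankenstein" stimuli with the human drop).
def accuracy_delta(acc_whole: float, acc_frankenstein: float) -> float:
    """Drop in accuracy when object configuration is disrupted."""
    return acc_whole - acc_frankenstein

# Illustrative numbers only: humans show a large drop,
# a configuration-insensitive DCNN shows almost none.
human_delta = accuracy_delta(0.90, 0.65)
model_delta = accuracy_delta(0.88, 0.86)
print(human_delta, model_delta)
```

A model whose accuracy drop tracks the human drop would, under this assumed formulation, align better with the configural sensitivity the benchmark probes.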