Sample stimuli

[Ten sample stimulus images, sample 0 through sample 9, shown as thumbnails on the benchmark page]

How to use

from brainscore_vision import load_benchmark
benchmark = load_benchmark("Baker2022fragmented-accuracy_delta")
score = benchmark(my_model)  # my_model: any model implementing the Brain-Score BrainModel interface
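A fuller end-to-end sketch, assuming the model identifier "alexnet" is registered in your brainscore_vision installation (it is only an illustrative choice; any registered identifier works):

from brainscore_vision import load_benchmark, load_model

model = load_model("alexnet")  # illustrative: load a packaged, registered model
benchmark = load_benchmark("Baker2022fragmented-accuracy_delta")
score = benchmark(model)  # presents the benchmark stimuli to the model and scores it
print(score)  # ceiling-normalized score (see Ceiling below)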

Model scores

Scores by rank (model names are rendered only on the interactive leaderboard and are not reproduced here):

Rank    Score
1       .987
2       .986
3       .986
4       .984
5       .984
6       .984
7       .983
8       .982
9       .982
10      .982
11      .981
12      .978
13      .975
14      .970
15      .965
16      .960
17      .960
18      .960
19      .957
20      .954
21      .946
22      .945
23      .944
24      .944
25      .937
26      .935
27      .926
28      .925
29      .917
30      .909
31      .903
32      .901
33      .901
34      .901
35      .889
36      .882
37      .881
38      .868
39      .860
40      .858
41      .858
42      .838
43      .836
44      .836
45      .834
46      .832
47      .822
48      .811
49      .806
50      .803
51      .802
52      .799
53      .796
54      .791
55      .788
56      .785
57      .760
58      .758
59      .756
60      .751
61      .740
62      .739
63      .735
64      .734
65      .734
66      .730
67      .721
68      .720
69      .709
70      .698
71      .671
72      .670
73      .663
74      .656
75      .649
76      .646
77      .626
78      .617
79      .603
80      .602
81      .592
82      .590
83      .583
84      .582
85      .566
86      .558
87      .558
88      .550
89      .543
90      .541
91      .538
92      .532
93      .528
94      .524
95      .523
96      .515
97      .508
98      .507
99      .499
100     .494
101     .478
102     .473
103     .470
104     .456
105     .446
106     .445
107     .438
108     .433
109     .424
110     .421
111     .417
112     .412
113     .412
114     .412
115     .411
116     .400
117     .392
118     .392
119     .388
120     .365
121     .350
122     .336
123     .336
124     .333
125     .323
126     .308
127     .308
128     .304
129     .289
130     .287
131     .282
132     .280
133     .274
134     .272
135     .268
136     .264
137     .253
138     .251
139     .236
140     .225
141     .221
142     .217
143     .216
144     .206
145     .204
146     .195
147     .195
148     .186
149     .178
150     .167
151     .161
152     .149
153     .124
154     .115
155     .111
156     .096
157     .096
158     .053
159     .038
160     .032
161     .030
162     .029
163     .025
164     .021
165     .015
166     .014
167     .011
168     .011
169     .003
170-203 .000
Ranks 204-296 appear on the leaderboard without scores.

Benchmark bibtex

@article{BAKER2022104913,
    title = {Deep learning models fail to capture the configural nature of human shape perception},
    journal = {iScience},
    volume = {25},
    number = {9},
    pages = {104913},
    year = {2022},
    issn = {2589-0042},
    doi = {10.1016/j.isci.2022.104913},
    url = {https://www.sciencedirect.com/science/article/pii/S2589004222011853},
    author = {Nicholas Baker and James H. Elder},
    keywords = {Biological sciences, Neuroscience, Sensory neuroscience},
    abstract = {A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.}
}

Ceiling

0.94

Note that scores are relative to this ceiling.
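For example, under this normalization a raw (unceiled) score of 0.47 (an illustrative number, not taken from the leaderboard) would be reported as 0.47 / 0.94 = 0.5.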

Data: Baker2022fragmented

Metric: accuracy_delta
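The metric name reflects the paper's core comparison: the drop in recognition accuracy from intact to fragmented silhouettes, measured for the model and for humans. The sketch below illustrates that idea only; it is not the Brain-Score implementation, and the linear aggregation and the function name accuracy_delta_similarity are assumptions made for this example.

def accuracy_delta_similarity(model_whole, model_frag, human_whole, human_frag):
    # Accuracy drop from intact to fragmented silhouettes, per system.
    model_delta = model_whole - model_frag
    human_delta = human_whole - human_frag
    # Assumed aggregation: 1.0 when the drops match exactly, decreasing
    # linearly with the absolute difference between them.
    return max(0.0, 1.0 - abs(model_delta - human_delta))

# A model that loses 5 points of accuracy where humans lose 30 is penalized:
print(round(accuracy_delta_similarity(0.90, 0.85, 0.92, 0.62), 2))  # 0.75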