Sample stimuli

[Images: ten sample stimuli, sample 0 through sample 9]

How to use

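from brainscore_vision import load_benchmark
# Load the benchmark by its identifier.
benchmark = load_benchmark("Baker2022fragmented-accuracy_delta")
# my_model is your own model object; it is not defined in this snippet.
score = benchmark(my_model)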
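The snippet above leaves `my_model` undefined. As a minimal sketch, the helper below wraps the same two steps and obtains the model through `brainscore_vision.load_model`; the helper name `score_registered_model` and the example identifier `"alexnet"` are assumptions made for illustration and do not come from this page.

```python
DEFAULT_BENCHMARK = "Baker2022fragmented-accuracy_delta"


def score_registered_model(model_identifier, benchmark_identifier=DEFAULT_BENCHMARK):
    """Load a model and a benchmark by identifier and return the benchmark's score.

    Hypothetical convenience wrapper; it assumes the model identifier is
    registered with brainscore_vision.
    """
    # Imported lazily so the helper can be defined without the package installed.
    from brainscore_vision import load_benchmark, load_model

    model = load_model(model_identifier)
    benchmark = load_benchmark(benchmark_identifier)
    return benchmark(model)


# Example call (assumed identifier; may download stimuli and model weights):
# score = score_registered_model("alexnet")
```

Keeping the import inside the function is a design choice for this sketch only: it lets the helper be defined and inspected on a machine where the package is not yet installed.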

Model scores


Rank     Score    (Model column: names not available)
1        .987
2        .987
3        .986
4        .986
5        .985
6        .984
7        .984
8        .984
9        .983
10       .982
11       .982
12       .982
13       .981
14       .978
15       .975
16       .970
17       .965
18       .964
19       .960
20       .960
21       .960
22       .957
23       .954
24       .946
25       .945
26       .944
27       .944
28       .937
29       .935
30       .926
31       .925
32       .917
33       .909
34       .903
35       .901
36       .901
37       .901
38       .889
39       .882
40       .881
41       .868
42       .860
43       .858
44       .858
45       .838
46       .836
47       .836
48       .834
49       .832
50       .832
51       .822
52       .811
53       .806
54       .803
55       .802
56       .799
57       .796
58       .791
59       .788
60       .787
61       .785
62       .774
63       .760
64       .758
65       .756
66       .755
67       .751
68       .740
69       .739
70       .735
71       .734
72       .734
73       .730
74       .723
75       .721
76       .720
77       .709
78       .698
79       .691
80       .684
81       .671
82       .670
83       .663
84       .656
85       .649
86       .646
87       .626
88       .617
89       .603
90       .602
91       .597
92       .592
93       .590
94       .583
95       .582
96       .575
97       .566
98       .558
99       .558
100      .550
101      .543
102      .541
103      .538
104      .532
105      .528
106      .524
107      .523
108      .515
109      .507
110      .499
111      .494
112      .478
113      .473
114      .470
115      .446
116      .445
117      .438
118      .433
119      .424
120      .421
121      .417
122      .412
123      .412
124      .412
125      .411
126      .400
127      .392
128      .392
129      .388
130      .365
131      .350
132      .336
133      .336
134      .333
135      .323
136      .308
137      .308
138      .304
139      .289
140      .287
141      .282
142      .280
143      .274
144      .272
145      .268
146      .264
147      .251
148      .236
149      .221
150      .217
151      .216
152      .204
153      .195
154      .195
155      .186
156      .178
157      .167
158      .161
159      .149
160      .124
161      .115
162      .111
163      .096
164      .096
165      .053
166      .038
167      .032
168      .030
169      .029
170      .021
171      .015
172      .014
173      .011
174      .011
175      .003
176-209  .000 (each)
210-302  (no score listed)

Benchmark bibtex

@article{BAKER2022104913,
  title = {Deep learning models fail to capture the configural nature of human shape perception},
  journal = {iScience},
  volume = {25},
  number = {9},
  pages = {104913},
  year = {2022},
  issn = {2589-0042},
  doi = {https://doi.org/10.1016/j.isci.2022.104913},
  url = {https://www.sciencedirect.com/science/article/pii/S2589004222011853},
  author = {Nicholas Baker and James H. Elder},
  keywords = {Biological sciences, Neuroscience, Sensory neuroscience},
  abstract = {Summary
  A hallmark of human object perception is sensitivity to the holistic configuration of the local shape features of an object. Deep convolutional neural networks (DCNNs) are currently the dominant models for object recognition processing in the visual cortex, but do they capture this configural sensitivity? To answer this question, we employed a dataset of animal silhouettes and created a variant of this dataset that disrupts the configuration of each object while preserving local features. While human performance was impacted by this manipulation, DCNN performance was not, indicating insensitivity to object configuration. Modifications to training and architecture to make networks more brain-like did not lead to configural processing, and none of the networks were able to accurately predict trial-by-trial human object judgements. We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition.}
}

Ceiling

0.94. Scores are reported relative to this ceiling.
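To illustrate what "relative to this ceiling" could mean, here is a minimal sketch that assumes the relation is a simple division of a raw score by the ceiling. That assumption is not confirmed by this page, and the function name `relative_to_ceiling` is hypothetical.

```python
CEILING = 0.94  # ceiling value stated on this page


def relative_to_ceiling(raw_score, ceiling=CEILING):
    """Express a raw score as a fraction of the ceiling.

    Assumption: plain division. The benchmark's actual normalization
    may differ from this.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw_score / ceiling


# Under this assumption, a raw score of half the ceiling maps to about 0.5.
example = relative_to_ceiling(0.47)
```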

Data: Baker2022fragmented

Metric: accuracy_delta
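The cited abstract compares performance on intact silhouettes with performance on a variant whose configuration is disrupted. The sketch below shows that kind of accuracy difference on made-up trial outcomes. It is an illustration of the idea only: the function names and data are hypothetical, and it is not the benchmark's actual `accuracy_delta` implementation.

```python
def accuracy(correct_flags):
    """Fraction of trials answered correctly."""
    return sum(correct_flags) / len(correct_flags)


def accuracy_drop(intact_correct, disrupted_correct):
    """Accuracy on intact stimuli minus accuracy on disrupted stimuli."""
    return accuracy(intact_correct) - accuracy(disrupted_correct)


# Made-up outcomes (True = correct), for illustration only.
human_drop = accuracy_drop(
    [True, True, True, True, False],    # intact: 4 of 5 correct
    [True, False, True, False, False],  # disrupted: 2 of 5 correct
)
model_drop = accuracy_drop(
    [True, True, True, False, False],   # intact: 3 of 5 correct
    [True, True, True, False, False],   # disrupted: 3 of 5 correct
)

# A large drop suggests sensitivity to configuration; a drop near zero
# suggests the disruption had little effect on performance.
print(f"human drop: {human_drop:.2f}, model drop: {model_drop:.2f}")
```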