Sample stimuli

[Grid of 10 sample stimulus images (sample 0 through sample 9) from the ImageNet-C-blur dataset.]

How to use

from brainscore_vision import load_benchmark
benchmark = load_benchmark("ImageNet-C-blur-top1")
score = benchmark(my_model)
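In Brain-Score, a benchmark is a callable that takes a model and returns a score. The sketch below illustrates only that call pattern with self-contained stand-ins — `ToyBenchmark` and `constant_model` are hypothetical names for illustration, not part of `brainscore_vision`, and the real benchmark evaluates on ImageNet-C stimuli rather than a tiny label list:

```python
# Illustrative sketch of the benchmark-as-callable pattern (hypothetical names).
class ToyBenchmark:
    def __init__(self, labels):
        self.labels = labels

    def __call__(self, model):
        # Ask the model for one prediction per stimulus, then score top-1 style.
        predictions = model(len(self.labels))
        correct = sum(p == label for p, label in zip(predictions, self.labels))
        return correct / len(self.labels)

def constant_model(n):
    # A trivial "model" that always predicts class 0.
    return [0] * n

benchmark = ToyBenchmark(labels=[0, 1, 0, 0])
score = benchmark(constant_model)
print(score)  # 3 of 4 predictions correct -> 0.75
```

The real `load_benchmark("ImageNet-C-blur-top1")` call follows the same shape: the returned object is invoked directly on the model and yields a score between 0 and the ceiling.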

Model scores

Rank    Score
1       .623
2       .596
3       .576
4       .548
5       .542
6       .538
7       .533
8       .525
9       .525
10      .516
11      .514
12      .514
13      .510
14      .505
15      .503
16      .500
17      .494
18      .483
19      .475
20      .474
21      .468
22      .463
23      .463
24      .456
25      .455
26      .448
27      .442
28      .436
29      .434
30      .433
31      .431
32      .431
33      .426
34      .416
35      .416
36      .414
37      .413
38      .413
39      .412
40      .409
41      .406
42      .404
43      .404
44      .403
45      .399
46      .396
47      .395
48      .392
49      .391
50      .390
51      .388
52      .388
53      .388
54      .385
55      .384
56      .383
57      .381
58      .381
59      .379
60      .379
61      .378
62      .377
63      .375
64      .375
65      .371
66      .370
67      .368
68      .366
69      .366
70      .364
71      .363
72      .362
73      .361
74      .361
75      .357
76      .357
77      .355
78      .355
79      .354
80      .352
81      .351
82      .351
83      .351
84      .349
85      .349
86      .348
87      .346
88      .345
89      .345
90      .343
91      .340
92      .339
93      .335
94      .331
95      .330
96      .326
97      .324
98      .324
99      .321
100     .321
101     .320
102     .312
103     .312
104     .312
105     .312
106     .307
107     .306
108     .306
109     .306
110     .305
111     .301
112     .298
113     .297
114     .289
115     .288
116     .287
117     .285
118     .284
119     .282
120     .280
121     .279
122     .276
123     .275
124     .274
125     .273
126     .271
127     .267
128     .266
129     .264
130     .263
131     .260
132     .260
133     .260
134     .259
135     .259
136     .258
137     .258
138     .257
139     .256
140     .256
141     .255
142     .255
143     .252
144     .252
145     .249
146     .248
147     .248
148     .244
149     .240
150     .239
151     .239
152     .237
153     .234
154     .232
155     .231
156     .231
157     .229
158     .229
159     .229
160     .225
161     .218
162     .218
163     .216
164     .214
165     .214
166     .211
167     .210
168     .207
169     .206
170     .206
171     .204
172     .202
173     .201
174     .195
175     .193
176     .192
177     .190
178     .190
179     .190
180     .190
181     .187
182     .185
183     .183
184     .181
185     .179
186     .179
187     .175
188     .174
189     .173
190     .171
191     .170
192     .168
193     .165
194     .163
195     .156
196     .155
197     .154
198     .154
199     .146
200     .144
201     .132
202     .132
203     .130
204     .127
205     .127
206     .120
207     .119
208     .108
209     .102
210     .020
211     .020
212     .006
213     .003
214     .003
215     .002
216     .002
217     .002
218     .001
219     .001
220     .001
221     .001
222     .001
223     .001
224     .001
225     .001
226     .001
227     .001

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.

Data: ImageNet-C-blur

Metric: top1
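The top1 metric counts a prediction as correct only when the model's single highest-scoring class matches the ground-truth label. A minimal sketch in plain Python (the function name `top1_accuracy` is illustrative, not the library's internal implementation):

```python
# Top-1 accuracy: fraction of stimuli where argmax(logits) equals the label.
def top1_accuracy(logits, labels):
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == label  # argmax of each row
        for row, label in zip(logits, labels)
    )
    return correct / len(labels)

logits = [
    [0.1, 0.7, 0.2],    # argmax 1
    [0.5, 0.3, 0.2],    # argmax 0
    [0.2, 0.3, 0.5],    # argmax 2
    [0.9, 0.05, 0.05],  # argmax 0
]
labels = [1, 0, 2, 1]
print(top1_accuracy(logits, labels))  # 3 of 4 correct -> 0.75
```

With a ceiling of 1.00, the reported benchmark score equals this raw top-1 accuracy; a lower ceiling would instead divide into it.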