Sample stimuli

[Sample stimuli: images sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark

# Load the ImageNet-C blur top-1 benchmark and score a model on it
benchmark = load_benchmark("ImageNet-C-blur-top1")
score = benchmark(my_model)
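This page covers only the blur corruption category. A model's overall ImageNet-C robustness is often summarized by averaging across corruption categories; a minimal sketch with illustrative (made-up) scores, assuming the other category identifiers (noise, weather, digital) follow the same naming pattern:

```python
# Hypothetical per-category top-1 scores for one model (illustrative values only).
category_scores = {
    "ImageNet-C-noise-top1": 0.41,
    "ImageNet-C-blur-top1": 0.38,
    "ImageNet-C-weather-top1": 0.45,
    "ImageNet-C-digital-top1": 0.47,
}

# Average across the four corruption categories into a single robustness summary.
mean_score = sum(category_scores.values()) / len(category_scores)
print(round(mean_score, 4))  # 0.4275
```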

Model scores


Rank  Score
   1  .623
   2  .612
   3  .596
   4  .576
   5  .560
   6  .548
   7  .542
   8  .538
   9  .533
  10  .528
  11  .525
  12  .525
  13  .516
  14  .514
  15  .514
  16  .510
  17  .505
  18  .503
  19  .500
  20  .494
  21  .483
  22  .475
  23  .474
  24  .471
  25  .468
  26  .463
  27  .463
  28  .456
  29  .455
  30  .448
  31  .444
  32  .442
  33  .436
  34  .434
  35  .433
  36  .431
  37  .431
  38  .426
  39  .416
  40  .416
  41  .414
  42  .413
  43  .413
  44  .412
  45  .409
  46  .406
  47  .404
  48  .404
  49  .403
  50  .399
  51  .396
  52  .395
  53  .392
  54  .391
  55  .390
  56  .388
  57  .388
  58  .388
  59  .385
  60  .384
  61  .383
  62  .383
  63  .382
  64  .381
  65  .381
  66  .379
  67  .379
  68  .378
  69  .377
  70  .375
  71  .375
  72  .371
  73  .370
  74  .368
  75  .366
  76  .366
  77  .366
  78  .365
  79  .365
  80  .364
  81  .363
  82  .363
  83  .362
  84  .361
  85  .361
  86  .361
  87  .360
  88  .360
  89  .359
  90  .359
  91  .357
  92  .357
  93  .357
  94  .357
  95  .356
  96  .355
  97  .355
  98  .354
  99  .354
 100  .352
 101  .352
 102  .351
 103  .351
 104  .351
 105  .349
 106  .349
 107  .348
 108  .346
 109  .345
 110  .345
 111  .343
 112  .340
 113  .339
 114  .335
 115  .331
 116  .330
 117  .329
 118  .326
 119  .324
 120  .324
 121  .321
 122  .321
 123  .320
 124  .312
 125  .312
 126  .312
 127  .312
 128  .307
 129  .306
 130  .306
 131  .306
 132  .305
 133  .301
 134  .298
 135  .297
 136  .289
 137  .288
 138  .287
 139  .285
 140  .284
 141  .282
 142  .280
 143  .279
 144  .276
 145  .275
 146  .274
 147  .273
 148  .271
 149  .267
 150  .266
 151  .264
 152  .263
 153  .260
 154  .260
 155  .260
 156  .259
 157  .259
 158  .258
 159  .258
 160  .257
 161  .256
 162  .256
 163  .255
 164  .255
 165  .252
 166  .252
 167  .249
 168  .248
 169  .248
 170  .244
 171  .240
 172  .239
 173  .239
 174  .237
 175  .234
 176  .232
 177  .231
 178  .231
 179  .229
 180  .229
 181  .229
 182  .225
 183  .218
 184  .218
 185  .216
 186  .214
 187  .214
 188  .211
 189  .210
 190  .207
 191  .206
 192  .206
 193  .204
 194  .202
 195  .201
 196  .195
 197  .193
 198  .192
 199  .187
 200  .187
 201  .187
 202  .187
 203  .187
 204  .185
 205  .183
 206  .181
 207  .179
 208  .179
 209  .179
 210  .179
 211  .179
 212  .175
 213  .173
 214  .171
 215  .170
 216  .168
 217  .165
 218  .163
 219  .156
 220  .155
 221  .154
 222  .154
 223  .146
 224  .144
 225  .132
 226  .132
 227  .130
 228  .127
 229  .127
 230  .120
 231  .119
 232  .108
 233  .102
 234  .020
 235  .020
 236  .006
 237  .003
 238  .003
 239  .002
 240  .002
 241  .002
 242  .001
 243  .001
 244  .001
 245  .001
 246  .001
 247  .001
 248  .001
 249  .001
 250  .001
 251  .001
 252  .001
 253  .001
 254  .001

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
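With a ceiling of 1.00 the normalization is trivial, but in general a reported score is the raw metric value divided by the benchmark ceiling. A minimal sketch (the helper name is illustrative, not part of the Brain-Score API):

```python
def ceiled_score(raw_score: float, ceiling: float) -> float:
    """Divide a raw benchmark score by the benchmark ceiling."""
    return raw_score / ceiling

# With a ceiling of 1.00, the reported score equals the raw top-1 accuracy.
print(ceiled_score(0.623, 1.0))  # 0.623
```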

Data: ImageNet-C-blur

Metric: top1
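Top-1 accuracy counts a prediction as correct only when the single highest-scoring class matches the ground-truth label. A minimal self-contained sketch:

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose top predicted class matches the true label."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Three of four predicted class indices match the labels.
print(top1_accuracy([3, 1, 2, 0], [3, 1, 0, 0]))  # 0.75
```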