Sample stimuli

[Images: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark

# Load the benchmark by its identifier, then call it on a model.
# `my_model` is any model wrapped for Brain-Score's model interface.
benchmark = load_benchmark("ImageNet-C-digital-top1")
score = benchmark(my_model)

Model scores

Rank and score pairs (model names are not preserved in this extract):

  1 .731     2 .713     3 .710     4 .699     5 .685
  6 .676     7 .671     8 .671     9 .670    10 .664
 11 .656    12 .655    13 .654    14 .649    15 .646
 16 .645    17 .641    18 .638    19 .638    20 .632
 21 .631    22 .624    23 .619    24 .617    25 .615
 26 .605    27 .601    28 .598    29 .598    30 .596
 31 .594    32 .591    33 .584    34 .581    35 .580
 36 .580    37 .579    38 .575    39 .575    40 .573
 41 .572    42 .571    43 .568    44 .568    45 .567
 46 .566    47 .562    48 .562    49 .561    50 .560
 51 .560    52 .557    53 .555    54 .546    55 .541
 56 .538    57 .536    58 .534    59 .533    60 .530
 61 .530    62 .529    63 .528    64 .528    65 .527
 66 .526    67 .523    68 .520    69 .519    70 .516
 71 .516    72 .515    73 .514    74 .511    75 .509
 76 .509    77 .508    78 .508    79 .507    80 .507
 81 .507    82 .507    83 .507    84 .507    85 .506
 86 .506    87 .505    88 .505    89 .505    90 .504
 91 .504    92 .504    93 .503    94 .502    95 .502
 96 .502    97 .502    98 .501    99 .501   100 .501
101 .500   102 .500   103 .495   104 .494   105 .493
106 .490   107 .489   108 .488   109 .488   110 .485
111 .485   112 .485   113 .484   114 .480   115 .477
116 .474   117 .473   118 .473   119 .472   120 .470
121 .468   122 .460   123 .458   124 .450   125 .449
126 .449   127 .449   128 .444   129 .439   130 .437
131 .435   132 .433   133 .432   134 .431   135 .431
136 .428   137 .426   138 .421   139 .421   140 .419
141 .417   142 .409   143 .407   144 .407   145 .405
146 .404   147 .402   148 .399   149 .398   150 .393
151 .392   152 .391   153 .391   154 .391   155 .391
156 .390   157 .388   158 .388   159 .387   160 .386
161 .385   162 .384   163 .383   164 .383   165 .381
166 .379   167 .378   168 .377   169 .377   170 .375
171 .374   172 .373   173 .373   174 .372   175 .370
176 .369   177 .369   178 .368   179 .365   180 .360
181 .358   182 .357   183 .355   184 .350   185 .350
186 .348   187 .347   188 .342   189 .339   190 .338
191 .334   192 .333   193 .332   194 .328   195 .326
196 .324   197 .321   198 .319   199 .297   200 .297
201 .295   202 .295   203 .295   204 .295   205 .294
206 .290   207 .290   208 .290   209 .290   210 .288
211 .287   212 .278   213 .277   214 .277   215 .275
216 .273   217 .272   218 .272   219 .271   220 .269
221 .268   222 .261   223 .256   224 .251   225 .249
226 .246   227 .236   228 .228   229 .216   230 .213
231 .211   232 .203   233 .196   234 .193   235 .188
236 .028   237 .028   238 .006   239 .006   240 .004
241 .003   242 .002   243 .001   244 .001   245 .001
246 .001   247 .001   248 .001   249 .001   250 .001
251 .001   252 .001   253 .001   254 .001   255 .001
256 .001   257 .001   258 .001   259 .001   260 .001

Ranks 261-455 appear in the source without scores.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that scores are relative to this ceiling.
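Ceiling normalization simply divides a raw metric value by the ceiling; since this benchmark's ceiling is 1.00, the reported scores equal the raw metric values. A minimal sketch (the function name and example values are illustrative, not part of the Brain-Score API):

```python
def ceiling_normalize(raw_score: float, ceiling: float) -> float:
    """Express a raw benchmark score as a fraction of the ceiling."""
    return raw_score / ceiling

# With this benchmark's ceiling of 1.00, normalization is the identity:
print(ceiling_normalize(0.731, 1.00))  # -> 0.731
```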

Data: ImageNet-C-digital

Metric: top1
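The top1 metric is the fraction of stimuli for which the model's single highest-scoring class matches the ground-truth label. A minimal, library-free sketch of that computation (toy inputs, not actual benchmark data or Brain-Score code):

```python
def top1_accuracy(predictions, labels):
    """Fraction of items where the argmax prediction equals the label.

    predictions: one list of per-class scores per stimulus
    labels: one ground-truth class index per stimulus
    """
    correct = 0
    for scores, label in zip(predictions, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += (predicted == label)
    return correct / len(labels)

# Toy example: 3 stimuli, 3 classes; argmaxes are 1, 0, 2,
# so 2 of 3 match the labels [1, 0, 1].
preds = [[0.1, 0.7, 0.2],
         [0.6, 0.3, 0.1],
         [0.2, 0.2, 0.6]]
print(top1_accuracy(preds, [1, 0, 1]))  # 2/3
```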