Sample stimuli

[Ten sample stimulus images: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark

# my_model is a placeholder: it must be a Brain-Score model you have already loaded or defined.
benchmark = load_benchmark("ImageNet-C-digital-top1")
score = benchmark(my_model)

Model scores

Rank  Score
1     .731
2     .713
3     .710
4     .699
5     .685
6     .676
7     .671
8     .671
9     .670
10    .664
11    .656
12    .655
13    .654
14    .649
15    .646
16    .645
17    .641
18    .638
19    .638
20    .632
21    .631
22    .624
23    .619
24    .617
25    .615
26    .605
27    .601
28    .598
29    .598
30    .596
31    .594
32    .591
33    .584
34    .581
35    .580
36    .580
37    .579
38    .575
39    .575
40    .573
41    .572
42    .571
43    .568
44    .568
45    .567
46    .566
47    .562
48    .562
49    .561
50    .560
51    .560
52    .557
53    .555
54    .546
55    .541
56    .538
57    .536
58    .534
59    .533
60    .530
61    .530
62    .529
63    .528
64    .528
65    .527
66    .526
67    .523
68    .520
69    .519
70    .516
71    .515
72    .514
73    .513
74    .511
75    .509
76    .509
77    .508
78    .507
79    .507
80    .507
81    .507
82    .506
83    .505
84    .505
85    .505
86    .504
87    .504
88    .503
89    .502
90    .502
91    .501
92    .501
93    .501
94    .500
95    .495
96    .493
97    .490
98    .489
99    .488
100   .488
101   .485
102   .485
103   .485
104   .484
105   .480
106   .477
107   .474
108   .473
109   .473
110   .472
111   .470
112   .468
113   .460
114   .458
115   .450
116   .449
117   .449
118   .449
119   .444
120   .439
121   .437
122   .435
123   .433
124   .432
125   .431
126   .431
127   .428
128   .426
129   .421
130   .421
131   .419
132   .417
133   .409
134   .407
135   .407
136   .405
137   .404
138   .402
139   .399
140   .398
141   .393
142   .392
143   .391
144   .391
145   .391
146   .391
147   .390
148   .388
149   .388
150   .387
151   .386
152   .385
153   .384
154   .383
155   .383
156   .381
157   .379
158   .378
159   .377
160   .377
161   .375
162   .374
163   .373
164   .373
165   .372
166   .370
167   .369
168   .369
169   .368
170   .365
171   .360
172   .358
173   .357
174   .355
175   .350
176   .350
177   .348
178   .347
179   .342
180   .339
181   .338
182   .334
183   .333
184   .332
185   .328
186   .326
187   .324
188   .321
189   .319
190   .297
191   .297
192   .295
193   .295
194   .295
195   .295
196   .294
197   .290
198   .290
199   .290
200   .290
201   .288
202   .287
203   .278
204   .277
205   .277
206   .275
207   .273
208   .272
209   .272
210   .271
211   .269
212   .268
213   .261
214   .256
215   .251
216   .249
217   .246
218   .236
219   .235
220   .228
221   .216
222   .213
223   .211
224   .203
225   .196
226   .193
227   .188
228   .028
229   .028
230   .006
231   .006
232   .004
233   .003
234   .002
235   .001
236   .001
237   .001
238   .001
239   .001
240   .001
241   .001
242   .001
243   .001
244   .001
245   .001
246   .001
247   .001
248   .001
249   .001
250   .001
Ranks 251 to 455: no score listed.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
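
To make the ceiling note concrete, here is a minimal sketch of ceiling normalization, assuming the normalized score is simply the raw score divided by the ceiling. The function name `ceiling_normalize` is hypothetical and is not part of the Brain-Score API; the platform's own normalization may differ in detail.

```python
def ceiling_normalize(raw_score: float, ceiling: float = 1.0) -> float:
    """Express a raw score as a fraction of the benchmark ceiling.

    Assumption: normalization is a plain division by the ceiling.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw_score / ceiling


# With this benchmark's ceiling of 1.00, a raw score is unchanged.
top_score = ceiling_normalize(0.731, ceiling=1.0)
```

Under this assumption, a ceiling of 1.00 means the listed scores can be read directly as raw values.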

Data: ImageNet-C-digital

Metric: top1
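
As an illustration of the top1 metric, the sketch below computes top-1 accuracy: the fraction of images whose single highest-ranked predicted label matches the true label. The function name `top1_accuracy` and the example labels are hypothetical; this is not the benchmark's actual implementation.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of items whose top prediction equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one label is required")
    # Count exact matches between each top prediction and its true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: three of the four predictions are correct.
accuracy = top1_accuracy(
    ["cat", "dog", "cat", "bird"],
    ["cat", "dog", "dog", "bird"],
)
```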