Sample stimuli

[Ten sample stimulus images (sample 0 through sample 9) from the ImageNet-C digital-corruption set are shown here as thumbnails.]

How to use

from brainscore_vision import load_benchmark

benchmark = load_benchmark("ImageNet-C-digital-top1")
score = benchmark(my_model)  # my_model: your model wrapped as a BrainModel
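
A fuller, runnable sketch, assuming you want to score a model that is already registered with brainscore_vision (the identifier "alexnet" below is only an illustrative choice, not a recommendation):

from brainscore_vision import load_benchmark, load_model

model = load_model("alexnet")  # illustrative identifier; substitute any registered model
benchmark = load_benchmark("ImageNet-C-digital-top1")
score = benchmark(model)
print(score)  # score relative to the benchmark ceiling (1.00 for this benchmark)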

Model scores

[Leaderboard omitted from this export: the ranked table lists Rank, Model, and Score, but the model-name column was not captured. 239 ranked entries carry scores, from .731 at rank 1 down to .001 at rank 239; entries ranked 240 through 418 appear without scores.]

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
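
As a minimal sketch of what "relative to the ceiling" means (an assumption about the normalization, not the library's internal code), the reported score is the raw metric value divided by the ceiling; with a ceiling of 1.00, the ceiled score equals the raw top-1 accuracy.

# illustrative normalization only, assuming ceiled = raw / ceiling
raw_accuracy = 0.731      # hypothetical raw top-1 accuracy
ceiling = 1.00            # this benchmark's ceiling
ceiled_score = raw_accuracy / ceiling   # 0.731, unchanged because the ceiling is 1.00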

Data: ImageNet-C-digital (the digital corruption category of ImageNet-C: contrast, elastic transform, pixelate, and JPEG compression)

Metric: top1 (top-1 classification accuracy)
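
The top-1 metric counts a stimulus as correct when the model's single most probable label matches the ground-truth label. A minimal sketch of that computation (not the Brain-Score implementation itself):

import numpy as np

def top1_accuracy(predicted_labels, true_labels):
    """Fraction of stimuli whose top predicted label matches the ground-truth label."""
    predicted_labels = np.asarray(predicted_labels)
    true_labels = np.asarray(true_labels)
    return float((predicted_labels == true_labels).mean())

# example: top1_accuracy(["dog", "cat", "dog"], ["dog", "cat", "cow"]) -> 0.667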