Sample stimuli

Thumbnails of ten sample stimuli (sample 0 through sample 9) from the ImageNet-C digital corruption set.

How to use

from brainscore_vision import load_benchmark

benchmark = load_benchmark("ImageNet-C-digital-top1")  # this benchmark's identifier
score = benchmark(my_model)  # my_model must implement the BrainModel interface
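
Alternatively, a registered model can be loaded by its identifier and scored directly. A minimal sketch, assuming the identifier "alexnet" is registered in brainscore_vision (any registered model identifier works the same way):

from brainscore_vision import load_benchmark, load_model

model = load_model("alexnet")  # assumed identifier; wraps the network as a BrainModel
benchmark = load_benchmark("ImageNet-C-digital-top1")
score = benchmark(model)  # ceiling-normalized Score
print(score)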

Model scores

Rank  Score
1  .731
2  .710
3  .699
4  .676
5  .671
6  .671
7  .670
8  .664
9  .656
10  .654
11  .649
12  .646
13  .645
14  .641
15  .638
16  .638
17  .632
18  .631
19  .619
20  .617
21  .615
22  .605
23  .601
24  .598
25  .598
26  .594
27  .591
28  .584
29  .581
30  .580
31  .580
32  .575
33  .575
34  .573
35  .572
36  .571
37  .568
38  .568
39  .567
40  .566
41  .562
42  .562
43  .561
44  .560
45  .560
46  .557
47  .555
48  .546
49  .541
50  .538
51  .534
52  .533
53  .530
54  .530
55  .529
56  .528
57  .528
58  .527
59  .526
60  .520
61  .519
62  .516
63  .515
64  .514
65  .511
66  .508
67  .507
68  .507
69  .507
70  .506
71  .505
72  .504
73  .504
74  .502
75  .502
76  .501
77  .501
78  .501
79  .495
80  .493
81  .490
82  .489
83  .488
84  .488
85  .485
86  .485
87  .485
88  .484
89  .480
90  .477
91  .474
92  .473
93  .473
94  .472
95  .470
96  .468
97  .460
98  .458
99  .450
100  .449
101  .449
102  .449
103  .444
104  .439
105  .437
106  .435
107  .433
108  .432
109  .431
110  .428
111  .426
112  .421
113  .419
114  .417
115  .409
116  .407
117  .407
118  .405
119  .404
120  .402
121  .399
122  .398
123  .393
124  .392
125  .391
126  .391
127  .391
128  .391
129  .390
130  .388
131  .388
132  .387
133  .386
134  .385
135  .384
136  .383
137  .383
138  .381
139  .379
140  .378
141  .377
142  .377
143  .375
144  .374
145  .373
146  .373
147  .372
148  .370
149  .369
150  .369
151  .368
152  .365
153  .360
154  .358
155  .357
156  .355
157  .350
158  .350
159  .348
160  .347
161  .342
162  .339
163  .338
164  .334
165  .333
166  .332
167  .328
168  .326
169  .324
170  .321
171  .319
172  .297
173  .297
174  .295
175  .294
176  .291
177  .291
178  .291
179  .291
180  .288
181  .287
182  .278
183  .277
184  .277
185  .275
186  .273
187  .272
188  .272
189  .271
190  .269
191  .268
192  .266
193  .261
194  .256
195  .251
196  .249
197  .246
198  .236
199  .228
200  .216
201  .213
202  .211
203  .203
204  .196
205  .193
206  .188
207  .028
208  .028
209  .006
210  .006
211  .004
212  .003
213  .002
214  .001
215  .001
216  .001
217  .001
218  .001
219  .001
220  .001
221  .001
222  .001
223  .001
224  .001
225  .001
226  .001
227  .001
228  .001
Ranks 229 through 396: no scores shown.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.

Data: ImageNet-C-digital

Metric: top1
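
The top-1 metric counts a stimulus as correct when the model's highest-scoring class matches the ground-truth label. A minimal sketch of that computation, assuming logits holds per-class scores of shape (n_stimuli, n_classes) and labels holds the ground-truth class indices:

import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    predictions = logits.argmax(axis=1)  # highest-scoring class per stimulus
    return float((predictions == labels).mean())

Because the ceiling is 1.00, the ceiling-relative score reported above equals the raw top-1 accuracy.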