Sample stimuli

Ten sample stimuli (sample 0 through sample 9) are shown as thumbnails on the benchmark page.

How to use

from brainscore_vision import load_benchmark

benchmark = load_benchmark("ImageNet-C-digital-top1")  # load the benchmark by its identifier
score = benchmark(my_model)  # my_model must implement the Brain-Score model interface
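
For an end-to-end run, a model that is already registered with Brain-Score can be loaded by its identifier and scored directly. The sketch below is illustrative: it assumes the installed brainscore_vision version exposes load_model and that "alexnet" is an available identifier in the model registry; substitute any registered model.

from brainscore_vision import load_benchmark, load_model

model = load_model("alexnet")  # assumed registered model identifier; any registry entry works
benchmark = load_benchmark("ImageNet-C-digital-top1")
score = benchmark(model)  # returns a Score whose value is the ceiling-normalized top-1 accuracy
print(score)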

Model scores

Top-1 scores (relative to the ceiling) for the 229 ranked models, from rank 1 to rank 229; the corresponding model identifiers are listed in the leaderboard's Model column:

Ranks 1-10:    .731 .710 .699 .676 .671 .671 .670 .664 .656 .654
Ranks 11-20:   .649 .646 .645 .641 .638 .638 .632 .631 .619 .617
Ranks 21-30:   .615 .605 .601 .598 .598 .594 .591 .584 .581 .580
Ranks 31-40:   .580 .575 .575 .573 .572 .571 .568 .568 .567 .566
Ranks 41-50:   .562 .562 .561 .560 .560 .557 .555 .546 .541 .538
Ranks 51-60:   .534 .533 .530 .530 .529 .528 .528 .527 .526 .523
Ranks 61-70:   .520 .519 .516 .515 .514 .511 .508 .507 .507 .507
Ranks 71-80:   .506 .505 .504 .504 .502 .502 .501 .501 .501 .495
Ranks 81-90:   .493 .490 .489 .488 .488 .485 .485 .485 .484 .480
Ranks 91-100:  .477 .474 .473 .473 .472 .470 .468 .460 .458 .450
Ranks 101-110: .449 .449 .449 .444 .439 .437 .435 .433 .432 .431
Ranks 111-120: .428 .426 .421 .419 .417 .409 .407 .407 .405 .404
Ranks 121-130: .402 .399 .398 .393 .392 .391 .391 .391 .391 .390
Ranks 131-140: .388 .388 .387 .386 .385 .384 .383 .383 .381 .379
Ranks 141-150: .378 .377 .377 .375 .374 .373 .373 .372 .370 .369
Ranks 151-160: .369 .368 .365 .360 .358 .357 .355 .350 .350 .348
Ranks 161-170: .347 .342 .339 .338 .334 .333 .332 .328 .326 .324
Ranks 171-180: .321 .319 .297 .297 .295 .294 .291 .291 .291 .291
Ranks 181-190: .288 .287 .278 .277 .277 .275 .273 .272 .272 .271
Ranks 191-200: .269 .268 .266 .261 .256 .251 .249 .246 .236 .228
Ranks 201-210: .216 .213 .211 .203 .196 .193 .188 .028 .028 .006
Ranks 211-220: .006 .004 .003 .002 .001 .001 .001 .001 .001 .001
Ranks 221-229: .001 .001 .001 .001 .001 .001 .001 .001 .001

Ranks 230-405 are listed without scores.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that scores are relative to this ceiling: the reported score is the raw benchmark score divided by the ceiling, so a ceiling of 1.00 means the reported value equals the raw top-1 accuracy.
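
A minimal sketch of that normalization (assuming the ceiling is a plain scalar, as the 1.00 above suggests; the numbers are illustrative):

raw_top1 = 0.731                      # example: raw top-1 accuracy of the rank-1 model
ceiling = 1.00                        # ceiling of this benchmark
reported_score = raw_top1 / ceiling   # the leaderboard shows this ceiling-normalized value (.731)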

Data: ImageNet-C-digital (the "digital" corruption category of Hendrycks & Dietterich, 2019: contrast, elastic transform, pixelate, and JPEG compression, applied to ImageNet validation images at five severity levels)

Metric: top1
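
For reference, top-1 accuracy is the fraction of corrupted images for which the model's single highest-confidence label matches the ground-truth label. A minimal sketch (the function name and inputs below are illustrative, not the benchmark's internal API):

import numpy as np

def top1_accuracy(predicted_labels, true_labels):
    # Fraction of images whose top-1 predicted label equals the ground-truth label.
    predicted_labels = np.asarray(predicted_labels)
    true_labels = np.asarray(true_labels)
    return float(np.mean(predicted_labels == true_labels))

# Illustrative usage on three images (two correct out of three):
print(top1_accuracy(["goldfish", "tabby", "airliner"], ["goldfish", "tabby", "warplane"]))  # 0.666...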