Sample stimuli

[Images: ten sample stimuli, sample 0 through sample 9]

How to use

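from brainscore_vision import load_benchmark

# Load the benchmark by its identifier.
benchmark = load_benchmark("ImageNet-C-noise-top1")
# my_model is a placeholder for the model you want to evaluate.
score = benchmark(my_model)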

Model scores

Rank     Score
1        .742
2        .715
3        .708
4        .708
5        .700
6        .692
7        .689
8        .671
9        .670
10       .651
11       .636
12       .634
13       .633
14       .631
15       .630
16       .618
17       .614
18       .608
19       .597
20       .595
21       .594
22       .593
23       .589
24       .587
25       .582
26       .577
27       .570
28       .566
29       .561
30       .555
31       .554
32       .548
33       .548
34       .548
35       .539
36       .538
37       .534
38       .532
39       .527
40       .526
41       .523
42       .523
43       .517
44       .517
45       .514
46       .513
47       .513
48       .512
49       .511
50       .504
51       .499
52       .491
53       .484
54       .482
55       .476
56       .470
57       .462
58       .457
59       .456
60       .454
61       .448
62       .444
63       .442
64       .428
65       .426
66       .420
67       .413
68       .412
69       .408
70       .405
71       .404
72       .400
73       .399
74       .399
75       .396
76       .391
77       .390
78       .389
79       .389
80       .388
81       .388
82       .388
83       .387
84       .386
85       .385
86       .381
87       .381
88       .381
89       .379
90       .378
91       .378
92       .376
93       .375
94       .374
95       .373
96       .369
97       .369
98       .366
99       .366
100      .365
101      .364
102      .363
103      .363
104      .362
105      .360
106      .358
107      .357
108      .355
109      .353
110      .349
111      .344
112      .344
113      .344
114      .341
115      .338
116      .337
117      .334
118      .332
119      .330
120      .327
121      .327
122      .325
123      .317
124      .316
125      .309
126      .304
127      .304
128      .302
129      .296
130      .284
131      .284
132      .283
133      .282
134      .272
135      .269
136      .267
137      .265
138      .262
139      .262
140      .260
141      .256
142      .253
143      .253
144      .252
145      .252
146      .250
147      .248
148      .248
149      .248
150      .245
151      .244
152      .243
153      .242
154      .239
155      .237
156      .228
157      .228
158      .228
159      .228
160      .228
161      .227
162      .227
163      .226
164      .222
165      .217
166      .215
167      .214
168      .210
169      .210
170      .209
171      .209
172      .207
173      .204
174      .201
175      .199
176      .199
177      .196
178      .196
179      .195
180      .194
181      .194
182      .194
183      .194
184      .194
185      .194
186      .193
187      .193
188      .186
189      .186
190      .184
191      .182
192      .178
193      .178
194      .176
195      .174
196      .173
197      .170
198      .169
199      .166
200      .152
201      .151
202      .151
203      .148
204      .147
205      .141
206      .139
207      .139
208      .136
209      .131
210      .131
211      .130
212      .128
213      .125
214      .124
215      .123
216      .121
217      .118
218      .115
219      .114
220      .113
221      .111
222      .110
223      .109
224      .107
225      .105
226      .105
227      .104
228      .104
229      .104
230      .104
231      .100
232      .096
233      .092
234      .090
235      .087
236      .086
237      .079
238      .077
239      .077
240      .059
241      .053
242      .043
243      .012
244      .007
245      .006
246      .003
247      .003
248      .002
249      .001
250      .001
251      .001
252      .001
253      .001
254      .001
255      .001
256      .001
257      .001
258      .001
259      .001
260      .001
261      .000
262-455  (no score listed)

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
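
One simple reading of "relative to this ceiling" is that a raw score is divided by the ceiling. The sketch below assumes exactly that; the function name normalize_by_ceiling is hypothetical and is not taken from the benchmark's code. With a ceiling of 1.00 the normalized score equals the raw score.

```python
def normalize_by_ceiling(raw_score: float, ceiling: float) -> float:
    """Express a raw score as a fraction of the ceiling.

    Assumption: normalization is a plain division. The benchmark's own
    implementation may differ.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw_score / ceiling


CEILING = 1.00  # ceiling stated for this benchmark

# With a ceiling of 1.00, dividing leaves the score unchanged.
normalized = normalize_by_ceiling(0.742, CEILING)

# Illustrative only: a different (made-up) ceiling rescales the score.
rescaled = normalize_by_ceiling(0.4, 0.8)
```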

Data: ImageNet-C-noise

Metric: top1
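
As a rough illustration of the top1 metric: top-1 accuracy is the fraction of images for which the model's single highest-scoring label matches the true label. The sketch below is a minimal standalone version; the function name, the dictionary-based input format, and the example labels are all made up for illustration and do not reflect the benchmark's internal code.

```python
def top1_accuracy(predicted_scores, true_labels):
    """Fraction of items whose highest-scoring label equals the true label.

    predicted_scores: list of dicts mapping label -> score, one per image
    true_labels: list of ground-truth labels, in the same order
    (Both input formats are assumptions made for this sketch.)
    """
    if len(predicted_scores) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one item is required")
    correct = 0
    for scores, truth in zip(predicted_scores, true_labels):
        # The top-1 prediction is the label with the largest score.
        predicted = max(scores, key=scores.get)
        if predicted == truth:
            correct += 1
    return correct / len(true_labels)


# Made-up example: four images, three candidate labels.
example_scores = [
    {"cat": 0.7, "dog": 0.2, "car": 0.1},
    {"cat": 0.3, "dog": 0.6, "car": 0.1},
    {"cat": 0.5, "dog": 0.1, "car": 0.4},
    {"cat": 0.2, "dog": 0.2, "car": 0.6},
]
example_labels = ["cat", "dog", "car", "car"]

# Three of the four top predictions match, so the accuracy is 0.75.
accuracy = top1_accuracy(example_scores, example_labels)
```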