Sample stimuli

(Thumbnail grid of 10 example stimulus images, sample 0 through sample 9.)

How to use

from brainscore_vision import load_benchmark

benchmark = load_benchmark("ImageNet-C-noise-top1")
score = benchmark(my_model)  # my_model must implement the Brain-Score BrainModel interface
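A benchmark in the snippet above is essentially a callable that takes a model and returns a score. The sketch below illustrates that pattern only; `Top1Benchmark` and `DummyModel` are hypothetical stand-ins, not part of brainscore_vision, and a real model must implement the library's BrainModel interface.

```python
class Top1Benchmark:
    """Sketch of a benchmark: scores a model by top-1 accuracy on fixed stimuli."""

    def __init__(self, stimuli, labels, ceiling=1.0):
        self.stimuli = stimuli
        self.labels = labels
        self.ceiling = ceiling

    def __call__(self, model):
        predictions = model(self.stimuli)
        correct = sum(p == t for p, t in zip(predictions, self.labels))
        raw = correct / len(self.labels)
        # Reported scores are relative to the benchmark ceiling (1.00 here).
        return raw / self.ceiling


class DummyModel:
    def __call__(self, stimuli):
        return [0 for _ in stimuli]  # always predicts class 0


benchmark = Top1Benchmark(stimuli=[0, 1, 2, 3], labels=[0, 0, 1, 1])
score = benchmark(DummyModel())  # 2 of 4 correct -> 0.5
```

The real library's `load_benchmark` returns an object with this same call signature, so any conforming model can be scored with one call.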

Model scores

Rank  Score
1  .742
2  .715
3  .708
4  .708
5  .700
6  .692
7  .689
8  .671
9  .670
10  .651
11  .636
12  .634
13  .633
14  .631
15  .630
16  .618
17  .614
18  .608
19  .597
20  .595
21  .594
22  .593
23  .589
24  .587
25  .582
26  .577
27  .570
28  .566
29  .561
30  .555
31  .554
32  .548
33  .548
34  .548
35  .539
36  .538
37  .534
38  .532
39  .527
40  .526
41  .523
42  .523
43  .517
44  .517
45  .514
46  .513
47  .513
48  .512
49  .511
50  .504
51  .499
52  .491
53  .484
54  .482
55  .476
56  .470
57  .462
58  .457
59  .456
60  .454
61  .448
62  .444
63  .442
64  .428
65  .426
66  .420
67  .413
68  .412
69  .408
70  .405
71  .404
72  .400
73  .399
74  .399
75  .396
76  .391
77  .390
78  .389
79  .389
80  .388
81  .388
82  .388
83  .387
84  .386
85  .385
86  .381
87  .381
88  .381
89  .381
90  .381
91  .379
92  .379
93  .378
94  .377
95  .376
96  .376
97  .375
98  .374
99  .373
100  .369
101  .369
102  .367
103  .366
104  .366
105  .366
106  .365
107  .364
108  .363
109  .363
110  .362
111  .360
112  .360
113  .358
114  .358
115  .357
116  .355
117  .353
118  .349
119  .344
120  .344
121  .344
122  .341
123  .338
124  .337
125  .334
126  .332
127  .330
128  .327
129  .327
130  .325
131  .317
132  .316
133  .309
134  .304
135  .304
136  .302
137  .296
138  .284
139  .284
140  .283
141  .282
142  .272
143  .269
144  .267
145  .265
146  .262
147  .262
148  .260
149  .256
150  .253
151  .253
152  .252
153  .252
154  .250
155  .248
156  .248
157  .248
158  .245
159  .244
160  .243
161  .242
162  .239
163  .237
164  .228
165  .228
166  .228
167  .228
168  .228
169  .227
170  .227
171  .226
172  .222
173  .217
174  .215
175  .214
176  .210
177  .210
178  .209
179  .209
180  .207
181  .204
182  .201
183  .199
184  .199
185  .196
186  .196
187  .195
188  .194
189  .194
190  .194
191  .194
192  .194
193  .194
194  .193
195  .193
196  .186
197  .186
198  .184
199  .182
200  .178
201  .178
202  .176
203  .174
204  .173
205  .170
206  .169
207  .166
208  .152
209  .151
210  .151
211  .148
212  .147
213  .141
214  .139
215  .139
216  .136
217  .131
218  .131
219  .130
220  .128
221  .125
222  .124
223  .123
224  .121
225  .118
226  .115
227  .114
228  .113
229  .111
230  .110
231  .109
232  .107
233  .105
234  .105
235  .104
236  .104
237  .104
238  .104
239  .100
240  .096
241  .092
242  .090
243  .087
244  .086
245  .079
246  .077
247  .077
248  .059
249  .053
250  .043
251  .012
252  .007
253  .006
254  .003
255  .003
256  .002
257  .001
258  .001
259  .001
260  .001
261  .001
262  .001
263  .001
264  .001
265  .001
266  .001
267  .001
268  .001
269  .001
270  .001
271  .000

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.

Data: ImageNet-C-noise (the noise corruptions of ImageNet-C: Gaussian noise, shot noise, and impulse noise, each applied at 5 severity levels)

Metric: top1
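Assuming the benchmark score is a plain mean of per-condition top-1 accuracies across noise types and severities, normalized by the ceiling of 1.00 (an assumption; the library defines the exact aggregation), the computation can be sketched as follows. The accuracy values below are made up for illustration.

```python
from statistics import mean

# Hypothetical per-condition top-1 accuracies, keyed by (noise_type, severity).
accuracies = {
    ("gaussian_noise", s): a for s, a in zip(range(1, 6), [.60, .52, .40, .28, .18])
}
accuracies.update({
    ("shot_noise", s): a for s, a in zip(range(1, 6), [.58, .50, .38, .26, .16])
})
accuracies.update({
    ("impulse_noise", s): a for s, a in zip(range(1, 6), [.55, .45, .33, .22, .12])
})

ceiling = 1.0  # this benchmark's ceiling
# Mean of the 15 per-condition accuracies, reported relative to the ceiling.
score = mean(accuracies.values()) / ceiling
```

With a ceiling of 1.00, the normalization is a no-op here, but benchmarks with sub-1.0 ceilings (e.g. noise-ceilinged neural benchmarks) rescale raw scores the same way.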