Sample stimuli

[Images: ten sample stimuli, sample 0 to sample 9]

How to use

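from brainscore_vision import load_benchmark

# Load the benchmark by its identifier.
benchmark = load_benchmark("ImageNet-C-weather-top1")
# my_model is a placeholder for a model you have already prepared for Brain-Score.
score = benchmark(my_model)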

Model scores

Rank  Score
1     .726
2     .720
3     .718
4     .716
5     .685
6     .678
7     .677
8     .675
9     .673
10    .672
11    .672
12    .671
13    .670
14    .667
15    .657
16    .648
17    .646
18    .645
19    .633
20    .631
21    .624
22    .606
23    .605
24    .602
25    .601
26    .597
27    .596
28    .594
29    .594
30    .591
31    .589
32    .588
33    .587
34    .585
35    .582
36    .575
37    .573
38    .572
39    .570
40    .559
41    .553
42    .553
43    .546
44    .540
45    .540
46    .540
47    .537
48    .529
49    .528
50    .528
51    .528
52    .520
53    .518
54    .516
55    .515
56    .513
57    .511
58    .510
59    .509
60    .509
61    .504
62    .504
63    .503
64    .503
65    .501
66    .499
67    .495
68    .489
69    .489
70    .488
71    .487
72    .487
73    .486
74    .486
75    .484
76    .482
77    .481
78    .479
79    .474
80    .473
81    .472
82    .472
83    .471
84    .471
85    .470
86    .469
87    .466
88    .465
89    .465
90    .461
91    .459
92    .458
93    .456
94    .455
95    .455
96    .453
97    .453
98    .450
99    .449
100   .448
101   .447
102   .445
103   .444
104   .442
105   .436
106   .433
107   .432
108   .430
109   .429
110   .429
111   .429
112   .422
113   .421
114   .420
115   .419
116   .418
117   .416
118   .415
119   .413
120   .410
121   .403
122   .403
123   .402
124   .399
125   .393
126   .391
127   .390
128   .387
129   .386
130   .385
131   .379
132   .377
133   .377
134   .373
135   .369
136   .368
137   .367
138   .366
139   .362
140   .360
141   .357
142   .355
143   .354
144   .353
145   .353
146   .351
147   .350
148   .350
149   .350
150   .347
151   .344
152   .344
153   .344
154   .343
155   .343
156   .343
157   .340
158   .340
159   .340
160   .337
161   .336
162   .333
163   .330
164   .327
165   .326
166   .321
167   .319
168   .317
169   .315
170   .314
171   .310
172   .309
173   .304
174   .298
175   .295
176   .293
177   .284
178   .284
179   .283
180   .282
181   .276
182   .275
183   .269
184   .269
185   .269
186   .269
187   .266
188   .262
189   .260
190   .255
191   .255
192   .255
193   .255
194   .252
195   .244
196   .243
197   .243
198   .242
199   .231
200   .230
201   .228
202   .228
203   .228
204   .228
205   .228
206   .228
207   .226
208   .224
209   .224
210   .220
211   .207
212   .201
213   .199
214   .198
215   .195
216   .188
217   .187
218   .186
219   .166
220   .159
221   .152
222   .145
223   .143
224   .142
225   .138
226   .136
227   .131
228   .126
229   .086
230   .008
231   .008
232   .006
233   .004
234   .003
235   .002
236   .002
237   .001
238   .001
239   .001
240   .001
241   .001
242   .001
243   .001
244   .001
245   .001
246   .001

Ranks 247 to 455 are listed without a score.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
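
As a rough illustration of what "relative to this ceiling" could mean, the sketch below divides a raw score by the ceiling. The function name `ceiling_normalize` is hypothetical and is not part of the Brain-Score API; the division itself is an assumption about how the normalization works, not a description of the library's internals.

```python
def ceiling_normalize(raw_score: float, ceiling: float = 1.00) -> float:
    """Express a raw score as a fraction of the benchmark ceiling.

    Hypothetical helper for illustration only; assumes normalization
    is a plain division by the ceiling.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw_score / ceiling


# With a ceiling of 1.00, the normalized score equals the raw score.
print(ceiling_normalize(0.726))  # 0.726
```

Because the ceiling here is 1.00, the scores in the table can be read directly as top-1 accuracies under this assumption.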

Data: ImageNet-C-weather

Metric: top1
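
For readers unfamiliar with the metric, top-1 accuracy is the fraction of images for which the model's single highest-ranked label matches the ground-truth label. The following is a minimal, self-contained sketch of that calculation. The function `top1_accuracy` and the example labels are made up for illustration and are not taken from the benchmark's implementation.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of items whose top predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    # Count exact matches between the top prediction and the true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative labels only: three of the four predictions are correct.
predictions = ["dog", "cat", "ship", "frog"]
labels = ["dog", "cat", "truck", "frog"]
print(top1_accuracy(predictions, labels))  # 0.75
```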