Sample stimuli

Ten example stimuli from the benchmark dataset (sample 0 through sample 9; images not reproduced in this extract).

How to use

from brainscore_vision import load_benchmark

# load the benchmark, then call it on a Brain-Score model object to obtain a score
benchmark = load_benchmark("ImageNet-C-weather-top1")
score = benchmark(my_model)
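Conceptually, the benchmark presents the corrupted ImageNet-C weather images to the model, compares the model's predicted labels against ground truth, and reports the fraction correct relative to the ceiling. The following is an illustrative, self-contained sketch of that top-1 scoring logic, not Brain-Score's internal implementation; `predictions` and `ground_truth` are assumed label lists.

```python
def top1_score(predictions, ground_truth, ceiling=1.0):
    """Illustrative sketch of top-1 scoring; not Brain-Score's internals."""
    # fraction of images whose predicted label matches the true label
    accuracy = sum(p == t for p, t in zip(predictions, ground_truth)) / len(ground_truth)
    # Brain-Score reports scores relative to the benchmark ceiling
    return accuracy / ceiling

# toy example: 2 of 3 predictions correct
print(top1_score(["dog", "cat", "car"], ["dog", "cat", "plane"]))
```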

Model scores

Rank  Score
1  .726
2  .720
3  .718
4  .716
5  .685
6  .678
7  .677
8  .675
9  .673
10  .672
11  .672
12  .671
13  .670
14  .667
15  .657
16  .648
17  .646
18  .645
19  .633
20  .631
21  .624
22  .606
23  .605
24  .602
25  .601
26  .597
27  .596
28  .594
29  .594
30  .591
31  .589
32  .588
33  .587
34  .585
35  .582
36  .575
37  .573
38  .572
39  .570
40  .559
41  .553
42  .553
43  .546
44  .540
45  .540
46  .540
47  .537
48  .529
49  .528
50  .528
51  .528
52  .520
53  .518
54  .516
55  .515
56  .513
57  .511
58  .510
59  .509
60  .509
61  .504
62  .504
63  .503
64  .503
65  .501
66  .499
67  .495
68  .489
69  .488
70  .487
71  .487
72  .486
73  .486
74  .486
75  .484
76  .483
77  .483
78  .482
79  .481
80  .480
81  .480
82  .479
83  .474
84  .473
85  .472
86  .472
87  .471
88  .471
89  .470
90  .469
91  .469
92  .468
93  .467
94  .466
95  .465
96  .465
97  .465
98  .461
99  .459
100  .458
101  .456
102  .455
103  .455
104  .453
105  .453
106  .450
107  .449
108  .448
109  .447
110  .445
111  .444
112  .442
113  .436
114  .433
115  .432
116  .430
117  .429
118  .429
119  .429
120  .422
121  .421
122  .420
123  .419
124  .418
125  .416
126  .415
127  .413
128  .410
129  .403
130  .403
131  .402
132  .399
133  .393
134  .391
135  .390
136  .387
137  .386
138  .385
139  .379
140  .377
141  .377
142  .373
143  .369
144  .368
145  .367
146  .366
147  .362
148  .360
149  .357
150  .355
151  .354
152  .353
153  .353
154  .351
155  .350
156  .350
157  .350
158  .347
159  .344
160  .344
161  .344
162  .343
163  .343
164  .343
165  .340
166  .340
167  .340
168  .337
169  .336
170  .333
171  .330
172  .327
173  .326
174  .321
175  .319
176  .317
177  .315
178  .314
179  .310
180  .309
181  .304
182  .298
183  .295
184  .293
185  .284
186  .284
187  .283
188  .276
189  .275
190  .269
191  .269
192  .269
193  .269
194  .266
195  .262
196  .260
197  .255
198  .255
199  .255
200  .255
201  .252
202  .244
203  .243
204  .243
205  .242
206  .231
207  .230
208  .228
209  .228
210  .228
211  .228
212  .228
213  .228
214  .226
215  .224
216  .224
217  .220
218  .207
219  .201
220  .199
221  .198
222  .195
223  .188
224  .187
225  .186
226  .166
227  .159
228  .152
229  .145
230  .143
231  .142
232  .138
233  .136
234  .131
235  .126
236  .086
237  .008
238  .008
239  .006
240  .004
241  .003
242  .002
243  .002
244  .001
245  .001
246  .001
247  .001
248  .001
249  .001
250  .001
251  .001
252  .001
253  .001
254  .001
255  .001

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that scores are relative to this ceiling.
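Since this benchmark's ceiling is 1.00, normalizing by it leaves the raw top-1 accuracy unchanged; the reported score and the raw accuracy coincide. A minimal sketch with hypothetical numbers:

```python
# Hypothetical numbers for illustration: with a ceiling of 1.00, the ceiled
# (reported) score equals the raw top-1 accuracy.
raw_top1 = 0.726   # hypothetical raw top-1 accuracy
ceiling = 1.00     # this benchmark's ceiling
reported = raw_top1 / ceiling
print(reported)  # 0.726
```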

Data: ImageNet-C-weather

Metric: top1