Sample stimuli

(Ten sample stimulus images, sample 0 through sample 9, are shown on the benchmark page; the images themselves are not reproduced in this text export.)

How to use

from brainscore_vision import load_benchmark

# Load the benchmark, then call it on a model that implements
# the Brain-Score model interface to obtain a score.
benchmark = load_benchmark("ImageNet-C-weather-top1")
score = benchmark(my_model)
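Conceptually, the benchmark presents weather-corrupted ImageNet images (the ImageNet-C weather category comprises corruptions such as snow, frost, fog, and brightness, each at several severity levels) and aggregates top-1 accuracy across conditions. Below is a minimal sketch of such an aggregation with made-up accuracy values and a simple unweighted mean; Brain-Score's exact aggregation may differ:

```python
def aggregate_score(per_condition_accuracy):
    """Average top-1 accuracy over all (corruption, severity) conditions."""
    return sum(per_condition_accuracy.values()) / len(per_condition_accuracy)

# Hypothetical per-condition accuracies, for illustration only.
accuracies = {
    ("snow", 1): 0.60, ("snow", 5): 0.30,
    ("fog", 1): 0.65, ("fog", 5): 0.35,
}
print(round(aggregate_score(accuracies), 3))  # 0.475
```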

Model scores


Rank  Score
1     .726
2     .720
3     .685
4     .678
5     .677
6     .675
7     .673
8     .672
9     .670
10    .667
11    .657
12    .648
13    .646
14    .645
15    .633
16    .631
17    .624
18    .606
19    .605
20    .602
21    .601
22    .597
23    .596
24    .594
25    .591
26    .589
27    .588
28    .587
29    .585
30    .582
31    .575
32    .573
33    .572
34    .559
35    .553
36    .553
37    .546
38    .540
39    .540
40    .540
41    .537
42    .529
43    .528
44    .528
45    .528
46    .520
47    .518
48    .516
49    .515
50    .513
51    .511
52    .504
53    .504
54    .503
55    .503
56    .501
57    .499
58    .495
59    .489
60    .488
61    .487
62    .484
63    .482
64    .481
65    .479
66    .474
67    .473
68    .472
69    .472
70    .471
71    .471
72    .466
73    .465
74    .461
75    .459
76    .458
77    .456
78    .455
79    .455
80    .453
81    .453
82    .450
83    .449
84    .448
85    .447
86    .445
87    .444
88    .442
89    .436
90    .433
91    .432
92    .430
93    .429
94    .429
95    .429
96    .421
97    .420
98    .419
99    .418
100   .416
101   .415
102   .413
103   .410
104   .403
105   .403
106   .402
107   .399
108   .393
109   .391
110   .390
111   .387
112   .386
113   .385
114   .379
115   .377
116   .377
117   .373
118   .369
119   .368
120   .367
121   .366
122   .362
123   .360
124   .357
125   .355
126   .354
127   .353
128   .353
129   .351
130   .350
131   .350
132   .350
133   .347
134   .344
135   .344
136   .344
137   .343
138   .343
139   .343
140   .340
141   .340
142   .340
143   .337
144   .336
145   .333
146   .330
147   .327
148   .326
149   .321
150   .319
151   .317
152   .315
153   .314
154   .310
155   .309
156   .304
157   .298
158   .295
159   .293
160   .284
161   .284
162   .283
163   .276
164   .275
165   .269
166   .269
167   .269
168   .269
169   .266
170   .262
171   .260
172   .255
173   .255
174   .255
175   .255
176   .252
177   .244
178   .243
179   .243
180   .242
181   .231
182   .230
183   .228
184   .228
185   .228
186   .228
187   .228
188   .228
189   .226
190   .224
191   .224
192   .220
193   .207
194   .201
195   .199
196   .198
197   .195
198   .188
199   .187
200   .186
201   .166
202   .159
203   .152
204   .145
205   .143
206   .142
207   .138
208   .136
209   .131
210   .126
211   .086
212   .008
213   .008
214   .006
215   .004
216   .003
217   .002
218   .002
219   .001
220   .001
221   .001
222   .001
223   .001
224   .001
225   .001
226   .001
227   .001
228   .001
229   .001
230   .001
Ranks 231 through 420 have no reported score.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that leaderboard scores are reported relative to this ceiling.
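Since the ceiling here is 1.00, normalization leaves scores unchanged, but the general rule is raw score divided by ceiling. A one-line sketch of that relationship (an illustration, not Brain-Score's internal code):

```python
def normalized_score(raw_score, ceiling=1.00):
    """Ceiling-relative score: raw benchmark score divided by the ceiling."""
    return raw_score / ceiling

print(normalized_score(0.726))  # 0.726 -- a ceiling of 1.00 leaves the score unchanged
```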

Data: ImageNet-C-weather

Metric: top1
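Top-1 accuracy simply asks whether the model's single highest-confidence label matches the ground-truth label for each stimulus. A self-contained sketch of the computation (not Brain-Score's internal code):

```python
def top1_accuracy(predicted, actual):
    """Fraction of stimuli where the model's top predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and label lists must be the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Two of three predictions match, so accuracy is 2/3.
print(top1_accuracy(["dog", "cat", "fox"], ["dog", "cat", "owl"]))
```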