Sample stimuli

[10 example stimulus images: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark

# load the ImageNet-C weather benchmark (scored by top-1 accuracy)
benchmark = load_benchmark("ImageNet-C-weather-top1")
# score any Brain-Score-compatible model
score = benchmark(my_model)
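Conceptually, a top-1 benchmark like this one compares each model's highest-probability class prediction against the ground-truth label. A minimal sketch of that metric (illustrative only, not the actual Brain-Score implementation; `top1_accuracy`, `logits`, and `labels` are hypothetical names):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction matches the true label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))

# three samples, four classes; the first two predictions are correct
logits = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.6, 0.2, 0.1, 0.1],
                   [0.2, 0.2, 0.5, 0.1]])
labels = np.array([1, 0, 3])
print(top1_accuracy(logits, labels))  # 2 of 3 correct
```

In the real benchmark, the stimuli are ImageNet validation images with weather corruptions (snow, frost, fog, brightness) applied at several severity levels, and the accuracy is aggregated across all of them.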

Model scores

Rank  Score
1     .726
2     .720
3     .685
4     .678
5     .677
6     .675
7     .673
8     .672
9     .670
10    .667
11    .657
12    .648
13    .646
14    .645
15    .633
16    .631
17    .624
18    .606
19    .605
20    .602
21    .601
22    .597
23    .596
24    .594
25    .591
26    .589
27    .588
28    .585
29    .582
30    .575
31    .573
32    .572
33    .559
34    .553
35    .553
36    .546
37    .540
38    .540
39    .540
40    .537
41    .529
42    .528
43    .528
44    .528
45    .520
46    .518
47    .516
48    .515
49    .513
50    .511
51    .504
52    .504
53    .503
54    .503
55    .501
56    .499
57    .495
58    .489
59    .488
60    .487
61    .484
62    .482
63    .481
64    .479
65    .474
66    .473
67    .472
68    .472
69    .471
70    .471
71    .466
72    .465
73    .461
74    .459
75    .458
76    .456
77    .455
78    .455
79    .453
80    .453
81    .450
82    .449
83    .448
84    .447
85    .445
86    .444
87    .442
88    .436
89    .433
90    .432
91    .430
92    .429
93    .429
94    .429
95    .421
96    .420
97    .419
98    .418
99    .416
100   .415
101   .413
102   .410
103   .403
104   .403
105   .402
106   .399
107   .393
108   .391
109   .390
110   .387
111   .386
112   .385
113   .379
114   .377
115   .377
116   .373
117   .369
118   .368
119   .367
120   .366
121   .362
122   .357
123   .355
124   .354
125   .353
126   .353
127   .351
128   .350
129   .350
130   .350
131   .347
132   .344
133   .344
134   .344
135   .343
136   .343
137   .343
138   .340
139   .340
140   .340
141   .337
142   .336
143   .333
144   .330
145   .327
146   .326
147   .321
148   .319
149   .317
150   .315
151   .314
152   .310
153   .309
154   .304
155   .298
156   .295
157   .293
158   .284
159   .284
160   .283
161   .276
162   .275
163   .269
164   .269
165   .269
166   .269
167   .266
168   .262
169   .260
170   .255
171   .252
172   .244
173   .243
174   .243
175   .242
176   .231
177   .230
178   .229
179   .229
180   .229
181   .229
182   .228
183   .226
184   .224
185   .224
186   .220
187   .212
188   .207
189   .201
190   .199
191   .198
192   .195
193   .188
194   .187
195   .186
196   .166
197   .159
198   .152
199   .145
200   .143
201   .142
202   .138
203   .136
204   .131
205   .126
206   .086
207   .008
208   .008
209   .006
210   .004
211   .003
212   .002
213   .002
214   .001
215   .001
216   .001
217   .001
218   .001
219   .001
220   .001
221   .001
222   .001
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
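Scoring relative to the ceiling just divides the raw metric by the ceiling value; with a ceiling of 1.00, as here, the reported score equals the raw top-1 accuracy. A minimal sketch (`ceiled_score` is a hypothetical helper, not a Brain-Score API):

```python
def ceiled_score(raw_score, ceiling=1.0):
    """Express a raw metric value as a fraction of the benchmark ceiling."""
    return raw_score / ceiling

# ceiling of 1.00: the ceiled score is identical to the raw accuracy
print(ceiled_score(0.726))
# a lower ceiling would inflate the relative score
print(ceiled_score(0.4, ceiling=0.8))
```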

Data: ImageNet-C-weather

Metric: top1