Sample stimuli

[Image grid: sample stimuli 0-9 from the ImageNet-C-weather dataset]

How to use

from brainscore_vision import load_benchmark

benchmark = load_benchmark("ImageNet-C-weather-top1")
score = benchmark(my_model)  # my_model: your model, wrapped in Brain-Score's model interface
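The snippet above relies on a loaded benchmark being callable on a model. A minimal stand-in sketch of that pattern, using hypothetical `FakeBenchmark`/`FakeModel` classes that are not part of brainscore_vision:

```python
# Stand-in sketch of the callable-benchmark pattern; the real load_benchmark
# returns an object you call on a model to obtain a ceiling-relative score.
class FakeBenchmark:
    def __init__(self, ceiling):
        self.ceiling = ceiling

    def __call__(self, model):
        raw = model.evaluate()       # hypothetical model-side evaluation hook
        return raw / self.ceiling    # leaderboard scores are relative to the ceiling

class FakeModel:
    def evaluate(self):
        return 0.726                 # pretend raw top-1 accuracy

benchmark = FakeBenchmark(ceiling=1.0)
score = benchmark(FakeModel())
```

With a ceiling of 1.0, the returned score equals the model's raw accuracy, which matches how this benchmark reports results.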

Model scores

Rank   Score
1      .726
2      .720
3      .685
4      .678
5      .677
6      .675
7      .673
8      .672
9      .670
10     .667
11     .657
12     .648
13     .646
14     .645
15     .633
16     .631
17     .624
18     .606
19     .605
20     .602
21     .601
22     .597
23     .596
24     .594
25     .591
26     .589
27     .588
28     .585
29     .582
30     .575
31     .573
32     .572
33     .559
34     .553
35     .553
36     .546
37     .540
38     .540
39     .540
40     .537
41     .529
42     .528
43     .528
44     .528
45     .520
46     .518
47     .516
48     .515
49     .513
50     .511
51     .504
52     .504
53     .503
54     .501
55     .499
56     .495
57     .489
58     .488
59     .487
60     .484
61     .482
62     .481
63     .479
64     .474
65     .473
66     .472
67     .472
68     .471
69     .471
70     .466
71     .465
72     .461
73     .459
74     .458
75     .456
76     .455
77     .455
78     .453
79     .453
80     .450
81     .449
82     .448
83     .447
84     .445
85     .444
86     .442
87     .436
88     .433
89     .432
90     .430
91     .429
92     .429
93     .429
94     .421
95     .420
96     .419
97     .418
98     .416
99     .415
100    .413
101    .410
102    .403
103    .403
104    .402
105    .399
106    .393
107    .391
108    .390
109    .387
110    .386
111    .385
112    .379
113    .377
114    .377
115    .373
116    .369
117    .368
118    .367
119    .366
120    .362
121    .357
122    .355
123    .354
124    .353
125    .353
126    .351
127    .350
128    .350
129    .350
130    .347
131    .344
132    .344
133    .344
134    .343
135    .343
136    .343
137    .340
138    .340
139    .340
140    .337
141    .336
142    .333
143    .330
144    .327
145    .326
146    .321
147    .319
148    .317
149    .315
150    .314
151    .310
152    .309
153    .304
154    .298
155    .295
156    .293
157    .284
158    .284
159    .283
160    .276
161    .275
162    .269
163    .269
164    .269
165    .269
166    .266
167    .262
168    .260
169    .255
170    .252
171    .244
172    .243
173    .243
174    .242
175    .231
176    .230
177    .229
178    .229
179    .229
180    .229
181    .228
182    .226
183    .224
184    .224
185    .220
186    .212
187    .207
188    .201
189    .199
190    .198
191    .195
192    .188
193    .187
194    .186
195    .166
196    .159
197    .152
198    .145
199    .143
200    .142
201    .138
202    .136
203    .131
204    .126
205    .086
206    .008
207    .008
208    .006
209    .004
210    .003
211    .002
212    .002
213    .001
214    .001
215    .001
216    .001
217    .001
218    .001
219    .001
220    .001
221    .001

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.

Data: ImageNet-C-weather

Metric: top1
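The top-1 metric counts a prediction correct only when the model's single highest-ranked label matches the ground truth, and with a ceiling of 1.00 the ceiling-relative score equals raw accuracy. A minimal sketch (function name and labels are illustrative, not the benchmark's internals):

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose single predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# With ceiling = 1.00, dividing by the ceiling leaves the raw accuracy unchanged.
ceiling = 1.00
raw = top1_accuracy(["snow", "fog", "frost", "fog"],
                    ["snow", "fog", "frost", "rain"])
score = raw / ceiling
```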