Sample stimuli

[Thumbnails of 10 sample stimuli: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark

# load the ImageNet-C noise benchmark and score a model
benchmark = load_benchmark("ImageNet-C-noise-top1")
score = benchmark(my_model)  # my_model must implement the BrainModel interface

Model scores

Rank  Score
1     .742
2     .715
3     .708
4     .700
5     .692
6     .670
7     .651
8     .636
9     .634
10    .633
11    .631
12    .630
13    .618
14    .614
15    .608
16    .597
17    .595
18    .594
19    .593
20    .587
21    .582
22    .577
23    .570
24    .566
25    .561
26    .555
27    .554
28    .548
29    .548
30    .548
31    .539
32    .538
33    .532
34    .527
35    .526
36    .523
37    .523
38    .517
39    .514
40    .513
41    .513
42    .512
43    .511
44    .504
45    .499
46    .491
47    .484
48    .482
49    .476
50    .470
51    .462
52    .457
53    .456
54    .454
55    .444
56    .442
57    .428
58    .420
59    .413
60    .412
61    .408
62    .404
63    .400
64    .399
65    .399
66    .396
67    .391
68    .390
69    .389
70    .389
71    .388
72    .388
73    .388
74    .387
75    .386
76    .385
77    .381
78    .381
79    .379
80    .376
81    .375
82    .374
83    .369
84    .369
85    .366
86    .366
87    .364
88    .363
89    .363
90    .362
91    .360
92    .355
93    .353
94    .349
95    .344
96    .344
97    .344
98    .341
99    .338
100   .337
101   .334
102   .332
103   .330
104   .327
105   .327
106   .325
107   .317
108   .316
109   .309
110   .304
111   .304
112   .302
113   .296
114   .284
115   .284
116   .283
117   .282
118   .272
119   .269
120   .267
121   .265
122   .262
123   .262
124   .260
125   .256
126   .253
127   .253
128   .252
129   .252
130   .250
131   .248
132   .248
133   .248
134   .245
135   .244
136   .243
137   .242
138   .239
139   .237
140   .228
141   .228
142   .228
143   .228
144   .228
145   .227
146   .227
147   .226
148   .222
149   .217
150   .215
151   .214
152   .210
153   .210
154   .209
155   .209
156   .207
157   .204
158   .201
159   .199
160   .199
161   .196
162   .196
163   .195
164   .194
165   .194
166   .194
167   .194
168   .194
169   .194
170   .193
171   .193
172   .186
173   .186
174   .184
175   .182
176   .178
177   .178
178   .176
179   .174
180   .173
181   .170
182   .169
183   .166
184   .152
185   .151
186   .151
187   .148
188   .147
189   .141
190   .139
191   .139
192   .136
193   .131
194   .131
195   .130
196   .128
197   .125
198   .124
199   .123
200   .121
201   .118
202   .115
203   .114
204   .113
205   .111
206   .110
207   .109
208   .107
209   .105
210   .105
211   .104
212   .104
213   .104
214   .104
215   .100
216   .096
217   .092
218   .090
219   .087
220   .086
221   .079
222   .077
223   .077
224   .059
225   .053
226   .043
227   .012
228   .007
229   .006
230   .003
231   .003
232   .002
233   .001
234   .001
235   .001
236   .001
237   .001
238   .001
239   .001
240   .001
241   .001
242   .001
243   .001
244   .001
245   .001
246   .001
247   .000

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.

Data: ImageNet-C-noise

Metric: top1
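
The top1 metric counts a trial as correct only when the model's single highest-scoring class matches the ground-truth label, and the final benchmark score is that accuracy divided by the ceiling (here 1.00, so the score equals raw top-1 accuracy on the noise-corrupted images). A minimal sketch of this computation, using hypothetical logits and labels:

```python
import numpy as np

def top1_score(logits, labels, ceiling=1.0):
    """Top-1 accuracy, normalized by the benchmark ceiling."""
    predictions = np.argmax(logits, axis=1)  # highest-scoring class per image
    accuracy = float(np.mean(predictions == labels))
    return accuracy / ceiling

# Hypothetical model outputs for 4 images over 3 classes.
logits = np.array([
    [2.0, 0.1, 0.3],   # predicts class 0
    [0.2, 1.5, 0.1],   # predicts class 1
    [0.4, 0.3, 0.2],   # predicts class 0
    [0.1, 0.2, 3.0],   # predicts class 2
])
labels = np.array([0, 1, 2, 2])  # third image is misclassified
print(top1_score(logits, labels))  # 3 of 4 correct -> 0.75
```

With a ceiling below 1.0, the same accuracy would yield a proportionally higher score, which is what "scores are relative to this ceiling" means above.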