Sample stimuli

[Ten sample stimulus images from the ImageNet-C-noise set: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark
benchmark = load_benchmark("ImageNet-C-noise-top1")
score = benchmark(my_model)
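
Here, my_model must be a Brain-Score model object (for example a ModelCommitment wrapping your network). Below is a minimal end-to-end sketch, assuming your brainscore_vision installation exposes load_model and has "alexnet" registered as a model identifier; any registered identifier, or your own model, can be substituted:

from brainscore_vision import load_benchmark, load_model

# Load a registered model; "alexnet" is only an example identifier.
model = load_model("alexnet")

# Run the noise-corrupted ImageNet-C benchmark; the call returns a Score object.
benchmark = load_benchmark("ImageNet-C-noise-top1")
score = benchmark(model)
print(score)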

Model scores

Scores by rank (model names omitted; 418 models are listed in total, and ranks 252-418 have no score):

Rank  Score    Rank  Score    Rank  Score    Rank  Score    Rank  Score
   1  .742       2  .715       3  .708       4  .700       5  .692
   6  .670       7  .651       8  .636       9  .634      10  .633
  11  .631      12  .630      13  .618      14  .614      15  .608
  16  .597      17  .595      18  .594      19  .593      20  .587
  21  .582      22  .577      23  .570      24  .566      25  .561
  26  .555      27  .554      28  .548      29  .548      30  .548
  31  .539      32  .538      33  .532      34  .527      35  .526
  36  .523      37  .517      38  .514      39  .513      40  .513
  41  .512      42  .511      43  .504      44  .499      45  .491
  46  .484      47  .482      48  .476      49  .470      50  .462
  51  .457      52  .456      53  .454      54  .444      55  .442
  56  .428      57  .420      58  .413      59  .412      60  .408
  61  .404      62  .400      63  .399      64  .399      65  .396
  66  .391      67  .390      68  .389      69  .389      70  .388
  71  .388      72  .388      73  .387      74  .386      75  .385
  76  .381      77  .381      78  .379      79  .376      80  .375
  81  .374      82  .369      83  .369      84  .366      85  .366
  86  .364      87  .363      88  .363      89  .362      90  .360
  91  .355      92  .353      93  .349      94  .344      95  .344
  96  .344      97  .341      98  .338      99  .337     100  .334
 101  .332     102  .330     103  .327     104  .327     105  .325
 106  .317     107  .316     108  .309     109  .304     110  .304
 111  .302     112  .296     113  .284     114  .284     115  .283
 116  .282     117  .272     118  .269     119  .267     120  .265
 121  .262     122  .262     123  .260     124  .256     125  .253
 126  .253     127  .252     128  .252     129  .250     130  .248
 131  .248     132  .248     133  .245     134  .244     135  .243
 136  .242     137  .239     138  .237     139  .228     140  .228
 141  .228     142  .228     143  .228     144  .227     145  .227
 146  .226     147  .222     148  .217     149  .215     150  .214
 151  .210     152  .210     153  .209     154  .209     155  .207
 156  .204     157  .201     158  .199     159  .199     160  .196
 161  .196     162  .195     163  .194     164  .194     165  .194
 166  .194     167  .194     168  .194     169  .193     170  .193
 171  .186     172  .186     173  .184     174  .182     175  .178
 176  .178     177  .176     178  .174     179  .173     180  .170
 181  .169     182  .166     183  .152     184  .151     185  .151
 186  .148     187  .147     188  .141     189  .139     190  .139
 191  .136     192  .131     193  .131     194  .130     195  .128
 196  .125     197  .124     198  .123     199  .121     200  .118
 201  .115     202  .114     203  .113     204  .111     205  .110
 206  .109     207  .107     208  .105     209  .105     210  .104
 211  .104     212  .104     213  .104     214  .100     215  .099
 216  .099     217  .099     218  .099     219  .096     220  .092
 221  .091     222  .090     223  .087     224  .086     225  .079
 226  .077     227  .077     228  .059     229  .053     230  .043
 231  .012     232  .007     233  .006     234  .003     235  .003
 236  .002     237  .001     238  .001     239  .001     240  .001
 241  .001     242  .001     243  .001     244  .001     245  .001
 246  .001     247  .001     248  .001     249  .001     250  .001
 251  .000

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.

Data: ImageNet-C-noise
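
The noise split of ImageNet-C corrupts the ImageNet validation images with Gaussian, shot, and impulse noise at five severity levels (Hendrycks & Dietterich, 2019). As a rough illustrative sketch only, not the official corruption code, and with sigma as a hypothetical severity constant, Gaussian noise is applied roughly as follows:

import numpy as np

def gaussian_noise(image: np.ndarray, sigma: float = 0.08) -> np.ndarray:
    # image: uint8 array in [0, 255]; sigma: illustrative noise scale,
    # not necessarily one of the official ImageNet-C severity constants.
    scaled = image.astype(np.float32) / 255.0
    noisy = scaled + np.random.normal(scale=sigma, size=scaled.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)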

Metric: top1
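
Top-1 accuracy counts an image as correct when the model's highest-scoring class matches the ground-truth label; since the ceiling is 1.00, the reported score is simply this accuracy divided by 1.00. A minimal sketch with hypothetical logits and labels:

import numpy as np

def top1(logits: np.ndarray, labels: np.ndarray) -> float:
    # Fraction of images whose argmax prediction equals the label.
    return float((logits.argmax(axis=-1) == labels).mean())

# Hypothetical example: 4 images, 3 classes.
logits = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
labels = np.array([1, 0, 2, 2])
raw_accuracy = top1(logits, labels)   # 0.75
ceiled_score = raw_accuracy / 1.00    # ceiling-normalized score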