Sample stimuli

[Ten example stimulus images, sample 0 through sample 9, from the ImageNet-C noise corruption set.]

How to use

from brainscore_vision import load_benchmark
benchmark = load_benchmark("ImageNet-C-noise-top1")
score = benchmark(my_model)

Model scores

Rank  Score
1     .742
2     .715
3     .708
4     .700
5     .692
6     .670
7     .651
8     .636
9     .634
10    .633
11    .631
12    .630
13    .618
14    .614
15    .608
16    .597
17    .595
18    .594
19    .593
20    .587
21    .582
22    .577
23    .570
24    .566
25    .561
26    .555
27    .554
28    .548
29    .548
30    .548
31    .539
32    .538
33    .532
34    .527
35    .526
36    .523
37    .517
38    .514
39    .513
40    .513
41    .512
42    .511
43    .504
44    .499
45    .491
46    .484
47    .482
48    .476
49    .470
50    .462
51    .457
52    .456
53    .454
54    .444
55    .442
56    .428
57    .420
58    .413
59    .412
60    .408
61    .404
62    .400
63    .399
64    .399
65    .391
66    .390
67    .389
68    .389
69    .388
70    .388
71    .388
72    .387
73    .386
74    .385
75    .381
76    .381
77    .379
78    .376
79    .375
80    .374
81    .369
82    .369
83    .366
84    .366
85    .364
86    .363
87    .363
88    .362
89    .360
90    .355
91    .353
92    .349
93    .344
94    .344
95    .344
96    .341
97    .338
98    .337
99    .334
100   .332
101   .330
102   .327
103   .327
104   .325
105   .317
106   .316
107   .309
108   .304
109   .304
110   .302
111   .296
112   .284
113   .284
114   .283
115   .282
116   .272
117   .269
118   .267
119   .265
120   .262
121   .262
122   .260
123   .256
124   .253
125   .253
126   .252
127   .252
128   .250
129   .248
130   .248
131   .248
132   .245
133   .244
134   .243
135   .242
136   .239
137   .237
138   .228
139   .228
140   .228
141   .228
142   .228
143   .227
144   .227
145   .226
146   .222
147   .217
148   .215
149   .214
150   .210
151   .210
152   .209
153   .209
154   .207
155   .204
156   .201
157   .199
158   .199
159   .196
160   .196
161   .195
162   .194
163   .194
164   .194
165   .193
166   .193
167   .186
168   .186
169   .184
170   .182
171   .178
172   .178
173   .176
174   .174
175   .173
176   .170
177   .169
178   .166
179   .152
180   .151
181   .151
182   .148
183   .147
184   .141
185   .139
186   .139
187   .136
188   .131
189   .131
190   .130
191   .128
192   .125
193   .124
194   .123
195   .121
196   .118
197   .115
198   .114
199   .113
200   .111
201   .110
202   .109
203   .107
204   .105
205   .105
206   .100
207   .099
208   .099
209   .099
210   .099
211   .096
212   .092
213   .091
214   .090
215   .087
216   .086
217   .079
218   .077
219   .077
220   .059
221   .053
222   .043
223   .012
224   .007
225   .006
226   .003
227   .003
228   .002
229   .001
230   .001
231   .001
232   .001
233   .001
234   .001
235   .001
236   .001
237   .001
238   .001
239   .001
240   .000

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that scores are relative to this ceiling.
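Ceiling normalization is a simple division of the raw score by the benchmark ceiling. A minimal sketch, assuming this convention; the helper name `ceiled_score` is illustrative and not part of the Brain-Score API:

```python
def ceiled_score(raw_score: float, ceiling: float = 1.00) -> float:
    """Express a raw benchmark score relative to the benchmark ceiling.

    With a ceiling of 1.00, as for ImageNet-C-noise-top1, the ceiled
    score equals the raw score; a lower ceiling scales scores up.
    """
    return raw_score / ceiling

print(ceiled_score(0.742))                # ceiling 1.00: unchanged
print(ceiled_score(0.40, ceiling=0.80))   # lower ceiling: scaled up
```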

Data: ImageNet-C-noise

Metric: top1
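The top-1 metric counts a trial as correct only when the model's single highest-scoring class matches the ground-truth label. A minimal sketch of that computation in plain Python; the function and variable names are illustrative, not the benchmark's internal implementation:

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction equals the label.

    logits: list of per-class score lists, one inner list per sample
    labels: list of ground-truth class indices, same length as logits
    """
    correct = sum(
        max(range(len(scores)), key=scores.__getitem__) == label
        for scores, label in zip(logits, labels)
    )
    return correct / len(labels)

# Three samples over three classes; argmaxes are 1, 0, 2,
# so two of the three predictions match the labels below.
logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [1, 0, 1]
print(top1_accuracy(logits, labels))
```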