Sample stimuli

[10 sample stimulus images shown on the benchmark page]

How to use

from brainscore_vision import load_benchmark
benchmark = load_benchmark("ImageNet-C-noise-top1")
score = benchmark(my_model)
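The "top1" in the benchmark identifier refers to ordinary top-1 classification accuracy, evaluated on the noise-corrupted stimuli. A minimal sketch of that computation in plain Python (the function name and toy data below are illustrative, not part of the Brain-Score API):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    # argmax over each row of class scores
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Toy example: 4 samples, 3 classes.
logits = [
    [0.1, 0.7, 0.2],    # predicts class 1
    [0.9, 0.05, 0.05],  # predicts class 0
    [0.2, 0.3, 0.5],    # predicts class 2
    [0.6, 0.3, 0.1],    # predicts class 0
]
labels = [1, 0, 2, 1]
print(top1_accuracy(logits, labels))  # 0.75
```

The benchmark aggregates this accuracy over the noise-corruption conditions of ImageNet-C before reporting a single score.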

Model scores


Rank   Model   Score
   1           .742
   2           .715
   3           .708
   4           .700
   5           .692
   6           .670
   7           .651
   8           .636
   9           .634
  10           .633
  11           .631
  12           .630
  13           .618
  14           .614
  15           .608
  16           .597
  17           .595
  18           .594
  19           .593
  20           .587
  21           .582
  22           .577
  23           .570
  24           .566
  25           .561
  26           .555
  27           .554
  28           .548
  29           .548
  30           .548
  31           .539
  32           .538
  33           .532
  34           .527
  35           .526
  36           .523
  37           .517
  38           .514
  39           .513
  40           .513
  41           .512
  42           .511
  43           .504
  44           .499
  45           .491
  46           .484
  47           .482
  48           .476
  49           .470
  50           .462
  51           .457
  52           .456
  53           .454
  54           .444
  55           .442
  56           .428
  57           .420
  58           .413
  59           .412
  60           .408
  61           .404
  62           .400
  63           .399
  64           .399
  65           .396
  66           .391
  67           .390
  68           .389
  69           .389
  70           .388
  71           .388
  72           .388
  73           .387
  74           .386
  75           .385
  76           .381
  77           .381
  78           .379
  79           .376
  80           .375
  81           .374
  82           .369
  83           .369
  84           .366
  85           .366
  86           .364
  87           .363
  88           .363
  89           .362
  90           .360
  91           .355
  92           .353
  93           .349
  94           .344
  95           .344
  96           .344
  97           .341
  98           .338
  99           .337
 100           .334
 101           .332
 102           .330
 103           .327
 104           .327
 105           .325
 106           .317
 107           .316
 108           .309
 109           .304
 110           .304
 111           .302
 112           .296
 113           .284
 114           .284
 115           .283
 116           .282
 117           .272
 118           .269
 119           .267
 120           .265
 121           .262
 122           .262
 123           .260
 124           .256
 125           .253
 126           .253
 127           .252
 128           .252
 129           .250
 130           .248
 131           .248
 132           .248
 133           .245
 134           .244
 135           .243
 136           .242
 137           .239
 138           .237
 139           .228
 140           .228
 141           .228
 142           .228
 143           .228
 144           .227
 145           .227
 146           .226
 147           .222
 148           .217
 149           .215
 150           .214
 151           .210
 152           .210
 153           .209
 154           .209
 155           .207
 156           .204
 157           .201
 158           .199
 159           .199
 160           .196
 161           .196
 162           .195
 163           .194
 164           .194
 165           .194
 166           .193
 167           .193
 168           .186
 169           .186
 170           .184
 171           .182
 172           .178
 173           .178
 174           .176
 175           .174
 176           .173
 177           .170
 178           .169
 179           .166
 180           .152
 181           .151
 182           .151
 183           .148
 184           .147
 185           .141
 186           .139
 187           .139
 188           .136
 189           .131
 190           .131
 191           .130
 192           .128
 193           .125
 194           .124
 195           .123
 196           .121
 197           .118
 198           .115
 199           .114
 200           .113
 201           .111
 202           .110
 203           .109
 204           .107
 205           .105
 206           .105
 207           .100
 208           .099
 209           .099
 210           .099
 211           .099
 212           .096
 213           .092
 214           .091
 215           .090
 216           .087
 217           .086
 218           .079
 219           .077
 220           .077
 221           .059
 222           .053
 223           .043
 224           .012
 225           .007
 226           .006
 227           .003
 228           .003
 229           .002
 230           .001
 231           .001
 232           .001
 233           .001
 234           .001
 235           .001
 236           .001
 237           .001
 238           .001
 239           .001
 240           .001
 241           .000

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that scores are relative to this ceiling.
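Concretely, a reported score is the raw benchmark score divided by the ceiling; since this benchmark's ceiling is 1.00, the division leaves scores unchanged. A sketch of that normalization (the function name is illustrative, not a Brain-Score API):

```python
def normalize_by_ceiling(raw_score, ceiling=1.00):
    """Report a benchmark score relative to its ceiling."""
    return raw_score / ceiling

# With a ceiling of 1.00 the normalization is a no-op:
print(normalize_by_ceiling(0.742))  # 0.742
# A hypothetical benchmark with a ceiling of 0.5 would double the raw score:
print(normalize_by_ceiling(0.3, ceiling=0.5))  # 0.6
```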

Data: ImageNet-C-noise

Metric: top1