Sample stimuli

[Images: sample 0 through sample 9]

How to use

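from brainscore_vision import load_benchmark

# Load the benchmark by its identifier.
benchmark = load_benchmark("ImageNet-C-blur-top1")
# my_model is a placeholder for a Brain-Score-compatible model you have already built.
score = benchmark(my_model)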

Model scores

Rank  Score
1     .623
2     .612
3     .596
4     .576
5     .560
6     .548
7     .542
8     .538
9     .533
10    .528
11    .525
12    .525
13    .516
14    .514
15    .514
16    .510
17    .505
18    .503
19    .500
20    .494
21    .483
22    .475
23    .474
24    .471
25    .468
26    .463
27    .463
28    .456
29    .455
30    .448
31    .444
32    .442
33    .436
34    .434
35    .433
36    .431
37    .431
38    .426
39    .416
40    .416
41    .414
42    .413
43    .413
44    .412
45    .409
46    .406
47    .404
48    .404
49    .403
50    .399
51    .396
52    .395
53    .392
54    .391
55    .390
56    .388
57    .388
58    .388
59    .385
60    .384
61    .383
62    .383
63    .382
64    .381
65    .381
66    .379
67    .379
68    .379
69    .378
70    .377
71    .375
72    .375
73    .371
74    .370
75    .368
76    .366
77    .366
78    .366
79    .365
80    .364
81    .363
82    .362
83    .361
84    .361
85    .359
86    .357
87    .357
88    .357
89    .357
90    .355
91    .355
92    .354
93    .352
94    .351
95    .351
96    .351
97    .349
98    .349
99    .348
100   .346
101   .345
102   .345
103   .343
104   .340
105   .339
106   .335
107   .331
108   .330
109   .329
110   .326
111   .324
112   .324
113   .321
114   .321
115   .320
116   .312
117   .312
118   .312
119   .312
120   .307
121   .306
122   .306
123   .306
124   .305
125   .301
126   .298
127   .297
128   .289
129   .288
130   .287
131   .285
132   .284
133   .282
134   .280
135   .279
136   .276
137   .275
138   .274
139   .273
140   .271
141   .267
142   .266
143   .264
144   .263
145   .260
146   .260
147   .260
148   .259
149   .259
150   .258
151   .258
152   .257
153   .256
154   .256
155   .255
156   .255
157   .252
158   .252
159   .249
160   .248
161   .248
162   .244
163   .240
164   .239
165   .239
166   .237
167   .234
168   .232
169   .231
170   .231
171   .229
172   .229
173   .229
174   .225
175   .218
176   .218
177   .216
178   .214
179   .214
180   .211
181   .210
182   .207
183   .206
184   .206
185   .204
186   .202
187   .201
188   .195
189   .193
190   .192
191   .187
192   .187
193   .187
194   .187
195   .187
196   .185
197   .183
198   .181
199   .179
200   .179
201   .179
202   .179
203   .179
204   .175
205   .173
206   .171
207   .170
208   .168
209   .165
210   .163
211   .156
212   .155
213   .154
214   .154
215   .146
216   .144
217   .132
218   .132
219   .130
220   .127
221   .127
222   .120
223   .119
224   .108
225   .102
226   .020
227   .020
228   .006
229   .003
230   .003
231   .002
232   .002
233   .002
234   .001
235   .001
236   .001
237   .001
238   .001
239   .001
240   .001
241   .001
242   .001
243   .001
244   .001
Ranks 245 to 457: no score listed.

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00.

Note that scores are relative to this ceiling.
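
What "relative to this ceiling" means can be sketched as follows, assuming the normalization is plain division of a raw score by the ceiling (this page does not spell out the exact formula). The function name `ceiling_normalize` is hypothetical and not part of any library; the value 0.623 is the top score from the table above.

```python
def ceiling_normalize(raw_score: float, ceiling: float) -> float:
    """Express a raw score as a fraction of the benchmark ceiling.

    Assumption: normalization is simple division by the ceiling.
    """
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return raw_score / ceiling


# With a ceiling of 1.00, the normalized score equals the raw score.
normalized = ceiling_normalize(0.623, 1.00)
print(normalized)
```

Under this assumption, a ceiling of 1.00 leaves every score unchanged, so the listed values can be read directly as accuracies.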

Data: ImageNet-C-blur

Metric: top1
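
As a rough illustration of the top1 metric, the sketch below computes top-1 accuracy: the fraction of stimuli for which a model's single best-guess label matches the true label. The function name `top1_accuracy` and the labels are hypothetical and chosen for illustration only; this is not the benchmark's own implementation.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of stimuli whose single best-guess label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one stimulus")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels, for illustration only.
predicted_labels = ["dog", "cat", "car", "dog"]
true_labels = ["dog", "cat", "bus", "cat"]
accuracy = top1_accuracy(predicted_labels, true_labels)
print(accuracy)
```

Only the highest-ranked prediction per stimulus counts here; a correct label in second place earns no credit.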