Sample stimuli

[Image grid: sample 0 through sample 9]

How to use

from brainscore_vision import load_benchmark

# Load the benchmark by its identifier, then score a model that
# implements the Brain-Score model interface (BrainModel).
benchmark = load_benchmark("ImageNet-C-blur-top1")
score = benchmark(my_model)

Model scores


Rank  Score
  1   .623
  2   .596
  3   .576
  4   .548
  5   .542
  6   .538
  7   .533
  8   .525
  9   .525
 10   .516
 11   .514
 12   .514
 13   .510
 14   .505
 15   .503
 16   .500
 17   .494
 18   .483
 19   .475
 20   .474
 21   .468
 22   .463
 23   .463
 24   .456
 25   .455
 26   .448
 27   .444
 28   .442
 29   .436
 30   .434
 31   .433
 32   .431
 33   .431
 34   .426
 35   .416
 36   .416
 37   .414
 38   .413
 39   .413
 40   .412
 41   .409
 42   .406
 43   .404
 44   .404
 45   .403
 46   .399
 47   .396
 48   .395
 49   .392
 50   .391
 51   .390
 52   .388
 53   .388
 54   .388
 55   .385
 56   .384
 57   .383
 58   .381
 59   .381
 60   .379
 61   .379
 62   .378
 63   .377
 64   .375
 65   .375
 66   .371
 67   .370
 68   .368
 69   .366
 70   .366
 71   .364
 72   .363
 73   .362
 74   .361
 75   .361
 76   .357
 77   .357
 78   .355
 79   .355
 80   .354
 81   .352
 82   .351
 83   .351
 84   .351
 85   .349
 86   .349
 87   .348
 88   .346
 89   .345
 90   .345
 91   .343
 92   .340
 93   .339
 94   .335
 95   .331
 96   .330
 97   .326
 98   .324
 99   .324
100   .321
101   .321
102   .320
103   .312
104   .312
105   .312
106   .312
107   .307
108   .306
109   .306
110   .306
111   .305
112   .301
113   .298
114   .297
115   .289
116   .288
117   .287
118   .285
119   .284
120   .282
121   .280
122   .279
123   .276
124   .275
125   .274
126   .273
127   .271
128   .267
129   .266
130   .264
131   .263
132   .260
133   .260
134   .260
135   .259
136   .259
137   .258
138   .258
139   .257
140   .256
141   .256
142   .255
143   .255
144   .252
145   .252
146   .249
147   .248
148   .248
149   .244
150   .240
151   .239
152   .239
153   .237
154   .234
155   .232
156   .231
157   .231
158   .229
159   .229
160   .229
161   .225
162   .218
163   .218
164   .216
165   .214
166   .214
167   .211
168   .210
169   .207
170   .206
171   .206
172   .204
173   .202
174   .201
175   .195
176   .193
177   .192
178   .187
179   .187
180   .187
181   .187
182   .187
183   .185
184   .183
185   .181
186   .179
187   .179
188   .179
189   .179
190   .179
191   .175
192   .173
193   .171
194   .170
195   .168
196   .165
197   .163
198   .156
199   .155
200   .154
201   .154
202   .146
203   .144
204   .132
205   .132
206   .130
207   .127
208   .127
209   .120
210   .119
211   .108
212   .102
213   .020
214   .020
215   .006
216   .003
217   .003
218   .002
219   .002
220   .002
221   .001
222   .001
223   .001
224   .001
225   .001
226   .001
227   .001
228   .001
229   .001
230   .001
231   .001
232   .001
233   .001

Benchmark BibTeX

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}

Ceiling

1.00

Note that scores are relative to this ceiling.
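Since the ceiling here is 1.00, a model's reported score equals its raw benchmark value. The sketch below illustrates the general idea of ceiling normalization (raw score divided by ceiling), using the rank-1 score from the table above; the function name is illustrative, not part of the Brain-Score API.

```python
def ceiling_normalize(raw_score: float, ceiling: float) -> float:
    """Express a raw benchmark score relative to the benchmark ceiling."""
    return raw_score / ceiling

# With a ceiling of 1.00, the normalized score equals the raw score:
print(ceiling_normalize(0.623, 1.00))  # 0.623
```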

Data: ImageNet-C-blur

Metric: top1
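The top1 metric is plain top-1 classification accuracy: the fraction of stimuli for which the model's highest-scoring class matches the true label. A minimal sketch (the function name and toy data are illustrative):

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose argmax class equals the true label.

    scores: per-sample lists of per-class scores.
    labels: true class indices, one per sample.
    """
    correct = sum(
        1
        for per_class, label in zip(scores, labels)
        if max(range(len(per_class)), key=per_class.__getitem__) == label
    )
    return correct / len(labels)

# Toy example: 3 samples, 2 classes; the model gets 2 of 3 right.
acc = top1_accuracy([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]], [1, 0, 1])
print(acc)  # 0.6666666666666666 (i.e. 2/3)
```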