Sample stimuli

Ten example stimulus images (sample 0 through sample 9) from the ImageNet-C-blur stimulus set are shown on the benchmark page.

How to use

from brainscore_vision import load_benchmark

benchmark = load_benchmark("ImageNet-C-blur-top1")  # benchmark identifier as registered in brainscore_vision
score = benchmark(my_model)  # my_model: a model implementing the BrainModel interface

Model scores


Rank  Score
1    .623
2    .596
3    .576
4    .548
5    .542
6    .538
7    .533
8    .525
9    .525
10   .516
11   .514
12   .514
13   .510
14   .505
15   .503
16   .500
17   .494
18   .483
19   .475
20   .474
21   .468
22   .463
23   .463
24   .456
25   .455
26   .448
27   .442
28   .436
29   .434
30   .433
31   .431
32   .431
33   .426
34   .416
35   .416
36   .414
37   .413
38   .413
39   .412
40   .409
41   .406
42   .404
43   .404
44   .403
45   .399
46   .396
47   .395
48   .392
49   .391
50   .390
51   .388
52   .388
53   .388
54   .385
55   .384
56   .381
57   .381
58   .379
59   .379
60   .378
61   .377
62   .375
63   .375
64   .371
65   .370
66   .368
67   .366
68   .366
69   .364
70   .363
71   .362
72   .361
73   .361
74   .357
75   .357
76   .355
77   .355
78   .354
79   .352
80   .351
81   .351
82   .351
83   .349
84   .349
85   .348
86   .346
87   .345
88   .345
89   .343
90   .340
91   .339
92   .335
93   .331
94   .330
95   .326
96   .324
97   .324
98   .321
99   .321
100  .320
101  .312
102  .312
103  .312
104  .312
105  .307
106  .306
107  .306
108  .306
109  .305
110  .301
111  .298
112  .297
113  .289
114  .288
115  .287
116  .285
117  .284
118  .282
119  .280
120  .279
121  .276
122  .275
123  .274
124  .273
125  .271
126  .267
127  .266
128  .264
129  .263
130  .260
131  .260
132  .260
133  .259
134  .259
135  .258
136  .258
137  .257
138  .256
139  .256
140  .255
141  .255
142  .252
143  .252
144  .249
145  .248
146  .248
147  .244
148  .240
149  .239
150  .239
151  .237
152  .234
153  .232
154  .231
155  .231
156  .229
157  .229
158  .229
159  .225
160  .218
161  .218
162  .216
163  .214
164  .214
165  .211
166  .210
167  .207
168  .206
169  .206
170  .204
171  .202
172  .201
173  .195
174  .193
175  .192
176  .190
177  .190
178  .190
179  .190
180  .187
181  .185
182  .183
183  .181
184  .179
185  .179
186  .175
187  .174
188  .173
189  .171
190  .170
191  .168
192  .165
193  .163
194  .156
195  .155
196  .154
197  .154
198  .146
199  .144
200  .132
201  .132
202  .130
203  .127
204  .127
205  .120
206  .119
207  .108
208  .102
209  .020
210  .020
211  .006
212  .003
213  .003
214  .002
215  .002
216  .002
217  .001
218  .001
219  .001
220  .001
221  .001
222  .001
223  .001
224  .001
225  .001
226  .001

Benchmark bibtex

@ARTICLE{Hendrycks2019-di,
   title         = "Benchmarking Neural Network Robustness to Common Corruptions
                    and Perturbations",
   author        = "Hendrycks, Dan and Dietterich, Thomas",
   abstract      = "In this paper we establish rigorous benchmarks for image
                    classifier robustness. Our first benchmark, ImageNet-C,
                    standardizes and expands the corruption robustness topic,
                    while showing which classifiers are preferable in
                    safety-critical applications. Then we propose a new dataset
                    called ImageNet-P which enables researchers to benchmark a
                    classifier's robustness to common perturbations. Unlike
                    recent robustness research, this benchmark evaluates
                    performance on common corruptions and perturbations not
                    worst-case adversarial perturbations. We find that there are
                    negligible changes in relative corruption robustness from
                    AlexNet classifiers to ResNet classifiers. Afterward we
                    discover ways to enhance corruption and perturbation
                    robustness. We even find that a bypassed adversarial defense
                    provides substantial common perturbation robustness.
                    Together our benchmarks may aid future work toward networks
                    that robustly generalize.",
   month         =  mar,
   year          =  2019,
   archivePrefix = "arXiv",
   primaryClass  = "cs.LG",
   eprint        = "1903.12261",
   url           = "https://arxiv.org/abs/1903.12261"
}
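ImageNet-C, as the abstract above describes, evaluates classifiers on common corruptions; this benchmark's subset uses the blur corruptions. As a rough illustration only (a NumPy sketch, not the reference ImageNet-C corruption code; function names are ours), one blur corruption, Gaussian blur, can be applied separably:

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D grayscale array (H, W)."""
    kernel = gaussian_kernel(sigma)
    # Convolve each row, then each column (the Gaussian is separable).
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")
```

In ImageNet-C each corruption is applied at several severity levels; here, larger `sigma` plays the role of higher severity.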

Ceiling

1.00

Note that scores are relative to this ceiling.

Data: ImageNet-C-blur

Metric: top1
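The top1 metric is plain top-1 classification accuracy, divided by the ceiling above (1.00 here, so the division changes nothing). A minimal NumPy sketch, not the Brain-Score implementation (function name and data are illustrative):

```python
import numpy as np

def top1_score(logits, labels, ceiling=1.0):
    """Fraction of samples whose argmax prediction matches the label,
    normalized by the benchmark ceiling."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels)) / ceiling

# Hypothetical logits for 3 samples over 2 classes, true labels [1, 0, 0]:
logits = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])
top1_score(logits, np.array([1, 0, 0]))  # 2 of 3 correct -> ~0.667
```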