Scores on benchmarks

Ranks shown below are relative to all public models.
.609 average_language rank 1 (5 benchmarks)
  .915 neural_language rank 9 (4 benchmarks)
    .991 Pereira2018-linear rank 11 (2 benchmarks)
      .982 Pereira2018.243sentences-linear v1 rank 12
      1.0 Pereira2018.384sentences-linear v1 rank 1
    .885 Fedorenko2016-linear_pearsonr v3 rank 14
    .869 Fedorenko2016-ridge_pearsonr v3 rank 7
  .303 behavior_language rank 10 (1 benchmark)
    .303 Futrell2018-pearsonr v1 [reference] rank 10
.747 engineering_language rank 11 (30 benchmarks)
  .747 SyntaxGym [reference] rank 11 (30 benchmarks)
    .929 syntaxgym-center_embed v1 [reference] rank 12 (1 benchmark)
      .893 syntaxgym-center_embed_mod v1 [reference] rank 9
    1.0 syntaxgym-cleft v1 [reference] rank 1 (1 benchmark)
      1.0 syntaxgym-cleft_modifier v1 [reference] rank 1
    .000 syntaxgym-fgd_hierarchy v1 [reference] rank 1
    .958 syntaxgym-fgd_object v1 [reference] rank 3
    .750 syntaxgym-fgd_pp v1 [reference] rank 12
    .500 syntaxgym-fgd_subject v1 [reference] rank 5
    .786 syntaxgym-mvrr v1 [reference] rank 5 (1 benchmark)
      .786 syntaxgym-mvrr_mod v1 [reference] rank 7
    .974 syntaxgym-npi_orc_any v1 [reference] rank 6
    .947 syntaxgym-npi_orc_ever v1 [reference] rank 12
    1.0 syntaxgym-npi_src_any v1 [reference] rank 1
    .974 syntaxgym-npi_src_ever v1 [reference] rank 9
    .917 syntaxgym-npz_ambig v1 [reference] rank 5 (1 benchmark)
      .958 syntaxgym-npz_ambig_mod v1 [reference] rank 5
    1.0 syntaxgym-npz_obj_mod v1 [reference] rank 1
    .684 syntaxgym-number_orc v1 [reference] rank 7
    .789 syntaxgym-number_prep v1 [reference] rank 7
    .737 syntaxgym-number_src v1 [reference] rank 9
    .158 syntaxgym-reflexive_orc_fem v1 [reference] rank 12
    .526 syntaxgym-reflexive_orc_masc v1 [reference] rank 12
    .211 syntaxgym-reflexive_prep_fem v1 [reference] rank 13
    .474 syntaxgym-reflexive_prep_masc v1 [reference] rank 13
    .158 syntaxgym-reflexive_src_fem v1 [reference] rank 12
    .421 syntaxgym-reflexive_src_masc v1 [reference] rank 14
    .913 syntaxgym-subordination v1 [reference] rank 9 (3 benchmarks)
      1.0 syntaxgym-subordination_orc-orc v1 [reference] rank 1
      .957 syntaxgym-subordination_pp-pp v1 [reference] rank 10
      1.0 syntaxgym-subordination_src-src v1 [reference] rank 1
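
The displayed scores for the neural and behavioral groups appear consistent with an unweighted mean over each group's direct children. The sketch below checks that arithmetic using only the rounded values listed in this section. The averaging rule itself is an assumption inferred from the numbers, not a statement of how Brain-Score computes its aggregates, and it is not claimed for the SyntaxGym entries.

```python
from statistics import mean

# Scores copied from the listing in this section (rounded as displayed).
# Assumption: each parent score is the unweighted mean of its direct children.

# Pereira2018-linear <- 243sentences, 384sentences
pereira2018_linear = mean([0.982, 1.0])

# neural_language <- Pereira2018-linear, Fedorenko2016-linear_pearsonr,
#                    Fedorenko2016-ridge_pearsonr
neural_language = mean([pereira2018_linear, 0.885, 0.869])

# average_language <- neural_language, behavior_language
average_language = mean([neural_language, 0.303])


def matches_displayed(computed: float, displayed: float, tol: float = 0.001) -> bool:
    """Return True if a computed mean agrees with the displayed score within tol."""
    return abs(computed - displayed) <= tol


print(f"{pereira2018_linear:.3f}")  # 0.991
print(f"{neural_language:.3f}")     # 0.915
print(f"{average_language:.3f}")    # 0.609
```

Because the inputs are already rounded to three decimals, agreement within 0.001 is the most this check can show.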

How to use

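from brainscore_language import load_model

model = load_model("tinyllama-1.1b")
# Language models expose the ArtificialSubject interface, not the vision one:
model.start_behavioral_task(...)   # choose a behavioral task
model.start_neural_recording(...)  # choose a recording target and type
model.digest_text(...)             # pass the text to process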

Brain Encoding Response Generator (BERG)

Through the BERG you can easily generate neural responses to text sentences of your choice using any Brain-Score language model.

For more information on how to use BERG, see the documentation and tutorial.

Benchmarks bibtex

@inproceedings{futrell2018natural,
  title={The Natural Stories Corpus},
  author={Futrell, Richard and Gibson, Edward and Tily, Harry J. and Blank, Idan and Vishnevetsky, Anastasia and
          Piantadosi, Steven T. and Fedorenko, Evelina},
  booktitle={International Conference on Language Resources and Evaluation (LREC)},
  url={http://www.lrec-conf.org/proceedings/lrec2018/pdf/337.pdf},
  year={2018}
}
@inproceedings{gauthier-etal-2020-syntaxgym,
    title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
    author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.10",
    pages = "70--76",
    abstract = "Targeted syntactic evaluations have yielded insights into the generalizations learned by neural network language models. However, this line of research requires an uncommon confluence of skills: both the theoretical knowledge needed to design controlled psycholinguistic experiments, and the technical proficiency needed to train and deploy large-scale language models. We present SyntaxGym, an online platform designed to make targeted evaluations accessible to both experts in NLP and linguistics, reproducible across computing environments, and standardized following the norms of psycholinguistic experimental design. This paper releases two tools of independent value for the computational linguistics community: 1. A website, syntaxgym.org, which centralizes the process of targeted syntactic evaluation and provides easy tools for analysis and visualization; 2. Two command-line tools, {`}syntaxgym{`} and {`}lm-zoo{`}, which allow any user to reproduce targeted syntactic evaluations and general language model inference on their own machine.",
}

Layer Commitment

No layer commitments found for this model. Older submissions might not have stored this information but will be updated when evaluated on new benchmarks.

Visual Angle

Not specified.