Scores on benchmarks

Model ranks shown below are relative to all public models.

Score	Benchmark	Version	Rank	Sub-benchmarks
.284	average_language	-	17	6
.210	neural_language	-	23	5
.523	Pereira2018-ridge	-	22	2
.504	Pereira2018.243sentences-ridge	v1	23	-
.542	Pereira2018.384sentences-ridge	v1	19	-
.000	Blank2014-ridge	v1	20	-
.316	Fedorenko2016-ridge	v3	25	-
.358	behavior_language	-	7	1
.358	Futrell2018-pearsonr [reference]	v1	7	-
.736	engineering_language	-	12	30
.736	SyntaxGym [reference]	-	12	30
.964	syntaxgym-center_embed [reference]	v1	7	1
.857	syntaxgym-center_embed_mod [reference]	v1	18	-
1.0	syntaxgym-cleft [reference]	v1	1	1
.925	syntaxgym-cleft_modifier [reference]	v1	12	-
.000	syntaxgym-fgd_hierarchy [reference]	v1	1	-
1.0	syntaxgym-fgd_object [reference]	v1	1	-
.917	syntaxgym-fgd_pp [reference]	v1	3	-
.625	syntaxgym-fgd_subject [reference]	v1	1	-
.786	syntaxgym-mvrr [reference]	v1	5	1
.786	syntaxgym-mvrr_mod [reference]	v1	7	-
.947	syntaxgym-npi_orc_any [reference]	v1	17	-
1.0	syntaxgym-npi_orc_ever [reference]	v1	1	-
.737	syntaxgym-npi_src_any [reference]	v1	17	-
.868	syntaxgym-npi_src_ever [reference]	v1	17	-
.833	syntaxgym-npz_ambig [reference]	v1	9	1
.917	syntaxgym-npz_ambig_mod [reference]	v1	8	-
1.0	syntaxgym-npz_obj_mod [reference]	v1	1	-
.421	syntaxgym-number_orc [reference]	v1	17	-
.579	syntaxgym-number_prep [reference]	v1	17	-
.737	syntaxgym-number_src [reference]	v1	11	-
.105	syntaxgym-reflexive_orc_fem [reference]	v1	16	-
.526	syntaxgym-reflexive_orc_masc [reference]	v1	15	-
.421	syntaxgym-reflexive_prep_fem [reference]	v1	11	-
.579	syntaxgym-reflexive_prep_masc [reference]	v1	15	-
.263	syntaxgym-reflexive_src_fem [reference]	v1	11	-
.632	syntaxgym-reflexive_src_masc [reference]	v1	9	-
.957	syntaxgym-subordination [reference]	v1	7	3
.826	syntaxgym-subordination_orc-orc [reference]	v1	19	-
.957	syntaxgym-subordination_pp-pp [reference]	v1	13	-
.913	syntaxgym-subordination_src-src [reference]	v1	15	-

Sample stimuli cannot be displayed publicly for the individual benchmarks (rows with no sub-benchmark count).
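
The aggregate rows in the listing above appear to be plain means of the scores beneath them. This is an observation from the displayed numbers, not a documented aggregation rule, and the helper name `matches` and the rounding tolerance are assumptions made for illustration. The sketch checks three aggregates against the three-decimal scores shown on the page. neural_language is left out because the page states 5 benchmarks for it but lists only 4 individual ones, so its score cannot be recomputed from the listing.

```python
from statistics import mean

# Scores copied from the listing above (displayed to three decimals).
pereira_parts = [0.504, 0.542]  # Pereira2018.243sentences-ridge, Pereira2018.384sentences-ridge

syntaxgym_parts = [
    0.964, 0.857,                              # center_embed, center_embed_mod
    1.0, 0.925,                                # cleft, cleft_modifier
    0.000, 1.0, 0.917, 0.625,                  # fgd_hierarchy, fgd_object, fgd_pp, fgd_subject
    0.786, 0.786,                              # mvrr, mvrr_mod
    0.947, 1.0, 0.737, 0.868,                  # npi_orc_any, npi_orc_ever, npi_src_any, npi_src_ever
    0.833, 0.917, 1.0,                         # npz_ambig, npz_ambig_mod, npz_obj_mod
    0.421, 0.579, 0.737,                       # number_orc, number_prep, number_src
    0.105, 0.526, 0.421, 0.579, 0.263, 0.632,  # reflexive_{orc,prep,src}_{fem,masc}
    0.957, 0.826, 0.957, 0.913,                # subordination, orc-orc, pp-pp, src-src
]

language_parts = [0.210, 0.358]  # neural_language, behavior_language


def matches(reported, parts, tol=0.0005):
    """Return True if the mean of `parts` equals `reported` up to display rounding."""
    return abs(mean(parts) - reported) <= tol


if __name__ == "__main__":
    print("Pereira2018-ridge:", matches(0.523, pereira_parts))
    print("SyntaxGym:", matches(0.736, syntaxgym_parts))
    print("average_language:", matches(0.284, language_parts))
```

If the pattern holds, average_language is the mean of neural_language and behavior_language only, which would explain why it reports 6 benchmarks (5 + 1) and why the engineering_language score does not raise it.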

How to use

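from brainscore_language import load_model

# Load the model by its Brain-Score identifier.
model = load_model("gpt-neo-125m")
model.start_behavioral_task(...)   # select the behavioral task to perform
model.start_neural_recording(...)  # select the recording target and type
model.digest_text(...)             # present text and collect the outputs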

Brain Encoding Response Generator (BERG)

Through the BERG you can easily generate neural responses to text sentences of your choice using any Brain-Score language model.

For more information on how to use BERG, see the documentation and tutorial.

Benchmarks bibtex

@inproceedings{futrell2018natural,
  title={The Natural Stories Corpus},
  author={Futrell, Richard and Gibson, Edward and Tily, Harry J. and Blank, Idan and Vishnevetsky, Anastasia and
          Piantadosi, Steven T. and Fedorenko, Evelina},
  booktitle={International Conference on Language Resources and Evaluation (LREC)},
  url={http://www.lrec-conf.org/proceedings/lrec2018/pdf/337.pdf},
  year={2018}
}

@inproceedings{gauthier-etal-2020-syntaxgym,
    title = "{S}yntax{G}ym: An Online Platform for Targeted Evaluation of Language Models",
    author = "Gauthier, Jon and Hu, Jennifer and Wilcox, Ethan and Qian, Peng and Levy, Roger",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.10",
    pages = "70--76",
    abstract = "Targeted syntactic evaluations have yielded insights into the generalizations learned by neural network language models. However, this line of research requires an uncommon confluence of skills: both the theoretical knowledge needed to design controlled psycholinguistic experiments, and the technical proficiency needed to train and deploy large-scale language models. We present SyntaxGym, an online platform designed to make targeted evaluations accessible to both experts in NLP and linguistics, reproducible across computing environments, and standardized following the norms of psycholinguistic experimental design. This paper releases two tools of independent value for the computational linguistics community: 1. A website, syntaxgym.org, which centralizes the process of targeted syntactic evaluation and provides easy tools for analysis and visualization; 2. Two command-line tools, {`}syntaxgym{`} and {`}lm-zoo{`}, which allow any user to reproduce targeted syntactic evaluations and general language model inference on their own machine.",
}

Layer Commitment

No layer commitments found for this model. Older submissions might not have stored this information but will be updated when evaluated on new benchmarks.

Visual Angle

Not specified.