Submitting Benchmarks
Once you've created and tested your benchmark locally, the final step is submitting it to the Brain-Score platform. This tutorial walks you through the complete submission process, from packaging your data to creating a pull request.
⚠️ Note: All data uploaded to Brain-Score is publicly accessible by default, in the spirit of open science. If you would like your data to remain private, please contact a member of the Brain-Score team by joining our Slack or opening a GitHub Issue. Additionally, all benchmark scores are publicly visible, regardless of whether the underlying data is public or private.
Summary
The benchmark submission process follows these key steps:
1. Package Data → Validate and format stimuli + assemblies
2. Create Benchmark → Implement benchmark class with ceiling
3. Test Locally → Write and run unit tests
4. Upload Data → Package and upload to S3
5. Submit PR → Fork, commit, and create pull request
6. Review Process → Address feedback and iterate
7. Merge & Deploy → Benchmark goes live on Brain-Score
Overview of Data Packaging
Before submitting a benchmark, your experimental data must be packaged in Brain-Score's standardized format. Here's a quick summary of the key steps (see Data Packaging for details):
Stimulus Set Preparation
from brainscore_core.supported_data_standards.brainio.stimuli import StimulusSet
# 1. Create metadata DataFrame
stimuli_data = [
{'stimulus_id': 'img_001', 'category': 'face', 'condition': 'A'},
{'stimulus_id': 'img_002', 'category': 'object', 'condition': 'B'},
# ... more stimuli
]
stimulus_set = StimulusSet(stimuli_data)
# 2. Map stimulus IDs to file paths
stimulus_set.stimulus_paths = {
'img_001': '/path/to/image1.jpg',
'img_002': '/path/to/image2.jpg',
}
# 3. Set unique identifier
stimulus_set.name = 'MyExperiment2024'
Data Assembly Creation
from brainscore_core.supported_data_standards.brainio.assemblies import BehavioralAssembly
assembly = BehavioralAssembly(
response_data,
coords={
'stimulus_id': ('presentation', stimulus_ids),
'subject': ('presentation', subject_ids),
'condition': ('presentation', conditions),
},
dims=['presentation']
)
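Here, `response_data` holds one value per presentation (for example, a subject's reported category), aligned index-for-index with the coordinate lists. The values below are purely illustrative placeholders, not part of any real dataset:

# toy inputs, purely illustrative -- replace with your experiment's actual trial data
response_data = ['face', 'object', 'face']          # one response per presentation
stimulus_ids  = ['img_001', 'img_002', 'img_001']   # repeats across subjects are expected
subject_ids   = ['sub_01', 'sub_01', 'sub_02']
conditions    = ['A', 'B', 'A']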
Validation Checklist
Before packaging, verify:
| Check | Command |
|---|---|
| Unique stimulus IDs | `assert stimulus_set['stimulus_id'].nunique() == len(stimulus_set)` |
| All files exist | Verify all paths in `stimulus_paths` point to existing files |
| No NaN values | `assert not np.isnan(assembly.values).any()` |
| Aligned coordinates | Ensure `stimulus_id` values match between the StimulusSet and the Assembly |
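A short sanity-check script covering these points (assuming the `stimulus_set` and `assembly` objects from the snippets above, with a numeric assembly for the NaN check) can catch most packaging problems early:

import os
import numpy as np

# unique stimulus IDs
assert stimulus_set['stimulus_id'].nunique() == len(stimulus_set)
# every referenced stimulus file exists on disk
assert all(os.path.isfile(path) for path in stimulus_set.stimulus_paths.values())
# no NaN values in the assembly (numeric assemblies only)
assert not np.isnan(assembly.values).any()
# every presentation refers to a packaged stimulus
assert set(assembly['stimulus_id'].values) <= set(stimulus_set['stimulus_id'].values)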
Overview of Creating a Benchmark
Your benchmark implementation should follow Brain-Score's plugin architecture. Here's the essential structure:
Benchmark Class
from brainscore_vision import load_dataset, load_metric, load_stimulus_set
from brainscore_vision.benchmarks import BenchmarkBase
from brainscore_vision.model_interface import BrainModel
from brainscore_core.metrics import Score
BIBTEX = """
@article{author2024,
title={Your Paper Title},
author={Author, A. and Coauthor, B.},
journal={Journal Name},
year={2024}
}
"""
class MyExperiment2024Benchmark(BenchmarkBase):
def __init__(self):
self._assembly = load_dataset('MyExperiment2024')
self._stimulus_set = load_stimulus_set('MyExperiment2024')
self._metric = load_metric('accuracy')
self._visual_degrees = 8
super().__init__(
identifier='MyExperiment2024-accuracy',
version=1,
ceiling_func=lambda: self._compute_ceiling(),
parent='behavior', # or 'V1', 'V4', 'IT', etc.
bibtex=BIBTEX
)
def __call__(self, candidate: BrainModel) -> Score:
# Implementation here
pass
def _compute_ceiling(self) -> Score:
# Ceiling calculation here
pass
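As a rough orientation, here is a sketch of what `__call__` could look like for a behavioral accuracy benchmark. The task setup (labeling with the `condition` values as choices), the use of `place_on_screen`, and the ceiling normalization are assumptions for illustration; adapt them to your experiment's actual protocol:

# in benchmark.py, alongside the imports above:
from brainscore_vision.benchmark_helpers.screen import place_on_screen

# inside MyExperiment2024Benchmark:
def __call__(self, candidate: BrainModel) -> Score:
    # configure the candidate for the behavioral task (these choices are hypothetical)
    choices = sorted(set(self._assembly['condition'].values))
    candidate.start_task(BrainModel.Task.label, choices)
    # rescale stimuli from the experiment's visual degrees to the candidate's field of view
    stimuli = place_on_screen(self._stimulus_set,
                              target_visual_degrees=candidate.visual_degrees(),
                              source_visual_degrees=self._visual_degrees)
    # collect model responses and compare against the human assembly
    predictions = candidate.look_at(stimuli)
    raw_score = self._metric(predictions, self._assembly)
    # normalize by the ceiling so that a model matching humans scores ~1
    score = raw_score / self.ceiling
    score.attrs['raw'] = raw_score
    score.attrs['ceiling'] = self.ceiling
    return score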
Registration
# __init__.py
from brainscore_vision import benchmark_registry
from .benchmark import MyExperiment2024Benchmark
benchmark_registry['MyExperiment2024-accuracy'] = MyExperiment2024Benchmark
Plugin Directory Structure
vision/brainscore_vision/benchmarks/myexperiment2024/
├── __init__.py          # Registration
├── benchmark.py         # Benchmark implementation
├── test.py              # Unit tests
└── requirements.txt     # Dependencies (optional)
Testing Your Benchmark Locally
Before submitting, thoroughly test your benchmark to ensure it works correctly.
1. Install Development Dependencies
# Clone the vision repository
git clone https://github.com/brain-score/vision.git
cd vision
# Install in development mode
pip install -e ".[test]"
2. Create Unit Tests
Create a test.py file in your benchmark directory:
# vision/brainscore_vision/benchmarks/myexperiment2024/test.py
import pytest
from brainscore_vision import load_benchmark
@pytest.mark.private_access
class TestMyExperiment2024:
def test_benchmark_loads(self):
"""Verify benchmark can be loaded."""
benchmark = load_benchmark('MyExperiment2024-accuracy')
assert benchmark is not None
assert benchmark.identifier == 'MyExperiment2024-accuracy'
def test_ceiling_valid(self):
"""Verify ceiling is computed and valid."""
benchmark = load_benchmark('MyExperiment2024-accuracy')
ceiling = benchmark.ceiling
assert 0 < ceiling <= 1
assert hasattr(ceiling, 'attrs')
def test_benchmark_runs(self):
"""Test benchmark with a simple model."""
from brainscore_vision import load_model
benchmark = load_benchmark('MyExperiment2024-accuracy')
model = load_model('alexnet') # or another simple model
score = benchmark(model)
assert score.item() == pytest.approx(0.212, abs=0.001)  # replace with the expected score for this model
assert hasattr(score, 'attrs')
assert 'raw' in score.attrs or 'ceiling' in score.attrs
3. Run Tests Locally
# Run your specific tests
pytest brainscore_vision/benchmarks/myexperiment2024/test.py -v
4. Manual Verification
# Quick manual test
from brainscore_vision import load_benchmark, load_model
benchmark = load_benchmark('MyExperiment2024-accuracy')
model = load_model('alexnet')
score = benchmark(model)
print(f"Score: {score.values:.4f}")
print(f"Ceiling: {benchmark.ceiling.values:.4f}")
print(f"Raw attrs: {score.attrs}")
Uploading Data to Brain-Score
Once your data is validated and your benchmark is tested locally, you need to upload the data to Brain-Score's S3 storage.
1. Package Your Data Locally
# Located: core/brainscore_core/supported_data_standards/brainio/packaging.py
from brainscore_core.supported_data_standards.brainio.packaging import package_stimulus_set_locally
from brainscore_core.supported_data_standards.brainio import packaging
# Package stimulus set
package_stimulus_set_locally(
proto_stimulus_set=stimulus_set,
stimulus_set_identifier='MyExperiment2024',
)
'''
Output:
{
'identifier': 'MyExperiment2024',
'csv_path': '~/Downloads/brainscore_packages/stimulus_MyExperiment2024.csv',
'zip_path': '~/Downloads/brainscore_packages/stimulus_MyExperiment2024.zip',
'csv_sha1': '1d47ea4a09ddd72cebabca95b985646650f21646',
'zip_sha1': 'c96036d459f0a2ce4494ba73a2b18b8eec59f6b6'
}
'''
# Package data assembly
packaging.package_data_assembly_locally(
proto_data_assembly=assembly,
assembly_identifier='MyExperiment2024',
stimulus_set_identifier='MyExperiment2024',
assembly_class_name='NeuroidAssembly', # or 'BehavioralAssembly'
)
'''
Output:
{
'identifier': 'MyExperiment2024',
'path': '~/Downloads/brainscore_packages/assy_MyExperiment2024.nc',
'sha1': '3c45d95e2e61616c885758df0d463fddbfc2b427',
'cls': 'NeuroidAssembly'
}
'''
Save the SHA hashes: you'll need them for registration.
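If you want to double-check a hash before registering it, a plain `hashlib` computation over the packaged file reproduces the value (the path below is just the example output from above):

import hashlib
import os

def sha1_of(path, chunk_size=1 << 20):
    """Compute the SHA1 of a file in chunks so large packages don't exhaust memory."""
    sha1 = hashlib.sha1()
    with open(os.path.expanduser(path), 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha1.update(chunk)
    return sha1.hexdigest()

print(sha1_of('~/Downloads/brainscore_packages/assy_MyExperiment2024.nc'))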
2. Upload to Brain-Score
Upload your packaged files (CSV, ZIP, and NC) to Brain-Score's S3 storage via the Brain-Score website: navigate to the upload page from your Central Profile Page.
After uploading, you'll receive the S3 bucket path and version_id values needed for registration.
3. Register in Data Plugin
Create your data plugin's __init__.py with the upload metadata:
# vision/brainscore_vision/data/myexperiment2024/__init__.py
from brainscore_vision import data_registry, stimulus_set_registry, load_stimulus_set
from brainscore_core.supported_data_standards.brainio.s3 import load_stimulus_set_from_s3, load_assembly_from_s3
from brainscore_core.supported_data_standards.brainio.assemblies import NeuroidAssembly
stimulus_set_registry['MyExperiment2024'] = lambda: load_stimulus_set_from_s3(
identifier='MyExperiment2024',
bucket='brainscore-storage/brainscore-vision/data/...', # S3 path (from upload)
csv_sha1='1d47ea4a09ddd72cebabca95b985646650f21646', # From packaging
zip_sha1='c96036d459f0a2ce4494ba73a2b18b8eec59f6b6', # From packaging
csv_version_id='U405jdh2ECWCzwoZauxh0VNDSAHAb.2s', # From upload
zip_version_id='cTVQxPGpom_seCpwN2ltG_LK7eYGbZid', # From upload
filename_prefix='stimulus_'
)
data_registry['MyExperiment2024'] = lambda: load_assembly_from_s3(
identifier='MyExperiment2024',
bucket='brainscore-storage/brainscore-vision/data/...', # S3 path (from upload)
sha1='3c45d95e2e61616c885758df0d463fddbfc2b427', # From packaging
version_id='yusgi5xpyrNzU10cjk69Z49G.CSyujXO', # From upload
cls=NeuroidAssembly,
stimulus_set_loader=lambda: load_stimulus_set('MyExperiment2024')
)
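The data plugin should also ship a small test.py (it appears in the directory layout in the PR section below). A minimal sketch, assuming the coordinate names from the example assembly above:

# vision/brainscore_vision/data/myexperiment2024/test.py
import pytest
from brainscore_vision import load_dataset, load_stimulus_set

@pytest.mark.private_access
def test_stimulus_set_loads():
    stimulus_set = load_stimulus_set('MyExperiment2024')
    assert len(stimulus_set) > 0
    assert stimulus_set['stimulus_id'].nunique() == len(stimulus_set)

@pytest.mark.private_access
def test_assembly_loads():
    assembly = load_dataset('MyExperiment2024')
    assert 'presentation' in assembly.dims
    # every presentation should refer to a packaged stimulus
    stimulus_ids = set(load_stimulus_set('MyExperiment2024')['stimulus_id'].values)
    assert set(assembly['stimulus_id'].values) <= stimulus_ids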
For complete details on data packaging, see Data Packaging.
Creating a GitHub PR
With your data uploaded and code tested, you're ready to submit a pull request.
1. Fork and Clone
# Fork the repository on GitHub, then:
git clone https://github.com/YOUR_USERNAME/vision.git
cd vision
git remote add upstream https://github.com/brain-score/vision.git
2. Create a Feature Branch
git checkout -b benchmark/myexperiment2024
3. Add Your Files
Your submission should include:
vision/
└── brainscore_vision/
    ├── benchmarks/
    │   └── myexperiment2024/
    │       ├── __init__.py          # Benchmark registration
    │       ├── benchmark.py         # Benchmark implementation
    │       ├── test.py              # Unit tests
    │       └── requirements.txt     # Extra dependencies (if any)
    └── data/
        └── myexperiment2024/
            ├── __init__.py          # Data registration
            └── test.py              # Data loading tests
4. Commit and Push
git add .
git commit -m "Add MyExperiment2024 benchmark
- Neural/behavioral benchmark comparing model X to human data
- Uses accuracy/PLS metric with split-half ceiling
- Based on Author et al. (2024) paper"
git push origin benchmark/myexperiment2024
5. Create Pull Request
- Go to github.com/brain-score/vision
- Click "New Pull Request"
- Select your fork and branch
- Fill out the PR template with:
- Description: What the benchmark measures
- Data source: Paper citation and data availability
- Testing: Confirm tests pass locally
- Expected scores: Document expected scores for at least one model
PR Template Example
## Benchmark: MyExperiment2024
### Description
This benchmark evaluates model alignment with human behavioral data from
Author et al. (2024), measuring accuracy on a visual categorization task.
### Components
- **Stimulus Set**: 500 images across 10 categories
- **Data Assembly**: Behavioral responses from 50 human subjects
- **Metric**: Accuracy with split-half ceiling
### Testing
- [x] All unit tests pass locally
- [x] Benchmark loads correctly
- [x] Ceiling is valid (0.85 ± 0.02)
- [x] Tested with alexnet: score = 0.42
### Citation
@article{author2024, ...}
What to Expect
After submitting your PR, here's the typical review process:
1. Code Review
A Brain-Score maintainer will review your submission for:
| Aspect | What They Check |
|---|---|
| Code quality | Clean, readable, follows conventions |
| Scientific validity | Benchmark correctly implements the paper's methodology |
| Data integrity | Data is properly packaged and accessible |
| Ceiling implementation | Ceiling is appropriate and correctly computed |
| Testing coverage | Tests adequately verify benchmark functionality |
| Documentation | Code is well-commented, bibtex is correct |
2. Timeline
| Stage | Typical Duration |
|---|---|
| Initial review | 1-2 weeks |
| Revisions (if needed) | Varies |
| Final approval | 1-3 days |
| Merge and deployment | Same day |
3. After Merge
Once your PR is merged:
- Benchmark becomes available on the Brain-Score platform
- Models are automatically scored against your benchmark
- Leaderboard updates to show model performance
- Your citation is displayed with the benchmark
4. Common Review Feedback
Be prepared to address:
- Ceiling concerns: "Can you explain why the ceiling is computed this way?"
- Test coverage: "Please add a test for the edge case where..."
- Documentation: "The bibtex is missing"
- Code style: "Please use consistent naming conventions"
Submission Checklist
Before submitting your PR, verify:
Data
- [ ] Stimulus set has unique IDs and all files exist
- [ ] Data assembly has no NaN values
- [ ] Data is uploaded to S3 with correct SHA hashes
- [ ] Data registration includes proper loader functions
Benchmark
- [ ] Inherits from `BenchmarkBase` (or an appropriate helper)
- [ ] `identifier` follows the naming convention: `AuthorYear-metric`
- [ ] `version` is set (start at 1)
- [ ] `ceiling_func` is implemented and returns a valid Score
- [ ] `bibtex` includes a complete citation
- [ ] `parent` is set correctly (V1, V4, IT, behavior, etc.)
Testing
- [ ] Unit tests verify benchmark loads
- [ ] Unit tests verify ceiling is valid
- [ ] Unit tests verify benchmark runs with at least one model
- [ ] All tests pass locally
Documentation
- [ ] Code is well-commented
- [ ] PR description is complete
- [ ] Expected scores documented for reference model
Getting Help
If you encounter issues during submission:
- GitHub Issues: Open an issue on the vision repository
- Slack: Join the Brain-Score Slack workspace and ask in the relevant channel
Summary
The benchmark submission process follows these key steps:
1. Package Data → Validate and format stimuli + assemblies
2. Create Benchmark → Implement benchmark class with ceiling
3. Test Locally → Write and run unit tests
4. Upload Data → Package and upload to S3
5. Submit PR → Fork, commit, and create pull request
6. Review Process → Address feedback and iterate
7. Merge & Deploy → Benchmark goes live on Brain-Score
Congratulations on contributing to Brain-Score! Your benchmark will help evaluate how well models align with biological intelligence.