{
"cells": [
{
"cell_type": "markdown",
"id": "50af6f85",
"metadata": {},
"source": [
"# Accessing Visual Behavior Neuropixels Data"
]
},
{
"cell_type": "markdown",
"id": "83baaafe",
"metadata": {},
"source": [
"## Tutorial overview\n",
"\n",
"This Jupyter notebook covers the various methods for accessing the Allen Institute Visual Behavior Neuropixels dataset. We will go over how to request data, where it's stored, and what the various files contain. If you're having trouble downloading the data, or you just want to know more about what's going on under the hood, this is a good place to start.\n",
"\n",
"This data release will not have a web interface for browsing through the released data, as with the [two-photon imaging Visual Coding dataset](http://observatory.brain-map.org/visualcoding). Instead, the data must be retrieved through the AllenSDK (Python 3.6+) or via requests sent to the **Amazon Web Services (AWS)** **Simple Storage Service (S3)** bucket (name: [visual-behavior-neuropixels-data](https://s3.console.aws.amazon.com/s3/buckets/visual-behavior-neuropixels-data)) for this project.\n",
"\n",
"Functions related to data analysis as well as descriptions of metadata table columns will be covered in other tutorials. For a full list of available tutorials for this project, see the [SDK documentation](https://allensdk.readthedocs.io/en/latest/visual_behavior_optical_physiology.html)."
]
},
{
"cell_type": "markdown",
"id": "53e75988",
"metadata": {},
"source": [
"## Options for data access\n",
"\n",
"The `VisualBehaviorNeuropixelsProjectCache` object in the AllenSDK is the easiest way to interact with the released data. This object abstracts away the details of on-disk file storage, and delivers the data to you as ready-to-analyze Python objects. The cache will automatically keep track of which files are stored locally, and will download additional files on an as-needed basis. Usually you won't need to worry about the organization of these files, but this tutorial will cover those details in case you want to analyze them without using the AllenSDK (e.g., in Matlab). This tutorial begins with an introduction to this approach.\n",
"\n",
"Another option is to directly download the data using an S3 URL. This should be used if the other options are broken or are not available to you. Instructions for this can be found at the end of this tutorial."
]
},
{
"cell_type": "markdown",
"id": "75b3ae20",
"metadata": {},
"source": [
"## Using the AllenSDK to retrieve data\n",
"\n",
"Most users will want to access data via the AllenSDK. This requires nothing more than a Python interpreter and some free disk space to store the data locally.\n",
"\n",
"How much data is there? If you want to download the complete dataset (153 NWB files plus 5 metadata csv files), you'll need 524 GB of space.\n",
"\n",
"Before downloading the data, you must decide on a cache directory where you would like downloaded data to be stored. This directory is where the `VisualBehaviorNeuropixelsProjectCache` object will look first when you request a metadata table or a data file.\n",
"\n",
"When you initialize a local cache for the first time, it will create the manifest file at the path that you specify. This file lives one directory up from the rest of the data, so make sure you put it somewhere that has enough space available.\n",
"\n",
"When you need to access the data in subsequent analysis sessions, you should point the `VisualBehaviorNeuropixelsProjectCache` object to an existing cache directory; otherwise, it will try to re-download the data in a new location.\n",
"\n",
"To get started with this approach, first take care of the necessary imports:\n",
"\n",
"We will first install allensdk into your environment by running the appropriate commands below. "
]
},
{
"cell_type": "markdown",
"id": "f13f8c95",
"metadata": {},
"source": [
"## Instal AllenSDK into your local environment"
]
},
{
"cell_type": "markdown",
"id": "394255fc",
"metadata": {},
"source": [
"You can install AllenSDK with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abd813fe",
"metadata": {},
"outputs": [],
"source": [
"pip install allensdk"
]
},
{
"cell_type": "markdown",
"id": "e2021a8e",
"metadata": {},
"source": [
"## Install AllenSDK into your notebook environment (good for Google Colab)"
]
},
{
"cell_type": "markdown",
"id": "9765633d",
"metadata": {},
"source": [
"You can install AllenSDK into your notebook environment by executing the cell below.\n",
"\n",
"If using Google Colab, click on the RESTART RUNTIME button that appears at the end of the output when this cell is complete,. Note that running this cell will produce a long list of outputs and some error messages. Clicking RESTART RUNTIME at the end will resolve these issues.\n",
"You can minimize the cell after you are done to hide the output."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18345f53",
"metadata": {},
"outputs": [],
"source": [
"!pip install --upgrade pip\n",
"!pip install allensdk"
]
},
{
"cell_type": "markdown",
"id": "f0592bb2",
"metadata": {},
"source": [
"## Import required packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9d2f1f6d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your allensdk version is: 2.13.5\n"
]
}
],
"source": [
"from pathlib import Path\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import allensdk\n",
"from allensdk.brain_observatory.behavior.behavior_project_cache import VisualBehaviorNeuropixelsProjectCache\n",
"\n",
"# Confirming your allensdk version\n",
"print(f\"Your allensdk version is: {allensdk.__version__}\")"
]
},
{
"cell_type": "markdown",
"id": "6e268380",
"metadata": {},
"source": [
"Next, we'll specify the directory where you'd like downloaded data to be stored (cache_dir). Remember to choose a location that has plenty of free space available."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ca3f1f01",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ecephys_sessions.csv: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 63.5k/63.5k [00:00<00:00, 146kMB/s]\n",
"behavior_sessions.csv: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 531k/531k [00:00<00:00, 959kMB/s]\n",
"units.csv: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 132M/132M [00:14<00:00, 8.87MMB/s]\n",
"probes.csv: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 127k/127k [00:00<00:00, 614kMB/s]\n",
"channels.csv: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27.9M/27.9M [00:09<00:00, 3.01MMB/s]\n"
]
}
],
"source": [
"# Update this to a valid directory in your filesystem\n",
"data_storage_directory = Path(\"/tmp/vbn_cache\")\n",
"\n",
"cache = VisualBehaviorNeuropixelsProjectCache.from_s3_cache(cache_dir=data_storage_directory)"
]
},
{
"cell_type": "markdown",
"id": "4fed1ef9",
"metadata": {},
"source": [
"Instantiating the cache will have it to download 5 project metadata files:\n",
"\n",
"1. `ecephys_sessions.csv` (64 kB)\n",
"2. `behavior_sessions.csv` (531 kB)\n",
"3. `units.csv` (130 MB)\n",
"4. `probes.csv` (127 kB)\n",
"5. `channels.csv` (28 MB)\n",
"\n",
"Each one contains a table of information related to its file name. If you're using the AllenSDK, you won't have to worry about how these files are formatted. Instead, you'll load the relevant data using specific accessor method: `get_ecephys_session_table()`, `get_behavior_session_table()`, `get_probe_table()`, `get_unit_table()` and `get_channel_table()`. These functions return a pandas DataFrame containing a row for each item and a column for each metric.\n",
"\n",
"If you are analyzing data without using the AllenSDK, you can load the data using your CSV file reader of choice. However, please be aware the columns in the original file do not necessarily match what's returned by the AllenSDK, which may combine information from multiple files to produce the final DataFrame."
]
},
{
"cell_type": "markdown",
"id": "7f96ddc9",
"metadata": {},
"source": [
"### Managing versions of the dataset\n",
"\n",
"Over time, updates may be made to the released dataset. These updates will result in new versions of the dataset being available in the S3 bucket. The versions of the dataset are managed through distinct data manifests stored on S3.\n",
"\n",
"**Note:** Some of the cells below may seem a little pointless at first, since there is only one version of the data release as of this writing (June 7, 2022). We are leaving them here for reference so that future users know how to navigate different versions of the data release as they are issued."
]
},
{
"cell_type": "markdown",
"id": "b5b340e3",
"metadata": {},
"source": [
"#### Discovering manifests\n",
"\n",
"To see all of the manifest files available for this dataset online, run"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "41cd25c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['visual-behavior-neuropixels_project_manifest_v0.1.0.json',\n",
" 'visual-behavior-neuropixels_project_manifest_v0.2.0.json',\n",
" 'visual-behavior-neuropixels_project_manifest_v0.3.0.json']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache.list_manifest_file_names()"
]
},
{
"cell_type": "markdown",
"id": "37824cce",
"metadata": {},
"source": [
"To see the most up-to-date available manifest, run"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4538c01a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'visual-behavior-neuropixels_project_manifest_v0.3.0.json'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache.latest_manifest_file()"
]
},
{
"cell_type": "markdown",
"id": "c6803645",
"metadata": {},
"source": [
"To see the name of the most up-to-date manifest that you have already downloaded to your system run (note: this just means that the manifest file has been downloaded; it does not necessarily mean that any data has been downloaded)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0a846392",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'visual-behavior-neuropixels_project_manifest_v0.3.0.json'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache.latest_downloaded_manifest_file()"
]
},
{
"cell_type": "markdown",
"id": "7a2a8b2e",
"metadata": {},
"source": [
"You can list all of the manifest files currently downloaded to your system with"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0a6053d8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['visual-behavior-neuropixels_project_manifest_v0.3.0.json']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache.list_all_downloaded_manifests()"
]
},
{
"cell_type": "markdown",
"id": "08ea42d5",
"metadata": {},
"source": [
"#### Loading manifests/dataset versions\n",
"\n",
"The `VisualBehaviorNeuropixelsProjectCache` determines which version of the dataset to use by loading one of these manifests. By default, the `VisualBehaviorNeuropixelsProjectCache` loads either\n",
"\n",
"- the most up-to-date available data manifest, if you are instaniating it on an empty `cache_dir`\n",
"\n",
"- the data manifest you were last using, if you are instantiating it on a pre-existing `cache_dir` (in this case, the `VisualBehaviorNeuropixelsProjectCache` will emit a warning if a more up-to-data data manifest exists online letting you know that you can, if you choose, move to the more up-to-date data manifest)\n",
"\n",
"To see the manifest that you currently have loaded, run"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "88d47216",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'visual-behavior-neuropixels_project_manifest_v0.3.0.json'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache.current_manifest()"
]
},
{
"cell_type": "markdown",
"id": "5cc21ace",
"metadata": {},
"source": [
"To load a particular data manifest by hand, run (note: because we are intentionally loading an out-of-date manifest, this will emit a warning alerting us to the existence of the most up-to-date manifest)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "560f4751",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/adam.amster/AllenSDK/allensdk/api/cloud_cache/cloud_cache.py:466: OutdatedManifestWarning: \n",
"\n",
"The manifest file you are loading is not the most up to date manifest file available for this dataset. The most up to data manifest file available for this dataset is \n",
"\n",
"visual-behavior-neuropixels_project_manifest_v0.3.0.json\n",
"\n",
"To see the differences between these manifests,run\n",
"\n",
"VisualBehaviorNeuropixelsProjectCache.compare_manifests('visual-behavior-neuropixels_project_manifest_v0.2.0.json', 'visual-behavior-neuropixels_project_manifest_v0.3.0.json')\n",
"\n",
"To see all of the manifest files currently downloaded onto your local system, run\n",
"\n",
"self.list_all_downloaded_manifests()\n",
"\n",
"If you just want to load the latest manifest, run\n",
"\n",
"self.load_latest_manifest()\n",
"\n",
"\n",
" warnings.warn(msg, OutdatedManifestWarning)\n"
]
}
],
"source": [
"cache.load_manifest('visual-behavior-neuropixels_project_manifest_v0.2.0.json')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6271893c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'visual-behavior-neuropixels_project_manifest_v0.2.0.json'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cache.current_manifest()"
]
},
{
"cell_type": "markdown",
"id": "81a1578f",
"metadata": {},
"source": [
"As the earlier warning informed us, we can see the difference between an two versions of the dataset by running"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "66cb1f37",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Changes going from\n",
"visual-behavior-neuropixels_project_manifest_v0.1.0.json\n",
"to\n",
"visual-behavior-neuropixels_project_manifest_v0.2.0.json\n",
"\n",
"project_metadata/units.csv changed\n",
"\n"
]
}
],
"source": [
"# This cell will not be useful until an updated version of the data release is issued\n",
"\n",
"msg = cache.compare_manifests('visual-behavior-neuropixels_project_manifest_v0.1.0.json',\n",
" 'visual-behavior-neuropixels_project_manifest_v0.2.0.json')\n",
"print(msg)"
]
},
{
"cell_type": "markdown",
"id": "3a3f4c95",
"metadata": {},
"source": [
"The `VisualBehaviorNeuropixelsProjectCache` is smart enough to know that, if a file has not changed between version `A` and version `B` of the dataset, and you have already downloaded the file while version `A` of the manifest was loaded, when you move to version `B`, it does not need to download the data again. It will simply construct a symlink where version `B` of the data should exist on your system, pointing to version `A` of the file."
]
},
{
"cell_type": "markdown",
"id": "755f8802",
"metadata": {},
"source": [
"### Using the AllenSDK to access Visual Behavior Neuropixels metadata"
]
},
{
"cell_type": "markdown",
"id": "8076f8d9",
"metadata": {},
"source": [
"#### Ecephys sessions table\n",
"\n",
"Let's look at the contents of `ecephys_sessions.csv`:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "663a4a46",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of ecephys sessions: 103\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
behavior_session_id
\n",
"
date_of_acquisition
\n",
"
equipment_name
\n",
"
session_type
\n",
"
mouse_id
\n",
"
genotype
\n",
"
sex
\n",
"
project_code
\n",
"
age_in_days
\n",
"
unit_count
\n",
"
...
\n",
"
channel_count
\n",
"
structure_acronyms
\n",
"
image_set
\n",
"
prior_exposures_to_image_set
\n",
"
session_number
\n",
"
experience_level
\n",
"
prior_exposures_to_omissions
\n",
"
file_id
\n",
"
abnormal_histology
\n",
"
abnormal_activity
\n",
"
\n",
"
\n",
"
ecephys_session_id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1052342277
\n",
"
1052374521
\n",
"
2020-09-23 15:34:18.179
\n",
"
NP.1
\n",
"
EPHYS_1_images_G_3uL_reward
\n",
"
530862
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
M
\n",
"
NeuropixelVisualBehavior
\n",
"
148
\n",
"
1696.0
\n",
"
...
\n",
"
2304.0
\n",
"
['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg...
\n",
"
G
\n",
"
32.0
\n",
"
1
\n",
"
Familiar
\n",
"
0.0
\n",
"
0
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
1051155866
\n",
"
1052162536
\n",
"
2020-09-17 15:05:39.665
\n",
"
NP.1
\n",
"
EPHYS_1_images_H_3uL_reward
\n",
"
524760
\n",
"
wt/wt
\n",
"
F
\n",
"
NeuropixelVisualBehavior
\n",
"
180
\n",
"
1922.0
\n",
"
...
\n",
"
2304.0
\n",
"
['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg...
\n",
"
H
\n",
"
0.0
\n",
"
2
\n",
"
Novel
\n",
"
1.0
\n",
"
1
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
1052533639
\n",
"
1052572359
\n",
"
2020-09-24 15:12:13.229
\n",
"
NP.1
\n",
"
EPHYS_1_images_H_3uL_reward
\n",
"
530862
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
M
\n",
"
NeuropixelVisualBehavior
\n",
"
149
\n",
"
1677.0
\n",
"
...
\n",
"
2304.0
\n",
"
['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg...
\n",
"
H
\n",
"
0.0
\n",
"
2
\n",
"
Novel
\n",
"
1.0
\n",
"
4
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
1053925378
\n",
"
1053960984
\n",
"
2020-10-01 16:07:18.990
\n",
"
NP.0
\n",
"
EPHYS_1_images_H_3uL_reward
\n",
"
532246
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
M
\n",
"
NeuropixelVisualBehavior
\n",
"
145
\n",
"
1823.0
\n",
"
...
\n",
"
2304.0
\n",
"
['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg...
\n",
"
H
\n",
"
0.0
\n",
"
2
\n",
"
Novel
\n",
"
1.0
\n",
"
5
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
1053941483
\n",
"
1053960987
\n",
"
2020-10-01 17:03:58.362
\n",
"
NP.1
\n",
"
EPHYS_1_images_H_3uL_reward
\n",
"
527749
\n",
"
Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
M
\n",
"
NeuropixelVisualBehavior
\n",
"
180
\n",
"
1543.0
\n",
"
...
\n",
"
2304.0
\n",
"
['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg...
\n",
"
H
\n",
"
0.0
\n",
"
2
\n",
"
Novel
\n",
"
1.0
\n",
"
6
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
" \n",
"
\n",
"
5 rows × 21 columns
\n",
"
"
],
"text/plain": [
" behavior_session_id date_of_acquisition \\\n",
"ecephys_session_id \n",
"1052342277 1052374521 2020-09-23 15:34:18.179 \n",
"1051155866 1052162536 2020-09-17 15:05:39.665 \n",
"1052533639 1052572359 2020-09-24 15:12:13.229 \n",
"1053925378 1053960984 2020-10-01 16:07:18.990 \n",
"1053941483 1053960987 2020-10-01 17:03:58.362 \n",
"\n",
" equipment_name session_type mouse_id \\\n",
"ecephys_session_id \n",
"1052342277 NP.1 EPHYS_1_images_G_3uL_reward 530862 \n",
"1051155866 NP.1 EPHYS_1_images_H_3uL_reward 524760 \n",
"1052533639 NP.1 EPHYS_1_images_H_3uL_reward 530862 \n",
"1053925378 NP.0 EPHYS_1_images_H_3uL_reward 532246 \n",
"1053941483 NP.1 EPHYS_1_images_H_3uL_reward 527749 \n",
"\n",
" genotype sex \\\n",
"ecephys_session_id \n",
"1052342277 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt M \n",
"1051155866 wt/wt F \n",
"1052533639 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt M \n",
"1053925378 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt M \n",
"1053941483 Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt M \n",
"\n",
" project_code age_in_days unit_count ... \\\n",
"ecephys_session_id ... \n",
"1052342277 NeuropixelVisualBehavior 148 1696.0 ... \n",
"1051155866 NeuropixelVisualBehavior 180 1922.0 ... \n",
"1052533639 NeuropixelVisualBehavior 149 1677.0 ... \n",
"1053925378 NeuropixelVisualBehavior 145 1823.0 ... \n",
"1053941483 NeuropixelVisualBehavior 180 1543.0 ... \n",
"\n",
" channel_count \\\n",
"ecephys_session_id \n",
"1052342277 2304.0 \n",
"1051155866 2304.0 \n",
"1052533639 2304.0 \n",
"1053925378 2304.0 \n",
"1053941483 2304.0 \n",
"\n",
" structure_acronyms \\\n",
"ecephys_session_id \n",
"1052342277 ['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg... \n",
"1051155866 ['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg... \n",
"1052533639 ['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg... \n",
"1053925378 ['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg... \n",
"1053941483 ['APN', 'CA1', 'CA3', 'DG-mo', 'DG-po', 'DG-sg... \n",
"\n",
" image_set prior_exposures_to_image_set session_number \\\n",
"ecephys_session_id \n",
"1052342277 G 32.0 1 \n",
"1051155866 H 0.0 2 \n",
"1052533639 H 0.0 2 \n",
"1053925378 H 0.0 2 \n",
"1053941483 H 0.0 2 \n",
"\n",
" experience_level prior_exposures_to_omissions file_id \\\n",
"ecephys_session_id \n",
"1052342277 Familiar 0.0 0 \n",
"1051155866 Novel 1.0 1 \n",
"1052533639 Novel 1.0 4 \n",
"1053925378 Novel 1.0 5 \n",
"1053941483 Novel 1.0 6 \n",
"\n",
" abnormal_histology abnormal_activity \n",
"ecephys_session_id \n",
"1052342277 NaN NaN \n",
"1051155866 NaN NaN \n",
"1052533639 NaN NaN \n",
"1053925378 NaN NaN \n",
"1053941483 NaN NaN \n",
"\n",
"[5 rows x 21 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ecephys_sessions = cache.get_ecephys_session_table()\n",
"\n",
"print(f\"Total number of ecephys sessions: {len(ecephys_sessions)}\")\n",
"\n",
"ecephys_sessions.head()"
]
},
{
"cell_type": "markdown",
"id": "be96dfe8",
"metadata": {},
"source": [
"The `ecephys_session_table` DataFrame provides a high-level overview for ecephys sessions in the Visual Behavior Neurpoixels dataset. The index column (ecephys_session_id) is a unique ID, which serves as a key for access behavior data for each session. To get additional information about this data table (and other tables) please visit [this example notebook](https://allensdk.readthedocs.io/en/latest/_static/examples/nb/visual_behavior_neuropixels_quickstart.html).\n",
"\n",
"Sharp eyed readers may be wondering why the number of behavior session (103) in this table does not match up with the number of NWB files in the data release (153). Some of the session being released had obvious abnormalities in either electrophysiological activity or histology. By default `get_ecephys_session_table()` does not return the metadata for these abnormal sessions. To see all 153 sessions, run `get_ecephys_session_table(filter_abnormalities=False)`\n"
]
},
{
"cell_type": "markdown",
"id": "d2016732",
"metadata": {},
"source": [
"#### Behavior sessions table\n",
"\n",
"Let's look at the contents of `behavior_sessions.csv`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b82d40c1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of behavior sessions: 3424\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
equipment_name
\n",
"
genotype
\n",
"
mouse_id
\n",
"
sex
\n",
"
age_in_days
\n",
"
session_number
\n",
"
prior_exposures_to_session_type
\n",
"
prior_exposures_to_image_set
\n",
"
prior_exposures_to_omissions
\n",
"
ecephys_session_id
\n",
"
date_of_acquisition
\n",
"
session_type
\n",
"
image_set
\n",
"
\n",
"
\n",
"
behavior_session_id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1051333618
\n",
"
BEH.G-Box2
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
540536
\n",
"
M
\n",
"
85
\n",
"
1
\n",
"
0
\n",
"
NaN
\n",
"
0.0
\n",
"
NaN
\n",
"
2020-09-18 10:02:30.869000
\n",
"
TRAINING_0_gratings_autorewards_15min_0uL_reward
\n",
"
NaN
\n",
"
\n",
"
\n",
"
1052301754
\n",
"
BEH.G-Box2
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
540536
\n",
"
M
\n",
"
90
\n",
"
4
\n",
"
2
\n",
"
NaN
\n",
"
0.0
\n",
"
NaN
\n",
"
2020-09-23 09:43:25.595000
\n",
"
TRAINING_1_gratings_10uL_reward
\n",
"
NaN
\n",
"
\n",
"
\n",
"
1052374521
\n",
"
NP.1
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
530862
\n",
"
M
\n",
"
148
\n",
"
44
\n",
"
0
\n",
"
32.0
\n",
"
0.0
\n",
"
1.052342e+09
\n",
"
2020-09-23 15:34:18.179000
\n",
"
EPHYS_1_images_G_3uL_reward
\n",
"
G
\n",
"
\n",
"
\n",
"
1051860415
\n",
"
BEH.G-Box4
\n",
"
wt/wt
\n",
"
533539
\n",
"
F
\n",
"
127
\n",
"
9
\n",
"
0
\n",
"
3.0
\n",
"
0.0
\n",
"
NaN
\n",
"
2020-09-21 09:57:23.650000
\n",
"
TRAINING_4_images_G_training_7uL_reward
\n",
"
G
\n",
"
\n",
"
\n",
"
1052132182
\n",
"
BEH.F-Box5
\n",
"
Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt
\n",
"
536480
\n",
"
M
\n",
"
112
\n",
"
8
\n",
"
1
\n",
"
1.0
\n",
"
0.0
\n",
"
NaN
\n",
"
2020-09-22 12:04:46.304000
\n",
"
TRAINING_3_images_G_10uL_reward
\n",
"
G
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" equipment_name \\\n",
"behavior_session_id \n",
"1051333618 BEH.G-Box2 \n",
"1052301754 BEH.G-Box2 \n",
"1052374521 NP.1 \n",
"1051860415 BEH.G-Box4 \n",
"1052132182 BEH.F-Box5 \n",
"\n",
" genotype mouse_id \\\n",
"behavior_session_id \n",
"1051333618 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 540536 \n",
"1052301754 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 540536 \n",
"1052374521 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 530862 \n",
"1051860415 wt/wt 533539 \n",
"1052132182 Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 536480 \n",
"\n",
" sex age_in_days session_number \\\n",
"behavior_session_id \n",
"1051333618 M 85 1 \n",
"1052301754 M 90 4 \n",
"1052374521 M 148 44 \n",
"1051860415 F 127 9 \n",
"1052132182 M 112 8 \n",
"\n",
" prior_exposures_to_session_type \\\n",
"behavior_session_id \n",
"1051333618 0 \n",
"1052301754 2 \n",
"1052374521 0 \n",
"1051860415 0 \n",
"1052132182 1 \n",
"\n",
" prior_exposures_to_image_set \\\n",
"behavior_session_id \n",
"1051333618 NaN \n",
"1052301754 NaN \n",
"1052374521 32.0 \n",
"1051860415 3.0 \n",
"1052132182 1.0 \n",
"\n",
" prior_exposures_to_omissions ecephys_session_id \\\n",
"behavior_session_id \n",
"1051333618 0.0 NaN \n",
"1052301754 0.0 NaN \n",
"1052374521 0.0 1.052342e+09 \n",
"1051860415 0.0 NaN \n",
"1052132182 0.0 NaN \n",
"\n",
" date_of_acquisition \\\n",
"behavior_session_id \n",
"1051333618 2020-09-18 10:02:30.869000 \n",
"1052301754 2020-09-23 09:43:25.595000 \n",
"1052374521 2020-09-23 15:34:18.179000 \n",
"1051860415 2020-09-21 09:57:23.650000 \n",
"1052132182 2020-09-22 12:04:46.304000 \n",
"\n",
" session_type \\\n",
"behavior_session_id \n",
"1051333618 TRAINING_0_gratings_autorewards_15min_0uL_reward \n",
"1052301754 TRAINING_1_gratings_10uL_reward \n",
"1052374521 EPHYS_1_images_G_3uL_reward \n",
"1051860415 TRAINING_4_images_G_training_7uL_reward \n",
"1052132182 TRAINING_3_images_G_10uL_reward \n",
"\n",
" image_set \n",
"behavior_session_id \n",
"1051333618 NaN \n",
"1052301754 NaN \n",
"1052374521 G \n",
"1051860415 G \n",
"1052132182 G "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"behavior_sessions = cache.get_behavior_session_table()\n",
"\n",
"print(f\"Total number of behavior sessions: {len(behavior_sessions)}\")\n",
"\n",
"behavior_sessions.head()"
]
},
{
"cell_type": "markdown",
"id": "405b5888",
"metadata": {},
"source": [
"This file contains metadata summarizing every behavior session experienced by the mice in this data release. By filtering on the `mouse_id` column, it can be used to reconstruct the history of any given mouse as it passed through our experimental apparatus."
]
},
{
"cell_type": "markdown",
"id": "acb4f7a9",
"metadata": {},
"source": [
"#### Probes table\n",
"\n",
"Let's look at the contents of `probes.csv`:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c8ccfb5a",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of probes: 905\n"
]
},
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" ecephys_channel_id ecephys_probe_id ecephys_session_id \\\n",
"unit_id \n",
"1157005856 1157001834 1046469925 1046166369 \n",
"1157005853 1157001834 1046469925 1046166369 \n",
"1157005720 1157001786 1046469925 1046166369 \n",
"1157006074 1157001929 1046469925 1046166369 \n",
"1157006072 1157001929 1046469925 1046166369 \n",
"\n",
" amplitude_cutoff anterior_posterior_ccf_coordinate \\\n",
"unit_id \n",
"1157005856 0.500000 8453.0 \n",
"1157005853 0.323927 8453.0 \n",
"1157005720 0.044133 8575.0 \n",
"1157006074 0.000583 8212.0 \n",
"1157006072 0.500000 8212.0 \n",
"\n",
" dorsal_ventral_ccf_coordinate left_right_ccf_coordinate \\\n",
"unit_id \n",
"1157005856 3353.0 6719.0 \n",
"1157005853 3353.0 6719.0 \n",
"1157005720 3842.0 6590.0 \n",
"1157006074 2477.0 6992.0 \n",
"1157006072 2477.0 6992.0 \n",
"\n",
" cumulative_drift d_prime structure_acronym ... valid_data \\\n",
"unit_id ... \n",
"1157005856 140.32 6.088133 MB ... True \n",
"1157005853 239.76 4.635583 MB ... True \n",
"1157005720 263.32 5.691955 MRN ... True \n",
"1157006074 154.64 6.049284 NOT ... True \n",
"1157006072 242.58 4.745499 NOT ... True \n",
"\n",
" amplitude waveform_duration waveform_halfwidth PT_ratio \\\n",
"unit_id \n",
"1157005856 286.132665 0.151089 0.096147 0.310791 \n",
"1157005853 181.418835 0.357119 0.192295 0.531490 \n",
"1157005720 180.866205 0.521943 0.178559 0.612217 \n",
"1157006074 574.984215 0.343384 0.192295 0.470194 \n",
"1157006072 315.794115 0.329648 0.164824 0.488276 \n",
"\n",
" recovery_slope repolarization_slope spread velocity_above \\\n",
"unit_id \n",
"1157005856 -0.227726 0.961313 20.0 -0.457845 \n",
"1157005853 -0.150522 0.732741 30.0 2.060302 \n",
"1157005720 -0.024239 0.539687 80.0 0.000000 \n",
"1157006074 -0.356670 2.258649 40.0 1.373534 \n",
"1157006072 -0.210010 1.320270 70.0 0.412060 \n",
"\n",
" velocity_below \n",
"unit_id \n",
"1157005856 NaN \n",
"1157005853 -2.060302 \n",
"1157005720 0.863364 \n",
"1157006074 0.000000 \n",
"1157006072 0.343384 \n",
"\n",
"[5 rows x 34 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"units = cache.get_unit_table()\n",
"\n",
"print(f\"Total number of units: {len(units)}\")\n",
"\n",
"units.head()"
]
},
{
"cell_type": "markdown",
"id": "deafdedb",
"metadata": {},
"source": [
"This table provides metadata on the units identified in this data release. Quoting the [Visual Coding Neuropixels documentation](https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html)\n",
"\n",
"\n",
">Throughout the SDK, we refer to neurons as “units,” because we cannot guarantee that all the spikes assigned to one unit actually originate from a single cell. Unlike in two-photon imaging, where you can visualize each neuron throughout the entire experiment, with electrophysiology we can only “see” a neuron when it fires a spike. If a neuron moves relative to the probe, or if it’s far away from the probe, some of its spikes may get mixed together with those from other neurons. Because of this inherent ambiguity, we provide a variety of quality metrics to allow you to find the right units for your analysis. Even highly contaminated units contain potentially valuable information about brain states, so we didn’t want to leave them out of the dataset. But certain types of analysis require more stringent quality thresholds, to ensure that all of the included units are well isolated from their neighbors.\n",
"\n",
"Units are identified by a unique `unit_id` and can be associated to channels, probes, and sessions via `ecephys_channel_id`, `ecephys_probe_id` and `ecephys_session_id`."
]
},
{
"cell_type": "markdown",
"id": "7c1fd9aa",
"metadata": {},
"source": [
"### Using the AllenSDK to access Visual Behavior Neuropixels data\n",
"\n",
"After looking through the metadata for the data release, let's say you want to access information about a specific ecephys session (ecephys_session_id=1052533639)\n",
"\n",
"The following command will download the NWB file associated with a specific `ecephys_session_id` (unless it has been previously downloaded into this cache) and load it into a python object for manipulation and inspection.\n",
"\n",
"**Note:** each NWB file associated with an ecephys esssion is approximately 2GB in size."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "316205f8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ecephys_session_1052533639.nwb: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2.31G/2.31G [05:34<00:00, 6.92MMB/s]\n",
"/opt/anaconda3/envs/allensdk/lib/python3.8/site-packages/hdmf/spec/namespace.py:532: UserWarning: Ignoring cached namespace 'hdmf-common' version 1.5.1 because version 1.5.0 is already loaded.\n",
" warn(\"Ignoring cached namespace '%s' version %s because version %s is already loaded.\"\n",
"/opt/anaconda3/envs/allensdk/lib/python3.8/site-packages/hdmf/spec/namespace.py:532: UserWarning: Ignoring cached namespace 'hdmf-experimental' version 0.2.0 because version 0.1.0 is already loaded.\n",
" warn(\"Ignoring cached namespace '%s' version %s because version %s is already loaded.\"\n"
]
}
],
"source": [
"ecephys_session = cache.get_ecephys_session(ecephys_session_id=1052533639)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "3b1e8c57",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['behavior_data_class', 'behavior_session_id', 'eye_tracking', 'eye_tracking_rig_geometry', 'get_channels', 'get_performance_metrics', 'get_reward_rate', 'get_rolling_performance_df', 'get_units', 'licks', 'mean_waveforms', 'metadata', 'optotagging_table', 'probes', 'raw_running_speed', 'rewards', 'running_speed', 'spike_amplitudes', 'spike_times', 'stimulus_presentations', 'stimulus_templates', 'stimulus_timestamps', 'task_parameters', 'trials']\n"
]
}
],
"source": [
"# List methods of the session that can be used to get data\n",
"print(ecephys_session.list_data_attributes_and_methods())"
]
},
{
"cell_type": "markdown",
"id": "2f600c35",
"metadata": {},
"source": [
"Let's try viewing one of the visual stimuli presented to the mouse during the behavior session we downloaded:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "04fa8cec",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
unwarped
\n",
"
warped
\n",
"
\n",
"
\n",
"
image_name
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
im104_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[136, 138, 140, 141, 141, 141, 140, 140, 140,...
\n",
"
\n",
"
\n",
"
im114_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[193, 190, 192, 194, 190, 182, 175, 173, 174,...
\n",
"
\n",
"
\n",
"
im083_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[6, 9, 2, 0, 0, 0, 7, 5, 0, 0, 0, 2, 7, 6, 2,...
\n",
"
\n",
"
\n",
"
im005_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[81, 82, 80, 76, 76, 80, 83, 82, 80, 78, 78, ...
\n",
"
\n",
"
\n",
"
im087_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[38, 39, 34, 28, 28, 35, 41, 39, 34, 31, 33, ...
\n",
"
\n",
"
\n",
"
im024_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[19, 21, 15, 8, 8, 17, 23, 22, 15, 11, 14, 19...
\n",
"
\n",
"
\n",
"
im111_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[53, 55, 50, 44, 45, 51, 56, 56, 52, 49, 50, ...
\n",
"
\n",
"
\n",
"
im034_r
\n",
"
[[nan, nan, nan, nan, nan, nan, nan, nan, nan,...
\n",
"
[[124, 126, 128, 128, 129, 129, 129, 129, 127,...
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" unwarped \\\n",
"image_name \n",
"im104_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im114_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im083_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im005_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im087_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im024_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im111_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"im034_r [[nan, nan, nan, nan, nan, nan, nan, nan, nan,... \n",
"\n",
" warped \n",
"image_name \n",
"im104_r [[136, 138, 140, 141, 141, 141, 140, 140, 140,... \n",
"im114_r [[193, 190, 192, 194, 190, 182, 175, 173, 174,... \n",
"im083_r [[6, 9, 2, 0, 0, 0, 7, 5, 0, 0, 0, 2, 7, 6, 2,... \n",
"im005_r [[81, 82, 80, 76, 76, 80, 83, 82, 80, 78, 78, ... \n",
"im087_r [[38, 39, 34, 28, 28, 35, 41, 39, 34, 31, 33, ... \n",
"im024_r [[19, 21, 15, 8, 8, 17, 23, 22, 15, 11, 14, 19... \n",
"im111_r [[53, 55, 50, 44, 45, 51, 56, 56, 52, 49, 50, ... \n",
"im034_r [[124, 126, 128, 128, 129, 129, 129, 129, 127,... "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Listing the different stimuli templates\n",
"ecephys_session.stimulus_templates"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "32dde8ff",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Visualizing a particular stimulus\n",
"plt.imshow(ecephys_session.stimulus_templates['warped']['im104_r'], cmap='gray')"
]
},
{
"cell_type": "markdown",
"id": "1a0831ef",
"metadata": {},
"source": [
"As you can see, the `ecephy_session` object has a lot of attributes and methods that can be used to access underlying data in the NWB file. Most of these will be touched on in other tutorials for [this data release](http://portal.brain-map.org/explore/circuits/visual-behavior-neuropixels).\n",
"\n",
"Now let's see how to get data for a particular ecephys session:"
]
},
{
"cell_type": "markdown",
"id": "4fcbd1bd",
"metadata": {},
"source": [
"#### Downloading the complete dataset with AllenSDK\n",
"\n",
"Analyzing one session or experiment at a time is nice, but in some cases you'll want to be able to perform an analysis across the whole dataset. To fill your cache with all available data, you can use a for loop like the one below.\n",
"\n",
"Before running this code, please make sure that you have enough space available in your cache directory. You'll need around 524 GB for to contain all of the NWB files."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb91c928",
"metadata": {},
"outputs": [],
"source": [
"# Remove rows from the behavior sessions table which don't correspond to a behavior session NWB file\n",
"filtered_ecephys_sessions = ecephys_sessions.dropna(subset=[\"file_id\"])\n",
"\n",
"for ecephys_session_id, _ in filtered_ecephys_sessions.iterrows():\n",
" cache.get_ecephys_session(ecephys_session_id=ecephys_session_id)\n"
]
},
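{
"cell_type": "markdown",
"id": "d15c5a9e",
"metadata": {},
"source": [
"Before launching a loop like the one above, it can be worth confirming that the drive holding your cache has room for the roughly 524 GB of NWB files. The following is a minimal sketch using only the Python standard library; the helper name `has_enough_space` and its default threshold are illustrative assumptions, not part of the AllenSDK.\n",
"\n",
"```python\n",
"import shutil\n",
"from pathlib import Path\n",
"\n",
"def has_enough_space(directory, required_gb=524.0):\n",
"    # Hypothetical helper: compare the free space on the drive holding\n",
"    # `directory` with the space the download is expected to need.\n",
"    free_bytes = shutil.disk_usage(Path(directory)).free\n",
"    return free_bytes >= required_gb * 1e9\n",
"\n",
"# Example: check the drive that holds the current working directory\n",
"print(has_enough_space('.'))\n",
"```\n",
"\n",
"Pass your own cache directory in place of `'.'`."
]
},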
{
"cell_type": "markdown",
"id": "11d58792",
"metadata": {},
"source": [
"## Direct download of data from S3\n",
"\n",
"If you do not wish to obtain data via the AllenSDK `VisualBehaviorNeuropixelsProjectCache` class, this section describes how to directly determine an S3 download link for your file or files of interest.\n",
"\n",
"The S3 bucket that stores all the data for this project's release is: \n",
"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com\n",
"\n",
"The structure of the S3 bucket looks like:\n",
"\n",
"```\n",
"visual-behavior-neuropixels-2022/\n",
"│\n",
"├── release_notes.txt\n",
"│\n",
"├── manifests/\n",
"│ ├── visual-behavior-neuropixels_project_manifest_v{a.b.c}.json\n",
"│ ├── visual-behavior-neuropixels_project_manifest_v{x.y.z}.json\n",
"│ ...\n",
"│\n",
"├── project_metadata/\n",
"│ ├── ecephys_sessions.csv\n",
"│ ├── behavior_sessions.csv\n",
"│ ├── probes.csv\n",
"│ ├── channels.csv\n",
"│ ├── units.csv\n",
"│\n",
"└── ecephys_sessions/\n",
" ├── ecephys_session_{abc}.nwb\n",
" ├── ecephys_session_{xyz}.nwb\n",
" ...\n",
"```\n",
"\n",
"So if for example, you wanted to download a specific `ecephys_session` you could first download the `ecephys_sessions.csv` with:\n",
"\n",
"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com/visual-behavior-neuropixels/project_metadata/ecephys_sessions.csv (try clicking me!)\n",
"\n",
"Then using the table, determine the `ecephys_session_id` you are interested in. Let's say we want `ecephys_session_id = 1043752325`, then the appropriate download link would be:\n",
"\n",
"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com/visual-behavior-neuropixels/ecephys_sessions/ecephys_session_1043752325.nwb\n",
"\n",
"Below are some simple sample functions that will help you efficiently determine download URL links:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "c12a4819",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com/visual-behavior-neuropixels/manifests/visual-behavior-neuropixels_project_manifest_v0.1.0.json\n"
]
}
],
"source": [
"from urllib.parse import urljoin\n",
"\n",
"def get_manifest_url(manifest_version: str) -> str:\n",
" hostname = \"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com\"\n",
" object_key = f\"visual-behavior-neuropixels/manifests/visual-behavior-neuropixels_project_manifest_v{manifest_version}.json\"\n",
" return urljoin(hostname, object_key)\n",
"\n",
"# Example:\n",
"print(get_manifest_url(\"0.1.0\"))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "685c9cff",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com/visual-behavior-neuropixels/project_metadata/behavior_sessions.csv\n"
]
}
],
"source": [
"def get_metadata_url(metadata_table_name: str) -> str:\n",
" hostname = \"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com\"\n",
" object_key = f\"visual-behavior-neuropixels/project_metadata/{metadata_table_name}.csv\"\n",
" return urljoin(hostname, object_key)\n",
"\n",
"# Example:\n",
"print(get_metadata_url(\"behavior_sessions\"))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "e1d07772",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com/visual-behavior-neuropixels/ecephys_sessions/ecephys_session_1052533639.nwb\n"
]
}
],
"source": [
"def get_behavior_session_url(ecephys_session_id: int) -> str:\n",
" hostname = \"https://visual-behavior-neuropixels-data.s3.us-west-2.amazonaws.com\"\n",
" object_key = f\"visual-behavior-neuropixels/ecephys_sessions/ecephys_session_{ecephys_session_id}.nwb\"\n",
" return urljoin(hostname, object_key)\n",
"\n",
"# Example:\n",
"print(get_behavior_session_url(1052533639))"
]
},
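{
"cell_type": "markdown",
"id": "b7e4f2a1",
"metadata": {},
"source": [
"Once you have a URL, any HTTP client can fetch the file. The following is a minimal sketch using only the Python standard library; the helper name `download_file` is an illustrative assumption and is not part of the AllenSDK.\n",
"\n",
"```python\n",
"import shutil\n",
"from pathlib import Path\n",
"from urllib.request import urlopen\n",
"\n",
"def download_file(url, destination):\n",
"    # Hypothetical helper: stream the response to disk in chunks so that\n",
"    # a multi-gigabyte NWB file is never held in memory all at once.\n",
"    destination = Path(destination)\n",
"    destination.parent.mkdir(parents=True, exist_ok=True)\n",
"    with urlopen(url) as response, destination.open('wb') as out_file:\n",
"        shutil.copyfileobj(response, out_file)\n",
"    return destination\n",
"```\n",
"\n",
"For example, `download_file(get_metadata_url('ecephys_sessions'), 'ecephys_sessions.csv')` would save the sessions table next to this notebook, using the `get_metadata_url` helper defined above."
]
},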
{
"cell_type": "markdown",
"id": "bb1d746d",
"metadata": {},
"source": [
"## Downloading previous versions of released data from S3\n",
"\n",
"AllenSDK makes uses of versioned manifest (JSON) files that live in the S3 bucket to keep track of EVERY version of a file for this data release. If a bug/error in the released data is discovered or new data is added to existing NWB files and the updated NWB file is uploaded in the future, a new manifest will be created pointing to the newest version of the file. The existing manifest will continue pointing at the original version allowing reproducibility of analysis results. You can think of each manifest as a snapshot of the state of the S3 bucket when the manifest was created.\n",
"\n",
"This section describes how to download specific versions of a file in the S3 bucket.\n",
"\n",
"### Listing and downloading a specific manifest version for the data release\n",
"\n",
"If you have an AWS account (even a free tier account works) you can log in and access the bucket directly:\n",
"\n",
"https://s3.console.aws.amazon.com/s3/buckets/visual-behavior-neuropixels-data?prefix=visual-behavior-neuropixels/manifests/\n",
"\n",
"If you don't have or don't want to use an AWS account you can click the following list to get an XML document:\n",
"\n",
"https://s3.console.aws.amazon.com/s3/buckets/visual-behavior-neuropixels-data?list-type=2&prefix=visual-behavior-neuropixels/manifests/\n",
"\n",
"Which will look like:\n",
"```\n",
"\n",
" sfd-cloudcache-test-bucket\n",
" visual-behavior-neuropixels/manifests/\n",
" 1\n",
" 1000\n",
" false\n",
" \n",
" \n",
" visual-behavior-neuropixels/manifests/visual-behavior-neuropixels-2022_project_manifest_v0.1.0.json\n",
" \n",
" 2021-03-22T14:36:31.000Z\n",
" \"8d10d6dd87234d4e0a1d400908c5013d\"\n",
" 1730897\n",
" STANDARD\n",
" \n",
"\n",
"```\n",
"The XML document is the result of a query which lists all manifests that currently exist for the data release (denoted with `` ``). To obtain a specific manifest of interest you just take the `Key` for the manifest you're interested in and append it to the name of the S3 bucket. For example:\n",
"\n",
"https://s3.console.aws.amazon.com/s3/buckets/visual-behavior-neuropixels-data?prefix=visual-behavior-neuropixels/manifests/visual-behavior-neuropixels_project_manifest_v0.1.0.json\n",
"\n",
"\n",
"### Using a versioned manifest to download a specific data version\n",
"\n",
"Once you've downloaded a manifest, you can use it to obtain download links for the specific version of data files that the manifest tracks. The example function below loads a downloaded manifest and generates download links for *all* the metadata and data files for the specified manifest:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a4dff7f9",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"from urllib.parse import urljoin\n",
"import json\n",
"\n",
"# The location will differ based on where you downloaded the manifest.json!\n",
"my_manifest_location = data_storage_directory / 'visual-behavior-neuropixels_project_manifest_v0.3.0.json'\n",
"\n",
"def generate_all_download_urls_from_manifest(manifest_path: Path) -> List[str]:\n",
" with manifest_path.open('r') as fp:\n",
" manifest = json.load(fp)\n",
" \n",
" download_links = []\n",
" \n",
" # Get download links for specific version of metadata files\n",
" for metadata_file_entry in manifest[\"metadata_files\"].values():\n",
" base_download_url = metadata_file_entry[\"url\"]\n",
" version_query = f\"?versionId={metadata_file_entry['version_id']}\"\n",
" full_download_url = urljoin(base_download_url, version_query)\n",
" download_links.append(full_download_url)\n",
"\n",
" # Get download links for specific version of data files\n",
" for data_file_entry in manifest[\"data_files\"].values():\n",
" base_download_url = data_file_entry[\"url\"]\n",
" version_query = f\"?versionId={data_file_entry['version_id']}\"\n",
" full_download_url = urljoin(base_download_url, version_query)\n",
" download_links.append(full_download_url) \n",
"\n",
" return download_links\n",
"\n",
"# Example:\n",
"print('\\n'.join(generate_all_download_urls_from_manifest(my_manifest_location)))"
]
}
],
"metadata": {
"interpreter": {
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}