Extracellular Electrophysiology Data Quick Start

A short introduction to the Visual Coding Neuropixels data and SDK. For more information, see the full reference notebook.

Contents

In [1]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from allensdk.brain_observatory.ecephys.ecephys_project_cache import EcephysProjectCache

The EcephysProjectCache is the main entry point to the Visual Coding Neuropixels dataset. It allows you to download data for individual recording sessions and view cross-session summary information.

In [2]:
# this path determines where downloaded data will be stored
manifest_path = os.path.join("example_ecephys_project_cache", "manifest.json")

cache = EcephysProjectCache.from_warehouse(manifest=manifest_path)

print(cache.get_all_session_types())
['brain_observatory_1.1', 'functional_connectivity']

This dataset contains sessions in which two sets of stimuli were presented. The "brain_observatory_1.1" sessions are (almost exactly) the same as Visual Coding 2P sessions.

In [3]:
sessions = cache.get_sessions()
brain_observatory_type_sessions = sessions[sessions["session_type"] == "brain_observatory_1.1"]
brain_observatory_type_sessions.tail()
Out[3]:
date_of_acquisition isi_experiment_id published_at specimen_id session_type age_in_days sex genotype unit_count channel_count probe_count structure_acronyms
id
773418906 2018-12-10T23:16:20Z 762398689 2019-10-03T00:00:00Z 757329624 brain_observatory_1.1 124.0 F Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 546 2232 6 [PPT, NOT, SUB, ProS, CA1, VISam, nan, APN, DG...
791319847 2019-01-08T21:55:01Z 779428997 2019-10-03T00:00:00Z 769360779 brain_observatory_1.1 116.0 M Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 555 2229 6 [APN, DG, CA1, VISam, nan, LP, TH, VISpm, POL,...
797828357 2019-01-08T21:26:13Z 787643635 2019-10-03T00:00:00Z 776061251 brain_observatory_1.1 107.0 M Pvalb-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 611 2232 6 [PPT, MB, APN, NOT, HPF, ProS, CA1, VISam, nan...
798911424 2018-12-21T08:02:57Z 785726931 2019-10-03T00:00:00Z 775876828 brain_observatory_1.1 110.0 F Vip-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt 825 2233 6 [APN, TH, Eth, LP, DG, HPF, CA3, CA1, VISrl, n...
799864342 2019-01-08T21:26:07Z 785352181 2019-10-03T00:00:00Z 772616823 brain_observatory_1.1 129.0 M wt/wt 604 2233 6 [APN, POL, LP, DG, CA1, VISrl, nan, LGd, CA3, ...

peristimulus time histograms

We are going to pick a session arbitrarily and download its spike data.

In [4]:
session_id = 791319847
session = cache.get_session_data(session_id)

print(f"stimuli presented in session {session_id}:\n{session.stimulus_names}\n")

unit_counts = session.units["manual_structure_acronym"].value_counts()
print(f"units per brain structure in session {session_id}:\n{unit_counts}")
/home/nile/miniconda3/envs/py37/lib/python3.6/site-packages/pandas/core/ops/__init__.py:1115: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  result = method(y)
stimuli presented in session 791319847:
['spontaneous', 'gabors', 'flashes', 'drifting_gratings', 'natural_movie_three', 'natural_movie_one', 'static_gratings', 'natural_scenes', 'drifting_gratings_contrast']

units per brain structure in session 791319847:
VISp     93
CA1      85
VISrl    58
VISl     56
VISam    49
VISal    43
SUB      41
CA3      33
DG       32
VISpm    17
LGv      16
LP        9
LGd       8
ZI        4
TH        4
POL       3
CA2       3
ProS      1
Name: manual_structure_acronym, dtype: int64

Now that we've gotten spike data, we can create peristimulus time histograms.

In [5]:
presentations = session.get_presentations_for_stimulus("flashes")
units = session.units[session.units["manual_structure_acronym"] == 'VISp']

time_step = 0.01
time_bins = np.arange(-0.1, 0.5 + time_step, time_step)

histograms = session.presentationwise_spike_counts(
    stimulus_presentation_ids=presentations.index.values,  
    bin_edges=time_bins,
    unit_ids=units.index.values
)

histograms.coords
Out[5]:
Coordinates:
  * stimulus_presentation_id         (stimulus_presentation_id) int64 3647 ... 3796
  * time_relative_to_stimulus_onset  (time_relative_to_stimulus_onset) float64 -0.095 ... 0.495
  * unit_id                          (unit_id) int64 951061537 ... 951062679
In [6]:
mean_histograms = histograms.mean(dim="stimulus_presentation_id")

fig, ax = plt.subplots(figsize=(8, 8))
ax.pcolormesh(
    mean_histograms["time_relative_to_stimulus_onset"], 
    np.arange(mean_histograms["unit_id"].size),
    mean_histograms.T, 
    vmin=0,
    vmax=1
)

ax.set_ylabel("unit", fontsize=24)
ax.set_xlabel("time relative to stimulus onset (s)", fontsize=24)
ax.set_title("peristimulus time histograms for VISp units on flash presentations", fontsize=24)

plt.show()

image classification

First, we need to extract spikes. We will do using EcephysSession.presentationwise_spike_times, which returns spikes annotated by the unit that emitted them and the stimulus presentation during which they were emitted.

In [7]:
scene_presentations = session.get_presentations_for_stimulus("natural_scenes")
visp_units = session.units[session.units["manual_structure_acronym"] == "VISp"]

spikes = session.presentationwise_spike_times(
    stimulus_presentation_ids=scene_presentations.index.values,
    unit_ids=visp_units.index.values[:]
)

spikes
Out[7]:
stimulus_presentation_id unit_id
spike_time
5914.084307 51355 951061801
5914.085207 51355 951062307
5914.087241 51355 951062610
5914.090107 51355 951061993
5914.090407 51355 951062605
... ... ...
8573.801313 68228 951061801
8573.801613 68228 951062175
8573.805379 68228 951062898
8573.805546 68228 951061537
8573.805846 68228 951062143

1224681 rows × 2 columns

Next, we will convert these into a num_presentations X num_units matrix, which will serve as our input data.

In [8]:
spikes["count"] = np.zeros(spikes.shape[0])
spikes = spikes.groupby(["stimulus_presentation_id", "unit_id"]).count()

design = pd.pivot_table(
    spikes, 
    values="count", 
    index="stimulus_presentation_id", 
    columns="unit_id", 
    fill_value=0.0,
    aggfunc=np.sum
)

design
Out[8]:
unit_id 951061537 951061549 951061556 951061568 951061574 951061607 951061637 951061643 951061649 951061655 ... 951062587 951062600 951062605 951062610 951062647 951062679 951062808 951062833 951062854 951062898
stimulus_presentation_id
51355 1 3 1 1 2 3 2 1 2 0 ... 15 0 15 5 11 5 0 0 2 1
51356 2 0 1 2 2 1 1 3 4 0 ... 10 0 10 3 16 3 1 0 2 0
51357 0 0 1 1 1 1 1 1 0 0 ... 7 0 13 0 14 6 12 3 2 5
51358 0 2 1 3 4 1 0 2 1 2 ... 6 0 11 7 8 3 0 0 0 3
51359 1 0 1 2 2 1 0 1 0 2 ... 7 0 15 4 10 5 11 2 0 7
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
68224 2 0 4 2 1 0 1 0 1 0 ... 2 0 11 16 3 0 4 0 0 1
68225 0 0 9 0 2 0 1 0 0 0 ... 5 0 13 5 5 0 3 0 0 5
68226 1 0 1 3 2 0 1 0 0 0 ... 15 0 12 2 10 0 4 0 1 5
68227 1 0 2 3 3 0 0 0 1 0 ... 9 1 4 2 5 0 1 0 1 7
68228 3 1 0 0 1 1 0 0 0 0 ... 9 0 18 6 4 4 0 0 0 8

5950 rows × 93 columns

... with targets being the numeric identifiers of the images presented.

In [9]:
targets = scene_presentations.loc[design.index.values, "frame"]
targets
Out[9]:
stimulus_presentation_id
51355     14
51356      0
51357      8
51358    106
51359      8
        ... 
68224     35
68225     35
68226     73
68227     95
68228     95
Name: frame, Length: 5950, dtype: object
In [10]:
from sklearn import svm
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix
In [11]:
design_arr = design.values.astype(float)
targets_arr = targets.values.astype(int)

labels = np.unique(targets_arr)
In [12]:
accuracies = []
confusions = []

for train_indices, test_indices in KFold(n_splits=5).split(design_arr):
    
    clf = svm.SVC(gamma="scale", kernel="rbf")
    clf.fit(design_arr[train_indices], targets_arr[train_indices])
    
    test_targets = targets_arr[test_indices]
    test_predictions = clf.predict(design_arr[test_indices])
    
    accuracy = 1 - (np.count_nonzero(test_predictions - test_targets) / test_predictions.size)
    print(accuracy)
    
    accuracies.append(accuracy)
    confusions.append(confusion_matrix(test_targets, test_predictions, labels))
0.41428571428571426
0.4966386554621849
0.49747899159663866
0.5067226890756302
0.473109243697479
In [13]:
print(f"mean accuracy: {np.mean(accuracy)}")
print(f"chance: {1/labels.size}")
mean accuracy: 0.473109243697479
chance: 0.008403361344537815
imagewise performance
In [14]:
mean_confusion = np.mean(confusions, axis=0)

fig, ax = plt.subplots(figsize=(8, 8))

img = ax.imshow(mean_confusion)
fig.colorbar(img)

ax.set_ylabel("actual")
ax.set_xlabel("predicted")

plt.show()
In [15]:
best = labels[np.argmax(np.diag(mean_confusion))]
worst = labels[np.argmin(np.diag(mean_confusion))]

fig, ax = plt.subplots(1, 2, figsize=(16, 8))

best_image = cache.get_natural_scene_template(best)
ax[0].imshow(best_image, cmap=plt.cm.gray)
ax[0].set_title("most decodable", fontsize=24)

worst_image = cache.get_natural_scene_template(worst)
ax[1].imshow(worst_image, cmap=plt.cm.gray)
ax[1].set_title("least decodable", fontsize=24)


plt.show()
In [ ]: