Extracellular Electrophysiology Data

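At the Allen Institute for Brain Science we carry out in vivo extracellular electrophysiology (ecephys) experiments in awake animals using high-density Neuropixels probes. The data from these experiments are organized into sessions, where each session is a distinct continuous recording period. During a session we collect data including spike times and mean waveforms of sorted units, local field potential, running speed, eye tracking, and a record of the visual stimuli presented to the subject.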

The AllenSDK contains code for accessing across-session (project-level) metadata as well as code for accessing detailed within-session data. The standard workflow is to use project-level tools, such as EcephysProjectCache, to identify and access sessions of interest, then delve into those sessions' data using EcephysSession.

Project-level

The EcephysProjectCache class in allensdk.brain_observatory.ecephys.ecephys_project_cache accesses and stores data pertaining to many sessions. You can use this class to run queries that span all collected sessions and to download data for individual sessions.

Session-level

The EcephysSession class in allensdk.brain_observatory.ecephys.ecephys_session provides an interface to all of the data for a single session, aligned to a common clock. This notebook will show you how to use the EcephysSession class to extract these data.

Obtaining an EcephysProjectCache

In order to create an EcephysProjectCache object, you need to specify two things:

  1. A remote source for the object to fetch data from. We will instantiate our cache using EcephysProjectCache.from_warehouse() to point the cache at the Allen Institute's public web API.
  2. A path to a manifest json, which designates filesystem locations for downloaded data. The cache will try to read data from these locations before going to download those data from its remote source, preventing repeated downloads.
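The manifest lets the cache act as a read-through cache: look on disk first, download only on a miss. The sketch below shows that pattern with the standard library alone. It is not AllenSDK's implementation. The helper read_through_cache and the fake_download function are hypothetical. With the real library you would instead write something like EcephysProjectCache.from_warehouse(manifest=manifest_path).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def read_through_cache(path: Path, download: Callable[[], dict]) -> dict:
    """Return JSON data stored at `path`, calling `download` only if the file is missing.

    Hypothetical helper illustrating the idea behind a manifest-backed cache.
    """
    if path.exists():
        # Cache hit: reuse the copy that was downloaded earlier.
        return json.loads(path.read_text())
    # Cache miss: fetch from the remote source, then store it for next time.
    data = download()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data


calls = []


def fake_download() -> dict:
    # Hypothetical stand-in for a request to a remote web API.
    calls.append(1)
    return {"example": "payload"}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sessions" / "session_table.json"
    first = read_through_cache(target, fake_download)   # downloads
    second = read_through_cache(target, fake_download)  # reads from disk
```

The second call never reaches fake_download, which is exactly the "preventing repeated downloads" behavior described above.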

Querying across sessions

Using your EcephysProjectCache, you can download a table listing metadata for all sessions.
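The session table is a pandas.DataFrame (in AllenSDK, returned by cache.get_session_table()), so ordinary pandas filtering applies. The sketch below uses a tiny hand-made table; its values are invented and the column names are assumptions modeled on the real table.

```python
import pandas as pd

# Hypothetical miniature session table; the real one comes from the cache.
sessions = pd.DataFrame(
    {
        "session_type": ["brain_observatory_1.1", "functional_connectivity", "brain_observatory_1.1"],
        "ecephys_structure_acronyms": [["VISp", "CA1"], ["VISl", "LGd"], ["VISp", "LGd"]],
    },
    index=pd.Index([101, 102, 103], name="id"),
)

# Each cell of ecephys_structure_acronyms holds a list, so test membership row by row.
has_visp = sessions["ecephys_structure_acronyms"].apply(lambda acronyms: "VISp" in acronyms)

# Combine boolean masks with &; each condition is wrapped in parentheses.
selected = sessions[(sessions["session_type"] == "brain_observatory_1.1") & has_visp]
```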

Querying across probes

... or for all probes

Querying across channels

... or across channels.

Querying across units

... as well as for sorted units.

Surveying metadata

You can answer questions like: "what mouse genotypes were used in this dataset?" using your EcephysProjectCache.
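A question like "which genotypes were used?" reduces to unique and value_counts on one column. The column name full_genotype and the example strings below are assumptions chosen for illustration, not data from the project.

```python
import pandas as pd

# Hypothetical session metadata with one genotype string per session.
sessions = pd.DataFrame(
    {
        "full_genotype": [
            "wt/wt",
            "Sst-IRES-Cre/wt;Ai32(RCL-ChR2(H134R)_EYFP)/wt",
            "wt/wt",
        ]
    },
    index=pd.Index([101, 102, 103], name="id"),
)

genotypes = sessions["full_genotype"].unique()       # distinct genotypes
sessions_per_genotype = sessions["full_genotype"].value_counts()  # how many sessions each
```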

In order to look up a brain structure acronym, you can use our online atlas viewer. The AllenSDK additionally supports programmatic access to structure annotations. For more information, see the reference space and mouse connectivity documentation.

Obtaining an EcephysSession

We package each session's data into a Neurodata Without Borders 2.0 (NWB) file. Calling get_session_data on your EcephysProjectCache will download such a file and return an EcephysSession object.

EcephysSession objects contain methods and properties that access the data within an ecephys NWB file and cache it in memory.

This session object has some important metadata, such as the date and time at which the recording session started:

We'll now jump into accessing our session's data. If you ever want a complete documented list of the attributes and methods defined on EcephysSession, you can run help(EcephysSession) (or, in a Jupyter notebook, EcephysSession?).

Sorted units

Units are putative neurons, clustered from raw voltage traces using Kilosort 2. Each unit is associated with a single peak channel on a single probe, though its spikes might be picked up with some attenuation on multiple nearby channels. Each unit is assigned a unique integer identifier ("unit_id") which can be used to look up its spike times and its mean waveform.

The units for a session are recorded in an attribute called, fittingly, units. This is a pandas.DataFrame whose index is the unit id and whose columns contain summary information about the unit, its peak channel, and its associated probe.

As a pandas.DataFrame the units table supports many straightforward filtering operations:
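A minimal sketch of such filtering, assuming a units table indexed by unit id with quality columns such as snr and isi_violations. The table contents and the thresholds are invented for illustration and are not recommended quality criteria.

```python
import pandas as pd

# Hypothetical units table indexed by unit id.
units = pd.DataFrame(
    {
        "ecephys_structure_acronym": ["VISp", "VISp", "CA1", "LGd", "VISp"],
        "snr": [3.1, 1.2, 4.0, 2.5, 5.2],
        "isi_violations": [0.1, 0.9, 0.2, 0.3, 0.05],
    },
    index=pd.Index([9501, 9502, 9503, 9504, 9505], name="unit_id"),
)

# Boolean masks combine with & (and) or | (or); parenthesize each condition.
good_visp = units[
    (units["ecephys_structure_acronym"] == "VISp")
    & (units["snr"] > 2)
    & (units["isi_violations"] < 0.5)
]

# The same filter expressed as a query string.
good_visp_query = units.query(
    "ecephys_structure_acronym == 'VISp' and snr > 2 and isi_violations < 0.5"
)
```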

... as well as some more advanced (and very useful!) operations. For more information, please see the pandas documentation. The following topics might be particularly handy:
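Grouping is one of the more advanced operations that tends to come up often. The sketch below summarizes firing rate per brain structure on a made-up table; the column names are assumptions.

```python
import pandas as pd

# Hypothetical units table with one firing rate (Hz) per unit.
units = pd.DataFrame(
    {
        "ecephys_structure_acronym": ["VISp", "VISp", "CA1", "LGd", "VISp"],
        "firing_rate": [8.0, 0.5, 12.5, 20.0, 3.5],
    },
    index=pd.Index([9501, 9502, 9503, 9504, 9505], name="unit_id"),
)

# Number of units and mean firing rate for each structure.
summary = units.groupby("ecephys_structure_acronym")["firing_rate"].agg(["count", "mean"])

# Structures ordered from highest to lowest mean firing rate.
ranked = summary.sort_values("mean", ascending=False)
```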

Stimulus presentations

During the course of a session, visual stimuli are presented on a monitor to the subject. We call intervals of time where a specific stimulus is presented (and its parameters held constant!) a stimulus presentation.

You can find information about the stimulus presentations that were displayed during a session by accessing the stimulus_presentations attribute on your EcephysSession object.

Like the units table, this is a pandas.DataFrame. Each row corresponds to a stimulus presentation and lists the time (on the session's master clock, in seconds) when that presentation began and ended as well as the kind of stimulus that was presented (the "stimulus_name" column) and the parameter values that were used for that presentation. Many of these parameter values don't overlap between stimulus classes, so the stimulus_presentations table uses the string "null" to indicate an inapplicable parameter. The index is named "stimulus_presentation_id" and many methods on EcephysSession use these ids.
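Because inapplicable parameters are stored as the string "null", numeric operations on parameter columns need that placeholder converted first. A minimal sketch, assuming start_time and stop_time columns and invented parameter values:

```python
import numpy as np
import pandas as pd

# Hypothetical stimulus presentations mixing two stimulus classes.
presentations = pd.DataFrame(
    {
        "start_time": [10.0, 10.25, 10.5, 40.0],
        "stop_time": [10.25, 10.5, 10.75, 42.0],
        "stimulus_name": ["gabors", "gabors", "gabors", "natural_scenes"],
        "orientation": [0.0, 45.0, 90.0, "null"],
        "frame": ["null", "null", "null", 17.0],
    },
    index=pd.Index([0, 1, 2, 3], name="stimulus_presentation_id"),
)

# Replace the "null" placeholder with NaN, then make the column numeric.
cleaned = presentations.replace("null", np.nan)
cleaned["orientation"] = pd.to_numeric(cleaned["orientation"])

# Presentation durations in seconds on the session clock.
cleaned["duration"] = cleaned["stop_time"] - cleaned["start_time"]
```

After the conversion, pandas skips NaN by default, so cleaned["orientation"].mean() averages only the presentations where orientation applies.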

Some of the columns bear a bit of explanation:

What kinds of stimuli were presented during this session? Pandas makes it easy to find out:
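For example, on a made-up stimulus_name column:

```python
import pandas as pd

# Hypothetical stimulus names, one per presentation.
presentations = pd.DataFrame(
    {"stimulus_name": ["spontaneous", "gabors", "gabors", "flashes", "gabors"]}
)

stimulus_names = presentations["stimulus_name"].unique()  # in order of first appearance
presentations_per_stimulus = presentations["stimulus_name"].value_counts()
```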

We can also obtain the stimulus epochs - blocks of time for which a particular kind of stimulus was presented - for this session.
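AllenSDK provides epochs directly (EcephysSession.get_stimulus_epochs), and its rules for splitting epochs may differ from this. As a sketch of the idea only, an epoch can be treated as a run of consecutive presentations sharing a stimulus_name. The data are invented.

```python
import pandas as pd

# Hypothetical presentations sorted by start time.
presentations = pd.DataFrame(
    {
        "start_time": [0.0, 0.25, 0.5, 10.0, 10.25, 20.0],
        "stop_time": [0.25, 0.5, 0.75, 10.25, 10.5, 20.25],
        "stimulus_name": ["gabors", "gabors", "gabors", "flashes", "flashes", "gabors"],
    }
)

# A new epoch starts whenever the stimulus name differs from the previous row.
is_new_epoch = presentations["stimulus_name"] != presentations["stimulus_name"].shift()
epoch_number = is_new_epoch.cumsum()

# Collapse each run of presentations into one epoch row.
epochs = presentations.groupby(epoch_number).agg(
    stimulus_name=("stimulus_name", "first"),
    start_time=("start_time", "min"),
    stop_time=("stop_time", "max"),
    n_presentations=("stimulus_name", "count"),
)
epochs["duration"] = epochs["stop_time"] - epochs["start_time"]
```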

If you are only interested in a subset of stimuli, you can filter either with pandas or with the get_stimulus_table convenience method:
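The plain pandas route is sketched below. It selects rows by stimulus name, then drops parameter columns that are "null" for every selected row so that only relevant parameters remain. The data are invented, and get_stimulus_table's exact behavior may differ from this.

```python
import numpy as np
import pandas as pd

# Hypothetical presentations table with "null" for inapplicable parameters.
presentations = pd.DataFrame(
    {
        "stimulus_name": ["drifting_gratings", "natural_scenes", "drifting_gratings", "flashes"],
        "temporal_frequency": [2.0, "null", 8.0, "null"],
        "frame": ["null", 17.0, "null", "null"],
    },
    index=pd.Index([0, 1, 2, 3], name="stimulus_presentation_id"),
)

wanted = ["drifting_gratings"]
subset = presentations[presentations["stimulus_name"].isin(wanted)]

# Drop parameter columns that are "null" for every selected presentation.
subset = subset.replace("null", np.nan).dropna(axis="columns", how="all")
```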

We might also want to know what the total set of available parameters is. The get_stimulus_parameter_values method provides a dictionary mapping stimulus parameters to the set of values that were applied to those parameters:
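A rough pandas equivalent of such a mapping, built from an invented table whose parameter columns use the "null" placeholder. This only sketches the idea; it is not how get_stimulus_parameter_values is implemented.

```python
import numpy as np
import pandas as pd

# Hypothetical presentations with two parameter columns.
presentations = pd.DataFrame(
    {
        "stimulus_name": ["static_gratings"] * 4 + ["natural_scenes"] * 2,
        "orientation": [0.0, 30.0, 0.0, 60.0, "null", "null"],
        "frame": ["null"] * 4 + [5.0, 17.0],
    }
)

parameter_columns = ["orientation", "frame"]

# Map each parameter to the sorted set of values it actually took.
parameter_values = {
    column: np.sort(
        presentations.loc[presentations[column] != "null", column].astype(float).unique()
    )
    for column in parameter_columns
}
```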

Each distinct state of the monitor is called a "stimulus condition". Each presentation in the stimulus presentations table exemplifies such a condition. This is encoded in its stimulus_condition_id field.

To get the full list of conditions presented in a session, use the stimulus_conditions attribute:
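To make the notion of a condition concrete, the sketch below derives conditions as the distinct combinations of parameter values and tags each presentation with one. The ids it assigns are arbitrary and will not match AllenSDK's stimulus_condition_id values. The data are invented.

```python
import pandas as pd

# Hypothetical presentations of one stimulus with two varying parameters.
presentations = pd.DataFrame(
    {
        "stimulus_name": ["static_gratings"] * 5,
        "orientation": [0.0, 30.0, 0.0, 30.0, 0.0],
        "spatial_frequency": [0.04, 0.04, 0.08, 0.04, 0.04],
    },
    index=pd.Index(range(5), name="stimulus_presentation_id"),
)

parameter_columns = ["stimulus_name", "orientation", "spatial_frequency"]

# One row per distinct combination of parameter values, i.e. one per condition.
conditions = presentations[parameter_columns].drop_duplicates().reset_index(drop=True)
conditions.index.name = "stimulus_condition_id"

# A left merge preserves presentation order, so the ids line up row by row.
presentations["stimulus_condition_id"] = presentations.merge(
    conditions.reset_index(), on=parameter_columns, how="left"
)["stimulus_condition_id"].to_numpy()
```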

Spike data

The EcephysSession object holds spike times (in seconds on the session master clock) for each unit. These are stored in a dictionary, which maps unit ids (the index values of the units table) to arrays of spike times.
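Because each unit's spike times are a sorted array, counting spikes in a time window is two binary searches. The dictionary below imitates the structure described above (in AllenSDK, session.spike_times) with invented values; count_spikes is a hypothetical helper.

```python
import numpy as np

# Hypothetical mapping from unit id to sorted spike times (seconds).
spike_times = {
    9501: np.array([0.1, 0.5, 1.2, 1.3, 2.8]),
    9502: np.array([0.05, 2.0, 2.1]),
}


def count_spikes(times: np.ndarray, start: float, stop: float) -> int:
    """Count spikes in the half-open window [start, stop); `times` must be sorted."""
    lo = np.searchsorted(times, start, side="left")
    hi = np.searchsorted(times, stop, side="left")
    return int(hi - lo)


window_start, window_stop = 1.0, 2.05
counts = {unit_id: count_spikes(times, window_start, window_stop) for unit_id, times in spike_times.items()}
rates_hz = {unit_id: n / (window_stop - window_start) for unit_id, n in counts.items()}
```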

You can also obtain spikes tagged with the stimulus presentation during which they occurred:
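AllenSDK provides this as a method (presentationwise_spike_times). The sketch below shows one way such tagging can work, assuming presentations are sorted by start time and do not overlap. Spikes falling between presentations are dropped. The data are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical, non-overlapping presentations sorted by start time.
presentations = pd.DataFrame(
    {"start_time": [0.0, 1.0, 2.0], "stop_time": [0.5, 1.5, 2.5]},
    index=pd.Index([10, 11, 12], name="stimulus_presentation_id"),
)
spikes = np.array([0.2, 0.7, 1.1, 1.4, 2.6])

starts = presentations["start_time"].to_numpy()
stops = presentations["stop_time"].to_numpy()

# Position of the last presentation that started at or before each spike.
candidate = np.searchsorted(starts, spikes, side="right") - 1
valid = candidate >= 0

# Keep spikes that also occur before that presentation's stop_time.
inside = np.zeros(len(spikes), dtype=bool)
inside[valid] = spikes[valid] < stops[candidate[valid]]

tagged = pd.DataFrame(
    {
        "spike_time": spikes[inside],
        "stimulus_presentation_id": presentations.index.to_numpy()[candidate[inside]],
        "time_since_stimulus_presentation_onset": spikes[inside] - starts[candidate[inside]],
    }
)
```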

We can make raster plots of these data:

We can access summary spike statistics for stimulus conditions and units:

Using these data, we can ask for each unit: which stimulus condition evoked the most activity on average?
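In AllenSDK these statistics come from conditionwise_spike_statistics. The sketch below starts from invented per-presentation spike counts in long format and shows how such a question can be answered: average per unit and condition, then take the argmax across conditions.

```python
import pandas as pd

# Hypothetical spike counts: one row per (unit, presentation).
counts = pd.DataFrame(
    {
        "unit_id": [1, 1, 1, 1, 2, 2, 2, 2],
        "stimulus_condition_id": [0, 0, 1, 1, 0, 0, 1, 1],
        "spike_count": [2, 4, 7, 9, 5, 5, 1, 3],
    }
)

# Mean spike count for every unit/condition pair, as a unit x condition table.
mean_counts = (
    counts.groupby(["unit_id", "stimulus_condition_id"])["spike_count"].mean().unstack()
)

# For each unit, the condition with the largest mean spike count.
preferred_condition = mean_counts.idxmax(axis=1)
```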

Spike histograms

It is commonly useful to compare spike data from across units and stimulus presentations, all relative to the onset of a stimulus presentation. We can do this using the presentationwise_spike_counts method.
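The computation behind a presentation-aligned histogram can be sketched in numpy: shift each unit's spikes to presentation-relative time and histogram them into shared bins, giving an array shaped (presentation, time bin, unit). The helper presentationwise_counts is hypothetical and returns a plain array rather than the labeled structure AllenSDK returns. The data are invented.

```python
import numpy as np


def presentationwise_counts(spike_times, onsets, bin_edges):
    """Return spike counts shaped (n_presentations, n_bins, n_units).

    Hypothetical helper; units are ordered as in the `spike_times` dict.
    """
    unit_ids = list(spike_times)
    counts = np.zeros((len(onsets), len(bin_edges) - 1, len(unit_ids)), dtype=int)
    for p, onset in enumerate(onsets):
        for u, unit_id in enumerate(unit_ids):
            # Express spikes relative to this presentation's onset.
            relative = spike_times[unit_id] - onset
            # Spikes outside the bin range are ignored by np.histogram.
            hist, _ = np.histogram(relative, bins=bin_edges)
            counts[p, :, u] = hist
    return counts


spike_times = {1: np.array([0.01, 0.03, 1.02, 1.07]), 2: np.array([0.06, 1.01])}
onsets = [0.0, 1.0]
bin_edges = np.linspace(0.0, 0.1, 3)  # two 50 ms bins after each onset
counts = presentationwise_counts(spike_times, onsets, bin_edges)
```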

This has returned a new (to this notebook) data structure, the xarray.DataArray. You can think of this as similar to a 3+D pandas.DataFrame, or as a numpy.ndarray with labeled axes and indices. See the xarray documentation for more information. In the meantime, the salient features are:

xarray is nice because it forces code to be explicit about dimensions and coordinates, improving readability and avoiding bugs. However, you can always convert to numpy or pandas data structures as follows:

We can now plot spike counts for a particular presentation:

We can also average across all presentations, adding a new data array to the dataset. Notice that this one no longer has a stimulus_presentation_id dimension, as we have collapsed it by averaging.
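With labeled xarray data the averaging would be written along the stimulus_presentation_id dimension (for example .mean(dim="stimulus_presentation_id")). The same reduction on a plain numpy array with invented counts, assuming axis 0 holds presentations and a 50 ms bin width:

```python
import numpy as np

# Hypothetical counts shaped (presentation, time_bin, unit).
counts = np.array(
    [
        [[2, 0], [0, 1]],
        [[1, 1], [1, 0]],
    ]
)
bin_width = 0.05  # seconds, assumed

# Collapse the presentation axis; the result is shaped (time_bin, unit).
mean_counts = counts.mean(axis=0)

# Dividing by the bin width converts mean counts to firing rate in Hz.
mean_rate_hz = mean_counts / bin_width
```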

... and plot the mean spike counts

Waveforms

We store precomputed mean waveforms for each unit in the mean_waveforms attribute on the EcephysSession object. This is a dictionary which maps unit ids to xarray DataArrays. These have channel and time (seconds, aligned to the detected event times) dimensions. The data values are in microvolts, as measured at the recording site.

Since Neuropixels probes are densely populated with channels, spikes are typically detected on several channels. We can see this by plotting mean waveforms on channels surrounding a unit's peak channel:
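A numerical companion to such a plot: select channels within a few sites of the peak channel and compare their peak-to-trough amplitudes. The waveforms here are synthetic, and treating neighboring channel indices as spatial neighbors is a simplification, since real probe sites are laid out in a two-dimensional pattern.

```python
import numpy as np

# Synthetic mean waveforms: rows are channels, columns are time samples (microvolts).
n_channels, n_samples = 12, 40
t = np.arange(n_samples)
template = -np.exp(-((t - 15) ** 2) / 8.0) + 0.4 * np.exp(-((t - 22) ** 2) / 20.0)
peak_channel = 5

# Amplitude decays with distance (in channel index) from the peak channel.
attenuation = np.exp(-np.abs(np.arange(n_channels) - peak_channel) / 2.0)
waveforms = attenuation[:, None] * template[None, :] * 100.0

# Channels within +/- 3 indices of the peak channel, clipped to the probe.
window = np.arange(max(peak_channel - 3, 0), min(peak_channel + 4, n_channels))

# Peak-to-trough amplitude on each selected channel.
amplitudes = waveforms[window].max(axis=1) - waveforms[window].min(axis=1)
```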

Running speed

We can obtain the velocity at which the experimental subject ran as a function of time by accessing the running_speed attribute. This returns a pandas.DataFrame whose rows are intervals of time (defined by "start_time" and "end_time" columns), and whose "velocity" column contains mean running speeds within those intervals.

Here we'll plot the running speed trace for an arbitrary chunk of time.
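Selecting a chunk of time amounts to filtering on the interval columns. Using interval midpoints as x values and weighting by interval duration gives a sensible mean speed for the chunk. The table follows the columns described above; the numbers are invented.

```python
import pandas as pd

# Hypothetical running speed table with the columns described above.
running_speed = pd.DataFrame(
    {
        "start_time": [0.0, 0.5, 1.0, 1.5, 2.0],
        "end_time": [0.5, 1.0, 1.5, 2.0, 2.5],
        "velocity": [0.0, 2.0, 4.0, 6.0, 1.0],
    }
)

# Keep intervals that lie entirely inside [0.5, 2.0].
chunk = running_speed[(running_speed["start_time"] >= 0.5) & (running_speed["end_time"] <= 2.0)]

midpoints = (chunk["start_time"] + chunk["end_time"]) / 2  # x values for a plot
durations = chunk["end_time"] - chunk["start_time"]

# Duration-weighted mean velocity over the chunk.
mean_velocity = (chunk["velocity"] * durations).sum() / durations.sum()
```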

Optogenetic stimulation

Eye tracking ellipse fits and estimated screen gaze location

Ecephys sessions may contain eye tracking data in the form of ellipse fits and estimated screen gaze location. Let's look at the ellipse fits first:

This particular session has eye tracking data, so let's try plotting the ellipse fits over time.

Using the above ellipse fits and location/orientation information about the experimental rigs, it is possible to calculate additional statistics such as pupil size or estimate a gaze location on screen at a given time. Due to the degrees of freedom in some rig components, gaze estimates have no accuracy guarantee. For additional information about the gaze mapping estimation process please refer to: https://github.com/AllenInstitute/AllenSDK/tree/master/allensdk/brain_observatory/gaze_mapping
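As one example of such a statistic, pupil size can be approximated by the area of the fitted ellipse, pi * a * b for semi-axes a and b. The column names pupil_width and pupil_height, and the assumption that they hold semi-axis lengths, are both guesses for illustration. If they hold full axis lengths, each should be halved first. The values are invented and the result is in the squared units of the fit (for example pixels squared), not a calibrated physical size.

```python
import numpy as np
import pandas as pd

# Hypothetical pupil ellipse fits, one row per video frame.
fits = pd.DataFrame({"pupil_width": [10.0, 12.0], "pupil_height": [8.0, 12.0]})

# Assumption: width and height are semi-axis lengths of the fitted ellipse.
fits["pupil_area"] = np.pi * fits["pupil_width"] * fits["pupil_height"]
```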

Local Field Potential

We record local field potential on a subset of channels at 2500 Hz. Even subsampled and compressed, these data are quite large, so we store them separately for each probe.
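In AllenSDK, a probe's LFP is loaded with get_lfp. At 2500 Hz, sample i sits at i / 2500 seconds, so a time window maps directly onto a block of rows. A sketch on random data, where the channel count and recording length are arbitrary assumptions:

```python
import numpy as np

sampling_rate = 2500.0  # Hz, as stated above
n_samples, n_channels = 10_000, 4  # hypothetical: 4 s of data on 4 channels
timestamps = np.arange(n_samples) / sampling_rate
lfp = np.random.default_rng(0).normal(size=(n_samples, n_channels))

# Select a 100 ms window starting at t = 1.0 s.
start, stop = 1.0, 1.1
mask = (timestamps >= start) & (timestamps < stop)
window = lfp[mask]
```

A 100 ms window therefore contains 0.1 s x 2500 Hz = 250 samples per channel.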

We can figure out where each LFP channel is located in the brain.
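One way to do this is to look up the LFP channel ids in a channels table indexed by channel id. A sketch, where the table contents and the ecephys_structure_acronym column name are assumptions:

```python
import pandas as pd

# Hypothetical channels table indexed by channel id.
channels = pd.DataFrame(
    {"ecephys_structure_acronym": ["CA1", "CA1", "DG", "VISp"]},
    index=pd.Index([850, 851, 852, 853], name="channel_id"),
)

# Channel ids present in the LFP data, in the LFP's own order.
lfp_channel_ids = [851, 853, 852]

# .loc preserves the requested order, so structures line up with the LFP channels.
structures = channels.loc[lfp_channel_ids, "ecephys_structure_acronym"]
```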

Current source density

We precompute current source density for each probe.
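For intuition about what CSD measures, the textbook estimator takes the negative second spatial derivative of the LFP along the probe: CSD(z) = -sigma * (phi(z + h) - 2 phi(z) + phi(z - h)) / h^2, where h is the channel spacing and sigma the tissue conductivity. This sketch implements that three-point estimate. It is not necessarily the method AllenSDK uses for its precomputed CSD, and the function name is hypothetical.

```python
import numpy as np


def second_derivative_csd(lfp: np.ndarray, spacing: float, conductivity: float = 1.0) -> np.ndarray:
    """Three-point CSD estimate from LFP shaped (n_channels, n_samples).

    Channels must be ordered by depth and evenly spaced. The estimate is only
    defined for interior channels, so the result has n_channels - 2 rows.
    """
    return -conductivity * (lfp[2:] - 2 * lfp[1:-1] + lfp[:-2]) / spacing**2


# Check with synthetic potentials along depth z (3 time samples each).
z = np.arange(5) * 0.5
linear_csd = second_derivative_csd(np.outer(z, np.ones(3)), spacing=0.5)
quadratic_csd = second_derivative_csd(np.outer(z**2, np.ones(3)), spacing=0.5)
```

A potential that varies linearly with depth has zero second derivative and hence zero CSD; phi = z^2 has second derivative 2, so the estimate is -2 everywhere for sigma = 1.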

Suggested exercises

If you would like hands-on experience with the EcephysSession class, please consider working through some of these exercises.