allensdk.core.dataframe_utils module

allensdk.core.dataframe_utils.INT_NULL = -99

A collection of utilities to manipulate pandas DataFrames.

allensdk.core.dataframe_utils.enforce_df_column_order(input_df: DataFrame, column_order: List[str]) -> DataFrame

Return the data frame with its columns reordered according to column_order.

Parameters:
input_df: pandas.DataFrame

Data frame with columns to be ordered.

column_order: list of str

Ordering of column names to enforce. Columns not specified are shifted to the end of the order but retain their order amongst others not specified. If a specified column is not in the DataFrame it is ignored.

Returns:
output_df: pandas.DataFrame

DataFrame the same as the input but with columns reordered.
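
The ordering rule described above can be illustrated with a minimal pandas sketch. The helper name reorder_columns_sketch is hypothetical and is not the allensdk implementation; it only mirrors the documented behavior (listed columns first, unlisted columns afterwards in their original order, missing names ignored).

```python
import pandas as pd


def reorder_columns_sketch(input_df, column_order):
    # Hypothetical stand-in that mirrors the documented behavior;
    # this is not the allensdk source.
    # Requested columns that actually exist, in the requested order.
    present = [c for c in column_order if c in input_df.columns]
    # Everything else keeps its original relative order at the end.
    remaining = [c for c in input_df.columns if c not in present]
    return input_df[present + remaining]


df = pd.DataFrame({"c": [1], "a": [2], "d": [3], "b": [4]})
# "missing" is not a column of df, so it is skipped.
reordered = reorder_columns_sketch(df, ["a", "b", "missing"])
print(list(reordered.columns))  # ['a', 'b', 'c', 'd']
```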

allensdk.core.dataframe_utils.enforce_df_int_typing(input_df: DataFrame, int_columns: List[str], use_pandas_type: object = False) -> DataFrame

Enforce integer typing for columns that may have lost int typing when combined into the final DataFrame.

Parameters:
input_df: pandas.DataFrame

DataFrame with typing to enforce.

int_columns: list of str

Columns on which to enforce int typing. Any NaN/None values are filled with INT_NULL (-99), defined in this module. Requested columns not in the dataframe are ignored.

use_pandas_type: bool

Instead of filling with the value INT_NULL to enforce integer typing, use the pandas type Int64. This type can have issues converting to numpy/array type values.

Returns:
output_df: pandas.DataFrame

DataFrame with the specified columns typed as integers: missing values are either filled with INT_NULL or, when use_pandas_type is set, kept as NA in an Int64 column, which avoids falling back to float type.
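
The two typing strategies described above can be compared in a short sketch. The helper enforce_int_sketch is hypothetical and written only from the description here; working on a copy is a choice of this sketch, not a statement about the library.

```python
import numpy as np
import pandas as pd

INT_NULL = -99  # same sentinel value as documented for this module


def enforce_int_sketch(input_df, int_columns, use_pandas_type=False):
    # Hypothetical stand-in, not the allensdk source.
    output_df = input_df.copy()  # assumption: leave the caller's frame alone
    for col in int_columns:
        if col not in output_df.columns:
            continue  # requested columns that are absent are ignored
        if use_pandas_type:
            # Nullable integer dtype: missing values stay as <NA>.
            output_df[col] = output_df[col].astype("Int64")
        else:
            # Plain integer dtype: missing values become the sentinel.
            output_df[col] = output_df[col].fillna(INT_NULL).astype(int)
    return output_df


# A NaN forces pandas to store the id column as float.
df = pd.DataFrame({"session_id": [1.0, np.nan, 3.0], "score": [0.5, 0.7, 0.9]})

filled = enforce_int_sketch(df, ["session_id", "not_a_column"])
nullable = enforce_int_sketch(df, ["session_id"], use_pandas_type=True)

print(filled["session_id"].tolist())   # [1, -99, 3]
print(str(nullable["session_id"].dtype))  # Int64
```

With the sentinel approach, downstream code has to treat -99 as "missing"; with Int64 the missing value is explicit, at the cost of the numpy conversion issues noted above.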

allensdk.core.dataframe_utils.patch_df_from_other(target_df: DataFrame, source_df: DataFrame, columns_to_patch: List[str], index_column: str) -> DataFrame

Overwrite column values in target_df from column values in source_df in rows where the two dataframes share a value of index_column.

Parameters:
target_df: pd.DataFrame

The dataframe whose columns will get overwritten

source_df: pd.DataFrame

The dataframe from which correct values are to be read

columns_to_patch: List[str]

The columns to be overwritten

index_column: str

The column to join the dataframes on

Returns:
patched_df: pd.DataFrame

target_df except with the specified columns and rows overwritten.

Notes

If any of the columns_to_patch are not in target_df, they will be added.

This function starts by creating a copy of target_df, so it will not alter the argument in-place.
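
A rough sketch of the patching behavior described above, using plain pandas. The helper patch_sketch is hypothetical and is not the allensdk implementation; it assumes the values in index_column are unique in both frames.

```python
import pandas as pd


def patch_sketch(target_df, source_df, columns_to_patch, index_column):
    # Hypothetical stand-in, not the allensdk source.
    # Assumes index_column holds unique values in both frames.
    patched = target_df.copy().set_index(index_column)
    source = source_df.set_index(index_column)
    # Only rows whose index_column value appears in both frames are touched.
    shared = patched.index.intersection(source.index)
    for col in columns_to_patch:
        if col not in patched.columns:
            patched[col] = None  # add a missing column before patching
        patched.loc[shared, col] = source.loc[shared, col].to_numpy()
    return patched.reset_index()


target = pd.DataFrame({"id": [1, 2, 3], "age": [10, 20, 30]})
source = pd.DataFrame(
    {"id": [2, 3, 4], "age": [21, 31, 41], "sex": ["F", "M", "F"]}
)

patched = patch_sketch(target, source, ["age", "sex"], "id")
print(patched["age"].tolist())  # [10, 21, 31]
print(list(target.columns))     # ['id', 'age'] (the input is untouched)
```

Row id 1 has no counterpart in source, so its age is kept and its new sex value stays empty; row id 4 exists only in source and is not added.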

allensdk.core.dataframe_utils.return_one_dataframe_row_only(input_table: DataFrame, index_value: int, table_name: str) -> Series

Look up and return exactly one row from the DataFrame, raising an informative error if no rows or multiple rows match the given index.

This method is used mainly to give a more informative error when retrieving metadata from the behavior cache metadata tables.

Parameters:
input_table: pandas.DataFrame

Input dataframe to retrieve row from.

index_value: int

Index of the row to return. Must match an index in the input dataframe/table, for example ecephys_session_table or behavior_session_table.

table_name: str

Name of the table being returned. Used to output the table name in case of error.

Returns:
row: pandas.Series

Row corresponding to the input index.
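
The single-row lookup can be sketched as follows. The helper one_row_sketch, the RuntimeError exception type, and the error message wording are all assumptions made for illustration; the source only states that an informative error results when zero or multiple rows match.

```python
import pandas as pd


def one_row_sketch(input_table, index_value, table_name):
    # Hypothetical stand-in, not the allensdk source.
    rows = input_table[input_table.index == index_value]
    if len(rows) != 1:
        # Exception type and message are assumptions of this sketch.
        raise RuntimeError(
            f"Expected exactly one row for index {index_value} in "
            f"{table_name}, found {len(rows)}."
        )
    return rows.iloc[0]  # a pandas.Series for the matching row


table = pd.DataFrame(
    {"mouse_id": [101, 102]},
    index=pd.Index([10, 20], name="behavior_session_id"),
)

row = one_row_sketch(table, 20, "behavior_session_table")
print(row["mouse_id"])  # 102

try:
    one_row_sketch(table, 99, "behavior_session_table")
except RuntimeError as err:
    message = str(err)  # names the table and the failing index
```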