Loading the data
Setting up
To begin you need three paths saved as variables:
- the path to the metadata
.csv
file
- the full path (including folder) the ftp location (eg: ftp://myethoscopedata.com/results/)
-
the path of a local folder to save downloaded .db
files (if your .db files are already downloaded on the same machine running ethoscopy, then this will be the path to the folder containing them)
import ethoscopy as etho
# Replace with your own file/sever paths
meta_loc = 'USER/experiment_folder/metadata.csv'
remote = 'ftp://ftpsever/auto_generated_data/ethoscope_results'
local = 'USER/ethoscope_results'
# This will download the data remotely via FTP onto the local machine
# If your ethoscopy is running via ethoscope-lab on the same machine
# where the ethoscope data are, then this step is not necessary
etho.downlaod_from_remote_dir(meta_loc, remote, local)
Click the button to see the docstring - download_from_remote_dir
etho.download_from_remote_dir(meta, remote_dir, local_dir):
This function is used to import data from the ethoscope node platform to your local directory for later use. The ethoscope files must be saved on a remote FTP server and saved as .db files, see the Ethoscope manual for how to setup a node correctly.https://www.notion.so/giorgiogilestro/Ethoscope-User-Manual-a9739373ae9f4840aa45b277f2f0e3a7
Args:
meta (str): The path to a csv file containing columns with machine_name, date, and time if multiple files on the same day
remote_dir (str): The url containing the location of the ftp server up to the folder contain the machine id's, server must not have a username or password (anonymous login) e.g. 'ftp://YOUR_SERVER//auto_generated_data//ethoscope_results'
local_dir (str): The path of the local directory to save .db files to, files will be saved using the structure of the ftp server
e.g. 'C:\\Users\\YOUR_NAME\\Documents\\ethoscope_databases'
returns:
None
This only needs to be run if like the Gilestro lab you have all your data saved to a remote ftp sever. If not you can skip straight to the the next part.
Create a modified metadata DataFrame
This function creates a modified metadata DataFrame with the paths of the saved .db files and generates a unique id for each experimental individual. This function only works for a file structure locally saved to whatever computer you are running and is saved in a nested director structure as created by the ethoscopes, i.e.
'LOCAL_DIR/ethoscope_results/00190f0080e54d9c906f304a8222fa8c/ETHOSCOPE_001/2022-08-23_03-33-59/DATABASE.db'
For this function you only need the path to the metadata file and the path the the first higher level of your database directories, as seen in the example below. Do not provide a path directly to the folder with your known .db file in it, the function searches all the saved data directories and selects the ones that match the metadata file.
meta_loc = 'USER/experiment_folder/metadata.csv'
local = 'USER/ethoscope_results' # remember to just provide the path up to the directory where the individual ethoscope files are saved
meta = etho.link_meta_index(meta_loc, local)
Click button to see the docstring - link_meta_index
link_meta_index(metadata, local_dir):
A function to alter the provided metadata file with the path locations of downloaded .db files from the Ethoscope experimental system. The function will check all unique machines against the original ftp server for any errors. Errors will be omitted from the returned metadata table without warning.
Args:
metadata (str): The path to a file containing the metadata information of each ROI to be downloaded, must include 'ETHOSCOPE_NAME', 'date' in yyyy-mm-dd format or others (see validate_datetime), and 'region_id'
local_dir (str): The path to the top level parent directory where saved database files are located.
returns:
A pandas DataFrame containing the csv file information and corresponding path for each entry in the csv
Load and modify the ethoscope data
The load function takes the raw ethoscope data from its .db format and modifies it into a workable pandas DataFrame format, changing the time (seconds) to be in reference to a given hour (usually lights on). Min and max times can be provided to filter the data to only recordings between those hours. With 0 being in relation to the start of the experiment not the reference hour.
data = etho.load_ethoscope(meta, min_time = 24, max_time = 48, reference_hour = 9.0)
# you can cache the each specimen as the data is loaded for faster load times when run again, just add a file path to a folder of choice, the first time it will save, the second it will search the folder and load straight from there
# However this can take up a lot of memory and it's recommended to save the whole loaded dataset at the end and to load from this each time. See the end of this page
data = etho.load_ethoscope(meta, min_time = 24, max_time = 48, reference_hour = 9.0, cache = 'path/ethoscope_cache/')
Click button to see the docstring - load_ethoscope
load_ethoscope(metadata, min_time = 0 , max_time = float('inf'), reference_hour = None, cache = None, FUN = None, verbose = True):
The users function to iterate through the dataframe generated by link_meta_index() and load the corresponding database files
and analyse them according to the inputted function.
Args:
metadata (pd.DataFrame): The metadata datafframe as returned from link_meta_index function
min_time (int): The minimum time you want to load data from with 0 being the experiment start (in hours), for all experiments. Default is 0.
max_time (int): Same as above, but for the maximum time you want to load to. Default is infinity.
reference_hour (int): The hour at which lights on occurs when the experiment is begun, or when you want the timestamps to equal 0. None equals the start of the experiment. Default is None.
cache (str): The local path to find and store cached versions of each ROI per database. Directory tree structure is a mirror of ethoscope saved data. Cached files are in a pickle format. Default is None.
FUN (function): A function to apply indiviual curatation to each ROI, typically using package generated functions (i.e. sleep_annotation). If using a user defined function use the package analyse functions as examples.
If None the data remains as found in the database. Default is None.
verbose (bool): If True (defualt) then the function prints to screen information about each ROI when loading, if False no printing to screen occurs. Default is True.
returns:
A pandas DataFrame object containing the database data and unique ids per fly as the index
Additionally, an analysing function can be also called to modify the data as it is read. It's recommended you always call at least max_velocity_detector or sleep_annotation function when loading the data as it generates columns that are needed for the analysis / plot generating methods.
from functools import partial
data = etho.load_ethoscope(meta, reference_hour = 9.0, FUN = partial(etho.sleep_annotation, time_window_length = 60, min_time_immobile = 300))
# time_window_length is the amount of time each row represents. The ethoscope can record multiple times per second, so you can go as low as 10 seconds for this.
# The default for time_window_length is 10 seconds
# min_time_immobile is your sleep criteria, 300 is 5 mins the general rule of sleep for flies, see Hendricks et al., 2000.
Ethoscopy has 3 general functions that can be called whilst loading:
- max_velocity_detector: Aggregates variables per the given time window, finding their means. Sleeep_annotation uses this function before finding sleep bouts, so use this when you don't need to know the sleep bouts.
- sleep_annotation: Aggregates per time window and generates a new boolean column of sleep, as given by the time immobile argument.
- isolate_activity_lengths: Finds consecutive runs of inactivity or activity, filter by the intervals column and provide a window to contain the variables from prior to the start of the run.
Ethoscopy also has 2 functions for use with mAGO ethoscope module (odour delivery and mechanical stimulation):
- puff_mago: Finds the interaction times and then searches a given window post interaction for movement.
- find motifs: A modifcation of puff_mago, the function finds all interaction times and their response whilst retaining all the previous variables information in a given time window.
Click the button to see the docstring - analysing functions
max_velocity_detector(data, time_window_length, velocity_correction_coef = 3e-3, masking_duration = 6, optional_columns = 'has_interacted'):
Max_velocity_detector is the default movement classification for real-time ethoscope experiments.
It is benchmarked against human-generated ground truth.
Args:
data (pd.DataFrame): A dataframe containing behavioural variables of a single animal (no id)
time_window_length (int): The period of time the data is binned and sampled to, i.e. if 60 the
timestep will per row will be 60 seconds.
velocity_correction_coef (float): A coefficient to correct the velocity data (change for different length tubes). For 'small'
tubes (20 per ethoscope) = 3e-3, for 'long' tubes (10 per ethoscope) = 15e-4. Default is 3e-3.
masking_duration (int): The number of seconds during which any movement is ignored (velocity is set to 0) after a
stimulus is delivered (a.k.a. interaction). If using the AGO set to 0. Default is 6.
optional_columns (str): The columns other than ['t', 'x', 'velocity'] that you want included post analysis. Default is
'has_interacted.
returns:
A pandas dataframe object with columns such as 't', 'moving', 'max_velocity', 'mean_velocity' and 'beam_cross'
sleep_annotation(data, time_window_length = 10, min_time_immobile = 300, motion_detector_FUN = max_velocity_detector, masking_duration = 6, velocity_correction_coef = 3e-3):
This function first uses a motion classifier to decide whether an animal is moving during a given time window.
Then, it defines sleep as contiguous immobility for a minimum duration.
Args:
data (pandas dataframe): The dataframe containing behavioural variables from one animals.
time_window_length (int): The period of time the data is binned and sampled to. Default is 10
min_time_immobile (int): Immobility bouts longer or equal to this value are considered as asleep.
Default is 300 (i.e 5 mins)
motion_detector_FUN (function): The function to curate raw ethoscope data into velocity measurements.
Default is max_velocity_detector.
masking_duration (int): The number of seconds during which any movement is ignored (velocity is set to 0) after a
stimulus is delivered (a.k.a. interaction). If using the AGO set to 0. Default is 6.
velocity_correction_coef (float): A coefficient to correct the velocity data (change for different length tubes). For 'small'
tubes (20 per ethoscope) = 3e-3, for 'long' tubes (10 per ethoscope) = 15e-4. Default is 3e-3.
returns:
A pandas dataframe containing columns 'moving' and 'asleep'
Saving the data
Loading the ethoscope data each time can be a long process depending on the length of the experiment and number of machines. It's recommended to save the loaded/modified DataFrame as a pickle .pkl
file. See here for more information about pandas and pickle saves. The saved behavpy object can then be loaded in instantly at the start of a new session!
# Save any behavpy or pandas object with the method below
import pandas as pd
df.to_pickle('path/behapvy.pkl') # replace string with your file location/path
# Load the saved pickle file like this. It will retain all the metadata information
df = pd.read_pickle('path/behapvy.pkl')