Ethoscopy user manual

Getting started

Installing ethoscopy as a docker container with ethoscope-lab (recommended).

The ethoscope-lab docker container is the recommended way to use ethoscopy. A docker container is a pre-made image that will run inside any computer, independent of the operating system you use. The docker container is isolated from the rest of the machine and will not interfere with your other Python or R installations. It comes with its own dependencies and will just work. The docker comes with its own multi-user jupyter hub notebook so lab members can login into it directly from their browser and run all the analyses remotely from any computer, at home or at work. In the Gilestro laboratory we use a common workstation with the following hardware configuration.

CPU: 12x Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz 
GPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz 
Hard drive: 1TB SSD for OS, 1TB 
SSD for homes and cache, 7.3 TB for ethoscope data 
Memory: 64GB 

The workstation is accessible via the internet (behind VPN) so that any lab member can login into the service and run their analyses remotely. All the computational power is in the workstation itself so one can analyse ethoscope data from a tablet, if needs be. Follow the instruction below to install the ethoscope-lab docker container on your machine.

On linux (recommended)

The best solution is to install this on the same computer that collects the ethoscope data so ethoscopy can have access to the db files directly stored in the machine. For most small installations, this computer could be the node

To install the docker you will have to find out the following information:

Once these info are clear, you can proceed.

# Optional. Update the system to the latest version. You may want to restart after this. 
sudo pamac update 
# install docker 
sudo pacman -S docker 
# start the docker service 
sudo systemctl enable --now docker

# and finally download and run the ethoscope-lab docker container 
# the :ro flag means you are mounting that destination in read-only 
sudo docker run -d -p 8000:8000 \
      --name ethoscope-lab \
      --volume /ethoscope_data/results:/mnt/ethoscope_results:ro \
      --restart=unless-stopped \

Installation on Windows or MacOS makes sense if you have actual ethoscope data on those machines, which is normally not the case. If you go for those OSs, I won't provide detailed instruction or support as I assume you know what you're doing.

On MacOS

Install the docker software from here. Open the terminal and run the same command as above, e.g.:

# download and run the ethoscope-lab docker container 
# the :ro flag means you are mounting that destination in read-only 
sudo docker run -d -p 8000:8000 \
      --name ethoscope-lab \
      --volume /path/to/ethoscope_data/:/mnt/ethoscope_results:ro \
      --restart=unless-stopped \
On Windows

Install the docker software from here. After installation, open the window terminal and issue the same command as above, only replacing the folder syntax as appropriate. For instance, if your ethoscope data are on z:\ethoscope_data and the user data are on c:\Users\folder use the following:

sudo docker run -d -p 8080:8080 \
      --name ethoscope-lab \
      --volume /z/ethoscope_data:/mnt/ethoscope_results:ro \
      --restart=unless-stopped \

Storing user data on the machine, not on the container (recommended)

ethoscopelab runs on top of a jupyterhub environment, meaning that it supports organised and simultaneous access by multiple users. Users will need to have their own credentials and their own home folder. The default user is ethoscopelab, with password ethoscopelab and this user will save all of their work in the folder called /home/ethoscopelab. In the examples above, the users' folders are stored inside the container itself which is not ideal. A better solution is to mount the home folders to a local point in your machine. In the example below, we would use the folder /mnt/my_user_homes.

sudo docker run -d -p 8000:8000 \
      --name ethoscope-lab \
      --volume /ethoscope_data/results:/mnt/ethoscope_results:ro \
      --volume /home:/mnt/my_user_homes \
      --restart=unless-stopped \

Make sure that your local home location contains an ethoscopelab folder that can be accessed by the ethoscopelab user! In the example below, you would need to create a folder called /mnt/my_user_homes/ethoscopelab.

Any folder in /mnt/my_user_homes will become accessible to ethoscopelab. In our lab, we sync those using owncloud (an opensource Dropbox clone) so that every user has their files automatically synced across all their machines.

Creating new users

If you wan to add new users, you will have to do it from the command line. On the linux computer running ethoscopelab (normally the node) use the following commands:

#enter in a bash shell of the container
sudo docker exec -it ethoscope-lab /bin/bash

#create the username
useradd myusername -m

#set the password for the  username you just created
passwd myusername

You will now be able to login into jupyter with these new credentials. The data will be stored in the newly created folder.

Persistent user credentials

In linux, user credentials are saved inside three files: /etc/passwd, /etc/shadow, /etc/group. It is possible to store those on the host computer (e.g. the node) and then mount them to the container. This is called a persistent volume because the data will remain on the host computer even if the container is deleted. An example of a container running in this way is the following:

sudo docker run -d -p 8000:8000 \
      --name ethoscope-lab \
      --volume /mnt/data/results:/mnt/ethoscope_results:ro \
      --volume /mnt/data/ethoscope_metadata:/opt/ethoscope_metadata \
      --volume /mnt/homes:/home \
      --volume /mnt/cache:/home/cache \
      --restart=unless-stopped \
      -e VIRTUAL_HOST="" \
      -e VIRTUAL_PORT="8000" \
      -e LETSENCRYPT_HOST="" \
      --volume /mnt/secrets/passwd:/etc/passwd:ro \
      --volume /mnt/secrets/group:/etc/group:ro \
      --volume /mnt/secrets/shadow:/etc/shadow:ro \
      --cpus=10 \

Lines 12-14 indicate the location of the user credentials. This configuration allows to maintain user information even when upgrading ethoscopelab to newer versions.


If your Jupyter starts but hangs on the following image


It means that the ethoscopelab user does not have access to its own folder. This most likely indicates that you are running the container mounting the folder onto your local machine but the ethoscope home folder is either not present or does not have reading and writing access.

Install ethoscopy in your Python space

Ethoscopy is on github and on PyPi. You can install the latest stable version with pip3.

pip install ethoscopy 

As of version 1.2.25, the required dependencies are:

Python >= 3.8
Pandas ^ 1.4.2
Numpy ^ 1.22.3
Scipy ^ 1.8
Hmmlearn ^ 0.3.0
Color ^ 0.1.5
Astropy ^ 5.1.1
PyWavelets ^ 1.4.1
plotly ^ 5.7.0
kaleido ^ 0.2.1

Metadata design

What is the metadata?

The metadata is a simple .csv file that contains the information to (1) find and retrieve the data saved to a remote server and (2) to segment and transform the data to compare experimental conditions. I would recommend recording as many experimental variables as possible, just to be safe. Each row of the .csv file is an individual from the experiment. It is mandatory to have at least the following columns: 'machine_name', 'region_id', and 'date' with date in a YYY-MM-DD format. Without these columns the data cannot be retrieved.

Top tips.

Loading the data

Setting up

To begin you need three paths saved as variables:

  1. the path to the metadata .csv file
  2. the full path (including folder) the ftp location (eg:
  3. the path of a local folder to save downloaded .db files (if your .db files are already downloaded on the same machine running ethoscopy, then this will be the path to the folder containing them)

import ethoscopy as etho

# Replace with your own file/sever paths

meta_loc = 'USER/experiment_folder/metadata.csv' 
remote = 'ftp://ftpsever/auto_generated_data/ethoscope_results'
local = 'USER/ethoscope_results' 

# This will download the data remotely via FTP onto the local machine
# If your ethoscopy is running via ethoscope-lab on the same machine 
# where the ethoscope data are, then this step is not necessary
etho.downlaod_from_remote_dir(meta_loc, remote, local)
download_from_remote_dir - Function reference
download_from_remote_dir(meta, remote_dir, local_dir):

This function is used to import data from the ethoscope node platform to your local directory for later use. The ethoscope files must be saved on a remote FTP server and saved as .db files, see the Ethoscope manual for how to setup a node correctly.

meta (str): The path to a csv file containing columns with machine_name, date, and time if multiple files on the same   
remote_dir (str): The url containing the location of the ftp server up to the folder contain the machine id's, server must
not have a username or password (anonymous login) e.g.                                                'ftp://YOUR_SERVER//auto_generated_data//ethoscope_results'
local_dir (str): The path of the local directory to save .db files to, files will be saved using the structure of the ftp server 
e.g. 'C:\\Users\\YOUR_NAME\\Documents\\ethoscope_databases'

This only needs to be run if like the Gilestro lab you have all your data saved to a remote ftp sever. If not you can skip straight to the the next part.

Create a modified metadata DataFrame

This function creates a modified metadata DataFrame with the paths of the saved .db files and generates a unique id for each experimental individual. This function only works for a file structure locally saved to whatever computer you are running and is saved in a nested director structure as created by the ethoscopes, i.e.


For this function you only need the path to the metadata file and the path the the first higher level of your database directories, as seen in the example below. Do not provide a path directly to the folder with your known .db file in it, the function searches all the saved data directories and selects the ones that match the metadata file.

link_meta_index - Function reference
link_meta_index(metadata, local_dir):

A function to alter the provided metadata file with the path locations of downloaded .db files from the Ethoscope experimental system. The function will check all unique machines against the original ftp server for any errors. Errors will be omitted from the returned metadata table without warning.

metadata (str): The path to a file containing the metadata information of each ROI to be downloaded, must include     
'ETHOSCOPE_NAME', 'date' in yyyy-mm-dd format or others (see validate_datetime), and 'region_id'
local_dir (str): The path to the top level parent directory where saved database files are located.

A pandas DataFrame containing the csv file information and corresponding path for each entry in the csv

Load and modify the ethoscope data

The load function takes the raw ethoscope data from its .db format and modifies it into a workable pandas DataFrame format, changing the time (seconds) to be in reference to a given hour (usually lights on). Min and max times can be provided to filter the data to only recordings between those hours. With 0 being in relation to the start of the experiment not the reference hour.

data = etho.load_ethoscope(meta, min_time = 24, max_time = 48, reference_hour = 9.0)

# you can cache the each specimen as the data is loaded for faster load times when run again, just add a file path to a folder of choice, the first time it will save, the second it will search the folder and load straight from there
# However this can take up a lot of memory and it's recommended to save the whole loaded dataset at the end and to load from this each time. See the end of this page

data = etho.load_ethoscope(meta, min_time = 24, max_time = 48, reference_hour = 9.0, cache = 'path/ethoscope_cache/')
load_ethoscope - Function reference
load_ethoscope(metadata, min_time = 0 , max_time = float('inf'), reference_hour = None, cache = None, FUN = None, verbose = True):

The users function to iterate through the dataframe generated by link_meta_index() and load the corresponding database files
and analyse them according to the inputted function.

The users function to iterate through the dataframe generated by link_meta_index() and load the corresponding database files

metadata (pd.DataFrame): The metadata datafframe as returned from link_meta_index function
min_time (int, optional): The minimum time you want to load data from with 0 being the experiment start (in hours), for
all experiments. Default is 0.
max_time (int, optional): Same as above, but for the maximum time you want to load to. Default is infinity.
reference_hour (int, optional): The hour at which lights on occurs when the experiment is begun, or when you want
the timestamps to equal 0. None equals the start of the experiment. Default is None.
cache (str, optional): The local path to find and store cached versions of each ROI per database. Directory tree
structure is a mirror of ethoscope saved data. Cached files are in a pickle format. Default is None.
FUN (function, optional): A function to apply indiviual curatation to each ROI, typically using package generated
functions (i.e. sleep_annotation). If using a user defined function use the package analyse functions as examples. If None the data remains as found in the database. Default is None.
verbose (bool, optional): If True (defualt) then the function prints to screen information about each ROI when loading,
if False no printing to screen occurs. Default is True.
A pandas DataFrame object containing the database data and unique ids per fly as the index

Additionally, an analysing function can be also called to modify the data as it is read. It's recommended you always call at least max_velocity_detector or sleep_annotation function when loading the data as it generates columns that are needed for the analysis / plot generating methods.

from functools import partial

data = etho.load_ethoscope(meta, reference_hour = 9.0, FUN = partial(etho.sleep_annotation, time_window_length = 60, min_time_immobile = 300))

# time_window_length is the amount of time each row represents. The ethoscope can record multiple times per second, so you can go as low as 10 seconds for this.
# The default for time_window_length is 10 seconds
# min_time_immobile is your sleep criteria, 300 is 5 mins the general rule of sleep for flies, see Hendricks et al., 2000.

Ethoscopy has 2 general functions that can be called whilst loading:

max_velocity_detector - Function reference
max_velocity_detector(data, time_window_length, velocity_correction_coef = 3e-3, masking_duration = 6, optional_columns = 'has_interacted'):

Max_velocity_detector is the default movement classification for real-time ethoscope experiments.
It is benchmarked against human-generated ground truth.

data (pd.DataFrame): A dataframe containing behavioural variables of a single animal (no id) 
time_window_length (int): The period of time the data is binned and sampled to, i.e. if 60 the
timestep will per row will be 60 seconds.
velocity_correction_coef (float, optional):  A coefficient to correct the velocity data (change for different length tubes).
For 'small' tubes (20 per ethoscope) = 3e-3, for 'long' tubes (10 per ethoscope) = 15e-4. Default is 3e-3.
masking_duration (int, optional): The number of seconds during which any movement is ignored (velocity is set to 0)
after a stimulus is delivered (a.k.a. interaction). If using the AGO set to 0. Default is 6.
optional_columns (str, optional): The columns other than ['t', 'x', 'velocity'] that you want included post analysis. Default
is 'has_interacted'.
A pandas dataframe object with columns such as 't', 'moving', 'max_velocity', 'mean_velocity' and 'beam_cross'
sleep_annotation - Function reference
sleep_annotation(data, time_window_length = 10, min_time_immobile = 300, motion_detector_FUN = max_velocity_detector, masking_duration = 6, velocity_correction_coef = 3e-3):

This function first uses a motion classifier to decide whether an animal is moving during a given time window.
Then, it defines sleep as contiguous immobility for a minimum duration.

data (pd.DataFrame): The dataframe containing behavioural variables from one animals.
time_window_length (int): The period of time the data is binned and sampled to. Default is 10
min_time_immobile (int, optional): Immobility bouts longer or equal to this value are considered as asleep.
Default is 300 (i.e 5 mins)
motion_detector_FUN (function, optional): The function to curate raw ethoscope data into velocity measurements.
Default is max_velocity_detector.
masking_duration (int, optional): The number of seconds during which any movement is ignored (velocity is set to 0)
after a stimulus is delivered (a.k.a. interaction). If using the AGO set to 0. Default is 6.
velocity_correction_coef (float, optional): A coefficient to correct the velocity data (change for different length tubes).
For 'small' tubes (20 per ethoscope) = 3e-3, for 'long' tubes (10 per ethoscope) = 15e-4. Default is 3e-3.

A pandas dataframe containing columns 'moving' and 'asleep'

Ethoscopy also has 2 functions for use with AGO or mAGO ethoscope module (odour delivery and mechanical stimulation):

stimulus_response - Function reference
stimulus_response(data, start_response_window = 0, response_window_length = 10, add_false = False, velocity_correction_coef = 3e-3):

Stimulus_response finds interaction times from raw ethoscope data to detect responses in a given window.
This function will only return data from around interaction times and not whole movement data from the experiment.

data (pd.DataFrame): The dataframe containing behavioural variable from many or one multiple animals
response_window (int, optional): The period of time (seconds) after the stimulus to check for a response (movement). Default is 10 seconds
add_false (bool / int, optional): If not False then an int which is the percentage of the total of which to add false
interactions, recommended is 10. This is for use with old datasets with no false interactions so you can observe spontaneous movement with a HMM. Default is False
velocity_correction_coef (float, optional): A coefficient to correct the velocity data (change for different length tubes).
Default is 3e-3.

A pandas dataframe object with columns such as 'interaction_t' and 'has_responded'
stimulus_prior - Function reference
stimulus_prior(data, window = 300, response_window_length = 10, velocity_correction_coef = 3e-3):

Stimulus_prior is a modification of stimulus_response. It only takes data with a populated has_interacted column.
The function will take a response window (in seconds) to find the variables recorded by the ethoscope in this window prior to an interaction taking place. Each run is given a unique ID per fly, however it is not unique to other flies. To do so, combine the
fly ID with run ID after.


data (pd.DataFrame): A dataframe containing behavioural variable from many or one multiple animals
window (int, optional): The period of time (seconds) prior to the stimulus you want data retrieved for. Default is 300.
response_window_length (int, optional): The period of time (seconds) after the stimulus to check for a response
(movement). Default is 10 seconds.
velocity_correction_coef (float, optional): A coefficient to correct the velocity data (change for different length tubes).
Args:Default is 3e-3

A pandas dataframe object with columns such as 't_count' and 'has_responded'

Saving the data

Loading the ethoscope data each time can be a long process depending on the length of the experiment and number of machines. It's recommended to save the loaded/modified DataFrame as a pickle .pkl file. See here for more information about pandas and pickle saves. The saved behavpy object can then be loaded in instantly at the start of a new session!

# Save any behavpy or pandas object with the method below

import pandas as pd

df.to_pickle('path/behapvy.pkl') # replace string with your file location/path

# Load the saved pickle file like this. It will retain all the metadata information
df = pd.read_pickle('path/behapvy.pkl')




Behavpy is the default object in ethoscopy, a way of storing your metadata and data in a single structure, whilst adding methods to help you manipulate and analyse your data.

Metadata is crucial for proper statistical analysis of the experimental data. In the context of the Ethoscope, the data is a long time series of recorded variables, such as position and velocity for each individual. It is easier and neater to always keep the data and metadata together. As such, behavpy class is a child class of Pandas dataframe, the widely used data table package in python, but enhanced by the addition of a metadata as a class variable. Both the metadata and data must be linked by a common unique identifier for each individual (the 'id' column), which is automatically done if you've loaded with Ethoscopy (If not see the bottom of the page for the requirements for creating a behavpy object from alternative data sources).

The behavpy class has variety of methods designed to augment, filter, and inspect your data which we will go into later. However, if you're not already familiar with Pandas, take some time to look through their guide to get an understanding of its many uses.

Initialising behavpy

To create a behavpy object you need matching metadata and data. To match they both need to have an id column that has linked/same id's in each. Don't worry, if you downloaded/formatted the data using the built in ethoscopy functions shown previously they'll be in there already.

import ethoscopy as etho

# You can load you data using the previously mentioned functions
meta = etho.link_meta_index(meta_loc, remote, local)
data = etho.load_ethoscope(meta, reference_hour = 9.0, FUN = partial(etho.sleep_annotation, time_window_length = 60, min_time_immobile = 300))

# Or you can load then from a saved object of your choice into a pandas df, 
# mine is a pickle file
meta = pd.read_pickle('users\folder\experiment1_meta.pkl')
data = pd.read_pickle('users\folder\experiment1_data.pkl')

# to initialise a behavpy object, just call the class with the data and meta as arguments. Have check = True to ensure the ids match between metadata and data.
# As of version 1.3.0 you can choose the colour palette for the plotly plots - see for the choices
# The default is 'Safe' (the one used before), but this example we'll use 'Vivid'
df = etho.behavpy(data = data, meta = meta, colour = 'Vivid', check = True)

Using behavpy with non-ethoscope data

The behavpy class is made to work with the ethoscope system, utilising the data structure it records to create the analytics pipeline you'll see next. However, you can still use it on non-ethoscope data if you follow the same structure.

Data sources:
You will need the metadata file as discussed prior, however you will need to manually create a column called id that contains a unique id per specimen in the experiment.
Additionally, you will need a data source where each row is a log of a time per specimen. Each row must have the following

The above columns are necessary for all the methods to work, but feel free to add other columns with extra information per timestamp. Both these data sources must be converted to pandas DataFrames, which can then be used to create a behavpy class object as shown above.


Basic methods

Behavpy has lots of built in methods to manipulate your data. The next few sections will walk you through a basic methods to manipulate your data before analysis.

Filtering by the metadata

One of the core methods of behavpy. This method creates a new behavpy object that only contains specimens whose metadata matches your inputted list. Use this to separate out your data by experimental conditions for further analysis.

# filter your dataset by variables in the metadata wtih .xmv()
# the first argument is the column in the metadata
# the second can be the variables in a list or as subsequent arguments

df = df.xmv('species', ['D.vir', 'D.ere', 'D.wil', 'D.sec', 'D.yak', 'D.sims'])
# or
df = df.xmv('species', 'D.vir', 'D.ere', 'D.wil', 'D.sec', 'D.yak', 'D.sims')

# the new data frame will only contain data from specimens with the selected variables
Removing by the metadata

The inverse of .xmv(). Remove from both the data and metadata any experimental groups you don't want. This method can be called also on individual specimens by specifying their id and their unique identifier.

# remove specimens from your dataset by the metadata with .remove()
# remove acts like the opposite of .xmv()

df = df.remove('species', ['D.vir', 'D.ere', 'D.wil', 'D.sec', 'D.yak', 'D.sims'])
# or
df = df.remove('species', 'D.vir', 'D.ere', 'D.wil', 'D.sec', 'D.yak', 'D.sims')

# both .xmv() and .remove() can filter/remove by the unique id if the first argument = 'id'
df = df.remove('id', '2019-08-02_14-21-23_021d6b|01')
Filtering by time

Often you will want to remove from the analyses the very start of the experiments, when the data isn't as clean because animals are habituating to their new environment. Or perhaps you'll want to just look at the baseline data before something occurs. Use .t_filter() to filter the dataset between two time points.

# filter you dataset by time with .t_filter()
# the arguments take time in hours
# the data is assumed to represented in seconds

df = df.t_filter(start_time =  24, end_time = 48)

# Note: the default column for time is 't', to change use the parameter t_column

Concatenate allows you to join two or more behavpy data tables together, joining both the data and metadata of each table. The two tables do not need to have identical columns: where there's a mismatch the column values will be replaced with NaNs.

# An example of concatenate using .xmv() to create seperate data tables
df1 = df.xmv('species', 'D.vir')
df2 = df.xmv('species', 'D.sec')
df3 = df.xmv('species', 'D.ere')

# a behapvy wrapper to expand the pandas function to concat the metadata
new_df = df1.concat(df2)
# .concat() can process multiple data frames
new_df = df1.concat(df2, df3)
Analyse a single column

Sometimes you want to get summary statistics of a single column per specimen. This is where you can use .analyse_column(). The method will take all the values in your desired column per specimen and apply a summary statistic. You can choose from a basic selection, e.g. mean, median, sum. But you can also use your own function if you wish (the function must work on array data and return a single output).

# Pivot the data frame by 'id' to find summary statistics of a selected columns
# Example summary statistics: 'mean', 'max', 'sum', 'median'...

pivot_df = df.analyse_column('interactions', 'sum')

2019-08-02_14-21-23_021d6b|01                 0
2019-08-02_14-21-23_021d6b|02                 43
2019-08-02_14-21-23_021d6b|03                 24
2019-08-02_14-21-23_021d6b|04                 15
2020-08-07_12-23-10_172d50|18                 45
2020-08-07_12-23-10_172d50|19                 32
2020-08-07_12-23-10_172d50|20                 43

# the output column will be a string combination of the column and summary statistic
# each row is a single specimen

Sometimes you will create an output from the pivot table or just a have column you want to add to the metadata for use with other methods. The column to be added must be a pandas series of matching length to the metadata and with the same specimen IDs.

# you can add these pivoted data frames or any data frames with one row per specimen to the metadata with .rejoin()
# the joining dataframe must have an index 'id' column that matches the metadata

df = df.rejoin(pivot_df)
Binning time

Sometimes you'll want to aggregate over a larger time to ensure you have consistent readings per time points. For example, the ethoscope can record several readings per second, however sometimes tracking of a fly will be lost for short time. Binning the time to 60 means you'll smooth over these gaps.
However, this will just be done for 1 variable so will only be useful in specific analysis. If you want this applied across all variables remember to set it as your time window length in your loading functions.

# Sort the data into bins of time with a single column to summarise the bin

# bin time into groups of 60 seconds with 'moving' the aggregated column of choice
# default aggregating function is the mean
bin_df = df.bin_time('moving', 60)

                                t_bin  moving_mean
2019-08-02_14-21-23_021d6b|01   86400          0.75
2019-08-02_14-21-23_021d6b|01   86460          0.5
2019-08-02_14-21-23_021d6b|01   86520          0.0
2019-08-02_14-21-23_021d6b|01   86580          0.0
2019-08-02_14-21-23_021d6b|01   86640          0.0
...                               ...          ...
2020-08-07_12-23-10_172d50|19  431760          1.0
2020-08-07_12-23-10_172d50|19  431820          0.75
2020-08-07_12-23-10_172d50|19  431880          0.5
2020-08-07_12-23-10_172d50|19  431940          0.25
2020-08-07_12-23-10_172d50|20  215760          1.0

# the column containg the time and the aggregating function can be changed

bin_df = df.bin_time('moving', 60, t_column = 'time', function = 'max')

                                time_bin  moving_max
2019-08-02_14-21-23_021d6b|01   86400          1.0
2019-08-02_14-21-23_021d6b|01   86460          1.0
2019-08-02_14-21-23_021d6b|01   86520          0.0
2019-08-02_14-21-23_021d6b|01   86580          0.0
2019-08-02_14-21-23_021d6b|01   86640          0.0
...                               ...          ...
2020-08-07_12-23-10_172d50|19  431760          1.0
2020-08-07_12-23-10_172d50|19  431820          1.0
2020-08-07_12-23-10_172d50|19  431880          1.0
2020-08-07_12-23-10_172d50|19  431940          1.0
2020-08-07_12-23-10_172d50|20  215760          1.0
Wrap time

The time in the ethoscope data is measured in seconds, however these numbers can get very large and don't look great when plotting data or showing others. Use this method to change the time column values to be a decimal of a given time period, the default is the normal day (24) and will change time to be in hours from reference hour or experiment start.

# Change the time column to be a decimal of a given time period, e.g. 24 hours
# wrap can be performed inplace and will not return a new behavpy
df.wrap_time(24, inplace = True)
# however if you want to create a new dataframe leave inplace False
new_df = df.wrap_time(24)
Remove specimens with low data points

Sometimes you'll run an experiment and have a few specimens that were tracked poorly or just have fewer data points than the rest. This can be really affect some analysis, so it's best to remove it. 
Specify the minimum number of data points you want per specimen, any lower and they'll be removed from the metadata and data. Remember the minimum points per a single day will change with the frequency of your measurements.

# removes specimens from both the metadata and data when they have fewer data points than the user specified amount 

# 1440 is 86400 / 60. So the amount of data points needed for 1 whole day if the data points are measured every minute

new_df = df.curate(points = 1440)
Remove specimens that aren't sleep deprived enough

In the Gilestro lab we'll sleep deprive flies to test their response. Sometimes the method won't work and you'll a few flies mixed in that have slept normally. Call this method to remove all flies that have been asleep for more than a certain percentage over a given time period. This method can return two difference outputs depending on the argument for the remove parameter. If it's a integer between 0 and 1 then any specimen with more than that fraction as asleep will be removed. If left as False then a pandas data frame is returned with the sleep fraction per specimen. 

# Here we are removing specimens that have slept for more than 20% of the time between the period of 24 and 48 hours.
dfn = df.remove_sleep_deprived(start_time = 24, end_time = 48, remove = 0.2, sleep_column = 'asleep', t_column = 't')

# Here we will return the sleep fraction per specimen
df_sleep_fraction = df.remove_sleep_deprived(start_time = 24, end_time = 48, sleep_column = 'asleep', t_column = 't')
Interpolate missing results

Sometimes you'll have missing data points, which is not usually too big of a problem. However, sometimes you'll need to do some analysis that requires regularly measured data. Use the .interpolate() method to set a recording frequency and interpolate any missing points from the surrounding data. Interpolate is a wrapper for the scipy interpolate function.

# Set the varible you want to interpolate and sampling frequency you'd like (step_size)

# step size is given in seconds. Below would interpolate the data for every 5 mins from the min time to max time
new_df = df.interpolate(variable = 'x', step_size = 300)

Not all experiments are run at the same time and you'll often have differing number of days before an interaction (such as sleep deprivation) occurs. To have all the data aligned so the interaction day is the same include in your metadata .csv file a column called baseline. Within this column, write the number of additional days that needs to be added to align to the longest set of baseline experiments.

# add addtional time to specimens time column to make specific interaction times line up when the baseline time is not consistent
# the metadata must contain a a baseline column with an integer from 0 - infinity

df = df.baseline(column = 'baseline')

# perform the operation inplace with the inplace parameter 
Add day numbers and phase

Add new columns to the data, one called phase will state whether it's light or dark given your reference hour and a normal circadian rhythm (12:12). However, if you're working with different circadian hours you can specify the time it turns dark.

# Add a column with the a number which indicates which day of the experiment the row occured on
# Also add a column with the phase of day (light, dark) to the data
# This method is performed in place and won't return anything. 
# However you can make it return a new dataframe with the inplace = False
df.add_day_phase(t_column = 't') # default parameter for t_column is 't'

# if you're running circadian experiments you can change the length of the days the experiment is running 
# as well as the time the lights turn off, see below.

# Here the experiments had days of 30 hours long, with the lights turning off at ZT 15 hours.
# Also we changed inplace to False to return a modified behavpy, rather than modify it in place.
df = df.add_day_phase(day_length = 30, lights_off = 15, inplace = False)
Estimate Feeding

If you're using the ethoscope we can approximate the amount of time feeding by labelling micro-movements near the end of he tube with food in it as feeding times. This method relies upon your data having a micro column which should be generated if you load the data with the motion_detector or sleep_annotation loading function.  

This method will return a new behavpy object that has an additional column called 'feeding' with a boolean label (True/False). The subsequent new column can then be plotted as is shown on the next page.

# You need to declare if the food is positioned on the outside or inside so the method knows which end to look at
new_df = df.feeding(food_position = 'outside', dist_from_food = 0.05, micro_mov = 'micro', x_position = 'x') # micro_mov and x_position are the column names and defaults
# The default for distance from food is 0.05, which is a hedged estimate. Try looking at the spread of the x position to get a better idea what the number should be for your data
Automatically remove dead animals

Sometimes the specimen dies or the tracking is lost. This method will remove all data of the specimen after they've stopped moving for a considerable length of time.

# a method to remove specimens that havent moved for certain amount of time
# only data past the point deemed dead will be removed per specimen
new_df = df.curate_dead_animals()

# Below are the standard numbers and their variable names the function uses to remove dead animals:
# time_window = 24: The window in which death is defined, set to 24 houurs or 1 day
# prop_immobile = 0.01: The proportion of immobility that counts as "dead" during the time window
# resoltion = 24: How much the scanning window overlap, expressed as a factor
Find lengths of bouts of sleep
# break down a specimens sleep into bout duration and type

bout_df = df.sleep_bout_analysis()

                               duration  asleep         t
2019-08-02_14-21-23_021d6b|01      60.0    True   86400.0
2019-08-02_14-21-23_021d6b|01     900.0   False   86460.0
...                                 ...     ...       ...
2020-08-07_12-23-10_172d50|05     240.0    True  430980.0
2020-08-07_12-23-10_172d50|05     120.0   False  431760.0
2020-08-07_12-23-10_172d50|05      60.0    True  431880.0

# have the data returned in a format ready to be made into a histogram
hist_df = df.sleep_bout_analysis(as_hist = True, max_bins = 30, bin_size = 1, time_immobile = 5, asleep = True)

                               bins  count      prob
2019-08-02_14-21-23_021d6b|01    60      0  0.000000
2019-08-02_14-21-23_021d6b|01   120    179  0.400447
2019-08-02_14-21-23_021d6b|01   180     92  0.205817
...                             ...    ...       ...
2020-08-07_12-23-10_172d50|05  1620      1  0.002427
2020-08-07_12-23-10_172d50|05  1680      0  0.000000
2020-08-07_12-23-10_172d50|05  1740      0  0.000000

# max bins is the largest bout you want to include
# bin_size is the what length runs together, i.e. 5 would find all bouts between factors of 5 minutes
# time_immobile is the time in minutes sleep was defined as prior. This removes anything that is small than this as produced by error previously.
# if alseep is True (the default) the return data frame will be for asleep bouts, change to False for one for awake bouts
Plotting a histogram of sleep_bout_analysis
# You can take the output from above and create your own histograms, or you can use this handy method to plot a historgram with error bars from across your specimens
# Like all functions you can facet by your metadata
# Here we'll compare two of the species and group the bouts into periods of 5 minutes, with up to 12 of them (1 hour)
# See the next page for more information about plots

fig = df.plot_sleep_bouts(
    sleep_column = 'asleep',
    facet_col = 'species',
    facet_arg = ['D.vir', 'D.ere'],
    facet_labels = ['D.virilis', 'D.erecta'],
    bin_size = 5, 
    max_bins = 12


Find bouts of sleep
# If you haven't already analysed the dataset to find periods of sleep,0
# but you do have a column containing the movement as True/False. 
# Call this method to find contiguous bouts of sleep according to a minimum length

new_df = df.sleep_contiguous(time_window_length = 60, min_time_immobile = 300)   
Sleep download functions as methods
# some of the download functions mentioned previously can be called as methods if the data wasn't previously analysed
# dont't call this method if your data was already analysed! 
# If it's already analysed it will be missing columns needed for this method
new_df = df.motion_detector(time_window_length = 60)


Visualising your data

Once the behavpy object is created, the print function  will just show your data structure. If you want to see your data and the metadata at once, use the built in method .display()

# first load your data and create a behavpy instance of it



You can also get quick summary statistics of your dataset with .summary() 


# an example output of df.summary()
behavpy table with:
    individuals       675
   metavariable         9
      variables        13
   measurements   3370075
# add the argument detailed = True to get information per fly
df.summary(detailed = True)

                               data_points          time_range
2019-08-02_14-21-23_021d6b|01         5756   86400  ->  431940
2019-08-02_14-21-23_021d6b|02         5481   86400  ->  431940

Be careful with the pandas method .groupby()  as this will return a pandas object back and not a behavpy object. Most other common pandas actions will return a behavpy object.

Visualising your data

Whilst summary statistics are good for a basic overview, visualising the variable of interest over time is usually a lot more informative. 


The first port of call when looking at time series data is to create a heatmap to see if there are any obvious irregularities in your experiments.

# To create a heatmap all you need to write is one line of code!
# All plot methods will return the figure, the usual etiquette is to save the variable as fig

fig = df.heatmap('moving') # enter as a string which ever numerical variable you want plotted inside the brackets

# Then all you need to do is the below to generate the figure