Glider Data Processing

Background

There are many steps and flavors to glider data processing, from base processing (converting binary glider files to NetCDF files, for Slocum gliders), to QA/QC, to developing products from additional glider sensors.

Historically, the ESD glider team processed glider data using the SOCIB Matlab toolbox. However, this toolbox is not actively maintained, and the majority of ESD processing efforts have moved to Google Cloud (GCP). Subsequent efforts involved developing amlr-gliders, which contained a Python toolbox and scripts that were medium-weight wrappers around gdm. These efforts never gained full traction.

Current efforts involve both leveraging existing toolboxes (PyGlider, GliderTools, dbdreader) and developing new Python code and wrapper scripts that live in glider-utils. All of these efforts are geared towards processing ESD glider data in ESD’s Google Cloud project, using an Open Science approach.

This page describes these current efforts and outlines planned areas of future development. For acoustic or image processing workflows, see the acoustics and imagery pages, respectively.

Todo: create and add flow chart, a la here

Base Processing

The base glider data processing is primarily done using PyGlider, hereafter simply ‘pyglider’. pyglider takes in binary data files from Teledyne Webb Slocum gliders and creates CF-compliant NetCDF files. These NetCDF files will both be used internally in further workflows and sent to the NGDAC.

Level 1 - Timeseries

Workflow:

  1. Binary and cache files from the glider are properly archived in the GCS bucket (see here - link todo)
  2. Create the deployment YAML file (example). Currently this is done by hand; ideally, in the future, this will be done by code pulling from the Glider & Mooring database
  3. Run pyglider and create output NetCDF files using the script binary_to_nc.py, run via binary_to_nc.sh on a GCP VM (see the sketch after this list)
  4. Access the processed NetCDF files via GCP for additional internal workflows
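At its core, this workflow wraps pyglider’s Slocum reader. The sketch below shows, under assumptions, the kind of call that binary_to_nc.py is presumed to wrap; the directory paths, deployment YAML name, and argument values are placeholders, and the exact arguments should be checked against the script and the pyglider docs.

```python
# Minimal sketch of the pyglider call assumed to underlie binary_to_nc.py.
# Paths, the deployment YAML name, and argument values are placeholders.
import pyglider.slocum as slocum

binarydir = "deployment/binary/"        # dbd/ebd (or sbd/tbd) files from the glider
cachedir = "deployment/cache/"          # sensor-list cache (.cac) files
l1tsdir = "deployment/L1-timeseries/"   # output directory for the L1 timeseries NetCDF
deploymentyaml = "deployment.yml"       # hand-built deployment YAML (see step 2)

# Convert the binary files into a single CF-compliant timeseries NetCDF file.
outname = slocum.binary_to_timeseries(
    binarydir, cachedir, l1tsdir, deploymentyaml,
    search="*.[D|E]BD",              # which binary file pairs to read
    time_base="sci_water_temp",      # other variables are interpolated onto this sensor's timestamps
    profile_filt_time=100,           # profile-detection smoothing (s)
    profile_min_time=300,            # minimum profile duration (s)
)
print(outname)
```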

Output NetCDF File

Todo: Notes about the structure of and variables in the output NetCDF file. This will depend on CR answers to processing questions (link to issue todo).

Processing Notes

This section describes the default pyglider/dbdreader behavior when processing binary glider files. It may be a useful reference for anyone using pyglider or dbdreader for any purpose.

The pyglider.slocum.binary_to_timeseries function uses dbdreader to read Slocum glider data from binary files. More discussion of why pyglider switched to using dbdreader, and of how pyglider processing worked before, can be found here.

As constructed, binary_to_timeseries uses dbdreader’s get_sync to extract values for all of the specified sensors. The user specifies a particular sensor to serve as the ‘time_base’, and all other desired variables (across science and engineering: pressure, temperature, oxygen, roll, etc.) are interpolated onto that time base (i.e., the timestamps where the ‘time_base’ variable has a valid, non-missing/non-nan value). This makes life difficult if different sensors have different sampling strategies, and does not allow users to extract non-interpolated data. Adding a ‘union’ behavior option is currently under discussion in this issue and this pull request. For now, binary_to_nc.py generates several datasets and merges them together so as to have all variables and relevant timestamps in one file.
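To illustrate the get_sync behavior described above, here is a minimal sketch using hypothetical file paths and sensors; the exact MultiDBD/get_sync arguments and return layout should be checked against the dbdreader docs for the installed version.

```python
# Sketch of dbdreader's get_sync interpolation behavior (hypothetical paths/sensors).
import dbdreader

# Open a set of paired dbd/ebd files, pointing at the sensor-cache directory.
dbd = dbdreader.MultiDBD(pattern="deployment/binary/*.[de]bd",
                         cacheDir="deployment/cache/")

# The first parameter acts as the time base; the remaining parameters are
# interpolated onto the timestamps where sci_water_temp has valid values.
t, temp, pres, oxy = dbd.get_sync("sci_water_temp",
                                  "sci_water_pressure",
                                  "sci_oxy4_oxygen")
dbd.close()
```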

Data alterations, or additional features of note:

  • dbdreader:
    • dbdreader throws an error if a sensor is turned off and thus not present in some sbd/tbd or dbd/ebd files; for example, if the oxygen sensor is turned off halfway through a deployment, although this applies to any sensor that is turned off. However, it looks like this behavior may be fixed soon.
    • dbdreader by default skips the first line of each binary file; the reasoning is that “this line contains either nonsense values, or values that were read a long time ago.” This behavior can be changed; see here for more discussion.
    • dbdreader only identifies sensors as ‘engineering’ or ‘science’. Thus, when extracting data for, e.g., the sensor ‘sci_oxy4_oxygen’, dbdreader uses ‘sci_m_present_time’ as the timestamp, rather than ‘sci_oxy4_present_time’.
  • pyglider:
    • Latitude and longitude values are set to nan if their absolute values are greater than 90 and 180, respectively.
    • Any values of zero from science sensors are converted to nan (see the sketch after this list).
    • No other data alterations are made (e.g., CTD data are all read and left as-is).
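For reference, the sketch below reproduces these two filtering behaviors in plain numpy/xarray; it illustrates the behavior described above rather than pyglider’s actual code, and the file and variable names are assumptions.

```python
# Illustration of the pyglider filtering behaviors described above
# (not pyglider's actual code; file and variable names are assumptions).
import numpy as np
import xarray as xr

ds = xr.open_dataset("deployment-timeseries.nc")  # hypothetical L1 timeseries file

# Latitude/longitude with absolute values beyond 90/180 are treated as invalid.
ds["latitude"] = ds["latitude"].where(np.abs(ds["latitude"]) <= 90)
ds["longitude"] = ds["longitude"].where(np.abs(ds["longitude"]) <= 180)

# Values of exactly zero from science sensors are treated as missing.
for var in ["temperature", "conductivity", "oxygen_concentration"]:
    ds[var] = ds[var].where(ds[var] != 0)
```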

Level 2 - Gridded

pyglider has a function ncprocess.make_gridfiles for making gridded data. This function takes the above timeseries file as input.
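A minimal usage sketch, with placeholder paths, YAML name, and vertical bin size (check the pyglider docs for the current argument list):

```python
# Hypothetical sketch of gridding the L1 timeseries with pyglider
# (paths, YAML name, and vertical bin size are placeholders).
import pyglider.ncprocess as ncprocess

outname = ncprocess.make_gridfiles(
    "deployment/L1-timeseries/deployment-timeseries.nc",  # L1 timeseries file
    "deployment/L2-gridded/",                             # output directory
    "deployment.yml",                                      # deployment YAML
    dz=1,                                                  # vertical bin size (m)
)
```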

More docs forthcoming.

IOOS NGDAC

pyglider has a function ncprocess.extract_timeseries_profiles for creating profile files for submission to the NGDAC. This function takes the above timeseries file as input.
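A minimal usage sketch, with placeholder paths and YAML name (check the pyglider docs for the current argument list):

```python
# Hypothetical sketch of extracting NGDAC-style profile files from the L1 timeseries
# (paths and YAML name are placeholders).
import pyglider.ncprocess as ncprocess

ncprocess.extract_timeseries_profiles(
    "deployment/L1-timeseries/deployment-timeseries.nc",  # L1 timeseries file
    "deployment/ngdac-profiles/",                          # one NetCDF file per profile
    "deployment.yml",                                      # deployment YAML
)
```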

More docs forthcoming.

Other Files

ESD’s base processing also creates several other files necessary for processing or using data from other sensors on the glider. If the glider is carrying a shadowgraph or glidercam camera system, then a CSV file is created that links each image with the relevant glider measurements at that time (depth, temperature, oxygen concentration, etc.). If the glider is carrying an acoustic instrument, specific files are needed to process these data using Echoview. These files are created during the base processing step as well.
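One common way to build such an image-to-sensor CSV is a nearest-timestamp merge. The sketch below is a generic, hypothetical approach (file names and column names are assumptions), not necessarily the exact method used in ESD’s base processing.

```python
# Hypothetical sketch of linking camera images to glider measurements by timestamp.
# File names and column names are assumptions, not ESD's actual implementation.
import pandas as pd
import xarray as xr

# Image timestamps, e.g. parsed from image file names (hypothetical CSV).
images = pd.read_csv("image-timestamps.csv", parse_dates=["time"]).sort_values("time")

# Glider measurements from the L1 timeseries file.
ds = xr.open_dataset("deployment-timeseries.nc")
glider = ds[["depth", "temperature", "oxygen_concentration"]].to_dataframe().reset_index()
glider = glider.sort_values("time")

# For each image, take the glider record closest in time (within a tolerance).
matched = pd.merge_asof(images, glider, on="time",
                        direction="nearest", tolerance=pd.Timedelta("60s"))
matched.to_csv("images-with-sensor-data.csv", index=False)
```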

Real-Time Data

TBD. The vision is to have GCP infrastructure in place to:

  • Periodically rsync real-time data (sbd/tbd files) from the SFMC to a GCP VM
  • Run the base processing steps on these data (see the sketch after this list)
  • Automatically send the processed files to the NGDAC
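As a rough illustration only, one such periodic job could look like the sketch below; the SFMC host, paths, and script invocation are entirely hypothetical placeholders for infrastructure that does not yet exist.

```python
# Entirely hypothetical sketch of a periodic real-time job; the SFMC host,
# paths, and script invocation below are placeholders for infrastructure that is TBD.
import subprocess

SFMC_SOURCE = "user@sfmc.example.org:/path/to/from-glider/"  # placeholder
LOCAL_DIR = "/opt/realtime/binary/"                          # placeholder

def run_realtime_cycle():
    # 1. Pull new sbd/tbd files from the SFMC.
    subprocess.run(["rsync", "-av", SFMC_SOURCE, LOCAL_DIR], check=True)
    # 2. Run the base processing on whatever has arrived so far.
    subprocess.run(["bash", "binary_to_nc.sh"], check=True)
    # 3. Send the resulting profile files to the NGDAC (mechanism TBD).

if __name__ == "__main__":
    run_realtime_cycle()  # e.g., scheduled via cron or Cloud Scheduler
```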

Future Steps

QC + data cleaning -> cleaned NetCDF files with QC flags. Potential resources:

  • Implement IOOS QARTOD tests, e.g. using ioos_qc to add QC flags (see the sketch after this list)
  • https://github.com/OceanGlidersCommunity/Realtime-QC
    • https://github.com/castelao/CoTeDe
  • Other QC/data cleaning? For instance, sanity check ts plots. This will likely involve by-hand inspection for each deployment, including removing bad data if discovered
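As one example of what adding QARTOD flags could look like, here is a minimal ioos_qc sketch; the variable and test spans are placeholders, not ESD-chosen thresholds.

```python
# Minimal sketch of adding a QARTOD flag with ioos_qc; the variable and spans
# below are placeholders, not ESD-chosen thresholds.
import numpy as np
import xarray as xr
from ioos_qc import qartod

ds = xr.open_dataset("deployment-timeseries.nc")  # hypothetical L1 timeseries file

# Gross range test: values outside fail_span are flagged fail (4), values
# outside suspect_span but inside fail_span are suspect (3), otherwise pass (1).
flags = qartod.gross_range_test(
    inp=ds["temperature"].values,
    fail_span=[-2.5, 40.0],     # placeholder physical limits (deg C)
    suspect_span=[-2.0, 30.0],  # placeholder regional limits (deg C)
)

ds["temperature_qartod_gross_range"] = (ds["temperature"].dims, np.asarray(flags))
```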

GliderTools: additional processing functionality

  • manuscript describing the GliderTools toolbox
  • GliderTools contains tools for processing Seaglider basestation files. However, the rest of the tools simply require that the data be in an xarray dataset.
  • optics, pq: quenching correction method described by Thomalla et al. (2018)
  • additional qc tools?
  • calculate cool physics things (mixed layer depth, …); see the sketch after this list
  • leverage gridded plotting routines
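As a taste of the GliderTools interface, below is a heavily hedged sketch of computing mixed layer depth; the function name and arguments should be verified against the GliderTools documentation, and the dataset variable names are assumptions.

```python
# Hedged sketch of a GliderTools calculation (mixed layer depth); verify the
# function name/arguments against the GliderTools docs. Dataset variable names
# are assumptions.
import xarray as xr
import glidertools as gt

ds = xr.open_dataset("deployment-timeseries.nc")  # hypothetical L1 timeseries file

# Mixed layer depth per dive, from a density threshold relative to a reference depth.
mld = gt.physics.mixed_layer_depth(
    ds["dives"],     # dive index for each sample
    ds["depth"],     # sample depth (m)
    ds["density"],   # potential density (kg m-3)
)
```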

GitHub Repos

See the home page for ESD-developed repos. External GitHub repos that are particularly relevant/useful:

repo               description
dbdreader          Extract data from Slocum binary files
pyglider           Convert data files from Slocum and SeaExplorer gliders into NetCDF
GliderTools        Quality control and plot generic glider data
ioos_qc            Apply IOOS QARTOD and other QC routines
SOCIB              Process glider data in Matlab (not actively maintained)
glider tools list  OceanGliders community repository listing tools for processing glider data; includes many of the tools listed above