Glider Data

There are many steps and flavors to glider data processing, from the base processing (binary glider to NetCDF files for Slocum gliders), to QA/QC, to developing additional data products. This page outlines the ESD glider lab’s data processing workflow, and in particular base processing and data products.

Users interested in understanding and accessing data products created for each deployment should see the Data Products section. For acoustic or image processing workflows, see the acoustics and imagery pages, respectively.

Background

Historically, the ESD glider team has processed glider data using the Matlab toolbox SOCIB. However, this toolbox is not actively maintained, and the majority of ESD processing efforts have moved to Google Cloud (GCP), an environment where Matlab is difficult to run. Subsequent efforts involved developing amlr-gliders, a Python toolbox and scripts that were primarily wrappers around gdm. These efforts never caught full traction.

Currently, the ESD glider lab uses the esdglider Python package, which can be found in the glider-utils repo, to do its base glider data processing. esdglider primarily consists of ESD-specific wrappers around existing toolboxes such as PyGlider, GliderTools, and dbdreader. All efforts are geared towards processing ESD glider data in ESD’s Google Cloud project, using an Open Science approach.

Terminology

Common terminology used in glider data processing:

Profile, and other sampling terminology
NetCDF
Inflection: when a glider is transitioning from a dive to a climb, or from a climb to a dive

Base Processing

The base glider data processing, also referred to as “Level 1” or “L1” processing, is primarily done using PyGlider, hereafter simply pyglider. Pyglider creates creates timeseries and gridded CF-compliant NetCDF files, which can be both used internally in further workflows, as well as made publicly available through an ERDDAP (e.g., via the NGDAC).

Data Products

The esdglider package uses pyglider to create several different data products as part of the base processing, which are described here. All output files begin with the deployment name (‘glider-YYYYmmdd’), and include the data processing ‘mode’ (i.e., delayed or rt).

The timeseries data products are all NetCDF files with a single coordinate “time”. The gridded data products are all NetCDF files with two coordinates: “depth” and “profile” (i.e., profile_index). The one exception is a ‘profiles’ CSV, which contains the start/end times and depths of the different profile segments.

For descriptions of the various standard plots, see Plots below.

Primary Products

These are the products that most science end-users will be interested in.

{glider-YYYYmmdd}-{mode}-sci.nc: The ‘science’ timeseries. Most users will want this dataset, as it contains the values from all of the various science sensors. Functionally speaking, this timeseries has one ‘row’ of data for each timestamp from the glider’s science computer where a CTD temperature value was recorded. All included sensor values are interpolated to all timestamps, as long as there is not a gap of 300 or more seconds. The sensor values included in this dataset depend on the instruments carried by the glider; … (Todo: add a section that lists the sensor values for each instrument). Users should note that the ‘depth’ value in this timeseries is the depth calculated from the CTD pressure sensor.
{glider-YYYYmmdd}_grid-{mode}-5m.nc: The science timeseries, gridded into five meter bins. This NetCDF file will include all of the science sensor values that were in the science timeseries. Users should note that the depth bin name is the bin midpoint. For instance, the bin for data from zero to five meters has the name/label “2.5”.
{glider-YYYYmmdd}-imagery-metadata.csv: A CSV file with one row for each image, and one column for each desired sensor value. The sensor values are interpolated to each image timestamp. This file is only created if the glider is carrying a camera, such as a shadowgraph or glidercam. Note that this file lives in the raw glider imagery bucket (e.g. here)

Other Products

{glider-YYYYmmdd}-{mode}-eng.nc: The ‘engineering’ timeseries. Functionally speaking, this timeseries has one ‘row’ of data for each timestamp from the glider’s flight computer where an m_depth value was recorded. All included sensor values are interpolated to all timestamps, as long as there is not a gap of 300 or more seconds. In addition to m_depth, it contains sensor values such as m_roll, m_pitch, and m_gps_lon. See the full list of sensor values here. Users should note that the ‘depth’ value in this timeseries is from m_depth, meaning the measured depth from the glider.
{glider-YYYYmmdd}-{mode}-raw.nc: The ‘raw’ timeseries. This timeseries contains a timestamp for every time recorded by the glider’s flight or science computer, and no interpolation is done for any sensor values. These two features make this dataset useful for a) calculating profiles and b) troubleshooting any sensor values or processing steps. This dataset contains both the measured depth (“depth”) and the depth calculated from the CTD pressure sensor (“depth_ctd”).
{glider-YYYYmmdd}_grid-{mode}-1m.nc: The science timeseries, gridded into one meter bins. This NetCDF file will include all of the science sensor values that were in the science timeseries. Users should note that the depth bin name is the bin midpoint. For instance, the bin for data from zero to one meters has the name/label “0.5”.
{glider-YYYYmmdd}-{mode}-profiles.csv: A CSV file with one row for each profile, containing information such as: profile index, profile start and end time, profile start and end depth, and profile direction. Note that this file contains “#.5” profiles, meaning time between profiles when the glider is at the surface or inflecting at depth.
Acoustic metadata files: Files needed for processing acoustic data with Echoview. These files are described on the acoustics page. These files are only created if the glider is carrying an acoustic instrument, such as a Nortek or AZFP.

Timeseries

See data products for a description of the timeseries output files. Current workflow (also shown above):

Upload binary and cache files into Deployment GCS bucket, as specified on the data management page
Run this script locally to create a deployment YAML file. This script pulls all instrument information from the Glider Database. A user needs to then check this script by hand, in particular the people, devices, comment, and summary blocks, and then commit to the glider-lab repo.
Create a deployment-specific script, and run it in a GCP Workbench Instance.
Access processed NetCDF files via GCP for additional internal workflows
(in development) serve/archive data with IOOS/NCEI. Create acoustic data products. Create imagery data products.

Processing Notes

This section describes in-the-weeds choices made by the code when processing binary glider files. This is likely only useful reference for esdglider developers, or folks using pyglider/dbdreader for another purpose.

The pyglider.slocum.binary_to_timeseries function uses dbdreader to read slocum glider data from binary files. More discussion can be found here around why pyglider switched to using dbdreader, and how pyglider processing worked before.

As constructed, binary_to_timeseries uses dbdreader.get_sync to extract values for all of the specified sensors. Thus, the user specifies a particular sensor to server as the ‘time_base’, and then all other desired variables (across science and engineering: pressure, temperature, oxygen, roll, etc.) are interpolated onto the same time base (i.e., the timestamps where this ‘time_base’ variable has a valid (not missing/nan) value). The only exception is that no interpolation is done if the gap between time points is greater than or equal to 300 seconds.

In ESD, we use binary_to_timeseries to create a ‘science’ timeseries, with the pressure from the CTD as the time base, and an ‘engineering’ timeseries, with the glider’s measured depth as the time base. We also have written a binary_to_raw function, which uses dbdreader.get to read in raw (i.e., uninterpolated) data points from both the science and flight computers. Thus ‘raw’ timeseries contains all glider data points, and is thus used to calculate profiles, and useful for debugging sensor behavior.

Data alterations, or additional features of note:

esdglider:

min_dt: esdglider allows users to pass a minimum datetime, which is used to filter all glider data for timestamps after this datetime. The datetime is usually when the glider begins its first 1k_n.mi mission. Because pressure is used as the time base for the science dataset, rows in science where pressure is nan are dropped (even if other sensors have non-nan values)

dbdreader:

dbdreader throws an warning if a sensor is turned off and thus not present in some sbd/tbd or dbd/ebd files. See this issue for more discussion.
dbdreader by default skips the first line of each binary file. The reasoning is that “this line contains either nonsense values, or values that were read a long time ago. This behavior can be changed.” See this issue for more discussion.
dbdreader only identifies sensors as ‘engineering’ or ‘science’. Thus, when extracting e.g. data for the sensor ‘sci_oxy4_oxygen’, dbdreader uses the ‘sci_m_present_time’ as the timestamp, rather than ‘sci_oxy4_present_time’.

pyglider:

Latitude and longitude values are set to nan if their absolute value is greater than 90 and 180, respectively.
Any values of zero from science sensors are converted to nan.
No other data alterations/qc are made (e.g., CTD data is all read and left as-is).

Gridded

Docs in progress. Gridding uses the pyglider.ncprocess.make_gridfiles function.

NGDAC Profiles

Docs in progress

Plots

Standard plots are generated from either the ‘processed-rt’ (subfolder ‘rt’) or ‘processed-L1’ data (subfolder ‘delayed’). Note that ‘sci’ is short for plots of science sensor values, while ‘eng’ is short for plots of engineering variables. Standard plots, which are separated by folder, include:

TS-sci: Temperature/salinity plots of the various science sensor values
maps-sci: Maps of surface values (i.e., depth <10m) from various science sensors
pointMaps: Scatter plots of the lat/lon points recorded by the glider
spatialGrids-sci: Gridded science data plotted by all combinations of latitude, longitude, and depth
spatialSections-sci: Gridded science data plotted by either latitude or longitude, and depth
thisVsThat-eng: Timeseries plots of various engineering sensor values plotted against each other
timeSections-sci: Gridded science data plotted by time and depth
timeSections-sci-gt: Gridded science data plotted by profile_index and depth. The plots are made with the glidertools package, and thus use the 0.5 and 99.5 percentile to set color limits.
timeSeries-eng: Timeseries plots of various engineering sensor values
timeSeries-sci: Timeseries plots of various science sensor values

Other Files

Docs in progress

Real-Time Data

Currently, all ESD glider data processing is delayed-mode processing. The vision is to have GCP infrastructure in place to:

Periodically rsync real-time data (sbd/tbd files) from the SFMC to GCP
Run Caleb’s slocumRtDataVisTool to create plots and statistics useful for real-time piloting decisions
Run the base processing steps on these data, and automatically send the processed files to the NGDAC

Future Directions

QC + data cleaning. Potential resources:

Purpose: to create cleaned netcdf files with qc flags
Implement IOOS QARTOD tests, eg using ioos_qc to add qc flags
https://github.com/OceanGlidersCommunity/Realtime-QC
- https://github.com/castelao/CoTeDe
other qc/data cleaning? For instance, sanity check ts plots. Likely will involve by-hand inspection for each deployment, including removing bad data if discovered

GliderTools:

manuscript describing the GliderTools toolbox
GliderTools contains tools for processing Seaglider basestation files. However, the rest of the tools simply require that the data be in an xarray dataset.
optics, pq: quenching correction method described by Thomalla et al. (2018)
additional qc tools?
calculate cool physics things (mixed layer depth, …)
leverage gridded plotting routines

GitHub Repos

See the home page for ESD-developed repos. External GitHub repos that are particularly relevant or useful:

repo link	description
dbdreader	Extract data from slocum binary files
pyglider	Convert datafiles from slocum and seaexplorer into netcdf
GliderTools	Quality control and plot generic glider data
IOOS qc	Apply IOOS QARTOD and other qc routines
SOCIB	Process glider data in Matlab (not actively maintained)
glider tools list	OceanGliders community repository to list tools for processing glider data. Includes many of the tools listed above.