Data Management

Glider data management is a complex endeavor. Gliders contain many moving parts, and can collect well over a terabyte of data in a single deployment. This page outlines the ESD glider data management efforts, including directory structures and storage locations for different types of data.

This page provides the in-the-weeds data management details. These will likely only be relevant for ESD glider lab members, or members of other glider labs. Users looking for more general information about glider data products should go to the relevant data type pages via the sidebar menu.

Data Management Plan

This section describes what files go where, as well as required directory structures. It replaces the AMLR Glider Data Management Plan Google doc.

For additional reference, these slides (NOAA internal) describe and link to the various homes for different pieces of ESD glider data. Several of these data homes are described in more detail below.

Glider & Mooring Database

The glider and mooring database is a SQL Server database, hosted in a SWFSC-specific GCP project. This database exists to track all of the glider/instrument (and mooring) parts and builds. Each glider deployment is represented as a build, and each build includes links to all of the different parts that were on that glider for the deployment. The rest of this section provides high-level guidance for how to use the Glider & Mooring Database; see this document for more details, including database access instructions for both database users and managers.

Devices

For this database, ‘device’ means any piece of a glider, instrument, mooring, etc., that needs to be tracked. Devices can be created by clicking the ‘Devices’ button from the database home page, and then using the form that pops up. Note that any new device types must be added using the ‘Device Types’ form.

All new devices should be added to the database as soon as they arrive at the SWFSC. Created devices can then be added to a glider build, to track which devices were part of which deployments.

Note: Currently there is no way to track software versions of a ‘device’ in the database. Some of this info is tracked via the Fleet Status page (NOAA internal), but it will be formally added to the database as time permits.

Gliders

A glider build functionally allows ESD to create a collection of devices and files, which they can then associate with a deployment. A deployment build tracks the glider software version, deployment start date and end date, and the number of profiles (yos) performed during the deployment. Glider builds should be created 1) when a glider is first acquired, 2) before a glider is deployed, and 3) when a relatively whole glider is sent in for service. Again, see the database Google doc for more details.

Calibrations

All factory calibration documents are now stored in the glider-lab repo.

Although some records for acoustic calibrations performed in the SWFSC tech tank are in the database, most are currently recorded and tracked by Tony.

Files

All factory calibration documents are now stored in the glider-lab repo.

Other files associated with a deployment (e.g., autoexec and proglet files, or config files associated with certain instruments) are currently stored in the database, but will likely be moved to a more accessible location in the future.

AMLR Gliders Google Drive

The AMLR Gliders Google Drive is the home for ‘Glider prep’ files that we a) want to be able to access from a glider lab computer or b) want multiple people to be able to edit. Currently these are primarily files from the glider build and testing (e.g., tech tank dive data).

Glider Deployments

The Glider Deployments folder is for working files during both deployment prep and the deployment itself. It provides an easy place (i.e., easier than Google Cloud) for ESD glider team members to collaborate on and refer to different files. These files live in ‘GDrive deployment prep folders’; the Glider Deployments folder contains GDrive folder templates.

For new deployments, a template exists in the Templates folder.

The structure of an individual GDrive deployment prep folder:

glider-YYYYmmdd-prep
├── acoustic-prep
├── deployment-files
├── glider-prep
    ├── final-seal-photos
    ├── functional-checkout-data
    ├── tech-tank-data
├── imagery-prep
├── photos
  • acoustic-prep: acoustic prep files from pre-deployment tests. These may include sample data files, relevant tests, etc.
  • deployment-files: a catch-all folder for other files from the deployment
  • glider-prep: Folders and files from the glider prep. In addition to the folders described below, the ‘glider-prep’ folder is the home for any ballast sheet and compass calibration files.
    • final-seal-photos: photos of the final seals of the glider
    • functional-checkout-data: functional checkout sheet, data files from test tank dive, and any other files/notes from the functional checkout
    • tech-tank-data: data files from the tech tank, if relevant
  • imagery-prep: imagery (camera) prep files from pre-deployment tests. These may include sample images, relevant tests, etc.
  • photos: folder for miscellaneous photos. These could include photos of the glider seal, or photos of the glider deployment and recovery.

Miscellaneous files that should stay associated with the deployment, but do not belong in the GCS buckets, can go directly in the GDrive glider-YYYYmmdd-prep folder.
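The folder outline above can be sketched as a small script that builds the same skeleton on a local disk, for instance in a locally synced copy of the Drive. This is only an illustrative sketch: `make_prep_folder` and `PREP_SUBFOLDERS` are hypothetical names, and the existing template in the Templates folder remains the reference.

```python
from pathlib import Path

# Subfolders copied from the directory outline above.
PREP_SUBFOLDERS = [
    "acoustic-prep",
    "deployment-files",
    "glider-prep/final-seal-photos",
    "glider-prep/functional-checkout-data",
    "glider-prep/tech-tank-data",
    "imagery-prep",
    "photos",
]


def make_prep_folder(root, glider, deploy_date):
    """Create '<glider>-<YYYYmmdd>-prep' under root and return its path.

    Hypothetical helper: deploy_date is expected as a 'YYYYmmdd' string.
    """
    prep = Path(root) / f"{glider}-{deploy_date}-prep"
    for sub in PREP_SUBFOLDERS:
        # parents=True also creates the 'glider-prep' parent folder.
        (prep / sub).mkdir(parents=True, exist_ok=True)
    return prep
```

For example, `make_prep_folder(".", "calanus", "20241019")` would create a `calanus-20241019-prep` folder in the current directory.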

glider-lab repository

The glider-lab repository has evolved into a place to store a) reference documents that can be accessible outside NOAA and b) files used for processing in GCP. These include:

  • calibration-docs: factory calibration documents for glider instruments
  • deployment-configs: yaml configuration files for glider data processing
  • deployment-reports: post-deployment reports, created as Quarto documents
  • deployment-scripts: Python scripts for processing the glider data.
  • echoview-glider-calib-files: calibration files used for processing acoustic data using Echoview

See the repo readme for full folder descriptions.

standard-glider-files repository

The standard-glider-files repository is the home for all ESD glider cache files, as well as standard files that are put on ESD gliders before deployment. This repo and these files are also commonly used during ESD lab tests.

GCS Buckets

NOTE: glider data are currently in the process of being moved from the ESD Dev project (ggn-nmfs-usamlr-dev-7b99) to the ESD Prod project (ggn-nmfs-swfscesd-prod-1). Data Plan updates can be found here (NOAA internal). If you cannot find specific data in the prod project, please create an issue or contact Sam Woodman.

All ESD glider data are stored in Google Cloud Storage (GCS) buckets, in the ESD Prod project. Data are split across several different buckets to better utilize GCS bucket options, including permissions (principle of least privilege), storage classes, and object versioning/soft delete. GCS buckets are only accessible to individuals with a noaa.gov email address who have been added to the GCP project as a data viewer. For any data access issues, please contact Sam Woodman.

Data processing code depends on a specific directory structure, and thus it is important that data in GCS buckets follow the format described below. All GCS buckets used for storing glider data follow the same top-level directory structure of ‘YYYY/glider-YYYYmmdd’:

  • YYYY: All glider deployments are grouped by the year in which they were deployed. These year folders are of the format YYYY (e.g., “2024”).
  • glider-YYYYmmdd: Within each year folder are deployment folders. The deployment folders all follow the same naming convention: ‘glider-YYYYmmdd’, where ‘glider’ is the name of the glider and ‘YYYYmmdd’ is the deployment date. For example, the folder name for the glider calanus that was deployed on 19 Oct 2024 is “calanus-20241019”.
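As a minimal sketch of the naming convention above, the following builds and parses the ‘YYYY/glider-YYYYmmdd’ path. The function names are hypothetical and are not part of the ESD processing code; the regular expression assumes glider names consist of lowercase letters and digits, as in the examples on this page.

```python
import re
from datetime import date, datetime

# Assumption: glider names are lowercase letters/digits (e.g. "calanus", "amlr08").
DEPLOYMENT_RE = re.compile(r"^(?P<glider>[a-z0-9]+)-(?P<date>\d{8})$")


def deployment_name(glider, deployed):
    """Return the deployment folder name, e.g. 'calanus-20241019'."""
    return f"{glider}-{deployed:%Y%m%d}"


def deployment_prefix(glider, deployed):
    """Return the top-level bucket path 'YYYY/glider-YYYYmmdd'."""
    return f"{deployed:%Y}/{deployment_name(glider, deployed)}"


def parse_deployment(name):
    """Split a deployment folder name into (glider, deployment date)."""
    match = DEPLOYMENT_RE.match(name)
    if match is None:
        raise ValueError(f"not a glider-YYYYmmdd name: {name!r}")
    deployed = datetime.strptime(match["date"], "%Y%m%d").date()
    return match["glider"], deployed
```

For the example in the list above, `deployment_prefix("calanus", date(2024, 10, 19))` gives `"2024/calanus-20241019"`.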

Each section below corresponds to an individual GCS bucket. They include a brief description, a link to the GCS bucket, and a directory outline followed by text that explains each directory. Consistent principles in the ‘Directory structure’ sections:

  • Several folders contain “delayed” and “rt” subfolders. Delayed mode data were collected or generated after the glider was recovered, while rt (i.e., real-time) data were transmitted or generated while the glider was deployed.
  • File names can be differentiated from folder names by a file extension. For instance, “archive-sfmc” does not have a file extension and thus is a folder name, while “{deployment}-event-timeline.csv” has a “.csv” file extension and thus is a file name.
  • Curly braces, e.g. {…}, indicate parts of a file name that are different across deployments. For instance, {deployment} represents the given deployment name, e.g. “amlr08-20220513”.
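The two naming principles above can be illustrated with a short sketch. Both helpers are hypothetical and only mirror the conventions used in the directory outlines on this page.

```python
from pathlib import PurePosixPath


def is_file_name(entry):
    """True if an outline entry has a file extension, i.e. names a file."""
    return PurePosixPath(entry).suffix != ""


def fill_template(template, deployment):
    """Replace the {deployment} placeholder with a deployment name."""
    # str.replace is used so that other braces in the name are left untouched.
    return template.replace("{deployment}", deployment)
```

For example, `is_file_name("archive-sfmc")` is `False`, while `fill_template("{deployment}-event-timeline.csv", "amlr08-20220513")` gives `"amlr08-20220513-event-timeline.csv"`.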

Deployments

Glider deployment data refers to glider data collected and stored on the glider computers, i.e. not acoustic (active or passive) or imagery data.

NOTE: cache files are stored in standard-glider-files.

Data In

The ‘data in’ bucket contains glider input data files.

Link: esd-glider-deployments-data-in bucket

Directory structure:

├── archive-sfmc
├── binary
    ├── delayed
    ├── rt
  • archive-sfmc: The ‘archive’ folder, synced or downloaded from the SFMC. This folder contains all files sent to the glider during the deployment, with an associated timestamp as part of the file name. Among other things, this folder is used for generating a sensor config table for the post-deployment report.
  • binary: Binary data files generated by the glider. This folder should include dbd/ebd/dcd/ecd (for delayed data) or sbd/tbd/scd/tcd (for rt data) files. Note that there is not a different path for compressed files, meaning that dcd/ecd files, for instance, go directly in the ‘delayed’ folder.
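As a rough sketch of the rule above, a binary file can be routed to the ‘delayed’ or ‘rt’ subfolder from its extension alone. The helper name is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import PurePosixPath

# Extensions listed in the 'binary' description above.
DELAYED_EXTS = {".dbd", ".ebd", ".dcd", ".ecd"}
RT_EXTS = {".sbd", ".tbd", ".scd", ".tcd"}


def binary_subfolder(file_name):
    """Return 'binary/delayed' or 'binary/rt' for a glider binary file."""
    ext = PurePosixPath(file_name).suffix.lower()
    if ext in DELAYED_EXTS:
        # Compressed dcd/ecd files share the folder with dbd/ebd files.
        return "binary/delayed"
    if ext in RT_EXTS:
        return "binary/rt"
    raise ValueError(f"unrecognized glider binary extension: {file_name!r}")
```

Raising an error for any other extension keeps unexpected files out of the ‘binary’ folder rather than guessing where they belong.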

Data Out

The ‘data out’ bucket contains glider output data files. The subfolders in this bucket follow NASA data processing level definitions.

Link: esd-glider-deployments-data-out bucket

Directory structure:

├── ancillary-products
├── plots
    ├── delayed
        ├── TS-sci
        ├── maps-sci
        ├── pointMaps
        ├── spatialGrids-sci
        ├── spatialSections-sci
        ├── thisVsThat-eng
        ├── timeSections-sci
        ├── timeSections-sci-gt
        ├── timeSeries-eng
        ├── timeSeries-sci
    ├── rt
        ├── ... : same subfolders as 'delayed' plots folder
├── processed-L0
├── processed-L1
    ├── ngdac
        ├── delayed
        ├── rt
├── processed-L3
  • ancillary-products: Ancillary data products, as relevant for the deployment. See {todo} for more info.
  • plots: Standard plots generated from the timeseries and gridded glider data. Note that ‘sci’ is short for plots of science sensor values, while ‘eng’ is short for plots of engineering variables. See plots for descriptions of the various plot types.
  • processed-L0: “Level 0” processed data. In practice, “Level 0” data products are the raw timeseries created by the esdglider/pyglider base processing functions.
  • processed-L1: “Level 1” processed data and data products. In practice, “Level 1” data products are the science and engineering timeseries created by the esdglider/pyglider base processing functions.
    • ngdac: NetCDF files formatted for submission to the IOOS Glider DAC. The DAC requires one NetCDF file for each profile, hence the subfolder.
  • processed-L3: “Level 3” processed data and data products. In practice, “Level 3” data products are the gridded datasets created by the esdglider/pyglider base processing functions.

Note: the various ‘processed’ folders will have both delayed-mode and rt data files. These files will have the mode indicated in the file name, as described in {link todo}. There are distinct delayed/rt plots and ngdac folders because of the number of respective files.

Backup

The ‘backup’ bucket contains backup glider files. These are files that do not need to be frequently accessed, and thus do not need to be stored in ‘standard’ GCS storage.

Link: esd-glider-deployments-backup bucket

Directory structure:

├── glider-files
├── {deployment}-event-timeline.csv
├── {...}.tar.gz
  • glider-files: Folder for glider flight/science backups. These are typically zips of the glider and science folders, but can contain whatever files/folders were saved from the glider computers.
  • {deployment}-event-timeline.csv: The event timeline, from the SFMC. Instructions: 1) download the xls file from the SFMC, 2) format ‘Time’ column as “yyyy/mm/dd hh:mm:ss”, 3) save file as a CSV. Example file name: “amlr08-20220513-event-timeline.csv”
  • {…}.tar.gz: The full Glider Folder Archive Tarball from the SFMC. Keep the default filename from the SFMC, e.g. 
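The event timeline formatting step described above could be scripted roughly as follows. This is a sketch under assumptions: it presumes the rows from the SFMC spreadsheet have already been read into dictionaries whose ‘Time’ values are Python datetime objects, and any column other than ‘Time’ (such as ‘Event’ in the usage note) is hypothetical.

```python
import csv
from datetime import datetime

# "yyyy/mm/dd hh:mm:ss", as required for the 'Time' column.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def timeline_file_name(deployment):
    """Return the event timeline file name for a deployment."""
    return f"{deployment}-event-timeline.csv"


def write_timeline_csv(rows, out):
    """Write rows (dicts with a datetime 'Time' value) as CSV to out."""
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Only the 'Time' column is reformatted; other columns pass through.
        writer.writerow({**row, "Time": row["Time"].strftime(TIME_FORMAT)})
```

For instance, a row such as `{"Time": datetime(2022, 5, 13, 14, 5, 9), "Event": "example"}` would be written with the time `2022/05/13 14:05:09`.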

Active Acoustics

Data In

The Active Acoustics (AA) ‘data in’ bucket contains raw AA data files.

Link: swfscesd-glider-active-acoustics-data-in

Directory structure:

├── delayed
├── rt
  • delayed: Raw acoustic delayed data, from either AZFPs or Echosounders
  • rt: Raw real-time ad2 files. For Nortek instruments only

TODO: Does config info still need to live here? Or can/should that be derived from archive-sfmc?

Data Out

TODO

Passive Acoustics

See PAM-Glider for Passive Acoustic Monitoring (PAM) glider data management.

Imagery

Raw Images

The raw imagery ‘data in’ bucket contains raw image files.

Link: swfscesd-glider-imagery-data-in

Directory structure:

├── images
  • images: All imagery collected during the deployment. These images will be in their ‘Dir####’/‘dir#######’ folders as they were recorded on the imagery cards.

TODO: Does config info still need to live here? Or can/should that be derived from archive-sfmc?

Raw Image Metadata

The raw imagery ‘metadata’ bucket contains metadata about the raw image files. There are two metadata files: one with deployment-level metadata (e.g., image size), and one with image-specific metadata (i.e., the datetime, directory name, and file name of each image).

Link: swfscesd-glider-imagery-metadata

Directory structure:

├── {glider-YYYYmmdd}-deployment-metadata.json
├── {glider-YYYYmmdd}-image-metadata.jsonl
  • {glider-YYYYmmdd}-deployment-metadata.json: JSON file containing deployment-level image metadata, such as image size and shadowgraph camera version. Example file name: “amlr08-20220513-deployment-metadata.json”
  • {glider-YYYYmmdd}-image-metadata.jsonl: JSON Lines file containing image-level metadata. Specifically, each line represents one image, and contains the image file name, directory name, and datetime. Example file name: “amlr08-20220513-image-metadata.jsonl”
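Because the image-level metadata is a JSON Lines file, it can be read and written one record per line with the standard library. The sketch below assumes hypothetical key names (‘file_name’, ‘dir_name’, ‘datetime’); the actual keys in the ESD metadata files may differ.

```python
import json


def write_image_metadata(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_image_metadata(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A single record might then look like `{"file_name": "example.jpg", "dir_name": "Dir0001", "datetime": "2022-05-13T14:05:09Z"}`, where all three values are made up for illustration.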

Pre-Processed Imagery

The directory structure for the pre-processed imagery is under development. See the Imagery page for more commentary.
