HDF5eis: A storage and input/output solution for big multidimensional time series data from environmental sensors

Author(s)
White, Malcolm CA; Zhang, Zhendong; Bai, Tong; Qiu, Hongrui; Chang, Hilary; Nakata, Nori; ...
Publisher Policy

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Modern high-performance computing (HPC) tasks overwhelm conventional geophysical data formats. We describe a new data schema called HDF5eis (read H-D-F-size) for handling big multidimensional time series data from environmental sensors in HPC applications and implement a freely available Python application programming interface (API) for building and processing HDF5eis files. HDF5eis augments the popular Hierarchical Data Format 5 with a minimal set of additional conventions that facilitate fast and flexible data input and output protocols for regularly sampled (in time) data with any number of dimensions. HDF5eis supports arbitrary ancillary data (e.g., metadata) storage in columnar format or as UTF-8 encoded byte streams alongside time series data. Our HDF5eis API enables simple and efficient access to big data sets distributed across a potentially large number of small heterogeneous files through a single point of access. HDF5eis outperforms conventional seismic data formats by up to two orders of magnitude in terms of random read access times. We contribute HDF5eis as an operational tool and an experimental draft proposal that will help establish the next generation of data standards in the earth sciences.
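The abstract's emphasis on regularly sampled data suggests why random reads can be fast: when samples are evenly spaced in time, a requested time window maps to array indices by arithmetic alone, with no per-sample timestamps to search. The following is a minimal sketch of that idea in plain Python. It is not the HDF5eis API; the function name `window_to_slice` and the parameters `start`, `rate_hz`, and `nsamples` are hypothetical and chosen only for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def window_to_slice(start, rate_hz, nsamples, t0, t1):
    """Map the half-open time window [t0, t1) to a slice of sample indices.

    start    -- timestamp of sample 0 (hypothetical metadata field)
    rate_hz  -- sampling rate in samples per second (hypothetical metadata field)
    nsamples -- number of samples stored along the time axis
    """
    # Index of the first sample at or after t0.
    first = math.ceil((t0 - start).total_seconds() * rate_hz)
    # Index one past the last sample strictly before t1.
    last = math.ceil((t1 - start).total_seconds() * rate_hz)
    # Clamp both ends to the stored extent so out-of-range windows are safe.
    first = min(max(first, 0), nsamples)
    last = min(max(last, 0), nsamples)
    return slice(first, last)


# One hour of data sampled at 100 Hz, starting at an assumed reference time.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
sl = window_to_slice(
    start, 100.0, 360_000,
    start + timedelta(seconds=10),
    start + timedelta(seconds=12),
)
print(sl)  # slice(1000, 1200, None)
```

A slice computed this way could be applied along the time axis of any array-like container, including a chunked on-disk data set, so that only the requested samples are read. The sketch uses floating-point arithmetic, so windows that do not fall on exactly representable sample times may need rounding logic that is omitted here.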
Date issued
2023-04-12
URI
https://hdl.handle.net/1721.1/165708
Department
Massachusetts Institute of Technology. Department of Earth, Atmospheric, and Planetary Sciences
Journal
Geophysics
Publisher
Society of Exploration Geophysicists
Citation
Malcolm C. A. White, Zhendong Zhang, Tong Bai, Hongrui Qiu, Hilary Chang, Nori Nakata; HDF5eis: A storage and input/output solution for big multidimensional time series data from environmental sensors. Geophysics 2023; 88 (3): F29–F38.
Version: Final published version

Collections
  • MIT Open Access Articles

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.