
HDF5 Data Processing Toolkit

  • Writer: Claude Paugh
  • Apr 7

Updated: 1 day ago


We just released the beta version of the HDF5 Data Processing Toolkit on GitHub: https://github.com/cpaw24/h5da/tree/main


It has been under development for several months, built on the idea that reading from and writing to HDF5 files follow some common patterns. Multiple input file types are supported: common image formats (JPEG, SVG, PNG, TIFF, etc.), file-based data in CSV and JSON, and video data in MP3/MP4 format.


The processing design uses batches together with Python's multiprocessing (mp) module to manage processes and resources. Each content "classification" is encapsulated in its own class, such as ImageProcessor, VideoProcessor, or TextFileProcessor.
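
To make the batch-plus-multiprocessing idea concrete, here is a minimal sketch using only the standard library. It is not the toolkit's actual code: make_batches, run_batches, and their parameters are hypothetical names chosen for illustration, and the worker is assumed to be any picklable function that accepts one batch (a list) and returns a result.

```python
import multiprocessing as mp
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def make_batches(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_batches(
    items: Iterable[Any],
    worker: Callable[[List[Any]], Any],
    batch_size: int = 100,
    processes: int = 2,
) -> List[Any]:
    """Apply worker to every batch and return the results in batch order.

    Hypothetical helper: with processes <= 1 the batches run serially in
    the current process, which is handy for debugging a worker.
    """
    batches = list(make_batches(items, batch_size))
    if processes <= 1:
        return [worker(batch) for batch in batches]
    # Pool.map hands one batch to each worker call and preserves order.
    with mp.Pool(processes=processes) as pool:
        return pool.map(worker, batches)
```

When calling run_batches with more than one process from a script, the call should sit under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.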


Each content type is assigned to a corresponding processor: JSON and CSV go to the TextFileProcessor, and image formats go to the ImageProcessor. The DataProcessor class in the dataWrangler module serves as the I/O handler and manages processing across the content types. It also oversees the multiprocessing functions.
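
The routing described above can be pictured as a lookup from file extension to processor. The sketch below is an assumption about how such a dispatch could look, not the DataProcessor implementation: the extension table, processor_for, and group_by_processor are hypothetical, and only the processor names come from the toolkit.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional

# Hypothetical extension table; the toolkit's real mapping may differ.
PROCESSOR_BY_SUFFIX: Dict[str, str] = {
    ".jpg": "ImageProcessor",
    ".jpeg": "ImageProcessor",
    ".png": "ImageProcessor",
    ".svg": "ImageProcessor",
    ".tif": "ImageProcessor",
    ".tiff": "ImageProcessor",
    ".csv": "TextFileProcessor",
    ".json": "TextFileProcessor",
    ".mp3": "VideoProcessor",
    ".mp4": "VideoProcessor",
}


def processor_for(path: str) -> Optional[str]:
    """Return the processor name for a file, or None if unsupported."""
    return PROCESSOR_BY_SUFFIX.get(Path(path).suffix.lower())


def group_by_processor(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by processor name, keeping input order.

    Files with an unknown extension are collected under "unsupported".
    """
    groups: Dict[str, List[str]] = {}
    for path in paths:
        name = processor_for(path) or "unsupported"
        groups.setdefault(name, []).append(path)
    return groups
```

Grouping first and processing second keeps the routing decision separate from the I/O, so each group can be handed to its processor as one batch.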


Additional documentation for each class and method is linked from the README. The project is open to modifications and improvements from anyone who would like to participate. The "main" processing module is still tightly integrated, but the individual processors can be used separately. The generated doc for the dataWrangler modules is below.





HDF5 File Concepts





