Spectral Data Model
Core Concepts
Spectrum
The fundamental unit in speclib. Every spectrum contains:
| Field | Type | Description |
|---|---|---|
| `wavelengths` | `ndarray[float64]` | Wavelength positions in micrometers (µm), sorted ascending |
| `reflectance` | `ndarray[float64]` | Reflectance values, scaled 0.0–1.0 |
| `errors` | `ndarray[float64]` | Measurement uncertainty per band (optional) |
| `metadata` | `SampleMetadata` | Provenance, material properties, analytical context |
| `quality` | `QualityFlag` | USGS-convention purity/quality indicator |
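The fields above could be modeled roughly as follows. This is a minimal sketch, not speclib's actual class: the `dict` and `str` stand-ins for `SampleMetadata` and `QualityFlag`, the `UNVALIDATED` default, and the validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass(eq=False)  # eq=False: comparing arrays with == is ambiguous
class Spectrum:
    """Illustrative container mirroring the documented fields."""

    wavelengths: np.ndarray              # micrometers, sorted ascending
    reflectance: np.ndarray              # scaled 0.0-1.0, NaN marks a gap
    errors: Optional[np.ndarray] = None  # optional per-band uncertainty
    metadata: dict = field(default_factory=dict)  # stand-in for SampleMetadata
    quality: str = "UNVALIDATED"         # stand-in for QualityFlag (assumed default)

    def __post_init__(self) -> None:
        self.wavelengths = np.asarray(self.wavelengths, dtype=np.float64)
        self.reflectance = np.asarray(self.reflectance, dtype=np.float64)
        if self.wavelengths.shape != self.reflectance.shape:
            raise ValueError("wavelengths and reflectance must have the same shape")
        if np.any(np.diff(self.wavelengths) < 0):
            raise ValueError("wavelengths must be sorted ascending")
        # NaN entries are spectral gaps, so skip them in the range check.
        valid = self.reflectance[~np.isnan(self.reflectance)]
        if np.any((valid < 0.0) | (valid > 1.0)):
            raise ValueError("reflectance must lie in 0.0-1.0")
        if self.errors is not None:
            self.errors = np.asarray(self.errors, dtype=np.float64)
            if self.errors.shape != self.wavelengths.shape:
                raise ValueError("errors must have one value per band")
```

Validating at construction time keeps the unit and scale guarantees in one place, so downstream code can rely on them without rechecking.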
Conventions
- Wavelength units: Always micrometers (µm). Adapters convert from nm or cm⁻¹ on ingestion.
- Reflectance scale: Always 0.0–1.0. Adapters convert from percentage (0–100) on ingestion.
- Spectral gaps: Stored as `NaN` in arrays, documented in metadata.
- No implicit resampling: Data is stored exactly as measured. Resampling is an explicit operation.
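The unit and scale conversions an adapter performs on ingestion might look like the sketch below. The function names are hypothetical and not part of speclib; the arithmetic is the standard conversion (1 µm = 1000 nm, and wavelength in µm = 10000 / wavenumber in cm⁻¹).

```python
import numpy as np


def nm_to_um(wavelengths_nm):
    """Convert nanometers to micrometers."""
    return np.asarray(wavelengths_nm, dtype=np.float64) / 1000.0


def wavenumber_to_um(wavenumbers_cm1):
    """Convert wavenumbers (cm^-1) to micrometers: um = 10000 / cm^-1."""
    return 10_000.0 / np.asarray(wavenumbers_cm1, dtype=np.float64)


def percent_to_fraction(reflectance_pct):
    """Convert 0-100 percent reflectance to the 0.0-1.0 scale. NaN gaps stay NaN."""
    return np.asarray(reflectance_pct, dtype=np.float64) / 100.0


def sort_ascending(wavelengths_um, reflectance):
    """Reorder bands so wavelengths ascend, keeping values paired with their band."""
    wavelengths_um = np.asarray(wavelengths_um, dtype=np.float64)
    reflectance = np.asarray(reflectance, dtype=np.float64)
    order = np.argsort(wavelengths_um)
    return wavelengths_um[order], reflectance[order]
```

The sort step matters for wavenumber sources: ascending wavenumbers become descending wavelengths after conversion, so the bands need reordering to satisfy the ascending-wavelength convention. No values are interpolated, which is consistent with the no-implicit-resampling rule.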
Quality Flags
Following USGS conventions:
| Flag | Meaning |
|---|---|
| `REFERENCE` | Reference standard, highest confidence |
| `HIGH` | High purity, well-characterized sample |
| `MEDIUM` | Moderate confidence, may have mixtures |
| `LOW` | Low confidence, use with caution |
| `UNVALIDATED` | Not yet validated against reference |
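One way to represent these flags is an ordered enum, which makes "at least this quality" filters straightforward. The sketch below is an assumption, not speclib's definition of `QualityFlag`: the numeric values, the placement of `UNVALIDATED` below `LOW`, and the `meets_minimum` helper are all hypothetical.

```python
from enum import IntEnum


class QualityFlag(IntEnum):
    """Illustrative ordering; the integer values are assumed, not documented."""

    UNVALIDATED = 0  # assumed to rank lowest because nothing has been checked
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    REFERENCE = 4


def meets_minimum(flag: QualityFlag, minimum: QualityFlag) -> bool:
    """Return True when `flag` is at least as trustworthy as `minimum`."""
    return flag >= minimum


# Example: keep only flags rated HIGH or better.
flags = [QualityFlag.LOW, QualityFlag.HIGH, QualityFlag.REFERENCE]
trusted = [f for f in flags if meets_minimum(f, QualityFlag.HIGH)]
```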
Storage Model
HDF5 Archive (Source of Truth)
Spectra are organized hierarchically by material category:
speclib_archive.h5
├── /metadata # Archive-level provenance
├── /minerals # USGS mineral spectra
├── /vegetation # Plant spectra including KY invasives
├── /soils # Soil spectra
├── /manmade # Artificial materials
├── /mixtures # Mixed spectra
└── /ky # Kentucky-specific collections
├── /invasives
├── /minerals
└── /reclamation
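A small helper can make this layout explicit by building and validating group paths. This is a sketch only: the group names are copied from the tree above, but the `group_path` function and its `ky` parameter are hypothetical, and nothing is assumed here about how individual spectra are stored inside each group.

```python
# Group names copied from the archive tree above.
TOP_LEVEL_GROUPS = ("minerals", "vegetation", "soils", "manmade", "mixtures")
KY_GROUPS = ("invasives", "minerals", "reclamation")


def group_path(category: str, ky: bool = False) -> str:
    """Return the HDF5 group path for a material category.

    With ky=True the category is looked up under the Kentucky-specific
    /ky branch instead of the top level.
    """
    if ky:
        if category not in KY_GROUPS:
            raise ValueError(f"unknown Kentucky collection: {category!r}")
        return f"/ky/{category}"
    if category not in TOP_LEVEL_GROUPS:
        raise ValueError(f"unknown category: {category!r}")
    return f"/{category}"
```

Note that `minerals` appears both at the top level and under `/ky`, so the category name alone does not identify a group. The returned string can be used as a key with an HDF5 library such as h5py.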
Parquet Query Layer (Derived)
Flat, columnar format optimized for search and filtering. It is regenerated from the HDF5 archive on demand, and spatially referenced spectra use the GeoParquet extension.
Provenance
Every spectrum tracks its origin:
- `source_library`: which upstream library (USGS, ECOSTRESS, ASTER, EMIT, KY)
- `source_record_id`: original ID in the upstream library
- `ingested_at`: when the spectrum was added to speclib
- `adapter_version`: version of the ingestion adapter used
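The provenance fields could be carried in a small immutable record like the sketch below. The class name, the field types, the library-name check, and the `to_dict` helper are assumptions for illustration; the record ID and version shown in the example are placeholders, not real values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Library names taken from the list above.
KNOWN_LIBRARIES = frozenset({"USGS", "ECOSTRESS", "ASTER", "EMIT", "KY"})


@dataclass(frozen=True)
class Provenance:
    """Illustrative provenance record; frozen so it cannot be altered later."""

    source_library: str    # upstream library name
    source_record_id: str  # original ID in the upstream library
    ingested_at: datetime  # when the spectrum was added
    adapter_version: str   # version of the ingestion adapter

    def __post_init__(self) -> None:
        if self.source_library not in KNOWN_LIBRARIES:
            raise ValueError(f"unknown source library: {self.source_library!r}")

    def to_dict(self) -> dict:
        """Serialize to plain types, with the timestamp as an ISO 8601 string."""
        return {
            "source_library": self.source_library,
            "source_record_id": self.source_record_id,
            "ingested_at": self.ingested_at.isoformat(),
            "adapter_version": self.adapter_version,
        }


# Placeholder values for illustration only.
example = Provenance(
    source_library="USGS",
    source_record_id="placeholder-id",
    ingested_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    adapter_version="0.0.0",
)
```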