System Architecture & Data Integrity

A detailed breakdown of the object-oriented design, Pydantic data model hierarchy, and strict array validation in XPS Analyzer.

Technical Stack

Pydantic v2 · Python · NumPy

Software Architecture & Type Safety

Standard Python scientific computing stacks (built on bare pandas.DataFrame or numpy.ndarray objects) do nothing to prevent misaligned or mathematically invalid data. In spectroscopy, pairing a 500-point intensity array with a 501-point binding energy array is an error that can pass silently yet corrupt every downstream step, from numerical integration to peak fitting.
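
As a minimal sketch of how such a mismatch can slip through a bare-NumPy workflow (the arrays and the hand-rolled integral below are illustrative, not part of XPS Analyzer):

import numpy as np

binding_energy = np.linspace(291.0, 279.0, 501)          # 501 energy points
intensity = np.random.default_rng(0).poisson(100, 500)   # 500 counts: off by one

# zip() silently truncates to the shorter input, so this hand-rolled
# rectangle-rule integral runs without complaint on misaligned arrays
# and returns a plausible-looking but quietly wrong area.
step = abs(binding_energy[1] - binding_energy[0])
area = step * sum(counts for _, counts in zip(binding_energy, intensity))
print(f"Integrated area: {area:.1f}")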

XPS Analyzer addresses these inherent risks by enforcing a strict object-oriented architecture, heavily relying on Pydantic v2 to validate mathematical invariants at runtime.

The Pydantic Safety Net

The core of the library is the XPSBaseModel, which inherits from pydantic.BaseModel. Every primary data structure in the package (XPSSpectrum, XPSDataset, XPSSample, PeakParameters, FitResult) extends this base class.

By wrapping numpy.ndarray objects inside these validated models, XPS Analyzer executes @model_validator(mode='after') hooks to assert physical constraints. If code attempts to instantiate a peak with a negative full-width at half-maximum (FWHM), or a spectrum whose arrays contain NaN values, the model raises a ValidationError instead of silently passing mathematically invalid data downstream.

from typing import Any

import numpy as np
from pydantic import BaseModel, ConfigDict, Field, model_validator


class XPSBaseModel(BaseModel):
    # Shown minimally here: ndarray is not a native Pydantic type, so the
    # shared base model must opt in to arbitrary types for the array
    # fields below to validate.
    model_config = ConfigDict(arbitrary_types_allowed=True)


class XPSSpectrum(XPSBaseModel):
    """
    Individual XPS spectral region with strict Pydantic validation.
    """
    region_name: str
    binding_energy: np.ndarray  # eV
    intensity: np.ndarray       # Arbitrary counts
    metadata: dict[str, Any] = Field(default_factory=dict)

    @model_validator(mode='after')
    def validate_arrays(self):
        """
        Enforce identical array dimensions and prevent empty initialization.
        """
        if len(self.binding_energy) != len(self.intensity):
            raise ValueError("Arrays must possess the exact same length.")
        if len(self.binding_energy) == 0:
            raise ValueError("Arrays cannot be empty.")
        return self
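
The FWHM and NaN constraints mentioned above follow the same pattern. The sketch below is illustrative only: the field names and validator are assumptions, not the library's actual PeakParameters definition.

import math

from pydantic import field_validator


class PeakParameters(XPSBaseModel):
    # Hypothetical fields; the real model may differ.
    position: float   # Peak centre, eV
    amplitude: float  # Peak height, counts
    fwhm: float       # Full-width at half-maximum, eV

    @field_validator('fwhm', 'amplitude')
    @classmethod
    def positive_and_finite(cls, value: float) -> float:
        if not math.isfinite(value) or value <= 0:
            raise ValueError("Peak parameters must be positive and finite.")
        return value

With such a model, PeakParameters(position=284.8, amplitude=1.2e4, fwhm=-0.9) raises a pydantic.ValidationError rather than feeding a nonsensical width into the fitting routine.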

Immutability Pattern

Raw experimental data must never be irreversibly altered by an intermediate processing step, so the software is engineered around strict immutability.

When a mathematical transformation is applied (e.g., background subtraction or energy calibration), the library relies on the model_copy(deep=True) pattern provided by Pydantic. This ensures that the original numpy.ndarray memory buffer is completely cloned before the background is subtracted from the intensity values, maintaining a pristine record of the experimental data in memory.
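
A minimal sketch of the pattern (the function name and the straight-line background are illustrative; the library's actual transforms differ):

import numpy as np


def subtract_linear_background(spectrum: XPSSpectrum) -> XPSSpectrum:
    """Hypothetical transform: remove a straight-line background
    without touching the input spectrum."""
    background = np.linspace(spectrum.intensity[0], spectrum.intensity[-1],
                             num=len(spectrum.intensity))
    # deep=True clones the ndarray buffers; update= swaps the corrected
    # intensity into the copy, so the raw spectrum is never mutated.
    return spectrum.model_copy(
        deep=True,
        update={"intensity": spectrum.intensity - background},
    )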

Data Flow Pipeline

The software architecture is modular, separating parsing, preprocessing, analysis, and data export into distinct, testable modules.
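
Assuming module names that mirror the test-suite breakdown below, a hypothetical end-to-end flow might read (every import path and function name here is illustrative, not the published API):

# Hypothetical pipeline; names are illustrative only.
from xps_analyzer.data_loader import load_spectrum
from xps_analyzer.preprocessing import subtract_shirley_background
from xps_analyzer.analysis import fit_peaks
from xps_analyzer.export import export_csv

raw = load_spectrum("c1s_region.vms")         # parsing -> XPSSpectrum
corrected = subtract_shirley_background(raw)  # preprocessing, returns a deep copy
result = fit_peaks(corrected)                 # analysis -> FitResult
export_csv(result, "c1s_fit.csv")             # data export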

Test-Driven Correctness

To validate this architecture, the project maintains an extensive test suite driven by pytest:

  • 93% Test Coverage across the core scientific modules (Data Loader, Preprocessing, Analysis, Export, and Reference Data).
  • 355 Unit Tests specifically targeting error states, the boundaries of the physical models (such as Shirley integral convergence on highly noisy synthetic data), and the Pydantic validation hooks themselves (a representative test is sketched below).
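
A representative validation-hook test in this style (the import path is an assumption):

import numpy as np
import pytest
from pydantic import ValidationError

from xps_analyzer.models import XPSSpectrum  # hypothetical import path


def test_rejects_mismatched_arrays():
    """A 501-point energy axis with 500 counts must fail validation."""
    with pytest.raises(ValidationError):
        XPSSpectrum(
            region_name="C 1s",
            binding_energy=np.linspace(292.0, 280.0, 501),
            intensity=np.zeros(500),
        )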

This design philosophy ensures that the computational pipeline enforces the constraints of physical reality before any numerical optimization begins.