Core Analytical Features & Capabilities

Functional Overview

The XPS Analyzer package provides a complete, programmatically accessible pipeline for processing X-ray Photoelectron Spectroscopy datasets. The library abstracts the repetitive numerical operations required in surface science, enforcing strict mathematical and physical constraints at every step of the analytical workflow.

The sections below describe the core functionality implemented in the library.

1. Automated Data Parsing & Ingestion

Spectrometer output files are often generated in proprietary, mixed-text formats containing both instrumental metadata and numerical arrays.

The data_loader module implements a custom parser that reads semicolon-delimited textual exports. It utilizes a multi-criteria detection algorithm to differentiate between broad “Survey” scans and high-resolution “Multiplex” regional scans.

Upon parsing, the module extracts instrumental parameters (e.g., pass energy, dwell time, X-ray source) and instantiates XPSSpectrum and XPSDataset Pydantic models. This ensures that downstream functions receive strictly typed numpy.ndarray objects rather than raw string representations.

from xps_analyzer.data_loader import load_single_file

# The parser automatically detects the format (Survey vs. Multiplex)
# and populates the dataset.header dictionary with instrumental metadata.
dataset = load_single_file("data/raw/BN-SET-01/BN-BS-3/BN-BS-3 MULTIPLEX.txt")

# Retrieve a specific region as an explicitly typed XPSSpectrum object
ti2p_spectrum = dataset.get_spectrum("Ti 2p")
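
To illustrate the kind of mixed-text parsing described above, here is a minimal sketch. It is not the data_loader implementation: the function name parse_semicolon_export, the "key; value" header layout, and the two-column numeric layout are all assumptions made for illustration, and the Survey/Multiplex detection step is not reproduced.

```python
import numpy as np


def parse_semicolon_export(text):
    """Split a mixed-text export into a header dict and an (N, 2) float array.

    Hypothetical helper: assumes header lines look like "key; value" and
    data lines look like "binding_energy; intensity".
    """
    header = {}
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(";") if f.strip()]
        if len(fields) < 2:
            continue  # skip blank or single-field lines
        try:
            # Lines whose first two fields are numeric are treated as data.
            rows.append([float(fields[0]), float(fields[1])])
        except ValueError:
            # Anything else is treated as instrumental metadata.
            header[fields[0]] = fields[1]
    data = np.asarray(rows, dtype=float).reshape(-1, 2)
    return header, data


# Invented sample content, used only to exercise the sketch.
sample = """Pass Energy; 58.7
X-ray Source; Al Ka
462.0; 1200
461.9; 1260
461.8; 1315
"""

header, data = parse_semicolon_export(sample)
binding_energy, intensity = data[:, 0], data[:, 1]
```

Converting to float arrays at the parsing boundary is what lets later steps work with numeric data instead of strings.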

2. Spectroscopic Energy Calibration

Due to electrostatic charging of non-conductive samples during X-ray irradiation, the entire kinetic energy spectrum often shifts, requiring post-acquisition recalibration.

The preprocessing module provides the calibrate_dataset function, which accepts a reference element and its theoretical binding energy. The algorithm:

Locates the specified reference region within the dataset.
Identifies the binding energy corresponding to the maximum intensity peak ( $E_{obs}$ ).
Calculates the required shift: $\Delta E = E_{ref} - E_{obs}$ .
Applies this uniform scalar shift to the binding_energy arrays of all spectra within the XPSDataset.

from xps_analyzer.preprocessing.calibration import calibrate_dataset

# Calibrate the entire dataset using an internal reference peak.
# With inplace=False (the default), a new, deep-copied dataset is returned.
calibrated_dataset = calibrate_dataset(
    dataset, 
    reference_element="O", 
    reference_energy=530.0  # Theoretical lattice oxygen binding energy
)
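
The shift computation itself can be sketched in a few lines of NumPy. This is only an illustration of the steps listed above, not the body of calibrate_dataset; the helper name compute_shift, the dictionary of regions, and the numeric values are assumptions.

```python
import numpy as np


def compute_shift(binding_energy, intensity, reference_energy):
    """Return Delta E = E_ref - E_obs, where E_obs is the energy of maximum intensity."""
    e_obs = binding_energy[np.argmax(intensity)]
    return reference_energy - e_obs


# Invented O 1s reference region whose peak appears at 531.0 eV.
be = np.array([528.0, 529.0, 530.0, 531.0, 532.0, 533.0])
counts = np.array([100.0, 150.0, 400.0, 900.0, 350.0, 120.0])

shift = compute_shift(be, counts, reference_energy=530.0)

# The same scalar shift is applied to every region in the dataset.
regions = {"O 1s": be, "Ti 2p": np.array([456.0, 458.0, 460.0])}
calibrated = {name: energies + shift for name, energies in regions.items()}
```

Because the shift is a single scalar, peak spacings within and between regions are unchanged; only the absolute energy scale moves.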

3. Algorithmic Background Subtraction

The analysis.background module exposes three discrete mathematical models for the removal of the inelastic scattering tail:

Shirley Background: An iterative numerical integral evaluated against a convergence tolerance (tol=1e-5). Recommended for transition metals exhibiting sharp, step-like inelastic tails.
Tougaard Background: Computes the background using a universal inelastic scattering cross-section. The function accepts the empirical constants $B, C,$ and $D$ , defaulting to the universal parameters for transition metals ( $B = 2866\ \mathrm{eV}^2$ , $C = 1643\ \mathrm{eV}^2$ ).
Linear Background: Computes a simple secant line between the integration bounds, utilized primarily for flat spectral regions with low signal-to-noise ratios.

All background functions append their computed arrays into the XPSSpectrum.metadata dictionary, preserving the raw intensity array for auditing.
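
As an illustration of the iterative Shirley scheme, the sketch below rebuilds the background from the running integral of the background-subtracted signal until successive iterations agree. It is a simplified stand-in, not the analysis.background code: the function name shirley_background, the max_iter argument, the relative convergence test, and the assumption that the energy axis is sorted by ascending binding energy are all choices made for this example.

```python
import numpy as np


def shirley_background(energy, intensity, tol=1e-5, max_iter=50):
    """Iterative Shirley background (sketch; energy in ascending binding energy)."""
    x = np.asarray(energy, dtype=float)
    y = np.asarray(intensity, dtype=float)
    i_low, i_high = y[0], y[-1]  # intensities at the two integration bounds
    background = np.full_like(y, i_low)
    for _ in range(max_iter):
        signal = y - background
        # Cumulative trapezoidal integral of the signal from the low-energy bound.
        steps = 0.5 * (signal[1:] + signal[:-1]) * np.diff(x)
        cumulative = np.concatenate(([0.0], np.cumsum(steps)))
        if cumulative[-1] == 0.0:
            break  # no net signal above the background; nothing to scale
        updated = i_low + (i_high - i_low) * cumulative / cumulative[-1]
        converged = np.max(np.abs(updated - background)) <= tol * abs(i_high - i_low)
        background = updated
        if converged:
            break
    return background


# Synthetic test spectrum: a Gaussian peak sitting on a step-like tail.
x = np.linspace(520.0, 540.0, 201)
g = np.exp(-0.5 * ((x - 530.0) / 1.0) ** 2)
y = 100.0 + 1000.0 * g + 200.0 * np.cumsum(g) / np.sum(g)

bg = shirley_background(x, y)
corrected = y - bg  # the raw array y is left untouched
```

Returning the background as a separate array, rather than overwriting the intensities, mirrors the auditing behaviour described above.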

4. Non-Linear Peak Deconvolution

The analysis.peak_fitting module provides the mathematical engine for isolating overlapping electronic states.

It exposes functions for modeling individual profiles (fit_gaussian, fit_lorentzian, fit_voigt) and a generalized fit_multiple_peaks function for complex multiplets. The module utilizes the Levenberg-Marquardt algorithm via scipy.optimize.curve_fit to minimize the residual sum of squares ( $\chi^2$ ).

The optimization routine handles:

Automatic Parameter Estimation: Derives initial guesses for peak centroids ( $E_0$ ), amplitudes, and FWHM by analyzing the second derivative of the smoothed data.
Spin-Orbit Doublet Constraints: When fitting elements like Ti, Sr, or Bi, the algorithm can enforce strict theoretical constraints on the energy splitting ( $\Delta E$ ) and the intensity ratios (e.g., 2:1 for $p$ -orbitals).

The function returns a strictly typed FitResult model containing the optimized PeakParameters, the calculated residual array, and statistical goodness-of-fit metrics ( $R^2$ , reduced $\chi^2$ ).
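
A spin-orbit constraint can be expressed by building both components of a doublet from a single shared parameter set. The sketch below fits a synthetic $p$-type doublet with scipy.optimize.curve_fit, which uses Levenberg-Marquardt when no bounds are given. It is not the fit_multiple_peaks implementation: the function names, the Gaussian line shape with equal widths, and the 5.7 eV splitting are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

SPLITTING_EV = 5.7   # assumed, illustrative 2p1/2 - 2p3/2 separation
AREA_RATIO = 0.5     # 2:1 intensity ratio for p orbitals (minor / main)


def gaussian(x, amplitude, center, fwhm):
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2)


def p_doublet(x, amplitude, center, fwhm):
    """Two Gaussians whose splitting and intensity ratio are fixed by construction."""
    main = gaussian(x, amplitude, center, fwhm)
    # With equal widths, the amplitude ratio equals the area ratio.
    minor = gaussian(x, AREA_RATIO * amplitude, center + SPLITTING_EV, fwhm)
    return main + minor


# Synthetic, background-free data with Gaussian noise (values invented).
rng = np.random.default_rng(0)
x = np.linspace(452.0, 470.0, 361)
y = p_doublet(x, 1000.0, 458.6, 1.2) + rng.normal(0.0, 5.0, x.size)

# Only three free parameters are optimized; the constraint removes the other three.
popt, pcov = curve_fit(p_doublet, x, y, p0=[800.0, 458.0, 1.5])

residuals = y - p_doublet(x, *popt)
r_squared = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
```

Encoding the constraint in the model function, instead of penalizing deviations afterwards, guarantees the fitted doublet always satisfies it exactly.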

5. Empirical Atomic Quantification

The area integrated under a deconvoluted peak is not an absolute measure of atomic concentration; it must be normalized against the probability of photoemission.

The analysis.quantification module integrates databases of Relative Sensitivity Factors (RSF). It supports:

Scofield Theoretical Cross-Sections (1976): Tabulated for both Mg K $\alpha$ and Al K $\alpha$ X-ray sources across 89 elements.
Wagner Empirical Factors (1981): Derived experimentally, available for 18 common elements.

The calculate_atomic_concentration function takes a list of PeakParameters and computes the normalized fractional composition, providing the final quantitative output of the XPS analysis pipeline.

from xps_analyzer.analysis.quantification import (
    load_sensitivity_factors, 
    calculate_atomic_concentration
)

# Load Scofield theoretical sensitivity factors for an Mg Ka X-ray source
rsf_db = load_sensitivity_factors(source="scofield", xray_source="mg_ka")

# Compute the fractional atomic concentration from the integrated peak areas.
# ti_peak, o_peak, and sr_peak are PeakParameters objects taken from earlier peak fits.
concentrations = calculate_atomic_concentration(
    peaks=[ti_peak, o_peak, sr_peak],
    sensitivity_factors=rsf_db,
    element_names=["Ti 2p", "O 1s", "Sr 3d"]
)
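
The normalization behind this step is the standard relation $C_i = \frac{A_i / S_i}{\sum_j A_j / S_j}$ , where $A_i$ is the integrated peak area and $S_i$ the sensitivity factor. A minimal sketch follows; the helper name atomic_fractions is hypothetical, and the areas and sensitivity factors are placeholder numbers chosen for easy arithmetic, not Scofield or Wagner values.

```python
def atomic_fractions(areas, sensitivity_factors):
    """Return fractional atomic concentrations from peak areas and RSFs."""
    normalized = {name: areas[name] / sensitivity_factors[name] for name in areas}
    total = sum(normalized.values())
    return {name: value / total for name, value in normalized.items()}


# Placeholder inputs (NOT real sensitivity factors).
areas = {"Ti 2p": 2000.0, "O 1s": 3000.0, "Sr 3d": 1000.0}
rsf = {"Ti 2p": 2.0, "O 1s": 0.5, "Sr 3d": 1.0}

fractions = atomic_fractions(areas, rsf)
```

With these placeholder inputs the normalized areas are 1000, 6000, and 1000, so the fractions come out as 0.125, 0.75, and 0.125 and sum to one.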

6. Data Serialization and Export

To interface with external statistical tools or publication pipelines, the export.exporters module provides serialization routines for both isolated XPSSpectrum objects and complete XPSDataset collections.

JSON Serialization: Uses a custom NumpyEncoder that casts numpy.ndarray structures into standard JSON arrays and translates NaN and Inf values to null, while preserving the complete nested structure of the analysis metadata.
CSV/Excel Export: Unrolls the hierarchical data into typed tabular formats using pandas as the underlying engine. When exporting a full dataset to Excel, each spectral region is written to its own workbook sheet, ensuring compatibility with standard laboratory workflows.
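
The NaN/Inf handling can be illustrated with a small conversion pass run before json.dumps. This sketch is not the library's NumpyEncoder: it uses a hypothetical recursive helper, to_jsonable, instead of a json.JSONEncoder subclass, because the encoder's default() hook is not called for plain Python floats.

```python
import json
import math

import numpy as np


def to_jsonable(obj):
    """Recursively convert NumPy data to JSON-safe Python objects (sketch)."""
    if isinstance(obj, np.ndarray):
        return to_jsonable(obj.tolist())
    if isinstance(obj, np.generic):
        return to_jsonable(obj.item())  # NumPy scalar -> Python scalar
    if isinstance(obj, float):
        return obj if math.isfinite(obj) else None  # NaN / Inf -> null
    if isinstance(obj, dict):
        return {str(key): to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj


# Invented payload with nested metadata and non-finite values.
payload = {
    "region": "Ti 2p",
    "intensity": np.array([1.0, np.nan, np.inf]),
    "metadata": {"shift": np.float64(-1.0)},
}

# allow_nan=False makes json.dumps raise if any non-finite float slips through.
encoded = json.dumps(to_jsonable(payload), allow_nan=False)
decoded = json.loads(encoded)
```

Passing allow_nan=False is a cheap safeguard: standard JSON has no NaN or Infinity literals, so strict parsers would reject output that contains them.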

Technical Stack

Resources