clophfit.fitting.diagnostics

Well-quality diagnostics for plate-reader titration data.

Two complementary entry points:

  • detect_bad_wells_from_dat(): reads raw .dat files (one per well, all labels together). No fitting is required, so it can run before the fitting pipeline. Detects outliers using a robust trendline between signal span and maximum signal.

  • detect_bad_wells(): takes a DataFrame of fit results (as stored in ffit*.csv, one label per file). Adds fit-quality criteria (K at bound, K outlier, poor fit) on top of the signal-quality checks.
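As a rough illustration of how the two entry points might be combined, the sketch below wraps both calls in one helper. The helper names run_diagnostics and flagged_wells are hypothetical, the ffit_csv path is supplied by the caller, and loading it with pandas.read_csv is assumed to yield the well, K and sK columns described in this page. Only the function signatures and the flag_any / well columns come from the documentation itself.

```python
from pathlib import Path


def run_diagnostics(data_dir, ffit_csv=None):
    """Hypothetical helper: run the raw-data check, then the fit-based check.

    Imports are done lazily so that this sketch can be imported even when
    clophfit or pandas is not installed.
    """
    import pandas as pd
    from clophfit.fitting.diagnostics import (
        detect_bad_wells,
        detect_bad_wells_from_dat,
    )

    # Stage 1: signal-quality checks on raw .dat files, no fitting needed.
    raw_flags = detect_bad_wells_from_dat(Path(data_dir), z_threshold=3.0)

    # Stage 2 (optional): fit-quality checks on one label's fit results.
    # Assumption: the CSV loads directly into the expected column layout.
    fit_flags = None
    if ffit_csv is not None:
        ffit = pd.read_csv(ffit_csv)
        fit_flags = detect_bad_wells(ffit, is_ph=True)
    return raw_flags, fit_flags


def flagged_wells(records):
    """List wells with flag_any set, given rows as df.to_dict("records")."""
    return [row["well"] for row in records if row["flag_any"]]
```

With a result table in hand, flagged_wells(raw_flags.to_dict("records")) would give the names of wells that raised at least one flag.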

Detection criteria

  • K at bound : K equals one of the optimizer bounds (k_min or k_max; defaults 3 and 11 for pH). The fit ran into a limit rather than converging to a true optimum.

  • K outlier : |K - median_K| > k_mad_factor * MAD(K) across all wells on the plate. Identifies wells with biologically implausible K.

  • Poor fit : sK / K > max_sk_ratio. Relative uncertainty so large that K is undetermined.

  • Low signal / Flat curve : Outlier detection based on robust Theil-Sen regression between max signal and dynamic range. A well is flagged if its signal span is too low compared to the trend, or if its max signal is significantly below the plate median.

  • Inverted curve : S0 > S1 for pH or S0 < S1 for Cl – wrong polarity. Only checked in detect_bad_wells() (requires fitted plateaus).

  • High residuals : per-well residual MAD > residual_mad_factor times the plate median MAD. Requires the optional residual_stats DataFrame.
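The three fit-quality criteria above (K at bound, K outlier, poor fit) can be sketched in plain Python as follows. This is an illustration of the stated inequalities, not the module's implementation: the function names are hypothetical, MAD is taken as the unscaled median absolute deviation, and the bound comparison uses math.isclose, which is an assumption about how "equals" is evaluated.

```python
import math
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (assumed definition of MAD)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def fit_quality_flags(k, sk, k_min=3.0, k_max=11.0,
                      k_mad_factor=5.0, max_sk_ratio=0.3):
    """Return one dict of boolean flags per well (hypothetical helper)."""
    k_med = median(k)
    k_mad = mad(k)
    flags = []
    for k_i, sk_i in zip(k, sk):
        flags.append({
            # K sits on an optimizer bound (tolerance check is an assumption).
            "at_bound": math.isclose(k_i, k_min) or math.isclose(k_i, k_max),
            # |K - median_K| > k_mad_factor * MAD(K) across the plate.
            "k_outlier": abs(k_i - k_med) > k_mad_factor * k_mad,
            # Relative uncertainty sK / K above the tolerated maximum.
            "poor_fit": sk_i / k_i > max_sk_ratio,
        })
    return flags


# Illustrative values only: the sixth well sits on the upper bound,
# the seventh has a very large uncertainty.
k_values = [7.0, 7.1, 6.9, 7.2, 6.8, 11.0, 7.05]
sk_values = [0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 3.0]
flags = fit_quality_flags(k_values, sk_values)
```

In this made-up plate the sixth well is flagged both as at bound and as a K outlier, while the seventh is flagged only as a poor fit.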

Functions

detect_bad_wells_from_dat(data_dir, *[, z_threshold, ...])

Flag unreliable wells by reading raw .dat titration files.

detect_bad_wells(ffit, *[, k_min, k_max, ...])

Flag unreliable wells from a ffit result DataFrame.

Module Contents

clophfit.fitting.diagnostics.detect_bad_wells_from_dat(data_dir, *, z_threshold=3.0, ctr_cols=None)

Flag unreliable wells by reading raw .dat titration files.

Reads every *.dat file in data_dir (one per well). Each file must have an x column and one or more signal columns (e.g. y1, y2). All labels are checked together — no fitting is required.
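To make the expected file layout concrete, the sketch below parses one well's content and computes the two quantities the signal checks rely on, maximum signal and signal span, for each label. It assumes a comma-delimited file with a header row naming the columns x, y1, y2, and it takes span to mean max minus min; the helper name signal_summary and the example numbers are illustrative only.

```python
import csv
import io


def signal_summary(dat_text):
    """Return {label: (max_signal, span)} for one well's .dat content.

    Assumptions: comma-delimited text with a header row, an "x" column,
    and span defined as max(signal) - min(signal).
    """
    reader = csv.DictReader(io.StringIO(dat_text))
    columns = {}
    for row in reader:
        for name, value in row.items():
            if name != "x":  # every non-x column is treated as a label
                columns.setdefault(name, []).append(float(value))
    return {
        name: (max(vals), max(vals) - min(vals))
        for name, vals in columns.items()
    }


# Made-up titration for a single well with two labels.
example = """x,y1,y2
5.0,100,40
6.0,250,90
7.0,600,210
8.0,900,330
9.0,1000,360
"""
summary = signal_summary(example)
```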

Parameters:
  • data_dir (str | Path) – Directory containing *.dat files (one per well, CSV format with columns x, y1[, y2, ...]).

  • z_threshold (float) – Z-score threshold for outlier detection on the max-vs-span trendline.

  • ctr_cols (list[int] | None) – 1-based column numbers for control wells (e.g. [1, 12]). Currently used only for logging; all flags apply equally to CTR wells.

Returns:

One row per well with columns:

  • well

  • flag_low_signal

  • flag_flat_curve

  • flag_any

  • flag_count

Sorted by descending flag_count.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If no *.dat files are found in data_dir.
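The max-vs-span trendline check can be pictured with a small pure-Python Theil-Sen sketch. This is not the module's code: the helper names are hypothetical, the z-score is assumed to be the residual from the trendline divided by 1.4826 times the MAD of the residuals, and only the "span too low compared to the trend" half of the criterion is shown (the comparison of max signal against the plate median is left out).

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen line: median of pairwise slopes, median-based intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def low_span_flags(max_signal, span, z_threshold=3.0):
    """Flag wells whose span falls far below the plate-wide trend."""
    slope, intercept = theil_sen(max_signal, span)
    resid = [s - (slope * m + intercept) for m, s in zip(max_signal, span)]
    center = median(resid)
    # Robust scale estimate; the 1.4826 factor is an assumption.
    scale = 1.4826 * median([abs(r - center) for r in resid])
    if scale == 0:
        return [False] * len(resid)
    return [(r - center) / scale < -z_threshold for r in resid]


# Made-up plate: span is roughly 0.8 * max, except for one flat well.
max_signal = [1000, 1200, 800, 1500, 900, 1100, 1300]
span = [805, 955, 645, 1195, 715, 120, 1045]
flat = low_span_flags(max_signal, span)
```

Because both the slope and the scale are medians, the single flat well (index 5) does not drag the trendline toward itself, which is the reason for preferring a robust regression here.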

clophfit.fitting.diagnostics.detect_bad_wells(ffit, *, k_min=3.0, k_max=11.0, k_mad_factor=5.0, max_sk_ratio=0.3, z_threshold=3.0, check_polarity=True, is_ph=True, ctr_cols=None, residual_stats=None, residual_mad_factor=5.0)

Flag unreliable wells from a ffit result DataFrame.

Parameters:
  • ffit (pd.DataFrame) – Per-well fit results with at minimum columns well, K, sK and at least one pair of S0_{lbl} / S1_{lbl} columns.

  • k_min (float) – Lower optimizer bound for K (default 3.0 for pH).

  • k_max (float) – Upper optimizer bound for K (default 11.0 for pH).

  • k_mad_factor (float) – Outlier threshold: flag if |K - median| > k_mad_factor * MAD.

  • max_sk_ratio (float) – Maximum tolerated relative uncertainty sK/K (default 0.30).

  • z_threshold (float) – Z-score threshold for outlier detection on the max-vs-span trendline.

  • check_polarity (bool) – If True, flag wells where the signal direction is inverted relative to the expected biological response.

  • is_ph (bool) – If True (default), pH assay: expect S1 > S0 (signal rises with pH). If False, Cl assay: expect S0 > S1.

  • ctr_cols (list[int] | None) – Column numbers (1-based, e.g. [1, 12]) reserved for control wells.

  • residual_stats (pd.DataFrame | None) – Optional DataFrame from residual_stats_*.csv.

  • residual_mad_factor (float) – Flag if per-well residual MAD > residual_mad_factor times the plate median MAD (default 5.0).

Returns:

One row per well with boolean flag columns.

Return type:

pd.DataFrame
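The polarity and residual criteria used by detect_bad_wells() reduce to two simple comparisons, sketched below in plain Python. The function names are hypothetical, and the residual check takes a plain mapping of well name to residual MAD because the column layout of residual_stats is not described here.

```python
from statistics import median


def is_inverted(s0, s1, is_ph=True):
    """True when the fitted plateaus have the wrong polarity.

    pH assay: S1 > S0 is expected, so S0 > S1 is inverted.
    Cl assay: S0 > S1 is expected, so S0 < S1 is inverted.
    """
    return s0 > s1 if is_ph else s0 < s1


def high_residual_flags(well_mads, residual_mad_factor=5.0):
    """Flag wells whose residual MAD exceeds factor * plate median MAD."""
    plate_median = median(well_mads.values())
    return {
        well: value > residual_mad_factor * plate_median
        for well, value in well_mads.items()
    }


# Made-up residual MADs for three wells.
residual_flags = high_residual_flags({"A01": 1.0, "A02": 1.2, "A03": 9.0})
```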