clophfit.fitting.diagnostics
Well-quality diagnostics for plate-reader titration data.
Two complementary entry points:
detect_bad_wells_from_dat(): reads raw .dat files (one per well, all labels together). No fitting required; works before the fitting pipeline. Detects outliers based on a robust trendline between signal span and maximum signal.
detect_bad_wells(): reads ffit*.csv fit results (one label per file). Adds fit-quality criteria (K at bound, K outlier, poor fit) on top of the signal-quality checks.
Detection criteria
K at bound : K equals the optimizer bound (default 3 or 11 for pH). Fit converged to a limit, not a true optimum.
K outlier : |K - median_K| > k_mad_factor * MAD(K) across all wells on the plate. Identifies wells with biologically implausible K.
Poor fit : sK / K > max_sk_ratio. Relative uncertainty so large that K is undetermined.
Low signal / Flat curve : Outlier detection based on a robust Theil-Sen regression between max signal and dynamic range. A well is flagged if its signal span is too low compared to the trend, or if its max signal is significantly below the plate median.
Inverted curve : S0 > S1 for pH or S0 < S1 for Cl (wrong polarity). Only checked in detect_bad_wells() (requires fitted plateaus).
High residuals : per-well residual MAD > residual_mad_factor times the plate median MAD. Requires the optional residual_stats DataFrame.
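The three K-based criteria above can be sketched in a few lines of standard-library Python. This is only an illustration of the stated inequalities, not the module's implementation: the helper name `flag_k_criteria`, the dict-based input, and the tolerance used to decide that K "equals" a bound are all assumptions made for the example.

```python
import math
from statistics import median


def flag_k_criteria(wells, k_min=3.0, k_max=11.0,
                    k_mad_factor=5.0, max_sk_ratio=0.3):
    """Hypothetical helper: wells maps well name -> (K, sK)."""
    ks = [k for k, _ in wells.values()]
    med_k = median(ks)
    # MAD = median absolute deviation from the plate median of K.
    mad_k = median([abs(k - med_k) for k in ks])
    flags = {}
    for well, (k, sk) in wells.items():
        flags[well] = {
            # K sits on an optimizer bound (tolerance is an assumption).
            "k_at_bound": math.isclose(k, k_min, abs_tol=1e-9)
            or math.isclose(k, k_max, abs_tol=1e-9),
            # |K - median_K| > k_mad_factor * MAD(K)
            "k_outlier": abs(k - med_k) > k_mad_factor * mad_k,
            # sK / K > max_sk_ratio (assumes K is nonzero)
            "poor_fit": sk / k > max_sk_ratio,
        }
    return flags


# Made-up plate: A05 stuck at the upper bound, A06 with a huge uncertainty.
example_wells = {
    "A01": (7.0, 0.1), "A02": (7.1, 0.1), "A03": (6.9, 0.1),
    "A04": (7.05, 0.1), "A05": (11.0, 0.2), "A06": (7.0, 3.5),
}
k_flags = flag_k_criteria(example_wells)
for well, f in k_flags.items():
    print(well, f)
```

In this made-up plate, A05 trips both the bound and the outlier criteria, while A06 is flagged only as a poor fit.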
Functions
| detect_bad_wells_from_dat() | Flag unreliable wells by reading raw .dat titration files. |
| detect_bad_wells() | Flag unreliable wells from a ffit result DataFrame. |
Module Contents
- clophfit.fitting.diagnostics.detect_bad_wells_from_dat(data_dir, *, z_threshold=3.0, ctr_cols=None)
Flag unreliable wells by reading raw .dat titration files.
Reads every *.dat file in data_dir (one per well). Each file must have an x column and one or more signal columns (e.g. y1, y2). All labels are checked together; no fitting is required.
- Parameters:
data_dir (str | Path) – Directory containing *.dat files (one per well, CSV format with columns x, y1[, y2, ...]).
z_threshold (float) – Z-score threshold for outlier detection on the max-vs-span trendline.
ctr_cols (list[int] | None) – 1-based column numbers for control wells (e.g. [1, 12]). Currently used only for logging; all flags apply equally to CTR wells.
- Returns:
One row per well with columns:
well, flag_low_signal, flag_flat_curve, flag_any, flag_count
Sorted by descending flag_count.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If no *.dat files are found in data_dir.
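To make the max-vs-span trendline idea concrete, here is a minimal standard-library sketch. It is not the library's code. The helper names `theil_sen` and `flag_flat_curves`, the in-memory input (well name mapped to a list of signal values), and in particular the z-score definition (residual from the Theil-Sen line scaled by 1.4826 × MAD) are assumptions for illustration; the documentation above does not specify how the z-score is computed. The separate "max signal below the plate median" check is not covered here.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Theil-Sen line: median of pairwise slopes, then median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def flag_flat_curves(signals, z_threshold=3.0):
    """Hypothetical helper: flag wells whose span falls far below the trend."""
    wells = list(signals)
    max_sig = [max(signals[w]) for w in wells]
    span = [max(signals[w]) - min(signals[w]) for w in wells]
    slope, intercept = theil_sen(max_sig, span)
    resid = [s - (intercept + slope * m) for m, s in zip(max_sig, span)]
    med = median(resid)
    mad = median([abs(r - med) for r in resid])
    sigma = 1.4826 * mad  # MAD rescaled to a std-like unit (assumption)
    flags = {}
    for w, r in zip(wells, resid):
        # With zero spread no robust z-score exists; nothing is flagged.
        z = (r - med) / sigma if sigma > 0 else 0.0
        flags[w] = z < -z_threshold
    return flags


# Made-up signals: span is roughly half the max, except the flat well A06.
example_signals = {
    "A01": [100, 150, 200], "A02": [200, 300, 400],
    "A03": [305, 450, 600], "A04": [395, 600, 800],
    "A05": [500, 750, 1000], "A06": [690, 695, 700],
}
flat_flags = flag_flat_curves(example_signals)
print(flat_flags)
```

Because the slope is a median over all pairs of wells, a handful of flat wells pulls the trendline far less than it would in an ordinary least-squares fit, which is the usual reason to prefer Theil-Sen for this kind of screening.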
- clophfit.fitting.diagnostics.detect_bad_wells(ffit, *, k_min=3.0, k_max=11.0, k_mad_factor=5.0, max_sk_ratio=0.3, z_threshold=3.0, check_polarity=True, is_ph=True, ctr_cols=None, residual_stats=None, residual_mad_factor=5.0)
Flag unreliable wells from a ffit result DataFrame.
- Parameters:
ffit (pd.DataFrame) – Per-well fit results with at minimum columns well, K, sK and at least one pair of S0_{lbl} / S1_{lbl} columns.
k_min (float) – Lower optimizer bound for K (default 3.0 for pH).
k_max (float) – Upper optimizer bound for K (default 11.0 for pH).
k_mad_factor (float) – Outlier threshold: flag if |K - median| > k_mad_factor * MAD.
max_sk_ratio (float) – Maximum tolerated relative uncertainty sK/K (default 0.30).
z_threshold (float) – Z-score threshold for outlier detection on the max-vs-span trendline.
check_polarity (bool) – If True, flag wells where the signal direction is inverted relative to the expected biological response.
is_ph (bool) – If True (default), pH assay: expect S1 > S0 (signal rises with pH). If False, Cl assay: expect S0 > S1.
ctr_cols (list[int] | None) – Column numbers (1-based, e.g. [1, 12]) reserved for control wells.
residual_stats (pd.DataFrame | None) – Optional DataFrame from residual_stats_*.csv.
residual_mad_factor (float) – Flag if per-well residual MAD > residual_mad_factor times the plate median MAD (default 5.0).
- Returns:
One row per well with boolean flag columns.
- Return type:
pd.DataFrame
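The polarity and residual checks described by check_polarity, is_ph and residual_mad_factor reduce to two simple comparisons. The sketch below illustrates them with plain Python values. The helper names `is_inverted` and `flag_high_residuals` and the dict input are hypothetical, and the real function works on DataFrame columns rather than on scalars.

```python
from statistics import median


def is_inverted(s0, s1, is_ph=True):
    """Hypothetical helper: True if the fitted plateaus have the wrong order.

    pH assay expects S1 > S0, so S0 > S1 is inverted.
    Cl assay expects S0 > S1, so S0 < S1 is inverted.
    """
    return s0 > s1 if is_ph else s0 < s1


def flag_high_residuals(resid_mad, residual_mad_factor=5.0):
    """Hypothetical helper: resid_mad maps well name -> per-well residual MAD."""
    plate_median = median(resid_mad.values())
    return {well: mad > residual_mad_factor * plate_median
            for well, mad in resid_mad.items()}


# Made-up plateaus: the same pair is inverted for pH but fine for Cl.
print(is_inverted(500.0, 100.0, is_ph=True))
print(is_inverted(500.0, 100.0, is_ph=False))

# Made-up residual MADs: A04 is far noisier than the rest of the plate.
example_resid_mad = {"A01": 1.0, "A02": 1.2, "A03": 0.8, "A04": 9.0}
resid_flags = flag_high_residuals(example_resid_mad)
print(resid_flags)
```

Comparing each well against a multiple of the plate median, rather than against a fixed cutoff, keeps the residual check independent of the absolute signal scale of a given plate.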