clophfit.fitting.diagnostics
============================

.. py:module:: clophfit.fitting.diagnostics

.. autoapi-nested-parse::

   Well-quality diagnostics for plate-reader titration data.

   Two complementary entry points:

   - :func:`detect_bad_wells_from_dat`: reads raw ``.dat`` files (one per
     well, all labels together). No fitting required; works before the fitting
     pipeline. Detects outliers using a robust trendline between signal span
     and maximum signal.
   - :func:`detect_bad_wells`: reads ``ffit*.csv`` fit results (one label per
     file). Adds fit-quality criteria (K at bound, K outlier, poor fit) on top
     of the signal-quality checks.

   Detection criteria
   ------------------

   - **K at bound** : K equals the optimizer bound (default 3 or 11 for pH).
     The fit converged to a limit, not a true optimum.
   - **K outlier** : \|K - median_K\| > ``k_mad_factor * MAD(K)`` across all
     wells on the plate. Identifies wells with biologically implausible K.
   - **Poor fit** : sK / K > ``max_sk_ratio``. The relative uncertainty is so
     large that K is undetermined.
   - **Low signal / Flat curve** : Outlier detection based on robust Theil-Sen
     regression between max signal and dynamic range. A well is flagged if its
     signal span is too low compared to the trend, or if its max signal is
     significantly below the plate median.
   - **Inverted curve** : S0 > S1 for pH or S0 < S1 for Cl (wrong polarity).
     Only checked in :func:`detect_bad_wells` (requires fitted plateaus).
   - **High residuals** : per-well residual MAD > ``residual_mad_factor``
     times the plate median MAD. Requires the optional ``residual_stats``
     DataFrame.


Functions
---------

.. autoapisummary::

   clophfit.fitting.diagnostics.detect_bad_wells_from_dat
   clophfit.fitting.diagnostics.detect_bad_wells


Module Contents
---------------

.. py:function:: detect_bad_wells_from_dat(data_dir, *, z_threshold = 3.0, ctr_cols = None)

   Flag unreliable wells by reading raw ``.dat`` titration files.

   Reads every ``*.dat`` file in *data_dir* (one per well). Each file must
   have an ``x`` column and one or more signal columns (e.g. ``y1``, ``y2``).
   All labels are checked together; no fitting is required.

   :param data_dir: Directory containing ``*.dat`` files (one per well, CSV
                    format with columns ``x, y1[, y2, ...]``).
   :type data_dir: str | Path
   :param z_threshold: Z-score threshold for outlier detection on the
                       max-vs-span trendline.
   :type z_threshold: float
   :param ctr_cols: 1-based column numbers for control wells (e.g.
                    ``[1, 12]``). Currently used only for logging; all flags
                    apply equally to CTR wells.
   :type ctr_cols: list[int] | None

   :returns: One row per well with columns:

             - ``well``
             - ``flag_low_signal``
             - ``flag_flat_curve``
             - ``flag_any``
             - ``flag_count``

             Sorted by descending ``flag_count``.
   :rtype: pd.DataFrame

   :raises FileNotFoundError: If no ``*.dat`` files are found in *data_dir*.


.. py:function:: detect_bad_wells(ffit, *, k_min = 3.0, k_max = 11.0, k_mad_factor = 5.0, max_sk_ratio = 0.3, z_threshold = 3.0, check_polarity = True, is_ph = True, ctr_cols = None, residual_stats = None, residual_mad_factor = 5.0)

   Flag unreliable wells from a ffit result DataFrame.

   :param ffit: Per-well fit results with at minimum columns ``well``, ``K``,
                ``sK`` and at least one pair of ``S0_{lbl}`` / ``S1_{lbl}``
                columns.
   :type ffit: pd.DataFrame
   :param k_min: Lower optimizer bound for K (default 3.0 for pH).
   :type k_min: float
   :param k_max: Upper optimizer bound for K (default 11.0 for pH).
   :type k_max: float
   :param k_mad_factor: Outlier threshold: flag if
                        ``|K - median| > k_mad_factor * MAD``.
   :type k_mad_factor: float
   :param max_sk_ratio: Maximum tolerated relative uncertainty sK/K
                        (default 0.30).
   :type max_sk_ratio: float
   :param z_threshold: Z-score threshold for outlier detection on the
                       max-vs-span trendline.
   :type z_threshold: float
   :param check_polarity: If True, flag wells where the signal direction is
                          inverted relative to the expected biological
                          response.
   :type check_polarity: bool
   :param is_ph: If True (default), pH assay: expect S1 > S0 (signal rises
                 with pH). If False, Cl assay: expect S0 > S1.
   :type is_ph: bool
   :param ctr_cols: Column numbers (1-based, e.g. ``[1, 12]``) reserved for
                    control wells.
   :type ctr_cols: list[int] | None
   :param residual_stats: Optional DataFrame from ``residual_stats_*.csv``.
   :type residual_stats: pd.DataFrame | None
   :param residual_mad_factor: Flag if per-well residual MAD >
                               ``residual_mad_factor`` times the plate median
                               MAD (default 5.0).
   :type residual_mad_factor: float

   :returns: One row per well with boolean flag columns.
   :rtype: pd.DataFrame
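
The fit-quality criteria documented above (K at bound, K outlier, poor fit, inverted curve) can be illustrated with a minimal, standard-library sketch. This is not the module's implementation. The helper names ``mad``, ``flag_fit_quality`` and ``is_inverted``, the flag keys, the unscaled MAD, and the use of ``math.isclose`` for the bound test are all assumptions made for illustration; only the default thresholds are taken from the documented signature of ``detect_bad_wells``.

```python
import math
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (assumed definition of MAD)."""
    values = list(values)
    med = median(values)
    return median(abs(v - med) for v in values)


def flag_fit_quality(wells, k_min=3.0, k_max=11.0,
                     k_mad_factor=5.0, max_sk_ratio=0.3):
    """Hypothetical helper: apply the three K-based criteria.

    ``wells`` maps a well name to a ``(K, sK)`` tuple. Returns a dict
    mapping each well name to a dict of boolean flags (key names are
    illustrative, not the library's column names).
    """
    ks = [k for k, _ in wells.values()]
    med_k = median(ks)
    mad_k = mad(ks)
    flags = {}
    for well, (k, sk) in wells.items():
        flags[well] = {
            # K at bound: fit stopped at an optimizer limit.
            "k_at_bound": math.isclose(k, k_min) or math.isclose(k, k_max),
            # K outlier: |K - median_K| > k_mad_factor * MAD(K).
            "k_outlier": abs(k - med_k) > k_mad_factor * mad_k,
            # Poor fit: relative uncertainty sK / K above the tolerance.
            "poor_fit": sk / k > max_sk_ratio,
        }
    return flags


def is_inverted(s0, s1, is_ph=True):
    """Inverted curve: S0 > S1 for pH, S0 < S1 for Cl."""
    return s0 > s1 if is_ph else s0 < s1


# Made-up example values, for illustration only.
wells = {
    "A01": (7.0, 0.1),
    "A02": (7.1, 0.1),
    "A03": (6.9, 0.1),
    "A04": (7.2, 0.2),
    "A05": (11.0, 0.5),   # sits on the upper bound and far from the median
    "A06": (7.05, 3.0),   # plausible K but very large uncertainty
}
flags = flag_fit_quality(wells)
```

With these made-up values, ``A05`` is flagged both as at-bound and as a K outlier, while ``A06`` is flagged only for its large relative uncertainty. A real plate may need a guard for ``MAD(K) == 0``, which this sketch does not handle.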