clophfit.fitting.diagnostics
============================

.. py:module:: clophfit.fitting.diagnostics

.. autoapi-nested-parse::

   Well-quality diagnostics for plate-reader titration data.

   Two complementary entry points:

   - :func:`detect_bad_wells_from_dat`: reads raw ``.dat`` files (one per well,
     all labels together).  No fitting is required, so it can run before the
     fitting pipeline.  Detects outliers using a robust trendline between signal
     span and maximum signal.

   - :func:`detect_bad_wells`: works on ``ffit*.csv`` fit results (one label per
     file).  Adds fit-quality criteria (K at bound, K outlier, poor fit) on top
     of the signal-quality checks.

   Detection criteria
   ------------------
   - **K at bound** : K equals an optimizer bound (``k_min`` or ``k_max``;
     defaults 3 and 11 for pH).  The fit ran into a limit rather than converging
     to a true optimum.
   - **K outlier** : \|K - median_K\| > ``k_mad_factor * MAD(K)`` across all
     wells on the plate.  Identifies wells with biologically implausible K.
   - **Poor fit** : sK / K > ``max_sk_ratio``.  Relative uncertainty so large
     that K is undetermined.
   - **Low signal / Flat curve** : outlier detection using a robust Theil-Sen
     regression between max signal and signal span (dynamic range).  A well is
     flagged if its signal span is too low relative to the trend, or if its max
     signal is significantly below the plate median.
   - **Inverted curve** : S0 > S1 for pH or S0 < S1 for Cl -- wrong polarity.
     Only checked in :func:`detect_bad_wells` (requires fitted plateaus).
   - **High residuals** : per-well residual MAD > ``residual_mad_factor`` times
     the plate median MAD.  Requires the optional ``residual_stats`` DataFrame.

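The three K-based criteria above (K at bound, K outlier, poor fit) can be sketched with pandas. This is a minimal illustration under stated assumptions, not the module's implementation: the helper name ``flag_k_criteria``, the output column names, the toy values, the unscaled MAD, and the ``np.isclose`` bound test are all hypothetical.

```python
import numpy as np
import pandas as pd


def flag_k_criteria(ffit, k_min=3.0, k_max=11.0, k_mad_factor=5.0, max_sk_ratio=0.3):
    """Hypothetical helper sketching the K-at-bound, K-outlier and poor-fit flags."""
    k = ffit["K"]
    out = pd.DataFrame({"well": ffit["well"]})
    # K at bound: the fit stopped at an optimizer limit (the tolerance is an assumption).
    out["flag_k_at_bound"] = np.isclose(k, k_min) | np.isclose(k, k_max)
    # K outlier: distance from the plate median, in units of the (unscaled) MAD.
    median_k = k.median()
    mad_k = (k - median_k).abs().median()
    out["flag_k_outlier"] = (k - median_k).abs() > k_mad_factor * mad_k
    # Poor fit: relative uncertainty sK / K above the tolerated ratio.
    out["flag_poor_fit"] = (ffit["sK"] / k) > max_sk_ratio
    return out


# Toy fit results: A04 has a large uncertainty, A05 sits on the upper bound.
ffit = pd.DataFrame(
    {
        "well": ["A01", "A02", "A03", "A04", "A05", "A06"],
        "K": [7.0, 7.1, 6.9, 7.05, 11.0, 7.2],
        "sK": [0.05, 0.06, 0.05, 3.0, 0.1, 0.07],
    }
)
flags = flag_k_criteria(ffit)
print(flags)
```

In this toy plate, well ``A05`` trips both the bound and the outlier test, while ``A04`` is flagged only for its large relative uncertainty.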

Functions
---------

.. autoapisummary::

   clophfit.fitting.diagnostics.detect_bad_wells_from_dat
   clophfit.fitting.diagnostics.detect_bad_wells


Module Contents
---------------

.. py:function:: detect_bad_wells_from_dat(data_dir, *, z_threshold = 3.0, ctr_cols = None)

   Flag unreliable wells by reading raw ``.dat`` titration files.

   Reads every ``*.dat`` file in *data_dir* (one per well).  Each file must
   have an ``x`` column and one or more signal columns (e.g. ``y1``, ``y2``).
   All labels are checked together; no fitting is required.

   :param data_dir: Directory containing ``*.dat`` files (one per well, CSV format with
                    columns ``x, y1[, y2, ...]``).
   :type data_dir: str | Path
   :param z_threshold: Z-score threshold for outlier detection on the max-vs-span trendline.
   :type z_threshold: float
   :param ctr_cols: 1-based column numbers for control wells (e.g. ``[1, 12]``).
                    Currently used only for logging; all flags apply equally to CTR wells.
   :type ctr_cols: list[int] | None

   :returns: One row per well with columns:

             - ``well``
             - ``flag_low_signal``
             - ``flag_flat_curve``
             - ``flag_any``
             - ``flag_count``

             Sorted by descending ``flag_count``.
   :rtype: pd.DataFrame

   :raises FileNotFoundError: If no ``*.dat`` files are found in *data_dir*.

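The input layout and the max-vs-span trendline idea can be sketched as follows. This is an illustration only, assuming single-label files (``x, y1``), a plain pairwise-slope Theil-Sen estimator written in NumPy, and a MAD-scaled z-score; the helper names ``write_toy_dat`` and ``span_vs_max_flags`` and the toy values are hypothetical, and the real function may aggregate labels and score wells differently.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def write_toy_dat(data_dir, maxes, spans):
    """Write one toy single-label ``.dat`` file (CSV with columns x, y1) per well."""
    x = np.linspace(5.0, 9.0, 7)
    for i, (top, span) in enumerate(zip(maxes, spans), start=1):
        y1 = np.linspace(top - span, top, x.size)
        pd.DataFrame({"x": x, "y1": y1}).to_csv(Path(data_dir) / f"A{i:02d}.dat", index=False)


def span_vs_max_flags(data_dir, z_threshold=3.0):
    """Hypothetical helper: robust z-score of signal span against the max-signal trend."""
    rows = []
    for path in sorted(Path(data_dir).glob("*.dat")):
        y = pd.read_csv(path)["y1"]
        rows.append({"well": path.stem, "max": y.max(), "span": y.max() - y.min()})
    out = pd.DataFrame(rows)
    x = out["max"].to_numpy(dtype=float)
    s = out["span"].to_numpy(dtype=float)
    # Theil-Sen: median of all pairwise slopes, then median offset as intercept.
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    slope = np.median((s[j] - s[i])[keep] / dx[keep])
    intercept = np.median(s - slope * x)
    resid = s - (intercept + slope * x)
    scale = 1.4826 * np.median(np.abs(resid))  # MAD scaled to a normal sigma
    out["z"] = resid / scale
    out["flag_flat_curve"] = out["z"] < -z_threshold  # span too low for its max
    return out


with tempfile.TemporaryDirectory() as tmp:
    # Seven wells follow span ~ 0.5 * max; the last one (A08) is nearly flat.
    write_toy_dat(
        tmp,
        maxes=[800, 900, 1000, 1100, 1200, 1300, 1400, 1050],
        spans=[405, 448, 503, 549, 602, 648, 703, 20],
    )
    summary = span_vs_max_flags(tmp)
print(summary[["well", "max", "span", "flag_flat_curve"]])
```

With the library installed, the corresponding call on such a directory would be ``detect_bad_wells_from_dat(data_dir, z_threshold=3.0)``.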

.. py:function:: detect_bad_wells(ffit, *, k_min = 3.0, k_max = 11.0, k_mad_factor = 5.0, max_sk_ratio = 0.3, z_threshold = 3.0, check_polarity = True, is_ph = True, ctr_cols = None, residual_stats = None, residual_mad_factor = 5.0)

   Flag unreliable wells from an ``ffit`` result DataFrame.

   :param ffit: Per-well fit results with at minimum columns ``well``, ``K``, ``sK``
                and at least one pair of ``S0_{lbl}`` / ``S1_{lbl}`` columns.
   :type ffit: pd.DataFrame
   :param k_min: Lower optimizer bound for K (default 3.0 for pH).
   :type k_min: float
   :param k_max: Upper optimizer bound for K (default 11.0 for pH).
   :type k_max: float
   :param k_mad_factor: Outlier threshold: flag if ``|K - median| > k_mad_factor * MAD``.
   :type k_mad_factor: float
   :param max_sk_ratio: Maximum tolerated relative uncertainty sK/K (default 0.30).
   :type max_sk_ratio: float
   :param z_threshold: Z-score threshold for outlier detection on the max-vs-span trendline.
   :type z_threshold: float
   :param check_polarity: If True, flag wells where the signal direction is inverted relative
                          to the expected biological response.
   :type check_polarity: bool
   :param is_ph: If True (default), pH assay: expect S1 > S0 (signal rises with pH).
                 If False, Cl assay: expect S0 > S1.
   :type is_ph: bool
   :param ctr_cols: Column numbers (1-based, e.g. ``[1, 12]``) reserved for control wells.
   :type ctr_cols: list[int] | None
   :param residual_stats: Optional DataFrame from ``residual_stats_*.csv``.
   :type residual_stats: pd.DataFrame | None
   :param residual_mad_factor: Flag if per-well residual MAD > ``residual_mad_factor`` times the
                               plate median MAD (default 5.0).
   :type residual_mad_factor: float

   :returns: One row per well with boolean flag columns.
   :rtype: pd.DataFrame
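
The polarity and residual criteria can be sketched on a minimal ``ffit``-like DataFrame. This is an illustration under assumptions, not the function's implementation: the label ``1`` in ``S0_1`` / ``S1_1``, the ``residual_mad`` Series (standing in for a column of ``residual_stats``), and the helper names are hypothetical.

```python
import pandas as pd


def flag_polarity(ffit, label, is_ph=True):
    """Hypothetical helper: True where the fitted plateaus have the wrong order."""
    s0, s1 = ffit[f"S0_{label}"], ffit[f"S1_{label}"]
    # pH assay expects S1 > S0; Cl assay expects S0 > S1.
    return s0 > s1 if is_ph else s0 < s1


def flag_high_residuals(residual_mad, residual_mad_factor=5.0):
    """Hypothetical helper: per-well residual MAD above factor * plate median MAD."""
    return residual_mad > residual_mad_factor * residual_mad.median()


# Toy fit results with the minimum columns named in the parameter description.
ffit = pd.DataFrame(
    {
        "well": ["A01", "A02", "A03", "A04"],
        "K": [7.0, 7.1, 6.9, 7.2],
        "sK": [0.05, 0.06, 0.05, 0.07],
        "S0_1": [100.0, 120.0, 900.0, 110.0],
        "S1_1": [800.0, 850.0, 200.0, 820.0],
    }
)
# Assumed per-well residual MAD values; the real column name is not given here.
residual_mad = pd.Series([1.0, 1.2, 0.9, 9.0], index=ffit["well"])

inverted_ph = flag_polarity(ffit, "1", is_ph=True)
inverted_cl = flag_polarity(ffit, "1", is_ph=False)
noisy = flag_high_residuals(residual_mad)
print(ffit.assign(flag_inverted=inverted_ph, flag_high_residuals=noisy.to_numpy()))
```

Read as a pH plate, only ``A03`` is inverted; read as a Cl plate, the same data would flag the other three wells instead, which is why ``is_ph`` has to match the assay.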