Fit titration with multiple ts
------------------------------

For example data collected with multiple labelblocks in Tecan plate reader.

“A01”: pH titration with y1 and y2.

.. code:: python

    df = pd.read_csv('../../tests/data/A01.dat', sep=' ', names=['x', 'y1', 'y2'])
    df = df[::-1].reset_index(drop=True)
    df

.. table::

    +---+----------+---------+---------+
    | \ |        x |      y1 |      y2 |
    +===+==========+=========+=========+
    | 0 | 9.030000 | 29657.0 | 22885.0 |
    +---+----------+---------+---------+
    | 1 | 8.373333 | 35200.0 | 16930.0 |
    +---+----------+---------+---------+
    | 2 | 7.750000 | 44901.0 |  9218.0 |
    +---+----------+---------+---------+
    | 3 | 7.073333 | 53063.0 |  3758.0 |
    +---+----------+---------+---------+
    | 4 | 6.460000 | 54202.0 |  2101.0 |
    +---+----------+---------+---------+
    | 5 | 5.813333 | 54851.0 |  1542.0 |
    +---+----------+---------+---------+
    | 6 | 4.996667 | 51205.0 |  1358.0 |
    +---+----------+---------+---------+

lmfit of single y1 using analytical Jacobian
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It computes the Jacobian of the fz. Mind that the residual (i.e. y - fz) will be actually minimized.

.. code:: python

    import sympy
    x, S0_1, S1_1, K = sympy.symbols('x S0_1 S1_1 K')
    f = (S0_1 + S1_1 * 10 ** (K - x)) / (1 + 10 ** (K - x))
    print(sympy.diff(f, S0_1))
    print(sympy.diff(f, S1_1))
    print(sympy.diff(f, K))

::

    1/(10**(K - x) + 1)
    10**(K - x)/(10**(K - x) + 1)
    10**(K - x)*S1_1*log(10)/(10**(K - x) + 1) - 10**(K - x)*(10**(K - x)*S1_1 + S0_1)*log(10)/(10**(K - x) + 1)**2


.. code:: python

    f2 = (S0_1 + S1_1 * x / K) / (1 + x / K)
    print(sympy.diff(f2, S0_1))
    print(sympy.diff(f2, S1_1))
    print(sympy.diff(f2, K))

::

    1/(1 + x/K)
    x/(K*(1 + x/K))
    -S1_1*x/(K**2*(1 + x/K)) + x*(S0_1 + S1_1*x/K)/(K**2*(1 + x/K)**2)


.. code:: python

    def residual(pars, x, data):
        S0 =  pars['S0']
        S1 =  pars['S1']
        K = pars['K']
        #model = (S0 + S1 * x / Kd) / (1 + x / Kd)
        x = np.array(x)
        y = np.array(data)
        model = (S0 + S1 * 10 ** (K - x)) / (1 + 10 ** (K - x))
        if data is None:
            return model
        return (y - model)

    # Try Jacobian
    def dfunc(pars, x, data=None):
        S0_1 =  pars['S0']
        S1_1 =  pars['S1']
        K = pars['K']
        kx = np.array(10**(K - x))
        return np.array([-1 / (kx + 1),
                         -kx / (kx + 1),
                         -kx * np.log(10) * (S1_1 / (kx + 1) - (kx * S1_1 + S0_1) / (kx + 1)**2)])
                         # kx * S1_1 * np.log(10) / (kx + 1) - kx * (kx * S1_1 + S0_1) * np.log(10) / (kx + 1)**2])

    params = lmfit.Parameters()
    params.add('S0', value=25000, min=0.0)
    params.add('S1', value=50000, min=0.0)
    params.add('K', value=7, min=2.0, max=12.0)

    # out = lmfit.minimize(residual, params, args=(df.x,), kws={'data':df.y1})
    # mini = lmfit.Minimizer(residual, params, fcn_args=(df.x, df.y2))
    mini = lmfit.Minimizer(residual, params, fcn_args=(df.x,), fcn_kws={'data':df.y1})
    # res= mini.minimize()
    res= mini.leastsq(Dfun=dfunc, col_deriv=True, ftol=1e-8)

    fit = residual(params, df.x, None)
    print(lmfit.report_fit(res))

    ci = lmfit.conf_interval(mini, res, sigmas=[1, 2, 3])
    lmfit.printfuncs.report_ci(ci)

::

    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 9
        # data points      = 7
        # variables        = 3
        chi-square         = 12308015.2
        reduced chi-square = 3077003.79
        Akaike info crit   = 106.658958
        Bayesian info crit = 106.496688
    [[Variables]]
        S0:  26638.8377 +/- 2455.91825 (9.22%) (init = 25000)
        S1:  54043.3592 +/- 979.995977 (1.81%) (init = 50000)
        K:   8.06961091 +/- 0.14940678 (1.85%) (init = 7)
    [[Correlations]] (unreported correlations are < 0.100)
        C(S0, K)  = -0.775
        C(S1, K)  = -0.455
        C(S0, S1) = 0.205
    None
    /home/dan/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/lmfit/confidence.py:317: UserWarning: Bound reached with prob(S0=0.0) = 0.9944737517813054 < max(sigmas)
      warn(errmsg)
           99.73%    95.45%    68.27%    _BEST_    68.27%    95.45%    99.73%
     S0:      -inf-8376.38956-2895.5618126638.83771+2558.77424+5999.31275+12360.60692
     S1:-6192.81418-2734.30623-1098.2204254043.35921+1113.18257+2829.54298+6725.37841
     K :  -0.98139  -0.40197  -0.15949   8.06961  +0.16276  +0.42591  +1.17272

.. code:: python

    print(lmfit.ci_report(ci, with_offset=False, ndigits=2))

::

       99.73% 95.45% 68.27% _BEST_ 68.27% 95.45% 99.73%
    S0:   -inf18262.4523743.2826638.8429197.6132638.1538999.44
    S1:47850.5551309.0552945.1454043.3655156.5456872.9060768.74
    K :   7.09   7.67   7.91   8.07   8.23   8.50   9.24

emcee
^^^^^

.. code:: python

    res.params.add('__lnsigma', value=np.log(.1), min=np.log(0.001), max=np.log(1e4))
    resMC = lmfit.minimize(residual, method='emcee', steps=3000,
                            nan_policy='omit', is_weighted=False, burn=300, thin=1,
                           params=res.params, args=(df.x, df.y1), progress=True)

::

    100% 3000/3000 [00:36<00:00, 81.99it/s]
    The chain is shorter than 50 times the integrated autocorrelation time for 4 parameter(s). Use this estimate with caution and run a longer chain!
    N/50 = 60;
    tau: [ 89.90422082 126.15518337 114.08544671  82.49207071]


.. code:: python

    plt.plot(resMC.acceptance_fraction, 'o')
    plt.xlabel('walker')
    plt.ylabel('acceptance frac')

.. image:: ../_static/emcee-01.png

.. code:: python

    import corner

    tr = [v for v in resMC.params.valuesdict().values()]
    emcee_plot = corner.corner(resMC.flatchain, labels=resMC.var_names,
                                truths=list(resMC.params.valuesdict().values()))
                                # truths=tr[:-1])

::

    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours

.. image:: ../_static/emcee-02.png

global
~~~~~~

I believe I was using scipy.optimize.

using lmfit with np.r\_ trick
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    # %%timeit #62ms
    def residual2(pars, x, data=None):
        K = pars['K']
        S0_1 =  pars['S0_1']
        S1_1 =  pars['S1_1']
        S0_2 =  pars['S0_2']
        S1_2 =  pars['S1_2']
        model_0 = (S0_1 + S1_1 * 10 ** (K - x[0])) / (1 + 10 ** (K - x[0]))
        model_1 = (S0_2 + S1_2 * 10 ** (K - x[1])) / (1 + 10 ** (K - x[1]))
        if data is None:
            return np.r_[model_0, model_1]
        return np.r_[data[0] - model_0, data[1] - model_1]


    params2 = lmfit.Parameters()
    params2.add('K', value=7.0, min=2.0, max=12.0)
    params2.add('S0_1', value=df.y1[0], min=0.0)
    params2.add('S0_2', value=df.y2[0], min=0.0)
    params2.add('S1_1', value=df.y1.iloc[-1], min=0.0)
    params2.add('S1_2', value=df.y2.iloc[-1], min=0.0)
    mini2 = lmfit.Minimizer(residual2, params2, fcn_args=([df.x, df.x],), fcn_kws={'data': [df.y1, df.y2]})
    res2 = mini2.minimize()
    print(lmfit.fit_report(res2))

    ci2, tr2 = lmfit.conf_interval(mini2, res2, sigmas=[.68, .95], trace=True)
    print(lmfit.ci_report(ci2, with_offset=False, ndigits=2))

::

    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 37
        # data points      = 14
        # variables        = 5
        chi-square         = 12471473.3
        reduced chi-square = 1385719.25
        Akaike info crit   = 201.798560
        Bayesian info crit = 204.993846
    [[Variables]]
        K:     8.07255057 +/- 0.07600744 (0.94%) (init = 7)
        S0_1:  26601.3422 +/- 1425.69369 (5.36%) (init = 29657)
        S0_2:  25084.4220 +/- 1337.07555 (5.33%) (init = 22885)
        S1_1:  54034.5797 +/- 627.642878 (1.16%) (init = 51205)
        S1_2:  1473.57942 +/- 616.944953 (41.87%) (init = 1358)
    [[Correlations]] (unreported correlations are < 0.100)
        C(K, S0_1)    = -0.682
        C(K, S0_2)    = 0.626
        C(S0_1, S0_2) = -0.426
        C(K, S1_1)    = -0.361
        C(K, S1_2)    = 0.316
        C(S0_2, S1_1) = -0.226
        C(S0_1, S1_2) = -0.215
        C(S1_1, S1_2) = -0.114
          95.00% 68.00% _BEST_ 68.00% 95.00%
     K   :   7.91   7.99   8.07   8.15   8.24
     S0_1:23210.9025078.6226601.3428045.4929623.53
     S0_2:22232.9723723.9425084.4226514.8828263.75
     S1_1:52629.0453378.2454034.5854695.2655460.17
     S1_2:  72.04 824.011473.582118.982855.89

.. code:: python

    xfit = np.linspace(df.x.min(), df.x.max(), 100)
    yfit0 = residual2(params2, [xfit, xfit])
    yfit0 = yfit0.reshape(2, 100)
    yfit = residual2(res2.params, [xfit, xfit])
    yfit = yfit.reshape(2, 100)
    plt.plot(df.x, df.y1, 'o', df.x, df.y2, 's', xfit, yfit[0], '-', xfit, yfit[1], '-', xfit, yfit0[0], '--', xfit, yfit0[1], '--')
    plt.grid(True)

.. image:: ../_static/glmfit_np.r_.png

lmfit constraints aiming for generality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I believe a name convention would be more robust than relying on OrderedDict Params object.

.. code:: python

    "S0_1".split("_")[0]

::

    S0


.. code:: python

    def exception_fcn_handler(func):
        def inner_function(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except TypeError:
                print(f"{func.__name__} only takes (1D) vector as argument besides lmfit.Parameters.")
        return inner_function

    @exception_fcn_handler
    def titration_pH(params, pH):
        p = {k.split("_")[0]: v for k, v in params.items()}
        return (p["S0"] + p["S1"] * 10 ** (p["K"] - pH)) / (1 + 10 ** (p["K"] - pH))

    def residues(params, x, y, fcn):
        return y - fcn(params, x)


    p1 = lmfit.Parameters()
    p2 = lmfit.Parameters()
    p1.add("K_1", value=7., min=2.0, max=12.0)
    p2.add("K_2", value=7., min=2.0, max=12.0)
    p1.add("S0_1", value=df.y1.iloc[0], min=0.0)
    p2.add("S0_2", value=df.y2.iloc[0], min=0.0)
    p1.add("S1_1", value=df.y1.iloc[-1], min=0.0)
    p2.add("S1_2", value=df.y2.iloc[-1], min=0.0)

    print(residues(p1, np.array(df.x), [1.97, 1.8, 1.7, 0.1, 0.1, .16, .01], titration_pH))

    def gobjective(params, xl, yl, fcnl):
        nset = len(xl)
        res = []
        for i in range(nset):
            pi = {k: v for k, v in params.valuesdict().items() if k[-1]==f"{i+1}"}
            res = np.r_[res, residues(pi, xl[i], yl[i], fcnl[i])]
            # res = np.r_[res, yl[i] - fcnl[i](parsl[i], x[i])]
        return res

    print(gobjective(p1+p2, [df.x, df.x], [df.y1, df.y2], [titration_pH, titration_pH]))

::

    [-29854.26823732 -30530.32007939 -32908.60749879 -39523.42660007
     -46381.47878947 -49888.5091843  -50993.25866394]
    [  -199.23823732   4667.87992061  11990.69250121  13539.47339993
       7820.42121053   4962.3308157     211.73133606    199.04406603
      -5080.73278499 -10416.86307191  -9270.08900503  -4075.72045662
      -1131.04796128   -211.52498939]


Here single.

.. code:: python

    mini = lmfit.Minimizer(residues, p1, fcn_args=(df.x, df.y1, titration_pH, ))
    res= mini.minimize()

    fit = titration_pH(res.params, df.x)
    print(lmfit.report_fit(res))
    plt.plot(df.x, df.y1, "o", df.x, fit, "--")
    ci = lmfit.conf_interval(mini, res, sigmas=[1, 2])
    lmfit.printfuncs.report_ci(ci)

::

    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 25
        # data points      = 7
        # variables        = 3
        chi-square         = 12308015.2
        reduced chi-square = 3077003.79
        Akaike info crit   = 106.658958
        Bayesian info crit = 106.496688
    [[Variables]]
        K_1:   8.06961101 +/- 0.14940677 (1.85%) (init = 7)
        S0_1:  26638.8364 +/- 2455.91626 (9.22%) (init = 29657)
        S1_1:  54043.3589 +/- 979.996129 (1.81%) (init = 51205)
    [[Correlations]] (unreported correlations are < 0.100)
        C(K_1, S0_1)  = -0.775
        C(K_1, S1_1)  = -0.455
        C(S0_1, S1_1) = 0.205
    None
             95.45%    68.27%    _BEST_    68.27%    95.45%
     K_1 :  -0.40197  -0.15949   8.06961  +0.16276  +0.42592
     S0_1:-8376.38827-2895.5605426638.83642+2558.77552+5999.33126
     S1_1:-2734.30657-1098.2200654043.35891+1113.18279+2829.55530

.. image:: ../_static/glmfit0.png

Now global.

.. code:: python

    # %%timeit #66ms
    pg = p1 + p2
    pg['K_2'].expr = 'K_1'
    # gmini = lmfit.Minimizer(gobjective, pg, fcn_args=([df.x[1:], df.x], [df.y1[1:], df.y2], [titration_pH, titration_pH]))
    gmini = lmfit.Minimizer(gobjective, pg, fcn_args=([df.x, df.x], [df.y1, df.y2], [titration_pH, titration_pH]))
    gres= gmini.minimize()
    print(lmfit.fit_report(gres))

    pp1 = {k: v for k, v in gres.params.valuesdict().items() if k.split("_")[1]==f"{1}"}
    pp2 = {k: v for k, v in gres.params.valuesdict().items() if k.split("_")[1]==f"{2}"}
    xfit = np.linspace(df.x.min(), df.x.max(), 100)
    yfit1 = titration_pH(pp1, xfit)
    yfit2 = titration_pH(pp2, xfit)
    plt.plot(df.x, df.y1, "o", xfit, yfit1, "--")
    plt.plot(df.x, df.y2, "s", xfit, yfit2, "--")
    ci = lmfit.conf_interval(gmini, gres, sigmas=[1, 0.95])
    print(lmfit.ci_report(ci, with_offset=False, ndigits=2))

::

    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 37
        # data points      = 14
        # variables        = 5
        chi-square         = 12471473.3
        reduced chi-square = 1385719.25
        Akaike info crit   = 201.798560
        Bayesian info crit = 204.993846
    [[Variables]]
        K_1:   8.07255057 +/- 0.07600744 (0.94%) (init = 7)
        S0_1:  26601.3422 +/- 1425.69369 (5.36%) (init = 29657)
        S1_1:  54034.5797 +/- 627.642878 (1.16%) (init = 51205)
        K_2:   8.07255057 +/- 0.07600744 (0.94%) == 'K_1'
        S0_2:  25084.4220 +/- 1337.07555 (5.33%) (init = 22885)
        S1_2:  1473.57942 +/- 616.944953 (41.87%) (init = 1358)
    [[Correlations]] (unreported correlations are < 0.100)
        C(K_1, S0_1)  = -0.682
        C(K_1, S0_2)  = 0.626
        C(S0_1, S0_2) = -0.426
        C(K_1, S1_1)  = -0.361
        C(K_1, S1_2)  = 0.316
        C(S1_1, S0_2) = -0.226
        C(S0_1, S1_2) = -0.215
        C(S1_1, S1_2) = -0.114

::

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[79], line 16
         14 plt.plot(df.x, df.y1, "o", xfit, yfit1, "--")
         15 plt.plot(df.x, df.y2, "s", xfit, yfit2, "--")
    ---> 16 ci = lmfit.conf_interval(gmini, gres, sigmas=[1, 0.95])
         17 print(lmfit.ci_report(ci, with_offset=False, ndigits=2))

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/lmfit/confidence.py:143, in conf_interval(minimizer, result, p_names, sigmas, trace, maxiter, verbose, prob_func)
        139     sigmas = [1, 2, 3]
        141 ci = ConfidenceInterval(minimizer, result, p_names, prob_func, sigmas,
        142                         trace, verbose, maxiter)
    --> 143 output = ci.calc_all_ci()
        144 if trace:
        145     return output, ci.trace_dict

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/lmfit/confidence.py:218, in ConfidenceInterval.calc_all_ci(self)
        215 out = {}
        217 for p in self.p_names:
    --> 218     out[p] = (self.calc_ci(p, -1)[::-1] +
        219               [(0., self.params[p].value)] +
        220               self.calc_ci(p, 1))
        221 if self.trace:
        222     self.trace_dict = map_trace_to_names(self.trace_dict, self.params)

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/lmfit/confidence.py:258, in ConfidenceInterval.calc_ci(self, para, direction)
        255     ret.append((prob, direction*np.inf))
        256     continue
    --> 258 sol = root_scalar(calc_prob, method='toms748', bracket=sorted([limit, a_limit]), rtol=.5e-4, args=(prob,))
        259 if sol.converged:
        260     val = sol.root

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/scipy/optimize/_root_scalar.py:275, in root_scalar(f, args, method, bracket, fprime, fprime2, x0, x1, xtol, rtol, maxiter, options)
        272         raise ValueError('Bracket needed for %s' % method)
        274     a, b = bracket[:2]
    --> 275     r, sol = methodc(f, a, b, args=args, **kwargs)
        276 elif meth in ['secant']:
        277     if x0 is None:

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/scipy/optimize/_zeros_py.py:1374, in toms748(f, a, b, args, k, xtol, rtol, maxiter, full_output, disp)
       1372     args = (args,)
       1373 solver = TOMS748Solver()
    -> 1374 result = solver.solve(f, a, b, args=args, k=k, xtol=xtol, rtol=rtol,
       1375                       maxiter=maxiter, disp=disp)
       1376 x, function_calls, iterations, flag = result
       1377 return _results_select(full_output, (x, function_calls, iterations, flag))

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/scipy/optimize/_zeros_py.py:1221, in TOMS748Solver.solve(self, f, a, b, args, xtol, rtol, k, maxiter, disp)
       1219 r"""Solve f(x) = 0 given an interval containing a zero."""
       1220 self.configure(xtol=xtol, rtol=rtol, maxiter=maxiter, disp=disp, k=k)
    -> 1221 status, xn = self.start(f, a, b, args)
       1222 if status == _ECONVERGED:
       1223     return self.get_result(xn)

    File ~/workspace/ClopHfit/.hatch/clophfit/lib/python3.10/site-packages/scipy/optimize/_zeros_py.py:1121, in TOMS748Solver.start(self, f, a, b, args)
       1118     return _ECONVERGED, b
       1120 if np.sign(fb) * np.sign(fa) > 0:
    -> 1121     raise ValueError("a, b must bracket a root f(%e)=%e, f(%e)=%e " %
       1122                      (a, fa, b, fb))
       1123 self.fab[:] = [fa, fb]
       1125 return _EINPROGRESS, sum(self.ab) / 2.0

    ValueError: a, b must bracket a root f(7.844528e+00)=3.045473e-01, f(7.906050e+00)=2.673105e-01

.. image:: ../_static/glmfit1.png

To plot ci for the 5 parameters.

.. code:: python

    fig, axes = plt.subplots(1, 4, figsize=(24.2, 4.8), sharey=True)
    cx, cy, grid = lmfit.conf_interval2d(gmini, gres, 'S0_1', 'K_1', 25, 25)
    ctp = axes[0].contourf(cx, cy, grid, np.linspace(0, 1, 11))
    fig.colorbar(ctp, ax=axes[0])
    axes[0].set_xlabel('SA1')
    axes[0].set_ylabel('pK1')
    cx, cy, grid = lmfit.conf_interval2d(gmini, gres, 'S0_2', 'K_1', 25, 25)
    ctp = axes[1].contourf(cx, cy, grid, np.linspace(0, 1, 11))
    fig.colorbar(ctp, ax=axes[1])
    axes[1].set_xlabel('SA2')
    axes[1].set_ylabel('pK1')
    cx, cy, grid = lmfit.conf_interval2d(gmini, gres, 'S1_1', 'K_1', 25, 25)
    ctp = axes[2].contourf(cx, cy, grid, np.linspace(0, 1, 11))
    fig.colorbar(ctp, ax=axes[2])
    axes[2].set_xlabel('SB1')
    axes[2].set_ylabel('pK1')
    cx, cy, grid = lmfit.conf_interval2d(gmini, gres, 'S1_2', 'K_1', 25, 25)
    ctp = axes[3].contourf(cx, cy, grid, np.linspace(0, 1, 11))
    fig.colorbar(ctp, ax=axes[3])
    axes[3].set_xlabel('SB2')
    axes[3].set_ylabel('pK1')

.. image:: ../_static/glmfit2.png


.. code:: python

    plt.plot(np.r_[df.x, df.x], gres.residual, "o")

.. image:: ../_static/glmfit3.png

emcee
:::::

.. code:: python

    gmini.params.add('__lnsigma', value=np.log(.1), min=np.log(0.001), max=np.log(2))
    gresMC = lmfit.minimize(gobjective, method='emcee', steps=1800, #workers=8,
                            nan_policy='omit', burn=30, is_weighted=False, #thin=20,
                            params=gmini.params, args=([df.x, df.x], [df.y1, df.y2], [titration_pH, titration_pH]), progress=True)

::

    100% 1800/1800 [05:17<00:00,  5.66it/s]
    The chain is shorter than 50 times the integrated autocorrelation time for 5 parameter(s). Use this estimate with caution and run a longer chain!
    N/50 = 36;
    tau: [ 25.20679429  64.86628075  40.01735791  82.79200202 114.97290655
      87.1914766 ]


This next block comes from: `https://lmfit.github.io/lmfit-py/examples/example_emcee_Model_interface.html?highlight=emcee <https://lmfit.github.io/lmfit-py/examples/example_emcee_Model_interface.html?highlight=emcee>`_

.. code:: python

    emcee_kws = dict(steps=5000, burn=500, thin=20, is_weighted=False,)
    emcee_params = gmini.params.copy()
    emcee_params.add('__lnsigma', value=np.log(0.1), min=np.log(0.001), max=np.log(2.0))

    mi = lmfit.Minimizer(gobjective, emcee_params, fcn_args=([df.x, df.x], [df.y1, df.y2], [titration_pH, titration_pH]))

    res_emcee = mi.minimize(method="emcee", steps=500, burn=50, thin=20, is_weighted=False)

::

    100% 500/500 [01:34<00:00,  5.30it/s]
    The chain is shorter than 50 times the integrated autocorrelation time for 6 parameter(s). Use this estimate with caution and run a longer chain!
    N/50 = 10;
    tau: [29.23479726 21.62881729 35.43287521 36.66580039 29.46506312 60.95433911]


.. code:: python

    # result_emcee = model.fit(data=y, x=x, params=emcee_params, method='emcee',
    #                          nan_policy='omit', fit_kws=emcee_kws)

    lmfit.report_fit(res_emcee)

::

    [[Fit Statistics]]
        # fitting method   = emcee
        # function evals   = 50000
        # data points      = 14
        # variables        = 6
        chi-square         = 3126257.04
        reduced chi-square = 390782.130
        Akaike info crit   = 184.428056
        Bayesian info crit = 188.262400
    [[Variables]]
        K_1:        8.07253629 +/- 0.02377606 (0.29%) (init = 8.072551)
        S0_1:       26607.0352 +/- 221.686080 (0.83%) (init = 26601.34)
        S1_1:       54031.8395 +/- 593.213078 (1.10%) (init = 54034.58)
        K_2:        8.07253629 == 'K_1'
        S0_2:       25011.8630 +/- 1294.50486 (5.18%) (init = 25084.42)
        S1_2:       1473.83633 +/- 88.7501077 (6.02%) (init = 1473.579)
        __lnsigma:  0.69209236 +/- 0.11961456 (17.28%) (init = -2.302585)
    [[Correlations]] (unreported correlations are < 0.100)
        C(S1_1, S0_2) = 0.913
        C(S0_1, S0_2) = -0.666
        C(S0_1, S1_1) = -0.632
        C(K_1, S0_1)  = -0.373
        C(K_1, S1_1)  = -0.265
        C(S0_2, S1_2) = -0.239
        C(K_1, S1_2)  = 0.238
        C(S1_1, S1_2) = -0.215
        C(K_1, S0_2)  = -0.201


.. code:: python

    plt.plot(gresMC.acceptance_fraction, 'o')
    plt.xlabel('walker')
    plt.ylabel('acceptance frac')

.. image:: ../_static/emcee-11.png


.. code:: python

    import corner

    tr = [v for v in gresMC.params.valuesdict().values()]
    emcee_plot = corner.corner(gresMC.flatchain, labels=gresMC.var_names,
                                # truths=list(gresMC.params.valuesdict().values()))
                                truths=tr[:-1])

::

    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours

.. image:: ../_static/emcee-12.png

.. code:: python

    lmfit.report_fit(gresMC.params)

::

    [[Variables]]
        K_1:        8.07256863 +/- 1.8571e-04 (0.00%) (init = 8.072551)
        S0_1:       26601.2542 +/- 3.55016167 (0.01%) (init = 26601.34)
        S1_1:       54034.8079 +/- 1.60777173 (0.00%) (init = 54034.58)
        K_2:        8.07256863 == 'K_1'
        S0_2:       25084.6292 +/- 3.32916391 (0.01%) (init = 25084.42)
        S1_2:       1473.87013 +/- 4.74264114 (0.32%) (init = 1473.579)
        __lnsigma:  0.69314685 +/- 2.5390e-05 (0.00%) (init = -2.302585)
    [[Correlations]] (unreported correlations are < 0.100)
        C(S0_1, S1_2)      = 0.914
        C(S0_2, S1_2)      = 0.698
        C(K_1, S0_1)       = -0.587
        C(K_1, S1_1)       = -0.545
        C(S0_1, S1_1)      = 0.479
        C(S0_1, S0_2)      = 0.475
        C(S1_1, S1_2)      = 0.465
        C(K_1, S1_2)       = -0.379
        C(S0_2, __lnsigma) = -0.165
        C(S0_1, __lnsigma) = -0.157
        C(S1_1, S0_2)      = 0.156
        C(S1_2, __lnsigma) = -0.156
        C(K_1, __lnsigma)  = 0.133


.. code:: python

    highest_prob = np.argmax(gresMC.lnprob)
    hp_loc = np.unravel_index(highest_prob, gresMC.lnprob.shape)
    mle_soln = gresMC.chain[hp_loc]
    for i, par in enumerate(pg):
        pg[par].value = mle_soln[i]

    print('\nMaximum Likelihood Estimation from emcee       ')
    print('-------------------------------------------------')
    print('Parameter  MLE Value   Median Value   Uncertainty')
    fmt = '  {:5s}  {:11.5f} {:11.5f}   {:11.5f}'.format
    for name, param in pg.items():
        print(fmt(name, param.value, gresMC.params[name].value,
                  gresMC.params[name].stderr))


.. code:: python

    print('\nError estimates from emcee:')
    print('------------------------------------------------------')
    print('Parameter  -2sigma  -1sigma   median  +1sigma  +2sigma')

    for name in pg.keys():
        quantiles = np.percentile(gresMC.flatchain[name],
                                  [2.275, 15.865, 50, 84.135, 97.275])
        median = quantiles[2]
        err_m2 = quantiles[0] - median
        err_m1 = quantiles[1] - median
        err_p1 = quantiles[3] - median
        err_p2 = quantiles[4] - median
        fmt = '  {:5s}   {:8.4f} {:8.4f} {:8.4f} {:8.4f} {:8.4f}'.format
        print(fmt(name, err_m2, err_m1, median, err_p1, err_p2))

bootstrap con pandas
^^^^^^^^^^^^^^^^^^^^

.. code:: python

    %%timeit
    for i in range(100):
        tdf = pd.DataFrame([(j, i) for i in range(7) for j in range(2)]).sample(14, replace=True, ignore_index=False)
        df1 = df[["x", "y1"]].iloc[np.array(tdf[tdf[0]==0][1])]
        df2 = df[["x", "y2"]].iloc[np.array(tdf[tdf[0]==1][1])]


.. code:: python

    # %%timeit
    def idx_sample(npoints):
        tidx = []
        for i in range(npoints):
            tidx.append((np.random.randint(2), np.random.randint(7)))
        idx1 = []
        idx2 = []
        for t in tidx:
            if t[0] == 0:
                idx1.append(t[1])
            elif t[0] == 1:
                idx2.append(t[1])
            else:
                raise Exception("Must never occur")
        return idx1, idx2

    for i in range(100):
        idx1, idx2 = idx_sample(14)
        df1 = df[["x", "y1"]].iloc[idx1].sort_values(by="x", ascending=False).reset_index(drop=True)
        df2 = df[["x", "y2"]].iloc[idx2].sort_values(by="x", ascending=False).reset_index(drop=True)

.. code:: python

    # %%timeit  #5-6 s for nboot=7 now 0.4s
    n_points = len(df)
    nboot=199
    np.random.seed(5)
    best = lmfit.minimize(gobjective, pg, args=([df.x[1:], df.x], [df.y1[1:], df.y2], [titration_pH, titration_pH]))
    nb = {k: [] for k in best.params.keys()}

    for i in range(nboot):
        idx1, idx2 = idx_sample(13)
        df1 = df[["x", "y1"]].iloc[idx1].sort_values(by="x", ascending=False).reset_index(drop=True)
        df2 = df[["x", "y2"]].iloc[idx2].sort_values(by="x", ascending=False).reset_index(drop=True)
        # boot_idxs = np.random.randint(0, n_points, n_points)
        # df2 = df.iloc[boot_idxs]
        # df2=df2.sort_values(by="x", ascending=False).reset_index(drop=True)
        # # df2.reset_index(drop=True, inplace=True)
        # boot_idxs = np.random.randint(0, n_points, n_points)
        # df3 = df.iloc[boot_idxs]
        # # df3.reset_index(drop=True, inplace=True)
        # df3=df3.sort_values(by="x", ascending=False).reset_index(drop=True)
        try:
            out = lmfit.minimize(gobjective, best.params,
                                 args=([df1.x, df2.x], [df1.y1, df2.y2], [titration_pH, titration_pH]),
                                 calc_covar=False, method="leastsq", nan_policy="omit",  scale_covar=False)
            for k,v in out.params.items():
                nb[k].append(v.value)
        except:
            print(df1)
            print(df2)

    # print(nb)

.. code:: python

    np.quantile(nb["K_1"],[0.025, 0.5, 0.975])

::

    array([7.97738191, 8.0781979 , 8.64995886])


.. code:: python

    sb.kdeplot(data=nb, x="K_1", y="S1_2")

.. image:: ../_static/bs_pd_f1.png


.. code:: python

    # nb.drop("K_2", axis=1, inplace=True)
    with sb.axes_style("darkgrid"):
        g = sb.PairGrid(pd.DataFrame(nb), diag_sharey=False, vars=["K_1", "S1_1", "S1_2"])
        g.map_upper(plt.hexbin, bins='log', gridsize=20, cmap="Blues", mincnt=2)
        g.map_lower(sb.kdeplot, cmap="viridis_r", fill=True)
        g.map_diag(sb.histplot, kde=True)

.. image:: ../_static/bs_pd_f2.png


.. code:: python

    sb.violinplot(data=nb, x="K_1", split=True)

.. image:: ../_static/bs_pd_f3.png


.. code:: python

    g = sb.jointplot(y="S1_2", x="K_1", data=nb, marker="+", s=25, marginal_kws=dict(bins=25, fill=False, kde=True), color="#2075AA", marginal_ticks=True, height=5, ratio=2)
    g.plot_joint(sb.kdeplot, color="r", zorder=0, levels=5)

.. image:: ../_static/bs_pd_f4.png


.. code:: python

    g = sb.JointGrid(data=nb, x="K_1", y="S1_2")
    g.plot_joint(sb.histplot)
    g.plot_marginals(sb.boxplot)

.. image:: ../_static/bs_pd_f5.png


.. code:: python

    f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.25, .75)})

    sb.histplot(data=nb, x="K_1", kde=True, ax=ax_hist)

    sb.boxplot(x="K_1", data=nb, whis=[2.5, 97.5], ax=ax_box)
    sb.stripplot(x="K_1", data=nb, color=".3", alpha=0.2, ax=ax_box)
    ax_box.set(xlabel='')
    f.tight_layout()
    # ax = sb.violinplot(x="K_1", data=nb, inner=None, color="r")

.. image:: ../_static/bs_pd_f6.png


.. code:: python

    import corner

    g = corner.corner(pd.DataFrame(nb)[["K_1", "S1_1", "S1_2"]], labels=list(nb.keys()))

::

    WARNING:root:Too few points to create valid contours

.. image:: ../_static/bs_pd_f7.png

using R
^^^^^^^

.. code:: R

    d <- read.table("../../tests/data/A01.dat")
    fit = nls(V2 ~ (SB + SA * 10 **(pK - V1))/ (1 + 10 ** (pK - V1)), start = list(SB=3e4, SA=3e5, pK=7), data=d)
    summary(fit)
    set.seed(4)

::


    Formula: V2 ~ (SB + SA * 10^(pK - V1))/(1 + 10^(pK - V1))

    Parameters:
        Estimate Std. Error t value Pr(>|t|)
    SB 2.664e+04  2.456e+03   10.85  0.00041 ***
    SA 5.404e+04  9.800e+02   55.15 6.47e-07 ***
    pK 8.070e+00  1.494e-01   54.01 7.03e-07 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1754 on 4 degrees of freedom

    Number of iterations to convergence: 9
    Achieved convergence tolerance: 1.51e-06

.. code:: R

    confint(fit)

::

    Waiting for profiling to be done...
               2.5%       97.5%
    SB 18604.738923 32461.32421
    SA 51396.339658 56779.63168
    pK     7.680826     8.48057


.. code:: R

    fz <- function(x, SA1, SB1, SA2, SB2, pK){
      y1 <- (SB1 + SA1 * 10 **(pK - x))/ (1 + 10 ** (pK - x))
      y2 <- (SB2 + SA2 * 10 **(pK - x))/ (1 + 10 ** (pK - x))
      return(rbind(y1,y2))
    }
    ##fitg = nls(rbind(V2, V3) ~ fz(V1, SA1, SB1, SA2, SB2, pK),         start = list(SB1=3e4, SA1=3e5, SB2=3e4, SA2=3e5, pK=7), data=d)
    ##fitg = nls(c(V2, V3) ~ c((SB1 + SA1 * 10 **(pK - V1))/ (1 + 10 ** (pK - V1)), (SB2 + SA2 * 10 **(pK - V1))/ (1 + 10 ** (pK - V1))),         start = list(SB1=3e4, SA1=3e5, SB2=3e4, SA2=3e5, pK=7), data=d)

`https://stats.stackexchange.com/questions/44246/nls-curve-fitting-of-nested-shared-parameters <https://stats.stackexchange.com/questions/44246/nls-curve-fitting-of-nested-shared-parameters>`_

.. code:: R

    library("nlstools")
    n1 <- length(d$V2)
    n2 <- length(d$V3)

    # separate fits:
    fit1 = nls(V2 ~ (SB1 + SA1 * 10 **(pK - V1))/ (1 + 10 ** (pK - V1)),
               start = list(SB1=3e4, SA1=3e5, pK=7), data=d)
    fit2 = nls(V3 ~ (SB2 + SA2 * 10 **(pK - V1))/ (1 + 10 ** (pK - V1)),
               start = list(SB2=3e4, SA2=3e5, pK=7), data=d)

    #set up stacked variables:
    ## y <- c(y1,y2); x <- c(x1,x2)
    y <- c(d$V2,d$V3)

    lcon1 <- rep(c(1,0), c(n1,n2))
    lcon2 <- rep(c(0,1), c(n1,n2))
    mcon1 <- lcon1
    mcon2 <- lcon2

    # combined fit with common 'c' parameter, other parameters separate
    fitg = nls(y ~ mcon1*(SB1 + SA1 * 10 **(pK - V1))/ (1 + 10 ** (pK - V1)) + mcon2*(SB2 + SA2 * 10 **(pK - V1))/ (1 + 10 ** (pK - V1)),
           start = list(SB1=3e4, SA1=3e5, SB2=3e4, SA2=3e5, pK=7), data=d)

    confint2(fitg)
    confint2(fit1)
    confint2(fit2)

::

               2.5 %       97.5 %
    SB1 23376.154137 29826.554415
    SA1 52614.760849 55454.403951
    SB2 22059.687893 28109.136342
    SA2    77.955582  2869.198281
    pK      7.900608     8.244491
              2.5 %      97.5 %
    SB1 19820.10513 33457.59221
    SA1 51322.45855 56764.26498
    pK      7.65479     8.48443
               2.5 %       97.5 %
    SB2 24352.669239 25919.322982
    SA2  1175.819778  1795.244474
    pK      8.022244     8.132196

.. code:: R

    nlstools::confint2(fitg)

::

               2.5 %       97.5 %
    SB1 23376.154137 29826.554415
    SA1 52614.760849 55454.403951
    SB2 22059.687893 28109.136342
    SA2    77.955582  2869.198281
    pK      7.900608     8.244491


.. code:: R

    nlstools::plotfit(fit2)

.. image:: ../_static/gR_fit1.png

.. code:: R

    nlstools::overview(fitg)

::


    ------
    Formula: y ~ mcon1 * (SB1 + SA1 * 10^(pK - V1))/(1 + 10^(pK - V1)) + mcon2 *
        (SB2 + SA2 * 10^(pK - V1))/(1 + 10^(pK - V1))

    Parameters:
         Estimate Std. Error t value Pr(>|t|)
    SB1 2.660e+04  1.426e+03  18.658 1.67e-08 ***
    SA1 5.403e+04  6.276e+02  86.092 1.95e-14 ***
    SB2 2.508e+04  1.337e+03  18.760 1.60e-08 ***
    SA2 1.474e+03  6.169e+02   2.389   0.0407 *
    pK  8.073e+00  7.601e-02 106.207 2.95e-15 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1177 on 9 degrees of freedom

    Number of iterations to convergence: 7
    Achieved convergence tolerance: 7.71e-07

    ------
    Residual sum of squares: 12500000

    ------
    t-based confidence interval:
                2.5%        97.5%
    SB1 23376.154137 29826.554415
    SA1 52614.760849 55454.403951
    SB2 22059.687893 28109.136342
    SA2    77.955582  2869.198281
    pK      7.900608     8.244491

    ------
    Correlation matrix:
                SB1         SA1          SB2          SA2         pK
    SB1  1.00000000  0.06634912 -0.426385167 -0.215493852 -0.6816295
    SA1  0.06634912  1.00000000 -0.225860127 -0.114149066 -0.3610654
    SB2 -0.42638517 -0.22586013  1.000000000  0.002758745  0.6255380
    SA2 -0.21549385 -0.11414907  0.002758745  1.000000000  0.3161451
    pK  -0.68162953 -0.36106540  0.625537996  0.316145125  1.0000000

.. code:: R

    nlstools::test.nlsResiduals(nlstools::nlsResiduals(fitg))

::


    ------
    	Shapiro-Wilk normality test

    data:  stdres
    W = 0.82711, p-value = 0.01102


    ------
    	Runs Test

    data:  as.factor(run)
    Standard Normal = 0.081275, p-value = 0.9352
    alternative hypothesis: two.sided

.. code:: R

    plot(nlstools::nlsResiduals(fitg))
    ## plot(nlsResiduals(fitg))

.. image:: ../_static/gR_fit2.png

.. code:: R

    plot(nlstools::nlsConfRegions(fit))

.. image:: ../_static/gR_fit3.png

.. code:: R

    plot(nlstools::nlsContourRSS(fit))

.. image:: ../_static/gR_fit4.png

.. code:: R

    library(nlstools)
    nb = nlsBoot(fit, niter=999)
    plot(nb)

.. image:: ../_static/gR_fit5.png

.. code:: R

    plot(nb, type="boxplot")

.. image:: ../_static/gR_fit6.png

.. code:: R

    summary(nb)

::


    ------
    Bootstrap statistics
           Estimate   Std. error
    SB 26516.059610 1930.5821817
    SA 54049.694523  745.3580575
    pK     8.071597    0.1121147

    ------
    Median of bootstrap estimates and percentile confidence intervals
             Median      2.5%        97.5%
    SB 26887.590883 21927.862 29495.527023
    SA 54141.042940 52421.957 55273.697836
    pK     8.078571     7.833     8.274291


.. code:: R

    plot(nlsJack(fit))

.. image:: ../_static/gR_fit7.png

.. code:: R

    summary(nlsJack(fit))

::


    ------
    Jackknife statistics
          Estimates          Bias
    SB 29534.416585 -2.895568e+03
    SA 54043.021568  3.401935e-01
    pK     7.971888  9.772245e-02

    ------
    Jackknife confidence intervals
                Low           Up
    SB 20101.598249 38967.234921
    SA 50408.254863 57677.788274
    pK     7.600316     8.343459

    ------
    Influential values
    * Observation 7 is influential on SB
    * Observation 1 is influential on SA
    * Observation 7 is influential on pK

lmfit.Model
^^^^^^^^^^^

It took 9 vs 5 ms.
It is not possible to do global fitting. In the documentation it is stressed the need to convert the output of the residue to be 1D vectors.

.. code:: python

    mod = lmfit.models.ExpressionModel("(SB + SA * 10**(pK-x)) / (1 + 10**(pK-x))")
    result = mod.fit(np.array(df.y1), x=np.array(df.x), pK=7, SB=7e3, SA=10000)
    print(result.fit_report())

::

    [[Model]]
        Model(_eval)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 44
        # data points      = 7
        # variables        = 3
        chi-square         = 12308015.2
        reduced chi-square = 3077003.79
        Akaike info crit   = 106.658958
        Bayesian info crit = 106.496688
        R-squared          = 0.97973543
    [[Variables]]
        SB:  26638.8739 +/- 2455.97231 (9.22%) (init = 7000)
        SA:  54043.3677 +/- 979.991414 (1.81%) (init = 10000)
        pK:  8.06960807 +/- 0.14940702 (1.85%) (init = 7)
    [[Correlations]] (unreported correlations are < 0.100)
        C(SB, pK) = -0.775
        C(SA, pK) = -0.455
        C(SB, SA) = 0.205

.. code:: python

    plt.plot(df.x, df.y1, 'o')
    plt.plot(df.x, result.init_fit, '--', label='initial fit')
    plt.plot(df.x, result.best_fit, '-', label='best fit')
    plt.legend()

.. image:: ../_static/lmodel1.png

.. code:: python

    print(result.ci_report())

::

          99.73%    95.45%    68.27%    _BEST_    68.27%    95.45%    99.73%
    SB:-85235.77732-8376.43993-2895.5980126638.87391+2558.73803+5999.27655+12360.57068
    SA:-6192.82267-2734.31472-1098.2289154043.36770+1113.17408+2829.53449+6725.36991
    pK:  -0.98139  -0.40197  -0.15948   8.06961  +0.16277  +0.42592  +1.50918


which is faster but still I failed to find the way to global fitting.

.. code:: python

    def tit_pH(x, S0, S1, K):
        return (S0 + S1 * 10 ** (K - x)) / (1 + 10 ** (K - x))

    tit_model1 = lmfit.Model(tit_pH, prefix="ds1_")
    tit_model2 = lmfit.Model(tit_pH, prefix="ds2_")
    print(f'parameter names: {tit_model1.param_names}')
    print(f'parameter names: {tit_model2.param_names}')
    print(f'independent variables: {tit_model1.independent_vars}')
    print(f'independent variables: {tit_model2.independent_vars}')

    tit_model1.set_param_hint('K', value=7.0, min=2.0, max=12.0)
    tit_model1.set_param_hint('S0', value=df.y1[0], min=0.0)
    tit_model1.set_param_hint('S1', value=df.y1.iloc[-1], min=0.0)
    tit_model2.set_param_hint('K', value=7.0, min=2.0, max=12.0)
    tit_model2.set_param_hint('S0', value=df.y1[0], min=0.0)
    tit_model2.set_param_hint('S1', value=df.y1.iloc[-1], min=0.0)
    pars1 = tit_model1.make_params()
    pars2 = tit_model2.make_params()
    # gmodel = tit_model1 + tit_model2
    # result = gmodel.fit(df.y1 + df.y2, pars, x=df.x)
    res1 = tit_model1.fit(df.y1, pars1, x=df.x)
    res2 = tit_model2.fit(df.y2, pars2, x=df.x)
    print(res1.fit_report())
    print(res2.fit_report())

::

    parameter names: ['ds1_S0', 'ds1_S1', 'ds1_K']
    parameter names: ['ds2_S0', 'ds2_S1', 'ds2_K']
    independent variables: ['x']
    independent variables: ['x']
    [[Model]]
        Model(tit_pH, prefix='ds1_')
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 25
        # data points      = 7
        # variables        = 3
        chi-square         = 12308015.2
        reduced chi-square = 3077003.79
        Akaike info crit   = 106.658958
        Bayesian info crit = 106.496688
        R-squared          = 0.97973543
    [[Variables]]
        ds1_S0:  26638.8364 +/- 2455.91626 (9.22%) (init = 29657)
        ds1_S1:  54043.3589 +/- 979.996129 (1.81%) (init = 51205)
        ds1_K:   8.06961101 +/- 0.14940677 (1.85%) (init = 7)
    [[Correlations]] (unreported correlations are < 0.100)
        C(ds1_S0, ds1_K)  = -0.775
        C(ds1_S1, ds1_K)  = -0.455
        C(ds1_S0, ds1_S1) = 0.205
    [[Model]]
        Model(tit_pH, prefix='ds2_')
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 33
        # data points      = 7
        # variables        = 3
        chi-square         = 159980.530
        reduced chi-square = 39995.1326
        Akaike info crit   = 76.2582808
        Bayesian info crit = 76.0960112
        R-squared          = 0.99963719
    [[Variables]]
        ds2_S0:  25135.9917 +/- 282.132353 (1.12%) (init = 29657)
        ds2_S1:  1485.53109 +/- 111.550019 (7.51%) (init = 51205)
        ds2_K:   8.07721961 +/- 0.01980087 (0.25%) (init = 7)
    [[Correlations]] (unreported correlations are < 0.100)
        C(ds2_S0, ds2_K)  = 0.777
        C(ds2_S1, ds2_K)  = 0.455
        C(ds2_S0, ds2_S1) = 0.205


.. code:: python

    xfit_delta = (df.x.max() - df.x.min()) / 100
    xfit = np.arange(df.x.min() - xfit_delta, df.x.max() + xfit_delta, xfit_delta)
    dely1 = res1.eval_uncertainty(x=xfit) * 1
    dely2 = res2.eval_uncertainty(x=xfit) * 1
    best_fit1 = res1.eval(x=xfit)
    best_fit2 = res2.eval(x=xfit)
    plt.plot(df.x, df.y1, "o")
    plt.plot(df.x, df.y2, "o")
    plt.plot(xfit, best_fit1,"-.")
    plt.plot(xfit, best_fit2,"-.")
    plt.fill_between(xfit, best_fit1 - dely1, best_fit1 + dely1, color='#FEDCBA', alpha=0.5)
    plt.fill_between(xfit, best_fit2 - dely2, best_fit2 + dely2, color='#FEDCBA', alpha=0.5)

.. image:: ../_static/lmodel2.png

Please mind the difference in the uncertainty between the 2 label blocks.

.. code:: python

    def tit_pH2(x, S0_1, S0_2, S1_1, S1_2, K):
        y1 = (S0_1 + S1_1 * 10 **(K - x)) / (1 + 10 **(K - x))
        y2 = (S0_2 + S1_2 * 10 **(K - x)) / (1 + 10 **(K - x))
        # return y1, y2
        return np.r_[y1, y2]

    tit_model = lmfit.Model(tit_pH2)
    tit_model.set_param_hint('K', value=7.0, min=2.0, max=12.0)
    tit_model.set_param_hint('S0_1', value=df.y1[0], min=0.0)
    tit_model.set_param_hint('S0_2', value=df.y2[0], min=0.0)
    tit_model.set_param_hint('S1_1', value=df.y1.iloc[-1], min=0.0)
    tit_model.set_param_hint('S1_2', value=df.y2.iloc[-1], min=0.0)
    pars = tit_model.make_params()
    # res = tit_model.fit([df.y1, df.y2], pars, x=df.x)
    res = tit_model.fit(np.r_[df.y1, df.y2], pars, x=df.x)
    print(res.fit_report())

::

    [[Model]]
        Model(tit_pH2)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 37
        # data points      = 14
        # variables        = 5
        chi-square         = 12471473.3
        reduced chi-square = 1385719.25
        Akaike info crit   = 201.798560
        Bayesian info crit = 204.993846
        R-squared          = 0.99794717
    [[Variables]]
        S0_1:  26601.3422 +/- 1425.69369 (5.36%) (init = 29657)
        S0_2:  25084.4220 +/- 1337.07555 (5.33%) (init = 22885)
        S1_1:  54034.5797 +/- 627.642878 (1.16%) (init = 51205)
        S1_2:  1473.57942 +/- 616.944953 (41.87%) (init = 1358)
        K:     8.07255057 +/- 0.07600744 (0.94%) (init = 7)
    [[Correlations]] (unreported correlations are < 0.100)
        C(S0_1, K)    = -0.682
        C(S0_2, K)    = 0.626
        C(S0_1, S0_2) = -0.426
        C(S1_1, K)    = -0.361
        C(S1_2, K)    = 0.316
        C(S0_2, S1_1) = -0.226
        C(S0_1, S1_2) = -0.215
        C(S1_1, S1_2) = -0.114

.. code:: python

    dely = res.eval_uncertainty(x=xfit)
    # res.plot() # this return error because of the global fit

.. code:: python

    def fit_pH(fp):
        df = pd.read_csv(fp)
        def tit_pH(x, SA, SB, pK):
            return (SB + SA * 10 ** (pK - x)) / (1 + 10 ** (pK - x))
        mod = lmfit.Model(tit_pH)
        pars = mod.make_params(SA=10000, SB=7e3, pK=7)
        result = mod.fit(df.y2, pars, x=df.x)
        return result, df.y2, df.x, mod

    # r,y,x,model = fit_pH("/home/dati/ibf/IBF/Database/Random mutag results/Liasan-analyses/2016-05-19/2014-02-20/pH/dat/C12.dat")
    r,y,x,model = fit_pH("../../tests/data/H04.dat")
    xfit = np.linspace(x.min(),x.max(),50)
    dely = r.eval_uncertainty(x=xfit) * 1
    best_fit = r.eval(x=xfit)
    plt.plot(x, y, "o")
    plt.plot(xfit, best_fit,"-.")
    plt.fill_between(xfit, best_fit-dely,
                     best_fit+dely, color='#FEDCBA', alpha=0.5)
    r.conf_interval(sigmas=[2])
    print(r.ci_report(with_offset=False, ndigits=2))

::

       99.73% 95.45% 68.27% _BEST_ 68.27% 95.45% 99.73%
    SA:2338.404511.625450.156052.536642.137512.329321.96
    SB:33406.9634609.8235170.9935544.4435920.0736492.8937756.24
    pK:   6.47   6.60   6.66   6.70   6.74   6.80   6.93

.. image:: ../_static/lmodel_H04.png


.. code:: python

    g = r.plot()

.. image:: ../_static/lmodel4.png

.. code:: python

    print(r.ci_report())

::

          99.73%    95.45%    68.27%    _BEST_    68.27%    95.45%    99.73%
    SA:-3714.12523-1540.90611-602.372266052.52642+589.60360+1459.79555+3269.43112
    SB:-2137.47471-934.62390-373.4473135544.43896+375.63195+948.44897+2211.80126
    pK:  -0.23398  -0.10021  -0.03976   6.70122  +0.03971  +0.09989  +0.23227


.. code:: python

    emcee_kws = dict(steps=2000, burn=500, thin=2, is_weighted=False,
                     progress=False)
    emcee_params = r.params.copy()
    emcee_params.add('__lnsigma', value=np.log(0.1), min=np.log(0.001), max=np.log(2000.0))
    result_emcee = model.fit(data=y, x=x, params=emcee_params, method='emcee',
                             nan_policy='omit', fit_kws=emcee_kws)

    lmfit.report_fit(result_emcee)

::

    The chain is shorter than 50 times the integrated autocorrelation time for 4 parameter(s). Use this estimate with caution and run a longer chain!
    N/50 = 40;
    tau: [46.79373445 51.93012345 40.4955552  89.70949742]
    [[Fit Statistics]]
        # fitting method   = emcee
        # function evals   = 200000
        # data points      = 7
        # variables        = 4
        chi-square         = 3.33295342
        reduced chi-square = 1.11098447
        Akaike info crit   = 2.80564072
        Bayesian info crit = 2.58928132
        R-squared          = 1.00000000
    [[Variables]]
        SA:         6033.16338 +/- 597.312717 (9.90%) (init = 6052.526)
        SB:         35545.1923 +/- 379.344928 (1.07%) (init = 35544.44)
        pK:         6.70058390 +/- 0.03921574 (0.59%) (init = 6.701225)
        __lnsigma:  6.30231251 +/- 0.38825321 (6.16%) (init = -2.302585)
    [[Correlations]] (unreported correlations are < 0.100)
        C(SA, pK) = 0.717
        C(SB, pK) = 0.518
        C(SA, SB) = 0.200

.. code:: python

    result_emcee.plot_fit()

.. image:: ../_static/lmodel5.png

.. code:: python

    emcee_corner = corner.corner(result_emcee.flatchain, labels=result_emcee.var_names,
                                 truths=list(result_emcee.params.valuesdict().values()))

.. image:: ../_static/lmodel6.png

.. code:: python

    highest_prob = np.argmax(result_emcee.lnprob)
    hp_loc = np.unravel_index(highest_prob, result_emcee.lnprob.shape)
    mle_soln = result_emcee.chain[hp_loc]
    print("\nMaximum Likelihood Estimation (MLE):")
    print('----------------------------------')
    for ix, param in enumerate(emcee_params):
        print(f"{param}: {mle_soln[ix]:.3f}")

    quantiles = np.percentile(result_emcee.flatchain['pK'], [2.28, 15.9, 50, 84.2, 97.7])
    print(f"\n\n1 sigma spread = {0.5 * (quantiles[3] - quantiles[1]):.3f}")
    print(f"2 sigma spread = {0.5 * (quantiles[4] - quantiles[0]):.3f}")

::


    Maximum Likelihood Estimation (MLE):
    ----------------------------------
    SA: 6019.885
    SB: 35528.699
    pK: 6.699
    __lnsigma: 5.939


    1 sigma spread = 0.039
    2 sigma spread = 0.098

TODO See also this tutorial
~~~~~~~~~~~~~~~~~~~~~~~~~~~

`https://www.astro.rug.nl/software/kapteyn/kmpfittutorial.html <https://www.astro.rug.nl/software/kapteyn/kmpfittutorial.html>`_

TODO jackknife to auto-reject
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TODO uncertainty estimate
^^^^^^^^^^^^^^^^^^^^^^^^^

Example 2P Cl–ratio
-------------------

using lmfit.model
~~~~~~~~~~~~~~~~~

.. code:: python

    def fit_Rcl(fp):
        df = pd.read_table(fp)
        def R_Cl(cl, R0, R1, Kd):
            return (R1 * cl + R0 * Kd)/(Kd + cl)
        mod = lmfit.Model(R_Cl)
        pars = mod.make_params(R0=0.8, R1=0.05, Kd=10)
        result = mod.fit(df.R, pars, cl=df.cl)
        return result, df.R, df.cl, mod

    r,y,x,model = fit_Rcl("../../tests/data/ratio2P.txt")
    xfit = np.linspace(x.min(),x.max(),50)
    dely = r.eval_uncertainty(cl=xfit) * 3
    best_fit = r.eval(cl=xfit)
    plt.plot(x, y, "o")
    plt.grid()
    plt.plot(xfit, best_fit,"-.")
    plt.fill_between(xfit, best_fit-dely,
                     best_fit+dely, color='#FEDCBA', alpha=0.5)
    r.conf_interval(sigmas=[2])
    print(r.ci_report(with_offset=False, ndigits=2))

::

       99.73% 95.45% 68.27% _BEST_ 68.27% 95.45% 99.73%
    R0:   0.49   0.58   0.60   0.61   0.62   0.64   0.73
    R1:  -0.30  -0.01   0.03   0.04   0.06   0.09   0.20
    Kd:   2.95  10.09  12.51  13.66  14.91  18.49  59.97

.. image:: ../_static/ratio2P-lmodel1.png

.. code:: python

    emcee_kws = dict(steps=3000, burn=300, thin=2, is_weighted=False,
                     progress=False)
    emcee_params = r.params.copy()
    emcee_params.add('__lnsigma', value=np.log(0.1), min=np.log(0.000001), max=np.log(2000.0))
    result_emcee = model.fit(data=y, cl=x, params=emcee_params, method='emcee',
                             nan_policy='omit', fit_kws=emcee_kws)

::

    The chain is shorter than 50 times the integrated autocorrelation time for 2 parameter(s). Use this estimate with caution and run a longer chain!
    N/50 = 60;
    tau: [ 83.72116791  47.29844186  47.96365641 198.21874013]


.. code:: python

    lmfit.report_fit(result_emcee)

::

    [[Fit Statistics]]
        # fitting method   = emcee
        # function evals   = 300000
        # data points      = 5
        # variables        = 4
        chi-square         = 1.01491984
        reduced chi-square = 1.01491984
        Akaike info crit   = 0.02685860
        Bayesian info crit = -1.53538975
        R-squared          = -6.14152912
    [[Variables]]
        R0:         0.60540963 +/- 0.01716324 (2.83%) (init = 0.6071065)
        R1:         0.04245431 +/- 0.02352093 (55.40%) (init = 0.04390401)
        Kd:         13.8169352 +/- 2.11632273 (15.32%) (init = 13.66125)
        __lnsigma: -4.71757393 +/- 1.52902499 (32.41%) (init = -2.302585)
    [[Correlations]] (unreported correlations are < 0.100)
        C(R1, Kd)        = -1.000
        C(R0, __lnsigma) = -0.303

.. code:: python

    emcee_corner = corner.corner(result_emcee.flatchain, labels=result_emcee.var_names,
                                 truths=list(result_emcee.params.valuesdict().values()))

::

    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours
    WARNING:root:Too few points to create valid contours

.. image:: ../_static/ratio2P-lmodel2.png

.. code:: python

    highest_prob = np.argmax(result_emcee.lnprob)
    hp_loc = np.unravel_index(highest_prob, result_emcee.lnprob.shape)
    mle_soln = result_emcee.chain[hp_loc]
    print("\nMaximum Likelihood Estimation (MLE):")
    print('----------------------------------')
    for ix, param in enumerate(emcee_params):
        print(f"{param}: {mle_soln[ix]:.3f}")

    quantiles = np.percentile(result_emcee.flatchain['Kd'], [2.28, 15.9, 50, 84.2, 97.7])
    print(f"\n\n1 sigma spread = {0.5 * (quantiles[3] - quantiles[1]):.3f}")
    print(f"2 sigma spread = {0.5 * (quantiles[4] - quantiles[0]):.3f}")

::


    Maximum Likelihood Estimation (MLE):
    ----------------------------------
    R0: 0.607
    R1: 0.045
    Kd: 13.602
    __lnsigma: -5.555


    1 sigma spread = 2.127
    2 sigma spread = 917154430706916272373760.000

using R
~~~~~~~

.. code:: R

    d <- read.delim("../../tests/data/ratio2P.txt")
    fitr = nls(R ~ (R1 * cl + R0 * Kd)/(Kd + cl), start = list(R0=0.8, R1=0.05, Kd=10), data=d)
    nlstools::overview(fitr)

::


    ------
    Formula: R ~ (R1 * cl + R0 * Kd)/(Kd + cl)

    Parameters:
        Estimate Std. Error t value Pr(>|t|)
    R0  0.607106   0.006197  97.965 0.000104 ***
    R1  0.043904   0.010314   4.257 0.051000 .
    Kd 13.661249   0.895076  15.263 0.004265 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.006231 on 2 degrees of freedom

    Number of iterations to convergence: 5
    Achieved convergence tolerance: 2.164e-06

    ------
    Residual sum of squares: 7.76e-05

    ------
    t-based confidence interval:
                2.5%      97.5%
    R0  0.5804421912  0.6337708
    R1 -0.0004723916  0.0882804
    Kd  9.8100489312 17.5124485

    ------
    Correlation matrix:
               R0         R1         Kd
    R0  1.0000000  0.1481828 -0.4238954
    R1  0.1481828  1.0000000 -0.8612579
    Kd -0.4238954 -0.8612579  1.0000000

.. code:: R

    nlstools::test.nlsResiduals(nlstools::nlsResiduals(fitr))

::


    ------
    	Shapiro-Wilk normality test

    data:  stdres
    W = 0.8952, p-value = 0.3839


    ------
    	Runs Test

    data:  as.factor(run)
    Standard Normal = 0.65465, p-value = 0.5127
    alternative hypothesis: two.sided

.. code:: R

    plot(nlstools::nlsResiduals(fitr))

.. image:: ../_static/ratio2P_R1.png

.. code:: R

    plot(nlstools::nlsConfRegions(fitr))

.. image:: ../_static/ratio2P_R2.png

.. code:: R

    plot(nlstools::nlsContourRSS(fitr))

.. image:: ../_static/ratio2P_R3.png

.. code:: R

    library(nlstools)
    set.seed(4)
    nb = nlsBoot(fitr, niter=999)
    plot(nb)

.. image:: ../_static/ratio2P_R4.png

.. code:: R

    plot(nb, type="boxplot")

.. image:: ../_static/ratio2P_R5.png

.. code:: R

    summary(nb)

::


    ------
    Bootstrap statistics
          Estimate  Std. error
    R0  0.60701704 0.003940589
    R1  0.04388451 0.006595830
    Kd 13.67402020 0.571780243

    ------
    Median of bootstrap estimates and percentile confidence intervals
            Median        2.5%       97.5%
    R0  0.60786727  0.60160431  0.61225102
    R1  0.04430874  0.03139322  0.05609658
    Kd 13.66608898 12.50884400 14.80687686

.. code:: R

    plot(nlsJack(fitr))

.. image:: ../_static/ratio2P_R6.png

.. code:: R

    summary(nlsJack(fitr))

::


    ------
    Jackknife statistics
        Estimates        Bias
    R0 0.65998921 -0.05288272
    R1 0.05557924 -0.01167524
    Kd 9.23221855  4.42903016

    ------
    Jackknife confidence intervals
                Low         Up
    R0   0.42359388  0.8963845
    R1  -0.06687494  0.1780334
    Kd -12.39589872 30.8603358

    ------
    Influential values
    * Observation 1 is influential on R0
    * Observation 1 is influential on R1
    * Observation 2 is influential on R1
    * Observation 5 is influential on R1
    * Observation 1 is influential on Kd
    * Observation 2 is influential on Kd
    * Observation 5 is influential on Kd

Old scripts
-----------

``fit_titration.py``
~~~~~~~~~~~~~~~~~~~~

- input ← csvtable and note \_file

  - csvtable

  .. image:: ../_static/csvtable.png

  - note \_file

  .. image:: ../_static/note_file.png

- output → pK spK and pdf of analysis


It is a unique script for pK and Cl and various methods:

1. svd

2. bands

3. single lambda

and bootstrapping


I do not know how to unittest
TODO

- average spectra

- join spectra ['B', 'E', 'F']

- compute band integral (or sums)

``fit_titration_global.py``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A script for fitting tuples (y1, y2) of values for each concentration (x). It uses lmfit confint and bootstrap.

- input ← x y1 y2 (file)

  - file

  .. image:: ../_static/file.png

- output →

  - params: K SA1 SB1 SA2 SB2

  - fit.png

  - correl.png

It uses lmfit confint and bootstrap. In global fit the best approach was using lmfit without bootstrap.

.. code:: bash

    for i in *.dat; do gfit $i png2 --boot 99 > png2/$i.txt; done

IBF database uses
~~~~~~~~~~~~~~~~~

Bash scripts (probably moved into prtecan) for:

- ``fit_titration_global.py``

  - `../../src/clophfit/old/bash/fit.tecan <../../src/clophfit/old/bash/fit.tecan>`_

  - `../../src/clophfit/old/bash/fit.tecan.cl <../../src/clophfit/old/bash/fit.tecan.cl>`_

- ``fit_titration.py``

  .. code:: sh

      cd 2014-xx-xx

      (prparser) pr.enspire *.csv

      fit_titration.py meas/Copy_daniele00_893_A.csv A02_37_note.csv -d fit/37C | tee fit/svd_Copy_daniele00_893_A_A02_37_note.txt

      w_ave.sh > pKa.txt

      head pKa??/pKa.txt >> Readme.txt


      # fluorimeter data
      ls > list
      merge.py list
      fit_titration *.csv fluo_note

see: `/home/dati/ibf/IBF/Database/Data and protocols_Liaisan/library after Omnichange mutagenesis/Readme_howto.txt </home/dati/ibf/IBF/Database/Data and protocols_Liaisan/library after Omnichange mutagenesis/Readme_howto.txt>`_