{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# `prtecan` Tutorial\n", "\n", "This tutorial demonstrates how to process Tecan plate reader data using the `clophfit.prtecan` module.\n", "\n", "What you'll learn:\n", "- Tecan file structure and label blocks\n", "- Building titrations from multiple files (manually and from list file)\n", "- Setting plate scheme, loading additions, background handling\n", "- Inspecting and plotting results\n", "- Brief overview of fitting methods and quality control" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from pathlib import Path\n", "\n", "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "\n", "from clophfit import prtecan\n", "\n", "# Point to the tests data directory shipped with the repo\n", "data_root = Path(\"../../tests/Tecan\")\n", "l1_dir = data_root / \"L1\"\n", "l2_dir = data_root / \"140220\" # second dataset used by the original tutorial\n", "l4_dir = data_root / \"L4\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding Tecan file structure\n", "Each Tecan file contains global metadata and one or more label blocks (measurement blocks).\n", "Blocks with identical key metadata are equivalent; blocks differing only by Integration Time, Flashes, or Gain are almost equivalent after normalization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load a single Tecan file and inspect label blocks\n", "tf = prtecan.Tecanfile(l1_dir / \"290513_7.2.xls\")\n", "lb1, lb2 = tf.labelblocks[\"1\"], tf.labelblocks[\"2\"]\n", "print(\"Available label blocks:\", list(tf.labelblocks.keys()))\n", "lb1.metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"\\nSample Data (A01-B06):\")\n", "print({k: v for i, (k, v) in enumerate(lb2.data.items()) if i < 18})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load additional files to compare block equivalence and demonstrate normalization across Gain differences." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf1 = prtecan.Tecanfile(l1_dir / \"290513_5.5.xls\") # two equivalent blocks\n", "tf2 = prtecan.Tecanfile(\n", " l1_dir / \"290513_8.8.xls\"\n", ") # one equivalent, one almost equivalent\n", "\n", "print(\"tf.lb1 = tf2.lb1 (strict):\", lb1 == tf2.labelblocks[\"1\"])\n", "print(\"tf.lb2 = tf2.lb2 (strict):\", lb2 == tf2.labelblocks[\"2\"])\n", "print(\"tf.lb2 ~ tf2.lb2 (almost):\", lb2.almost_equal(tf2.labelblocks[\"2\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grouping files: manual and convenience constructor\n", "You can group equivalent blocks across files either via TecanfilesGroup or by constructing a Titration directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Manual grouping\n", "tfg = prtecan.TecanfilesGroup([tf2, tf, tf1])\n", "lbg1 = tfg.labelblocksgroups[\"1\"]\n", "print(\"Well A01 raw:\", lbg1.data[\"A01\"])\n", "print(\"Well A01 normalized:\", lbg1.data_nrm[\"A01\"])\n", "\n", "# Same using Titration with explicit x (e.g., pH values)\n", "tit_manual = prtecan.Titration([tf2, tf, tf1], x=np.array([8.8, 7.2, 5.5]), is_ph=True)\n", "print(tit_manual)\n", "print(\n", " \"A01 normalized via Titration:\", tit_manual.labelblocksgroups[\"1\"].data_nrm[\"A01\"]\n", ")\n", "tit_manual.labelblocksgroups == tfg.labelblocksgroups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a titration from a list file\n", "Using a list file is convenient and less error-prone. The example list/plate files are in `tests/Tecan/L1`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tit = prtecan.Titration.fromlistfile(l1_dir / \"list.pH.csv\", is_ph=True)\n", "print(\"x values (e.g., pH):\", tit.x)\n", "lbg1 = tit.labelblocksgroups[\"1\"]\n", "lbg2 = tit.labelblocksgroups[\"2\"]\n", "print(\n", " \"Temperature in labelblocksgroup 2:\",\n", " [lb.metadata.get(\"Temperature\").value for lb in lbg2.labelblocks],\n", " lbg2.labelblocks[5].metadata.get(\"Temperature\").unit[0],\n", ")\n", "(lbg1.metadata, lbg2.metadata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Within each label-block group, normalized data (by Gain, Flashes, Integration Time) are readily available.\n", "In the case of not fully identical labelblock metadata non-normalized data might not exist (empty dict {})." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Inspect raw vs normalized for a sample well\n", "well = \"H03\"\n", "(lbg1.data[well], lbg2.data, lbg1.data_nrm[well], lbg2.data_nrm[well])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load plate scheme and additions\n", "The plate scheme defines buffer and control wells; additions define dilution steps.\n", "After loading these, the processed `tit.data[...]` arrays reflect background subtraction and optional dilution correction, depending on `tit.params`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load plate scheme and additions (kept to L1 files for consistency)\n", "tit.load_scheme(l1_dir / \"scheme.txt\")\n", "print(\n", " f\"Titration with {len(tit.tecanfiles)} files and {len(tit.labelblocksgroups)} label groups\"\n", ")\n", "print(\"Buffer wells:\", tit.scheme.buffer)\n", "print(\"Control wells:\", tit.scheme.ctrl)\n", "print(\"Named groups:\", tit.scheme.names)\n", "\n", "tit.load_additions(l1_dir / \"additions.pH\")\n", "print(\"Additions:\", tit.additions)\n", "tit.params.bg_adj = False\n", "tit.params.bg_mth = \"meansd\"\n", "print(\"Titration Params:\", tit.params)\n", "# Example: compare values in data vs normalized groups (after scheme/additions)\n", "(lbg1.data[\"H12\"], tit.data[\"1\"][\"H12\"], lbg1.data_nrm[\"H12\"], tit.bg[\"1\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Background handling summary:\n", "- labelblocksgroups[:].data: unchanged raw block data\n", "- labelblocksgroups[:].data_buffersubtracted: background-subtracted\n", "- tit.data: background-subtracted and dilution-corrected (if enabled)\n", "\n", "The order in which you apply dilution correction and plate scheme can impact your intermediate results, even though the final results might be the same.\n", "\n", " Dilution correction adjusts the measured data to account for any dilutions made during sample preparation. This typically involves multiplying the measured values by the dilution factor to estimate the true concentration of the sample.\n", "\n", " A plate scheme describes the layout of the samples on a plate (common in laboratory experiments, such as those involving microtiter plates). The plate scheme may involve rearranging or grouping the data in some way based on the physical location of the samples on the plate." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Demonstrate changing background wells and seeing bg estimate\n", "import copy\n", "\n", "tit2 = copy.deepcopy(tit)\n", "tit2.params.bg = True\n", "tit2.buffer.wells = [\"D01\", \"E01\"]\n", "tit.bg, tit2.bg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick look at fitting and results\n", "The `tit.results` container provides per-label fits; `tit.result_global` combines multiple labels.\n", "Below we only preview access/plotting. For advanced Bayesian/ODR methods, see the dedicated section." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tit.bg_err" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from clophfit.fitting.pipeline import fit_plate\n", "from clophfit.prtecan import TitrationResults\n", "\n", "# Create dataset dicts\n", "ds_dict1 = tit.create_dataset_dict(\"1\")\n", "ds_dict2 = tit.create_dataset_dict(\"2\")\n", "ds_dict_glob = tit.create_dataset_dict()\n", "\n", "# Compute fits\n", "res1 = fit_plate(ds_dict1, method=\"lm\")\n", "res2 = fit_plate(ds_dict2, method=\"lm\")\n", "res_glob = fit_plate(ds_dict_glob, method=\"lm\")\n", "res_odr = fit_plate(ds_dict_glob, method=\"odr\")\n", "\n", "# Create TitrationResults wrappers for dataframe access\n", "tr1 = TitrationResults(tit.scheme, tit.fit_keys, res1)\n", "tr2 = TitrationResults(tit.scheme, tit.fit_keys, res2)\n", "tr_glob = TitrationResults(tit.scheme, tit.fit_keys, res_glob)\n", "tr_odr = TitrationResults(tit.scheme, tit.fit_keys, res_odr)\n", "\n", "# Access result objects and figures\n", "well = \"D10\"\n", "single1 = tr1[well]\n", "single2 = tr2[well]\n", "glob = tr_glob[well]\n", "odr = tr_odr[well]\n", "\n", "# Display figures inline\n", "print(f\"Reduced X2: {single2.result.redchi:.3f}\")\n", "single2.figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Reduced X2: {glob.result.redchi:.3f}\")\n", "glob.figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Reduced X2: {odr.mini.sum_square:.3f}\")\n", "odr.figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tr1.dataframe.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Only fit D10 to avoid computing MCMC for the entire plate\n", "ds_mcmc = {\"D10\": tit.create_global_ds(\"D10\")}\n", "res_mcmc = fit_plate(ds_mcmc, method=\"mcmc\")\n", "tr_mcmc = TitrationResults(tit.scheme, tit.fit_keys, res_mcmc)\n", "tr_mcmc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Multi MCMC is out of scope for this pipeline API" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rp = tr_mcmc[well]\n", "rp.figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "az.plot_trace_dist(rp.mini, var_names=[\"x_true\", \"K\", \"x_diff\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 5.1 Bayesian fitting with PyMC\n", "result_mcmc = tr_mcmc[well]\n", "\n", "print(\"MCMC Results:\")\n", "print(f\"Kd: {result_mcmc.result.params['K'].value:.2f}\")\n", "print(\n", " f\"95% HDI: [{result_mcmc.result.params['K'].min:.2f}, {result_mcmc.result.params['K'].max:.2f}]\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot trace\n", "az.plot_trace_dist(result_mcmc.mini, var_names=[\"K\", \"x_true\"]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quality control and utilities\n", "A few helper plots are useful to quickly assess experiment consistency (buffer, temperature)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Buffer plot\n", "buf_plot = tit.buffer.plot(nrm=False)\n", "buf_plot.figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Temperature plot\n", "temp_plot = tit.plot_temperature()\n", "temp_plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "\n", "df1 = pd.read_csv(l2_dir / \"fit1-1.csv\", index_col=0)\n", "# merged_df = tit.result_dfs[\"1\"][[\"K\", \"sK\"]].merge(df1, left_index=True, right_index=True)\n", "merged_df = tr_glob.dataframe[[\"K\", \"sK\"]].merge(df1, left_index=True, right_index=True)\n", "\n", "sns.jointplot(merged_df, x=\"K_y\", y=\"K_x\", ratio=3, space=0.4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a fit fails in a well, the well key will be anyway present in results list of dict." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "tit.results[1].compute_all()\n", "\n", "conf = prtecan.TecanConfig(Path(\"jjj\"), False, (), \"\", True, True)\n", "\n", "tit.export_data_fit(conf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Posterior" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from clophfit.fitting import plotting\n", "\n", "np.random.seed(0) # noqa: NPY002\n", "remcee = glob.mini.emcee(\n", " burn=100,\n", " steps=2000,\n", " workers=int(os.environ.get(\"CLOPHFIT_EMCEE_WORKERS\", \"4\")),\n", " thin=10,\n", " nwalkers=30,\n", " progress=False,\n", " is_weighted=True,\n", ")\n", "\n", "f = plotting.plot_emcee(remcee.flatchain)\n", "print(remcee.flatchain.quantile([0.03, 0.97])[\"K\"].to_list())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "samples_dict = {\n", " col: remcee.flatchain[col].to_numpy()[np.newaxis, :] # (1 chain, 5700 draws)\n", " for col in remcee.flatchain\n", "}\n", "idata = az.from_dict({\"posterior\": samples_dict})\n", "\n", "# percentile on K\n", "k_samples = idata.posterior[\"K\"].to_numpy().flatten()\n", "percentile_value = np.percentile(k_samples, 3)\n", "print(f\"Value at which the probability of being higher is 99%: {percentile_value:.4f}\")\n", "\n", "az.plot_ridge(idata, var_names=\"K\")\n", "az.plot_forest(idata, var_names=\"K\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "samples = remcee.flatchain[[\"K\"]]\n", "k_samples = samples[\"K\"].to_numpy()\n", "percentile_value = np.percentile(k_samples, 3)\n", "print(f\"Value at which the probability of being higher is 99%: {percentile_value}\")\n", "\n", "plt.hist(k_samples, bins=30, density=True)\n", "plt.xlabel(\"K\")\n", "plt.title(\"Posterior of K\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Computed automatically earlier" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with sns.axes_style(\"darkgrid\"):\n", " g = sns.pairplot(\n", " tr_glob.dataframe[[\"S1_2\", \"S0_2\", \"K\", \"S1_1\", \"S0_1\"]],\n", " hue=\"S1_1\",\n", " palette=\"Reds\",\n", " corner=True,\n", " diag_kind=\"kde\",\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_ctr = tr1.dataframe\n", "for name, wells in tit.scheme.names.items():\n", " for well in wells:\n", " df_ctr.loc[well, \"ctrl\"] = name\n", "\n", "df_ctr.loc[df_ctr[\"ctrl\"].isna(), \"ctrl\"] = \"U\"\n", "\n", "sns.set_style(\"whitegrid\")\n", "g = sns.PairGrid(\n", " df_ctr,\n", " x_vars=[\"K\", \"S1_1\", \"S0_1\"],\n", " y_vars=[\"K\", \"S1_1\", \"S0_1\"],\n", " hue=\"ctrl\",\n", " palette=\"Set1\",\n", " diag_sharey=False,\n", ")\n", "g.map_lower(plt.scatter)\n", "g.map_upper(sns.kdeplot, fill=True)\n", "g.map_diag(sns.kdeplot)\n", "g.add_legend()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tr_glob[\"A04\"].figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "keys_unk = tit.fit_keys - set(tit.scheme.ctrl)\n", "res_unk = tr_glob.dataframe.loc[list(keys_unk)].sort_index()\n", "res_unk[\"well\"] = res_unk.index\n", "\n", "f = plt.figure(figsize=(24, 14))\n", "# Make the PairGrida\n", "g = sns.PairGrid(\n", " res_unk,\n", " x_vars=[\"K\", \"S1_2\", \"S0_2\"],\n", " y_vars=\"well\",\n", " height=12,\n", " aspect=0.4,\n", ")\n", "# Draw a dot plot using the stripplot function\n", "g.map(sns.stripplot, size=14, orient=\"h\", palette=\"Set2\", edgecolor=\"auto\")\n", "\n", "# Use the same x axis limits on all columns and add better labels\n", "# g.set(xlim=(0, 25), xlabel=\"Crashes\", ylabel=\"\")\n", "\n", "# Use semantically meaningful titles for the columns\n", "titles = [\"$pK_a$\", \"B$_{neutral}$\", \"B$_{anionic}$\"]\n", "\n", "for ax, title in zip(g.axes.flat, titles, strict=False):\n", " # Set a different title for each axes\n", " ax.set(title=title)\n", "\n", " # Make the grid horizontal instead of vertical\n", " ax.xaxis.grid(visible=False)\n", " ax.yaxis.grid(visible=True)\n", "\n", "sns.despine(left=True, bottom=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Background method comparison\n", "Different background methods may slightly shift baselines; inspect the impact on a single well." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "methods = [\"mean\", \"meansd\", \"fit\"]\n", "well = \"D10\"\n", "fig, axes = plt.subplots(1, len(methods), figsize=(16, 4), sharey=True)\n", "for ax, method in zip(axes, methods, strict=False):\n", " tit.params.bg_mth = method\n", " ax.plot(tit.x, tit.data[\"1\"][well], \"o-\", label=method)\n", " ax.axhline(0, color=\"gray\", ls=\"--\", lw=1)\n", " ax.set_title(f\"method: {method}\")\n", " ax.set_xlabel(\"pH\")\n", "axes[0].set_ylabel(\"Signal\")\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can decide how to pre-process data with datafit_params:\n", "- [bg] subtract background\n", "- [dil] apply correction for dilution (when e.g. during a titration you add titrant without protein)\n", "- [nrm] normalize for gain, number of flashes and integration time. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 3.1 Accessing processed data\n", "well = \"D10\"\n", "data = {\n", " \"pH\": tit.x,\n", " \"Signal (raw)\": tit.labelblocksgroups[\"1\"].data_nrm[well],\n", " \"Signal (processed)\": tit.data[\"1\"][well],\n", "}\n", "\n", "plt.figure(figsize=(10, 5))\n", "plt.plot(data[\"pH\"], data[\"Signal (raw)\"], \"o-\", label=\"Raw\")\n", "plt.plot(data[\"pH\"], data[\"Signal (processed)\"], \"s-\", label=\"Processed\")\n", "plt.xlabel(\"pH\")\n", "plt.ylabel(\"Fluorescence\")\n", "plt.title(f\"Data Processing Pipeline for Well {well}\")\n", "plt.legend()\n", "plt.grid(visible=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cl titration analysis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cl_an = prtecan.Titration.fromlistfile(l2_dir / \"list.cl.csv\", is_ph=False)\n", "cl_an.load_scheme(l2_dir / \"scheme.txt\")\n", "cl_an.scheme" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from clophfit import prtecan\n", "\n", "cl_an.load_additions(l2_dir / \"additions.cl\")\n", "print(cl_an.x)\n", "cl_an.x = prtecan.calculate_conc(cl_an.additions, 1000)\n", "cl_an.x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_glob_cl = cl_an.create_dataset_dict()\n", "tr_glob_cl = prtecan.TitrationResults(\n", " cl_an.scheme, cl_an.fit_keys, fit_plate(ds_glob_cl, method=\"lm\")\n", ")\n", "fres = tr_glob_cl[\"D10\"]\n", "print(fres.is_valid(), fres.result.bic, fres.result.redchi)\n", "fres.figure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batch export (optional)\n", "You can export processed data and fit results using `TecanConfig`.\n", "Note: adjust paths and toggles (png, fit, comb) as needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tit.params" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tit.params.bg_mth = \"meansd\"\n", "tit.params.mcmc = None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fits are computed via fit_plate now" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tempfile import mkdtemp\n", "\n", "from clophfit.prtecan.export import export_data_fit\n", "\n", "out_dir = Path(mkdtemp())\n", "conf = prtecan.TecanConfig(\n", " out_fp=out_dir, comb=False, lim=None, title=\"FullAnalysis\", fit=True, png=True\n", ")\n", "export_data_fit(tit, conf)\n", "print(\"Exported to:\", out_dir)\n", "# list(out_dir.glob('*'))[:10]\n", "# print(\"Contents:\", *[f.name for f in output_dir.glob(\"*\")], sep=\"\\n- \")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 4 }