This package is not yet available on PyPI, so it must be installed with pip and Git. For example, to install the package and its requirements in one go, run: `pip install -U git+ssh://git@github.com/Pappulab/microphase-assemblies.git`. Doing so will uninstall any previously installed version. Because this is currently a private repository, access requires SSH keys, which must be configured before the pip install step.
Alternatively, one can download a zip file of the latest version of the repository and install it via: `pip install -U microphase-assemblies-main.zip`.
This repository contains the following scripts which, when used successfully, provide a pipeline to extract, filter, tessellate, and analyze the assemblies within an input TIFF image. The installed scripts are:
1. `extract_assemblies` - This script examines an input TIFF file and determines the assemblies within the image using a provided thresholding range. Due to lighting differences, parts of the image are not uniformly lit, which leads to errors in detection. A thresholding range generally allows for flexibility in object detection, but the results can be very sensitive to the values used. Judiciously chosen values are provided but may need further tweaking. This script also supports defining custom sections to analyze, in recognition of the lighting differences within images, which can lead to erroneous or poor detections.
2. `filter_results` - This script takes the output from the previous script and applies a filter based on the ansatz that assemblies are likely to be more circular as well as well characterized within the field of view. This can lead to a dramatic reduction in the number of assemblies.
3. `tessellate` - This script takes the output of the previous script and applies a Voronoi tessellation method using the centers of the detected assemblies. It also performs a similar analysis using the bounding box of each assembly to fit an ellipse, and reports the corresponding Voronoi cells and their areas that emerge with those objects.
4. `sweep` - This script sweeps across the entire input TIFF image in defined increments and determines the number of assemblies within each sweep.
In brief, it is recommended to use the scripts in the order provided. Please consult each section for more information.
To extract assemblies, we use the scikit-image package to identify regions using custom thresholds. This is required because the thresholding algorithm is sensitive to the distribution of pixels within the field of view. Regions that are unevenly lit relative to other parts of an input image can be over- or under-saturated, which can lead to erroneous or poorly identified detections.
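As a rough illustration of why the thresholding range matters, the sketch below labels connected pixels whose intensity falls inside a lower/upper range. It is a pure-Python stand-in and not the package's scikit-image code: the function names, the inclusive range, the 4-connectivity, and the toy image are all assumptions made for this example.

```python
from collections import deque


def in_range(value, lower, upper):
    """Assumed rule: a pixel is kept when lower <= value <= upper."""
    return lower <= value <= upper


def label_regions(image, lower, upper):
    """Label 4-connected groups of in-range pixels with 1, 2, 3, ..."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] or not in_range(image[r][c], lower, upper):
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and in_range(image[ny][nx], lower, upper)):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def region_areas(labels, count):
    """Return the pixel area of each labelled region, in label order."""
    areas = [0] * count
    for row in labels:
        for value in row:
            if value:
                areas[value - 1] += 1
    return areas


# Toy 8-bit image: two mid-intensity blobs and one saturated streak (250).
image = [
    [10, 10, 120, 130, 10],
    [10, 10, 125, 10, 10],
    [10, 10, 10, 10, 250],
    [150, 10, 10, 10, 250],
    [160, 155, 10, 10, 10],
]

labels, count = label_regions(image, 100, 200)
wide_labels, wide_count = label_regions(image, 100, 255)
print(count, region_areas(labels, count))
print(wide_count, region_areas(wide_labels, wide_count))
```

Widening the upper limit in this toy case picks up the saturated streak as an extra region, which is the kind of sensitivity the thresholding limits are meant to control.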
usage: extract_assemblies [-h] [-a MIN_AREA] [-bbs BOUNDING_BOX_SAVENAME] [-c] [-e EXCLUDED_AREA]
[-es EXPORTED_REGIONS_SAVENAME] [-i INPUT_TIFF_FILE]
[-isc IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES]
[-l LIMITS LIMITS] [-rbs REGIONS_BASE_SAVENAME] [-rfs REGIONS_FILTERED_PLOT_SAVENAME]
[-rss REGIONS_SIZES_PLOT_SAVENAME]
options:
-h, --help show this help message and exit
-a MIN_AREA, --min-area MIN_AREA
The minimum area to exclude.
-bbs BOUNDING_BOX_SAVENAME, --bounding-box-savename BOUNDING_BOX_SAVENAME
The savename of the bounding box plot.
-c, --cropped Whether or not the input image is cropped
-e EXCLUDED_AREA, --excluded-area EXCLUDED_AREA
The min area of excluded regions (patched) to ignore.
-es EXPORTED_REGIONS_SAVENAME, --exported-regions-savename EXPORTED_REGIONS_SAVENAME
The savename of the exported regions
-i INPUT_TIFF_FILE, --input-tiff-file INPUT_TIFF_FILE
The name / path of the input TIFF file
-isc IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES, --image-section-coordinates IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES IMAGE_SECTION_COORDINATES
The coordinates of the region to analyze (ymin, ymax, xmin, xmax).
-l LIMITS LIMITS, --limits LIMITS LIMITS
The lower and upper limits for thresholding.
-rbs REGIONS_BASE_SAVENAME, --regions-base-savename REGIONS_BASE_SAVENAME
-rfs REGIONS_FILTERED_PLOT_SAVENAME, --regions-filtered-plot-savename REGIONS_FILTERED_PLOT_SAVENAME
-rhs REGIONS_SIZES_HISTOGRAM_SAVENAME, --regions-sizes-histogram-savename REGIONS_SIZES_HISTOGRAM_SAVENAME
-rss REGIONS_SIZES_PLOT_SAVENAME, --regions-sizes-plot-savename REGIONS_SIZES_PLOT_SAVENAME
# Example:
# Extract all assemblies above size / pixel area 0 - note this will generate many points!
# Use a defined region in the image for analysis
# Apply the thresholding range of 100 to 200 (min 0, max 255)
extract_assemblies -i data/Pappu_Dylan_S1glass_019.tif -a 0 -isc 0 2960 0 2528 -l 100 200
# A simpler version
extract_assemblies -i data/Pappu_Dylan_S1glass_019.tif -l 150 250

Although assemblies whose raw area is less than 100 pixels squared are excluded by default (i.e. `-a 100`), we recommend using `-a 0`. Doing so ensures that no data is truncated and that all detections can be analyzed in more detail via the `filter_results` script.
Upon running this script, the following files will be generated under `results`:
1. `results/{YOURIMAGE}_{METADATA}_all_regions.pkl` - This Python pickle file contains the numpy array corresponding to all the detected regions.
2. `results/{YOURIMAGE}_{METADATA}_filtered_regions.pkl` - This Python pickle file contains the numpy array corresponding to the regions that are larger than a provided minimum area.
3. `results/{YOURIMAGE}_{METADATA}_filter1.tsv` - A TSV file containing all the information about the detected assemblies. This file is the input to the `filter_results` script.
4. `results/{YOURIMAGE}_{METADATA}_sizes-histogram.npy` - A numpy file containing the histogram of assemblies found in bin sizes of 100.
5. `results/{YOURIMAGE}_{METADATA}_sizes-histogram-edges.npy` - The corresponding bin edges for the histogram in the previous item.
Similarly, the following plots will be generated:
1. `results/{YOURIMAGE}_{METADATA}_bboxes.pdf` - A plot of the image (restricted to the requested region, if one is provided) with all the detections overlaid after filter1 (exclude objects less than a minimum area of N pixels squared) is applied.
2. `results/{YOURIMAGE}_{METADATA}_sizes.pdf` - A histogram of the areas of the detected assemblies by bin sizes of 100 pixels.
3. `results/{YOURIMAGE}_{METADATA}_size-filtered.pdf` - A similar plot to item 1 but with the background image removed. This is not as informative and is more useful for debugging purposes.
Note that extra metadata will be added to the filenames such as the image section coordinates, the minimum area, and the thresholding limits. Although the names are verbose, this approach allows for more scriptability and easier titrations of different values for the attributes mentioned.
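To make the scripting point concrete, here is a small sketch that builds `extract_assemblies` command lines over a grid of thresholding limits and predicts the matching `filter1` TSV name. The helper names are hypothetical, and the filename pattern is inferred only from the single example filename shown in this README, so treat it as an assumption and check it against real output (in particular, the name used when no `-isc` section is given is not covered).

```python
import itertools
import shlex
from pathlib import Path


def filter1_tsv_name(tiff_path, section, min_area, limits, results_dir="results"):
    """Guess the filter1 TSV path (pattern assumed from the README example)."""
    stem = Path(tiff_path).stem
    section_part = "_".join(str(value) for value in section)
    name = (
        f"{stem}___{section_part}___"
        f"min-area-{min_area}_limits-{limits[0]}-{limits[1]}_filter1.tsv"
    )
    return f"{results_dir}/{name}"


def titration_commands(tiff_path, section, min_area, lowers, uppers):
    """Return (command line, expected TSV path) pairs for each valid limit pair."""
    pairs = []
    for lower, upper in itertools.product(lowers, uppers):
        if lower >= upper:
            continue  # skip empty or inverted thresholding ranges
        argv = [
            "extract_assemblies",
            "-i", tiff_path,
            "-a", str(min_area),
            "-isc", *(str(value) for value in section),
            "-l", str(lower), str(upper),
        ]
        expected = filter1_tsv_name(tiff_path, section, min_area, (lower, upper))
        pairs.append((shlex.join(argv), expected))
    return pairs


commands = titration_commands(
    "data/Pappu_Dylan_S1glass_019.tif",
    section=(0, 2960, 0, 2528),
    min_area=0,
    lowers=(100, 150),
    uppers=(200, 250),
)
for command, expected_tsv in commands:
    print(command)
    print("  ->", expected_tsv)
```

The commands are only printed here; passing each one to `subprocess.run` would be one way to execute the sweep of values.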
Once the assemblies have been extracted in the previous step, we can apply further filtering using a custom minimum region as well as presumed attributes that describe microphases, such as eccentricity and the ellipse fraction occupied within the bounding box. This filter is called filter2 and shows up in the name of the output files.
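The three filter2 criteria can be pictured with the sketch below. It is not the package's implementation: here the eccentricity is computed from the bounding-box width and height, and the ellipse fraction is taken as the region area divided by the area of the ellipse inscribed in the bounding box. Both definitions, the function names, and the direction of each comparison are assumptions. The default values are the ones that appear in the example filter2 filename in this README.

```python
import math


def bbox_eccentricity(width, height):
    """Eccentricity of the ellipse inscribed in a width x height box (0 = circle)."""
    major = max(width, height)
    minor = min(width, height)
    return math.sqrt(1.0 - (minor / major) ** 2)


def ellipse_fraction(area, width, height):
    """Fraction of the inscribed ellipse (area pi*w*h/4) covered by the region."""
    return area / (math.pi * width * height / 4.0)


def passes_filter2(area, width, height,
                   min_area=150, max_eccentricity=0.6, min_ellipse_fraction=0.4):
    """Assumed rule: keep regions that are large, round, and well filled."""
    return (
        area >= min_area
        and bbox_eccentricity(width, height) <= max_eccentricity
        and ellipse_fraction(area, width, height) >= min_ellipse_fraction
    )


# (area, bbox width, bbox height) for four hypothetical detections.
candidates = {
    "round and filled": (300, 20, 20),
    "elongated": (300, 40, 10),
    "too small": (100, 20, 20),
    "sparse": (160, 30, 28),
}
for name, (area, width, height) in candidates.items():
    print(name, passes_filter2(area, width, height))
```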
usage: filter_results [-h] [-bc BOUNDING_BOX_COLOR] [-c] [-i INPUT_TIFF_FILE] [-ma MINIMUM_AREA]
[-me MAX_ECCENTRICITY] [-mef MIN_ELLIPSE_FRACTION] [-t INPUT_TSV_FILE]
options:
-h, --help show this help message and exit
-bc BOUNDING_BOX_COLOR, --bounding-box-color BOUNDING_BOX_COLOR
The color to use for the bounding boxes.
-c, --cropped Whether or not the input image is cropped
-i INPUT_TIFF_FILE, --input-tiff-file INPUT_TIFF_FILE
The name / path of the input TIFF file
-ma MINIMUM_AREA, --minimum-area MINIMUM_AREA
The minimum area in pixels of assemblies sizes to exclude.
-me MAX_ECCENTRICITY, --max-eccentricity MAX_ECCENTRICITY
The maximum eccentricity of the detected assemblies to apply during filtering.
-mef MIN_ELLIPSE_FRACTION, --min-ellipse-fraction MIN_ELLIPSE_FRACTION
The fraction of the ellipse within the detected assembly which must be populated.
-t INPUT_TSV_FILE, --input-tsv-file INPUT_TSV_FILE
The input TSV file corresponding to all the detections made from the `extract_assemblies`
script.
# Example:
filter_results -i data/Pappu_Dylan_S1glass_019.tif \
-t results/Pappu_Dylan_S1glass_019___0_2960_0_2528___min-area-0_limits-100-200_filter1.tsv

Upon a successful run of this script, the following files are generated:
1. `results/{YOURIMAGE}_{METADATA}_assemblies_filter2.pdf` - A plot of the filtered assemblies.
2. `results/{YOURIMAGE}_{METADATA}_histogram_filter2.pdf` - A histogram of the filtered assembly sizes to illustrate any apparent population distributions within the filtered set.
3. `results/{YOURIMAGE}_{METADATA}_filter2.tsv` - The TSV that has been filtered using the provided parameters. For error-checking purposes, the indices of the filtered assemblies correspond to those within the input TSV.
This process is relatively quick and should take no more than a few seconds.
Using the TSV output from the previous step, we can now tessellate the space. We perform two types of tessellations:
- Tessellations using an ellipse to represent each detected (and filtered) assembly. This provides a closer ground truth of the areas occupied by the Voronoi cells, but is more difficult to delineate due to the Voronoi edges. We use the Python package `pyclesperanto_prototype` to perform this analysis.
- Tessellations using the centers of the detected and filtered microphase assemblies. Here, we use the centers of the bounding boxes and apply the Voronoi algorithm with `scipy.spatial` to tessellate the space. These results are exported as pickled objects for further analysis.
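To show what a center-based tessellation and its cell areas mean, the sketch below assigns every pixel of a small grid to its nearest center and counts pixels per cell. This brute-force discrete Voronoi labelling is a swapped-in illustration only; it is neither the `scipy.spatial` nor the `pyclesperanto_prototype` method used by the script. The function names and the (ymin, ymax, xmin, xmax) box order are assumptions.

```python
def bbox_center(ymin, ymax, xmin, xmax):
    """Center (x, y) of a bounding box given as (ymin, ymax, xmin, xmax)."""
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def discrete_voronoi(centers, width, height):
    """Label each pixel (x, y) with the index of its nearest center."""
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(centers)),
                key=lambda i: (centers[i][0] - x) ** 2 + (centers[i][1] - y) ** 2,
            )
            row.append(nearest)
        labels.append(row)
    return labels


def cell_areas(labels, n_centers):
    """Pixel count of each Voronoi cell, indexed like the centers."""
    areas = [0] * n_centers
    for row in labels:
        for index in row:
            areas[index] += 1
    return areas


# Two hypothetical assemblies on a 10 x 5 pixel grid.
boxes = [(0, 4, 0, 4), (0, 4, 5, 9)]
centers = [bbox_center(*box) for box in boxes]
labels = discrete_voronoi(centers, width=10, height=5)
areas = cell_areas(labels, len(centers))
print(centers)
print(areas)
```

On a real image this loop would be far too slow, which is one reason to rely on the dedicated libraries named above.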
Upon successful completion of this step, the following data files are generated:
1. `results/{YOURIMAGE}_{METADATA}_tessellation_assemblies.npy` - This is a numpy array containing all the microphase assemblies represented as ellipses, whose sizes were determined from the bounding boxes of the assemblies.
2. `results/{YOURIMAGE}_{METADATA}_tessellation_assemblies_centers.tsv` - This TSV file contains the center coordinates of the microphase assemblies in Cartesian space.
3. `results/{YOURIMAGE}_{METADATA}_tessellation_assemblies_labels.npy` - This numpy file contains the array corresponding to the labelled assemblies.
4. `results/{YOURIMAGE}_{METADATA}_tessellation_edges.npy` - This numpy file contains the array corresponding to the edges of the Voronoi cells as determined from the tessellation of the microphase assemblies.
5. `results/{YOURIMAGE}_{METADATA}_tessellation_calculations.npy` - This file contains the Voronoi cell calculations using the centers of the microphase assemblies in Cartesian coordinates.
6. `results/{YOURIMAGE}_{METADATA}_tessellation_voronoi_objects_cells_areas.tsv` - This TSV file contains the areas of the Voronoi cells as determined from the Voronoi tessellation of the ellipse representation of the microphase assemblies.
Similarly, the following pickled tessellation files are generated:
1. `results/{YOURIMAGE}_{METADATA}_centers_tessellation.npy` - This is the pickled file containing the Voronoi tessellations calculated from `scipy.spatial`.
2. `results/{YOURIMAGE}_{METADATA}_objects_tessellation.npy` - This is the pickled file of the grid that has been tessellated from the ellipses.
usage: tessellate [-h] [-b BOUNDING_BOX BOUNDING_BOX BOUNDING_BOX BOUNDING_BOX] [-c] [-i INPUT_TIFF_FILE]
[-t INPUT_TSV_FILE]
options:
-h, --help show this help message and exit
-b BOUNDING_BOX BOUNDING_BOX BOUNDING_BOX BOUNDING_BOX, --bounding-box BOUNDING_BOX BOUNDING_BOX BOUNDING_BOX BOUNDING_BOX
The region on the image to analyze.
-c, --cropped Whether or not the input image is cropped
-i INPUT_TIFF_FILE, --input-tiff-file INPUT_TIFF_FILE
The name / path of the input TIFF file
-t INPUT_TSV_FILE, --input-tsv-file INPUT_TSV_FILE
The input TSV file corresponding to all the filtered detections made from the
`filter_results` script.
# Example:
tessellate -i data/Pappu_Dylan_S1glass_019.tif \
-t results/Pappu_Dylan_S1glass_019_min-area-150_max-eccentricity-0.6_min-ellipse-fraction-0.4_filter2.tsv

This process can take up to a minute due to how the tessellation objects are calculated and enumerated.
Next, we can determine the average number of assemblies within a droplet of a defined size by using the labels identified in the prior step. This is achieved by sweeping across the input image and its detected assemblies in a set increment and analyzing the average number of assemblies found. To limit edge effects, sweeps start one step size in from the image border. For example, if an image is 5000x3000 pixels and a step size of 100 pixels is used, the first sweep begins at coordinate (100, 100). An alternative method for exploration, bootstrapping, could be performed here as well.
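The sweep described above can be sketched roughly as follows. The helper names are hypothetical, and several details are assumptions rather than the script's behavior: the droplet is treated as a circle centered on each sweep position, the sweep is assumed to stop one step short of the far border, and the droplet radius in nm is converted to pixels with the scalebar ratio.

```python
import math


def nm_to_pixels(length_nm, scalebar_nm, scalebar_pixels):
    """Convert a length in nm to pixels using the image scalebar."""
    return length_nm * scalebar_pixels / scalebar_nm


def sweep_positions(width, height, step):
    """Grid of (x, y) sweep centers starting at (step, step).

    Assumption: the grid also stays one step away from the far edges.
    """
    return [
        (x, y)
        for y in range(step, height - step + 1, step)
        for x in range(step, width - step + 1, step)
    ]


def counts_per_sweep(centers, positions, radius_px):
    """Number of assembly centers inside the droplet at each sweep position."""
    return [
        sum(1 for cx, cy in centers if math.hypot(cx - x, cy - y) <= radius_px)
        for x, y in positions
    ]


# Values from the README examples: 5000x3000 image, step 100, 600 nm = 542 px.
positions = sweep_positions(5000, 3000, 100)
radius_px = nm_to_pixels(1000, scalebar_nm=600, scalebar_pixels=542)
print(len(positions), positions[0], round(radius_px, 2))

# Toy case: three hypothetical centers on a 500x500 image, 60 px droplet.
toy_centers = [(100, 100), (150, 100), (400, 400)]
toy_positions = sweep_positions(500, 500, 100)
toy_counts = counts_per_sweep(toy_centers, toy_positions, radius_px=60)
print(sum(toy_counts) / len(toy_counts))
```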
Upon a successful run of this script, the following files will be generated:
1. `results/{YOURIMAGE}_{METADATA}_sweep-regions_{SWEEP_METADATA}.npy` - This numpy file contains the unique regions found within each sweep.
2. `results/{YOURIMAGE}_{METADATA}_sweep-areas_{SWEEP_METADATA}.npy` - This numpy file contains the areas of the detected regions within each sweep.
3. `results/{YOURIMAGE}_{METADATA}_sweep-detections_{SWEEP_METADATA}.tsv` - This TSV file contains the number of unique regions found within each sweep.
4. `results/{YOURIMAGE}_{METADATA}_sweep-distances_{SWEEP_METADATA}.tsv` - This TSV file contains the distances among each set of points found within each sweep. This file references the assembly centers (in Cartesian coordinates) and contains a pairwise distance calculation for any points found within a sweep. No self-comparisons or repeated comparisons are performed. Therefore, in cases where there is only 1 point, no distance calculations are performed. The distances for an individual sweep can be selected from the DataFrame using the corresponding sweep number.
5. `plots/{YOURIMAGE}_{METADATA}_regions-boxplot_{SWEEP_METADATA}.pdf` - A boxplot of the number of microassemblies detected using the information from item 4.
6. `plots/{YOURIMAGE}_{METADATA}_distances-boxplot_{SWEEP_METADATA}.pdf` - A boxplot of the distances between microassemblies across all non-empty sweeps.
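The pairwise distance rule described for the distances TSV (no self-comparisons, no repeated comparisons, nothing for a single point) corresponds to taking each unordered pair once. A minimal sketch, with hypothetical function names and made-up coordinates:

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, each pair once."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def distances_by_sweep(points_per_sweep):
    """Map sweep number -> distances, skipping sweeps with fewer than 2 points."""
    return {
        sweep: pairwise_distances(points)
        for sweep, points in points_per_sweep.items()
        if len(points) > 1
    }


# Hypothetical centers (x, y) found in three sweeps.
points_per_sweep = {
    0: [(0, 0), (3, 4), (6, 8)],
    1: [(10, 10)],  # a single point yields no distances
    2: [(0, 0), (0, 5)],
}
result = distances_by_sweep(points_per_sweep)
print(result)
```

For n points in a sweep this yields n(n-1)/2 distances, so dense sweeps dominate the size of the distances file.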
usage: sweep [-h] [-a ASSEMBLIES_FILENAME] [-c ASSEMBLIES_CENTERS] [-d] [-db DISTANCES_BASENAME]
[-i INPUT_TIFF_FILE] [-l LABELS_FILENAME] [-nm SCALEBAR_NM] [-px SCALEBAR_PIXELS] [-r RADIUS]
[-s STEP_SIZE]
options:
-h, --help show this help message and exit
-a ASSEMBLIES_FILENAME, --assemblies-filename ASSEMBLIES_FILENAME
The numpy pickle file containing all detected assemblies represented as filled ellipses.
-c ASSEMBLIES_CENTERS, --assemblies-centers ASSEMBLIES_CENTERS
The TSV file containing all the centers of the assemblies in Cartesian coordinates.
-d, --debug Interactively examine / debug the selected microphase assemblies determined from the
sweeping algorithm.
-db DISTANCES_BASENAME, --distances-basename DISTANCES_BASENAME
The basename that will be used for saving the distances determined by sweeping.
-i INPUT_TIFF_FILE, --input-tiff-file INPUT_TIFF_FILE
The name / path of the input TIFF file
-l LABELS_FILENAME, --labels-filename LABELS_FILENAME
The numpy pickle file containing the detected assemblies represented as filled ellipses.
-nm SCALEBAR_NM, --scalebar-nm SCALEBAR_NM
The image scalebar length in nm.
-px SCALEBAR_PIXELS, --scalebar-pixels SCALEBAR_PIXELS
The length of the scalebar in pixels.
-r RADIUS, --radius RADIUS
The radius of the droplet to use in nm to sweep across the image.
-s STEP_SIZE, --step-size STEP_SIZE
The step size in pixels of the sweep across both the x and y dimensions.
# Example
sweep -i data/Pappu_Dylan_S1glass_019.tif \
-a results/Pappu_Dylan_S1glass_019_min-area-150_max-eccentricity-0.6_min-ellipse-fraction-0.4_tessellation_assemblies.npy \
-c results/Pappu_Dylan_S1glass_019_min-area-150_max-eccentricity-0.6_min-ellipse-fraction-0.4_tessellation_assemblies_centers.tsv \
-l results/Pappu_Dylan_S1glass_019_min-area-150_max-eccentricity-0.6_min-ellipse-fraction-0.4_tessellation_assemblies_labels.npy \
-r 1000 \
-px 542 \
-nm 600 \
-s 100 \
-d

Note that smaller step sizes will increase the time spent in calculation.
This package uses the incremental Python package to manage auto-incrementing of package versions when changes are committed. To enable this behavior, run `git config core.hooksPath .githooks` in the directory containing the pyproject.toml file. After that, package version increments will be automatic.
One can use either conda or Python's venv module to create an environment for development / testing. This is recommended to ensure that package versions are siloed and do not create dependency issues with other packages.
# CONDA
conda create -n yourenvname python=3.10
conda activate yourenvname
# VENV
python -m venv yourenvname # note: this creates the environment `yourenvname` in the current directory!
source yourenvname/bin/activate # this will prepend your shell prompt with `(yourenvname)`

For easier active development, it's recommended to install the package in editable mode (`-e`) using pip. Doing so ensures that any functionality modifications are immediately available for use without the need to reinstall everything from scratch. (Occasional reinstallations are required when new packages are included.)
# ZIP
# first, download the zip file from the web which will have the name `microphase-assemblies-main.zip`
unzip microphase-assemblies-main.zip
cd microphase-assemblies-main
pip install -U -e .
# GIT (recommended)
git clone git@github.com:Pappulab/microphase-assemblies.git
cd microphase-assemblies
pip install -U -e .
# shell