spras.analysis package
Submodules
spras.analysis.cytoscape module
- spras.analysis.cytoscape.run_cytoscape(pathways: List[str | PurePath], output_file: str, container_framework='docker') None
Create a Cytoscape session file with visualizations of each of the provided pathways.
@param pathways: a list of pathways to visualize
@param output_file: the output Cytoscape session file
@param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
spras.analysis.ml module
- spras.analysis.ml.create_palette(column_names)
Generates a dictionary mapping each column name (algorithm name) to a unique color from the specified palette.
- spras.analysis.ml.ensemble_network(dataframe: DataFrame, output_file: str)
Calculates the mean of the binary values in the provided dataframe to create an ensemble pathway. Counts the number of times an edge appears in a set of pathways and divides by the total number of pathways. Edges that appear more frequently across pathways are more likely to be robust, so this information can be used to filter edges in a final network.
@param dataframe: binary dataframe of edge presence and absence in each pathway from summarize_networks
@param output_file: the filename to save the ensemble network
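The edge-frequency calculation described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the documented function works on a pandas DataFrame and writes a file, whereas this sketch uses a plain dict. The helper name `ensemble_frequencies`, the `A---B` edge labels, and the 0.5 cutoff are assumptions made for illustration.

```python
def ensemble_frequencies(edge_matrix):
    """Return the fraction of pathways that contain each edge.

    edge_matrix maps an edge label to a list of 0/1 flags, one per pathway.
    """
    return {edge: sum(flags) / len(flags) for edge, flags in edge_matrix.items()}


# Hypothetical data: three edges observed across three pathways
edge_matrix = {
    "A---B": [1, 1, 1],
    "B---C": [1, 0, 1],
    "C---D": [0, 0, 1],
}

frequencies = ensemble_frequencies(edge_matrix)

# Keep only edges found in at least half of the pathways (illustrative cutoff)
robust = [edge for edge, freq in frequencies.items() if freq >= 0.5]
```

Because every value is 0 or 1, the mean of a row equals the count of pathways containing the edge divided by the number of pathways, which is why the two descriptions in the docstring are equivalent.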
- spras.analysis.ml.hac_horizontal(dataframe: DataFrame, output_png: str, output_file: str, linkage: str = 'ward', metric: str = 'euclidean')
Performs hierarchical agglomerative clustering on the dataframe, creates a dendrogram of the resulting tree using scikit-learn, forms the cluster groups using scipy, and saves the dendrogram and its cluster labels in separate files.
@param dataframe: binary dataframe of edge comparison between algorithms from summarize_networks
@param output_png: the file name to save the dendrogram image
@param output_file: the file name to save the clustering labels
@param linkage: the method for calculating the distance between clusters
@param metric: the metric used to compute the distance between instances
- spras.analysis.ml.hac_vertical(dataframe: DataFrame, output_png: str, output_file: str, linkage: str = 'ward', metric: str = 'euclidean')
Performs hierarchical agglomerative clustering on the dataframe, creates a dendrogram of the resulting tree using seaborn, forms the cluster groups using scipy, and saves the dendrogram and its cluster labels in separate files.
@param dataframe: binary dataframe of edge comparison between algorithms from summarize_networks
@param output_png: the file name to save the dendrogram image
@param output_file: the file name to save the clustering labels
@param linkage: the method for calculating the distance between clusters
@param metric: the metric used to compute the distance between instances
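To make the default `linkage='ward'` with `metric='euclidean'` concrete, the following is a naive pure-Python sketch of Ward-style agglomerative clustering. It is not how the documented functions are implemented (they rely on scikit-learn, seaborn, and scipy), and it draws no dendrogram. The helper names, the pathway labels, and the choice of two final clusters are assumptions for illustration.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(vectors, n_clusters):
    """Merge clusters bottom-up until n_clusters remain; return a label per name."""
    clusters = [[name] for name in vectors]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [vectors[n] for n in clusters[pair[0]]],
                [vectors[n] for n in clusters[pair[1]]],
            ),
        )
        clusters[i] += clusters[j]
        del clusters[j]  # safe because combinations guarantees i < j
    return {name: label for label, members in enumerate(clusters) for name in members}


# Hypothetical binary edge vectors, one per pathway
vectors = {
    "algo1-params-a": [1, 1, 0, 0],
    "algo1-params-b": [1, 1, 1, 0],
    "algo2-params-a": [0, 0, 1, 1],
    "algo2-params-b": [0, 1, 1, 1],
}

labels = ward_clusters(vectors, n_clusters=2)
```

The sketch recomputes centroids at every step, which is fine for a handful of pathways but far slower than library implementations on larger inputs.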
- spras.analysis.ml.jaccard_similarity_eval(summary_df: DataFrame, output_file: str, output_png: str)
Calculates the pairwise Jaccard similarity matrix from the binary representation of summary_df. Saves the resulting similarity matrix as a tab-delimited file, then generates and saves a heatmap visualization of the similarities.
@param summary_df: pandas dataframe with algorithm-parameter summary information
@param output_file: the filename to save the similarity matrix
@param output_png: the file name to save the heatmap image
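As an illustration of the pairwise Jaccard computation, here is a small sketch on binary membership vectors. It makes no claim about how the documented function derives its binary representation from summary_df. The helper names, the `p1`..`p3` labels, the three-decimal formatting, and the convention of returning 0.0 for two empty vectors are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two empty vectors have similarity 0.0
    return both / either if either else 0.0


def jaccard_matrix(columns):
    """Nested dict holding the similarity of every pair of columns."""
    names = list(columns)
    return {r: {c: jaccard(columns[r], columns[c]) for c in names} for r in names}


def to_tsv(matrix):
    """Render the matrix as tab-delimited text with a header row."""
    names = list(matrix)
    lines = ["\t".join([""] + names)]
    for r in names:
        lines.append("\t".join([r] + [f"{matrix[r][c]:.3f}" for c in names]))
    return "\n".join(lines)


# Hypothetical binary membership vectors, one per pathway
columns = {
    "p1": [1, 1, 0, 1],
    "p2": [1, 0, 0, 1],
    "p3": [0, 0, 1, 0],
}

matrix = jaccard_matrix(columns)
tsv = to_tsv(matrix)
```

The matrix is symmetric with ones on the diagonal for any non-empty vector, so a heatmap of it shows redundant information above and below the diagonal.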
- spras.analysis.ml.pca(dataframe: DataFrame, output_png: str, output_var: str, output_coord: str, components: int = 2, labels: bool = True, kde: bool = False, remove_empty_pathways: bool = False)
Performs PCA on the data and creates a scatterplot of the top two principal components. It saves the plot, the variance explained by each component, and the coordinates of each algorithm in the plot in separate files.
@param dataframe: binary dataframe of edge comparison between algorithms from summarize_networks
@param output_png: the filename to save the scatterplot
@param output_var: the filename to save the variance explained by each component
@param output_coord: the filename to save the coordinates of each algorithm
@param components: the number of principal components to calculate (Default is 2)
@param labels: determines if labels will be included in the scatterplot (Default is True)
@param kde: if True, overlays a kernel density estimate (KDE) on top of the PCA scatterplot (Default is False). Also saves the coordinates of the KDE maximum (kde_peak) to the output_coord file.
@param remove_empty_pathways: if True, removes pathways (columns) from the dataframe that contain no edges before performing PCA (Default is False)
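The two numeric outputs described above (variance explained per component and per-pathway coordinates) can be sketched without any third-party library. This is not the documented implementation: it swaps in power iteration with deflation on the covariance matrix, treats each row as one pathway, and does no plotting or KDE. All helper names and the input data are assumptions for illustration.

```python
import math


def center(rows):
    """Subtract the column mean from every value."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: repeatedly apply the matrix and renormalize."""
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]  # fixed, asymmetric start vector
    for _ in range(iterations):
        vec = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm == 0:
            return 0.0, vec  # no variance left in any direction
        vec = [v / norm for v in vec]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(d) for j in range(d))
    return value, vec


def pca_sketch(rows, components=2):
    """Return (explained variance ratios, coordinates of each row).

    Assumes at least two distinct rows, so the total variance is non-zero.
    """
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))
    ratios, axes = [], []
    for _ in range(components):
        value, vec = top_eigenpair(cov)
        ratios.append(value / total_var)
        axes.append(vec)
        # Deflate: remove the component just found so the next pass finds the next one
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    coords = [[sum(x * a for x, a in zip(row, axis)) for axis in axes] for row in centered]
    return ratios, coords


# Hypothetical input: four pathways (rows) described by three edges (columns)
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]

ratios, coords = pca_sketch(rows, components=2)
```

Pathways with identical edge sets, such as the first two rows here, land on the same point, which is one reason overlapping markers can appear in a scatterplot of this kind.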
- spras.analysis.ml.plot_dendrogram(model, **kwargs)
Plot a dendrogram to visualize a hierarchical clustering solution.
@param model: the fit AgglomerativeClustering model
@param kwargs: arguments passed to the dendrogram function
- spras.analysis.ml.summarize_networks(file_paths: Iterable[str | PathLike]) DataFrame
Takes in a list of file paths and creates a binary dataframe where each row corresponds to an edge and each column corresponds to an algorithm. A value in the dataframe is 1 if the edge is present in that algorithm's pathway and 0 otherwise. Assumes edges are undirected.
@param file_paths: file paths of pathway reconstruction algorithm outputs
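The edge-by-algorithm binary table can be sketched as follows. This is only an illustration of the undirected-edge logic: it starts from in-memory edge lists rather than reading files, because the pathway file format is not described here, and it returns a nested dict rather than a DataFrame. The helper names and the `algo1`/`algo2` labels are assumptions.

```python
def edge_key(u, v):
    """Order the endpoints so (u, v) and (v, u) map to the same undirected edge."""
    return tuple(sorted((u, v)))


def binary_edge_table(pathways):
    """Build {edge: {column name: 0 or 1}} from {column name: list of edges}."""
    edge_sets = {
        name: {edge_key(u, v) for u, v in edges} for name, edges in pathways.items()
    }
    all_edges = sorted(set().union(*edge_sets.values()))
    return {
        edge: {name: int(edge in edges) for name, edges in edge_sets.items()}
        for edge in all_edges
    }


# Hypothetical pathways; note that algo2 lists the A-B edge in reverse order
pathways = {
    "algo1": [("A", "B"), ("B", "C")],
    "algo2": [("B", "A"), ("C", "D")],
}

table = binary_edge_table(pathways)
```

Sorting the endpoints before comparison is what lets `("B", "A")` count as the same edge as `("A", "B")`.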
- spras.analysis.ml.validate_df(dataframe: DataFrame)
Raises an error if the dataframe is empty or contains only one pathway (one row).
@param dataframe: dataframe of pathways to validate
spras.analysis.summary module
- spras.analysis.summary.degree(g)
- spras.analysis.summary.summarize_networks(file_paths: Iterable[Path], node_table: DataFrame, algo_params: dict[str, dict], algo_with_params: list) DataFrame
Generate a table that aggregates summary information about networks in file_paths, including which nodes are present in node_table columns. Network directionality is ignored and all edges are treated as undirected. The order of the file_paths and algo_with_params inputs must match after they are each sorted.
@param file_paths: iterable of edge list files
@param node_table: pandas DataFrame containing node attributes
@param algo_params: a nested dict mapping algorithm names to dicts that map parameter hashes to parameter combinations.
@param algo_with_params: a list of <algorithm>-params-<params_hash> combinations
@return: pandas DataFrame with summary information
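The ordering requirement and the shape of algo_params can be illustrated with a short sketch. It only shows one plausible way the sorted inputs line up and how a `<algorithm>-params-<params_hash>` label resolves to its parameter combination; it is not the function's actual code. The directory layout, algorithm names, hashes, parameter names, and the helper `pair_inputs` are hypothetical.

```python
from pathlib import Path

# Hypothetical inputs, deliberately given in a non-sorted order
file_paths = [
    Path("out/algo2-params-h2/pathway.txt"),
    Path("out/algo1-params-h1/pathway.txt"),
]
algo_with_params = ["algo2-params-h2", "algo1-params-h1"]
algo_params = {
    "algo1": {"h1": {"k": 5}},
    "algo2": {"h2": {"alpha": 0.1}},
}


def pair_inputs(file_paths, algo_with_params, algo_params):
    """Pair each file with its label and parameter combination after sorting both."""
    rows = []
    for path, label in zip(sorted(file_paths), sorted(algo_with_params)):
        # Split only at the first "-params-" separator
        algorithm, _, params_hash = label.partition("-params-")
        rows.append((path, label, algo_params[algorithm][params_hash]))
    return rows


rows = pair_inputs(file_paths, algo_with_params, algo_params)
```

If the two sorted sequences did not correspond element by element, a file would be paired with the wrong parameter combination, which is why the documented precondition matters.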