spras package
Submodules
spras.allpairs module
- class spras.allpairs.AllPairs
Bases: PRM
- dois: list[str] = []
- static generate_inputs(data: Dataset, filename_map)
Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- required_inputs: list[str] = ['nodetypes', 'network', 'directed_flag']
- static run(nodetypes=None, network=None, directed_flag=None, output_file=None, container_framework='docker')
Run All Pairs Shortest Paths with Docker @param nodetypes: input node types with sources and targets (required) @param network: input network file (required) @param output_file: path to the output pathway file (required) @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
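As a usage sketch (the file paths are hypothetical; in the SPRAS workflow these inputs are written by generate_inputs):

    from spras.allpairs import AllPairs

    # Hypothetical paths; generate_inputs normally writes these files
    AllPairs.run(
        nodetypes="input/nodetypes.txt",
        network="input/network.txt",
        directed_flag="input/directed_flag.txt",
        output_file="output/pathway.txt",
        container_framework="docker",
    )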
spras.config module
spras.containers module
- spras.containers.convert_docker_path(src_path: PurePath, dest_path: PurePath, file_path: str | PurePath) PurePosixPath
Convert a file_path that is in src_path to be in dest_path instead. For example, given src_path /usr/mydir, file_path /usr/mydir/myfile, and dest_path /tmp, the result is /tmp/myfile. @param src_path: source path that is a parent of file_path @param dest_path: destination path @param file_path: filename that is under the source path @return: a new path with the filename relative to the destination path
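A worked version of the docstring’s example as a sketch:

    from pathlib import PurePath

    from spras.containers import convert_docker_path

    # The file /usr/mydir/myfile, relative to /usr/mydir, re-rooted under /tmp
    new_path = convert_docker_path(PurePath("/usr/mydir"), PurePath("/tmp"), "/usr/mydir/myfile")
    print(new_path)  # PurePosixPath('/tmp/myfile')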
- spras.containers.download_gcs(gcs_path: str, local_path: str, is_dir: bool)
- spras.containers.env_to_items(environment: dict[str, str]) Iterator[str]
Converts an environment variable dictionary into an iterator of KEY=VALUE strings.
- spras.containers.prepare_dsub_cmd(flags: dict[str, str | list[str]])
- spras.containers.prepare_path_docker(orig_path: PurePath) str
Prepare an absolute path for mounting as a Docker volume. Converts Windows file separators to posix separators. Converts Windows drive letters in absolute paths.
- spras.containers.prepare_volume(filename: str | PurePath, volume_base: str | PurePath) Tuple[Tuple[PurePath, PurePath], str]
Makes a file on the local file system accessible within a container by mapping the local (source) path to a new container (destination) path and renaming the file to be relative to the destination path. The destination path will be a new path relative to the volume_base that includes a hash identifier derived from the original filename. An example mapped filename looks like ‘/spras/MG4YPNK/oi1-edges.txt’. @param filename: The file on the local file system to map @param volume_base: The base directory in the container, which must be an absolute directory @return: first returned object is a tuple (source path, destination path) and the second returned object is the updated filename relative to the destination path
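A minimal sketch of the mapping (the local filename is hypothetical):

    from spras.containers import prepare_volume

    # Map a local file into the container under the absolute base directory /spras
    (src, dest), container_file = prepare_volume("input/oi1-edges.txt", "/spras")
    # container_file resembles '/spras/MG4YPNK/oi1-edges.txt'; the middle segment
    # is a hash identifier derived from the original filename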
- spras.containers.run_container(framework: str, container_suffix: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None)
Runs a command in the container using Singularity or Docker @param framework: singularity or docker @param container_suffix: name of the DockerHub container without the ‘docker://’ prefix @param command: command to run in the container @param volumes: a list of volumes to mount where each item is a (source, destination) tuple @param working_dir: the working directory in the container @param environment: environment variables to set in the container @return: output from Singularity execute or Docker run
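prepare_volume and run_container are designed to be used together, as in this sketch (the image name and command are hypothetical):

    from spras.containers import prepare_volume, run_container

    volume, container_file = prepare_volume("input/network.txt", "/spras")
    out = run_container(
        framework="docker",
        container_suffix="example/image:latest",  # hypothetical DockerHub image
        command=["python", "run.py", container_file],  # hypothetical command
        volumes=[volume],
        working_dir="/spras",
    )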
- spras.containers.run_container_and_log(name: str, framework: str, container_suffix: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None)
Runs a command in the container using Singularity or Docker with associated pretty printed messages. @param name: the display name of the running container for logging purposes @param framework: singularity or docker @param container_suffix: name of the DockerHub container without the ‘docker://’ prefix @param command: command to run in the container @param volumes: a list of volumes to mount where each item is a (source, destination) tuple @param working_dir: the working directory in the container @param environment: environment variables to set in the container @return: output from Singularity execute or Docker run
- spras.containers.run_container_docker(container: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None)
Runs a command in the container using Docker. Attempts to automatically correct file owner and group for new files created by the container, setting them to the current owner and group IDs. Does not modify the owner or group for existing files modified by the container. @param container: name of the DockerHub container without the ‘docker://’ prefix @param command: command to run in the container @param volumes: a list of volumes to mount where each item is a (source, destination) tuple @param working_dir: the working directory in the container @param environment: environment variables to set in the container @return: output from Docker run; raises an error if the container errors
- spras.containers.run_container_dsub(container: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None) str
Runs a command in the Google Cloud using dsub. @param container: name of the container in the Google Cloud Container Registry @param command: command to run @param volumes: a list of volumes to mount where each item is a (source, destination) tuple @param working_dir: the working directory in the container @param environment: environment variables to set in the container @return: path of output from dsub
- spras.containers.run_container_singularity(container: str, command: List[str], volumes: List[Tuple[PurePath, PurePath]], working_dir: str, environment: dict[str, str] | None = None)
Runs a command in the container using Singularity. Only available on Linux. @param container: name of the DockerHub container without the ‘docker://’ prefix @param command: command to run in the container @param volumes: a list of volumes to mount where each item is a (source, destination) tuple @param working_dir: the working directory in the container @param environment: environment variables to set in the container @return: output from Singularity execute
- spras.containers.upload_gcs(local_path: str, gcs_path: str, is_dir: bool)
spras.dataset module
- class spras.dataset.Dataset(dataset_dict)
Bases: object
- NODE_ID = 'NODEID'
- contains_node_columns(col_names)
col_names: A list-like object of column names to check or a string of a single column name to check. returns: Whether or not all columns in col_names exist in the dataset.
- classmethod from_file(file_name: str)
Loads dataset object from a pickle file. Usage: dataset = Dataset.from_file(pickle_file)
- get_interactome() DataFrame | None
- get_other_files()
- load_files_from_dict(dataset_dict)
Loads data files from dataset_dict, which is one dataset dictionary from the list in the config file with the fields in the config file. Populates node_table and interactome.
node_table is a single merged pandas table.
When loading data files, files of only a single column with node identifiers are assumed to be a binary feature where all listed nodes are True.
We might want to eventually add an additional “algs” argument so only subsets of the entire config file are loaded; alternatively, this could be handled outside this class.
returns: none
- request_edge_columns(col_names)
- request_node_columns(col_names)
returns: A table containing the requested column names and node IDs for all nodes with at least one of the requested values non-empty
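A usage sketch, assuming a dataset pickle exists and that ‘sources’ and ‘targets’ are node columns in it:

    from spras.dataset import Dataset

    dataset = Dataset.from_file("dataset.pickle")  # hypothetical pickle path
    # Node IDs plus the requested columns, for nodes where at least one value is non-empty
    node_table = dataset.request_node_columns(["sources", "targets"])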
- to_file(file_name: str)
Saves dataset object to pickle file
- warning_threshold = 0.05
spras.domino module
- class spras.domino.DOMINO
Bases: PRM
- dois: list[str] = ['10.15252/msb.20209593']
- static generate_inputs(data, filename_map)
Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type @return:
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert the merged HTML modules into the universal pathway format @param raw_pathway_file: the merged HTML modules file @param standardized_pathway_file: the edges from the modules written in the universal format
- required_inputs: list[str] = ['network', 'active_genes']
- static run(network=None, active_genes=None, output_file=None, slice_threshold=None, module_threshold=None, container_framework='docker')
Run DOMINO with Docker. visualization is always set to true, parallelization is always 1 thread, and use_cache is always false. DOMINO produces multiple output module files in an HTML format. SPRAS concatenates these files into one file. @param network: input network file (required) @param active_genes: input active genes (required) @param output_file: path to the output pathway file (required) @param slice_threshold: the p-value threshold for considering a slice as relevant (optional) @param module_threshold: the p-value threshold for considering a putative module as a final module (optional) @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
- spras.domino.post_domino_id_transform(node_id)
Remove ID_PREFIX from the beginning of the node id if it is present. @param node_id: the node id to transform @return the node id without the prefix, if it was present, otherwise the original node id
- spras.domino.pre_domino_id_transform(node_id)
DOMINO requires module edges to have the ‘ENSG0’ string as a prefix for visualization. Prepend each node id with this ID_PREFIX. @param node_id: the node id to transform @return the node id with the prefix added
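The two transforms are inverses when the prefix was added by pre_domino_id_transform; a sketch with a hypothetical node id:

    from spras.domino import post_domino_id_transform, pre_domino_id_transform

    node = "P04637"  # hypothetical node id
    prefixed = pre_domino_id_transform(node)  # 'ENSG0' prepended to the id
    assert post_domino_id_transform(prefixed) == node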
spras.evaluation module
- class spras.evaluation.Evaluation(gold_standard_dict: Dict)
Bases: object
- NODE_ID = 'NODEID'
- static edge_frequency_node_ensemble(node_table: DataFrame, ensemble_files: list[str | PathLike], dataset_file: str) dict
Generates a dictionary of node ensembles using edge frequency data from a list of ensemble files. A list of ensemble files can contain an aggregated ensemble or algorithm-specific ensembles per dataset.
1. Prepare a set of default nodes (from the interactome and gold standard) with frequency 0, ensuring all nodes are represented in the ensemble. This answers “Did the algorithm(s) select the correct nodes from the entire network?” by measuring whether the algorithm(s) can distinguish relevant gold standard nodes from the full “universe” of possible nodes present in the input network.
2. For each edge ensemble file: read the edges and their frequencies; convert the edge frequencies into node-level frequencies for Node1 and Node2; merge with the default node set and group by node, taking the maximum frequency per node; store the resulting node-frequency ensemble under the corresponding ensemble source (label).
If the interactome or gold standard table is empty, a ValueError is raised.
@param node_table: dataFrame of gold standard nodes (column: NODEID) @param ensemble_files: list of file paths containing edge ensemble outputs @param dataset_file: path to the dataset file used to load the interactome @return: dictionary mapping each ensemble source to its node ensemble DataFrame
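The edge-to-node conversion in step 2 can be sketched in pandas (an illustration of the described logic, not the SPRAS implementation; the Node1, Node2, and Frequency column names are assumptions):

    import pandas as pd

    edges = pd.DataFrame({"Node1": ["A", "A"], "Node2": ["B", "C"], "Frequency": [0.5, 0.8]})
    # Stack both endpoints of each edge, then keep the maximum frequency per node
    node_freq = (
        pd.concat([
            edges[["Node1", "Frequency"]].rename(columns={"Node1": "Node"}),
            edges[["Node2", "Frequency"]].rename(columns={"Node2": "Node"}),
        ])
        .groupby("Node", as_index=False)["Frequency"]
        .max()
    )
    # A keeps max(0.5, 0.8) = 0.8; B keeps 0.5; C keeps 0.8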
- static from_file(file_name)
Loads gold standard object from a pickle file. Usage: gold_standard = Evaluation.from_file(pickle_file)
- load_files_from_dict(gold_standard_dict: Dict)
Loads gold standard files from gold_standard_dict, which is one gold standard dataset dictionary from the list in the config file with the fields in the config file. Populates node_table.
node_table is a pandas table with a single column of nodes.
returns: none
- static merge_gold_standard_input(gs_dict, gs_file)
Merge files listed for this gold standard dataset and write the dataset to disk @param gs_dict: gold standard dataset to process @param gs_file: output filename
- static node_precision_and_recall(file_paths: Iterable[str | PathLike], node_table: DataFrame) DataFrame
Computes node-level precision and recall for each pathway reconstruction output file.
This function takes a list of file paths corresponding to pathway reconstruction algorithm outputs, each formatted as a tab-separated file with columns ‘Node1’, ‘Node2’, ‘Rank’, and ‘Direction’. It compares the set of predicted nodes (from both columns Node1 and Node2) to a provided gold standard node table and computes precision and recall per file.
@param file_paths: list of file paths of pathway reconstruction algorithm outputs @param node_table: the gold standard nodes @return: A DataFrame with the following columns:
‘Pathway’: Path object corresponding to each pathway file
‘Precision’: Precision of predicted nodes vs. gold standard nodes
‘Recall’: Recall of predicted nodes vs. gold standard nodes
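Per file, the computation reduces to set precision and recall over predicted versus gold standard nodes, as in this sketch with hypothetical node sets:

    predicted = {"A", "B", "C"}  # union of the Node1 and Node2 columns
    gold = {"B", "C", "D"}       # gold standard NODEID values
    tp = len(predicted & gold)
    precision = tp / len(predicted)  # 2/3
    recall = tp / len(gold)          # 2/3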
- static pca_chosen_pathway(coordinates_files: list[str | PathLike], pathway_summary_file: str, output_dir: str)
Identifies the pathway closest to the highest kernel density estimated (KDE) peak based on PCA coordinates. Calculates the Euclidean distance from each data point to the KDE peak, then selects the closest pathway as the representative pathway. If there is more than one candidate representative pathway, tiebreakers are applied in order:
- choose the smallest pathway (smallest number of edges and nodes)
- as a last resort, choose the first one based on name
Returns a list of file paths for the representative pathway associated with the data point closest to the KDE peak.
@param coordinates_files: a list of PCA coordinates files for a dataset or specific algorithm in a dataset @param pathway_summary_file: a file for each file per dataset about its network statistics @param output_dir: the main reconstruction directory
- static precision_and_recall_pca_chosen_pathway(pr_df: DataFrame, output_file: str | PathLike, output_png: str | PathLike, aggregate_per_algorithm: bool = False)
Function for visualizing the precision and recall of the single parameter combination selected via PCA, either for each algorithm individually or one combination shared across all algorithms. Each point represents a pathway reconstruction corresponding to the PCA-selected parameter combination. If aggregate_per_algorithm is True, the plot includes a PCA-chosen pathway per algorithm and is titled accordingly.
@param pr_df: Dataframe of calculated precision and recall for each pathway file @param output_file: the filename to save the precision and recall of each pathway @param output_png: the filename to plot the precision and recall of each pathway (not a PRC) @param aggregate_per_algorithm: Boolean indicating if function is used per algorithm (Default False)
- static precision_and_recall_per_pathway(pr_df: DataFrame, output_file: str | PathLike, output_png: str | PathLike, aggregate_per_algorithm: bool = False)
Function for visualizing per pathway precision and recall across all algorithms. Each point in the plot represents a single pathway reconstruction. If aggregate_per_algorithm is set to True, the plot is restricted to a single algorithm and titled accordingly.
@param pr_df: Dataframe of calculated precision and recall for each pathway file @param output_file: the filename to save the precision and recall of each pathway @param output_png: the filename to plot the precision and recall of each pathway (not a PRC) @param aggregate_per_algorithm: Boolean indicating if function is used per algorithm (Default False)
- static precision_recall_curve_node_ensemble(node_ensembles: dict, node_table: DataFrame, output_png: str | PathLike, output_file: str | PathLike, aggregate_per_algorithm: bool = False)
Plots precision-recall (PR) curves for a set of node ensembles evaluated against a gold standard.
Takes in a dictionary containing either algorithm-specific node ensembles or an aggregated node ensemble for a given dataset, along with the corresponding gold standard node table. Computes PR curves for each ensemble and plots all curves on a single figure.
@param node_ensembles: dict of the pre-computed node_ensemble(s) @param node_table: gold standard nodes @param output_png: filename to save the precision and recall curves as a .png image @param output_file: filename to save the precision, recall, threshold values, average precision, and baseline average precision @param aggregate_per_algorithm: Boolean indicating if function is used per algorithm (Default False)
- to_file(file_name)
Saves gold standard object to pickle file
- static visualize_precision_and_recall_plot(pr_df: DataFrame, output_file: str | PathLike, output_png: str | PathLike, title: str)
Generates a scatter plot of precision and recall values for each pathway and saves both the plot and the data.
This function is intended for visualizing how different pathway reconstructions perform (it is not a precision-recall curve), showing the precision and recall of each parameter combination for each algorithm.
@param pr_df: Dataframe of calculated precision and recall for each pathway file. Must include a preprocessed ‘Algorithm’ column.
@param output_file: the filename to save the precision and recall of each pathway @param output_png: the filename to plot the precision and recall of each pathway (not a PRC) @param title: The title to use for the plot
spras.interactome module
Author: Neha Talluri 07/19/23
Methods for converting from the universal network input format and converting to the universal network output format
- spras.interactome.add_constant(df: DataFrame, new_col_name: str, const) DataFrame
adds a new column at the end of the input dataframe with a constant value in all rows
@param df: input network df of edges, weights, and directionality @param new_col_name: the name of the new column @param const: some type of constant needed in the df @return a df with a new constant added to every row
- spras.interactome.add_directionality_constant(df: DataFrame, col_name: str, dir_const, undir_const) DataFrame
Adds directionality constants for mixed graphs that do not use the universal input format directly
@param df: input network df of edges, weights, and directionality @param col_name: the name of the new column @param dir_const: the directed edge const @param undir_const: the undirected edge const @return a df converted to show directionality differently
- spras.interactome.convert_directed_to_undirected(df: DataFrame) DataFrame
Converts a graph into a fully undirected graph by turning every directed edge into an undirected edge. Directionality is lost, so the graph is no longer strictly accurate, but the basic relationship between the connected nodes remains intact.
@param df: input network df of edges, weights, and directionality @return a dataframe with no directed edges in Direction column
- spras.interactome.convert_undirected_to_directed(df: DataFrame) DataFrame
Converts a graph into a fully directed graph by turning every undirected edge into a pair of directed edges. Little information is lost because the relationship expressed by the undirected edge is still preserved.
@param df: input network df of edges, weights, and directionality @return a dataframe with no undirected edges in Direction column
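A sketch on a small network (the column names follow the raw input format described under spras.omicsintegrator1 below and are assumptions here):

    import pandas as pd

    from spras.interactome import convert_undirected_to_directed, has_direction

    df = pd.DataFrame({
        "Interactor1": ["A", "B"],
        "Interactor2": ["B", "C"],
        "Weight": [1.0, 0.5],
        "Direction": ["U", "D"],  # one undirected edge, one directed edge
    })
    assert has_direction(df)
    directed = convert_undirected_to_directed(df)  # A-B becomes A->B plus B->A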
- spras.interactome.has_direction(df: DataFrame) bool
Checks if a graph has any directed edge.
- spras.interactome.reinsert_direction_col_directed(df: DataFrame) DataFrame
adds back a ‘Direction’ column filled with ‘D’ at the end of the provided dataframe
@param df: input network df that contains directionality column @return a df with Direction column of ‘D’s added back
- spras.interactome.reinsert_direction_col_mixed(df: DataFrame, existing_direction_column: str, dir_const: str, undir_const: str) DataFrame
adds back a ‘Direction’ column with ‘U’ or ‘D’ values at the end of the provided dataframe, based on the dir/undir constants in the existing direction column
@param df: input network df that contains a directionality column @param existing_direction_column: the name of the existing directionality column @param dir_const: the directed edge const @param undir_const: the undirected edge const @return a df with universal Direction column added back
- spras.interactome.reinsert_direction_col_undirected(df: DataFrame) DataFrame
adds back a ‘Direction’ column filled with ‘U’ at the end of the provided dataframe
@param df: input network df that contains a directionality column @return a df with Direction column of ‘U’s added back
spras.meo module
- class spras.meo.MEO
Bases: PRM
- dois: list[str] = ['10.1093/nar/gkq1207']
- static generate_inputs(data, filename_map)
Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type @return:
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- required_inputs: list[str] = ['sources', 'targets', 'edges']
- static run(edges=None, sources=None, targets=None, output_file=None, max_path_length=None, local_search=None, rand_restarts=None, container_framework='docker')
Run Maximum Edge Orientation in the Docker image with the provided parameters. The properties file is generated from the provided arguments. Only supports the Random orientation algorithm. Does not support MINSAT or MAXCSP. Only the edge output file is retained. All other output files are deleted. @param output_file: the name of the output edge file, which will overwrite any existing file with this name @param max_path_length: the maximum length of a path from sources to targets to orient @param local_search: a “Yes”/”No” parameter that enables MEO’s local search functionality; see “Improving approximations with local search” in the associated paper for more information @param rand_restarts: the number (int) of random restarts to use @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
- spras.meo.write_properties(filename=PosixPath('properties.txt'), edges=None, sources=None, targets=None, edge_output=None, path_output=None, max_path_length=None, local_search=None, rand_restarts=None, framework='docker')
Write the properties file for Maximum Edge Orientation. See https://github.com/agitter/meo/blob/master/sample.props for property descriptions and the default values at https://github.com/agitter/meo/blob/master/src/alg/EOMain.java#L185-L199. All file and directory names, except the filename argument, should be converted to container-friendly filenames with util.prepare_volume before passing them to this function. filename: the name of the properties file to write on the local file system
spras.mincostflow module
- class spras.mincostflow.MinCostFlow
Bases: PRM
- dois: list[str] = ['10.1038/ng.337']
- static generate_inputs(data, filename_map)
Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert a predicted pathway into the universal format
Although the algorithm constructs a directed network, the resulting network is treated as undirected. This is because the flow within the network doesn’t imply causal relationships between nodes. The primary goal of the algorithm is node identification, not the identification of directional edges.
@param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- required_inputs: list[str] = ['sources', 'targets', 'edges']
- static run(sources=None, targets=None, edges=None, output_file=None, flow=None, capacity=None, container_framework='docker')
Run MinCostFlow with Docker (or Singularity) @param sources: input sources (required) @param targets: input targets (required) @param edges: input network file (required) @param output_file: output file name (required) @param flow: (int) amount of flow going through the graph (optional) @param capacity: (float) amount of capacity allowed on each edge (optional) @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
spras.omicsintegrator1 module
- class spras.omicsintegrator1.OmicsIntegrator1
Bases: PRM
Omics Integrator 1 works with partially directed graphs and takes the universal input directly.
Expected raw input format: Interactor1 Interactor2 Weight Direction. The expected raw input file should have node pairs in the 1st and 2nd columns, a weight in the 3rd column, and directionality in the 4th column. It can include repeated and bidirectional edges and uses ‘U’ for undirected edges and ‘D’ for directed edges.
- dois: list[str] = ['10.1371/journal.pcbi.1004879']
- static generate_inputs(data, filename_map)
Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type @return:
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- required_inputs: list[str] = ['prizes', 'edges', 'dummy_nodes']
- static run(edges=None, prizes=None, dummy_nodes=None, dummy_mode=None, mu_squared=None, exclude_terms=None, output_file=None, noisy_edges=None, shuffled_prizes=None, random_terminals=None, seed=None, w=None, b=None, d=None, mu=None, noise=None, g=None, r=None, container_framework='docker')
Run Omics Integrator 1 in the Docker image with the provided parameters. Does not support the garnet, cyto30, knockout, cv, or cv-reps arguments. The configuration file is generated from the provided arguments. Does not support the garnetBeta, processes, or threads configuration file parameters. The msgpath is not required because msgsteiner is available in the Docker image. Only the optimal forest sif file is retained. All other output files are deleted. @param output_file: the name of the output sif file for the optimal forest, which will overwrite any existing file with this name @param noisy_edges: how many times to add noise to the given edge values and re-run the algorithm @param shuffled_prizes: how many times the algorithm should shuffle the prizes and re-run @param random_terminals: how many times to apply the given prizes to random nodes in the interactome @param seed: the random seed to use @param w: float that affects the number of connected components, with higher values leading to more components @param b: the trade-off between including more prizes and using less reliable edges @param d: controls the maximum path-length from root to terminal nodes @param mu: controls the degree-based negative prizes (default 0.0) @param noise: standard deviation of the Gaussian noise added to edges in Noisy Edges Randomizations @param g: Gamma: multiplicative edge penalty from degree of endpoints @param r: msgsteiner parameter that adds random noise to edges, which is rarely needed (default 0) @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
- spras.omicsintegrator1.write_conf(filename=PosixPath('config.txt'), w=None, b=None, d=None, mu=None, noise=None, g=None, r=None)
Write the configuration file for Omics Integrator 1. See https://github.com/fraenkel-lab/OmicsIntegrator#required-inputs. filename: the name of the configuration file to write
spras.omicsintegrator2 module
- class spras.omicsintegrator2.OmicsIntegrator2
Bases: PRM
- dois: list[str] = ['10.1371/journal.pcbi.1004879']
- static generate_inputs(data, filename_map)
Access fields from the dataset and write the required input files. Automatically converts edge weights to edge costs. @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- required_inputs: list[str] = ['prizes', 'edges']
- static run(edges=None, prizes=None, output_file=None, w=None, b=None, g=None, noise=None, noisy_edges=None, random_terminals=None, dummy_mode=None, seed=None, container_framework='docker')
Run Omics Integrator 2 in the Docker image with the provided parameters. Only the .tsv output file is retained and then renamed. All other output files are deleted. @param output_file: the name of the output file, which will overwrite any existing file with this name @param w: Omega: the weight of the edges connecting the dummy node to the nodes selected by dummyMode (default: 5) @param b: Beta: scaling factor of prizes (default: 1) @param g: Gamma: multiplicative edge penalty from degree of endpoints (default: 3) @param noise: standard deviation of the Gaussian noise added to edges in Noisy Edges Randomizations @param noisy_edges: an integer specifying how many times to add noise to the given edge values and re-run @param random_terminals: an integer specifying how many times to apply the given prizes to random nodes in the interactome and re-run @param dummy_mode: tells the program which nodes in the interactome to connect the dummy node to (default: terminals)
- “terminals” = connect to all terminals
- “others” = connect to all nodes except for terminals
- “all” = connect to all nodes in the interactome
@param seed: the random seed to use for this run @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
spras.pathlinker module
- class spras.pathlinker.PathLinker
Bases: PRM
- dois: list[str] = ['10.1038/npjsba.2016.2', '10.1089/cmb.2012.0274']
- static generate_inputs(data, filename_map)
Access fields from the dataset and write the required input files @param data: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type @return:
- static parse_output(raw_pathway_file, standardized_pathway_file, params)
Convert a predicted pathway into the universal format @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- required_inputs: list[str] = ['nodetypes', 'network']
- static run(nodetypes=None, network=None, output_file=None, k=None, container_framework='docker')
Run PathLinker with Docker @param nodetypes: input node types with sources and targets (required) @param network: input network file (required) @param output_file: path to the output pathway file (required) @param k: the number of shortest paths to compute (optional) @param container_framework: choose the container runtime framework, currently supports “docker” or “singularity” (optional)
spras.prm module
- class spras.prm.PRM
Bases: ABC
The PRM (Pathway Reconstruction Module) class, which defines the interface that runner.py uses to handle algorithms.
- dois: list[str] = None
- abstractmethod static parse_output(raw_pathway_file: str, standardized_pathway_file: str, params: dict[str, Any])
- required_inputs: list[str] = []
- abstractmethod static run(**kwargs)
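A minimal sketch of a new algorithm wrapper following this interface (the class, inputs, and method bodies are hypothetical; concrete subclasses such as PathLinker above also provide a static generate_inputs):

    from spras.prm import PRM

    class ToyAlgorithm(PRM):  # hypothetical wrapper
        required_inputs = ["network"]
        dois = []

        @staticmethod
        def generate_inputs(data, filename_map):
            pass  # would write the 'network' input file from the dataset

        @staticmethod
        def run(network=None, output_file=None, container_framework="docker"):
            pass  # would mount inputs with prepare_volume and call run_container

        @staticmethod
        def parse_output(raw_pathway_file, standardized_pathway_file, params):
            pass  # would convert the raw output into the universal format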
spras.runner module
- spras.runner.get_required_inputs(algorithm: str)
Get the input files required to run this algorithm @param algorithm: algorithm name @return: a list of strings of input file types
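For example, based on PathLinker.required_inputs above (the lowercase algorithm name is an assumption):

    from spras import runner

    print(runner.get_required_inputs("pathlinker"))  # ['nodetypes', 'network']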
- spras.runner.merge_input(dataset_dict, dataset_file: str)
Merge files listed for this dataset and write the dataset to disk @param dataset_dict: dataset to process @param dataset_file: output filename
- spras.runner.parse_output(algorithm: str, raw_pathway_file: str, standardized_pathway_file: str, params: dict[str, Any])
Convert a predicted pathway into the universal format @param algorithm: algorithm name @param raw_pathway_file: pathway file produced by an algorithm’s run function @param standardized_pathway_file: the same pathway written in the universal format
- spras.runner.prepare_inputs(algorithm: str, data_file: str, filename_map: dict[str, str])
Prepare general dataset files for this algorithm @param algorithm: algorithm name @param data_file: dataset @param filename_map: a dict mapping file types in the required_inputs to the filename for that type @return:
- spras.runner.run(algorithm: str, params)
A generic interface to the algorithm-specific run functions
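A dispatch sketch, assuming params carries the keyword arguments of the algorithm’s run function (the paths and values are hypothetical):

    from spras import runner

    runner.run("pathlinker", {
        "nodetypes": "input/nodetypes.txt",
        "network": "input/network.txt",
        "output_file": "output/pathway.txt",
        "k": 100,
    })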
spras.util module
Utility functions for pathway reconstruction
- class spras.util.NpHashEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
A numpy-compatible JSON encoder meant to be passed as a cls for hashing; this encoder only encodes and does not decode the other way around.
- default(obj)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError). For example, to support arbitrary iterators, you could implement default like this:

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return super().default(o)
- spras.util.add_rank_column(df: DataFrame) DataFrame
Add a column of 1s to the dataframe @param df: the dataframe to add the rank column of 1s to
- spras.util.duplicate_edges(df: DataFrame) tuple[DataFrame, bool]
Removes duplicate edges from the input DataFrame. Run within every pathway reconstruction algorithm’s parse_output.
- For duplicate edges (based on Node1, Node2, and Direction), the one with the smallest Rank is kept.
- For undirected edges, the node pair is sorted (e.g., “B-A” becomes “A-B”) before removing duplicates.
@param df: a DataFrame from a raw pathway file @return pd.DataFrame: a DataFrame with duplicate edges removed @return bool: True if duplicate edges were found and removed, False otherwise
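A sketch of the undirected-pair handling, assuming add_rank_column supplies the Rank column that duplicate_edges compares:

    import pandas as pd

    from spras.util import add_rank_column, duplicate_edges

    df = pd.DataFrame({
        "Node1": ["A", "B"],
        "Node2": ["B", "A"],
        "Direction": ["U", "U"],
    })
    df = add_rank_column(df)
    deduped, found = duplicate_edges(df)
    # B-A is sorted to A-B, so a single undirected edge remains and found is True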
- spras.util.hash_filename(filename: str, length: int | None = None) str
Hash of a filename using hash_params_sha1_base32 @param filename: filename to hash @param length: the length of the returned hash, which is ignored if it is None, < 1, or > the full hash length @return: hash
- spras.util.hash_params_sha1_base32(params_dict: Dict[str, Any], length: int | None = None, cls=None) str
Hash of a dictionary. Derived from https://www.doc.ic.ac.uk/~nuric/coding/how-to-hash-a-dictionary-in-python.html by Nuri Cingillioglu Adapted to use sha1 instead of MD5 and encode in base32 Can be truncated to the desired length @param params_dict: the algorithm parameters dictionary @param length: the length of the returned hash, which is ignored if it is None, < 1, or > the full hash length
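A sketch pairing the hash with NpHashEncoder so numpy values in the parameter dictionary can be serialized (the parameters and length are hypothetical; 7 matches the hash segment in prepare_volume’s example):

    from spras.util import NpHashEncoder, hash_params_sha1_base32

    params = {"k": 100, "w": 5.0}
    print(hash_params_sha1_base32(params, length=7, cls=NpHashEncoder))
    # a 7-character base32 string, like the 'MG4YPNK' segment shown under prepare_volume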
- spras.util.make_required_dirs(path: str)
Create the directory and parent directories required before an output file can be written to the specified path. Existing directories will not raise an error. @param path: the filename that is to be written
- spras.util.raw_pathway_df(raw_pathway_file: str, sep: str = '\t', header: int = None) DataFrame
Creates a dataframe from the contents of a raw pathway file; if the file is empty, returns an empty dataframe with standard output column names @param raw_pathway_file: path to the raw pathway file @param sep: separator used when loading the dataframe, default tab character @param header: the row of raw_pathway_file containing the header, default None