Clustering Algorithms

OpenEnsembles is a resource for performing and analyzing ensemble clustering. This file contains calls to clustering algorithms. Refer to this documentation for specifics about the variable parameters and their defaults, but interact with clustering through the openensembles.clustering() class.

Copyright (C) 2017 Naegle Lab

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

openensembles.clustering_algorithms.convertDistanceToSimilarity(D, beta=1.0)

A utility to convert a distance matrix to a similarity matrix

Parameters
D: matrix of floats

A matrix of distances, such as returned by returnDistanceMatrix(data,distanceType)

beta: float

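Scaling factor in the exponent of the distance-to-similarity mapping. Larger values make similarity fall off faster as distance grows. Defaults to 1.0.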

Returns
S: a matrix of floats

A matrix of similarity values, computed as S = np.exp(-beta * D / D.std())
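
The mapping can be tried out on a small matrix. The following is a minimal sketch that reimplements the documented formula with NumPy; the function name convert_distance_to_similarity and the sample matrix are illustrative assumptions, not the library's own code.

```python
import numpy as np

def convert_distance_to_similarity(D, beta=1.0):
    """Illustrative reimplementation of S = exp(-beta * D / D.std())."""
    D = np.asarray(D, dtype=float)
    # Dividing by the standard deviation of all entries makes the
    # mapping independent of the overall scale of the distances.
    return np.exp(-beta * D / D.std())

# Hypothetical symmetric distance matrix for three objects.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])

S = convert_distance_to_similarity(D, beta=1.0)
```

Zero distances map to a similarity of 1, and larger distances map to values closer to 0, so the diagonal of S is all ones and S stays symmetric whenever D is.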

openensembles.clustering_algorithms.returnDistanceMatrix(data, distance)

A utility to calculate a distance matrix on the data array, using the metric named in distance.

Parameters
data: matrix

Data matrix to calculate distances from

distance: string

Name of the distance metric. See sklearn’s pairwise distances for the available metrics.

Returns
d: matrix

The distance matrix computed with the metric given in distance

Raises
ValueError:

if the distance metric is not available.
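
As a rough illustration of this behaviour, the sketch below assumes the utility hands the metric name to sklearn.metrics.pairwise_distances, which is one plausible reading of the reference to sklearn above. The wrapper distance_matrix_sketch is a hypothetical stand-in, not the library's implementation.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def distance_matrix_sketch(data, distance):
    """Hypothetical stand-in for returnDistanceMatrix (not the library's code)."""
    # Assumption: the metric name is passed straight to sklearn, which
    # raises ValueError when the name is not a supported metric.
    return pairwise_distances(np.asarray(data, dtype=float), metric=distance)

# Three 2-D points lying on a line, spaced 5 units apart.
data = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
d = distance_matrix_sketch(data, "euclidean")

# An unknown metric name is rejected with ValueError.
try:
    distance_matrix_sketch(data, "not_a_metric")
    metric_rejected = False
except ValueError:
    metric_rejected = True
```

The result is a square matrix with one row and one column per row of data.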

openensembles.clustering_algorithms.returnParams(paramsSent, paramsExpected, algorithm)

A utility for variable parameter setting in clustering algorithms. Takes two dictionaries of parameter key, value pairs and replaces the values in paramsExpected with those given in paramsSent.

Returns
params: dict

Dict of final parameters: the defaults in paramsExpected, overwritten by the values in paramsSent. The function also checks that, if precomputed distances have been selected, a distance or similarity matrix is also passed.

Warning

Warns the user if a key in paramsSent does not appear in paramsExpected.
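
The override-and-warn behaviour can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: the name merge_params is hypothetical, unexpected keys are assumed to be dropped after the warning, and the precomputed-distance check is left out because the parameter names it inspects are not given here.

```python
import warnings

def merge_params(params_sent, params_expected, algorithm):
    """Hypothetical sketch of overriding default parameters with sent ones."""
    # Start from a copy of the defaults so the caller's dict is not mutated.
    params = dict(params_expected)
    for key, value in params_sent.items():
        if key not in params_expected:
            # Assumption: unknown keys are reported and then ignored.
            warnings.warn(
                f"Parameter '{key}' is not expected by {algorithm}"
            )
            continue
        params[key] = value
    return params

# Illustrative defaults and overrides; the keys are made up for this example.
defaults = {"K": 2, "init": "k-means++"}
sent = {"K": 5, "bogus": 1}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    final = merge_params(sent, defaults, "kmeans")
```

Here final keeps the default for init, takes the sent value for K, and a single warning is recorded for the unexpected key.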