Finishing
OpenEnsembles is a resource for performing and analyzing ensemble clustering
Copyright (C) 2017 Naegle Lab
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
class openensembles.finishing.graph_closure(co_occ_matrix, threshold, clique_size=3)
Returns a final solution of the ensemble by treating the co-occurrence matrix as a weighted graph and identifying the network components within that graph.
Finds k-percolated cliques in the graph G.
Unless the cliques argument evaluates to True, this algorithm first enumerates all cliques in G and stores them in memory, which can consume a large amount of memory for large graphs. Returns a generator object. To obtain a list of percolated k-cliques, consume the generator, for example with list().
References
Based on the method outlined in Palla et al., Nature 435, 814-818 (2005).
Based on the percolation code from ConradLee.
Methods
finish() Finishes the ensemble by taking the binary adjacency matrix, defined at initialization according to the given threshold, and percolating the cliques.
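The thresholding and clique percolation steps described above can be illustrated with a small standalone sketch. It is not the OpenEnsembles implementation: the helper names (`threshold_adjacency`, `maximal_cliques`, `percolated_cliques`) are hypothetical, the `>=` comparison against the threshold is an assumption, and the toy co-occurrence matrix is made up for illustration.

```python
from itertools import combinations


def threshold_adjacency(co_occ, threshold):
    """Binary adjacency: link i and j when co_occ[i][j] >= threshold (assumed rule)."""
    n = len(co_occ)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if co_occ[i][j] >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def percolated_cliques(adj, k):
    """Yield communities: unions of cliques (size >= k) chained by >= k-1 shared nodes."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    neighbors = {c: set() for c in cliques}
    for a, b in combinations(cliques, 2):
        if len(a & b) >= k - 1:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen = set()
    for start in cliques:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:  # depth-first walk over adjacent cliques
            clique = stack.pop()
            members |= clique
            for nb in neighbors[clique]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        yield frozenset(members)


# Toy co-occurrence matrix: objects 0-2 and 3-5 usually cluster together.
co_occ = [
    [1.0, 0.9, 0.9, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 0.8, 1.0],
]
adj = threshold_adjacency(co_occ, threshold=0.5)
communities = list(percolated_cliques(adj, k=3))
```

As in the description above, the percolation function is a generator, so the sketch wraps it in `list()` to materialize the communities. Objects that belong to no clique of size `k` are left unassigned by this approach.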
class openensembles.finishing.mixture_model(parg, N, nEnsCluster=2, iterations=10)
Implementation of the mixture model for ensemble clustering described in Topchy, Jain, and Punch, “A mixture model for clustering ensembles,” Proc. SIAM Int. Conf. Data Mining (2004).
Written by Pedro da Silva Tavares and adapted by Kristen M. Naegle
- Parameters
- parg: list of lists
Solutions of assignments of objects to clusters across an ensemble
- N: int
Number of data points to cluster
- nEnsCluster: int
Number of clusters to create in the mixture model. Default is 2.
- iterations: int
Number of expectation-maximization iterations. Default is 10.
- Attributes
- labels: list of ints
Final Mixture Model partitions of objects
Methods
expectation() Compute the Expectation (ExpZ) according to the parameters.
gatherPartitions() Returns the y vector.
genKj() Generates the K(j) H-array that contains the tuples of unique clusters of each j-th partition, e.g. K = [(X,Y), (A,B)].
The function initializes the parameters of the mixture model.
Update the parameters taking into account the ExpZ computed in the Expectation (ExpZ) step.
emProcess()
expectation() Compute the Expectation (ExpZ) according to the parameters. Note the array shapes: y(N,H), Kj(H), alpha(M), v(H,M,K(j)), ExpZ(N,M).
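To make the shapes concrete, here is a minimal sketch of an expectation step in the style of the Topchy, Jain, and Punch mixture model, where each object's responsibility for component m is proportional to alpha[m] times the product over partitions of v[j][m][label]. The function name `expectation_step`, the use of nested lists and dicts, and the toy parameter values are assumptions for illustration, not the library's own code.

```python
def expectation_step(y, alpha, v):
    """Return ExpZ as N rows of M responsibilities.

    y:     N rows, each holding the H labels one object received (shape y(N,H))
    alpha: M mixing weights
    v:     v[j][m] maps each label of partition j to its probability under component m
    """
    exp_z = []
    for labels in y:
        weights = []
        for m, a in enumerate(alpha):
            w = a
            for j, label in enumerate(labels):
                w *= v[j][m][label]
            weights.append(w)
        total = sum(weights)  # assumed non-zero in this sketch
        exp_z.append([w / total for w in weights])
    return exp_z


# Toy setup: N = 3 objects, H = 2 partitions, M = 2 mixture components.
y = [["a", 0], ["a", 0], ["b", 1]]
alpha = [0.5, 0.5]
v = [
    [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}],  # partition 0
    [{0: 0.8, 1: 0.2}, {0: 0.2, 1: 0.8}],          # partition 1
]
exp_z = expectation_step(y, alpha, v)
```

Each row of `exp_z` sums to 1. With these toy values the first two objects lean strongly toward component 0 and the third toward component 1.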
gatherPartitions() Returns the y vector. parg: list of H labeling solutions. nElem: number of features/objects.
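One plausible reading of the y vector and the K(j) array, given the shapes y(N,H) and Kj(H) noted above, is sketched below. The helper names `gather_partitions` and `unique_clusters` are hypothetical, and treating y as the transpose of parg is an assumption rather than a confirmed detail of the library.

```python
def gather_partitions(parg, n_elem):
    """Transpose H labelings of length n_elem into n_elem rows of H labels."""
    for labeling in parg:
        if len(labeling) != n_elem:
            raise ValueError("every labeling must assign all n_elem objects")
    return [[labeling[i] for labeling in parg] for i in range(n_elem)]


def unique_clusters(parg):
    """For each partition j, collect the tuple of unique cluster labels K(j)."""
    return [tuple(sorted(set(labeling))) for labeling in parg]


# Two labelings (H = 2) of four objects (N = 4).
parg = [["X", "Y", "X", "Y"], ["A", "A", "B", "B"]]
y = gather_partitions(parg, 4)
K = unique_clusters(parg)
```

Here `K` takes the same form as the K = [(X,Y), (A,B)] example given for genKj, and each row of `y` lists the labels a single object received across the ensemble.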