Finishing

OpenEnsembles is a resource for performing and analyzing ensemble clustering

Copyright (C) 2017 Naegle Lab

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

class openensembles.finishing.graph_closure(co_occ_matrix, threshold, clique_size=3)

Returns a final solution of the ensemble by treating the co-occurrence matrix as a weighted graph and identifying network components within that graph.

Finds k-percolated cliques in G.

Unless the cliques argument evaluates to True, this algorithm first enumerates all cliques in G. These are stored in memory, which can be substantial for large graphs. Returns a generator object; to obtain a list of percolated k-cliques, pass the generator to list().

Methods

finish()

Finishes the ensemble by taking the binary adjacency matrix, defined at initialization according to the given threshold, and percolating the cliques.

finish()

Finishes the ensemble by taking the binary adjacency matrix, defined at initialization according to the given threshold, and percolating the cliques.
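The thresholding and clique percolation described above can be sketched in plain Python. This is an illustrative sketch only, not the OpenEnsembles implementation: the helper names threshold_adjacency and percolate_cliques are hypothetical, the choice of >= for the threshold comparison is an assumption, and the brute-force clique enumeration is only practical for small graphs.

```python
from itertools import combinations

def threshold_adjacency(co_occ, threshold):
    """Return the undirected edges (i, j), i < j, whose co-occurrence meets the threshold.

    Assumption: an edge is kept when co_occ[i][j] >= threshold.
    """
    n = len(co_occ)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if co_occ[i][j] >= threshold}

def percolate_cliques(n_nodes, edges, k=3):
    """Merge k-cliques that share k-1 nodes; return communities as sorted node lists."""
    neighbors = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # Enumerate every k-clique by brute force (small graphs only).
    cliques = [frozenset(c) for c in combinations(range(n_nodes), k)
               if all(b in neighbors[a] for a, b in combinations(c, 2))]
    # Union-find over cliques: two cliques are adjacent when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) >= k - 1:
            parent[find(a)] = find(b)
    # Each community is the union of the nodes of all cliques in one group.
    communities = {}
    for idx, clique in enumerate(cliques):
        communities.setdefault(find(idx), set()).update(clique)
    return sorted(sorted(c) for c in communities.values())

# Toy co-occurrence matrix: objects 0-2 and objects 3-5 co-cluster often.
co_occ = [
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 1.0, 0.7, 0.2, 0.0, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.1, 0.0],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.8],
    [0.0, 0.0, 0.1, 0.9, 1.0, 0.9],
    [0.0, 0.1, 0.0, 0.8, 0.9, 1.0],
]
edges = threshold_adjacency(co_occ, 0.5)
communities = percolate_cliques(len(co_occ), edges, k=3)
```

Objects that belong to no clique of size k do not appear in any community, so a stricter threshold or a larger clique size can leave some objects unassigned.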

class openensembles.finishing.mixture_model(parg, N, nEnsCluster=2, iterations=10)

Implementation of mixture models for ensemble clustering, following Topchy, Jain, and Punch, “A mixture model for clustering ensembles,” Proc. SIAM Int. Conf. Data Mining (2004).

Written by Pedro da Silva Tavares and adapted by Kristen M. Naegle

Parameters
parg: list of lists

Solutions of assignments of objects to clusters across an ensemble

N: int

Number of data points to cluster

nEnsCluster: int

Number of clusters to create in the mixture model. Default is 2.

iterations: int

Number of expectation-maximization iterations. Default is 10.

Attributes
labels: list of ints

Final Mixture Model partitions of objects

Methods

expectation()

Compute the Expectation (ExpZ) according to parameters.

gatherPartitions()

Returns the y vector.

genKj()

Generates K, an array of length H whose j-th entry K(j) is the tuple of unique clusters in the j-th partition, e.g. K = [(X,Y), (A,B)].

initParameters()

The function initializes the parameters of the mixture model.

maximization()

Updates the parameters using the ExpZ computed in the expectation step.

emProcess()

expectation()

Compute the Expectation (ExpZ) according to parameters.

Note: y(N,H), Kj(H), alpha(M), v(H,M,K(j)), ExpZ(N,M)
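As a rough illustration of the expectation step, the sketch below follows the E-step of the Topchy, Jain, and Punch model: the weight of component m for object i is alpha[m] times the product over partitions j of v[j][m][y[i][j]], normalized over m. The function name and the nested list/dict layout are assumptions made for this example; they are not taken from the OpenEnsembles source.

```python
def expectation_sketch(y, alpha, v):
    """Return exp_z[i][m], the posterior weight of mixture component m for object i.

    y[i][j]     : label of object i in partition j          (N x H)
    alpha[m]    : mixing weight of component m              (M)
    v[j][m][k]  : probability of label k in partition j under component m
    Assumes every v[j][m][k] is strictly positive so that each row can be normalized.
    """
    exp_z = []
    for row in y:
        weights = []
        for m, a in enumerate(alpha):
            w = a
            for j, label in enumerate(row):
                w *= v[j][m][label]
            weights.append(w)
        total = sum(weights)
        exp_z.append([w / total for w in weights])
    return exp_z

# Two objects, two partitions, two components (toy values).
y = [[0, 0], [1, 1]]
alpha = [0.5, 0.5]
v = [
    [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}],  # partition 0: component 0, component 1
    [{0: 0.8, 1: 0.2}, {0: 0.1, 1: 0.9}],  # partition 1: component 0, component 1
]
exp_z = expectation_sketch(y, alpha, v)
```

With these toy values the first object is assigned mostly to component 0 and the second mostly to component 1.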

gatherPartitions()

Returns the y vector.

parg: list of H-labeling solutions

nElem: number of features/objects
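One plausible reading of this step, given the y(N,H) shape noted for the other methods, is that the H labelings are transposed so that each object gets one row of H labels. The sketch below shows that reading; the function name gather_partitions_sketch is hypothetical and the actual method may differ in its return type.

```python
def gather_partitions_sketch(parg, n_elem):
    """Transpose H labelings of length n_elem into n_elem rows of H labels."""
    for labels in parg:
        if len(labels) != n_elem:
            raise ValueError("every labeling must assign all n_elem objects")
    return [[labels[i] for labels in parg] for i in range(n_elem)]

# Three clustering solutions (H = 3) over four objects (N = 4).
parg = [
    [0, 0, 1, 1],
    [1, 1, 0, 2],
    [0, 0, 0, 1],
]
y = gather_partitions_sketch(parg, 4)
```

Here row i of y lists the cluster that object i received in each of the three solutions.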

genKj()

Generates K, an array of length H whose j-th entry K(j) is the tuple of unique clusters in the j-th partition, e.g. K = [(X,Y), (A,B)].
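The example K = [(X,Y), (A,B)] can be reproduced with a short sketch that collects the distinct labels in each column of y. The function name gen_kj_sketch is hypothetical, and sorting the labels within each tuple is an assumption made here for reproducibility.

```python
def gen_kj_sketch(y):
    """Return, for each partition j, a tuple of the distinct labels it uses."""
    n_partitions = len(y[0])
    return [tuple(sorted({row[j] for row in y})) for j in range(n_partitions)]

# Three objects labeled by two partitions: the first uses X/Y, the second A/B.
y = [["X", "A"], ["X", "B"], ["Y", "B"]]
K = gen_kj_sketch(y)
```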

initParameters()

The function initializes the parameters of the mixture model.
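The source does not say how the parameters are initialized. A common choice for this kind of model, shown here purely as an assumption, is uniform mixing weights alpha and random label probabilities v that are normalized per partition and component. The function name and the seed argument are hypothetical.

```python
import random

def init_parameters_sketch(Kj, n_components, seed=0):
    """Return (alpha, v) with uniform alpha and randomly initialized, normalized v.

    Kj[j] is the tuple of labels used by partition j; v[j][m] maps each of those
    labels to a strictly positive probability, and each v[j][m] sums to 1.
    """
    rng = random.Random(seed)
    alpha = [1.0 / n_components] * n_components
    v = []
    for labels in Kj:
        per_partition = []
        for _ in range(n_components):
            # Small offset keeps every probability strictly positive.
            weights = [rng.random() + 1e-3 for _ in labels]
            total = sum(weights)
            per_partition.append({lab: w / total for lab, w in zip(labels, weights)})
        v.append(per_partition)
    return alpha, v

Kj = [("X", "Y"), ("A", "B", "C")]
alpha, v = init_parameters_sketch(Kj, n_components=2)
```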

maximization()

Updates the parameters using the ExpZ computed in the expectation step.

Note: y(N,H), Kj(H), alpha(M), v(H,M,K(j)), ExpZ(N,M)
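A minimal sketch of the maximization step, following the update rules of the Topchy, Jain, and Punch model: alpha[m] becomes the average of ExpZ over objects, and v[j][m][k] becomes the ExpZ-weighted share of objects with label k in partition j. The function name and data layout are assumptions for illustration, and the sketch assumes each row of exp_z sums to 1 and each component has nonzero total weight.

```python
def maximization_sketch(y, exp_z, Kj):
    """Return updated (alpha, v) from the labels y and the responsibilities exp_z."""
    n = len(y)
    n_components = len(exp_z[0])
    # alpha[m]: average responsibility of component m over all objects.
    alpha = [sum(exp_z[i][m] for i in range(n)) / n for m in range(n_components)]
    v = []
    for j, labels in enumerate(Kj):
        per_partition = []
        for m in range(n_components):
            # Responsibility-weighted count of each label in partition j.
            counts = {k: 0.0 for k in labels}
            for i in range(n):
                counts[y[i][j]] += exp_z[i][m]
            total = sum(counts.values())
            per_partition.append({k: c / total for k, c in counts.items()})
        v.append(per_partition)
    return alpha, v

# Three objects, two partitions, hard responsibilities for clarity.
y = [[0, 0], [0, 0], [1, 1]]
exp_z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Kj = [(0, 1), (0, 1)]
alpha, v = maximization_sketch(y, exp_z, Kj)
```

Alternating the expectation and maximization steps for a fixed number of iterations, then assigning each object to the component with the largest ExpZ, gives the final labels.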