Demonstrate the ability to get reproducible results from non-deterministic algorithms

[1]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
import openensembles as oe

n_samples=1500
X,y = datasets.make_blobs(n_samples=n_samples, random_state=8)
df = pd.DataFrame(X)
d = oe.data(df, [1,2])
[2]:
d_plot = d.plot_data('parent')
../_images/Examples_Demonstrate_RandomSeed_2_0.png

Compare results with and without using a random seed: Kmeans

[4]:
c = oe.cluster(d)
c_seed = oe.cluster(d)
K = 4  # choose a K such that the solution is not ideal
numIterations = 10
for i in range(1, numIterations + 1):  # run exactly numIterations times
    # each appended solution needs a unique name (dictionary key)
    name = 'kmeans_' + str(i)
    # unseeded: every run draws a different random initialization
    c.cluster('parent', 'kmeans', name, K, init='random', n_init=1)
    # seeded: every run starts from the same initialization
    c_seed.cluster('parent', 'kmeans', name, K, random_state=0, init='random', n_init=1)

Without a fixed seed, each run starts from a different random initialization, so the normalized mutual information between solutions should fall below 1 wherever two solutions differ.

[7]:
mi_randomSeeds = c.MI(MI_type='normalized')
mi_plot = mi_randomSeeds.plot(add_labels=False)

../_images/Examples_Demonstrate_RandomSeed_6_0.png
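To make the mutual-information comparison concrete, here is a minimal sketch of normalized mutual information between two label vectors, written with the standard library only. The helper `normalized_mutual_info` is hypothetical and is not part of openensembles. The normalization behind `MI_type='normalized'` is not stated in this example, so the sketch assumes the arithmetic mean of the two entropies.

```python
import math
from collections import Counter

def normalized_mutual_info(labels_a, labels_b):
    """NMI of two labelings (assumed normalization: mean of the two entropies)."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must have the same length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # MI = sum over label pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    mean_h = (entropy(count_a) + entropy(count_b)) / 2
    # Two single-cluster labelings have zero entropy; treat them as identical.
    return 1.0 if mean_h == 0 else mi / mean_h

a = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]  # same partition, different label names
unrelated = [0, 1, 0, 1, 0, 1]  # splits every cluster of `a` in half

nmi_same = normalized_mutual_info(a, relabeled)
nmi_diff = normalized_mutual_info(a, unrelated)
print(round(nmi_same, 6), round(nmi_diff, 6))  # prints 1.0 0.0
```

A value of 1 means the two solutions describe the same partition even if the cluster labels are permuted. Values below 1 mean the partitions disagree somewhere.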

When the seed is fixed to the same value, K-means should return the same result on every run, as indicated by a mutual information of 1 between all clustering solutions.

[8]:

mi_sameSeeds = c_seed.MI(MI_type='normalized')
mi_plot = mi_sameSeeds.plot(add_labels=False)

/anaconda3/envs/py37-openEnsembles/lib/python3.7/site-packages/scipy/cluster/hierarchy.py:2869: UserWarning: Attempting to set identical left == right == 0.0 results in singular transformations; automatically expanding.
  ax.set_xlim([dvw, 0])
../_images/Examples_Demonstrate_RandomSeed_8_1.png
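The effect of fixing the seed can also be sketched without any clustering library. The example below is a toy k-means with random initialization, assuming a hypothetical helper `kmeans_random_init` and synthetic data. It is not the openensembles or scikit-learn implementation. It only illustrates why a fixed seed removes the run-to-run variation.

```python
import random

def kmeans_random_init(points, k, seed=None, max_iter=100):
    """Toy Lloyd's k-means; initial centers are k randomly sampled points."""
    rng = random.Random(seed)  # seed=None uses OS entropy, so runs can differ
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((p - q) ** 2 for p, q in zip(pt, centers[j])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Synthetic data: three Gaussian blobs, clustered with K = 4 so that the
# solution is not ideal and depends on the initialization.
data_rng = random.Random(8)
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in [(0, 0), (6, 0), (3, 5)]
    for _ in range(50)
]

seeded = [kmeans_random_init(points, k=4, seed=0) for _ in range(5)]
all_identical = all(run == seeded[0] for run in seeded)
print(all_identical)  # prints True
```

Calling `kmeans_random_init(points, k=4)` without a seed may return a different labeling on each call, which corresponds to the unseeded `c` object above.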

Compare results with and without random seed for spectral clustering

[14]:
c = oe.cluster(d)
c_seed = oe.cluster(d)
K = 10  # choose a K such that the solution is not ideal
numIterations = 10
for i in range(1, numIterations + 1):  # run exactly numIterations times
    # each appended solution needs a unique name (dictionary key)
    name = 'spectral_' + str(i)
    c.cluster('parent', 'spectral', name, K)  # unseeded
    c_seed.cluster('parent', 'spectral', name, K, random_state=0)  # fixed seed

Without a fixed seed, each run starts from a different random state, so the normalized mutual information between solutions should fall below 1 wherever two solutions differ.

[15]:
mi_randomSeeds = c.MI(MI_type='normalized')
mi_plot = mi_randomSeeds.plot(add_labels=False)
../_images/Examples_Demonstrate_RandomSeed_12_0.png

When the seed is fixed to the same value, spectral clustering should return the same result on every run, as indicated by a mutual information of 1 between all clustering solutions.

[16]:

mi_sameSeeds = c_seed.MI(MI_type='normalized')
mi_plot = mi_sameSeeds.plot(add_labels=False)

/anaconda3/envs/py37-openEnsembles/lib/python3.7/site-packages/scipy/cluster/hierarchy.py:2869: UserWarning: Attempting to set identical left == right == 0.0 results in singular transformations; automatically expanding.
  ax.set_xlim([dvw, 0])
../_images/Examples_Demonstrate_RandomSeed_14_1.png