Demonstrate the ability to get reproducible results from non-deterministic algorithms

[1]:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
import openensembles as oe

n_samples=1500
X,y = datasets.make_blobs(n_samples=n_samples, random_state=8)
df = pd.DataFrame(X)
d = oe.data(df, [1,2])

[2]:

d_plot = d.plot_data('parent')

../_images/Examples_Demonstrate_RandomSeed_2_0.png

Compare results with and without a fixed random seed: K-means

[4]:

c = oe.cluster(d)       # solutions generated without a fixed seed
c_seed = oe.cluster(d)  # solutions generated with a fixed seed
K = 4  # choose a K such that the solution is not ideal
numIterations = 10
for i in range(1, numIterations + 1):  # run exactly numIterations times
    name = 'kmeans_' + str(i)  # each appended solution needs a unique name (dictionary key)
    c.cluster('parent', 'kmeans', name, K, init='random', n_init=1)
    c_seed.cluster('parent', 'kmeans', name, K, random_state=0, init='random', n_init=1)
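The same effect can be checked outside OpenEnsembles. The following is a minimal sketch that calls scikit-learn's `KMeans` directly; it assumes the `random_state`, `init` and `n_init` keywords used above correspond to the scikit-learn parameters of the same names, which this notebook does not state. The helper `kmeans_labels` is a hypothetical name introduced only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

# Same synthetic data as in the first cell of this notebook.
X, _ = make_blobs(n_samples=1500, random_state=8)

def kmeans_labels(seed):
    """Run a single random-init K-means pass (hypothetical helper).

    seed=None leaves the random number generator unseeded.
    """
    model = KMeans(n_clusters=4, init='random', n_init=1, random_state=seed)
    return model.fit_predict(X)

# Two runs with the same seed start from the same initial centers.
seeded_a = kmeans_labels(0)
seeded_b = kmeans_labels(0)
nmi_seeded = normalized_mutual_info_score(seeded_a, seeded_b)

# Unseeded runs may converge to different partitions.
unseeded = [kmeans_labels(None) for _ in range(5)]
nmi_unseeded = [normalized_mutual_info_score(unseeded[0], u) for u in unseeded[1:]]

print('NMI between seeded runs:  ', nmi_seeded)
print('NMI between unseeded runs:', nmi_unseeded)
```

The seeded pair should score a normalized mutual information of 1 up to floating-point rounding, while the unseeded values depend on which initial centers each run happens to draw.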

Mutual information should show that results vary when the seed is not fixed to the same starting point

[7]:

mi_randomSeeds = c.MI(MI_type='normalized')
mi_plot = mi_randomSeeds.plot(add_labels=False)

../_images/Examples_Demonstrate_RandomSeed_6_0.png

When the seed is fixed to the same value, K-means should return the same results, as indicated by a mutual information of 1 between all clustering results.

[8]:

mi_sameSeeds = c_seed.MI(MI_type='normalized')
mi_plot = mi_sameSeeds.plot(add_labels=False)

/anaconda3/envs/py37-openEnsembles/lib/python3.7/site-packages/scipy/cluster/hierarchy.py:2869: UserWarning: Attempting to set identical left == right == 0.0 results in singular transformations; automatically expanding.
  ax.set_xlim([dvw, 0])

../_images/Examples_Demonstrate_RandomSeed_8_1.png
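To make the "mutual information of 1" criterion concrete, here is a small standard-library sketch of normalized mutual information between two label vectors. The normalization by the arithmetic mean of the two entropies is an assumption; this notebook does not say which normalization `MI_type='normalized'` applies. The function names are hypothetical and are not part of OpenEnsembles.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two label vectors of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )

def normalized_mi(a, b):
    """MI divided by the mean of the two entropies (assumed normalization)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions put every point in a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2)

reference = [0, 0, 1, 1, 2, 2]
relabelled = [2, 2, 0, 0, 1, 1]   # same partition, different label names
unrelated = [0, 1, 0, 1, 0, 1]    # statistically independent of reference

nmi_same = normalized_mi(reference, relabelled)
nmi_unrelated = normalized_mi(reference, unrelated)
print(nmi_same, nmi_unrelated)
```

Because the measure depends only on how points are grouped, not on the label names, a relabelled copy of the same partition scores 1 (up to rounding) and an independent partition scores 0. A matrix of all ones therefore means every seeded run produced the same partition.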

Compare results with and without a fixed random seed: spectral clustering

[14]:

c = oe.cluster(d)       # solutions generated without a fixed seed
c_seed = oe.cluster(d)  # solutions generated with a fixed seed
K = 10  # choose a K such that the solution is not ideal
numIterations = 10
for i in range(1, numIterations + 1):  # run exactly numIterations times
    name = 'spectral_' + str(i)  # each appended solution needs a unique name (dictionary key)
    c.cluster('parent', 'spectral', name, K)
    c_seed.cluster('parent', 'spectral', name, K, random_state=0)

Mutual information should show that results vary when the seed is not fixed to the same starting point

[15]:

mi_randomSeeds = c.MI(MI_type='normalized')
mi_plot = mi_randomSeeds.plot(add_labels=False)

../_images/Examples_Demonstrate_RandomSeed_12_0.png

When the seed is fixed to the same value, spectral clustering should return the same results, as indicated by a mutual information of 1 between all clustering results.

[16]:

mi_sameSeeds = c_seed.MI(MI_type='normalized')
mi_plot = mi_sameSeeds.plot(add_labels=False)

/anaconda3/envs/py37-openEnsembles/lib/python3.7/site-packages/scipy/cluster/hierarchy.py:2869: UserWarning: Attempting to set identical left == right == 0.0 results in singular transformations; automatically expanding.
  ax.set_xlim([dvw, 0])

../_images/Examples_Demonstrate_RandomSeed_14_1.png
