from pyspark.mllib.clustering import KMeans
from pyspark import SparkConf, SparkContext
from sklearn.preprocessing import scale
from numpy import array, random
from math import sqrt

# Number of clusters to find
K = 5
# Boilerplate Spark stuff:
conf = SparkConf().setMaster("local").setAppName("SparkKMeans")
sc = SparkContext(conf = conf)
# Create fake income/age clusters for N people in k clusters
def createClusteredData(N, k):
    random.seed(10)
    pointsPerCluster = float(N) / k
    X = []
    for i in range(k):
        # Pick a random centroid for this cluster, then scatter points around it
        incomeCentroid = random.uniform(20000.0, 200000.0)
        ageCentroid = random.uniform(20.0, 70.0)
        for j in range(int(pointsPerCluster)):
            X.append([random.normal(incomeCentroid, 10000.0),
                      random.normal(ageCentroid, 2.0)])
    return array(X)
# Load the data; note I am normalizing it with scale() - very important!
data = sc.parallelize(scale(createClusteredData(100, K)))
# Build the model (cluster the data)
clusters = KMeans.train(data, K, maxIterations=10, initializationMode="random")
# Print out the cluster assignments
# cache() because we use this RDD more than once below
resultRDD = data.map(lambda point: clusters.predict(point)).cache()
counts = resultRDD.countByValue()
print("Counts by value: " + str(counts))
results = resultRDD.collect()
print("Cluster assignments: " + str(results))
# Evaluate clustering by computing Within Set Sum of Squared Errors
def error(point):
    # Distance from this point to the centroid of its assigned cluster
    center = clusters.centers[clusters.predict(point)]
    return sqrt(sum([x**2 for x in (point - center)]))

WSSSE = data.map(lambda point: error(point)).reduce(lambda x, y: x + y)
print("Within Set Sum of Squared Error = " + str(WSSSE))
# Things to try:
# What happens to WSSSE as you increase or decrease K? Why? (See the sketch after these questions.)
# What happens if you don't normalize the input data before clustering?
# What happens if you change the maxIterations parameter?
# What happens if you change initializationMode to "k-means||"?
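
# A minimal sketch for the first question above: re-train the model across a
# range of K values and print each WSSSE. The helper name computeWSSSE and the
# K range of 1-9 are illustrative assumptions, not part of the original exercise.
def computeWSSSE(data, k):
    model = KMeans.train(data, k, maxIterations=10, initializationMode="random")
    def pointError(point):
        center = model.centers[model.predict(point)]
        return sqrt(sum([x**2 for x in (point - center)]))
    return data.map(pointError).reduce(lambda x, y: x + y)

for k in range(1, 10):
    print("K = " + str(k) + " WSSSE = " + str(computeWSSSE(data, k)))
# Expect WSSSE to shrink as K grows, since more centroids fit the data more
# tightly; at K = N it would reach zero, which is why a low WSSSE alone
# doesn't mean a better clustering.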