# Boilerplate Spark stuff:
from pyspark import SparkConf, SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree
from numpy import array

conf = SparkConf().setMaster("local").setAppName("SparkDecisionTree")
sc = SparkContext(conf=conf)
# Some functions that convert our CSV input data into numerical
# features for each job candidate
def binary(YN):
    # Map a 'Y'/'N' field to 1/0
    if YN == 'Y':
        return 1
    else:
        return 0

def mapEducation(degree):
    # Map a degree name to an ordinal numeric code
    if degree == 'BS':
        return 1
    elif degree == 'MS':
        return 2
    elif degree == 'PhD':
        return 3
    else:
        return 0
# Convert a list of raw fields from our CSV file to a
# LabeledPoint that MLLib can use. All data must be numerical...
def createLabeledPoints(fields):
    yearsExperience = int(fields[0])
    employed = binary(fields[1])
    previousEmployers = int(fields[2])
    educationLevel = mapEducation(fields[3])
    topTier = binary(fields[4])
    interned = binary(fields[5])
    hired = binary(fields[6])

    return LabeledPoint(hired, array([yearsExperience, employed,
        previousEmployers, educationLevel, topTier, interned]))
# Load up our CSV file, and filter out the header line with the column names
rawData = sc.textFile("PastHires.csv")  # file name assumed; point this at your CSV of past hires
header = rawData.first()
rawData = rawData.filter(lambda x: x != header)
# Split each line into a list based on the comma delimiters
csvData = rawData.map(lambda x: x.split(","))
# Convert these lists to LabeledPoints
trainingData = csvData.map(createLabeledPoints)
# Create a test candidate, with 10 years of experience, currently employed,
# 3 previous employers, a BS degree, but from a non-top-tier school where
# he or she did not do an internship. You could of course load up a whole
# huge RDD of test candidates from disk, too.
testCandidates = [array([10, 1, 3, 1, 0, 0])]
testData = sc.parallelize(testCandidates)
# Train our DecisionTree classifier using our data set.
# categoricalFeaturesInfo tells MLLib which features are categorical and how
# many categories each has (hyperparameter choices here are illustrative):
# employed has 2, education has 4, top-tier and interned have 2 each.
model = DecisionTree.trainClassifier(trainingData, numClasses=2,
                                     categoricalFeaturesInfo={1: 2, 3: 4, 4: 2, 5: 2},
                                     impurity='gini', maxDepth=5, maxBins=32)
# Now get predictions for our unknown candidates. (Note, you could separate
# the source data into a training set and a test set while tuning
# parameters and measure accuracy as you go!)
predictions = model.predict(testData)
print('Hire prediction:')
results = predictions.collect()
for result in results:
    print(result)
# We can also print out the decision tree itself:
print('Learned classification tree model:')
print(model.toDebugString())