Now you can use it in your code. First, create a Spark context:

```scala
val sc = new SparkContext("spark://master:7077", "My App")
```
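The snippets in this section also need the relevant imports; a minimal sketch, assuming the library follows the `org.alitouka.spark.dbscan` package layout (the paths are an assumption and may differ in your version):

```scala
// Assumed package layout; adjust to match the version of the library you use
import org.apache.spark.SparkContext
import org.alitouka.spark.dbscan.{Dbscan, DbscanSettings}
import org.alitouka.spark.dbscan.util.io.IOHelper
```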
Read input data with the IOHelper class. Currently it supports only CSV files. You can read data from any path that the SparkContext.textFile method accepts.

```scala
val data = IOHelper.readDataset(sc, "/path/to/my/data.csv")
```
Specify the parameters of the DBSCAN algorithm with the DbscanSettings class:

```scala
val clusteringSettings = new DbscanSettings().withEpsilon(25).withNumberOfPoints(30)
```
Run the clustering algorithm:

```scala
val model = Dbscan.train(data, clusteringSettings)
```
Save the clustering result. This call creates a folder containing multiple part-XXXXX files; if you concatenate them, you get a CSV file. Each record in this file contains the coordinates of one point followed by the identifier of the cluster that the point belongs to. For noise points, the cluster identifier is 0. The order of records in the resulting CSV file will differ from the order in your input file. You can save the data to any path that the RDD.saveAsTextFile method accepts.
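The save step itself can look like the sketch below, assuming IOHelper exposes a saveClusteringResult method symmetric to readDataset (the method name here is an assumption):

```scala
// Hypothetical save call: writes the clustered points under the given folder
// as part-XXXXX files, one per RDD partition
IOHelper.saveClusteringResult(model, "/path/to/output/folder")
```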