Spark.ML Module
A Spark.ML Module is an Apache Spark module that provides a DataFrame-based ML training framework.
- Example(s):
 - …
- Counter-Example(s):
- See: PySpark.ML, org.apache.spark.ml.classification.LogisticRegression.
 
References
2017
- https://www.quora.com/Why-are-there-two-ML-implementations-in-Spark-ML-and-MLlib-and-what-are-their-different-features
- QUOTE: Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.
- spark.mllib contains the original API built on top of RDDs.
 - spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.
 
 
 
2017
- http://spark.apache.org/docs/latest/api/python/pyspark.ml.html
- QUOTE: DataFrame-based machine learning APIs to let users quickly assemble and configure practical machine learning pipelines.
 
 
ML Pipeline APIs
       Transformer
       Estimator
       Model
       Pipeline
       PipelineModel
pyspark.ml.param module
       Param
       Params
       TypeConverters
pyspark.ml.feature module
       Binarizer
       BucketedRandomProjectionLSH
       BucketedRandomProjectionLSHModel
       Bucketizer
       ChiSqSelector
       ChiSqSelectorModel
       CountVectorizer
       CountVectorizerModel
       DCT
       ElementwiseProduct
       HashingTF
       IDF
       IDFModel
       Imputer
       ImputerModel
       IndexToString
       MaxAbsScaler
       MaxAbsScalerModel
       MinHashLSH
       MinHashLSHModel
       MinMaxScaler
       MinMaxScalerModel
       NGram
       Normalizer
       OneHotEncoder
       PCA
       PCAModel
       PolynomialExpansion
       QuantileDiscretizer
       RegexTokenizer
       RFormula
       RFormulaModel
       SQLTransformer
       StandardScaler
       StandardScalerModel
       StopWordsRemover
       StringIndexer
       StringIndexerModel
       Tokenizer
       VectorAssembler
       VectorIndexer
       VectorIndexerModel
       VectorSlicer
       Word2Vec
       Word2VecModel
pyspark.ml.classification module
       LinearSVC
       LinearSVCModel
       LogisticRegression
       LogisticRegressionModel
       LogisticRegressionSummary
       LogisticRegressionTrainingSummary
       BinaryLogisticRegressionSummary
       BinaryLogisticRegressionTrainingSummary
       DecisionTreeClassifier
       DecisionTreeClassificationModel
       GBTClassifier
       GBTClassificationModel
       RandomForestClassifier
       RandomForestClassificationModel
       NaiveBayes
       NaiveBayesModel
       MultilayerPerceptronClassifier
       MultilayerPerceptronClassificationModel
       OneVsRest
       OneVsRestModel
pyspark.ml.clustering module
       BisectingKMeans
       BisectingKMeansModel
       BisectingKMeansSummary
       KMeans
       KMeansModel
       GaussianMixture
       GaussianMixtureModel
       GaussianMixtureSummary
       LDA
       LDAModel
       LocalLDAModel
       DistributedLDAModel
pyspark.ml.linalg module
       Vector
       DenseVector
       SparseVector
       Vectors
       Matrix
       DenseMatrix
       SparseMatrix
       Matrices
pyspark.ml.recommendation module
       ALS
       ALSModel
pyspark.ml.regression module
       AFTSurvivalRegression
       AFTSurvivalRegressionModel
       DecisionTreeRegressor
       DecisionTreeRegressionModel
       GBTRegressor
       GBTRegressionModel
       GeneralizedLinearRegression
       GeneralizedLinearRegressionModel
       GeneralizedLinearRegressionSummary
       GeneralizedLinearRegressionTrainingSummary
       IsotonicRegression
       IsotonicRegressionModel
       LinearRegression
       LinearRegressionModel
       LinearRegressionSummary
       LinearRegressionTrainingSummary
       RandomForestRegressor
       RandomForestRegressionModel
pyspark.ml.stat module
       ChiSquareTest
       Correlation
pyspark.ml.tuning module
       ParamGridBuilder
       CrossValidator
       CrossValidatorModel
       TrainValidationSplit
       TrainValidationSplitModel
pyspark.ml.evaluation module
       Evaluator
       BinaryClassificationEvaluator
       RegressionEvaluator
       MulticlassClassificationEvaluator
pyspark.ml.fpm module
       FPGrowth
       FPGrowthModel
pyspark.ml.util module
       Identifiable
       JavaMLReadable
       JavaMLReader
       JavaMLWritable
       JavaMLWriter
       JavaPredictionModel
       MLReadable
       MLReader
       MLWritable
       MLWriter
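
The ML Pipeline APIs at the top of the outline above (Transformer, Estimator, Model, Pipeline, PipelineModel) name roles rather than algorithms. The following plain-Python sketch illustrates how those roles relate. It is a simplified teaching model, not PySpark's implementation: it works on lists of dicts instead of DataFrames, and `LowercaseTokenizer`, `MaxLengthScaler`, and `MaxLengthScalerModel` are hypothetical stages invented for this example.

```python
class Transformer:
    """Maps a dataset to a new dataset, usually by adding a column."""

    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from a dataset; fit() returns a Model."""

    def fit(self, rows):
        raise NotImplementedError


class Model(Transformer):
    """A Transformer that was produced by fitting an Estimator."""


class LowercaseTokenizer(Transformer):
    """Hypothetical stateless stage: adds a 'words' field."""

    def transform(self, rows):
        return [dict(r, words=r["text"].lower().split()) for r in rows]


class MaxLengthScaler(Estimator):
    """Hypothetical stage that learns the largest word count seen in training."""

    def fit(self, rows):
        return MaxLengthScalerModel(max(len(r["words"]) for r in rows))


class MaxLengthScalerModel(Model):
    def __init__(self, max_len):
        self.max_len = max_len

    def transform(self, rows):
        return [dict(r, scaled_len=len(r["words"]) / self.max_len) for r in rows]


class Pipeline(Estimator):
    """An Estimator whose stages are Transformers and Estimators, in order."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # replace the Estimator with its Model
            rows = stage.transform(rows)  # feed the output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


class PipelineModel(Model):
    """The fitted pipeline: every stage is now a Transformer."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


train = [{"text": "Spark ML builds pipelines"}, {"text": "RDD API"}]
model = Pipeline([LowercaseTokenizer(), MaxLengthScaler()]).fit(train)
result = model.transform([{"text": "DataFrame based API"}])
print(result[0]["words"], result[0]["scaled_len"])
```

The key design point the sketch tries to capture is that fitting turns each Estimator into a Model, so a fitted pipeline consists only of Transformers and can be applied to new data in one call.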