Spark.ML Module
A Spark.ML Module is an Apache Spark module that provides a DataFrame-based ML training framework for constructing ML pipelines.
- Example(s):
- …
- Counter-Example(s):
- See: PySpark.ML, org.apache.spark.ml.classification.LogisticRegression.
References
2017
- https://www.quora.com/Why-are-there-two-ML-implementations-in-Spark-ML-and-MLlib-and-what-are-their-different-features
- QUOTE: Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.
- spark.mllib contains the original API built on top of RDDs.
- spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.
2017
- http://spark.apache.org/docs/latest/api/python/pyspark.ml.html
- QUOTE: DataFrame-based machine learning APIs to let users quickly assemble and configure practical machine learning pipelines.
ML Pipeline APIs
Transformer
Estimator
Model
Pipeline
PipelineModel
pyspark.ml.param module
Param
Params
TypeConverters
pyspark.ml.feature module
Binarizer
BucketedRandomProjectionLSH (Experimental)
BucketedRandomProjectionLSHModel (Experimental)
Bucketizer
ChiSqSelector (Experimental)
ChiSqSelectorModel (Experimental)
CountVectorizer
CountVectorizerModel
DCT
ElementwiseProduct
HashingTF
IDF
IDFModel
Imputer (Experimental)
ImputerModel (Experimental)
IndexToString
MaxAbsScaler
MaxAbsScalerModel
MinHashLSH (Experimental)
MinHashLSHModel (Experimental)
MinMaxScaler
MinMaxScalerModel
NGram
Normalizer
OneHotEncoder
PCA
PCAModel
PolynomialExpansion
QuantileDiscretizer (Experimental)
RegexTokenizer
RFormula (Experimental)
RFormulaModel (Experimental)
SQLTransformer
StandardScaler
StandardScalerModel
StopWordsRemover
StringIndexer
StringIndexerModel
Tokenizer
VectorAssembler
VectorIndexer
VectorIndexerModel
VectorSlicer
Word2Vec
Word2VecModel
pyspark.ml.classification module
LinearSVC (Experimental)
LinearSVCModel (Experimental)
LogisticRegression
LogisticRegressionModel
LogisticRegressionSummary (Experimental)
LogisticRegressionTrainingSummary (Experimental)
BinaryLogisticRegressionSummary
BinaryLogisticRegressionTrainingSummary (Experimental)
DecisionTreeClassifier
DecisionTreeClassificationModel
GBTClassifier
GBTClassificationModel
RandomForestClassifier
RandomForestClassificationModel
NaiveBayes
NaiveBayesModel
MultilayerPerceptronClassifier
MultilayerPerceptronClassificationModel
OneVsRest (Experimental)
OneVsRestModel (Experimental)
pyspark.ml.clustering module
BisectingKMeans
BisectingKMeansModel
BisectingKMeansSummary (Experimental)
KMeans
KMeansModel
GaussianMixture
GaussianMixtureModel
GaussianMixtureSummary (Experimental)
LDA
LDAModel
LocalLDAModel
DistributedLDAModel
pyspark.ml.linalg module
Vector
DenseVector
SparseVector
Vectors
Matrix
DenseMatrix
SparseMatrix
Matrices
pyspark.ml.recommendation module
ALS
ALSModel
pyspark.ml.regression module
AFTSurvivalRegression (Experimental)
AFTSurvivalRegressionModel (Experimental)
DecisionTreeRegressor
DecisionTreeRegressionModel
GBTRegressor
GBTRegressionModel
GeneralizedLinearRegression (Experimental)
GeneralizedLinearRegressionModel (Experimental)
GeneralizedLinearRegressionSummary (Experimental)
GeneralizedLinearRegressionTrainingSummary (Experimental)
IsotonicRegression
IsotonicRegressionModel
LinearRegression
LinearRegressionModel
LinearRegressionSummary (Experimental)
LinearRegressionTrainingSummary (Experimental)
RandomForestRegressor
RandomForestRegressionModel
pyspark.ml.stat module
ChiSquareTest (Experimental)
Correlation (Experimental)
pyspark.ml.tuning module
ParamGridBuilder
CrossValidator
CrossValidatorModel
TrainValidationSplit (Experimental)
TrainValidationSplitModel (Experimental)
pyspark.ml.evaluation module
Evaluator
BinaryClassificationEvaluator (Experimental)
RegressionEvaluator (Experimental)
MulticlassClassificationEvaluator (Experimental)
pyspark.ml.fpm module
FPGrowth (Experimental)
FPGrowthModel (Experimental)
pyspark.ml.util module
Identifiable
JavaMLReadable
JavaMLReader
JavaMLWritable
JavaMLWriter
JavaPredictionModel
MLReadable
MLReader
MLWritable
MLWriter
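
Illustration
- The relationship between the ML Pipeline API names listed above (Transformer, Estimator, Model, Pipeline, PipelineModel) can be sketched in plain Python. This is a toy illustration of the contract only, not PySpark code: every class name prefixed with Toy, the Row alias, and the column names text, words and ids are assumptions made for this sketch, and lists of dicts stand in for DataFrames. It assumes the usual behavior that fitting a pipeline fits each estimator stage on the data as transformed by the earlier stages.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Row = Dict[str, object]  # hypothetical stand-in for one DataFrame row


class Transformer:
    """Maps a dataset to a new dataset, usually by adding a column."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """Learns from a dataset and returns a fitted Transformer (a Model)."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class ToyTokenizer(Transformer):
    """Stateless transformer: splits the 'text' column into 'words'."""

    def transform(self, rows):
        return [{**r, "words": str(r["text"]).lower().split()} for r in rows]


@dataclass
class ToyVocabModel(Transformer):
    """Fitted model: maps each known word to its learned vocabulary index."""
    vocab: Dict[str, int]

    def transform(self, rows):
        return [
            {**r, "ids": [self.vocab[w] for w in r["words"] if w in self.vocab]}
            for r in rows
        ]


class ToyVocabIndexer(Estimator):
    """Estimator: learns a word-to-index vocabulary from the 'words' column."""

    def fit(self, rows):
        words = sorted({w for r in rows for w in r["words"]})
        return ToyVocabModel({w: i for i, w in enumerate(words)})


@dataclass
class ToyPipelineModel(Transformer):
    """Fitted pipeline: every stage is a Transformer, applied in order."""
    stages: Sequence[Transformer]

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


@dataclass
class ToyPipeline(Estimator):
    """Unfitted pipeline: an ordered mix of Transformers and Estimators."""
    stages: Sequence[object]

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the data as transformed so far;
            # plain Transformers are carried over unchanged.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


training = [{"text": "spark ml pipelines"}, {"text": "spark mllib"}]
pipeline = ToyPipeline([ToyTokenizer(), ToyVocabIndexer()])
pipeline_model = pipeline.fit(training)
result = pipeline_model.transform([{"text": "Spark pipelines rock"}])
print(result[0]["ids"])  # prints [3, 2]; "rock" is not in the vocabulary
```

- In this sketch the fitted pipeline is itself a Transformer, which is why the same object can be applied to new data without refitting.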