MLlib is Apache Spark's scalable machine learning library.

Ease of use

Usable in Java, Scala, Python, and R.

MLlib fits into Spark's APIs and interoperates with NumPy in Python (as of Spark 0.9) and R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it easy to plug into Hadoop workflows.

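from pyspark.ml.clustering import KMeans

# `spark` is an existing SparkSession
data = spark.read.format("libsvm")\
  .load("hdfs://...")

# Fit a k-means model with 10 clusters
model = KMeans(k=10).fit(data)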
Calling MLlib in Python
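
To make concrete what a k-means fit computes, here is a minimal single-machine sketch of Lloyd's algorithm in plain Python. It is an illustration only, not MLlib's implementation: MLlib runs distributed and differs in details such as initialization. The `kmeans` helper, the `points` data, and the `seed` parameter are hypothetical names chosen for this example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(kmeans(points, k=2))
print(centers)
```

Each iteration makes a full pass over the data, which is the access pattern that benefits from keeping the dataset in memory across passes.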

Performance

High-quality algorithms, up to 100x faster than MapReduce.

Spark excels at iterative computation, enabling MLlib to run fast. At the same time, we care about algorithmic performance: MLlib contains high-quality algorithms that leverage iteration, and can yield better results than the one-pass approximations sometimes used on MapReduce.

Figure: Logistic regression in Hadoop and Spark
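
The point about iteration can be illustrated with a small single-machine sketch: logistic regression trained by batch gradient descent, where every pass re-reads the full dataset and further passes keep lowering the loss. This is plain Python written for illustration, not MLlib's optimizer; the `train` and `log_loss` helpers, the learning rate, and the toy `rows` are all assumptions made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, rows):
    """Mean negative log-likelihood over (features, label) rows."""
    total = 0.0
    for x, y in rows:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

def train(rows, passes, learning_rate=0.5):
    """Batch gradient descent: each pass scans the whole dataset once."""
    weights = [0.0] * len(rows[0][0])
    for _ in range(passes):
        gradient = [0.0] * len(weights)
        for x, y in rows:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

# Hypothetical toy data: the first feature is a constant bias term.
# Two points are deliberately mislabeled so the classes overlap.
rows = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, -0.5], 0), ([1.0, 0.2], 0),
        ([1.0, -0.2], 1), ([1.0, 0.5], 1), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]

loss_one_pass = log_loss(train(rows, passes=1), rows)
loss_many_passes = log_loss(train(rows, passes=100), rows)
print(loss_one_pass, loss_many_passes)
```

On a cluster, each of those passes is a scan over the distributed dataset, so the cost of re-reading the data between passes dominates the running time of iterative algorithms like this one.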

Runs everywhere

Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, against diverse data sources.

You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.

Algorithms

MLlib contains many algorithms and utilities.

ML algorithms include:

  • Classification: logistic regression, naive Bayes,...
  • Regression: generalized linear regression, survival regression,...
  • Decision trees, random forests, and gradient-boosted trees
  • Recommendation: alternating least squares (ALS)
  • Clustering: K-means, Gaussian mixtures (GMMs),...
  • Topic modeling: latent Dirichlet allocation (LDA)
  • Frequent itemsets, association rules, and sequential pattern mining

ML workflow utilities include:

  • Feature transformations: standardization, normalization, hashing,...
  • ML Pipeline construction
  • Model evaluation and hyper-parameter tuning
  • ML persistence: saving and loading models and Pipelines
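
As an illustration of the first item in the list above, standardization rescales each feature column so that columns measured in different units become comparable. The following is a minimal plain-Python sketch of the arithmetic only, not MLlib's transformer API; the `standardize` helper and the sample `rows` are hypothetical, and it assumes centering on the mean and scaling by the sample standard deviation.

```python
import statistics

def standardize(rows):
    """Scale each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return [
        # A constant column (zero deviation) is mapped to 0.0 to avoid dividing by zero.
        tuple((v - m) / s if s else 0.0 for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Hypothetical data: two features on very different scales.
rows = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
scaled = standardize(rows)
print(scaled)
```

After scaling, both columns carry the same values even though the raw inputs differed by two orders of magnitude, which keeps distance-based and gradient-based algorithms from being dominated by the larger-valued feature.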

Other utilities include:

  • Distributed linear algebra: SVD, PCA,...
  • Statistics: summary statistics, hypothesis testing,...

Refer to the MLlib guide for usage examples.

Community

MLlib is developed as part of the Apache Spark project. It thus gets tested and updated with each Spark release.

If you have questions about the library, ask on the Spark mailing lists.

MLlib is still a rapidly growing project and welcomes contributions. If you'd like to submit an algorithm to MLlib, read how to contribute to Spark and send us a patch!

Getting started

To get started with MLlib:

  • Download Spark. MLlib is included as a module.
  • Read the MLlib guide, which includes various usage examples.
  • Learn how to deploy Spark on a cluster if you'd like to run in distributed mode. You can also run locally on a multicore machine without any setup.