Big Data Distributed Computing
dislib is a distributed computing library focused on machine learning and built on top of PyCOMPSs. Inspired by NumPy and scikit-learn, dislib provides a range of supervised and unsupervised learning algorithms through an easy-to-use API.
Software Author: 
Workflows and Distributed Computing Group

Javier Álvarez (javier.alvarez@bsc.es)

Rosa M. Badia (rosa.m.badia@bsc.es)



0.4.0 (Latest Version)

This version introduces distributed arrays.

Release Notes

Requirements

- PyCOMPSs == 2.5
- scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- SciPy >= 1.0.0



Breaking Changes

- Most estimator methods, such as fit and predict, now expect one or two ds-arrays instead of a Dataset.



New Features

- This release introduces the distributed array (ds-array) as the main data structure in dislib. All estimators have been modified to accept ds-arrays instead of Datasets. The Dataset and Subset classes have been removed.
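
The idea behind a block-partitioned array can be illustrated with a small, purely conceptual sketch. The code below is not dislib's implementation and does not use its API: the `BlockedArray` class, its `block_size` argument, and the `collect` and `column_sums` methods are hypothetical names chosen for illustration. It only shows how a 2-D array might be split into rectangular blocks so that each block can be processed independently (for example, by a separate task) and the partial results combined afterwards.

```python
from typing import List, Tuple

Matrix = List[List[float]]


class BlockedArray:
    """Hypothetical 2-D array stored as a grid of rectangular blocks."""

    def __init__(self, data: Matrix, block_size: Tuple[int, int]):
        rows, cols = len(data), len(data[0])
        b_rows, b_cols = block_size
        self.shape = (rows, cols)
        self.block_size = block_size
        # blocks[i][j] holds the sub-matrix at block row i, block column j.
        # Blocks on the bottom and right edges may be smaller than block_size.
        self.blocks = [
            [
                [row[c:c + b_cols] for row in data[r:r + b_rows]]
                for c in range(0, cols, b_cols)
            ]
            for r in range(0, rows, b_rows)
        ]

    def collect(self) -> Matrix:
        """Reassemble all blocks into a single in-memory matrix."""
        result = []
        for block_row in self.blocks:
            # Every block in a block row has the same number of rows.
            for i in range(len(block_row[0])):
                merged = []
                for block in block_row:
                    merged.extend(block[i])
                result.append(merged)
        return result

    def column_sums(self) -> List[float]:
        """Accumulate per-block contributions into global column sums."""
        totals = [0.0] * self.shape[1]
        b_cols = self.block_size[1]
        for block_row in self.blocks:
            for j, block in enumerate(block_row):
                for row in block:
                    for k, value in enumerate(row):
                        # j * b_cols + k maps a block-local column to a global one.
                        totals[j * b_cols + k] += value
        return totals


# A 5x4 matrix split into blocks of at most 2 rows and 3 columns.
data = [[float(r * 4 + c) for c in range(4)] for r in range(5)]
arr = BlockedArray(data, block_size=(2, 3))
print(len(arr.blocks), len(arr.blocks[0]))  # grid of blocks: rows, columns
print(arr.column_sums())
```

In a real distributed setting the loop over blocks would be replaced by independent tasks scheduled by the runtime; here it runs sequentially to keep the sketch self-contained.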



Bug Fixes

- Minor bug fixes in RandomForestClassifier and K-means.



Improvements

- The performance of various algorithms has been improved by using PyCOMPSs COLLECTIONS.
- K-means now accepts an 'init' parameter.
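
These notes do not state which values the 'init' parameter accepts. As a rough illustration of what such a parameter typically controls, the sketch below implements a minimal, single-machine K-means in plain Python. It assumes a scikit-learn-like convention in which `init` is either the string `"random"` or an explicit list of starting centers. The `kmeans` function and all of its arguments are hypothetical and are not taken from dislib.

```python
import random
from typing import List, Optional, Sequence, Union

Point = Sequence[float]


def kmeans(
    points: Sequence[Point],
    n_clusters: int,
    init: Union[str, Sequence[Point]] = "random",
    max_iter: int = 10,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Minimal Lloyd's algorithm; `init` selects the starting centers."""
    if isinstance(init, str):
        if init != "random":
            raise ValueError("unsupported init strategy: " + init)
        # Assumed convention: pick n_clusters distinct input points at random.
        rng = random.Random(seed)
        centers = [list(p) for p in rng.sample(list(points), n_clusters)]
    else:
        # Assumed convention: the caller supplies the starting centers.
        centers = [list(c) for c in init]
        if len(centers) != n_clusters:
            raise ValueError("init must contain exactly n_clusters centers")

    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its members.
        # A center with no members keeps its previous position.
        new_centers = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else center
            for members, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
fixed = kmeans(points, n_clusters=2, init=[[0, 0], [10, 10]])
sampled = kmeans(points, n_clusters=2, init="random", seed=0)
print(fixed)
```

Passing explicit centers makes the result reproducible and lets a caller reuse centers from an earlier run, which is the usual motivation for exposing an initialization parameter.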


Old Versions