dislib

Big Data Distributed Computing
dislib is a distributed computing library focused on machine learning and built on top of PyCOMPSs. Inspired by NumPy and scikit-learn, dislib provides supervised and unsupervised learning algorithms through an easy-to-use API.
Software Author: Workflows and Distributed Computing Group
Contact:

Javier Álvarez (javier.alvarez@bsc.es)

Rosa M. Badia (rosa.m.badia@bsc.es)

 

0.4.0 (Latest Version)

This version introduces distributed arrays.

Release Notes

Dependencies

- PyCOMPSs == 2.5
- scikit-learn >= 0.19.2
- NumPy >= 1.15.4
- SciPy >= 1.0.0

Breaking Changes

- Most estimator methods, such as fit and predict, now expect one or two ds-arrays (distributed arrays) instead of a Dataset.

New Features

- This release introduces the distributed array as the main data structure in dislib. All estimators have been modified to accept ds-arrays instead of Datasets. The Dataset and Subset classes have been removed.
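
To make the idea of a distributed array more concrete, here is a minimal sketch of block partitioning written in plain Python. It is not dislib's implementation and does not use the dislib API: the names `split_into_blocks`, `collect`, and the `block_size` argument are assumptions made for illustration only. The sketch shows how a 2-D dataset could be cut into a grid of smaller blocks that separate tasks might process independently, and how the blocks can be merged back.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_into_blocks(rows: Matrix, block_size: Tuple[int, int]) -> List[List[Matrix]]:
    """Partition a 2-D list into a grid of blocks (hypothetical helper).

    Each block holds at most block_size[0] rows and block_size[1] columns;
    blocks on the bottom and right edges may be smaller.
    """
    block_rows, block_cols = block_size
    if block_rows <= 0 or block_cols <= 0:
        raise ValueError("block dimensions must be positive")
    n_cols = len(rows[0]) if rows else 0
    grid = []
    for r in range(0, len(rows), block_rows):
        row_of_blocks = []
        for c in range(0, n_cols, block_cols):
            # Slice a band of rows first, then a band of columns.
            block = [row[c:c + block_cols] for row in rows[r:r + block_rows]]
            row_of_blocks.append(block)
        grid.append(row_of_blocks)
    return grid


def collect(grid: List[List[Matrix]]) -> Matrix:
    """Reassemble a grid of blocks into one in-memory matrix."""
    result = []
    for row_of_blocks in grid:
        if not row_of_blocks:
            continue
        # Every block in this band has the same number of rows.
        for i in range(len(row_of_blocks[0])):
            merged = []
            for block in row_of_blocks:
                merged.extend(block[i])
            result.append(merged)
    return result


# A 5 x 4 matrix split into blocks of at most 2 x 3 elements.
data = [[float(i * 4 + j) for j in range(4)] for i in range(5)]
blocks = split_into_blocks(data, block_size=(2, 3))
```

In a real distributed setting each block would live on, or be shipped to, a worker node, so choosing the block size trades off parallelism against per-task overhead.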

 

 

Bug Fixes

- Minor bug fixes in RandomForestClassifier and K-means.

Improvements

- The performance of various algorithms has been improved by using PyCOMPSs COLLECTIONS.
- K-means now accepts an 'init' parameter.
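
The release note does not say which values 'init' accepts. As a rough sketch only, the plain-Python function below assumes a convention similar to scikit-learn's KMeans, where 'init' is either a strategy name or an explicit set of starting centers. The function name `initial_centers` and its `seed` argument are hypothetical and are not part of dislib.

```python
import random
from typing import List, Optional, Sequence, Union

Point = List[float]


def initial_centers(
    samples: Sequence[Point],
    n_clusters: int,
    init: Union[str, Sequence[Point]] = "random",
    seed: Optional[int] = None,
) -> List[Point]:
    """Choose starting centers for K-means (hypothetical helper).

    init="random" draws n_clusters distinct samples; any other
    sequence is treated as user-supplied centers.
    """
    if isinstance(init, str):
        if init != "random":
            raise ValueError("unsupported init strategy: " + init)
        if n_clusters > len(samples):
            raise ValueError("n_clusters exceeds the number of samples")
        rng = random.Random(seed)
        # Copy each chosen point so callers cannot mutate the input data.
        return [list(p) for p in rng.sample(list(samples), n_clusters)]
    centers = [list(c) for c in init]
    if len(centers) != n_clusters:
        raise ValueError("init must contain exactly n_clusters centers")
    return centers


samples = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
# Explicit centers are passed through unchanged.
explicit = initial_centers(samples, 2, init=[[0.0, 0.0], [5.0, 5.0]])
# Random initialization is reproducible when a seed is given.
seeded = initial_centers(samples, 2, init="random", seed=0)
```

Supplying explicit centers is useful when a previous run or domain knowledge already gives a good starting point, since K-means results depend on initialization.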

 

Old Versions

0.3.0

0.2.0

0.1.1