dislib

Big Data Distributed Computing
dislib is a distributed computing library built on top of PyCOMPSs and focused on machine learning. Inspired by NumPy and scikit-learn, dislib provides various supervised and unsupervised learning algorithms through an easy-to-use API.
Software Author: 
Workflows and Distributed Computing Group
Contact:

Javier Álvarez (javier.alvarez@bsc.es)

Rosa M. Badia (rosa.m.badia@bsc.es)

 

0.5.0 (Latest Version)

This version adds grid search and randomized search with cross-validation, along with the other changes listed in the release notes.

Release Notes

New Features

- Added grid search and randomized search with cross-validation
- Added K-fold splitter
- dislib command line can now run Jupyter notebooks
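
For readers unfamiliar with the first two features, the general idea behind K-fold splitting and grid search with cross-validation can be sketched in plain Python. This is only an illustration of the technique, not dislib's implementation or API: the names `k_fold_indices`, `grid_search_cv`, and `fit_and_score` are hypothetical, and the toy model at the end is an assumption made purely for the example.

```python
from itertools import product


def k_fold_indices(n_samples, n_splits):
    """Yield (train, validation) index lists for each of n_splits folds."""
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def grid_search_cv(fit_and_score, param_grid, n_samples, n_splits=5):
    """Return (best_params, best_mean_score) over every grid combination.

    fit_and_score(params, train, validation) is a hypothetical callback that
    fits a model on the train indices and returns a validation score
    (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # Evaluate this parameter combination on every fold.
        scores = [
            fit_and_score(params, train, validation)
            for train, validation in k_fold_indices(n_samples, n_splits)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy problem (hypothetical): predict alpha * mean(training targets).
targets = [2.0] * 10


def toy_fit_and_score(params, train, validation):
    prediction = params["alpha"] * sum(targets[i] for i in train) / len(train)
    errors = [(targets[i] - prediction) ** 2 for i in validation]
    return -sum(errors) / len(errors)  # negative mean squared error


best_params, best_score = grid_search_cv(
    toy_fit_and_score, {"alpha": [0.5, 1.0, 1.5]}, n_samples=len(targets)
)
print(best_params, best_score)
```

A randomized search would follow the same loop but draw a fixed number of parameter combinations at random instead of enumerating the full grid. In a distributed setting, each fold evaluation is independent of the others, which is what makes this pattern a natural fit for parallel execution.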

Bug Fixes

- Fixed various bugs in fancy indexing of ds-arrays
- dislib command line now works on macOS
- Fixed "source" links in the documentation to point to the appropriate version of the source code
- dislib command line now works even if PyCOMPSs is not installed

Improvements

- Added a new notebook and improved the existing one
- PCA now supports sparse data
- Estimators now extend scikit-learn's base estimator for greater integration

Old Versions

0.4.0

0.3.0

0.2.0

0.1.1