The data science toolbox

A collection of useful tools for data science, packaged as a virtual environment.

The Data Science Toolbox is an Ubuntu Linux distribution that comes with a huge number of tools for data-intensive research pre-installed. This includes the complete Scientific Python stack as well as R and tools for handling CSV-formatted data.

It’s still fairly basic, in that it assumes you know what you want and how to use it: it’s not a learning environment. But it certainly takes a lot of the pain out of setting up the various tools, especially if you haven’t done much of that sort of thing before.

One interesting thing from our perspective is that the Toolbox can be run in two ways. Locally, it takes the form of a virtual machine: it runs a “Linux machine in a window” on a PC, a Mac, or indeed a “real” Linux machine. That means it (or you) can’t corrupt your local settings or interfere with your other work. If even that’s too real for you, it will also run in the cloud on Amazon’s AWS platform, with 750 free hours. That’s certainly enough for a thorough evaluation, and we could consider running instances in our own data centre if there were demand.

