The Skymind Wiki eBook: Part 6

Datasets, Model Evaluation, and Frameworks

A brief rundown of machine learning workflows, datasets, evaluation metrics, and machine learning frameworks. 

AI begins with data, and without data, there can be no AI. Gathering, moving, storing, cleaning and exploring your data are the necessary first steps in building a machine-learning solution. They happen after you have selected the predictive problem you want to solve, and before you have selected an algorithm to apply to your data.

A good dataset can transform an entire industry. Fei-Fei Li's ImageNet dataset was a turning point for image processing and is now widely used to benchmark new image-processing algorithms. Gathering and annotating data properly can be difficult and expensive, but without a good labeled dataset, there can be no supervised learning and no classifiers.

Once you've prepared and pre-processed your data, it will be time to test various machine learning algorithms on it to see which ones make the most accurate predictions. To track the performance of your algorithms, you will need a model evaluation tool and metrics that let you compare models against one another and over time.
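
As a rough illustration of that comparison step, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example rather than part of any particular framework: the toy two-cluster dataset, the helper names (make_toy_data, compare_models), and the two stand-in models (a majority-class baseline and a 1-nearest-neighbor classifier). It fits each model on the same training split and scores each on the same held-out test split using accuracy.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def make_toy_data(n_per_class=100, seed=0):
    """Hypothetical dataset: two 2D Gaussian clusters, one per label."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((point, label))
    rng.shuffle(data)  # mix the classes before splitting
    return data


class MajorityClass:
    """Baseline: always predict the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbor:
    """1-nearest-neighbor classifier using squared Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for q in X:
            dists = [(q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 for p in self.X_]
            preds.append(self.y_[dists.index(min(dists))])
        return preds


def compare_models(models, data, test_fraction=0.25):
    """Fit every model on the same training split; score on the same test split."""
    split = int(len(data) * (1 - test_fraction))
    train, test = data[:split], data[split:]
    X_train, y_train = [x for x, _ in train], [y for _, y in train]
    X_test, y_test = [x for x, _ in test], [y for _, y in test]
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_test, model.predict(X_test))
    return scores


scores = compare_models(
    {"majority_class": MajorityClass(), "nearest_neighbor": NearestNeighbor()},
    make_toy_data(),
)
# Print models from most to least accurate on the held-out data.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair, and including a trivial baseline shows whether a model has learned anything beyond the class balance. Accuracy is only one possible metric; logging the same scores each time the data or models change is one simple way to follow performance over time.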