Sunday 28 October 2018

Flavors of Machine Learning

Machine Learning in Marketing
image source: www.ie.edu


Machine learning is a branch of artificial intelligence and can also be viewed as a method of data analysis. It concerns the development of algorithms or computer systems that can automatically learn from data, identify patterns, and predict outcomes with minimal or no human intervention. Machine learning covers a wide range of applications in fields that deal with massive quantities of data.
A machine learning task involves four basic steps. The first is data collection, in which the raw data can take the form of images, sound, or text files. Some applications, such as biometrics, may require a specific acquisition device, for example to capture finger vein images. The second step is data preparation, which selects data of adequate quality. Raw data may contain outliers, noise, or missing information, so it is important to correct these problems before going further. Specific techniques or algorithms are also employed at this stage to extract useful information from the raw data. For example, Principal Component Analysis is a common technique for extracting important features from images [1].
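The cleaning part of data preparation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `fill_missing` and `drop_outliers`, the toy readings, and the z-score threshold of 2.0 are all hypothetical choices made for this example, and it covers only missing values and outliers, not feature extraction such as Principal Component Analysis.

```python
from statistics import mean, median, stdev


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def drop_outliers(values, z_max=2.0):
    """Keep values within z_max sample standard deviations of the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return list(values)  # all values identical, nothing to drop
    return [v for v in values if abs(v - mu) / sigma <= z_max]


# Hypothetical sensor readings: two missing entries and one obvious outlier.
raw = [4.8, 5.1, None, 5.0, 4.9, 50.0, 5.2, None, 5.0]

filled = fill_missing(raw)        # None entries become the median
cleaned = drop_outliers(filled)   # the reading 50.0 is removed

print(filled)
print(cleaned)
```

The median is used for filling because it is less sensitive to the outlier than the mean would be. A real project would choose the imputation rule and the threshold according to the data at hand.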
The third step is choosing an appropriate algorithm for the data. Over the years, researchers have developed algorithms for specific types of data: some are suitable for images and others are better suited to text. In this step, the data is divided into two sets, training and testing. The training set makes up the majority of the data and is used to build a model, while the testing set is used to evaluate the model’s performance. In general, machine learning algorithms are categorized as supervised or unsupervised. A supervised technique predicts outputs by learning from labeled input data [2]. In an unsupervised technique, by contrast, all data are unlabeled and the algorithm studies the structure of the data to predict the output.
The final step is model evaluation, which measures the performance of the trained model on the testing set. A number of criteria can be used to assess the strengths and weaknesses of the model, such as storage reduction, noise tolerance, generalization accuracy, and time requirements [3].
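The splitting, training, and evaluation steps can be illustrated with a small supervised example. The sketch below is an assumption-laden toy, not a method taken from the cited works: it uses a one-nearest-neighbor classifier chosen purely for simplicity, a made-up two-cluster dataset, and hypothetical helper names (`train_test_split`, `predict_1nn`, `accuracy`).

```python
import random


def train_test_split(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and split it into training and testing sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train, point):
    """Return the label of the training sample closest to the given point."""
    _, label = min(train, key=lambda s: squared_distance(s[0], point))
    return label


def accuracy(train, test):
    """Fraction of testing samples whose predicted label matches the true label."""
    correct = sum(1 for features, label in test
                  if predict_1nn(train, features) == label)
    return correct / len(test)


# Toy labeled data: two well separated clusters of 20 points each.
samples = [((x / 10, y / 10), "low") for x in range(5) for y in range(4)]
samples += [((5 + x / 10, 5 + y / 10), "high") for x in range(5) for y in range(4)]

train, test = train_test_split(samples)   # 30 training, 10 testing samples
score = accuracy(train, test)

print(len(train), len(test), score)
```

Here the labels make the task supervised: the model is built from the training set only, and accuracy is measured on samples it has never seen. Because the toy clusters are far apart, every testing sample is classified correctly; real data would rarely be this clean.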

References:
[1] M. S. Mohd Asaari, S. A. Suandi, and B. A. Rosdi, “Fusion of Band Limited Phase Only Correlation and Width Centroid Contour Distance for finger based biometrics,” Expert Syst. Appl., vol. 41, no. 7, pp. 3367–3382, Jun. 2014.
[2] J. S. Sánchez, R. Barandela, A. I. Marqués, R. Alejo, and J. Badenas, “Analysis of new techniques to obtain quality training sets,” Pattern Recognit. Lett., vol. 24, no. 7, pp. 1015–1022, 2003.
[3] F. Herrera, “Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study,” vol. 34, no. 3, pp. 417–435, 2012.
By: Nordiana Mukahar