Big Data Era Quick analysis, finding meaning beneath data.
Data Analysis 1. Preparing the data (Munging) 2. Running the model (Analysis) 3. Interpreting the results
Machine Learning A black-box, algorithmic approach to producing predictions or classifications from data. "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." Tom Mitchell (1998)
Using R to do Machine Learning
Why Use R? 1. Statistical analysis on the fly 2. Built-in mathematical functions and graphics modules 3. Free and open source!
Application of Machine Learning 1. Recommender systems 2. Pattern Recognition 3. Stock market analysis 4. Natural language processing 5. Information Retrieval
Regression Predict one set of numbers given another set of numbers. Example: given the number of friends x, predict how many "goods" (likes) I will receive on each Facebook post.
Scatter Plot
dataset <- read.csv('fbgood.txt', header = TRUE, sep = '\t', row.names = 1)
x <- dataset$friends    # number of friends
y <- dataset$getgoods   # goods received per post
plot(x, y)
Linear Fit
fit <- lm(y ~ x)
abline(fit, col = 'red', lwd = 3)
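Beyond drawing the line, the object returned by lm() can be queried for coefficients, goodness of fit, and predictions. The following is a minimal sketch only: fbgood.txt is not reproduced here, so the data is synthetic and the slope, intercept, and noise level are illustrative assumptions, not values from the real dataset.

```r
# Synthetic stand-in for the friends/getgoods columns (assumed values, not the real data)
set.seed(1)
x <- round(runif(50, min = 50, max = 1000))   # number of friends
y <- 2 + 0.05 * x + rnorm(50, sd = 3)         # goods received per post

fit <- lm(y ~ x)

# Intercept and slope estimated by least squares
print(coef(fit))

# Share of the variance in y explained by the line
r2 <- summary(fit)$r.squared
print(r2)

# Predicted number of goods for someone with 500 friends
pred_500 <- predict(fit, newdata = data.frame(x = 500))
print(pred_500)
```

The column name in newdata has to match the predictor name used in the formula (here x), otherwise predict() cannot find the variable.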
2nd order polynomial fit
plot(x, y)
polyfit2 <- lm(y ~ poly(x, 2))
# draw the fitted values in order of increasing x so the curve is not scrambled
lines(sort(x), fitted(polyfit2)[order(x)], col = 2, lwd = 3)
3rd order polynomial fit
plot(x, y)
polyfit3 <- lm(y ~ poly(x, 3))
# draw the fitted values in order of increasing x so the curve is not scrambled
lines(sort(x), fitted(polyfit3)[order(x)], col = 2, lwd = 3)
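A higher polynomial order always fits the training points at least as closely, so it helps to check whether the extra terms are worth it. One possible way, sketched here on synthetic data (the quadratic shape and noise level are assumptions chosen for illustration), is to compare adjusted R-squared and run an F-test on the nested models.

```r
# Synthetic data with a truly quadratic relationship (assumed for illustration)
set.seed(2)
x <- runif(60, min = 0, max = 10)
y <- 1 + 2 * x - 0.3 * x^2 + rnorm(60, sd = 1)

fit1 <- lm(y ~ x)            # linear
fit2 <- lm(y ~ poly(x, 2))   # 2nd order
fit3 <- lm(y ~ poly(x, 3))   # 3rd order

# Adjusted R-squared penalises each additional term
adj_r2 <- sapply(list(linear = fit1, quadratic = fit2, cubic = fit3),
                 function(m) summary(m)$adj.r.squared)
print(adj_r2)

# F-test on the nested models: does each extra order
# reduce the residual error significantly?
print(anova(fit1, fit2, fit3))
```

In the anova() table, a large p-value on the last row would suggest that the 3rd order term adds little over the 2nd order fit.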
Other Regression Packages 1. rlm (MASS package) - Robust Regression 2. glm (stats package) - Generalized Linear Models 3. gam (mgcv or gam package) - Generalized Additive Models
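As a rough illustration of two of these alternatives, the sketch below contrasts lm() with MASS::rlm() on data containing a few outliers, then fits a Poisson model with glm(). The data is synthetic and every number in it is an assumption made for the example.

```r
library(MASS)   # provides rlm()

# Synthetic linear data with three injected outliers (assumed values)
set.seed(3)
x <- 1:40
y <- 3 + 0.5 * x + rnorm(40, sd = 1)
y[c(5, 20, 35)] <- y[c(5, 20, 35)] + 30

ols    <- lm(y ~ x)    # ordinary least squares, pulled towards the outliers
robust <- rlm(y ~ x)   # M-estimation (Huber by default) down-weights large residuals

print(rbind(ols = coef(ols), robust = coef(robust)))

# glm() from the stats package: Poisson regression for count outcomes
counts <- rpois(40, lambda = exp(0.5 + 0.05 * x))
pois <- glm(counts ~ x, family = poisson)
print(coef(pois))
```

The robust intercept should sit closer to the value used to generate the data than the least-squares one, because the three shifted points carry less weight in rlm().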
Classification Identifying to which of a set of categories a new observation belongs, on the basis of a training set of data. Example: given the features of a bank customer, predict whether the client will subscribe to a term deposit.
Data Description
Features: age, job, marital, education, default, balance, housing, loan, contact
Label: whether the customer subscribes to a term deposit (yes/no)
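Classify Data With LibSVM
library(e1071)   # R interface to LIBSVM
# stringsAsFactors = TRUE keeps y and the categorical features as factors (needed on R >= 4.0)
dataset <- read.csv('bank.csv', header = TRUE, sep = ';', stringsAsFactors = TRUE)
# split.data() is not part of e1071; it is assumed to be a user-defined helper
# that returns list(train, test) with a fraction p of the rows in train
dati <- split.data(dataset, p = 0.7)
train <- dati$train
test <- dati$test
model <- svm(y ~ ., data = train, probability = TRUE)
# predict from every column except the last one, which holds the label y
pred <- predict(model, test[, 1:(ncol(test) - 1)], probability = TRUE)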
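The split.data() call above is not a function from base R or e1071, so it has to be defined before the script runs. The sketch below is one hypothetical implementation, written under the assumption that the helper shuffles the rows and returns a list with train and test elements; the seed argument and its default are invented for this example.

```r
# Hypothetical helper: random train/test split of a data frame.
# Not part of base R or e1071; name and arguments are assumptions.
split.data <- function(data, p = 0.7, seed = 1) {
  set.seed(seed)                 # reproducible shuffle
  n <- nrow(data)
  idx <- sample(n)               # random permutation of the row indices
  n_train <- round(n * p)        # number of rows that go to the training set
  list(train = data[head(idx, n_train), , drop = FALSE],
       test  = data[tail(idx, n - n_train), , drop = FALSE])
}

# Usage on a built-in data set (150 rows)
parts <- split.data(iris, p = 0.7)
print(nrow(parts$train))
print(nrow(parts$test))
```

Shuffling before splitting matters when the file is sorted, for instance by date or by label, since taking the first 70% of rows would then give a biased training set.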
Verify the predictions
# rows are the predictions, columns are the true labels
table(pred, test[, ncol(test)])
pred    no  yes
  no  1183   99
  yes   27   47
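The counts in this table can be turned into summary metrics. The sketch below re-enters the four numbers by hand, assuming rows are predictions and columns are true labels (as produced by table(pred, actual)), and computes accuracy, precision, and recall for the "yes" class.

```r
# Confusion matrix from the table above: rows = predicted, columns = actual
cm <- matrix(c(1183, 27, 99, 47), nrow = 2,
             dimnames = list(pred = c("no", "yes"), actual = c("no", "yes")))

accuracy  <- sum(diag(cm)) / sum(cm)            # correct predictions / all predictions
precision <- cm["yes", "yes"] / sum(cm["yes", ]) # of predicted "yes", how many were right
recall    <- cm["yes", "yes"] / sum(cm[, "yes"]) # of actual "yes", how many were found
baseline  <- sum(cm[, "no"]) / sum(cm)          # accuracy of always predicting "no"

print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, baseline = baseline), 3))
# accuracy 0.907, precision 0.635, recall 0.322, baseline 0.892
```

Accuracy is only slightly above the always-"no" baseline because the classes are imbalanced, so recall on the "yes" class is the more telling number here.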