Fashion-MNIST
About three weeks ago Zalando released the Fashion-MNIST dataset of article images, intended as a drop-in replacement for the classical MNIST dataset. In this article we will try to build a strong classifier for it using H2O and R.
Each example is a 28×28 grayscale image, associated with a label from 10 classes:
- T-shirt/top
- Trouser
- Pullover
- Dress
- Coat
- Sandal
- Shirt
- Sneaker
- Bag
- Ankle boot
You can download it here: https://www.kaggle.com/zalando-research/fashionmnist
The first column is the image label and each of the remaining 784 pixel columns stores the darkness of that pixel.
Quick reminder: what is H2O?
H2O is an open-source, fast, scalable machine learning platform written in Java. It gives access to all of its capabilities from Python, Scala and, most importantly for us, from R via a REST API.
Overview of available algorithms (a quick usage sketch follows the list):
- Supervised:
- Deep Learning (Neural Networks)
- Distributed Random Forest (DRF)
- Generalized Linear Model (GLM)
- Gradient Boosting Machine (GBM)
- Naive Bayes Classifier
- Stacked Ensembles
- XGBoost
- Unsupervised:
- Generalized Low Rank Models (GLRM)
- K-Means Clustering
- Principal Component Analysis (PCA)
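All of these algorithms share a common R interface: a dedicated h2o.* function that takes x, y and a training_frame. As a minimal illustration (not part of the original post), here is a small GBM trained on the built-in iris data:

library(h2o)
h2o.init(nthreads = -1)                        # start or connect to a local cluster

# Push the built-in iris data into H2O and fit a small GBM
iris_hex <- as.h2o(iris, destination_frame = "iris_hex")
gbm_demo <- h2o.gbm(x = 1:4,                   # the four measurement columns
                    y = "Species",             # the class label
                    training_frame = iris_hex,
                    ntrees = 50)
h2o.confusionMatrix(gbm_demo)                  # training-set confusion matrix

The other supervised and unsupervised algorithms are called in the same way, e.g. h2o.randomForest(), h2o.glm(), h2o.kmeans() or h2o.prcomp().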
Installation is easy:

install.packages("h2o")
library(h2o)
Building a neural network for image classification
Let’s start by running an H2O cluster:
h2o.init(ip = "localhost", port = 54321, nthreads = -1, min_mem_size = "20g")
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
    /tmp/RtmpQEf3RX/h2o_maju116_started_from_r.out
    /tmp/RtmpQEf3RX/h2o_maju116_started_from_r.err

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster:
    H2O cluster uptime:         1 seconds 906 milliseconds
    H2O cluster version:        3.13.0.3973
    H2O cluster version age:    1 month and 5 days
    H2O cluster name:           H2O_started_from_R_maju116_cuf927
    H2O cluster total nodes:    1
    H2O cluster total memory:   19.17 GB
    H2O cluster total cores:    8
    H2O cluster allowed cores:  8
    H2O cluster healthy:        TRUE
    H2O Connection ip:          localhost
    H2O Connection port:        54321
    H2O Connection proxy:       NA
    H2O Internal Security:      FALSE
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
    R Version:                  R version 3.4.1 (2017-06-30)
Next we will import the data into H2O using the h2o.importFile() function, in which we can specify column types and column names if needed. If you want to send data into H2O directly from R, you can use the as.h2o() function instead (a sketch follows the import code below).
fmnist_train <- h2o.importFile(path = "data/fashion-mnist_train.csv",
                               destination_frame = "fmnist_train",
                               col.types = c("factor", rep("int", 784)))
fmnist_test <- h2o.importFile(path = "data/fashion-mnist_test.csv",
                              destination_frame = "fmnist_test",
                              col.types = c("factor", rep("int", 784)))
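For comparison, here is a minimal sketch of the as.h2o() route, which pushes a data frame that already lives in R memory to the cluster; the read.csv() call and destination frame name are illustrative, not taken from the original post:

# Illustrative only: load the CSV into R first, then push it to H2O
fmnist_train_df <- read.csv("data/fashion-mnist_train.csv")
fmnist_train_df$label <- as.factor(fmnist_train_df$label)    # keep the label categorical
fmnist_train_r <- as.h2o(fmnist_train_df, destination_frame = "fmnist_train_from_r")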
If everything went fine, we can check if our datasets are in H2O:
h2o.ls()
           key
1  fmnist_test
2 fmnist_train
Before we begin modeling, let’s take a quick look at the data:
# The plotting code below uses ggplot2, purrr and gridExtra
library(ggplot2)
library(purrr)
library(gridExtra)

xy_axis <- data.frame(x = expand.grid(1:28, 28:1)[, 1],
                      y = expand.grid(1:28, 28:1)[, 2])

plot_theme <- list(
  raster = geom_raster(hjust = 0, vjust = 0),
  gradient_fill = scale_fill_gradient(low = "white", high = "black", guide = FALSE),
  theme = theme(axis.line = element_blank(),
                axis.text = element_blank(),
                axis.ticks = element_blank(),
                axis.title = element_blank(),
                panel.background = element_blank(),
                panel.border = element_blank(),
                panel.grid.major = element_blank(),
                panel.grid.minor = element_blank(),
                plot.background = element_blank())
)

sample_plots <- sample(1:nrow(fmnist_train), 100) %>%
  map(~ {
    plot_data <- cbind(xy_axis, fill = as.data.frame(t(fmnist_train[.x, -1]))[, 1])
    ggplot(plot_data, aes(x, y, fill = fill)) + plot_theme
  })

do.call("grid.arrange", c(sample_plots, ncol = 10, nrow = 10))
Now we will build a simple neural network, with one hidden layer of ten neurons:
fmnist_nn_1 <- h2o.deeplearning(x = 2:785,
                                y = "label",
                                training_frame = fmnist_train,
                                distribution = "multinomial",
                                model_id = "fmnist_nn_1",
                                l2 = 0.4,
                                ignore_const_cols = FALSE,
                                hidden = 10,
                                export_weights_and_biases = TRUE)
If we set the export_weights_and_biases parameter to TRUE, the network's weights and biases will be saved and we can retrieve them with the h2o.weights() and h2o.biases() functions. Thanks to this we can try to visualize the neurons from the hidden layer (note that we set ignore_const_cols to FALSE to get weights for every pixel).
weights_nn_1 <- as.data.frame(h2o.weights(fmnist_nn_1, 1))
biases_nn_1 <- as.vector(h2o.biases(fmnist_nn_1, 1))

neurons_plots <- 1:10 %>%
  map(~ {
    plot_data <- cbind(xy_axis, fill = t(weights_nn_1[.x, ]) + biases_nn_1[.x])
    colnames(plot_data)[3] <- "fill"
    ggplot(plot_data, aes(x, y, fill = fill)) + plot_theme
  })

do.call("grid.arrange", c(neurons_plots, ncol = 3, nrow = 4))
We can definitely see some resemblance to shirts and sneakers. Let’s test our model:
h2o.confusionMatrix(fmnist_nn_1,fmnist_test)
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
Per-class error rates:
    0 (T-shirt/top):  0.1990 =  199/1000
    1 (Trouser):      0.0620 =   62/1000
    2 (Pullover):     0.3050 =  305/1000
    3 (Dress):        0.1350 =  135/1000
    4 (Coat):         0.2300 =  230/1000
    5 (Sandal):       0.1350 =  135/1000
    6 (Shirt):        0.8930 =  893/1000
    7 (Sneaker):      0.1620 =  162/1000
    8 (Bag):          0.1030 =  103/1000
    9 (Ankle boot):   0.1440 =  144/1000
    Totals:           0.2368 = 2368/10000
An accuracy of 0.7632 isn't a great result, but we haven't used the full capabilities of H2O yet. We should do something more advanced!
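As a side note, the same accuracy can also be recomputed from raw predictions; this is only a hedged sketch, not code from the original post:

# Predict on the test frame and compare with the true labels;
# the result should match 1 minus the total error reported above
preds_nn_1 <- h2o.predict(fmnist_nn_1, fmnist_test)
mean(as.vector(preds_nn_1$predict) == as.vector(fmnist_test$label))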
In the h2o.deeplearning() function there are over 70 parameters responsible for the structure and optimization of our model. Changing them should give us much better results.
fmnist_nn_final <- h2o.deeplearning(x = 2:785,
                                    y = "label",
                                    training_frame = fmnist_train,
                                    distribution = "multinomial",
                                    model_id = "fmnist_nn_final",
                                    activation = "RectifierWithDropout",
                                    hidden = c(1000, 1000, 2000),
                                    epochs = 180,
                                    adaptive_rate = FALSE,
                                    rate = 0.01,
                                    rate_annealing = 1.0e-6,
                                    rate_decay = 1.0,
                                    momentum_start = 0.4,
                                    momentum_ramp = 384000,
                                    momentum_stable = 0.98,
                                    input_dropout_ratio = 0.22,
                                    l1 = 1.0e-5,
                                    max_w2 = 15.0,
                                    initial_weight_distribution = "Normal",
                                    initial_weight_scale = 0.01,
                                    nesterov_accelerated_gradient = TRUE,
                                    loss = "CrossEntropy",
                                    fast_mode = TRUE,
                                    diagnostics = TRUE,
                                    ignore_const_cols = TRUE,
                                    force_load_balance = TRUE,
                                    seed = 3.656455e+18)

h2o.confusionMatrix(fmnist_nn_final, fmnist_test)
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
Per-class error rates:
    0 (T-shirt/top):  0.1020 =  102/1000
    1 (Trouser):      0.0100 =   10/1000
    2 (Pullover):     0.1250 =  125/1000
    3 (Dress):        0.0750 =   75/1000
    4 (Coat):         0.1150 =  115/1000
    5 (Sandal):       0.0360 =   36/1000
    6 (Shirt):        0.2780 =  278/1000
    7 (Sneaker):      0.0370 =   37/1000
    8 (Bag):          0.0190 =   19/1000
    9 (Ankle boot):   0.0430 =   43/1000
    Totals:           0.0840 =  840/10000
An accuracy of 0.916 is a much better result, but there are still a lot of things we can do to improve our model. In the future we could use a grid or random search to find the best hyperparameters, or use some ensemble methods to get better results (a sketch of a random search follows below).
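For instance, a random hyperparameter search with h2o.grid() could look something like the following; this is only a sketch, with illustrative grid values rather than tuned settings:

# Illustrative hyperparameter grid (not the settings used in this post)
hyper_params <- list(
  hidden = list(c(500, 500), c(1000, 1000), c(1000, 1000, 2000)),
  input_dropout_ratio = c(0.1, 0.2, 0.3),
  l1 = c(1e-5, 1e-4),
  rate = c(0.005, 0.01, 0.02)
)

# Randomly sample at most 20 models from the grid
search_criteria <- list(strategy = "RandomDiscrete", max_models = 20, seed = 1234)

fmnist_grid <- h2o.grid(algorithm = "deeplearning",
                        grid_id = "fmnist_nn_grid",
                        x = 2:785,
                        y = "label",
                        training_frame = fmnist_train,
                        validation_frame = fmnist_test,
                        activation = "RectifierWithDropout",
                        epochs = 20,
                        adaptive_rate = FALSE,
                        hyper_params = hyper_params,
                        search_criteria = search_criteria)

# Inspect the models ranked by validation log loss
h2o.getGrid("fmnist_nn_grid", sort_by = "logloss", decreasing = FALSE)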