Fungsi untuk visualisasi
Library yang diperlukanadd_title <- function(vis, ..., x_lab = "X units", title = "Plot Title") { add_axis(vis, "x", title = x_lab) %>% add_axis("x", orient = "top", ticks = 0, title = title, properties = axis_props( axis = list(stroke = "white"), labels = list(fontSize = 0) ), ...) }
1. Analisis Regresilibrary(ggplot2) library(ggvis) library(NbClust)#cluster library(fpc)#analisis diskriminan library(lmtest) library(rattle)
Pemodelan dengan dataset Trees
Langkah pertama cek distribusi data untuk setiap variabel Girth terhadap Volume:
dan Height terhadap Volume:trees %>% ggvis (~Girth,~Volume) %>% layer_boxplots() %>% add_title(title = "Indentifying Anomaly data", x_lab = "#Girth session")
Dari uji diagnostik terlihat ada pola linier (positif) untuk setiap variabel. Tampak juga ada beberapa variansi data yang besar dari “height” terhadap Volume.trees %>% ggvis (~Height,~Volume) %>% layer_boxplots() %>% add_title(title = "Indentifying Anomaly data", x_lab = "#Height")
Untuk melanjutkan analisis regresi agar menghasilkan model yang baik, maka beberapa yang terindentifikasi outlier (atau anomali) dapat dihilangkan dalam perhitungan model.
Metode yang digunakan untuk hal ini dengan menggunakan jarak Cook.
mod <- lm(data=trees, Volume~Girth+Height ) cooksd <- cooks.distance(mod)
#ploting cooksd plot(cooksd, pch="*", cex=2, main="Influential Obs by Cooks distance") abline(h = 4*mean(cooksd, na.rm=T), col="red") text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>4*mean(cooksd, na.rm=T),names(cooksd),""), col="red")
Kemudian, row data yang teridentifikasi sebagai outlier dapat dieliminasi dalam model
influential <- as.numeric(names(cooksd)[(cooksd > 4*mean(cooksd, na.rm=T))]) clean.data <- data.frame(trees[-influential, ])
summary(model <-lm(data=clean.data, Volume~Girth+Height))
##
## Call:
## lm(formula = Volume ~ Girth + Height, data = clean.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.6396 -1.2786 -0.4938 2.0262 6.4599
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.2362 8.0390 -6.498 5.76e-07 ***
## Girth 4.4773 0.2518 17.781 < 2e-16 ***
## Height 0.2992 0.1179 2.538 0.0172 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.49 on 27 degrees of freedom
## Multiple R-squared: 0.9437, Adjusted R-squared: 0.9395
## F-statistic: 226.3 on 2 and 27 DF, p-value: < 2.2e-16
Kesimpulan yang didapat, bahwa Girth dan Height memiliki pengaruh yang kuat terhadap Volume pohon. Sehingga dapat dikatakan juga untuk menghitung Volume sebuah pohon 93% dapat digambarkan menggunakan nilai Girth dan Height nya.
Sebagai tambahan, dapat juga dilakukan uji kebaikan model dengan uji heteroskedastik error, model dikatakan baik jika H0 gagal ditolak (homoskedastik). Error yang homogen dapat mengidentifikasikan hasil estimasi stabil (selisih antara estimasi dan nilai asli tidak terlalu bervariasi)bptest(model)
##
## studentized Breusch-Pagan test
##
## data: model
## BP = 1.8397, df = 2, p-value = 0.3986
Normalitas error, jika error yang dihasilkan model berdistirbusi normal maka dapat dikatakan model baik
res=residuals(model,type="response") shapiro.test(res)
##
## Shapiro-Wilk normality test
##
## data: res
## W = 0.96008, p-value = 0.3113
Kemudian uji multikoliniearitas. Di anggap baik Jika nilai VIF dibawah 5.
library(fmsb) VIF(model)
## [1] 17.76253
Karena model teridentifikasi multikolinieritas, maka model dapat disusun ulang sebagai berikut dengan mempertimbangkan nilai t statistik terbesar untuk variabel independen.
summary(model1 <-lm(data=clean.data, Volume~Girth))
##
## Call:
## lm(formula = Volume ~ Girth, data = clean.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5036 -2.3834 -0.0489 2.3951 6.3726
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.3104 3.2784 -10.16 6.76e-11 ***
## Girth 4.7619 0.2464 19.33 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.813 on 28 degrees of freedom
## Multiple R-squared: 0.9303, Adjusted R-squared: 0.9278
## F-statistic: 373.6 on 1 and 28 DF, p-value: < 2.2e-16
2. Logistic Regression
library(datasets) dim(Titanic)
## [1] 4 2 2 2
Mempersiapkan data
classes <- dimnames(Titanic)[[1]] genders <- dimnames(Titanic)[[2]] ages <- dimnames(Titanic)[[3]] class <- gender <- age <- N <- Y <- rep(0,4*2*2) count<-0 for(i1 in 1:4){ for(i2 in 1:2){ for(i3 in 1:2){ count <- count + 1 class[count] <- classes[i1] gender[count] <- genders[i2] age[count] <- ages[i3] N[count] <- sum(Titanic[i1,i2,i3,]) Y[count] <- Titanic[i1,i2,i3,2] } } } Xc1 <- ifelse(class=="1st",1,0) Xc2 <- ifelse(class=="2nd",1,0) Xc3 <- ifelse(class=="3rd",1,0) Xg <- ifelse(gender=="Female",1,0) Xa <- ifelse(age=="Adult",1,0) n <- 16 cbind(class,Xc1,Xc2,Xc3,gender,Xg,age,Xa,N,Y)
## class Xc1 Xc2 Xc3 gender Xg age Xa N Y
## [1,] "1st" "1" "0" "0" "Male" "0" "Child" "0" "5" "5"
## [2,] "1st" "1" "0" "0" "Male" "0" "Adult" "1" "175" "57"
## [3,] "1st" "1" "0" "0" "Female" "1" "Child" "0" "1" "1"
## [4,] "1st" "1" "0" "0" "Female" "1" "Adult" "1" "144" "140"
## [5,] "2nd" "0" "1" "0" "Male" "0" "Child" "0" "11" "11"
## [6,] "2nd" "0" "1" "0" "Male" "0" "Adult" "1" "168" "14"
## [7,] "2nd" "0" "1" "0" "Female" "1" "Child" "0" "13" "13"
## [8,] "2nd" "0" "1" "0" "Female" "1" "Adult" "1" "93" "80"
## [9,] "3rd" "0" "0" "1" "Male" "0" "Child" "0" "48" "13"
## [10,] "3rd" "0" "0" "1" "Male" "0" "Adult" "1" "462" "75"
## [11,] "3rd" "0" "0" "1" "Female" "1" "Child" "0" "31" "14"
## [12,] "3rd" "0" "0" "1" "Female" "1" "Adult" "1" "165" "76"
## [13,] "Crew" "0" "0" "0" "Male" "0" "Child" "0" "0" "0"
## [14,] "Crew" "0" "0" "0" "Male" "0" "Adult" "1" "862" "192"
## [15,] "Crew" "0" "0" "0" "Female" "1" "Child" "0" "0" "0"
## [16,] "Crew" "0" "0" "0" "Female" "1" "Adult" "1" "23" "20"
Membuat fungsi logistik dengan metode estimasi Likelihood
Modelling menggunakan logistik modellogistic_model <- "model{ # Likelihood for(i in 1:n){ Y[i] ~ dbin(q[i],N[i]) logit(q[i]) <- beta[1] + beta[2]*Xc1[i] + beta[3]*Xc2[i] + beta[4]*Xc3[i] + beta[5]*Xg[i] + beta[6]*Xa[i] } #Priors for(j in 1:6){ beta[j] ~ dnorm(0,0.01) } }"
Menghitung nilai DIClibrary(rjags)
dat <- list(Y=Y,n=n,N=N,Xc1=Xc1,Xc2=Xc2,Xc3=Xc3,Xg=Xg,Xa=Xa) model1 <- jags.model(textConnection(logistic_model),data = dat,n.chains=3, quiet=TRUE)
Konvergensi Diagnosisdic1 <- dic.samples(model1, variable.names=c("beta"), n.iter=20000, progress.bar="none")
samp <- coda.samples(model1, variable.names=c("beta"), n.iter=20000, progress.bar="none") autocorr.plot(samp)
gelman.plot(samp)
summary(samp)
##
## Iterations = 21001:41000
## Thinning interval = 1
## Number of chains = 3
## Sample size per chain = 20000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta[1] -0.1643 0.2613 0.0010670 0.008718
## beta[2] 0.8569 0.1583 0.0006461 0.001258
## beta[3] -0.1651 0.1739 0.0007099 0.001575
## beta[4] -0.9280 0.1486 0.0006068 0.001882
## beta[5] 2.4314 0.1401 0.0005722 0.001060
## beta[6] -1.0711 0.2485 0.0010145 0.008102
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta[1] -0.6739 -0.3353 -0.1651 0.008272 0.3573
## beta[2] 0.5465 0.7511 0.8572 0.963007 1.1664
## beta[3] -0.5081 -0.2825 -0.1631 -0.045948 0.1722
## beta[4] -1.2229 -1.0272 -0.9267 -0.827322 -0.6376
## beta[5] 2.1599 2.3363 2.4310 2.525828 2.7076
## beta[6] -1.5624 -1.2349 -1.0703 -0.907697 -0.5834
Kesimpulan:
- Perempuan (mean Beta 5 >0) bertahan hidup lebih tinggi (peluang) dibandingkan laki-laki (mean Beta 4 <0)
- Crew Kelas kedua (Mean Beta 2 > 0) bertahan hidup lebih baik dibandingkan Crew kelas Pertama (Mean Beta 1 <0) dan Crew kelas ketiga (mean beta 3 <0)
- Orang dewasa (adults) memiliki kemungkinan bertahan lebih buruk (mean beta 6 <0) dibandingkan anak-anak (child) (mean beta 5> 0)
Langkah pertama melihat struktur data
head(wine,3)
## Type Alcohol Malic Ash Alcalinity Magnesium Phenols Flavanoids
## 1 1 14.23 1.71 2.43 15.6 127 2.80 3.06
## 2 1 13.20 1.78 2.14 11.2 100 2.65 2.76
## 3 1 13.16 2.36 2.67 18.6 101 2.80 3.24
## Nonflavanoids Proanthocyanins Color Hue Dilution Proline
## 1 0.28 2.29 5.64 1.04 3.92 1065
## 2 0.26 1.28 4.38 1.05 3.40 1050
## 3 0.30 2.81 5.68 1.03 3.17 1185
str(wine)
## 'data.frame': 178 obs. of 14 variables:
## $ Type : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ Alcohol : num 14.2 13.2 13.2 14.4 13.2 ...
## $ Malic : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
## $ Ash : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
## $ Alcalinity : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
## $ Magnesium : int 127 100 101 113 118 112 96 121 97 98 ...
## $ Phenols : num 2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
## $ Flavanoids : num 3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
## $ Nonflavanoids : num 0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
## $ Proanthocyanins: num 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
## $ Color : num 5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
## $ Hue : num 1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
## $ Dilution : num 3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
## $ Proline : int 1065 1050 1185 1480 735 1450 1290 1295 1045 1045 ...
Melakukan standarisasi data, dan membuat kolom pertama (factor fields)
Kemudian menentukan berapa jumlah cluster yang harus terbentuk hingga hasil optimaldataLatih <- scale(wine[-1])
nc <- NbClust(dataLatih, min.nc=2, max.nc=15, method="kmeans")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 4 proposed 2 as the best number of clusters
## * 15 proposed 3 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 1 proposed 12 as the best number of clusters
## * 1 proposed 14 as the best number of clusters
## * 1 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
## * According to the majority rule, the best number of clusters is 3
## *******************************************************************
barplot(table(nc$Best.n[1,]), xlab="Jumlah Cluster", ylab="Jumlah Variabel", main="Jumlah Cluster berdasarkan 26 Variabel")
Dari 2 metode diatas,ditetapkan 3 sebagai nilai Jumlah cluster yang akan dibentuk.wss <- 0 for (i in 1:15){ wss[i] <- sum(kmeans(dataLatih, centers=i)$withinss) } plot(1:15, wss, type="b", xlab="Jumlah Cluster", ylab="WSS")
Kemudian membentuk cluster menggunakan kmeans
kemudian divisualisasi menggunakan metode diskriminan untuk mereduksi dimensi, (menggambarkan banyak variabel menjadi 2 dimensi)modelCluster <- kmeans(dataLatih,3)
Terakhir, mengevaluasi model dengan menggunakan tabel, dengan mencocokan variabel type pada dataset wine dengan hasil cluster yang terbentukplotcluster(dataLatih, modelCluster$cluster)
table(wine$Type,modelCluster$cluster)
## 1 2 3
## 1 59 0 0
## 2 3 3 65
## 3 0 48 0
Dari Hasil didapatkan dengan model clustering yang digunakan, kemudian dibandingkan dengan data asli maka ditemukan terdapat 6 unit yang missclassfied. Sehingga didapat akurasi kebaikan hasil cluster dibandingkan dengan data asli maka:
(178-6)/178
## [1] 0.9662921
Thank You,
Have Fun.
Tag :
DataMining,
R
12 Komentar untuk "Simpel Regresi Linier, Logistik dan K-Means Clustering Menggunakan R"
Stress when used as a term in vocabulary does not seem dangerous but when we take into account the long term consequences of ‘stress’ you can see it takes a toll on your mental health and overall well-being. Gurgaon escorts have done a great deal in not only entertaining guests who come to this city but also in taking the stress factor out of their daily life. Escorts in Gurgaon
You are only required to let me recognize about your requirements, place, date and time. I shall assemble each and every other thing for you and would make confident you have a great and acceptable time with me.
Delhi Escorts
Hi Friends, I am Riya Sharma, an Independent Escort in Bangalore. Call me on OOOOOOOOOOO and get instant reserving for VIP Escort services in Bangalore
Escorts Bangalore
Koramangala Call Girls
Kr Puram Call Girls
Call girls BTM Layout Bangalore
Call girls Electronic City
Welcome to VIP hifi Model Escorts in Hebbal agency, we are the top charted agency for gentlemen elite hilarity services to provide industry top ranking model Escorts in Hebbal.
Call Girls Hebbal
Call Girls Hsr Layout
Call girls Indiranagar
Call girls JP Nagar
Call girls Malleswaram
They will make you annoyed to have more and more of the enjoyment that you seem for in your lone times.
Call girls Marathahalli
Call girls MG Road
Call girls Mysore Road
Call girls Rajajinagar
Call girls Whitefield
Calangute escorts agency offer service in Calangute, Colva and Condolim our hottest decision women for enjoyable and offer full satisfaction with final freelance feminine escorts in Calangute
Goa Escorts
Escorts Calangute Beach
Escorts Candolim Beach
Escorts Colva Beach
There is no one without dream of spending mind blowing time with a very hot escort girl in the town. There are hundreds of obstacles are previously standing in front of this dream and those factors made it an uneasy aim
Escorts Panaji
Escorts Margao
Escorts Vasco Da Gama
Ahmedabad is a standout between the most international cities in the India and explorers from around the world visit for business, delight and for shopping. Ahmedabad is only the place to welcome
Escorts Ahmedabad
Our extreme Companions are obvious for their longing to make this their own specific chosen calling, their require
Escorts Pune
Escorts Hinjewadi
you can hire self-governing Escorts in Bangalore. These astonishing beauty know what your body stress. Their plea,loveliness, category, and approach are literally without equal.Independent Escorts Bangalore | Escorts Bangalore | Bangalore Escorts | Female Escorts Bangalore
Gurgaon Escorts
Escorts in Gurgaon
Gurugram Escorts Agency
Escorts in Aerocity
South City Escorts Gurgaon
Kolkata Escorts Service
Gurgaon Escorts Service
Escorts Service in Gurgaon
Gurgaon Escorts Agency
DLF City Escorts Gurgaon
Membuja kursus statistik kah
Silahkan tinggalkan komentar, kritik, maupun saran dari sobat blogger tentang apa yang sobat rasakan setelah mengunjungi blog ini.