The assignment has two parts.
Part A)
1. Assume that you have a Facebook data set covering 25784 HKUST students in total. Each student is described by 10 features (items, in association rule terminology). Assume you have calculated the counts for three of these items:
a. likes-Harry-Potter 2324
b. music-major 1029
c. theatre-major 878
1) What is the largest possible support for an association containing one of these three "items", two of them, and all three of them, respectively?
2) If 166 theatre majors liked Harry Potter (TM --> HP), what are the support, confidence and lift of this rule?
The other two questions of Part A are in the attachment.
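As a rough illustration of the arithmetic behind questions 1) and 2), here is a minimal Python sketch. It assumes the usual definitions: support = count / N, confidence(A --> B) = count(A and B) / count(A), and lift = confidence / support(B). It also assumes that the largest possible support of an itemset is bounded by its rarest item. The variable names are made up for this example and do not come from the assignment.

```python
# Counts taken from the assignment text.
N = 25784                      # total number of students
counts = {
    "likes-Harry-Potter": 2324,
    "music-major": 1029,
    "theatre-major": 878,
}

# Question 1: an itemset cannot occur more often than its rarest item,
# so the best case for a size-m itemset is the m-th largest count.
ordered = sorted(counts.values(), reverse=True)
max_support = {m: ordered[m - 1] / N for m in (1, 2, 3)}

# Question 2: rule TM --> HP, with 166 theatre majors who like Harry Potter.
both = 166
support = both / N                                   # P(TM and HP)
confidence = both / counts["theatre-major"]          # P(HP | TM)
lift = confidence / (counts["likes-Harry-Potter"] / N)  # P(HP | TM) / P(HP)

for m, s in max_support.items():
    print(f"max support with {m} item(s): {s:.4f}")
print(f"support={support:.4f} confidence={confidence:.4f} lift={lift:.4f}")
```

A lift above 1 would suggest that theatre majors like Harry Potter more often than students overall; whether that is meaningful is for the written answer to argue.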
Part B)
The second attachment contains a real hall-of-fame baseball player data set with 1340 players, 125 of whom have actually been selected into the hall of fame. Each player is described by 16 features covering career statistics such as number of home runs, position, triples, etc. You will investigate whether the k-nearest-neighbor method can predict hall-of-fame selection from a player's statistics. In Weka, the k-nearest-neighbor technique is called IBk and is found under the "lazy" category of classification methods.
Answer the following questions:
1) Does the k-NN method work? Why?
2) Conceptually, what are the pros and cons of setting k to 1 or to a very large number?
3) Experiment with different values of k and report the CCI (correctly classified instances) you get using 10-fold cross-validation.
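To show what question 3) is asking for, here is a minimal sketch of k-NN evaluated with 10-fold cross-validation in plain Python. It is not Weka's IBk and it does not use the baseball data: the data set is synthetic toy data generated inside the script (two numeric features, roughly 10% positives), the folds are not stratified, and all function names are hypothetical. The CCI is reported as a fraction of all instances.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_cci(data, k, folds=10, seed=0):
    """Fraction of correctly classified instances over `folds` (unstratified) folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    correct = 0
    for f in range(folds):
        test = rows[f::folds]                                   # every folds-th row
        train = [r for i, r in enumerate(rows) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)

def make_toy_data(n=300, positive_rate=0.1, seed=1):
    """Synthetic, imbalanced two-class data; NOT the hall-of-fame data set."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = 1 if rng.random() < positive_rate else 0
        centre = 2.0 if label else 0.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

toy = make_toy_data()
results = {k: cross_val_cci(toy, k) for k in (1, 3, 5, 15, 51)}
for k, cci in results.items():
    print(f"k={k:>2}  CCI={cci:.3f}")
```

When reading the CCI on the real data, it may help to compare against the trivial baseline: with 125 of 1340 players selected, always predicting "not selected" already classifies 1215/1340 (about 90.7%) of the instances correctly.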
I did my master's using Weka; you can see my curriculum here [login to view URL]
I'm also using Weka in my PhD work. If you are interested, I can do the job. Tell me whether you need a specific Weka version or whether I can use the latest one.
Regards
Noel
$70 USD in 1 day
3 freelancers are bidding on average $45 USD for this job
Is your project due in 2 days' time?
I can help with the project, but that is too short a deadline for me.
Perhaps I could provide assistance on how to approach the project?