Write some Software
This project was awarded to SabidHabib for $80 USD.
Project Budget: $10 - $30 USD
There are three datasets for the assignments. The same data will be used for the project. The project is a group project; the assignments are individual.
Dataset 1: This dataset is taken from the UCI public datasets.
It contains cellphone messages labelled as spam or good (legitimate). The dataset is well described. Please read the description before you start working on it.
Link to download dataset: [url removed, login to view]+Spam+Collection
Dataset 2: This dataset is taken from the UCI public datasets.
It contains sentences labelled with positive or negative sentiment, extracted from reviews of products, movies, and restaurants. Please read the description before you start working with it.
Link to download dataset: [url removed, login to view]+Labelled+Sentences
Dataset 3: Wikipedia data!
I chose the category CLASSIFICATION_Algorithms. It has 3 categories listed under it: Artificial Intelligence, Decision Tree, Ensemble Learning. We will use these categories as class labels. From each of these categories, sample 14 pages. Do not sample pages listed directly under CLASSIFICATION_Algorithms! Use these pages for the assignments and the project.
There are two sets of Wikipedia articles. The first set is drawn from Wikipedia featured articles of a certain type and becomes the class Featured. The second set consists of non-featured Wikipedia articles of a similar type and becomes the class Non-Featured. This is a binary classification problem.
To create attributes, extract all possible tokens from the entire dataset after stemming and stop-word removal. Create 1-grams, 2-grams, and 3-grams from these tokens. Use these n-grams as the attributes for the ARFF files.
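The n-gram and ARFF step above can be sketched as follows. This is a minimal illustration, not a required implementation: the function names (ngrams, build_vocabulary, to_arff), the use of raw counts as attribute values, and the relation name are all assumptions made for the example. It also assumes the tokens are already stemmed, stop-word filtered, and purely alphabetic, so attribute names need no escaping beyond quoting.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(documents, n):
    """Collect every distinct n-gram in the dataset, sorted for a stable column order."""
    vocab = set()
    for tokens in documents:
        vocab.update(ngrams(tokens, n))
    return sorted(vocab)


def to_arff(relation, documents, labels, n):
    """Build ARFF text with one numeric attribute per n-gram plus a nominal class.

    Assumption: attribute values are raw n-gram counts per document.
    """
    vocab = build_vocabulary(documents, n)
    classes = sorted(set(labels))
    lines = [f"@RELATION {relation}", ""]
    for gram in vocab:
        # Quote names because 2-grams and 3-grams contain spaces.
        lines.append(f"@ATTRIBUTE '{gram}' NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for tokens, label in zip(documents, labels):
        counts = Counter(ngrams(tokens, n))
        row = [str(counts[g]) for g in vocab]
        lines.append(",".join(row + [label]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical, already-preprocessed documents.
    docs = [["decis", "tree", "learn"], ["tree", "learn"]]
    print(to_arff("wiki_2gram", docs, ["Featured", "Non-Featured"], 2))
```

Calling to_arff once each with n=1, n=2, and n=3 would give one ARFF file per n-gram size, which matches the per-size attribute selection described in the brief.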
Perform attribute selection on each of the 1-gram, 2-gram, and 3-gram attribute sets using information gain and gain ratio. Perform classification using a decision tree and naïve Bayes.
Write a Wiki report on your findings, including the various statistical evaluation measures that WEKA gives for each classifier.
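To make the two selection criteria concrete, here is a small sketch of the textbook formulas for information gain and gain ratio on a single discrete attribute. WEKA has its own attribute evaluators for both measures, so this is only meant to show what is being computed; the function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(items).values())


def information_gain(values, labels):
    """Class entropy minus the weighted class entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the attribute itself (split information)."""
    split_info = entropy(values)
    if split_info == 0:
        # A constant attribute cannot split the data at all.
        return 0.0
    return information_gain(values, labels) / split_info


if __name__ == "__main__":
    # Hypothetical toy data: 1 means the n-gram occurs in the document.
    labels = ["Featured", "Featured", "Non-Featured", "Non-Featured"]
    print(information_gain([1, 1, 0, 0], labels))  # attribute separates the classes
    print(information_gain([1, 0, 1, 0], labels))  # attribute is uninformative
```

Gain ratio normalises information gain by the split information, which penalises attributes that take many distinct values.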
Link: Classification_algorithms: [url removed, login to view]:Classification_algorithms
Link: Artificial Intelligence: [url removed, login to view]:Artificial_neural_networks
Link: Decision Tree: [url removed, login to view]:Decision_trees
Link: Ensemble Learning: [url removed, login to view]:Ensemble_learning
Stemming and stop-word removal: You can use NLTK!
Stemming: Convert each word to its root form, e.g. Running --> Run
Stop words: High-frequency words that carry little meaning
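A minimal sketch of both preprocessing steps with NLTK might look like the following. It uses NLTK's PorterStemmer; the regular-expression tokenizer, the preprocess function name, and the tiny hand-written stop list are assumptions made to keep the example self-contained. NLTK also ships a fuller English stop-word list (nltk.corpus.stopwords), which needs a one-time nltk.download("stopwords") before use.

```python
import re

from nltk.stem import PorterStemmer

# Assumption: a tiny stop list for illustration only; a real run would
# likely use a fuller list such as NLTK's stopwords corpus.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The dogs are running"))
```

Stop words are removed before stemming here so that the stop list can stay in its unstemmed form; the reverse order would also work if the stop list were stemmed too.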