Search online for a reasonably large dataset and download it. Define your own problem based on the dataset and provide a solution to it using your knowledge of PySpark, the Python API for Apache Spark. You may get ideas for defining your problem from research papers; if you do, include the reference.
Prepare a final report that includes:
1) motivation, 2) design, and 3) relevant source code and screenshots.
Also explain the difficulties you experienced and how you resolved them. Clearly indicate which item you attempt. Convert your report into a PDF file.
Hi, I have more than 4 years of experience in Hadoop and data mining technologies such as HDFS, MapReduce, Python, PySpark, Scala, and Hive. Please review my profile for my skills, and contact me for more details.