You may choose your own project, but it must involve building a small-scale search engine over web pages that you crawl yourself (at least 1000 pages). The ranking methods should include both cosine similarity and BM25. You should create a benchmark suite of topics and queries and evaluate your system using precision@10. You should also build an interface for submitting queries and inspecting the results. The results must show a brief description or summary of each returned document.
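To make the ranking and evaluation requirements concrete, here is a minimal sketch of both scoring methods and of precision@10. It is an illustration under stated assumptions, not a required design: the class name `TinyIndex`, the whitespace tokenizer, the TF-IDF weighting used for cosine similarity, and the BM25 parameters `k1=1.2` and `b=0.75` are all hypothetical choices.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely also strip punctuation and remove stop words.
    return text.lower().split()


class TinyIndex:
    """Hypothetical in-memory index over a list of document strings."""

    def __init__(self, docs):
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]       # term frequencies
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for tf in self.tfs for t in tf)

    def bm25(self, query, doc_id, k1=1.2, b=0.75):
        tf = self.tfs[doc_id]
        dl = len(self.docs[doc_id])
        score = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (self.n - self.df[t] + 0.5) / (self.df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * dl / self.avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        return score

    def _tfidf(self, tf):
        # Terms that never occur in the corpus (df == 0) are dropped.
        return {t: c * math.log(self.n / self.df[t])
                for t, c in tf.items() if self.df[t]}

    def cosine(self, query, doc_id):
        q = self._tfidf(Counter(tokenize(query)))
        d = self._tfidf(self.tfs[doc_id])
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        norm = (math.sqrt(sum(w * w for w in q.values()))
                * math.sqrt(sum(w * w for w in d.values())))
        return dot / norm if norm else 0.0

    def search(self, query, method="bm25", k=10):
        scorer = self.bm25 if method == "bm25" else self.cosine
        scored = [(scorer(query, i), i) for i in range(self.n)]
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        # Keep only documents with a positive score.
        return [i for s, i in ranked if s > 0][:k]


def precision_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of the top-k results that are judged relevant.
    top = ranked_ids[:k]
    return sum(1 for i in top if i in relevant_ids) / k
```

For each benchmark query, `search` would be run once per method and `precision_at_k` computed against the set of document ids you judged relevant; averaging over queries gives one precision@10 figure per ranking method.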
An example project: crawl the Wikipedia pages related to cricket, starting from 10 seed pages. Extract the title and body of each page to form one document per page. Then build the search engine as described in the previous paragraph.
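The extraction step in the example project could look roughly like the sketch below, which uses only Python's standard `html.parser` module. The names `TitleBodyExtractor` and `extract_document` are hypothetical, fetching the pages is left out, and the 200-character `summary` field is just one assumed way to produce the brief description shown with each result.

```python
from html.parser import HTMLParser


class TitleBodyExtractor(HTMLParser):
    """Collects the <title> text and the visible text inside <body>."""

    SKIP = {"script", "style"}  # content of these tags is not page text

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body and not self._skip_depth:
            self.body_parts.append(data)


def extract_document(html):
    parser = TitleBodyExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the text is easy to index and display.
    title = " ".join("".join(parser.title_parts).split())
    body = " ".join(" ".join(parser.body_parts).split())
    # Assumption: the first 200 characters of the body serve as the summary.
    return {"title": title, "body": body, "summary": body[:200]}
```

Each crawled page would be passed through `extract_document` once, and the resulting `title` and `body` strings indexed as a single document. Real Wikipedia pages also contain navigation and footer text inside the body, so a project would probably want to restrict extraction to the main article content.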