SIT772
Australia
Deakin University
Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model. You are to collect unstructured documents on the topics below and apply an indexing technique to convert them into an inverted index. Please collect three documents (fewer than 30 words each) on three different topics. Suggested topics are listed as follows; you may also choose other topics you prefer.
• Science
• Computer Vision
• Search Engine
• Database
• Security and privacy.
An example document:
“Google is the most widely used Web search engine in the World. It claims
to be the World’s most comprehensive search engine, indexing over 2.4
billion Web pages.”
1. Creating the inverted index. In the process of creating the inverted index, please complete the following steps:
a. Find a stopword list on the Internet and remove all stopwords and punctuation from the three documents. Then apply Porter’s stemming algorithm to all documents. Note that plenty of online stemming applications are available, and you may use one that implements the Porter algorithm for this question. The output will be a set of stemmed terms.
b. Create a merged inverted list including the within-document frequencies for each term.
c. Use the index created in step (b) to create a dictionary and the related posting file.
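Steps (a) to (c) can be sketched in Python as below. This is only an illustrative outline under stated assumptions: the stopword list is a tiny hand-picked sample, `crude_stem` is a toy suffix stripper standing in for Porter's algorithm (it is not Porter and will give different stems), and documents 2 and 3 are made-up placeholders. The names `preprocess`, `build_inverted_index`, and `split_dictionary_and_postings` are hypothetical helpers, not part of the assignment.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny sample stopword list; a real run would use a full published list.
STOPWORDS = {"a", "an", "and", "be", "for", "in", "is", "it", "most", "of", "the", "to"}


def crude_stem(token):
    """Toy suffix stripper used as a placeholder. This is NOT Porter's algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Step (a): lowercase, drop punctuation and stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def build_inverted_index(docs):
    """Step (b): merged inverted list, term -> {doc_id: within-document frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(preprocess(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def split_dictionary_and_postings(index):
    """Step (c): dictionary term -> (document frequency, offset into postings file)."""
    dictionary, postings = {}, []
    for term in sorted(index):
        entries = sorted(index[term].items())  # list of (doc_id, tf) pairs
        dictionary[term] = (len(entries), len(postings))
        postings.extend(entries)
    return dictionary, postings


# Document 1 is taken from the example above; documents 2 and 3 are placeholders.
docs = {
    1: "Google is the most widely used Web search engine in the World.",
    2: "A database stores structured records and supports search queries.",
    3: "Computer vision systems search images for objects.",
}

index = build_inverted_index(docs)
dictionary, postings = split_dictionary_and_postings(index)

for term, (df, offset) in dictionary.items():
    print(term, df, postings[offset:offset + df])
```

Each dictionary entry stores the document frequency and the position where that term's postings start, so the postings file can be kept as one flat list of (document, frequency) pairs.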
2. Boolean and Vector queries.
a. Please design three Boolean queries (for example, web AND search) and list the relevant documents for each query.
b. Please use the Vector model to query the inverted index, and compare the results with those of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
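The two query models can be compared with a short Python sketch such as the one below. Everything in it is an assumption made for illustration: the small `INDEX` is a hypothetical inverted index (not derived from any real documents), term weights use tf × log(N/df) as one common choice among several, and the threshold of 0.4 is arbitrary.

```python
import math

# Hypothetical inverted index: stemmed term -> {doc_id: within-document frequency}.
INDEX = {
    "web": {1: 2, 2: 1},
    "search": {1: 2, 3: 1},
    "engin": {1: 2},
    "databas": {2: 1, 3: 2},
}
NUM_DOCS = 3


def docs_with(term):
    """Set of document ids whose postings contain the term."""
    return set(INDEX.get(term, {}))


def boolean_and(a, b):
    return sorted(docs_with(a) & docs_with(b))


def boolean_or(a, b):
    return sorted(docs_with(a) | docs_with(b))


def boolean_and_not(a, b):
    return sorted(docs_with(a) - docs_with(b))


def idf(term):
    """Inverse document frequency, log(N / df); an assumed weighting scheme."""
    return math.log(NUM_DOCS / len(INDEX[term]))


def doc_vectors():
    """Build one sparse tf-idf vector per document from the inverted index."""
    vectors = {d: {} for d in range(1, NUM_DOCS + 1)}
    for term, plist in INDEX.items():
        for doc_id, tf in plist.items():
            vectors[doc_id][term] = tf * idf(term)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def vector_query(terms, threshold):
    """Rank documents by cosine similarity and keep those at or above the threshold."""
    query = {t: idf(t) for t in set(terms) if t in INDEX}
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in doc_vectors().items()]
    return [(doc_id, round(score, 3))
            for score, doc_id in sorted(scored, reverse=True)
            if score >= threshold]


print("web AND search:", boolean_and("web", "search"))
print("web OR search:", boolean_or("web", "search"))
print("web AND NOT search:", boolean_and_not("web", "search"))
print("vector 'web search':", vector_query(["web", "search"], threshold=0.4))
```

On this toy index, web AND search matches only document 1, whereas the vector query also retrieves document 2, which contains just one of the two query terms. The Boolean model gives an unranked yes/no answer, while the Vector model ranks partial matches and lets the threshold decide the cut-off.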