ZA-WWW, 2010 Conference

Search engine query generation for finding known academic publications
Melius Weideman

Last modified: 2010-08-26

Abstract


Many universities offer their users digital libraries for storing and retrieving academic documents. However, some users may not enter a digital library through its homepage menus at all, and may instead reach the documents through a search engine. The objective of this research project was to compare different methods of generating search engine queries for retrieving known academic documents.


For a series of empirical experiments, 20 universities with digital libraries that do not require a login were identified. In each of these digital libraries, five academic documents in PDF format were located and inspected. Three types of query were then used to search for each document, and the document's ranking on the search engine result pages was recorded. From these rankings, the current visibility of each document was calculated. Each document's URL was then submitted to Google, a waiting period was allowed for the crawler to visit, and the searches and calculations were repeated.


The resulting data were used to measure the success of the three query types over 300 searches (100 documents, three queries each), both before and after each document's URL was manually submitted to Google.


The results indicate that keywords from the document title produce the most efficient query, with visibility improving substantially after submission. A text sequence from the document produces the second most efficient query, although its visibility decreased slightly after submission. Author surnames produce a much less efficient query, although their visibility increased slightly after submission.


It was concluded that, for the most efficient retrieval of a known academic document, searchers should use a query formed by concatenating the weight-carrying keywords from the document's title.
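The three query types compared above can be illustrated with a minimal sketch. It assumes that "weight-carrying" keywords are simply the title words left after removing a small, hypothetical stopword list, that a text sequence is a quoted run of consecutive words (the default length of 8 is an arbitrary choice), and that an author's surname is the last token of the name. The function names are illustrative only and do not describe the author's actual procedure.

```python
import re

# Hypothetical stopword list; the study does not specify which words
# count as non-"weight-carrying".
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_keyword_query(title: str) -> str:
    """Concatenate the title words that are not in the stopword list."""
    words = re.findall(r"[A-Za-z0-9-]+", title.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def text_sequence_query(body: str, start: int = 0, length: int = 8) -> str:
    """Quote a run of consecutive words from the body as a phrase query."""
    words = body.split()
    return '"' + " ".join(words[start:start + length]) + '"'


def author_surname_query(authors: list[str]) -> str:
    """Join the surnames, assumed here to be the last token of each name."""
    return " ".join(name.split()[-1] for name in authors)


if __name__ == "__main__":
    title = ("Search engine query generation for finding known "
             "academic publications")
    print(title_keyword_query(title))
    print(text_sequence_query("Many universities offer digital libraries "
                              "for storage", 0, 5))
    print(author_surname_query(["Melius Weideman"]))
```

Applied to this paper's own title, the first function drops only "for" and yields `search engine query generation finding known academic publications`. A real experiment would also need a rule for which words are weight-carrying in a given domain, which this sketch does not attempt.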

