Blog 7 – IA

1. What are some of the reasons that might warrant the need to use a search system on a
website?

  • There is too much information to browse
  • It helps users of fragmented sites
  • It can be used as a learning tool
  • Users expect it to be there
  • It can tame dynamism (content that changes frequently)

Reference: lecture notes
2. Why is an Information Architect interested in search systems?

  • Search systems benefit users by leveraging metadata
  • The search interface can be improved
  • Search can be integrated with browsing and navigation

Reference: lecture notes
3. Describe the core components of a search engine.

  • The web crawler
    This is the part of the search engine that combs through pages on the internet and gathers information for the search engine. Variable features that can affect your search results: included pages, excluded pages, document types, frequency of crawling.
  • The database
    The search engine's database is what you are actually searching. All of the information that the web crawler retrieves is stored in it. Every time you use a search engine, you are searching this database, not the live internet. Variable features that can affect your search results: size of the database, freshness of the database.
  • The search algorithm
    Each search engine interprets the terms you enter into the search box in a different way. Variable features that can affect your search results: operators, phrase searching, truncation.
  • The ranking algorithm
    How a search engine ranks the results of your search is possibly its most important component. Most searches retrieve thousands of results. Since you will probably only look through the first 1-2 pages of results, the most relevant results need to appear first. Variable features that can affect your search results: location and frequency, link analysis, clickthrough measurement.
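
As a rough illustration of how the database, search algorithm and ranking algorithm fit together, here is a minimal Python sketch. The pages, the implicit AND matching and the term-frequency ranking are all assumptions made for the example; a real engine crawls live pages and ranks on many more signals.

```python
from collections import defaultdict

# Hypothetical "crawled" pages (URL -> text). A real crawler would fetch these.
PAGES = {
    "/home": "welcome to the library search page",
    "/hours": "library opening hours and holiday hours",
    "/contact": "contact the library staff",
}


def build_index(pages):
    """Database step: map each word to the pages containing it, with counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Search step: keep pages that contain every query term (implicit AND).
    Ranking step: order by total term frequency, highest first."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    scores = {url: sum(index[t][url] for t in terms) for url in matches}
    # Ties are broken alphabetically by URL so the order is predictable.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
print(search(index, "library hours"))
```

Note that the query runs against the stored index, not against the pages themselves, which is why the freshness of the database matters.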

Reference: practice.sph.umich.edu/micphp/files/Retrieving_Online_Info/R_O_I/CD_Master/CD/content/Search_Engines.pdf
4. What is a search zone? What are the approaches for creating search zones?

A search zone is a subset of a web site that has been indexed separately from the rest of the site's content. Search zones eliminate content that is irrelevant to the user, resulting in fewer and more relevant search results.

Approaches:

  • Segregating documents
  • Logically tagging them
  • Zones can be based on content type, audience, role, subject/topic, geography, chronology, author,
    department/business unit
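
The tagging approach can be sketched in Python as below. The documents, the tag names (`audience`, `type`) and the `search_zone` helper are hypothetical, and the title-substring match only stands in for a real search.

```python
# Hypothetical documents tagged with metadata that can define zones.
DOCUMENTS = [
    {"title": "Staff leave policy", "audience": "staff", "type": "policy"},
    {"title": "Student enrolment guide", "audience": "students", "type": "guide"},
    {"title": "Staff expense guide", "audience": "staff", "type": "guide"},
]


def search_zone(documents, query, **zone):
    """Return matching titles, restricted to documents whose tags match the zone."""
    in_zone = [d for d in documents if all(d.get(k) == v for k, v in zone.items())]
    return [d["title"] for d in in_zone if query.lower() in d["title"].lower()]


print(search_zone(DOCUMENTS, "guide"))                    # whole site
print(search_zone(DOCUMENTS, "guide", audience="staff"))  # staff zone only
```

Searching the staff zone returns fewer results because content for other audiences is excluded before matching.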

Reference: lecture notes
5. Explain the difference between recall and precision in terms of search results.

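Recall is the proportion of all relevant documents in the collection that the search retrieves: relevant documents retrieved / total relevant documents.

Precision is the proportion of the retrieved documents that are relevant: relevant documents retrieved / total documents retrieved.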

Reference: lecture notes
6. Consider the following search engines:
a. Search engine A retrieves 600 documents out of a total of 8,200 documents. Out
of the 600 documents retrieved, only 500 are relevant out of a total of 923 relevant
documents. Calculate the recall and precision rates for the query.

Recall : 500/923 x 100 = 54%

Precision : 500/600 x 100 = 83%
b. Search engine B retrieves 131 documents out of a total of 8,200 documents. Out
of the 131 documents retrieved, all 131 are relevant out of a total of 923 relevant
documents. Calculate the recall and precision rates for the query.

Recall : 131/923 x 100 = 14%

Precision : 131/131 x 100 = 100%
c. Search engine C retrieves 700 documents out of a total of 8,200 documents. Out
of the 700 documents retrieved, 0 are relevant out of a total of 923 relevant
documents. Calculate the recall and precision rates for the query.

Recall : 0/923 x 100 = 0%

Precision : 0/700 x 100 = 0%
d. Search engine D retrieves 5,000 documents out of a total of 8,200 documents.
Out of the 5,000 documents retrieved, 923 are relevant out of a total of 923
relevant documents. Calculate the recall and precision rates for the query.

Recall : 923/923 x 100 = 100%

Precision : 923/5000 x 100 = 18%
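
The four cases can be checked with a short Python script. It takes recall as relevant retrieved over total relevant, and precision as relevant retrieved over total retrieved. The function and variable names are my own; the figures come from the question.

```python
def recall(relevant_retrieved, total_relevant):
    """Percentage of all relevant documents that were retrieved."""
    return 100 * relevant_retrieved / total_relevant


def precision(relevant_retrieved, total_retrieved):
    """Percentage of retrieved documents that are relevant."""
    return 100 * relevant_retrieved / total_retrieved


TOTAL_RELEVANT = 923

# engine -> (documents retrieved, relevant documents retrieved)
ENGINES = {"A": (600, 500), "B": (131, 131), "C": (700, 0), "D": (5000, 923)}

for name, (retrieved, relevant_retrieved) in ENGINES.items():
    r = recall(relevant_retrieved, TOTAL_RELEVANT)
    p = precision(relevant_retrieved, retrieved)
    print(f"Engine {name}: recall {r:.1f}%, precision {p:.1f}%")
```

The collection size of 8,200 documents does not appear in either formula.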
7. What is the purpose of a stemming tool? Explain the difference between strong and weak stemming. Provide examples of strong and weak stemming.

A stemming tool expands a search term to include other terms that share the same root. Strong stemming includes plurals as well as other terms built on the root, whereas weak stemming includes plurals only.

Root: user

Strong stemming:
– user (root)
– users
– used
– using

Weak stemming:
– user (root)
– users
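
The difference can be sketched in Python with a toy vocabulary. Everything here is an assumption for illustration: the word list, the example term "compute", and the crude root rule (drop a trailing "e", then match by prefix). Real stemmers use far more careful rules.

```python
# Hypothetical vocabulary of indexed words.
VOCABULARY = ["compute", "computes", "computed", "computing", "computer", "company"]


def weak_stem(term, vocabulary):
    """Weak stemming: the term itself plus its simple plural forms."""
    variants = {term, term + "s", term + "es"}
    return [word for word in vocabulary if word in variants]


def strong_stem(term, vocabulary):
    """Strong stemming: any word that shares the term's (crudely derived) root."""
    root = term[:-1] if term.endswith("e") else term
    return [word for word in vocabulary if word.startswith(root)]


print(weak_stem("compute", VOCABULARY))
print(strong_stem("compute", VOCABULARY))
```

Strong stemming widens the search (better recall) but risks pulling in less relevant terms (lower precision).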

Reference: lecture notes
8. What are two main issues to consider when displaying the results of a search?

– Which content components to display
– How to list or group those results

Reference: lecture notes
9. How many documents should you display in a search result?

It depends on how much information is displayed for each result: the more detail shown per document, the fewer documents should be listed on each page.

Reference: Chapter 8 Search Systems – Information Architecture for the World Wide Web
10. Describe some approaches for sorting and ranking search results for display.

Sorting:

  • Alphabetically (by title, by author, by department)
  • Chronologically (by date)

Ranking: Relevance, Popularity, Users’ or experts’ ratings, Pay-for-placement
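
A small Python sketch of the difference between sorting and ranking, using made-up results. The titles, dates and ratings are hypothetical.

```python
from datetime import date

# Hypothetical search results with fields to sort or rank on.
RESULTS = [
    {"title": "Zebra care", "published": date(2015, 3, 1), "rating": 4.5},
    {"title": "Aardvark facts", "published": date(2016, 7, 9), "rating": 3.0},
    {"title": "Mongoose guide", "published": date(2014, 1, 20), "rating": 5.0},
]

# Sorting: order by an attribute of the document itself.
by_title = sorted(RESULTS, key=lambda r: r["title"].lower())
by_date = sorted(RESULTS, key=lambda r: r["published"], reverse=True)  # newest first

# Ranking: order by a judgement of value, here users' ratings.
by_rating = sorted(RESULTS, key=lambda r: r["rating"], reverse=True)

for label, results in [("title", by_title), ("date", by_date), ("rating", by_rating)]:
    print(label, [r["title"] for r in results])
```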

Reference: lecture notes
11. When sorting search results alphabetically, why is it a good idea to omit articles such as “a” and “the”?

Articles such as "a" and "the" add no meaning to a title. If they are kept, documents are sorted under "A" or "T" rather than under their first significant word, which is misleading and makes the desired document harder to find.
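
A minimal Python sketch of an alphabetical sort that ignores leading articles. The titles and the `sort_key` helper are just for illustration, and only English articles are handled.

```python
ARTICLES = ("a ", "an ", "the ")


def sort_key(title):
    """Lower-case the title and drop one leading article, if present."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = ["The Hobbit", "A Tale of Two Cities", "Emma", "An Ideal Husband"]

print(sorted(titles))                # articles decide the order
print(sorted(titles, key=sort_key))  # first significant word decides the order
```

The displayed titles keep their articles; only the sort key changes.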

Reference: lecture notes
12. How does “best bets” ranking operate?

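Best bets lets human expertise influence ranking: for common queries, often identified through search-log analysis, editors manually index the most useful documents so that they appear at the top of the results.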
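
One way this could work, sketched in Python. The `BEST_BETS` table, the URLs and the `with_best_bets` helper are hypothetical, and real systems may match queries more loosely than the exact match used here.

```python
# Hypothetical table of hand-picked pages for common queries.
BEST_BETS = {
    "opening hours": ["/visit/hours"],
    "parking": ["/visit/parking"],
}


def with_best_bets(query, ranked_results):
    """Put hand-picked pages first, then the engine's own ranking without duplicates."""
    bets = BEST_BETS.get(query.strip().lower(), [])
    rest = [url for url in ranked_results if url not in bets]
    return bets + rest


print(with_best_bets("Opening Hours", ["/news/1", "/visit/hours", "/about"]))
```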

Reference: lecture notes

13. What are four key factors to consider when designing a search system interface?

• Level of searching expertise and motivation
• Type of information need
• Type of information being searched
• Amount of information being searched

Reference: lecture notes
14. What are some of the ways search system designers can help a user when no results are returned for a query?

  • Provide alternative search words
  • Provide similar search words
  • Provide a means of revising the search
  • Provide search tips or other advice on how to improve the search
  • Provide a means of browsing
  • Provide a human contact if searching and browsing don’t work
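
Suggesting similar search words can be sketched with Python's standard `difflib` module. The list of indexed terms and the `suggest` helper are assumptions for the example; a real system would draw candidates from its own index or search logs.

```python
import difflib

# Hypothetical terms known to the search index.
INDEXED_TERMS = ["architecture", "navigation", "metadata", "taxonomy", "search"]


def suggest(query, terms=INDEXED_TERMS, limit=3):
    """Return indexed terms that closely resemble the query, best match first."""
    return difflib.get_close_matches(query.lower(), terms, n=limit, cutoff=0.6)


print(suggest("navigaton"))  # misspelled query
print(suggest("zzzz"))       # nothing similar: fall back to tips or browsing
```

When no suggestion is close enough, the other options in the list (search tips, browsing, a human contact) still give the user somewhere to go.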

Reference: lecture notes

15. Optional* Describe how Google’s PageRank algorithm operates.
16. Optional* What is SERP?
17. Optional* Describe the main Boolean operators used in search engine queries.
18. Optional* What is meant by the terms Deep and Surface Web? How might documents
end up in the Deep Web?
19. Optional* What are the two primary goals when designing a search engine’s
architecture?
20. Optional* Describe the search engine indexing process.
21. Optional* What is the purpose of a web crawler?
