Thursday, November 15, 2012

Reading Notes (11/16-11/23)


Reading #1: Web Search Engines: Part 1 and Part 2

-Over the past decade, Google, Yahoo!, and MSN have been the dominant search engines, analyzing and processing more information than any search engine before them
-The amount of data held in these engines' indexes lies in the range of 400 terabytes
-Basic search processing revolves around crawling algorithms, which track whether links have already been visited and sort them by relevance (a minimal crawler sketch follows this list)
-As the internet has grown, these search engines have had to adapt their algorithms to issues such as increasing crawl speeds and duplicated links
-To offset the rising costs that come with these speed increases, GYM (Google, Yahoo!, MSN) prioritize links, so the most-clicked pages are crawled first and show at the top of search queries
-This prioritization also helps reject spam
-Using indexing algorithms, these search engines are able to process incoming documents and information
-Indexing algorithms focus on certain search aspects, like keywords and phrases, or a combination of factors, to provide a more relevant result
-Query processing algorithms use the particular search words entered to return only results containing all of the inputs
-Real query processors also attempt to rank links by topical relevance
-To increase speed, these processors exclude or filter certain links to cater to user preferences and cache important user data (an index and query sketch also follows this list)
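
Below is a minimal Python sketch of the prioritized-crawling idea from this reading: a visited set tracks which links have already been seen, and a priority queue favors the most-clicked pages. The toy click counts and link graph are invented stand-ins for real fetching and real usage data, not GYM's actual algorithms.

```python
import heapq

# Toy "web": each URL maps to (click count, outgoing links).  The click counts
# and links are invented stand-ins for real fetching and real usage data.
FAKE_WEB = {
    "a.example": (120, ["b.example", "c.example"]),
    "b.example": (40,  ["c.example", "d.example"]),
    "c.example": (90,  ["a.example"]),
    "d.example": (5,   []),
}

def crawl(seeds, limit=10):
    visited = set()   # track whether a link has already been visited
    frontier = []     # priority queue: most-clicked pages come out first
    for url in seeds:
        clicks, _ = FAKE_WEB.get(url, (0, []))
        heapq.heappush(frontier, (-clicks, url))

    crawl_order = []
    while frontier and len(crawl_order) < limit:
        _, url = heapq.heappop(frontier)
        if url in visited:              # skip duplicated links
            continue
        visited.add(url)
        crawl_order.append(url)
        _, links = FAKE_WEB.get(url, (0, []))
        for link in links:
            if link not in visited:
                link_clicks, _ = FAKE_WEB.get(link, (0, []))
                heapq.heappush(frontier, (-link_clicks, link))
    return crawl_order

print(crawl(["a.example"]))   # higher-priority (more-clicked) pages are crawled first
```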
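And a sketch of the indexing and query-processing steps together: an inverted index maps keywords to the documents that contain them, and a conjunctive query returns only documents containing all of the search terms. The sample documents are invented for illustration.

```python
from collections import defaultdict

# Invented sample documents standing in for crawled pages.
docs = {
    1: "search engines crawl and index the web",
    2: "the deep web is not indexed by search engines",
    3: "metadata harvesting with the OAI protocol",
}

# Indexing: map each keyword to the set of documents that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def query(terms):
    """Conjunctive query: return only documents containing all of the inputs."""
    result = None
    for term in terms.lower().split():
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return sorted(result) if result else []

print(query("search web"))   # -> [1, 2]: only documents with both "search" and "web"
```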

Reading #2: Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.


-The Open Archives Initiative (OAI) was created to help manage access to diverse scholarly publications as they were transferred to online and digital formats
-The OAI protocol was launched and adopted in 2001
-It works through simple HTTP requests whose responses are returned as XML (a harvesting sketch follows this list)
-Many OAI environments have switched to domain-specific setups, with examples including individually run OAI environments in archives and museums
-To provide completeness of results, institutions have developed methods to create inventories of repositories and generate responses
-These registries make up the backbone of OAI retrieval
-Extensible Repository Resource Locators are examples of how XML documents can be manifested within OAI
-Challenges to OAI still exist in the form of metadata variations and differing formats amongst the data
-Developers are currently trying to adapt OAI to access restrictions and establish best practices to connect the institutions that dominate the OAI landscape
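
Here is a minimal sketch of an OAI-PMH harvest over HTTP and XML, assuming a repository base URL (a placeholder below; substitute a real endpoint) that supports the standard ListRecords verb with the oai_dc metadata prefix. Error handling and resumption tokens are omitted.

```python
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

# Placeholder base URL; substitute a real OAI-PMH endpoint.
BASE_URL = "https://example.org/oai"

# OAI-PMH requests are ordinary HTTP GETs; responses come back as XML.
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
url = BASE_URL + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Namespaces used by OAI-PMH and simple Dublin Core metadata.
ns = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Print the title of each harvested record.
for record in tree.iterfind(".//oai:record", ns):
    title = record.find(".//dc:title", ns)
    print(title.text if title is not None else "(no title)")
```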

Reading #3: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104


-The Deep Web consists of the pages that make up the un-searchable internet, including unlinked or untracked pages, out-of-service web pages, regionally restricted pages, and privately held pages on secure websites
-The Deep Web is believed to contain many times more information than is publicly accessible on the World Wide Web
-Search engines sometimes attempt to draw from the information and pages on the Deep Web, often with varying results
-Many Deep Web pages are hidden from search engines because engines scan the World Wide Web only shallowly, following static links rather than querying the databases behind search forms
-The problem of retrieving these hidden resources is rooted in the fact that most of them are un-indexed, making retrieval nearly impossible (see the sketch after this list)
-Thus, searching must change to account for these untapped resources
-Deep Web information could answer the overwhelming majority of search queries
-Managing these resources has turned into a debate over "micro" vs. "macro" searching
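
A short sketch of why link-following crawlers miss Deep Web content: the result pages behind a search form only exist once a query is submitted, so a crawler that only follows static links never reaches them. The form URL, field name, and records below are hypothetical.

```python
import urllib.parse

# Hypothetical database-backed catalog; the URL, field name, and records are invented.
SEARCH_FORM = "https://example.org/catalog/search"
STATIC_LINKS = ["https://example.org/", "https://example.org/about"]
DATABASE = ["hidden report 1", "hidden report 2", "restricted archive item"]

def surface_crawl(links):
    """A link-following crawler reaches only the statically linked pages."""
    return list(links)   # the result pages behind the search form never show up here

def deep_query(term):
    """Deep Web retrieval: a result page exists only after a query is submitted to the form."""
    url = SEARCH_FORM + "?" + urllib.parse.urlencode({"q": term})
    hits = [record for record in DATABASE if term in record]
    return url, hits

print(surface_crawl(STATIC_LINKS))   # no database records are ever reached this way
print(deep_query("hidden"))          # records surface only through a submitted query
```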


