Reading 1 - http://www.noplacetohide.net/
-The government has a deep investment in online information tracking
-Metadata left behind by online activity is one possible explanation for the lack of internet privacy
Reading 2 - http://epic.org/privacy/profiling/tia/
-Under the Terrorism Information Awareness program, the government gained largely unchecked access to personal materials transmitted over electronic networks
-News coverage leans toward viewing this as an abuse of federal power
-The costs of this program have far outweighed its need
Reading 3 - http://greatlibrarynews.blogspot.com/2008/09/myturn-protecting-privacy-rights-in.html
-Legislation now allows libraries to transmit certain aspects of patrons' information
-This is widely seen as an overreach and has been rejected or compromised on in many locations
-This raises serious questions of ethics as libraries can be put in the middle of government-citizen disputes or government monitoring
-Libraries could become less trusted if the public views their information as insecure there
Patrick Trembeth's Course Blog: LIS 2600
Thursday, November 29, 2012
Muddiest Point (11/26-12/2)
Can digital library initiatives reduce the need for physical institutions, re-envisioning the funding model of library systems so that budgets focus more efficiently on systems that require little manpower to maintain?
Thursday, November 15, 2012
Reading Notes (11/16-11/23)
Reading 1: Web Search Engines: Part 1 and Part 2
-Over the past decade, Google, Yahoo, and MSN have been the dominant search engines, analyzing and processing more information than any previous search engine
-The amount of data held by these search engines is in the range of 400 terabytes
-Basic search processing revolves around crawling algorithms, which track whether links have been visited and sort them by relevance
-As the internet has grown, these search engines have had to adapt their algorithms to issues such as increasing speed demands and duplicated links
-To offset the rising costs associated with these speed increases, GYM (Google, Yahoo, and MSN) prioritize the most frequently clicked links, which appear at the top of search results
-This also helps reject spam
-Using index algorithms, these search engines are able to process incoming documents and information
-Index algorithms focus on certain search aspects like keywords and phrases or a combination of factors to provide a more relevant result
-Query processing algorithms use the particular words entered in a search to return only results containing all of the inputs (see the sketch after this list)
-Real query processors attempt to sort links based on topical relevance
-To increase speed, these processors exclude or censor certain links to cater to user preferences and cache important user data
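A minimal sketch in Python of the indexing and AND-style query processing described above; the toy documents, whitespace tokenizer, and ranking-free lookup are illustrative assumptions, not how GYM actually implement their engines.

```python
from collections import defaultdict

# Toy corpus standing in for crawled pages (illustrative only).
documents = {
    1: "digital libraries provide online access to collections",
    2: "search engines crawl and index web pages",
    3: "search engines index archives for retrieval",
}

# Build an inverted index: each term maps to the set of documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)

def and_query(query):
    """Return only the documents that contain ALL query terms (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(inverted_index.get(terms[0], set()))
    for term in terms[1:]:
        results &= inverted_index.get(term, set())
    return results

print(and_query("digital libraries"))  # {1}
print(and_query("index search"))       # {2, 3}
```

A real query processor would layer relevance ranking, caching, and spam filtering on top of this basic term-intersection step.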
Reading #2: Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.
-The Open Archives Initiative was created to help manage access to diverse scholarly publications as they were transferred to online and digital formats
-The OAI protocol was launched and adopted in 2001
-Works through simple HTTP requests and XML-formatted responses (a harvesting sketch follows this list)
-Many OAI environments have switched to domain-specific setups with examples including individually run OAI environments in archives and museums
-To provide completeness of results, institutions have developed methods to create inventories and generate responses
-These registries make up the backbone of OAI retrieval
-Extensible Repository Resource Locators are examples of how XML documents can be manifested within OAI
-Challenges to OAI still exist in the form of metadata variations and differing formats among the data
-Developers are currently trying to adapt OAI implementations to handle access restrictions and to establish best practices for connecting the institutions that dominate the OAI landscape
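A minimal sketch, using only Python's standard library, of how a harvester might issue one of the protocol's requests (ListRecords) and read the XML response; the base URL here is a placeholder, and a real harvester would also handle resumptionTokens and protocol errors.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder endpoint -- substitute a real OAI-PMH repository base URL.
BASE_URL = "https://example.org/oai"
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records(metadata_prefix="oai_dc"):
    """Issue a ListRecords request and yield (identifier, title) pairs."""
    query = urllib.parse.urlencode({"verb": "ListRecords",
                                    "metadataPrefix": metadata_prefix})
    with urllib.request.urlopen(BASE_URL + "?" + query) as response:
        root = ET.parse(response).getroot()
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        title = record.findtext(f".//{DC}title")
        yield identifier, title

# Example use (requires a live repository at BASE_URL):
# for identifier, title in list_records():
#     print(identifier, title)
```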
Reading #3: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104
-The Deep Web consists of the pages that make up the unsearchable internet, including unlinked pages, out-of-service web pages, regionally restricted pages, and privately held pages on secure websites
-The Deep Web is believed to contain many times the amount of information that is publicly accessible on the World Wide Web
-Search engines sometimes attempt to draw from the information and pages on the Deep Web, often with varying results
-Many Deep Web pages are hidden from search engines because the engines scan the World Wide Web only shallowly for results
-Problems in retrieving these hidden resources are rooted in the fact that most of them are unindexed, making retrieval nearly impossible
-Thus searching must change to account for these untapped resources
-Deep Web information could cover the overwhelming majority of search queries
-Managing these resources has turned into an argument of "micro" vs "macro" searching
Muddiest Point (11/12)
Can Digital Libraries alleviate the strain on traditional institutions by appealing to donors and funding agencies?
Friday, November 9, 2012
Reading Notes (11/09-11/12)
Reading #1 - http://www.dlib.org/dlib/july05/mischo/07mischo.html
-A major problem in digital librarianship is keeping up with the plethora of methods through which digital materials are published
-Finding a link between these various methods has been the most difficult task
-Interest in digital libraries began with government-funded studies during the early 1990s
-$68 million in NSF grants was distributed among six university research projects
-Despite the early flow of funding, the development of digital publishing has far outpaced the research done in digital libraries
-Federation of materials and resources among institutions is the current best practice in developing digital libraries
Reading #2 - http://www.dlib.org/dlib/july05/paepcke/07paepcke.html
-In 1994, the NSF launched the Digital Library Initiative
-It combined librarians, records managers, and historians to develop possible digital library solutions
-Computer scientists saw it as a great way to distribute information, while librarians saw a new route to increased funding for innovation and research
-Despite early success, the growth of the World Wide Web challenged these institution based projects
-Copyright restrictions in the DLI prevented great use of the WWW in research and development under federal grants
-CS experts welcomed the use of the WWW, but librarians understood the underlying issues of distributing information unchecked
-Even so, because of the speed of the WWW, digital librarians still have a place in providing reference to those looking to narrow research and search results
Reading #3 - http://www.arl.org/bm~doc/br226ir.pdf
-The drop in online storage costs and the growth of Internet use have led to interesting possibilities for developing digital libraries
-They focus mainly on connecting online institutional repositories, compiling databases of resources that can be accessed en masse and at any time
-Many institutions have combined with technology powerhouses like HP to develop the technology and software necessary to support digital based repositories
-Institutional repositories are ways in which library systems provide access and information to larger communities through quick online or digital distribution
-These are necessary to adapt to the quickening of information distribution facilitated through the World Wide Web and online repositories like Wikipedia or publication databases
-They also provide a way for scholars and instructors to deliver information to students within short academic terms without needing to acquire a lot of physical materials
-This greatly facilitates the growth of scholarly materials and speeds up the academic process
-Issues still occur, such as the "watering down" of scholarly information as more and more is produced, while issues of copyright infringement inherent to published materials challenge quick and easy distribution
-Because of this, trends are constantly changing to adapt to the speed of internet demand and its tension with the traditionally protected status of paper-published materials
Muddiest Point (11/09/12)
Are new languages based on XML gaining a significant enough user base to sustain its continued development?
Friday, November 2, 2012
Reading Notes (11/4-11/10)
Reading 1 - https://burks.bton.ac.uk/burks/internet/web/xmlintro.htm
-XML stands for Extensible Markup Language
-XML allows users to bring multiple files together
-Provides processing control information to supporting programs, such as web browsers
-Has no predefined set of tags
-Can be used to describe any text structure (books, letters, reports, encyclopedias, etc.)
-Assumes documents are composed of many "entities"
-Uses markup tags to allow readers to more easily understand document components
-Most tags have contents, although empty tags can serve as placeholders
-Unique identifier attributes allow cross-references between two separate parts of a document (see the parsing sketch after this list)
-Can also allow for the incorporation of characters and text stored outside the document, through entities
-Can identify tables and illustrations and their positioning
-XML was developed for easier navigation and storage in databases
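A minimal sketch, assuming Python's standard ElementTree module, of the ideas in these notes: an invented document with nested elements, attributes, a unique identifier, and an empty placeholder tag.

```python
import xml.etree.ElementTree as ET

# Invented sample document: nested elements, attributes, an id, and an empty tag.
report = """
<report id="r1">
  <title>Reading Notes</title>
  <section name="intro">
    <para>XML has no predefined set of tags.</para>
    <figure ref="fig1"/>
  </section>
</report>
"""

root = ET.fromstring(report)
print(root.tag, root.attrib)            # report {'id': 'r1'}
print(root.findtext("title"))           # Reading Notes
for section in root.iter("section"):    # walk every <section> element
    print(section.get("name"))          # intro
print(root.find(".//figure").attrib)    # {'ref': 'fig1'} -- empty placeholder element
```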
Reading 2 - http://www.ibm.com/developerworks/xml/library/x-stand1/index.html
-XML 1.0 is the base XML technology, building on Unicode
-Defines strict rules for text format and allows Document Type Definitions (DTDs)
-Only the English-language version of the specification is considered normative
-XML 1.1 fundamentally alters the definition of characters so that it can adapt to changes in the Unicode specification
-Also provides for normalization of characters by referencing the Character Model for the World Wide Web 1.0
-XML is based on the Standard Generalized Markup Language
-The article provides sections pointing to third-party internet tutorials
-XML Catalogs is used to map XML entity identifiers, defined by their Uniform Resource Identifiers, to local resources
-URIs are similar to the URLs used in web browsers
-Namespaces in XML 1.0 is a mechanism for universally naming elements and attributes (see the namespace sketch after this list)
-Namespaces in XML 1.1 updates this mechanism with support for internationalized identifiers
-XML Base provides a standard way to specify base URIs for resolving relative URIs
-XLink provides the ability to express links in XML documents
-Schematron provides a rule-based way for users to make assertions about document content
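A minimal sketch, again with ElementTree, of how namespaces give elements universal names; the prefix, namespace URI, and document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented document binding the prefix "lib" to an invented namespace URI.
doc = """
<lib:record xmlns:lib="http://example.org/library">
  <lib:title>Digital Collections</lib:title>
</lib:record>
"""

root = ET.fromstring(doc)
# ElementTree expands prefixes into the universal {namespace-URI}localname form.
print(root.tag)                                   # {http://example.org/library}record
ns = {"lib": "http://example.org/library"}
print(root.findtext("lib:title", namespaces=ns))  # Digital Collections
```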
Reading 3 - http://www.w3schools.com/Schema/default.asp
-A schema describes the structure of XML documents
-Allows easier definition and manipulation of XML-based languages
-Simplifies data handling in databases (a validation sketch follows this list)
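A minimal sketch of validating a document against a schema, assuming the third-party lxml package is installed (Python's standard library does not validate XSD); the schema and documents are invented examples, not taken from the tutorial.

```python
from lxml import etree  # third-party: pip install lxml

# Invented schema: a <record> element that must contain a single string <title>.
schema = etree.XMLSchema(etree.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

good = etree.fromstring("<record><title>Reading Notes</title></record>")
bad = etree.fromstring("<record><author>Unknown</author></record>")
print(schema.validate(good))  # True
print(schema.validate(bad))   # False -- <author> is not allowed by the schema
```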
Reading 4 - http://xml.coverpages.org/BergholzTutorial.pdf
-HTML cannot define content, only presentation
-In terms of syntax, XML is very similar to HTML
-DTDs define the structure of XML documents
-DTD elements are either terminal or nonterminal
-Unlike HTML links, XML links can be two-way
-The Extensible Stylesheet Language is a form of template
-XML Schema aims to replace DTDs and adds datatype definitions (a small DTD sketch follows this list)
-XML is so important and popular because it is very versatile, acting more like a family of languages than a single structure
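A minimal sketch of a DTD defining document structure, with one nonterminal element built from terminal (#PCDATA) elements, again assuming lxml; the element names are invented for illustration.

```python
from io import StringIO

from lxml import etree  # third-party: pip install lxml

# Invented DTD: <note> is nonterminal (built from elements), <to> and <body> are terminal.
dtd = etree.DTD(StringIO("""
<!ELEMENT note (to, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT body (#PCDATA)>
"""))

valid = etree.fromstring("<note><to>Class</to><body>Read chapter 4</body></note>")
invalid = etree.fromstring("<note><body>Missing the required to element</body></note>")
print(dtd.validate(valid))    # True
print(dtd.validate(invalid))  # False -- structure does not match the DTD
```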