SKOS and the Semantic Web: Knowledge Organization, Metadata, and Interoperability

The Simple Knowledge Organization System (SKOS) is a Semantic Web framework, based on the Resource Description Framework (RDF), for thesauri, classification schemes and simple ontologies. It allows for machine-actionable description of the structure of these knowledge organization systems (KOS) and provides an excellent tool for addressing the interoperability and vocabulary control problems inherent to the rapidly expanding information environment of the Web. This paper discusses the foundations of the SKOS framework and reviews the literature on a variety of SKOS implementations. The limitations of SKOS that have been revealed through its broad application are addressed, with brief attention to the proposed extensions to the framework intended to account for them.

University of St. Augustine for Health Sciences. Correspondence email: erobinson@usa.edu


Introduction
One of the major goals for metadata technologies is the promotion of machine-actionability and interoperability among the wide variety of schemas that have been created to serve different communities and purposes. The expression of metadata as code interpretable by computational processes has, over the last decade, progressed beyond the simple capture of static machine-readable metadata in schemas like Dublin Core or MODS (Metadata Object Description Schema). Since Tim Berners-Lee's (2001) vision of the Semantic Web, tremendous efforts have been made to encode data in semantically meaningful ways that are machine-readable and publicly shareable through the Web. By expressing metadata as conceptual relations among uniquely identified resources, metadata can be leveraged in vastly enriched ways that allow logic and inference to play a role in the search process (K. Coyle, 2008). Such capabilities can improve the recall and precision of information searches, which is critical to locating relevant resources among the enormous amounts of data on the Web (Antoniou & Harmelen, 2008).

SKOS (Simple Knowledge Organization System) is a vocabulary for representing semi-formal knowledge organization systems (KOS) like thesauri, subject heading lists, and ontologies within the framework proposed for Semantic Web processes. By employing the Resource Description Framework (RDF) of the Semantic Web as its foundation, SKOS enables and promotes the interoperability and machine-readability so important to a web-based information publication environment (Isaac & Summers, 2009). Since it is based on Semantic Web technologies, however, SKOS also allows the capture of meaningful relationships among the conceptual units composing these classification systems. The resulting semantic conceptual networks can then be employed to unify disparate classifications and help to bring some level of authority control to the widely variant metadata descriptions at work on the Web. Furthermore, the standard machine-readable framework of SKOS and RDF lends itself readily to the publication and reuse of metadata information on the Web. Thus, SKOS can not only be widely employed in linked data initiatives, but it can also be recombined within existing metadata schemas like Dublin Core to enrich resource descriptions with logic-driven capabilities (Cantara, 2006) and add the potential for incorporating linked data into vocabulary control efforts, enabling more semantically rich searches and structured query expansion in the web environment.

This paper will review the technologies that underlie the Simple Knowledge Organization System (SKOS) framework and will examine the efforts made by researchers and institutions to employ SKOS for wider sharing, reuse, and access of semantically enriched linked data. While these efforts have some recognized limitations, their potential for improving metadata for search and retrieval is considerable.

March 2010

Knowledge Organization Systems
Knowledge organization systems (KOS) have long been used to assist in the organization and management of information resources. These KOS include thesauri, subject heading lists, taxonomies, and classification systems used to provide standardized access points to large repositories of information (Hodge, 2000). The central goal of a KOS is to enable the organization of information resources for efficient and accurate retrieval.
Chief among the ways in which this is accomplished is the employment of controlled vocabularies to eliminate ambiguities in search formulation, thereby increasing precision and recall. The semantic hierarchies usually present in KOS also facilitate conceptual representation of the structured ontology of the knowledge base, that is, the properties and class relationships shared among a multitude of related concepts. While this semantic element has previously remained out of the reach of computational processes, simple encoding technologies based upon the Extensible Markup Language (XML) offer the potential to vastly improve efficient information searching and retrieval.

Semantic Web Basics
Since early discussion of the potential for the Semantic Web to improve searchability and machine processing of web resources (Berners-Lee et al., 2001), efforts have been made to develop metadata schemas that are more in line with the standards of semantic processing. The most critical element for the capture of meaning in this context is the description of relationships among resources. When such relationships are described with a clear, expressive syntax, applications that incorporate logical inference can be applied to the metadata descriptions to enrich the search process. SKOS enables the use of a Semantic Web framework to describe knowledge organization systems for the easy publication, sharing, and re-use of these systems on the Web. SKOS is built upon more foundational data formats and schemas, including XML, RDF and RDF Schema.

Resource Description Framework
The Resource Description Framework, or RDF, is the standard framework for modeling such relationships for data interchange on the Web (Antoniou & Harmelen, 2008). The model is commonly serialized in the eXtensible Markup Language (XML) and is designed to capture simple relationships that express clusters of uniquely identified resources, properties, and property values (McBride, 2004a). Meaningful relationships are defined by expressing subject, predicate, object triples that define a relation between two resources or specify the value of a given property for a single resource (K. Coyle, 2008). These resources can be expressed as a graph of relations consisting of one node representing a resource, an arrow representing the property or relation, and a leaf representing the value of that property (see Figure 1).
For example, the triple Melville:authorOf:MobyDick might express the proposition that "Melville is the author of Moby Dick". Another, Ishmael:characterIn:MobyDick, might express the proposition that "Ishmael is a character in Moby Dick". Logical inference would then allow the derivation of the further proposition that "Herman Melville created the character Ishmael". By capturing these expressions in RDF syntax and describing definitions of classes, the logical relations of these propositions become more readily processable in computational terms (K. Coyle, 2008).
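The inference described in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the triples are plain tuples rather than real RDF, and the predicate name createdCharacter and the function infer_created_characters are invented here for the purpose of the sketch.

```python
# Triples stored as (subject, predicate, object) tuples.
# All names are illustrative stand-ins, not real URIs.
triples = {
    ("Melville", "authorOf", "MobyDick"),
    ("Ishmael", "characterIn", "MobyDick"),
}


def infer_created_characters(facts):
    """Derive (author, 'createdCharacter', character) whenever an author
    and a character are linked to the same work."""
    authored = [(s, o) for s, p, o in facts if p == "authorOf"]
    appears = [(s, o) for s, p, o in facts if p == "characterIn"]
    return {
        (author, "createdCharacter", character)
        for author, work in authored
        for character, other_work in appears
        if work == other_work
    }


inferred = infer_created_characters(triples)
print(sorted(inferred))
```

The rule is hard-coded here; in a Semantic Web setting such rules would be expressed declaratively and applied by a general-purpose reasoner.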
RDF extends this simplified triple description to include Uniform Resource Identifiers (URIs) for each resource and property described. These are usually identified by providing a URI in the form of a unique web address or by employing a hashed version of a URI (Sauermann & Richard Cyganiak, 2008), e.g.: http://www.example.com/about#herman_melville.
Unlike the document URLs of traditional web hyperlinks, these URIs are intended to simply provide a representational code for unique identification of resources in RDF. They can, however, also be given explicit description documentation that can be added to the URL in order to define intended use or to provide detailed specifications (Sauermann & Richard Cyganiak, 2008).
Combining URI identification of resources with XML namespace capabilities allows for the assembly of entire schemas of concepts and relations under a single heading expressed by an XML namespace (Antoniou & Harmelen, 2008). By uniquely identifying resources and properties in this way, RDF not only allows for the employment of very precise conceptual definitions in the variety of metadata descriptions being developed, but it also allows for their publication and re-use in the web environment (McBride, 2004a). In addition, the properties themselves can be identified with URIs, enabling rich repositories of organizational information that can be referred to via web addressing, published, and used by other programmers.
Once a useful collection of concepts and properties has been established, reference can be made to that schema via the URI and incorporated in new and useful ways. Thus, the repurposing of data can be encouraged on a broad scale as these URIs and schema descriptions are utilized and incorporated within existing metadata schemas. RDF then is the backbone to any description of broadly applicable, semantically enabled metadata.
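The combination of namespaces and URIs described above can be illustrated with a minimal Python sketch that expands a prefixed name into a full URI. The prefix ex and the function expand are assumptions made for this illustration; the example.com address is the one used earlier in this section, and the skos entry is the published SKOS Core namespace.

```python
# Hypothetical prefix table mapping namespace prefixes to base URIs.
namespaces = {
    "ex": "http://www.example.com/about#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(prefixed_name, prefixes):
    """Expand a 'prefix:local' name into a full URI using the prefix table."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("ex:herman_melville", namespaces))
print(expand("skos:broader", namespaces))
```

Because every schema is reachable through its namespace URI, a short prefixed name is enough to refer unambiguously to a published concept or property.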

RDF Schema
Whereas RDF defines the basic syntax for the expression of relationship triples, RDF Schema (RDF-S) acts as a specification for describing vocabularies of RDF expressions for use in specific contexts.
RDF-S should be viewed as a semantic extension of RDF and provides a description of the basic elements of structured relationships for the creation of ontologies (McBride, 2004b). These ontologies capture hierarchical and class relations necessary for describing the world of semantic interactions among the broad array of resources and concepts inherent to any KOS. RDF-S creates predefined structures for the description of class membership, properties, datatyping, and conceptual relationships between the resources described using RDF syntax (McBride, 2004b).
The classes and properties described by RDF-S include the description of subClassOf and type relations defined in RDF syntax. By employing the XML namespace feature to refer to these definitional frameworks, RDF-S simplifies the description of these relations in the creation of the lengthy files necessary for full description of resources.
For example, a simple class of #animal can be defined in a hypothetical RDF ontology. Many animals might be defined as members of the class of animal. This can rapidly become very complex in its RDF syntax expression. For example, horse might be defined as a member of the animal class in RDF as in figure 2.
By incorporating the RDFS namespace into the description, one can simplify this description by utilizing ready-made definitions to refer to classes and properties as in figure 3. Here, the description of horse as a subclass of the class resource animal is accomplished with a much more compact syntax.
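To make the class relations concrete, the following Python sketch shows how subClassOf and type statements might be used to infer class membership. It is not a reproduction of the figures; the resources ex:Dobbin and ex:Mammal and the helper functions are hypothetical, and the triples are plain tuples rather than parsed RDF.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical class hierarchy and one individual.
triples = {
    ("ex:Horse", RDFS_SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", RDFS_SUBCLASS_OF, "ex:Animal"),
    ("ex:Dobbin", RDF_TYPE, "ex:Horse"),
}


def superclasses(cls, facts):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in facts:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(resource, facts):
    """Return the declared types of a resource plus all inherited types."""
    direct = {o for s, p, o in facts if s == resource and p == RDF_TYPE}
    inherited = set()
    for cls in direct:
        inherited |= superclasses(cls, facts)
    return direct | inherited


print(sorted(types_of("ex:Dobbin", triples)))
```

Only the single statement that Dobbin is a horse is asserted; membership in the broader classes follows from the subclass declarations.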

SKOS Structure
Having reviewed the basics of Semantic Web representation, we can now understand how SKOS utilizes the RDF syntax and the classes of RDF-S to describe the structure of knowledge organization systems. SKOS builds on the simplified class and property descriptions of RDF-S to describe standards identified for thesauri, classification schemata, and other knowledge organization systems (A. . SKOS began as an RDF Schema framework developed to advance the Semantic Web effort in Europe. This early version, known as the DESIRE project (Lacasta, Nogueras-Iso, Lopez-Pellicer, Muro-Medrano, & Zarazaga-Soria, 2007), was undertaken by the Semantic Web Advanced Development group for Europe (SWAD-E) and was intended to provide a generic thesaurus representation for the Semantic Web. DESIRE was elaborated and improved upon as LIMBER, a domain-specific knowledge organization system for the social sciences.
LIMBER incorporated standard guidelines for the creation of thesauri such as those conceived by ANSI/NISO and related organizations (International Organization for Standardization, 1985, 1986) and incorporated translational elements for use in international metadata efforts to allow queries in a user's own language (A. SKOS is intended to capture the basics of KOS ontologies in a simpler and more widely employable framework. As Mikhalenko has described it, SKOS is intended to fill the "need for a language to express vocabularies of concepts for use in semantically rich metadata, which is powerful enough to support semantically enhanced search, but simple enough to be undemanding in terms of the cost and expertise required to use it" (Mikhalenko, 2005, par. 5).
It is composed of three separate specifications: SKOS Core, SKOS Mapping, and SKOS Extension. SKOS Core provides the basic vocabularies necessary to describe the hierarchical structures, class dependencies, and properties important for the representation of a KOS (A. . The core specification also provides documentation vocabulary, such as scope notes to elaborate upon the precise intended use of concepts in the KOS, and historical notes for tracing changes to a specific implementation. The SKOS Mapping specification is a reference guide for supporting alignment and linking between different KOS concept schemes (Alistair Miles & Bechhofer, 2009; Alistair Miles & Dan Brickley, 2004). Finally, SKOS Extension represents properties and relations peculiar to only some KOS (Lacasta et al., 2007). These are often unique adaptations to a particular KOS need within a specific domain or schema implementation.
The concepts comprising any thesaurus implementation in SKOS are represented using a few simple properties to describe the nodes of the thesaurus ontology. The ontological relations of the thesaurus, being hierarchical in nature, can be viewed as a branching tree structure with each concept being treated as a node in the structure. Narrower terms are related to their parents in the hierarchy in a manner analogous to the RDF-S subClassOf property, and the elements of the KOS themselves are attached to their concept scheme using the inScheme property. Each term is treated as a concept and its properties are described using SKOS terminology in an RDF-style document. These properties include syndetic relations (hierarchical and associative semantic relationships), label preferences, and documentary notations.
The hierarchical relationships of the thesaurus, such as those represented as broader and narrower terms, are captured with the properties skos:broader and skos:narrower. Nonhierarchical, or simple associative, relationships, such as "see also", are expressed via the skos:related property (Isaac & Summers, 2009; Alistair Miles & Bechhofer, 2009). Statements are made in the usual RDF fashion by declaring properties to be about a concept, represented as a particular URI, and then enumerating the properties of that concept in SKOS. This is useful for the computational recognition of potentially related search terms and can be employed either to present potential terms to a user or to automatically broaden a search if necessary.
Expressed in RDF-style syntax, then, the basic semantic relationships can be captured as in figure 4, shown without namespace declarations. The xxx# descriptions in this example would typically be replaced by some form of unique identifier, such as an authority system control number.
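How a search application might use these properties to broaden a query can be sketched as follows. This is an assumption-laden illustration rather than a rendering of the figure: the neighbouring concepts (ex:EconomicPolicy, ex:EconomicIntegration, ex:Interdependence) and the function expand_query are hypothetical, and the triples are simple tuples.

```python
SKOS_BROADER = "skos:broader"
SKOS_NARROWER = "skos:narrower"
SKOS_RELATED = "skos:related"

# Hypothetical statements about a single concept.
triples = [
    ("ex:EconomicCooperation", SKOS_BROADER, "ex:EconomicPolicy"),
    ("ex:EconomicCooperation", SKOS_NARROWER, "ex:EconomicIntegration"),
    ("ex:EconomicCooperation", SKOS_RELATED, "ex:Interdependence"),
]


def expand_query(concept, facts, properties=(SKOS_NARROWER, SKOS_RELATED)):
    """Return the concept followed by every concept linked to it through
    one of the chosen SKOS properties (one step only)."""
    expanded = [concept]
    for s, p, o in facts:
        if s == concept and p in properties and o not in expanded:
            expanded.append(o)
    return expanded


# Default: include narrower and related concepts in the search.
print(expand_query("ex:EconomicCooperation", triples))
# Alternative: suggest only the broader concept to the user.
print(expand_query("ex:EconomicCooperation", triples, properties=(SKOS_BROADER,)))
```

Whether a system expands automatically or merely suggests terms is a design choice; the same statements support both.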
Term labels in SKOS allow the expression of preferred terminology and the capture of alternate terminologies. For example, the above "Economic Cooperation" entry might also be sought as "Economic Co-operation". This could be expressed using the altLabel and prefLabel properties (Isaac & Summers, 2009). SKOS also contains a property called hiddenLabel for capturing common misspellings. This capability allows recognition of access points that might be attempted, but that the programmer does not want to appear to the public (Alistair Miles & Bechhofer, 2009). This could be useful for linking to commonly misspelled names, beyond the scope of the alternate, but legitimate, names often used in name authority files. Thus, a potential misspelling of Mark Twain's autonym, and the pseudonym itself, might be captured as in figure 5.
The altLabel property can also be used to incorporate interlingual elements into SKOS by relating translated terms to a single declared URI concept. These interlingual elements would then be identified with an appropriate xml:lang attribute. This allows for even greater integration and re-usability, since metadata descriptors throughout the international community can utilize and build upon single SKOS concepts, fostering interoperability on a worldwide scale.
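The behaviour of the three label types can be illustrated with a small Python sketch. The label data below are invented for illustration (including the misspelling and the French translation), and the functions find_concepts and display_labels are hypothetical helpers, not part of SKOS.

```python
# Each concept URI maps to (property, label text, language tag) entries.
labels = {
    "ex:MarkTwain": [
        ("skos:prefLabel", "Mark Twain", "en"),
        ("skos:altLabel", "Samuel Langhorne Clemens", "en"),
        ("skos:hiddenLabel", "Samuel Clemmens", "en"),  # assumed misspelling
    ],
    "ex:EconomicCooperation": [
        ("skos:prefLabel", "Economic Cooperation", "en"),
        ("skos:altLabel", "Economic Co-operation", "en"),
        ("skos:altLabel", "Coopération économique", "fr"),
    ],
}


def find_concepts(term, label_index):
    """Match a search term against every label, hidden ones included."""
    needle = term.casefold()
    return sorted(
        uri
        for uri, entries in label_index.items()
        if any(text.casefold() == needle for _, text, _ in entries)
    )


def display_labels(uri, label_index, lang="en"):
    """Labels that may be shown to users: hidden labels are excluded."""
    return [
        text
        for prop, text, tag in label_index[uri]
        if prop != "skos:hiddenLabel" and tag == lang
    ]


print(find_concepts("samuel clemmens", labels))
print(display_labels("ex:MarkTwain", labels))
print(find_concepts("Coopération économique", labels))
```

The misspelling resolves to the correct concept during search but never appears among the labels offered for display.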
Finally, the SKOS Core specifications also allow for documentary descriptions. These descriptions can define textual content to describe intended use and scope and also permit a modest level of administrative detail regarding the developmental history of the schema. The NISO guidelines for thesaurus creation recognize the need to clearly distinguish uses which may be ambiguous, or to identify the particular range of scope intended to be covered by a term (International Organization for Standardization, 1985, 1986). SKOS utilizes the skos:definition property to define the scope of a concept and the

Applications of SKOS in the Web Metadata Environment
By encoding these relationships, both hierarchical and terminological, in RDF, the vocabulary control process and hierarchical relationships of ontologies can be readily leveraged into the online searching environment to increase searchability and interoperability and to promote precision and recall (Antoniou & Harmelen, 2008). Applications and processors can be written that refer to the published namespaces and schemata contained therein, combining defined KOS into other metadata schemata like Dublin Core (DC) and the Metadata Object Description Schema (MODS). The easily linkable nature of SKOS schemas also allows the large computational resources of the Web to be brought to bear upon the navigational structure of the Web, enriching search processes by increasing interoperability and allowing cross-searching of a vast number of resource repositories.
One of the most significant applications of SKOS and RDF in the web environment is the so-called "Linked Data" movement. A sort of rebranding of the Semantic Web, linked data builds upon the RDF-style expressions described with a focus on the connection and exposure of data within documents, rather than the simple linking between documents themselves using hyperlinks (Bizer, Heath, Ayers, & Raimond, n.d.). Such a focus is contrasted with the hyperlinks of the traditional web by referring to this data as "hyperdata" (Bizer, R. Cyganiak, & Heath, 2008). By structuring and labeling the data using Semantic Web technologies, it is extracted and made more accessible, allowing connections between a wide variety of forms. For example, as Bizer et al. describe, "Using these links one can navigate from a computer scientist in dbPedia to her publications in the DBLP database, from a dbPedia book to reviews and sales offers for this book provided by the RDF Book Mashup, or from a band in dbPedia to a list of their songs provided by Musicbrainz or dbtune" (n.d., para. 4).
The Library of Congress (LC) has recently made tremendous efforts in this regard, and in 2009 it began to make its ubiquitously employed authority records available as linked data ("Authorities & Vocabularies (Library of Congress)"; Bradley, 2009; Karen Coyle, 2009; Harper & Tillett, 2007). Its primary goal is to enable data access via dereferenceable URIs in the form of SKOS encodings. This allows access to the LC's controlled vocabularies and the data values that they comprise. Thus, creators of content or programmers who build metadata processors can incorporate LC metadata as linked data. The vocabularies themselves are also readily made available in a web-publishable format for easy vocabulary minting, updating, and downloading ("Authorities & Vocabularies (Library of Congress)").
The new "webified" LC Authorities and Vocabularies, by being published in SKOS, allow for more rapid updating of systems that employ this data, and also provide cost-free access far superior to the days of the "Big Red Books", the paper issue of the Library of Congress Subject Headings.
Other efforts to utilize SKOS on the Semantic Web front include projects to incorporate controlled vocabularies into the organization and classification of user-generated content and the incorporation of federated search standards. One of the standard methodologies for mapping across metadata schemas is the employment of an intermediary linking standard, a switching language, to which equivalent terms are converted (Zeng & Chan, 2004). These anchor terms then serve as a master language for the conversion of multiple KOS. The ability of SKOS to represent a wide range of alternate terminology and its potential to capture a range of schemas through RDF and XML namespace representation make it an ideal "interlingua" for KOS interoperability.
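The switching-language approach can be sketched in Python as a pair of mappings into a shared set of anchor concepts. Everything in this sketch is hypothetical: the vocabulary names, the terms, the hub: identifiers, and the translate function are assumptions chosen only to illustrate the technique.

```python
# Each local vocabulary maps its own terms to shared anchor ("hub") concepts.
to_hub = {
    "thesaurusA": {"Pottery": "hub:Ceramics", "Coins": "hub:Coinage"},
    "thesaurusB": {"Ceramic ware": "hub:Ceramics", "Numismatic objects": "hub:Coinage"},
}


def translate(term, source, target, mappings):
    """Convert a term from the source vocabulary to its equivalents in the
    target vocabulary by passing through the shared hub concept."""
    hub_concept = mappings[source].get(term)
    if hub_concept is None:
        return []
    return sorted(
        target_term
        for target_term, concept in mappings[target].items()
        if concept == hub_concept
    )


print(translate("Pottery", "thesaurusA", "thesaurusB", to_hub))
```

With a hub, each vocabulary needs only one mapping (to the anchor terms) rather than a separate mapping to every other vocabulary, which is the main attraction of the approach as the number of KOS grows.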
Tudhope and Binding have examined the efforts of the STAR Project, a massive integration of English Heritage thesauri, for its employment of SKOS as a standard conversion format. Efforts have also been made to establish effective procedures for the automated conversion of many different thesauri into SKOS under the auspices of the W3C's Semantic Web Best Practices Working Group (Van Assem, Malaisé, A. Miles, & Schreiber, 2006). The STAR (Semantic Technologies for Archaeological Resources) Project utilized SKOS in this way to create an interface for federated search of seven different thesauri encompassing archaeological, materials, and buildings and monuments indices employed in the mapping of the broad domain of English Heritage. It allowed for the creation of a multifunctional interface incorporating standard search procedures, search term suggestions, and query expansion based on the related hierarchies in the thesaurus array. Explorable concept schemes were also generated for user navigation and linking to relevant documents (Tudhope, Binding, May, & Heritage, 2008). The success of the STAR project showcases the ability of SKOS to serve as a switching language for interoperability on a grander scale. The creation of such interfaces is an enormous, but necessary, step for metadata technology if libraries are to leverage the multiple silos of data that exist on the Web and to bring them under the umbrella of federated searching projects.
The range of isolated information repositories and the variety of control schemas employed is only one of the problems posed by metadata searching in the web environment. Since about 2004, new social networking technologies have begun to employ user-generated tagging to provide quick labeling of content under the interactive content models of Web 2.0. Websites such as Flickr or del.icio.us began the trend, but now even commonly accessed news resources regularly allow users to label content with whatever terminology they might find useful for their own reference and to maintain lists of tags affiliated with their user accounts. While this creates a vast repository of incredibly inexpensive metadata for the rapidly generated content of the Web, the lack of standards and control in these systems has been criticized. It is widely accepted that these folksonomies display an inability to deal with synonymy, variant usage, and spelling, and, at least in their native form, are difficult to utilize for accurate information retrieval (Limpens, Gandon, & Buffa, 2009). As MacGregor and McCulloch state, "to ensure effective indexing and to maintain the overall efficacy of the retrieval system, it is necessary to apply some degree of control to the indexing process" (2006, p. 292).
As described above, the labels employed for SKOS concepts can be broadened to incorporate the widely divergent vocabularies employed under social tagging, thus linking them to more usable controlled KOS (Isaac & Summers, 2009). By employing records which merge user-generated tag lists with representative control data, online searches can be expanded in a controlled way to retrieve relevant information with semantic enhancement of tag-style metadata. Tagging software might be written to incorporate simple selection processes which facilitate linking to controlled concept URIs.
SCOT (Social Semantic Cloud of Tags), an RDF-based ontology, has been developed as an extension of SKOS intended to capture the structure and semantics of tagging systems. The declared intention of SCOT is to create repurposable semantic data for use in federating existing folksonomies (Kim, Passant, Breslin, Scerri, & Decker, 2008).

Similar efforts have been made by Simon Jupp and his colleagues at the Sealife project to incorporate a large number of biomedical ontologies into a single accessible SKOS framework. Their Conceptual Open Hypermedia Service (COHSE) eschews the richer ontology languages like OWL for the simpler representation of SKOS, since it allows them to incorporate semantically weaker structures like thesauri into the COHSE system (Jupp, Stevens, Bechhofer, Yesilada, & Kostkova, 2008). Their project uses linked data coded in SKOS to identify background knowledge represented in a repository of web-linked documentation. By identifying appropriate content via existing KOS, COHSE is able to support semantic web navigation through the specialized Sealife semantic web browser. Sealife utilizes the ontologies to mark up documents automatically with semantic encoding at the time of browsing. Thus, without prior semantic preparation, which can be time consuming and expensive, semantic technologies can be leveraged to identify key content in a document and offer links to appropriate services from the browse site.

Limitations of SKOS
The original SKOS recommendations were only taken up by the W3C in 2005. Thus, SKOS is still viewed as a work in progress. As we have seen, a review of the literature shows a number of very successful employments of SKOS for improved searching and interoperability in that short time. However, as it continues to be applied in a wide range of metadata environments, limitations have been identified.
Particularly criticized is the lack of detailed and structured representations of ontogenesis in SKOS (Panzer & Zeng, 2009; J. T Tennis, 2005). That is, the history of the development of the KOS received only cursory treatment in the original SKOS Core specification. Since its initial proposal, efforts have been made to extend the documentary descriptions to include a more detailed record of the changes made throughout the history of a given KOS schema. Often, changes in the world of knowledge in a domain area require reflection in the KOS, and alterations must be made to preferred usage. Such versioning is an important part of the KOS maintenance process (Hodge, 2000). Alternatively, as scientific progress is made, new terminology is proposed, comes into broad acceptance, and requires representation in the KOS. This may involve structural adjustments as well as simple shifts in terminological preference (International Organization for Standardization, 1985, 1986). The original SKOS schema allowed only the simplest of notations, with no way to incorporate records of structural changes. It is very common as hierarchical relationships are developed and maintained that simpler concepts take on hierarchical structure, or that existing hierarchical arrangements are deemed to be less useful than newer arrangements. Thus, the instability of the thesaurus necessitates an additional mechanism in order "to express relationships of similarities and dissimilarities across the different versions" (J. T Tennis, 2005, p. 1). In order to search the schema effectively, account must be taken of historical changes in the record. For example, if a user is searching for items on the history of "Myanmar", the system should recognize that anything cataloged prior to the 1989 political changes might likely be listed as "Burma". The metadata schema ought to allow some way for a processor to incorporate this type of information if it is to function efficiently.
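The kind of history-aware processing called for here can be sketched in Python. The data structure below is purely hypothetical and is not part of SKOS or of any proposed extension; it simply records which label was preferred during which years, using the 1989 date from the example above, so that a processor can choose or combine labels by date.

```python
# Hypothetical label history: (label, first year valid, year superseded).
# None means the period is open-ended on that side.
label_history = {
    "ex:Myanmar": [
        ("Burma", None, 1989),
        ("Myanmar", 1989, None),
    ],
}


def label_in_year(uri, year, history):
    """Return the label that was preferred for the concept in a given year."""
    for label, start, end in history[uri]:
        if (start is None or year >= start) and (end is None or year < end):
            return label
    return None


def labels_for_search(uri, history):
    """Return every label the concept has carried, for query expansion."""
    return [label for label, _, _ in history[uri]]


print(label_in_year("ex:Myanmar", 1975, label_history))
print(label_in_year("ex:Myanmar", 1995, label_history))
print(labels_for_search("ex:Myanmar", label_history))
```

A search for the current name could then be expanded to the earlier one for records cataloged before the change.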
Tennis and Sutton have recently worked to create SKOS extensions for a vocabulary development application that leverage the ability of a concept to represent clusters of other concepts in order to address this problem. They propose an additional SKOS entity, the concept instance, which serves an intermediary role between a given concept and the scheme of which it is a part. Thus, individual changes can be recorded as a property of a concept, vis a vis its membership in a given scheme version (Joseph T. Tennis & Sutton, 2008).
SKOS has also been criticized by some computer scientists for its lack of the formal logical properties necessary for significant artificial reasoning on its ontologies (Sanchez-Alonso & Garcia-Barriocanal, 2006). Specifically, the broad application of concepts creates problems of definition and multiplicity of reference when applied across all but the simplest of domains. This lack of computational semantics, it is argued, seriously limits the performance of automated reasoning tasks upon the information contained in the KOS. Sanchez-Alonso and Garcia-Barriocanal propose the utilization of broader-based "upper ontologies" to provide the unambiguous reference necessary for higher level semantic reasoning. OpenCyc, an upper ontology for "all of human consensus reality" (2006, p. 267), is intended as an enormously comprehensive representation of the array of human knowledge across a wide range of disciplines. Such an effort, they argue, is necessary to lend specificity of definition to SKOS concepts if they are to be effectively utilized in semantic reasoning.
However, Jupp has argued, at least within the context of the Sealife project described above, that while stricter, formal semantics might be useful for modeling ontological descriptions of reality, the looser semantics of SKOS are an important element of its primary purpose. That is, the broader applicability of SKOS better enables navigation and retrieval by exploiting the wealth of ontologies that already exist within the biomedical domain (Jupp et al., 2008).
Without doubt, further limitations will be identified as SKOS is applied in new frameworks with different historical needs. However, the extensibility and iterative hierarchical structure of SKOS seem to allow for the creation of newer elements and element extensions as they become required.

Conclusion
With a deluge of resources proliferating on the Web each day, it has become a necessity to incorporate new types of metadata into effective processes for cataloging and description. Interoperability among web schemas is arguably the major challenge presented to information science in the era of the rapid content generation of the Web. Intelligent systems are needed that can facilitate searching across a range of data repositories that are often organized under unique knowledge organization systems. While the vastly intelligent service agents described by Berners-Lee in 2001 are still in the distant future, technologies are enabling more intelligent operations to be performed on metadata. SKOS, while still under continual development by the open source community, has shown itself to be an effective tool for the wide sharing of schemas that will be necessary for these disparate repositories and KOS to be brought into alignment. It not only enables the machine-actionability of metadata requisite for efficient searching on the Web, but also simplifies the unification of diverse information tools for vocabulary control in the verbal chaos of web classification and provides a strong framework for the creation of switching mechanisms for federating search processes.