
Semantic Web

The Internet is a public data repository, a source of information and a workplace for millions of people. Computers and programs manage innumerable databases, websites and communication platforms. These services usually rest on basic technical operations, such as delivering documents or error messages and providing applications for entering web content. Programs can access the data available in the network, but their ability to evaluate that content in ways that add value for the user is limited.

The best way to demonstrate the benefits of the Semantic Web is through search queries. As a typical example, the difficulties faced by a literary scholar in the search for an electronically available text are outlined here:

There is a translation of Shakespeare's Hamlet by Maik Hamburger. Although this version is often staged in German theaters, it has not been published by any publisher. The literary scholar tries to find the text on the Internet and enters the keywords "Hamlet" and "Hamburger" into a search engine. However, there is a chain of restaurants in the USA called "Hamburger Hamlet" (a hamlet is also a small village in English). In addition, there are many performances of the play Hamlet in the city of Hamburg, so information about the Hamburger translation is extremely difficult, if not impossible, to find.

If there were a way for the scholar to make the search engine understand that she is looking for a book with the title "Hamlet" that was written or translated by a person named "Hamburger", the restaurant chain and reports about productions in Hamburg would be excluded from the search results.

What would it take? The search engine would have to "know" that a book has a title and an author, and that this author in turn has a name. Such information cannot currently be processed by search engines.

The abbreviation WYMIWYG stands for "What You Mean Is What You Get" and is modeled on the abbreviation WYSIWYG known from word processing. It expresses the central goal of all Semantic Web efforts: with web searches, users should get the result they actually meant, without being misled by accidental technical inadequacies. In addition, modeling relationships between content makes it possible to create references (hyperlinks) automatically.

The idea behind the Semantic Web, making the meaning of data machine-readable in order to obtain better search results, assistance and information when working with data, goes back to the 1960s and has been a central field of artificial intelligence research ever since. Depending on the objective, these approaches have been developed in different directions. Accordingly, a variety of tools are now available that could become important in the context of the Semantic Web.

Taxonomies
Taxonomies are a simple tool for structuring and labeling data. They have long been used in librarianship and in lexicography. A taxonomy consists of a hierarchy of keywords: keywords higher up in the hierarchy are the generic terms for the keywords below them. Resources are always tagged with the most specific keyword, because the higher-level keywords can be inferred from the hierarchy.
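This inference up the hierarchy can be sketched in a few lines. The following is a minimal illustration; the keyword hierarchy used here is invented for the example:

```python
# Minimal taxonomy sketch: each keyword maps to its broader (parent) term.
# The hierarchy below is purely illustrative.
PARENT = {
    "poodle": "dog",
    "dog": "mammal",
    "mammal": "animal",
}

def broader_terms(keyword):
    """Return the keyword plus all its generic terms, walking up the hierarchy."""
    terms = [keyword]
    while keyword in PARENT:
        keyword = PARENT[keyword]
        terms.append(keyword)
    return terms

def matches(resource_keyword, query):
    """A resource tagged with a specific keyword also matches every broader term."""
    return query in broader_terms(resource_keyword)
```

A resource tagged only with "poodle" is thus still found by a query for "animal", which is exactly why tagging with the most specific keyword suffices.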

For pragmatic reasons, taxonomies are a good choice for improving an information offering: users are already familiar with hierarchies from navigating sitemaps, and taxonomies offer an effective way of structuring content further.

A major disadvantage of taxonomies arises where resources combine several topics and can no longer be clearly assigned to one branch of the category system. In addition, the taxonomy has to be developed and maintained, which can be very time-consuming in a dynamic field of knowledge.

Word networks
Since some areas of knowledge cannot be mapped with simple taxonomies, the development of networks that can correctly map the meaning of keywords began early on. Word networks based on the Princeton WordNet model are very successful in individual applications and correspondingly widespread. Synonymous terms are grouped into so-called synsets and linked with one another via semantic relations.

For example, the terms "car", "automobile", "auto" and "motorcar" form a common synset. Another synset could consist of "vehicle" and "means of transport". Between the first and the second synset there is then the relationship between subordinate term and generic term (hyponymy and hypernymy).

WordNet defines a good dozen relations that describe how terms and concepts are related. These include, for example, the part-whole relation, which makes it possible to express that a piston is part of an engine. Using these relations, complex search queries can be answered better: if someone wants to know how an engine is constructed, a list of all objects that are parts of an engine can be generated.
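The synset-and-relation structure described above can be sketched as plain data. The synset contents, identifiers and relation names below are invented for illustration and not taken from any actual WordNet release:

```python
# Sketch of a word network: synsets (groups of synonyms) connected by
# typed relations. All identifiers and contents are illustrative.
synsets = {
    "S1": {"car", "automobile", "motorcar"},
    "S2": {"vehicle", "means of transport"},
    "S3": {"engine", "motor"},
    "S4": {"piston"},
}

relations = [
    ("S1", "hypernym", "S2"),  # a car is a kind of vehicle
    ("S4", "part_of", "S3"),   # a piston is part of an engine
    ("S3", "part_of", "S1"),   # an engine is part of a car
]

def parts_of(synset_id):
    """List all synsets linked to synset_id by the part-whole relation."""
    return [s for (s, rel, o) in relations if rel == "part_of" and o == synset_id]
```

A query for the parts of the "engine" synset then returns the "piston" synset, which is the kind of answer the prose query "how is an engine constructed?" relies on.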

Further information on word networks can be found on the websites of the Tübingen project GermaNet.

The "Complete Multilingual WordNet List by Language" compiled by Samuel Chong (Pasadena City College) offers a compilation of word networks in numerous languages and the licenses under which they are available.

Object orientation
Object orientation was originally developed as a programming paradigm; the Java programming language, among others, is known for it. Object orientation works with classes, which in turn have properties and methods. For example, there could be a class "car" with the properties "manufacturer", "top speed", "seats" and "fuel consumption". In addition, this class can contain methods that describe what can be done with a car, such as refueling, driving or repairing.

A class is thus an abstract description of things. From this abstract description, concrete instances can be formed that adopt the properties and methods of the class. Such an instance of a class is called an object. There could, for example, be an instance of the class car named "Aston Martin" whose property "seats" has the value "two".

In addition, object orientation includes the concept of inheritance. A class can be derived from another class, its superclass. Derived classes inherit the attributes and methods of their respective superclasses. So that a derived class is not a mere copy of its superclass, attributes and methods can be added to subclasses. The class "truck" could have "loading area" as an additional attribute and provide "loading" and "unloading" as additional methods.
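The class, instance and inheritance ideas from the last three paragraphs can be sketched together in code. The concrete attribute values below are invented for the example:

```python
# Sketch of the classes described above: "Car" with properties and methods,
# and "Truck" derived from it with an extra attribute and an extra method.
class Car:
    def __init__(self, manufacturer, top_speed, seats, fuel_consumption):
        self.manufacturer = manufacturer      # properties of the class
        self.top_speed = top_speed
        self.seats = seats
        self.fuel_consumption = fuel_consumption

    def refuel(self):                          # methods: what one can do with a car
        return f"refueling the {self.manufacturer}"

    def drive(self):
        return f"driving the {self.manufacturer}"

class Truck(Car):
    """Inherits all attributes and methods of Car; adds a loading area."""
    def __init__(self, manufacturer, top_speed, seats, fuel_consumption, loading_area):
        super().__init__(manufacturer, top_speed, seats, fuel_consumption)
        self.loading_area = loading_area       # additional attribute

    def load(self):                            # additional method
        return f"loading {self.loading_area} m2"

# A concrete instance (object) of the class Car, with invented values:
aston = Car("Aston Martin", 290, 2, 12.5)
```

A `Truck` object can still `drive()` and `refuel()` because it inherits those methods, while `load()` exists only on the subclass.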

Description logic
Logic formalisms such as propositional logic or predicate logic allow statements to be inferred automatically. First, rules are formulated, e.g. "if A then B; if B then C". The rules are supplemented by statements of fact such as "A holds". From this, C then follows automatically.

This example comes from propositional logic. More comprehensive formalisms such as predicate logic increase the complexity of the rules that can be formulated and thus allow more complex queries. The mapping of logical connections makes it possible to react to unexpected questions from the user with implicitly inferred answers.
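The rule-and-fact mechanism from the propositional example can be sketched as naive forward chaining. This is a minimal illustration of the inference idea, not a real description-logic reasoner:

```python
# Naive forward chaining over propositional rules: "if A then B, if B then C"
# plus the fact "A" automatically yields "C".
def infer(facts, rules):
    """Apply (premise, conclusion) rules until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Real description-logic reasoners handle far richer rule languages, but the principle of deriving implicit answers from explicit rules is the same.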

The Semantic Web concepts presented so far have in common that they represent knowledge about contextual relationships in the form of keywords linked by certain formalisms. The Semantic Web, however, is intended to become a further stage in the expansion of the Internet. The Internet does not consist of keyword networks, but of an overwhelming number of websites, texts, images, videos and documents: in short, resources that can each be found via a unique address, the URL. Bringing better order into this universe of resources is a central hope associated with Semantic Web technologies.

The question therefore arises as to how the ordered keyword networks and the chaotic resources of the WWW can be brought together. Originally, two sets of standards were established to achieve the goals of the Semantic Web. On the one hand, there is the W3C with its trio of standards RDF, RDF-S and OWL, which build on one another. On the other hand, there is the Topicmap Consortium with an ISO standard that originates from the publishing and library sectors and is far simpler and more intuitive, but less expressive.

  • The Resource Description Framework (RDF) is based on so-called triples of subject, predicate and object, which make it possible to represent relationships between resources in addition to attribute-value pairs.
  • RDF Schema (RDF-S) is a standard with which it can be specified, for a class of similar resources such as web pages, how properties are to be assigned and which basic relationships exist to other properties. One example is the Dublin Core metadata schema.
  • The Web Ontology Language (OWL) is based on the principle of object orientation and offers the possibility both of defining attributes and of mapping the inheritance of classes.
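The triple idea at the heart of RDF can be illustrated with the Hamlet example from the introduction. The identifiers below are invented stand-ins; real RDF uses full URIs and serializations such as Turtle or RDF/XML:

```python
# Sketch: the RDF idea reduced to subject-predicate-object triples.
# All identifiers are illustrative, not real URIs.
triples = [
    ("ex:Hamlet", "rdf:type", "ex:Book"),
    ("ex:Hamlet", "ex:title", "Hamlet"),
    ("ex:Hamlet", "ex:translator", "ex:MaikHamburger"),
    ("ex:MaikHamburger", "ex:name", "Hamburger"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None acts as a wildcard)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]
```

Because "Hamburger" is attached via the typed predicate `ex:translator` rather than as a bare keyword, a query for books translated by a person named Hamburger cannot match a restaurant chain.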

There are now a number of applications for creating knowledge bases. Most of these applications produce data that conforms to the W3C's standards for semantic markup.

  • Altova, a software company specializing in XML editors, has been offering SemanticWorks for some time. As a graphical editor, it is intended to enable the professional creation of W3C-compliant ontologies.
  • An open-source alternative to SemanticWorks is the Protégé editor, which is developed at Stanford and, like SemanticWorks, written in Java. Protégé, too, is intended to enable trained users to create ontologies.

Problems and Outlook
Although the Semantic Web has been propagated for many years, it has so far only been realized in individual flagship projects. The Semantic Web standards presented here are standards in the technical sense, but still far from being accepted standards established through widespread use. In practice there are considerable problems, ranging from the question of how detailed semantic relations should, can or need to be modeled, to the political question of whether there should be authorities that centrally define meanings.

First of all, enriching data with additional machine-readable information means additional work: besides writing an article, editors also have to ensure that it is correctly keyworded.

Furthermore, meta information also reflects interests. Even a very early version of the markup language HTML offered an element intended to contain meta information about the respective document. It turned out, however, that website providers used the meta tag to improve their ranking in search engines, regardless of whether the meta information was correct. The major search engines therefore began to ignore it. In principle, the Semantic Web faces the same problem, without having a solution for it.

The concepts of the Semantic Web are highly complex formalisms, and they will remain so even if intuitive applications are developed. The construction of semantic networks is therefore primarily reserved for experts. Even more than encyclopedias, semantic networks define what the world is, what is true and what is false. This gives a very small group of people an extremely large power of definition.

An alternative approach is provided by social tagging systems, which can be used to build decentralized ontologies (folksonomies) from a large number of individually assigned keywords. However, this leads to the opposite problem: what a tag means is defined by the averaged behavior of the mass of users who have applied it to objects, and examples of popular misconceptions are abundant.

The popularity of tagging in various Web 2.0 applications nevertheless gives reason to hope that two basic assumptions of the Semantic Web are correct: on the one hand, many users do try to enrich data with semantically correct metadata; on the other hand, tags seem to have established themselves as a useful and comprehensible search option alongside full-text search.

Last change: June 11, 2015


Leibniz Institute for Knowledge Media (2015): Semantic Web. Last changed on 06/11/2015. Accessed on May 23, 2021.