How the Semantic Web Works

Feb 09 2006

Computers are central to creating and maintaining the Web, but they can't make sense of the information on it by themselves. Learn how the Semantic Web could change that.

The World Wide Web is an interesting paradox -- it's made with computers but for people. The sites you visit every day use natural language, images and page layout to present information in a way that's easy for you to understand. Even though they are central to creating and maintaining the Web, the computers themselves can't make sense of all this information. They can't read, see relationships or make decisions like you can.

The Semantic Web proposes to help computers "read" and use the Web. The big idea is pretty simple -- metadata added to Web pages can make the existing World Wide Web machine readable. This won't bestow artificial intelligence or make computers self-aware, but it will give machines tools to find, exchange and, to a limited extent, interpret information. It's an extension of, not a replacement for, the World Wide Web.

That probably sounds a little abstract, and it is. While some sites are already using Semantic Web concepts, many of the necessary tools are still in development. In this article, we'll bring the concepts and tools behind the Semantic Web down to earth by applying them to a galaxy far, far away.

Thank You

Thanks to Josh Senecal for his assistance with this article.

Contents

Why Semantic Web?
Markup: XML and RDF
Knowing What's What: URIs
Languages and Vocabularies: RDFS, OWL and SKOS
Tying it All Together
W3C and the Future of the Semantic Web

Why Semantic Web?

Suppose you want to buy the "Star Wars Trilogy" box set online, and you have a few basic criteria for your purchase. First, you want widescreen rather than full-screen DVDs, and you want the set that includes an extra disc of bonus material. Second, you want the lowest available price, but you'd rather buy a new set than a used one. Finally, you don't want to pay too much for shipping and handling, but you also don't want to wait too long for delivery.

At this stage of the Web's evolution, your best bet is to look through the pages of several retailers, comparing prices, delivery times and shipping rates. You could also look for a site that compares prices and shipping options from several retailers at once. Either way, you have to do most of the virtual legwork, then make your purchase decision and place your order yourself.

With the Semantic Web, you'd have another option. You could enter your preferences into a computerized agent, which would search the Web, find the best option for you and place your order. The agent could then open the personal finance software on your computer and record the amount you spent, and it could mark the date your DVDs should arrive on your calendar. Your agent would also learn your habits and preferences, so if you had a bad experience buying from one particular site, it would know not to use that site again.

The agent would do this not by looking at pictures and reading descriptions the way a person does, but by reading metadata that clearly identifies and defines what the agent needs to know. Metadata is simply machine-readable data that describes other data. In the Semantic Web, metadata is invisible to people reading a page, but it's clearly visible to computers. Metadata also allows more complex, focused Web searches with more accurate results. To paraphrase Tim Berners-Lee, inventor of the World Wide Web, these tools will let the Web, which is currently similar to a giant book, become a giant database.

Next, we'll look at the tools that can make documents machine readable.


Semantics

Making the Web machine readable takes layers and layers of metadata, logic and security. Most visual representations of these layers involve a stack -- something like a tower of blocks representing all the layers. The stack changes and evolves as Semantic Web concepts develop. You can see a common version of the stack in introductions to the Semantic Web.

Markup: XML and RDF

An RDF triple has a subject (Anakin Skywalker), an object (Luke Skywalker) and a property that connects them.

Let's say you want to make this sentence understandable to a computer:

Anakin Skywalker is Luke Skywalker's father.

It's easy for you to figure out what this sentence means -- Anakin and Luke Skywalker are both people, and there is a relationship between them. You know that a father is a type of parent, and that the sentence also means that Luke is Anakin's son. But a computer can't figure any of that out without help. To allow a computer to understand what this sentence means, you'd need to add machine-readable information that describes who Anakin and Luke are and what their relationship is. This starts with two tools -- eXtensible Markup Language (XML) and Resource Description Framework (RDF).

XML is a markup language like hypertext markup language (HTML), which you're probably somewhat familiar with from surfing the Web. HTML governs the appearance of the information you look at on the Web. XML complements (but does not replace) HTML by adding tags that describe data. These tags are invisible to the people who read the document but visible to computers. Tags are already in use on the Web, and existing bots, like the bots that collect data for search engines, can read them.

RDF does exactly what its name indicates -- using XML tags, it provides a framework to describe resources. In RDF terms, pretty much everything in the world is a resource. This framework pairs the resource (any noun, like Anakin Skywalker or the "Star Wars" trilogy) with a specific item or location on the Web so the computer knows exactly what the resource is. Clearly identifying resources keeps the computer from doing things like confusing Anakin Skywalker with Sebastian Shaw or Hayden Christensen, or the original trilogy with the One-Man "Star Wars" Trilogy.

To do this, RDF uses triples written as XML tags to express this information as a graph. These triples consist of a subject, property and object, which are like the subject, verb and direct object of a sentence. (Some sources call these the subject, predicate and object.) RDF already exists on the Web -- for example, it's part of RSS feed creation.
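The shape of a triple can be sketched in a few lines of Python. This is only an illustration of the subject-property-object idea, not RDF's actual XML syntax; the `Triple` type and the plain-text names in it are assumptions made for readability.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject --property--> object."""
    subject: str
    prop: str   # some sources call this the predicate
    obj: str


# The example sentence from this article as a single triple.
statement = Triple("Anakin Skywalker", "is father of", "Luke Skywalker")

# A graph is simply a collection of triples.
graph = {statement}

for t in graph:
    print(f"{t.subject} --{t.prop}--> {t.obj}")
```

Storing the graph as a set means that adding the same statement twice has no effect, which matches the idea that a statement is either in the graph or not.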

So far in this example, the computer knows that there are two objects in this sentence and that there is a relationship between them. But it doesn't know what the objects are or how they relate to one another. We'll look at the tool for adding this layer of meaning next.

Knowing What's What: URIs

A URI gives a computer a specific point of reference for each item in the triple -- there's no need for interpretation or potential for misunderstanding.

Even with the framework that XML and RDF provide, a computer still needs a very direct, specific way of understanding who or what these resources are. To do this, RDF uses uniform resource identifiers (URIs) to direct the computer to a document or object that represents the resource. You're already familiar with the most common form of URI -- the uniform resource locator (URL), which begins with http://. A URI can point to anything on the Web and may also point to objects that are not part of the web, like appliances in computerized homes. Mailto, ftp and telnet addresses are some other examples of URIs.
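To see that URLs, mailto addresses and the rest all share one structure, you can pull the scheme (the part before the first colon) out of each with Python's standard `urllib.parse` module. The addresses below are hypothetical placeholders on example domains, not addresses from this article.

```python
from urllib.parse import urlparse

# Hypothetical example addresses; only the scheme matters here.
uris = [
    "http://www.example.com/characters/luke",
    "mailto:someone@example.com",
    "ftp://ftp.example.com/files/readme.txt",
    "telnet://example.com:23",
]


def scheme_of(uri: str) -> str:
    """Return the scheme, the part of a URI before the first colon."""
    return urlparse(uri).scheme


for uri in uris:
    print(f"{scheme_of(uri):>6}  {uri}")
```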

For our example, we'll use the characters' pages at the official Star Wars site as their URIs.

Now the computer knows what the subject and object are -- Anakin Skywalker is the entity represented by the first URI, and Luke Skywalker is the entity represented by the second. But you'll notice that the middle URI in our triple -- the one for the property -- doesn't point to the Star Wars site. Instead, it points to a make-believe document on the server. If that page really existed, it would be our XML namespace.

Unlike HTML, which uses standard tags like <b> for bold and <u> for underline, XML doesn't have standard tags. This is useful -- it lets developers create unique tags for specific purposes. But it means that a browser doesn't automatically know what the tags mean. An XML namespace is basically a document that tells applications the meaning of all the tags in another document. The creator of an XML document declares the namespace at the beginning of the document with a line of code. In our example, that line would associate the prefix 'hsw' with the address of our make-believe document.

That line of code says to the computer, "Any tags you see that begin with 'hsw' use the vocabulary found in this document. You can look up any tag beginning with 'hsw' here." That way, people can create the XML tags they need for a document without conflicting with other XML documents on the Web.
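Here is a rough sketch of how an application resolves a prefixed tag, using Python's standard `xml.etree.ElementTree`. The namespace address, the `people` and `person` elements and the `fatherOf` tag are all hypothetical stand-ins; only the 'hsw' prefix comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace address standing in for the make-believe document.
HSW_NS = "http://example.org/hsw-vocabulary#"

# The xmlns:hsw attribute is the namespace declaration.
document = f"""
<people xmlns:hsw="{HSW_NS}">
  <person name="Anakin Skywalker">
    <hsw:fatherOf>Luke Skywalker</hsw:fatherOf>
  </person>
</people>
""".strip()

root = ET.fromstring(document)

# Tell the parser which address our own 'hsw' prefix stands for, then search.
child = root.find("person/hsw:fatherOf", {"hsw": HSW_NS})

print(child.tag)   # the tag's full name: namespace address plus local name
print(child.text)
```

Internally the parser replaces the prefix with the full address, so two documents that pick different prefixes for the same namespace still end up with matching tags.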

XML and RDF are the "official language" of the Semantic Web, but by themselves they're not enough to make the entire Web accessible to a computer. We'll look at some of the other layers next.

That's (not) Impossible!

XML and RDF are at the heart of the Semantic Web. They give computers a structure in which to look for information and define relationships between resources. Applications can also merge graphs that use identical URIs. For example, an application could merge the graph above with another one specifying the relationship between Anakin Skywalker and Darth Vader. The application could then infer that Vader is Luke's father.
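A minimal sketch of that merge-and-infer step follows. It assumes the second graph says that Anakin Skywalker and Darth Vader are the same individual; the `ex:` identifiers and the `ex:sameAs` property are placeholders, where real graphs would use full URIs. The inference is a single pass, not a complete reasoner.

```python
# Placeholder identifiers; real graphs would use full URIs.
ANAKIN, VADER, LUKE = "ex:AnakinSkywalker", "ex:DarthVader", "ex:LukeSkywalker"

graph_a = {(ANAKIN, "ex:fatherOf", LUKE)}
graph_b = {(ANAKIN, "ex:sameAs", VADER)}


def merge(*graphs):
    """Merging is a set union: identical identifiers line up automatically."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def infer_same_as(graph):
    """Copy each statement about a resource onto resources declared the same."""
    aliases = {(s, o) for s, p, o in graph if p == "ex:sameAs"}
    aliases |= {(o, s) for s, o in aliases}  # sameness works in both directions
    inferred = set(graph)
    for s, p, o in graph:
        if p == "ex:sameAs":
            continue
        for a, b in aliases:
            if s == a:
                inferred.add((b, p, o))
            if o == a:
                inferred.add((s, p, b))
    return inferred


result = infer_same_as(merge(graph_a, graph_b))
print((VADER, "ex:fatherOf", LUKE) in result)
```

The merge only works because both graphs use the identical identifier for Anakin; if one graph spelled it differently, the two statements would never connect.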

Languages and Vocabularies: RDFS, OWL and SKOS

An example of a very small number of the resources and connections that might be found in a Star Wars ontology. You can figure these out on your own from watching the movies and surfing the Web, but a computer must have a clear outline to make sense of it.

Another obstacle for the Semantic Web is that computers don't have the kind of vocabulary that people do. You've used language your whole life, so it's probably easy for you to see connections between different words and concepts and to infer meanings based on contexts. Unfortunately, someone can't just give a computer a dictionary, an almanac and a set of encyclopedias and let the computer learn all this on its own. In order to understand what words mean and what the relationships between words are, the computer has to have documents that describe all the words and logic to make the necessary connections.

In the Semantic Web, this comes from schemata and ontologies. These are two related tools for helping a computer understand human vocabulary. An ontology is simply a vocabulary that describes objects and how they relate to one another. A schema is a method for organizing information. As with RDF tags, access to schemata and ontologies is included in documents as metadata, and a document's creator must declare which ontologies are referenced at the beginning of the document.

Schema and ontology tools used on the Semantic Web include:

RDF Vocabulary Description Language, or RDF Schema (RDFS) - RDFS adds classes, subclasses and properties to resources, creating a basic language framework. For example, the resource Dagobah is a member of the class planet. A property of Dagobah could be swampy.

Simple Knowledge Organization System (SKOS) - SKOS classifies resources in terms of broader or narrower, allows designation of preferred and alternate labels and can let people quickly port thesauri and glossaries to the Web. For example, in a Star Wars glossary, a narrower term for Sith Lord could be Darth Sidious and a broader term could be villain. Similarly, alternate labels for Han Solo might be nerf herder and laser brain.

Web Ontology Language (OWL) - OWL, the most complex layer, formalizes ontologies, describes relationships between classes and uses logic to make deductions. It can also construct new classes based on existing information. OWL is available in three levels of complexity -- Lite, Description Logic (DL) and Full.
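To make the vocabulary idea concrete, here is a small sketch of the SKOS examples above as Python dictionaries. It is a toy model, not SKOS syntax: the dictionary layout and the function names are assumptions, and walking the whole chain of broader terms is a choice made for this illustration.

```python
# Each term maps to its immediate broader term.
broader = {
    "Darth Sidious": "Sith Lord",
    "Sith Lord": "villain",
}

# Alternate labels pointing to a preferred label.
alt_labels = {
    "nerf herder": "Han Solo",
    "laser brain": "Han Solo",
}


def all_broader(term):
    """Walk up the hierarchy and collect every broader term in order."""
    found = []
    seen = {term}
    while term in broader and broader[term] not in seen:
        term = broader[term]
        found.append(term)
        seen.add(term)
    return found


def preferred(label):
    """Map an alternate label to its preferred label, if one is known."""
    return alt_labels.get(label, label)


print(all_broader("Darth Sidious"))
print(preferred("nerf herder"))
```

With a table like this, a search for villains could also turn up pages about Darth Sidious, and a search for "laser brain" could be routed to Han Solo.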

The trouble with ontologies is that they are very difficult to create, implement and maintain. Depending on their scope, they can be enormous, defining a wide range of concepts and relationships. Some developers prefer to focus more on logic and rules than on ontologies because of these difficulties. Disagreements regarding the roles these rules should play may be one potential pitfall for the Semantic Web.

Next, we'll tie it all together by looking at our original example -- those "Star Wars Trilogy" DVDs.

Accessing the Metadata

One of the long-term goals of the Semantic Web is to allow agents, software applications and Web applications to access and use metadata. A key tool for doing this is SPARQL Protocol and RDF Query Language (SPARQL), which is still in development. SPARQL's purpose is to extract information from RDF graphs. It can look for data and limit and sort the results. One of the advantages of the RDF structure is that these queries can be very precise and get very accurate results.
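The following sketch imitates the kind of work such a query does: match a pattern against triples, then sort and limit the results. It is plain Python, not SPARQL syntax, and the shop identifiers, property names and prices are invented for illustration.

```python
# Hypothetical product data as (subject, property, value) triples.
triples = [
    ("shopA:set1", "ex:price", 39.99),
    ("shopA:set1", "ex:condition", "new"),
    ("shopB:set7", "ex:price", 34.50),
    ("shopB:set7", "ex:condition", "used"),
    ("shopC:set3", "ex:price", 36.00),
    ("shopC:set3", "ex:condition", "new"),
]


def match(pattern, data):
    """Yield triples that fit a pattern; None acts as a wildcard."""
    for triple in data:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Roughly: select the items whose condition is "new", order by price, limit 1.
new_items = {s for s, _, _ in match((None, "ex:condition", "new"), triples)}
priced = [
    (s, value)
    for s, _, value in match((None, "ex:price", None), triples)
    if s in new_items
]
cheapest = sorted(priced, key=lambda pair: pair[1])[:1]
print(cheapest)
```

Because every fact has the same three-part shape, one small matching function is enough to ask many different questions of the same data.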

Tying it All Together

In our original example, we talked about buying "Star Wars" DVDs online. Here's how the Semantic Web could make the whole process easier:

Each site would have text and pictures (for people to read) and metadata (for computers to read) describing the DVDs available for purchase on their site.

The metadata, using RDF triples and XML tags, would make all the attributes of the DVDs (like condition and price) machine-readable.

When necessary, businesses would use ontologies to give the computer the vocabulary needed to describe all of these objects and their attributes. The shopping sites could all use the same ontologies, so all of the metadata would be in a common language.

Each site selling the DVDs would also use appropriate security and encryption measures to protect customers' information.

Computerized applications or agents would read all the metadata found at different sites. The applications could also compare information, verifying that the sources were accurate and trustworthy.
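Put together, the agent's decision might look something like the sketch below. The `Offer` fields, the site names and the numbers are all hypothetical, and treating "new" as a hard requirement rather than a preference is a simplification; a real agent would gather these values from each site's metadata instead of a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Metadata one site might publish for a DVD set (fields are assumptions)."""
    site: str
    price: float
    shipping: float
    delivery_days: int
    widescreen: bool
    bonus_disc: bool
    condition: str  # "new" or "used"


def choose(offers, max_days, blocked_sites=()):
    """Filter on the buyer's requirements, then pick the lowest total cost."""
    candidates = [
        o for o in offers
        if o.widescreen
        and o.bonus_disc
        and o.condition == "new"
        and o.delivery_days <= max_days
        and o.site not in blocked_sites
    ]
    return min(candidates, key=lambda o: o.price + o.shipping, default=None)


offers = [
    Offer("shop-a.example", 39.99, 4.00, 5, True, True, "new"),
    Offer("shop-b.example", 34.50, 3.00, 4, True, True, "used"),
    Offer("shop-c.example", 36.00, 9.50, 2, True, True, "new"),
    Offer("shop-d.example", 35.00, 2.00, 6, False, True, "new"),
]

best = choose(offers, max_days=7)
print(best.site if best else "no suitable offer")
```

The `blocked_sites` argument stands in for the agent remembering a bad experience with a particular seller and leaving that site out of future comparisons.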

Of course, the Web is enormous, and adding all this metadata to existing pages is a huge undertaking. We'll look at this and some of the other potential hurdles for the Semantic Web next.

Security and Proof

As with any Web document, the Semantic Web requires security measures to protect data and transactions. Included in W3C's recommendations for the Semantic Web are digital signatures, encryption, proofs and trust. Proofs and trust relate to the logic of the Semantic Web and applications' abilities to verify that data is correct and consistent through all of the Web's layers.

W3C and the Future of the Semantic Web

Like the World Wide Web, the Semantic Web is decentralized -- no one organization or agency has control over all of its rules and content. However, some people and organizations have taken leadership roles in the development of Semantic Web guidelines and protocols. These include the World Wide Web Consortium (W3C), its director Tim Berners-Lee and its member organizations. The W3C is not a research organization, so universities, other organizations and the public also play an active role in Semantic Web development.

Some areas of the World Wide Web have already incorporated Semantic Web components. These include RSS feeds, which use RDF, and the Friend-of-a-Friend (FOAF) project, which proposes to create machine-readable personal web pages.

But much of the Semantic Web's function and practicality are still in development, and there are some pretty big obstacles to overcome. Decentralization gives developers the freedom to create precisely the tags and ontologies that they need. But it also means that different developers might use different tags to describe the same thing, which could make machine comparisons difficult. Critics also raise the "identity problem" -- does a URI represent a Web page, or does it represent the concept or object the page describes? For example, is "http://www.starwars.com" meant to represent the "Star Wars" films, or just the Web page?

Some developers also disagree about whether the Semantic Web should rely more on rules or on ontologies. Critics also say that the project is extremely impractical. For one thing, people don't really think in terms of the graphs that RDF uses. For another, it's unlikely that businesses and existing sites will actually devote the time and resources needed to add all the necessary metadata. In the future, off-the-shelf software may include options for adding metadata when creating new documents, but even such a tool may not make the project feasible on a larger scale.

