Tuesday, February 14, 2012

Calais: Not just cross-channel ferries!

Open Calais employs semantic web technology to identify key elements on your webpages. It is sponsored by the Thomson Reuters news agency, who are committed to keeping Calais open - as in open source. Here's a brief intro.

The basic idea is that if we want the web to 'read' our pages so it can help us connect ideas, then we have to give it something to work with. Semantic proxy takes the text and identifies proper nouns, category areas and syntactic relationships.

For example, my blog entry reviewing Hunston's "Corpus Approaches to Evaluation" produces this table for the semantic content areas. Interestingly, when I tried it just yesterday, the last 4 entries here came first and had 3 stars.

The semantic proxy version gives almost the same results, but is not as pretty. They are a little clearer on what they do. This one does what it says on the tin:

Perhaps more telling is the chart for people, companies, and events, below. Here there is an attempt to tag all names, presumably to make the dream of a semantic web (see here and here) just a little closer.

On close inspection, however, we can notice a number of errors here. John McH Sinclair loses his surname. The term Modal-Like Expressions in the review is abbreviated to MLEs, which is recognised here as a company. Again, yesterday's results also included Bernstein (as in Basil) as a company. None of the books or articles in the references is recognised as a publication.

The final set of results is probably the one with the most errors. Here there is an attempt to match semantic relations between the entities in the text, using simple predication, see left. Subjects, objects and verbs are tied together under the misleading title of generic relations. For instance 'Nick Moore' is identified as a subject where the object is 'a reviewer for a number of journals' and the verb is identified as 'be.' The only other subject-agent listed here is Susan Hunston, who 'very similar patterns' (object) 'produce' (verb) or 'the methodology of the chapter' (object) 'apply' (verb). Susan Hunston (subject) also does intransitive things such as 'investigate' (verb). The items listed as Quotations are not quotations in the original article, while real quotations such as “one function of evaluation is to reify texts and propositions by assigning them an epistemic status.” are not recognised as such (unless they mean something else by quotation).
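For anyone who wants to play with results like these, here is a minimal sketch of how the subject-verb-object relations described above could be held in a small data structure. The `Relation` class and the `by_subject` helper are my own hypothetical names for illustration, not Calais's actual output format or API, and the entries are simply the examples quoted in this post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Relation:
    """One 'generic relation': a subject, a verb, and an optional object.

    Hypothetical structure for illustration only; it does not mirror
    the format Calais itself returns.
    """
    subject: str
    verb: str
    obj: Optional[str] = None  # None when the verb is used without an object


# The relations quoted in the paragraph above.
relations = [
    Relation("Nick Moore", "be", "a reviewer for a number of journals"),
    Relation("Susan Hunston", "produce", "very similar patterns"),
    Relation("Susan Hunston", "apply", "the methodology of the chapter"),
    Relation("Susan Hunston", "investigate"),
]


def by_subject(rels: List[Relation]) -> Dict[str, List[Relation]]:
    """Group relations by their subject, keeping the original order."""
    grouped: Dict[str, List[Relation]] = {}
    for rel in rels:
        grouped.setdefault(rel.subject, []).append(rel)
    return grouped


grouped = by_subject(relations)
for subject, rels in grouped.items():
    for rel in rels:
        print(subject, "|", rel.verb, "|", rel.obj or "(no object)")
```

Laid out like this, it is easy to see that only two subjects carry all the relations, and to pick out the object-less entries for a closer look.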

Overall Grade: C+. Keep trying!!!

