Sunday 8 June 2014

The Semantic Web: the inside story

JIM HENDLER
Rensselaer Polytechnic Institute
Department of Computer Science 

VIDEO

OVERVIEW:  In this talk I look at the Semantic Web idea of adding knowledge to the Web in ways compatible with machine processing. Since emerging in the late 90s, the languages, usage, and uptake of semantic technologies have been growing steadily. I'll discuss the genesis of this idea, some key steps in its history, and current usage. I also pose a challenge: having far surpassed the original vision, how do we continue to use and grow the Semantic Web?

READINGS:
    Hendler, J., & Berners-Lee, T. (2010). From the Semantic Web to social machines: A research challenge for AI on the World Wide Web. Artificial Intelligence, 174(2), 156-161.
    Shadbolt, N., Hall, W., Hendler, J. A., & Dutton, W. H. (2013). Web science: A new frontier. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1987), 20120512.
    Hendler, J. (2014). Big data meets computer science. Journal of Computing Sciences in Colleges, 29(6), 5-6.

34 comments:

  1. Even if every webmaster publishes an RDF representation of their site so that search engines can use that information, there is still the question of trust. How can we trust the information a site publishes?

    Replies
    1. Why is this different from the issue of how to trust what is in the text (or video) of the Web? It's just another way to present information, but the info providers may still not be trustworthy.

  2. This comment has been removed by the author.

    Replies
    1. I guess the solution is the automatic generation of semantic metadata - something mentioned in 'The Unreasonable Effectiveness of Data' article (http://www.computer.org/csdl/mags/ex/2009/02/mex2009020008-abs.html). It seems Google is going in that direction (?).

    2. My point is Google has moved away from automatic generation; they are using schema.org so webmasters embed the information - it is more accurate that way.

  3. Already, web developers are finding the syntax of HTML too verbose-- projects like Markdown are becoming more popular, where users can make webpages with almost plain text instead of HTML. The example from schema.org looks so bulky with all the properties in every tag. It seems to me that developers are tending toward less typing vs. more-- why do you think most developers will be willing to use a more verbose syntax?
    Using JSON might solve the problem, but there's a saying in computer science-- "The best code is the code you don't have to write." I know Google and Facebook might love the idea of the semantic web, but I'm not sure most web developers will think the extra work is worth it.
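
    To make the comparison concrete, here is a minimal sketch (my illustration, not something from the talk) of publishing the same schema.org facts as one compact JSON-LD document instead of per-tag microdata attributes. It assumes Python's rdflib (version 6+, which bundles a JSON-LD serializer); the movie URI is invented:

```python
# Hypothetical example: schema.org data as JSON-LD rather than inline microdata.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("http://schema.org/")
movie = URIRef("http://example.org/movies/avatar")  # made-up identifier

g = Graph()
g.add((movie, RDF.type, SCHEMA.Movie))
g.add((movie, SCHEMA.name, Literal("Avatar")))
g.add((movie, SCHEMA.director, Literal("James Cameron")))

# One JSON-LD block can sit in a single <script> tag, leaving the HTML clean.
print(g.serialize(format="json-ld",
                  context={"@vocab": "http://schema.org/"}, indent=2))
```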

    Replies
    1. I think that web developers will be motivated to include the extra syntax if there's a pay off in terms of reaching their target audience better, and being able to serve customers more effectively. In the same way that people modified their web pages once they learned how Google's page rank system worked, I think people could also be motivated to adopt the extra syntax of the semantic web. Today, it seems like showing up high in google's searches translates very well to monetary value, so if using additional tags from schema.org helps to make your website more accessible to people, then there will be a strong incentive for web developers to adopt this behaviour.

    2. Nicole is right, the payoff is in better search placement - I should have made that clear - will add it to my next version of the talk

  4. What motivation do webmasters have to publish an RDF representation of their site? Is it simply so that search engines can use that information?

    Replies
    1. There is also probably a matter of building an ecosystem that can be mined later on. Obviously this sets up a dynamic of cooperation, but there is probably a way to "punish" free riders.

    2. I think this is the same answer as above - better search placement corresponds to better commerce (or better visibility).

  5. What about privacy in the semantic web? Is it possible to choose some resources and links to keep hidden, or to give them non-explicit names?

    Replies
    1. I think it's mostly a legal matter. As Prof. Golbeck said yesterday, there's a difference between the Anglo-Saxon world and continental Europe. Intellectual property law here allows you to give all the rights over the content you create to a third party (that's what you do when you accept service agreements with Google and Facebook). However, it's different in continental Europe, where you keep some rights over your creation – hence the "right to be forgotten".

      In practice, however, companies mostly apply the US model, regardless of the local law.

    2. The paradox, I think, is that LD could provide some answers to the privacy problem, because it could allow users to have centralized privacy settings and to list the websites where they have deposited content. The industry has come up with some ideas – e.g. OpenID – but it hasn't picked up.

      I think what we lack is philosophical. The dangers are obvious, but we still lack the hermeneutical resources to address them appropriately.

  6. Dear Jim, thank you very much for your challenging presentation! I have a more precise technical (and not new) question: among the knowledge representation methodologies from AI, how does one choose between Prolog and Semantic Web technologies? Is Prolog still useful for reasoning? Big projects like Watson at IBM still use it... Why not choose only ontologies, be completely in, and contribute to the Semantic Web?

    Replies
    1. Prolog is a language for processing information; it's not an annotation format. RDF, SPARQL, and OWL are, in essence, KR languages: they specify the relationships between and among entities and classes. Many Sem Web tools use Prolog, although there are scaling issues -- Watson, for example, used Prolog to specify some rules to improve parsing, but not as a primary reasoner in its heuristics because of this. So really there's a place for both.
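
      To illustrate the distinction, here is a minimal sketch (my example, not from the talk): RDF triples annotate entities and classes, and SPARQL queries them declaratively. It assumes Python's rdflib; the vocabulary URIs are invented:

```python
# Hypothetical example: RDF as a KR/annotation format, SPARQL as the query side.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # made-up vocabulary

g = Graph()
g.add((EX.Researcher, RDFS.subClassOf, EX.Person))  # class-level knowledge
g.add((EX.hendler, RDF.type, EX.Researcher))        # instance-level facts
g.add((EX.hendler, EX.worksAt, EX.RPI))

# The query, like the data, declares relationships rather than procedures.
q = """SELECT ?person ?org WHERE { ?person <http://example.org/worksAt> ?org . }"""
for person, org in g.query(q):
    print(person, org)
```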

  7. 1. How is the Semantic Web going to deal with evolutionary drift, i.e. the changes in the meaning of words that take place in all natural languages? Is it reasonable to think about tools that dynamically update semantic definitions? In any case the drift will be considerably slow, but nevertheless significant.

    2. Does cleaning up linked data necessarily require human intervention, or can it be automated, at least to a certain extent, by comparing multiple instances for example?

    3. In view of the Internet of Things, do you see the Semantic Web progressively extending into the physical dimension, with every physical object being tagged with RDF tags?

    4. Can the semantic web be extended to account for temporal processes or will it always be confined to static objects and relations?

    Thanks for the talk!

    Replies
    1. With respect to semantic drift, automatic cleaning of linked data and, at the extreme, learning semantic data automatically (either from structured HTML or unstructured text), there is a W3C proposal to extend RDF with a context (http://www.w3.org/2011/rdf-wg/wiki/RDFwithContexts), where a context is understood as "a social agreement about the intended meaning of some vocabulary of [URIs]". Another related proposal suggests using N-Quads instead of N-Triples, adding a context element as a fourth component of each statement.

      While I am not sure I understand all the technical details of these proposals, introducing a contextual element into the RDF triple seems logical considering that a) the meaning of words is contextual; b) the meaning of words is not 'written in stone' and changes over time. The context does not need to be a predefined string - it can be a reference (or set of references) to other triples in an RDF graph (of the same or a different namespace). I think this comes closer to natural languages, where words are recursively defined by other words. There is an article that proposes the above and gives a formal description: http://arxiv.org/abs/1006.1080.

      I would like to ask Jim Hendler what he thinks about these proposals and ideas. In my view they may be very useful in making the Semantic Web more widespread, enabling automatic construction of RDF metadata, etc. I would appreciate any comments and responses.
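
      As a minimal sketch of the quad idea (my illustration, not taken from the cited proposals): the fourth element names the context a statement belongs to, which is one place a time- or community-specific "social agreement" about meaning could be recorded. It assumes Python's rdflib; all URIs are invented:

```python
# Hypothetical example: N-Quads carry subject, predicate, object, context.
from rdflib import Dataset, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")
ctx_2014 = URIRef("http://example.org/context/2014")  # made-up context URI

ds = Dataset()
g = ds.graph(ctx_2014)  # a named graph standing in for the context
g.add((EX.awful, EX.informalSense, Literal("very bad")))  # a drifted word sense

# Each serialized line ends with the context URI -- a quad, not a triple.
print(ds.serialize(format="nquads"))
```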

    2. I wrote a long reply to this which hasn't shown up -- I'm hoping it is something to do with moderation and it will show up eventually - the short answer is that these are questions being explored in the research community, with many papers being written about different aspects.

  8. You asked: who cares if we all speak a different language? Is that your point of view, or do most researchers think the same? Isn't there any desire for a unified semantics?

    Replies
    1. To be clear, I was contrasting the view that says "just link data, don't worry about ontologies" with the Tower of Babel. We need some semantics. On the other hand, the more agreement needed, the harder it is to get, so the Sauron tower needs too much. My argument is that success will lie somewhere in the middle - schema.org is a start, but very simplistic -- my challenge is how we do better.

  9. This was a wonderful talk! My question for Professor Hendler concerns the flexibility of ontologies on the semantic Web. Academic research often leads to a reclassification of the ontologies we use to make sense of the world. Classical mechanics, for instance, in effect abolished the long-held ontological distinction between sub- and supra-lunar ontologies. Something similar can be said about general relativity and our conception of space and time. Examples abound. My question is, how flexible are Web ontologies? Can they accommodate such changes?

    Replies
    1. There are versioning mechanisms in the OWL language to permit ontology update, but not very good social or technical means for using them in practice - a great topic for a thesis!
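
      For the curious, here is a minimal sketch (my illustration; the ontology URIs are invented) of the OWL versioning annotations mentioned above: an ontology header can point to its prior version, declare incompatibility, and flag deprecated terms. It assumes Python's rdflib:

```python
# Hypothetical example: OWL 2 versioning and deprecation annotations.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import OWL, RDF

onto_v2 = URIRef("http://example.org/physics/2.0")  # made-up ontology URIs
onto_v1 = URIRef("http://example.org/physics/1.0")

g = Graph()
g.add((onto_v2, RDF.type, OWL.Ontology))
g.add((onto_v2, OWL.versionInfo, Literal("2.0")))
g.add((onto_v2, OWL.priorVersion, onto_v1))
g.add((onto_v2, OWL.incompatibleWith, onto_v1))  # a real reclassification

# A class whose ontological status was abolished can be kept but flagged:
g.add((URIRef("http://example.org/physics#SupralunarRegion"),
       OWL.deprecated, Literal(True)))

print(g.serialize(format="turtle"))
```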

    2. It could be useful to use statistically driven machine learning techniques, like clustering, to edit existing ontologies based on these shifts, allowing them a greater degree of flexibility in the face of change.

  10. Thank you for this nice talk.
    The Web is changing with the Semantic Web, and this is going to create a new formalism for storing knowledge. Do you think, Professor Hendler, that we can use all this mass of knowledge (extracted from ontologies and linked data) to promote a rise of general intelligence (like IBM's Watson, but more cognitive)?

    Replies
    1. The Semantic Web will let us create a more expressive "link space" -- the web graph is an unlabeled graph, the semantic web adds many labels. How these can be used for AI/cognitive research remains an open and active topic. My personal belief is they will be a very useful technology, especially if coupled with the sort of language and network research Dr. Han discussed.
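
      As a toy contrast (my sketch, not the speaker's) between an unlabeled web-graph link and labeled Semantic Web links, assuming Python's rdflib with invented URIs:

```python
# Hypothetical contrast: an unlabeled web-graph edge vs. labeled RDF edges.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

# Web graph: all we know is that one page links to another.
web_edges = [(EX.pageA, EX.pageB)]

# Semantic Web: each edge carries a meaningful label (a predicate).
g = Graph()
g.add((EX.pageA, EX.cites, EX.pageB))          # not just "links to" but "cites"
g.add((EX.pageA, EX.disagreesWith, EX.pageB))  # multiple labeled relations

for s, p, o in g:
    print(s, p, o)
```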

  11. With the exponential increase in online data, will the semantic web help filter out redundant and obsolete data? How many copies of online data currently exist?

    Replies
    1. There are many copies of many things, and no one really knows the answer - certainly the archives of Facebook and Google contain many copies of the same information -- and the web archive contains tons of copies of things, many with semantics. I don't know of people working to filter this; the current approach is just to suck it all in and use bigger and bigger server farms to process it -- sooner or later that won't scale, but people seem to think we have a pretty long way to go before we hit the limits.

  12. Was the Web a product of its time? If Tim Berners-Lee had not initiated the Web, do you think someone else would have built a similar framework, or would we still be living in a non-digital age? Are there many different ways the Web could have been implemented, or is Sir Tim's framework special in some way?

    Replies
    1. I actually asked him, Wendy Hall, and some others that question once -- in the late 70s and early 80s some other people had some of the technical ideas already (Ted Nelson in particular; check out "Xanadu"). Synthesizing the many answers I got, the feeling was that it took someone with Tim's genius to overcome the technical, social, and socio-political obstacles and make this work (for example, Tim had been influenced by the open software movement and thus made an open-source Web, while others had been stumbling over how to monetize it). I don't think we'd be non-digital without it, but I often wonder what things would look like if he hadn't come along -- I think he deserves all the credit he gets - the framework was a product of its time, but also of his genius. (And he was the only one who got offended when I called him a genius :-))

  13. If your question doesn't have an answer from me, it is because the system seems to have periodically decided that I was a robot (I don't think I am). I hope some of the long answers I typed will be found by a moderator and added - if not, and there is no answer to your question, please feel free to reach out to me via email or Twitter (@jahendler) and I will get you an answer.
