Sunday 8 June 2014


Social and Semantic Web: Adding the Missing Links

INRIA Research Center of Sophia-Antipolis

VIDEO



OVERVIEW: Since the mid-90s the Web has re-opened in read-write mode and, almost as a side effect, paved the way for numerous new social media applications. Today, the Web is no longer perceived as a document system but as a virtual place where persons and software interact in mixed communities. These large-scale interactions create many problems -- in particular, reconciling the formal semantics of computer science (e.g. logics, ontologies, typing systems, etc.), on which the Web architecture is built, with the soft semantics of people (e.g. posts, tags, status, etc.), on which the Web content is built. Wimmics, among other research labs, studies methods, models and algorithms to bridge formal semantics and social semantics on the Web. We focus on the characterization of typed graph formalisms to model and capture these different pieces of knowledge, and on hybrid operators to process them jointly. This talk will describe the basics of semantic web formalisms and introduce different initiatives that use these frameworks to represent, reason on, and support social media and social applications on the web.

READINGS:
    Nicolas Marie, Myriam Ribiere, Fabien Gandon, Florentin Rodio, Discovery Hub: on-the-fly linked data exploratory search, Proc. of I-Semantics 2013, Graz, Austria, 2013
    Michel Buffa, Nicolas Delaforge, Guillaume Erétéo, Fabien Gandon, Alain Giboin, Freddy Limpens, ISICIL: Semantics and Social Networks for Business Intelligence, Proc. of SOFSEM 2013, pp. 67-85, 2013
    Nathalie Aussenac-Gilles, Fabien Gandon, From the knowledge acquisition bottleneck to the knowledge acquisition overflow: A brief French history of knowledge acquisition, International Journal of Human-Computer Studies, Volume 71, Issue 2, pp. 157-165, February 2013
    Guillaume Erétéo, Fabien Gandon, Michel Buffa, SemTagP: Semantic Community Detection in Folksonomies, Proc. of IEEE/WIC/ACM International Conference on Web Intelligence, Lyon, France, August 2011
    Freddy Limpens, Fabien Gandon, Michel Buffa, Helping Online Communities to Semantically Enrich Folksonomies, Proc. of Web Science Conference, Raleigh, NC, USA, April 2010
    Guillaume Erétéo, Michel Buffa, Fabien Gandon, Olivier Corby, Analysis of a Real Online Social Network using Semantic Web Frameworks, Proc. of International Semantic Web Conference (ISWC'09), Washington, USA, October 2009


33 comments:

  1. Most of my slides are available online on Slideshare:
    http://www.slideshare.net/fabien_gandon/presentations

  2. Reminiscent of factor identification in Factor Analysis.

    ReplyDelete
    Replies
    1. Indeed, here we make an additional hypothesis: since we are looking at epistemic communities, we consider that one thing that binds them is the interests they share, and therefore, thanks to a thesaurus, we generalize the tags used by the end-users to suggest shared interests.
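
      As a minimal sketch of that idea (plain Python with a hypothetical toy thesaurus; not the actual algorithm from the talk), tags are generalized by walking up "broader term" links, and the generalized tags shared by all members suggest the community's interest:

      # Hypothetical "broader term" links extracted from a thesaurus.
      BROADER = {"jazz": "music", "rock": "music",
                 "guitar": "music", "music": "culture"}

      def generalize(tag):
          # Yield the tag and all its ancestors in the thesaurus.
          while tag is not None:
              yield tag
              tag = BROADER.get(tag)

      def shared_interests(tag_sets):
          # Intersect the generalized tags of all community members.
          generalized = [{t for tag in tags for t in generalize(tag)}
                         for tags in tag_sets]
          return set.intersection(*generalized)

      # Three users with different tags still share an interest in "music".
      print(shared_interests([{"jazz"}, {"rock"}, {"guitar"}]))
      # -> {'music', 'culture'}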

  3. Really interesting talk. Gandon's work on detecting and labeling communities (discovering the rationale and shared interests within the communities vs. just looking at the links that exist) reminds me of Les Carr's comment in one of the scientometrics presentations about trying to find a social realist application from the analysis. It would be cool if the speakers who work with scientometric information used these semantic tools.

    Replies
    1. In our case, two immediate applications are: (1) assisting community managers in understanding and animating their communities, and (2) using these labelled communities for filtering, routing, and notification of new and existing content.

  4. Would the emotion sensor suit you, Clélia, for mind reading?

    Replies
    1. Reminds me of the 'bowing' symbol used in Japanese typing and emoticons.

    2. Yes, that would help users a lot!

  5. We can create our own RDF about a particular domain, and thanks to that we get today's Linked Open Data. With all these RDF repositories, are there currently "web crawlers" that gather together ALL the RDF repositories and build one big RDF repository (a kind of Google, but for the semantic web)?

    Ex.
    Suppose depot1 (www.site1.com) holds the following RDF triple:
    www.site1.com/Paris www.site1.com/villeDe www.site1.com/France

    And suppose depot2 (www.site2.com) holds the following triple:
    www.site1.com/Paris www.site2.com/nombreHabitant "2000000"

    I would expect the web crawler to combine the two triples, since both repositories talk about the same URI (www.site1.com/Paris) with different information. The central repository (which performed the crawl) should then hold the following two triples:
    www.site1.com/Paris www.site1.com/villeDe www.site1.com/France
    www.site1.com/Paris www.site2.com/nombreHabitant "2000000"

    Do technologies exist that do this (efficiently) today?

    Replies
    1. Yes and no :-)

      In fact there are crawlers, such as Sindice, that crawl the web looking for triples and offer a search engine over those triples, but not a single SPARQL access point over the collected data.

      In parallel, existing engines such as Google also crawl the Web looking for RDFa, notably to use it in "Rich Snippets" and adapt the display of search results to the types of the objects found. The Schema.org initiative, moreover, brings together several major search engines.

      However, the amount of data published today, and above all in the future, does not suggest for now that a single warehouse of all the collected data would be feasible; the current options are either thematic warehouses, or distributed architectures that split a query across several stores and stitch the pieces back together afterwards.
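
      To make your scenario concrete, here is a minimal sketch with rdflib in Python (using the hypothetical URIs from your example): a "central repository" is simply a graph into which the triples crawled from the two sources have been merged, after which one query sees both facts.

      from rdflib import Graph

      depot1 = '<http://www.site1.com/Paris> <http://www.site1.com/villeDe> <http://www.site1.com/France> .'
      depot2 = '<http://www.site1.com/Paris> <http://www.site2.com/nombreHabitant> "2000000" .'

      g = Graph()  # the central, merged repository
      g.parse(data=depot1, format="turtle")
      g.parse(data=depot2, format="turtle")

      # Both facts about the same URI now answer a single query.
      for p, o in g.query(
              "SELECT ?p ?o WHERE { <http://www.site1.com/Paris> ?p ?o . }"):
          print(p, o)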

  6. It was a very clear and interesting talk, thank you. I found this emotion-detecting system quite helpful, as I said. Also, I really liked the idea that we can ask the system to understand the links between what we choose and what the system proposes.

  7. While the top few Wikipedia contributors are bots, what percentage of all contributors are bots? It only makes sense that bots will win in terms of the number of articles published: we can program them to write a page on a topic and then let them publish it for every instance of that topic. I think that graph may mislead us about how much humans vs. bots contribute to Wikipedia.

    Replies
    1. 410 out of 21,759,825 named accounts are bots, according to Wikipedia.

    2. There are several other points to be made from that graph, in my opinion:
      (1) hybrid communities are not science fiction
      (2) even if there are more humans than robots, in terms of editing acts the robots impact millions of pages and users
      (3) it is not only about the number of robots but also about their ability to have a huge number of interactions with many users

      Even one robot in a community can be responsible for many interactions and activities, and can drastically influence the life of the community.

  8. The way I understand the Semantic Web, if we use an existing URI (for example, from FOAF) in our RDF data, is the link bidirectional or unidirectional?

    That is, does FOAF know that I exist? In that case, a SPARQL query on FOAF could also return the information from my RDF base.
    If not, is there a way, and how would one proceed, to make the links bidirectional?

    Replies
    1. FOAF is an ontology, a schema, for representing a social network. FOAF defines classes such as "Person" and relations such as "name" or "knows".

      The FOAF vocabulary can therefore be used to describe a person and their friends. A link such as "knows" is directed in RDF, and thus initially unidirectional, but a schema can also declare relations to be symmetric (using OWL) to indicate that a relation is bidirectional.

      c.f. http://xmlns.com/foaf/spec/#term_knows

      So when you assert (#me, knows, #you)
      the system knows #me and #you, it knows there is a "knows" link from #me to #you, and it will be able to answer SPARQL queries about that (it knows you, and it knows there is a link between you and me).

      On the other hand, in the absence of other knowledge it will not derive (#you, knows, #me), i.e. the inverse link is not inferred by default without additional knowledge.

      If tomorrow you declare in OWL that "knows" is a symmetric property, then the system will derive (#you, knows, #me)
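
      A minimal sketch of that last step, using rdflib and the owlrl reasoner in Python (the example.org #me/#you URIs are hypothetical, and note that the real FOAF schema does not declare "knows" symmetric):

      import owlrl
      from rdflib import Graph

      g = Graph()
      g.parse(data="""
      @prefix :     <http://example.org/#> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .
      @prefix owl:  <http://www.w3.org/2002/07/owl#> .

      :me foaf:knows :you .
      foaf:knows a owl:SymmetricProperty .   # the extra OWL declaration
      """, format="turtle")

      # Compute the OWL RL closure: (:you foaf:knows :me) becomes derivable.
      owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

      print(g.query("""ASK { <http://example.org/#you>
                             <http://xmlns.com/foaf/0.1/knows>
                             <http://example.org/#me> }""").askAnswer)  # True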

    2. I realize I phrased my question badly. It concerns the links between the URIs of two different RDF repositories.
      Ex.
      Suppose I have the URI www.dbpedia.org/class/City
      Then, in my own RDF repository, I declare www.monsite.com/Québec to be an instance of www.dbpedia.org/class/City.

      If I run a SPARQL query on DBpedia for all the instances of www.dbpedia.org/class/City, would I get Québec in my results?

      Is manually adding the entry to DBpedia the only way to have Québec in the results?

    3. Indeed, by default you will not get the resource in your results.

      Besides adding your resource to Wikipedia and waiting for it to be extracted into DBpedia, there are two main families of approaches (and of course, as always, hybrid solutions in between):
      - either you build warehouses that re-centralize the data sources you are interested in, so as to query them in an integrated way;
      - or you rely on distributed query processing systems able to split and dispatch your query to several sources and recompose the partial results.

      Each approach can in turn be specialized; for instance, in the case of distributed resolution:
      - some (standardized) approaches simply let you indicate which part of your query should be delegated:
      http://www.w3.org/TR/sparql11-federated-query/
      - other approaches (still R&D) analyze and split your query according to what they know about the content of the stores they are aware of, e.g. FedX:
      http://www.fluidops.com/fedx/

      This last point brings us to one final aspect: describing the content of the stores. A lightweight example of such an approach is the VoID schema, which notably lets you indicate the domain of a dataset:
      http://www.w3.org/TR/void/
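
      As a sketch of the standardized option, here is a SPARQL 1.1 federated query run with rdflib in Python (which has basic SERVICE support); the local triple reuses the hypothetical URI from your question, while the endpoint and the dbo:City class are DBpedia's:

      from rdflib import Graph

      g = Graph()
      g.parse(data="""
      @prefix dbo: <http://dbpedia.org/ontology/> .
      <http://www.monsite.com/Quebec> a dbo:City .
      """, format="turtle")

      q = """
      PREFIX dbo: <http://dbpedia.org/ontology/>
      SELECT ?city WHERE {
        { ?city a dbo:City . }                 # local repository
        UNION
        { SERVICE <http://dbpedia.org/sparql>  # delegated to DBpedia
          { ?city a dbo:City . } }
      } LIMIT 10
      """
      for row in g.query(q):
          print(row.city)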

  9. I was very interested by Dr. Gandon’s discussion of Web 3.0, and particularly by the increasing automation of database compilation, especially for Wikipedia entries. I was also struck by the interplay between formal and social semantics. How can we maximize the interplay and mutual enrichment of both semantics?

    Replies
    1. This is the subject of the field called "Social Semantic Web"
      http://en.wikipedia.org/wiki/Social_Semantic_Web
      You will find articles on that topic at ISWC, ESWC, WWW, ICWSM

      There are even books on the subject, such as the one by John Breslin et al.
      http://www.amazon.com/The-Social-Semantic-John-Breslin/dp/3642011713

      In this domain you will find very different kinds of approaches, e.g. using ontologies to capture and reason on social structures and resources; using social objects and traces to extract and populate knowledge bases; coupling the life cycles of representations in both worlds to foster them, for instance by suggesting tags from formalized thesauri; etc.

    2. This is extremely interesting! Thank you for the reply, Dr. Gandon.

  10. To link Goldstone's ideas into this talk, the creation of a folksonomy (and of language too) really depends on imitation. If everyone was creating their own hashtags on, say, Twitter, then a good folksonomy wouldn't be able to exist. There has to be a willingness to use things that other people have already made.

    Replies
    1. Yes; however, there are also approaches to align and link tags, concepts, and resources even if they use different terms at the beginning.

      For instance, every year the Ontology Alignment Evaluation Initiative runs a contest where different approaches for ontology alignment compete:
      http://oaei.ontologymatching.org/

      The approach I showed from the Ph.D. of Freddy Limpens in the ISICIL project with data from ADEME is also an example of techniques to link tags.
      http://www.slideshare.net/Freddy.Limpens/phd-defense-multipoints-of-view-semantic-enrichment-of-folksonomies
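
      As a minimal sketch of what such linking produces (rdflib in Python, with hypothetical example.org tags): free-form tags are related to each other and to thesaurus concepts with SKOS properties, which queries can then exploit.

      from rdflib import Graph

      g = Graph()
      g.parse(data="""
      @prefix :     <http://example.org/tags/> .
      @prefix skos: <http://www.w3.org/2004/02/skos/core#> .

      :semweb skos:exactMatch :semantic_web .  # same meaning, different term
      :rdf    skos:broader    :semantic_web .  # more specific tag
      :semantic_web skos:broader :web .
      """, format="turtle")

      # All generalizations reachable from the tag :rdf
      for row in g.query("""
      PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
      SELECT ?b WHERE { <http://example.org/tags/rdf> skos:broader+ ?b }"""):
          print(row.b)  # :semantic_web, then :web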

  11. Excellent and informative talk, thanks! One of the most interesting parts was about using crowdsourcing, as in CAPTCHA, to correct semantic maps. Given the huge task of creating a semantic web, we will need to recruit as many participants as possible. A semantic web will be an unprecedented quantum leap towards human-machine convergence and the prospect of a global brain.

    Replies
    1. The integration of crowdsourcing and the semantic web is a very promising direction with a growing community now, for instance the CrowdSem workshop: http://crowdsem.wordpress.com/

    2. This is reminiscent of something that Robert Goldstone mentioned in his talk about labelling galaxies. Human volunteers managed to do it the quickest and with the most accuracy. What's clever is that the user benefits from their involvement (i.e., refining their search), so it's a win-win for everyone involved.

    3. Yes, you are referring to GalaxyZoo:
      http://en.wikipedia.org/wiki/Galaxy_Zoo

      which is an example of Web-sourcing (crowd-sourcing on the Web) and Human-based computing:
      http://en.wikipedia.org/wiki/Human-based_computation

  12. Dear Fabien, thank you for your presentation! One question: what is the simplest way, in semantic web technologies, to formalize epistemic communities, epistemic operators, and the knowledge they are supposed to share? Thank you very much!

    Replies
    1. Several vocabularies exist to describe communities with semantic web formalisms.

      Among the most well-known are:

      FOAF: Friend Of A Friend
      It allows you to describe people, their interests, and the fact that they know each other.
      http://xmlns.com/foaf/spec/

      SIOC: Semantically-Interlinked Online Communities
      which captures traces of online community sites (forums, blogs, etc.) to describe the information communities have about their structure and contents
      http://rdfs.org/sioc/spec/

      RELATIONSHIP: A vocabulary for describing relationships between people
      http://vocab.org/relationship/.html

      Dublin Core: to describe the resources of the community
      http://dublincore.org/documents/dces/

      More generally speaking, if you are looking for a vocabulary you can use LOV to search, by keywords, the directory of schemas maintained at OKFN

      For instance for the keyword "community" you obtain:
      http://lov.okfn.org/dataset/lov/search/#s=community

      The site also gives you the different links and versions of the vocabularies. For instance for SIOC:
      http://lov.okfn.org/dataset/lov/details/vocabulary_sioc.html
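
      For instance, here is a minimal sketch with rdflib in Python (hypothetical example.org people) describing a small community with FOAF and querying for members sharing an interest, i.e. a simple epistemic community:

      from rdflib import Graph

      g = Graph()
      g.parse(data="""
      @prefix :     <http://example.org/people/> .
      @prefix foaf: <http://xmlns.com/foaf/0.1/> .

      :alice a foaf:Person ; foaf:name "Alice" ;
             foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> ;
             foaf:knows :bob .
      :bob   a foaf:Person ; foaf:name "Bob" ;
             foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> .
      """, format="turtle")

      # Members sharing an interest form a simple epistemic community.
      for row in g.query("""
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?name WHERE {
        ?p foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> ;
           foaf:name ?name . }"""):
          print(row.name)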

    2. Thank you very much, Fabien! I also found this reference: Grimm, S., & Motik, B. (2005, November). Closed World Reasoning in the Semantic Web through Epistemic Operators. In OWLED.
