Sunday 8 June 2014


Visual Tools for Interacting with Large Networks

McGill University
School of Information Studies


VIDEO

     OVERVIEW: Useful real-world networks tend to be large and complex, which makes them difficult for humans to browse and navigate. Visual interfaces can mitigate this problem, but these tools inevitably suffer from scalability issues, which have led to the development of various clutter reduction techniques such as sampling and filtering. We present and discuss ongoing work on visual tools for information exploration and retrieval using large semantic ontology networks (e.g., Library of Congress Subject Headings, Medical Subject Headings, personal information folder structures). These tools aim to help searchers describe and recognize the information they seek, and discover previously unknown and valuable topics.

READINGS:
    Ellis, G., & Dix, A. (2007). A taxonomy of clutter reduction for information visualisation. IEEE Transactions on Visualization and Computer Graphics, 13, 1216-1223. 
    Gruber, T. (2008). Ontology. In L. Liu & M. T. Özsu (Eds.), Encyclopedia of Database Systems. Springer-Verlag.
    Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., & Giannopoulou, E. (2007). Ontology visualization methods - a survey. ACM Computing Surveys, 39(4, article 10), 1-43.
    von Landesberger, T., Kuijper, A., Schreck, T., Kohlhammer, J., van Wijk, J. J., Fekete, J. D., & Fellner, D. W. (2011). Visual Analysis of Large Graphs: State-of-the-Art and Future Research Challenges. Computer Graphics Forum, 30(6), 1719-1749.

28 comments:

  1. I find the idea of pruning semantic hierarchies very useful. My question for Professor Julien is whether his research has found an optimal level of detail or “sweet spot” to maximize ease of use for searching and browsing.

    Replies
    1. Good question. We looked into some of the PIM literature for an idea. Nothing clear enough to use in the prototype for now. We resorted to letting the user choose to prune and grow as needed. Would be nice to know what an optimal tree looks like for any given user/task context. Perhaps someone else knows more.

    2. Maybe this is the optimal solution, to let the user decide the 'resolution' adequate to her search task! Perhaps, there may be no objective "sweet spot" after all, when we consider the task at hand and the uses of the technology. Or maybe there are many optima. What do you think, Professor Julien?

    3. That's my assumption: we don't really ever know what a specific user wants to do and how he/she will go about doing it. Make something they recognize, give new options...reveal them gradually so as not to overwhelm.

  2. Is there any chance to play with the demonstrated applications? I did not find anything online. I may change my mind about hierarchies after that.

    Replies
    1. No. It's a C# application on a few PCs. Got a small grant to produce a Web version. It's moving forward but programming takes a lot of time. Will let you know when there's an online beta.

  3. In the presentation, when ontologies were discussed, what was actually discussed was hierarchies. From what I have learned, an ontology is not just a hierarchy. Taxonomies are hierarchies. In my view, a taxonomy is a "sub-form" of ontology that does not have all the properties of an ontology.
    Ontologies are hierarchies with explicit relations between concepts. (E.g., a mother is a woman who has at least one child. In a taxonomy, we cannot represent the "at least one child" part.)

    In that case, if we have an ontology in the sense above, it seems to me that we can lose a lot of semantics in the network when concepts are "filtered" to reduce the network. (E.g., we remove all the instances of mother, move them into the concept of woman, and delete the concept of woman.)

    Replies
    1. Taxonomy, ontology, hierarchy, polyhierarchy, tree, DAG... I am also told that thesaurus would be a better term for LCSH and MeSH. Much to learn.

      "lose a lot of semantics in the network"

      That is quite possible. We pay for the reduction of the structure with a loss of semantics. I don't know whether the semantics have the same value if, for example, the node offers no additional access to the collection and no grouping. It will depend on the task/user/context case.

  4. I encourage you to continue your research. I think you are on a good path toward visualizing information in another way. Thank you for showing me the patterns and structures we can use to organize information.
    Interacting with information is essential. Thanks, Professor Charles-Antoine Julien.

  5. I don't know much about this topic, but I was wondering what kinds of hierarchies this approach would be most useful for. Intuitively, I would think that choosing the topics strategically would eliminate a lot of the complexity of hierarchies, i.e., if a topic contains only one subtopic, it shouldn't be a topic. So I guess my question is: what kind of tree structures is this tool most useful for? Is this approach mainly useful when the hierarchy is automatically generated, or when the topics are not chosen in a smart way?

    Replies
    1. I don't really think we know a priori the specifics of what is a "smart way" to draw any kind of hierarchy that can have very different shapes, or graphs in general. We might be able to draw a good starting point, but we have to expect the user will eventually need to manipulate the set in various ways. This is a data carving or tree pruning tool. It carves to reveal the areas that provide the most access to the collection. You can always uncarve. Might not be useful for some collection/structure combinations. LCSH and MeSH certainly yielded dramatic size (and BC) reduction with little or no loss of collection access. I suspect PIM folders might be similar. We can do a lot of work using those datasets, but there are plenty of other organized collections.

      We haven't tested automatic hierarchies yet. I'd be curious to try...a few. Compare BC scores and user tests. So much data, so little time.
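
The carve/uncarve idea described in this reply can be illustrated with a minimal Python sketch. It is not the prototype's code (which, per the thread, is a C# application): the `Topic` class, the `prune` function, and the `min_docs` threshold are hypothetical names, and folding a small branch's documents into its parent is only one assumed pruning rule among many possible ones.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A node in a topic tree (hypothetical structure for illustration)."""
    name: str
    docs: set = field(default_factory=set)        # documents indexed directly here
    children: list = field(default_factory=list)  # narrower topics


def reachable(topic):
    """All documents reachable from this topic or any narrower topic."""
    found = set(topic.docs)
    for child in topic.children:
        found |= reachable(child)
    return found


def prune(topic, min_docs):
    """Return a pruned copy of the tree.

    A child branch that gives access to fewer than `min_docs` documents is
    removed and its documents are folded into the parent, so the tree shrinks
    without losing access to any document.
    """
    pruned = Topic(topic.name, set(topic.docs))
    for child in topic.children:
        docs = reachable(child)
        if len(docs) < min_docs:
            pruned.docs |= docs                      # carve: fold branch into parent
        else:
            pruned.children.append(prune(child, min_docs))
    return pruned


def size(topic):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in topic.children)


# Toy data, invented for this example.
root = Topic("Science", set(), [
    Topic("Physics", {"d1", "d2", "d3"}, [Topic("Optics", {"d4"})]),
    Topic("Alchemy", {"d5"}),
])
small = prune(root, min_docs=2)
```

Because `prune` returns a copy, the original tree is untouched; "uncarving" in this sketch simply means pruning again from the original with a lower threshold.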

  6. There's a bet here: that people will begin to use trees to find data other than their own if it's well done. I wouldn't make that bet. But I have a feeling that if they did, it would be because there would be some specific interest in browsing and discovering the tree independent from the motivation of finding documents.

    Replies
    1. Sure. It certainly might be useful in that case. I think it should be the user's call.

  7. Given the amount of information accessible today, I find Charles-Antoine Julien's proposal very practical and useful. Letting the user decide seems essential to me, so as not to get lost in the quantity of information. However, I think it would be interesting to also take into account the proposal of Francis Heylighen, who suggested sorting by probability.

    Replies
    1. A composite score or dynamic index could include several factors to determine the importance of a node to the network. The problem for me is that the user has less control over the individual factors if they are combined a priori. At most, we could have an additional slider for sorting by probability.

      Then there is the DOI (degree of interest). I read Furnas's paper yesterday. The value decreases with distance from the current node. That holds up well in the physical-world analogy he uses; I'm not sure it is true in an online semantic world where the costs of transport and movement are apparently much lower. One more slider. There is a risk of overload. There could be a combined index for beginners, and an advanced interface to control the individual factors.
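
The DOI (degree of interest) mentioned in this reply can be sketched as follows. Furnas's fisheye formulation is usually written as DOI(x | focus) = API(x) - D(x, focus), where API is an a priori importance and D is a distance to the current focus. The sketch below assumes a tree, takes API(x) to be minus the depth of x, and adds a hypothetical `distance_weight` parameter standing in for the extra slider discussed above. None of these names come from the prototype.

```python
from collections import deque


def distances(adj, start):
    """Breadth-first distances (in edges) from `start` to every node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def degree_of_interest(adj, root, focus, distance_weight=1.0):
    """DOI(x) = API(x) - distance_weight * D(x, focus), with API(x) = -depth(x).

    `distance_weight` is an assumed knob: lowering it models a setting where
    moving away from the focus is cheap.
    """
    depth = distances(adj, root)
    from_focus = distances(adj, focus)
    return {n: -depth[n] - distance_weight * from_focus[n] for n in adj}


# Toy tree as an undirected adjacency list, invented for this example.
adj = {
    "root": ["A", "B"],
    "A": ["root", "A1", "A2"],
    "B": ["root", "B1"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}
doi = degree_of_interest(adj, root="root", focus="A1")
visible = {n for n, score in doi.items() if score >= -4}  # display threshold
```

With the focus on `A1`, every node on the path from the root to the focus gets the same score, siblings and nearby branches score lower, and the distant leaf `B1` falls below the threshold and is hidden.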

  8. The user has to adjust the visibility of the network on every use, and this is harder if the user does not know where the word being searched for is located!

    Replies
    1. The last position of the tree is shown. There is no reason not to offer the option of adding bookmarks/landmarks.

      Integration with search is the current problem. It is not clear exactly what the search box should do with the tree. There are options.

  9. The user is obliged to adjust the visibility of the network on each use, and this is more difficult if the user does not know where the word is!

  10. The idea of reducing the size of the network at the user's request is interesting. However, I find that the idea of balancing the tree using the BC (Browser Complexity) formula will not meet the user's needs and search habits (if we reduce parts that contain the term being sought)!

    Replies
    1. BC is a starting point for comparing trees and their simplified versions. I would have to track it down again, but I seem to recall we have indications that balanced trees are easier to navigate. Depth seemed obvious to us. There could also be redundancy for polyhierarchies (nodes with several parents). We would definitely need user tests to verify. It would probably need to be longitudinal to really know.

      Broadly, BC is about making navigation decisions easier (a smaller, more predictable number of narrower-term options) and reducing the number of navigation decisions. I think this fits within an information foraging environment.

  11. The idea of reducing the network size (at the user’s request) is quite interesting, but I find that balancing the tree with the BC (Browser Complexity) formula is probably not going to meet the user's needs and habits (if we reduce the parts of the network that contain the term we are looking for)!

  12. Dr. Julien's network trees are impressive and I imagine they would be very useful if combined with a search function. However, are the time and energy required to make an ideal network tree worth the potential benefit? Dr. Börner stated that the volume of online documents doubles every 1.5 years. Organizing this massive influx of documents into a network tree may require much more time and work than writing a super-effective search algorithm for a flat database.

    Replies
    1. That's always a good question. I think the question you're asking is whether we should even continue trying to organize some information collections. I suspect some may be worth it. Government documents seem like a good candidate. I don't think it is necessary to organize all documents. From my perspective, my LCSH and MeSH collections are real-world data sets that say something about how we structure and access what we think we know. Still thinking about how exactly search should interact with the structure.

  13. Dear Charles-Antoine, thank you very much for sharing your methods! I have one question: how feasible is it to extract a dynamic visualization of subsumption between concepts in ontologies, in the context of text mining for ontology acquisition? Could you recommend tools and readings?

    Replies
    1. Couldn't say, sorry. You're looking at automatic data or text mining for topic acquisition. I come in once you have a large semantic network with instances...how this was built is not my current concern.

  14. Thanks for the talk. This is a somewhat late response, now that I have reviewed the video. Compressing the tree structure of the ontology could actually be approached with machine learning.
    Every single trajectory on the tree has a certain frequency of being traversed. It seems that the way a certain community of researchers will normally access the tree can be modeled using Bayesian methods. Researchers can be clustered into communities using keywords for their fields of interest, and the optimal compression of the tree can be inferred from the Bayesian model of accessing and traversing. For really big trees, letting the user choose is only half the solution; the system also needs to help orient the user toward the level of detail he is most likely to find useful. Simply put, sampling the habits of people with certain fields of interest can yield valuable information about how one would like to see the tree.

    Replies
    1. I guess you're assuming we have a large log of past tree browsing behaviors to model future behavior (i.e., certain frequency of being traversed). I don't have that for any of the trees I've been given...and my issue was initially that no one was traversing the tree...thus the need for an easier interface to browse the tree.

      We've done an experiment with a Random Walk approach based on the document accessibility. Gave different results but it's hard to know which is best.

  15. I thought your pruning technique was a very elegant solution to a complex problem. The question of developing ontologies that are intuitive to navigate is an exceptionally difficult one, given that different people will find different assignments of instances intuitive. In terms of integrating search, might I suggest that you give the option to start with keywords that limit the number of nodes on display, after which point you give the option of pruning down the hierarchy? This would maintain the potential benefits ("learning something along the way") of navigating the hierarchy.
