Sunday 8 June 2014


You Can't Hide: Predicting Personal Traits in Social Media




JENNIFER GOLBECK
University of Maryland
Computer Science




Overview: People share a huge amount of personal information online. With over a billion people on social media, researchers now have new ways to predict a range of personal attributes that reveal how we live, think, and interact, even when people try to keep this information private. This presentation will cover the methods and results in this area and argue for the future science and policy these advances demand.

READINGS:
    Golbeck, J. (2013). Analyzing the social web. Newnes.
    Golbeck, J., Robles, C., Edmondson, M., & Turner, K. (2011, October). Predicting personality from Twitter. In Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom) (pp. 149-156). IEEE.
    Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15), 5802-5805.
    Golbeck, J., Robles, C., & Turner, K. (2011, May). Predicting personality with social media. In CHI '11 Extended Abstracts on Human Factors in Computing Systems (pp. 253-262). ACM.


18 comments:

  1. Very interesting and informative talk. Thanks!
    Do you mean trust in a general sense or context-dependent trust (like recommending movies for me)?

  2. Correlating word usage with intelligence: are there differences in the range and complexity of the vocabulary used, or even in sentence length?

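The two measures asked about in comment 2 are straightforward to compute. The sketch below is a minimal illustration, not the method used in the talk. It assumes a crude regex tokenizer and uses the type-token ratio (distinct words divided by total words) as a stand-in for vocabulary range. The function name `vocabulary_features` is hypothetical. Whether either measure actually correlates with intelligence is exactly the open question the comment raises.

```python
import re


def vocabulary_features(text: str) -> dict:
    """Compute two simple stylistic features of a text (hypothetical helper):
    - type_token_ratio: distinct words / total words (crude vocabulary range)
    - mean_sentence_length: average number of words per sentence
    """
    # Split into sentences on runs of ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens made of letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {"type_token_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        "type_token_ratio": len(set(words)) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
    }


print(vocabulary_features("I love curly fries. I love them a lot!"))
```

In the sample text, 7 of the 9 words are distinct, giving a type-token ratio of about 0.78, and the two sentences average 4.5 words each. The type-token ratio tends to fall as texts get longer, so a fair comparison between users would need samples of similar length.
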
  3. About "I love being a mom" being a predictor of low IQ: from a care-theory perspective, this is actually not surprising. Care is invisible work; it is the ultimate work that school does not prepare you for, and IQ tests were first designed to predict school performance. Add factors such as the devaluation of parenthood in work settings (especially for women), stereotype threat, and the feedback loop between stereotype-induced credibility deficits and the development of individuals' hermeneutical and cognitive resources. Together these give a rough picture of the system of forces that makes this result somewhat unsurprising.

  4. I found Professor Golbeck's talk extremely interesting. I am shocked that it has become possible to infer personal traits like intelligence from data that seems unrelated to them (such as liking curly fries on Facebook). I am worried about the digital "paper trail" we might leave by liking things that have nothing to do with the information later inferred about us from those same likes. How can we protect our personal information if this kind of inference is possible, other than by avoiding social media?

  5. Thanks for your terrific presentation! Do you think science blogs (for instance, http://www.nature.com/news/2006/060703/multimedia/50_science_blogs.html) contain information and attitudes that are necessarily more complex to analyze and predict than those on less serious, more common social media such as Facebook? Which are the most difficult to analyze?

  6. Thank you, Jennifer Golbeck. Your presentation was very interesting. It is very difficult to protect people on Facebook. Many companies collect information about us through social media (especially through the social web). That's true. We like to be published on the Internet, and sometimes the cost is too high.

  7. A really informative presentation.
    I was wondering whether there are (or will be) laws that limit the use of FB or Twitter data for matters such as employment, bank loans, etc.

    Replies
    1. For laws to exist, users have to file complaints about this kind of use.
      Moreover, on the web, things evolve faster than laws and case law.

  8. I'll post the question Prof. Golbeck raised so that other people can contribute: in terms of law, should we own our data? Is that the (or a) way to protect users?

  9. People's personalities come through in their words, but one may ask whether some people, behind a screen, construct a new personality, sometimes even a new identity. Because notions of space and time are different on social media such as Twitter or Facebook, it is easy to fabricate a new identity and to choose and control the image one wishes to project. To what extent can we judge the honesty of people who use social media?

  10. The effect size from the Facebook study is negligible (d = 0.001). Here is the text from the PNAS paper:

    "For example, the well-documented connection between emotions and physical well-being suggests the importance of these findings for public health. Online messages influence our experience of emotions, which may affect a variety of offline behaviors. And after all, an effect size of d = 0.001 at Facebook’s scale is not negligible: In early 2013, this would have corresponded to hundreds of thousands of emotion expressions in status updates per day."

    Replies
    1. Unless I am mistaken, an effect size with a Cohen's d that small is described as having "suggested low practical significance". While the study is certainly ethically dicey, its effect, at least with the given manipulation, does not keep me up at night.

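For reference, the standard two-sample form of Cohen's d divides the difference between the group means by the pooled standard deviation. This is the textbook definition. The exact computation used in the PNAS paper is not reproduced here.

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}

d = 0.001 \;\Longrightarrow\; \bar{x}_1 - \bar{x}_2 = 0.001\,s_p
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), d = 0.001 is 200 times smaller than a "small" effect. That supports the reply's reading for any individual user. The paper's counterpoint concerns aggregation: a shift this small per post can still amount to a large absolute number of posts across a platform the size of Facebook.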
  11. Thanks for the great talk. I have a few questions and comments:

    1) I'm curious whether privacy settings and the amount of information posted (e.g., including your email, city, or favorite movies on your Facebook profile) could be used as variables alongside text analysis to deduce the Big Five personality traits.

    2) Has the Big Five questionnaire been updated since the arrival of the web and social media? Perhaps traits such as extroversion vs. introversion are difficult to predict from Facebook and Twitter information because people may express themselves differently online than in person.

    3) Concerning deducing political orientation from tweets: is there any research on whether people follow Republicans because of their own political beliefs, or whether people's political opinions develop based on whom they follow?

    4) Beyond the story of the Quebecoise woman losing her IBM benefits, more than 100 firefighters were deemed to have fraudulently claimed traumatic stress disorders after 9/11, and to have claimed $24 million in compensation. Social media was heavily involved in uncovering this case.

  12. 1) I thought the curly fries example was a good illustration that machine learning algorithms aren't enough on their own to give you the right insights; you also need to dig into why a certain correlation exists, as Jennifer did with her theory of homophily. This agrees with one of the earlier speakers, who said that data mining problems shouldn't simply be handed off to data mining experts, and that in-depth knowledge of the relevant field also matters.

    2) I wonder how issues of privacy on the internet will shape the extent to which a "global mind" is able to develop on the web.

  13. There was a question about how to enhance privacy. Jennifer Golbeck's response was along the lines of using tools to prevent tracking. She also mentioned someone who used an automated system to make random Amazon purchases in order to defeat the accuracy of the recommendation system. This principle generalizes to any domain in which information about oneself is available: randomization provides protection against predictability. For instance, to avoid being accurately classified by your Facebook likes, like pages at random. Ideally, tools rather than users should inject the randomness into a process, and some such tools exist or are in progress. For example, Anonymouth is a tool being built to anonymize the stylometry of your online writing. Consider that even with no tracking of any sort, you can be identified by the statistics of your writing. Note, though, that adding randomization seems invariably to come at a cost:
    - for random Amazon purchases, it's money;
    - for randomizing the times your emails are sent via remailers, your emails may be delayed;
    - for randomizing your power consumption, you may use excess power;
    - for randomizing your physical routes, you miss the shortest path;
    - for Facebook likes, your friends lose any reliable measure of what you really like;
    - and so on.

    It was noted that the creepy sweaty-guy video was perhaps not representative of the real threat to individuals who lack privacy. Indeed, most people vastly underestimate the real power of this sort of data, for good or for bad. That calls for an emphasis on critical thinking. In this age, one must be able to discern the many structuring influences (bias, deception, filtering, conflicts of interest, misleading information, feedback loops, etc.) within the self-organizing global brain in which we are embedded.

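The randomization principle in comment 13 can be illustrated with a toy model. The sketch assumes a hypothetical predictor that simply averages per-page weights over a user's likes. This is not the model from the talk or from Kosinski et al. It also assumes a hypothetical catalogue of 100 pages, of which 10 carry the trait signal, and hypothetical helper names `trait_score` and `add_random_likes`.

```python
import random


def trait_score(likes: set, weights: dict) -> float:
    """Toy (hypothetical) predictor: average the per-page weights of the
    liked pages. Pages with no weight in the model count as 0."""
    if not likes:
        return 0.0
    return sum(weights.get(page, 0.0) for page in likes) / len(likes)


def add_random_likes(likes: set, catalogue: list, n_random: int,
                     rng: random.Random) -> set:
    """Return a new set: the genuine likes plus n_random pages drawn
    uniformly at random, without replacement, from pages not yet liked."""
    candidates = [page for page in catalogue if page not in likes]
    return likes | set(rng.sample(candidates, n_random))


# Hypothetical catalogue: 100 pages, of which page0..page9 carry a trait signal.
catalogue = [f"page{i}" for i in range(100)]
weights = {f"page{i}": 1.0 for i in range(10)}

genuine = {f"page{i}" for i in range(5)}  # the user genuinely likes 5 signal pages
noisy = add_random_likes(genuine, catalogue, 45, random.Random(42))

print(trait_score(genuine, weights))  # 1.0: every like points to the trait
print(trait_score(noisy, weights))    # at most 0.2: the signal is diluted
```

Padding five genuine likes with 45 random ones lowers the toy score from 1.0 to at most 0.2, because at most 10 of the 50 likes can be signal pages. There are two caveats. First, the cost the comment mentions applies: friends lose a reliable picture of the user's real interests. Second, a predictor trained on users who obfuscate could learn to discount uniformly random likes. This is a sketch of the idea, not a guaranteed defense.
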
  14. You mentioned many stories of people losing their insurance benefits and even their jobs over Facebook posts. I like Facebook as much as the next person, but I would much rather keep my job and my insurance rates. Since companies are taking extreme actions based on people's sometimes-innocuous social media posts, do you think this will lead more people to leave Facebook?

    Or do you think habits such as your own (deleting posts after a short time) will become more common? If one's "online identity" is a continuum, then a person who deletes all her posts is only negligibly more "present" than someone with no Facebook account at all. It is interesting to ponder the implications of limiting online identity. I think we are still in "oversharing" mode, and I wonder what the tipping point will be.

  15. I find it enticing to think about customizing interfaces to suit people's personalities. We could reach a point where our computers and the web customize their GUIs based on our moods. I think questions of interface design are extremely relevant to the evolution of the global brain. Katy Borner's video illustrates the point about our relationship to technology. That is why I find the marketing angle somewhat disturbing: at base, marketers are using my data to make me more vulnerable to their sales pitch. If I don't go as far as introducing randomness, I am at least an advocate of ad blocking, even though advertising is how these online services are funded.
