

Finding Others Online: Reputation Systems for Social Online Spaces


Abstract

In this paper, we examine what types of reputation information users find valuable when selecting someone to interact with in online environments. In an online experiment, we asked users to imagine that they were looking for a partner for a social chat. We found that similarity to the user and ratings from the user's friends were the most valuable pieces of reputation information when selecting chat partners. The context in which reputations were used (social chat, game or newsgroup) affected the self-reported utility of the pieces of reputation information.

Review in Short

To come.

Selection

I initially picked up this paper because, like Chen and Bial's paper, its title matched the pattern of old work partially linked to my research. A closer look hints that the focus is more on service providers helping their members find the right people to connect to, so it might have more application to online dating than investigative data mining. Nonetheless, it may hold useful insights, and it's in my folder of things which need reviewing.

Comment by Section

Introduction

So I quickly come to understand (though this understanding is later shown to be false) that the focus of this paper is on making sense of an existing set of systems for calculating user reputation, rather than necessarily proposing an improved measure. A categorisation of reputation systems is to be followed by a comparative evaluation with users of online community systems, the intent seemingly to draw out explanations about how the technology is used and understood.

The paper then mounts a rather weak challenge to moderation as an approach to dealing with undesirable behaviour, offering reputation systems as the answer. A far more potent argument was lampshaded right at the start of the introduction -- that moderation is only really good for removing bad behaviour, while reputation systems can also encourage good behaviour.

Background

The authors divide reputation systems into ranking systems, rating systems and collaborative filtering systems. They comment that ranking systems which compare users on some common metric are most suited for goal-oriented communities like those centred around gaming. An odd sentence caught my eye:

These reputation systems typically only provide information about what kind of pattern users follow, and reveal little or no personally relevant information. Indeed, the extent to which these systems report reputation as a function of behavior is arguable.

Surely those patterns are personally relevant? I don't see how a high score at a game fails to carry reputation in a gaming community; at the very least it carries information about how you behave when playing the game, just as comment statistics give information about how you behave in a discussion forum.

Moving on, the authors explain that rating systems are those where users explicitly rate each other, and an aggregate of ratings is provided. They distinguish rating systems from collaborative filtering systems in that rating systems assume that the population is homogeneous -- that one rating aggregate will suffice for everyone -- whereas in a collaborative filtering system ratings are adjusted so that ratings from people you tend to agree with are weighted higher.
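That rating-versus-collaborative-filtering distinction can be sketched in a few lines of code. This is a minimal illustration of my own, not anything from the paper -- all names, scores and agreement weights are invented, and I assume the agreement weights between users have already been computed somehow:

```python
def plain_rating(ratings):
    # Rating system: a single aggregate assumed to suffice for everyone.
    return sum(ratings.values()) / len(ratings)

def filtered_rating(ratings, agreement_with_me):
    # Collaborative filtering: ratings from people I tend to agree with are
    # weighted higher; raters with no agreement history get weight 0.
    num = sum(agreement_with_me.get(rater, 0.0) * score
              for rater, score in ratings.items())
    den = sum(agreement_with_me.get(rater, 0.0) for rater in ratings)
    return num / den if den else plain_rating(ratings)

ratings = {"alice": 5, "bob": 1, "carol": 4}
weights = {"alice": 0.9, "bob": 0.1}  # I usually agree with alice, rarely with bob

plain = plain_rating(ratings)                 # the one number shown to every user
personal = filtered_rating(ratings, weights)  # my personalised aggregate
```

The same raw ratings produce quite different aggregates once the homogeneity assumption is dropped, which is exactly the gap the authors use to separate the two classes.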

The authors go on to introduce two new categories of reputation system. Implicit peer-based reputation systems generate ratings from the activities of a user's 'friends', a term they air-quote but do not elaborate on, perhaps hinting at something generally suspicious about friendships. Explicit peer-based systems rely on the ratings given by a user's friends. In practice both seem like classes of collaborative filtering technology rather than something distinct. The only distinction I can see is that 'friends' differ from the general userbase of a site -- but then the question becomes whether people you share interests with but don't know should be your friends, which is surely what we're looking for here.
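To make the implicit/explicit distinction concrete, here is a hypothetical sketch of my own (the data structures and names are invented, not the authors'): the implicit score is derived purely from behavioural logs of friends' interactions, while the explicit score averages ratings that friends have deliberately given.

```python
def implicit_peer_score(candidate, friends, interactions):
    # Implicit: derived from behaviour alone -- total interactions between
    # my friends and the candidate; nobody rates anybody explicitly.
    return sum(interactions.get((f, candidate), 0) for f in friends)

def explicit_peer_score(candidate, friends, ratings):
    # Explicit: average of the ratings my friends have actually given.
    scores = [ratings[(f, candidate)] for f in friends if (f, candidate) in ratings]
    return sum(scores) / len(scores) if scores else None

friends = {"alice", "bob"}
interactions = {("alice", "dana"): 3, ("bob", "dana"): 1, ("carol", "dana"): 7}
ratings = {("alice", "dana"): 4, ("carol", "dana"): 1}

imp = implicit_peer_score("dana", friends, interactions)  # carol isn't my friend
expl = explicit_peer_score("dana", friends, ratings)      # only alice's rating counts
```

Written out like this, both functions are just collaborative filtering with the neighbourhood restricted to the friend list, which is why the categories feel less distinct than the authors suggest.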

Despite finding this a reasonable overview of the type of reputation systems possible, I was disappointed to see that the authors didn't pursue actual classification of specific existing reputation systems -- a categorisation scheme which works only in the abstract seems far too easy to conjure up.

Research Goals

This odd pre-Methods section describes at a high level the experimental setup, whereby subjects in an online experiment were:

  1. Asked to select one person to interact with out of a pair, using only the reputation information associated with the potential partners. 2 of 5 systems were used for each partner.
  2. Asked to evaluate how likely they were to interact with a single person, using reputation. All 5 systems were available.
  3. Asked about the importance of different types of information.

The authors include hypotheses about which systems they expect users to use in various situations, and though their phrasings seem a little vague, it boils down to:

  1. For social chat, peer-based recommendations will be preferred. (Experiments 1 and 2 should favour one of the two peer-based systems).
  2. Context affects the reputation system, with social-type tasks being peer-based. (Experiment 3 will show that this system isn't considered universally good).

Methods

The methods section provides implementation details on the above methodology, referencing the use of an online testbed website which now appears to have been taken over and forgotten by some sort of advertising agency (not surprising, given the 12-year gap). Both the first and second questions are given the context of an online chat system.

The first question proceeds much as planned, with 377 people being asked 30 times which of two individuals they would pick, where each individual had a higher score in one of the two reputation systems and a lower one in the other.

The second question is rather interesting. The 315 subjects are asked 10 times to rank how likely they are to interact with a person, where all five reputation scores are available but must be clicked on to view. The researchers are interested not so much in the ranking as in which reputation measures users look at before deciding. This kind of cunning is what would make me paranoid if I were ever in one of these studies.

The final question asked the participants to consciously rate the importance of the various reputation systems. Interestingly, there were more people completing this than the second question (I'd guess because they found the interface annoying on that one).

Results

For the first question, the data on participant choices was submitted to a logistic regression analysis, where weights for the predictive capability of each reputation system were produced, controlling for the actual reputation values expressed in each system. The table below shows their output.

Reputation system         Contribution
Collaborative filtering   30.3%
Explicit peer-based       22.6%
Rating                    21.6%
Implicit peer-based       15.2%
Ranking                   10.3%

If I understand their explanation correctly, then each figure is a representation of the predictive power of that reputation system.
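If that reading is right, the shape of the analysis can be mimicked roughly as follows. This is a guess at the method, not the authors' actual model, with synthetic data standing in for the real participant choices and an invented 'latent' influence per system; the point is only that normalised coefficient magnitudes from a logistic fit yield contribution shares like those in the table:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    # Plain gradient-ascent logistic regression: P(chosen) = sigmoid(X @ w).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                 # per-trial differences in the 5 scores
latent = np.array([3.0, 2.2, 2.1, 1.5, 1.0])  # invented 'true' influence per system
y = ((X @ latent + rng.normal(size=400)) > 0).astype(float)

w = fit_logistic(X, y)
contribution = np.abs(w) / np.abs(w).sum()    # shares of predictive weight, sum to 1
```

Under this reading, 'controlling for the actual reputation values' corresponds to feeding the score differences in as predictors, so each percentage reflects how strongly that system's scores swayed the choice rather than how high the scores happened to be.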

In the analysis for the second question, it was revealed that most people seemed to look at all five reputation scores before making their decision (I infer this from a reported mean of 4.65, though there is a high SD of 1.64).

The authors dig into their data and find that users look up more reputation scores earlier in their attempts at the questions, so their analysis focuses on the later questions, where users are familiar with the task. They again find that collaborative filtering and explicit peer-based rating are considered most useful (that is, those options are more commonly viewed, and more commonly viewed first).

The final questionnaire reveals that users consider collaborative filtering and explicit peer-based rating systems to be most valuable across all contexts. Pairwise comparisons showed that these two measures tended not to have distinct mean ratings, only differing significantly in the social chat context, where explicit peer-based outperformed collaborative filtering.

Discussion

The opening paragraph of the discussion states that the paper explores 'what types of information people find valuable'. While this is a reasonable leap, I think this is not necessarily what they did explore. None of their tests actually implement the reputation systems; the participants merely choose between the labels they consider, or imagine, will be most useful. There are confounding factors here -- take a look at the names and descriptions the authors used to present reputation systems to their participants:

System -- name in experiments -- description in experiments

Ranking -- 'Rank in community' -- How much community members like interacting with this person, on average
Rating -- 'Rated by community members' -- How long and how much this person has participated in the community
Collaborative Filtering -- 'Has similar interests to me' -- How well your interests and activities match up with those of this person
Implicit Peer-based -- 'Interacts with my friends' -- How often this person interacts with one or more of your friends
Explicit Peer-based -- 'Rating given by my friends' -- How much your friends like interacting with this person, on average

The authors were very upfront about these descriptions, but never discussed their potential impact, which I think is a serious drawback to their analysis. The description for rating systems, for example, seems almost contradictory to the name, lending a degree of uncertainty to the definition which could well explain its low ranking (and something similar could be said for the ranking system). Also, describing the implicit and explicit peer-based systems as they did all but ensures that the explicit peer-based system would be favoured, as the difference in the two descriptions reminds users that their friends might not like interacting with someone. While this is unavoidable to some degree, a shift in terminology could well have helped.

Continuing the discussion, the authors stress that their two peer-based reputation systems performed well, and while they acknowledge that similarity (collaborative filtering) 'also' performed well across the board, they do so in the same breath as they link similarity to friendship, revealing an obvious bias which has been bubbling under the surface of this paper -- the researchers want their invented classes of reputation system to do well.

The authors predicted that the peer-based systems would be viewed favourably in social chat contexts, like the contexts of their first two experiments. In both those experiments, users' preferences demonstrate that both peer-based systems are inferior to collaborative filtering. In the first experiment, the similarity of a potential partner's interests to the participant's own was considered a more decisive factor than the partner's rating by the participant's friends. In the second experiment, similarity of interests was viewed significantly more often, and was significantly more often selected first by participants looking for information to base a rating on. The authors do not attempt to hide these results, but neither do they explicitly acknowledge the rejection of their earlier hypothesis. There is nothing shameful about negative results, and the lack of reflection on what this might mean again weakens their analysis.

The paper moves on to a discussion of whether users will properly understand social networks and whether they understand the potential privacy issues, revealing a miniature lab examination where seven participants navigate a paper-prototype gaming site with social network feedback. They found that the users understood the network information, but were not concerned about the privacy issue, though they all declined to fill in a personal profile. They are understandably cautious about generalising the results of this preliminary test.

Conclusions

A tiny conclusions section fails to record their conclusions and instead calls for further work in this area generally.

Impact

This paper seems quite popular -- Google Scholar tracks a whopping 107 citations.

2 of that number do not actually reference this paper.

13 are patents, whose usage is unclear.

  1. Through a glass darkly: Information technology design, identity verification, and knowledge contribution in online communities: cites this paper in support of "Effective identity communication can help community members find similar others with whom to build relationship", which seems to be correctly identifying the good performance of collaborative filtering in this paper's experiments.
  2. Investigating interactions of trust and interest similarity: comments briefly and critically on the authors' assumption that similarity of interest and friendship are linked.
  3. Recommending twitter users to follow using content and collaborative filtering approaches: cites this paper, along with others, in support of "It is no surprise then that the recent literature includes a number of interesting analysis [sic] of Twitter's real-time data, largely with a view to developing an early understanding of why and how people are using services like Twitter" which as a citation is a little bewildering, as this paper makes no reference to Twitter and indeed would be unlikely to, as Twitter was only created four years after its publication. I can only assume some sort of bibliographical error on the part of the authors of that paper.
  4. History of emergence of online communities: cites this paper as an example of research into the development of friendships in online communities, which seems to fail to capture the key topic, but is not incorrect.
  5. Do social networks improve e-commerce?: a study on social marketplaces: gives this paper as an example of social networks improving the reliability of reputation systems, which certainly seems in keeping with the authors' intended thrust, but is incorrect, as this paper only shows that users say they prefer social network reputation information in a specific context, says nothing about reliability, and also demonstrates that collaborative filtering information is more used for making decisions.
  6. Semantic web recommender systems: uses the class of explicit peer-based systems as an example of transplanting recommender systems into decentralised scenarios, considering them to have proposed this approach (which it could be argued that they did, though it seems to me that the authors simply described a class of possible systems), making no reference to this paper's implicit alternative, the omission being for no clear reason. This paper does not seem to be particularly concerned with decentralisation, so the citation is somewhat misleading.
  7. Towards decentralized recommender systems: First repeats the citation context from #2 exactly, then cites it again as an example of cross-fertilisation of HCI from the behavioural sciences, which might well be considered true, though the lead author is from a computing college and the other two are from Microsoft Research.
  8. Playability heuristics for mobile multi-player games: cites this paper first in support of "players who become targets of anti-social behavior may quit playing the game", which the authors of this paper do say, in passing, in their introduction. The next citation places this paper with others as evidence of a research trend that suggests games should be designed to minimise deviant behaviour, which seems a somewhat warped reading of the initial suggestion that reputation systems are ways to prevent deviant behaviour. Finally, this paper is cited in support of "In the more complex multi-player games, the friends in the game are usually a major reason for the players to keep playing the game" a statement which is not obvious anywhere in this paper, and like both the other citations does not reflect the study's focus or results.
  9. An Entropy-Based Approach to Protecting Rating Systems from Unfair Testimonies: Inaccessible.
  10. Improving Wikipedia's accuracy: Is edit age a solution?: Inaccessible.
  11. The state-of-the-art in personalized recommender systems for social networking: cites this paper as evidence of interest similarity being a predictor of interpersonal trust, which might be considered accurate, as participants chose to interact with someone based on interest similarity, which hints at some form of trust. It seems to me, though, that the trust demonstrated is in the reputation system, and this paper did not evaluate whether partners selected through reputation systems were in fact trusted.
  12. The design of a reliable reputation system: repeats word-for-word the citation context in #5.
  13. A system dynamics approach to study virtual communities: cites this paper as an example of prior researchers applying social psychology theories, which seems to agree with #7's second usage.
  14. How much do you tell?: information disclosure behaviour in different types of online communities: cites this paper as an example of an attempt to "encourage skeptical users to trust the network", which I cannot view as at all accurate, as there was never any such effort in this paper.
  15. Personality traits, usage patterns and information disclosure in online communities: cites this paper as targeting the increase of user trust to counter privacy and trust issues in online communities, presumably meaning increasing interpersonal trust via reputation systems. This citation is recognisable, but not entirely reflective of the content of this paper.
  16. Understanding web credibility: a synthesis of the research literature: Incompletely accessible.
  17. Why do members contribute knowledge to online communities?: Seems to draw on this paper as suggesting that (perceived) identity verification helps with relationship-building, referring to the perceived usefulness of similarity measures to this paper's study participants.
  18. Understanding continuance intention of knowledge creation using extended expectation–confirmation theory: an empirical study of Taiwan and China online communities [Paywalled]: uses the exact same citation as #1.
  19. Evaluation of computer tools for idea generation and team formation in project-based learning [Paywalled]: cites this paper in evidence of social networking technologies being used to build connections between participants, which aside from being almost tautological seems to refer to this paper's suggestion of social reputation systems.
  20. Using digital socialization to support geographically dispersed AEC project teams: cites this paper first in a table, as a digital implementation of reputation mechanisms, which is not strictly accurate as none of the classes of reputation systems referred to in this paper were implemented. Another citation is placed in support of "Research also suggests that the implementation of explicit reputation mechanisms, similar to those found in online auction websites, works as an incentive for people to spend time building knowledge repositories and retrieving knowledge" which is certainly not an accurate reflection of the research in this paper, and is at best a tortured reading of parts of the introduction.
  21. The experience of 'bad' behavior in online social spaces: A survey of online users: references this paper as a general resource for learning about reputation systems, a task for which it seems acceptable but not excellent.
  22. Examining identity and organizational citizenship behaviour in computer-mediated communication [Paywalled]: cites this paper to say "Identity communication can help community members find others similar to themselves with whom to build relationships", continuing a string of papers which reference the good perceived usefulness of similarity-based reputation measures.
  23. An Identity-Based Theory of Information Technology Design for Sustaining Virtual Communities: Inaccessible.
  24. Online Reputation Systems in Web 2.0 Era [Paywalled]: includes this paper in its reference list, but does not seem to refer to it directly in its text.
  25. Personality matters: incorporating detailed user attributes and preferences into the matchmaking process: interprets this paper as having "found that users are more likely to rely on information about similarity to other players than on reputation systems.", which is correct. They reference this paper twice more, once for an example of reputation systems in gaming platforms and once in support of the idea that similarity is a good predictor for partner choice, which is true to this paper's results in a way which similar-sounding citations have not been, suggesting that a whispers game has been afoot.
  26. Credibility of online reviews and initial trust: The roles of reviewer's identity and review valence: cites this paper in support of "Knowing the identity of the information source helps individuals find people who have much in common with themselves" which somewhat begs the question. This paper finds that participants make use of similarity information more than other reputation systems, it doesn't evaluate how well particular similarity systems work.
  27. The influence of sociotechnological mechanisms on individual motivation toward knowledge contribution in problem-solving virtual communities: cites this paper in support of the statement "a reputation system transforms a member’s past valuable contribution into a positive image", which the authors of this paper take pains to show is not always the case with their categories of different reputation systems.
  28. Design of E‐Business Web Sites: is in Portuguese, but appears to cite this paper in the context of a list of social networks where reputation can be managed by businesses. Many of these networks were not around when this paper was published, so it seems a poor citation.
  29. IT design for sustaining virtual communities: An identity-based approach: cites this paper in evidence of more accurate online identities helping people identify similar people to connect to, which is a reasonable inference.
  30. Encouraging participation in virtual communities through usability and sociability development: an empirical investigation [Paywalled]: references this paper as support for the types of reputation system that it describes.
  31. Towards Reputation Systems applied to Communities of Practice: uses this paper's categories to describe an 'Epinions' site as being a "Reputation System based on Social Networks", which seems to be one of the two peer-based options, presumably the explicit one.
  32. An investigation of student-teachers' use of social networks and their perceptions of using technology for teaching and learning: Inaccessible.
  33. Boosting social network connectivity with link revival [Paywalled]: cites this paper twice as an example of existing work focusing on recommending new friends to users, which is accurate.
  34. Computer-facilitated community building for E-learning: mentions this paper as arguing that in a social context peer-based recommendations are important, which is accurate, though they, like the authors of this paper, neglect to mention that such recommendations were not found to be most important. They also replicate the table I gave above showing the descriptions and names given to the reputation systems, mentioning only the examples of reputation system categories.
  35. A methodology to evaluate the usability of digital socialization in 'virtual' engineering design: first makes use of this paper's categorisation of reputation systems, in an entirely faithful manner. It then cites it in the context of Digital IDs helping build a user's reputation based on quality of contribution, which seems a little dubious. Finally, it cites this paper as evidence for "through implementation of reputation mechanisms similar to those found in online auction websites" which is somewhat odd but seems to be referring to a comment by the authors that reputation and recommendation systems are most commonly found in e-commerce websites.
  36. Evaluating software for communities using social affordances [Paywalled]: cites this paper in a list of features for the evaluation for social networking services, under 'ratings and votings'. This would be fine, but the exact citation context, "These reputation systems hold value in certain areas and have been found to facilitate interaction, trust, and limit aversive behavior", is somewhat misleading, as the last item is not studied in this paper, and arguably neither is the second.
  37. Understanding continuance intention of knowledge creation in online communities from a social-psychological perspective [Paywalled]: cites this paper as support for the statement that "the likelihood of communicating and building relationships with each other is increased in people who have similar identity, including similar interests, in similar social groups or with similar experiences", one of the most complete summaries of this paper's results.
  38. Beyond adoption intention: Online communities and member motivation to contribute longitudinally: says "with identity support, individuals can interact more effectively and build strong relationships" which is vaguely but not really what is presented in this paper.
  39. Effect of indirect information on system trust and control allocation [Paywalled]: references this paper's distinction between implicit and explicit recommendations, though appropriating it in a more general sense than used here. It then references this paper as an example of indirect information having been studied in reputation management, which is not incorrect, but very vague.
  40. A Tale of Two Sites: An explorative study of the design and evaluation of social network sites: is a near-exact duplication of the citation context from #36.

Reproducibility

To come.


If you have no more direct means of contacting me, and wish to comment on this post, send electronic mail to lancaster.ac.uk for the user m.edwards7