Sex, Evolution, and User Ratings
Jun 15, 2009 By Joshua Allen
A few days ago during a design discussion, our team was forced to grapple with that age-old community-design question: 5-star user ratings for content, or a "likes" feature? When allowing users to provide feedback on your content, is it best to use a range of values, or give a "thumbs-up" (and maybe a thumbs-down)?
User ratings help foster participation and engagement, and help the best content filter to the top. But, as Derek Powazek has been saying since at least 2003 in his talks about community, whenever you create a user feedback system, you create a game. And people will game the system.
The typical community designer, when faced with the fact that people will game feedback systems, focuses on incentives. What are the incentives for people to provide feedback in the first place, and how is that feedback used within the system? What incentives can you give in order to get better quality feedback? How do you "reward" people with "social capital" or "whuffie"? Can you create a feedback loop that shows hot items rising to the top, to encourage people to "vote"? These approaches have their merits, but I think that this is the wrong way to look at the problem, and just creates more trouble down the road.
As my teammate Nishant observed during our discussions, you don't need to bribe people to get them to express approval or disapproval of one another. People love to judge one another, even if it comes at a high personal cost. This is one of the most enduring and unique aspects of human nature. The real challenge for a community designer is to align the rating system with this definitive human characteristic, and to avoid any incentives which distort people's inborn desire to provide feedback.
The best book I have found on this topic is "Comeuppance: Costly Signaling, Altruistic Punishment, and Other Biological Components of Fiction", by William Flesch (Harvard University Press). It masquerades as a book about evolutionary biology and literature, but it's really about community design. The author starts by enumerating a set of extremely powerful and innate urges that motivate all humans:
- People love to reward those who do good, even at high personal cost. This is called "strong reciprocation"
- People love to punish those who do bad, even at high personal cost. This is called "altruistic punishment" (altruistic, because it is not done out of individual self-interest)
- The reward and punishment needs to be observed and understood by others. It cannot happen in a vacuum.
- This impulse extends through multiple levels. We watch how people punish and reward one another, and are motivated to respond. We are motivated to reward the "altruistic punishers" and "strong reciprocators", even at high personal cost. And we are motivated to punish those who do not reciprocate or punish as they should.
- Over a lifetime, we form opinions about people based on a *history* of how they have responded to such situations. Reputation is a narrative that we hold regarding someone's history of strong reciprocity and altruistic punishment.
These observations about human nature are close to self-evident, so they hardly need a justification. Still, the author draws on evolutionary biology to offer one explanation for why they hold.
In species where the creatures "choose" or "select" their mates, sexual selection is often driven by something called "costly signaling". A costly signal is an evolutionary characteristic that ironically works *against* the creature's self-interest, and makes him less likely to survive. The classic example is the peacock's feather display, which makes him more vulnerable to predators and diseases, but is irresistible to the female. By choosing the peacock with the grandest feathers, the peahen is choosing a mate who is *so* genetically superior that he can survive *despite* the impediment of an impractical pile of tail feathers. Stay with me, because this is where things get really interesting for community design:
- Costly signals must be easily observable and quickly identifiable. The peahen needs to be able to judge the peacock's mating potential by watching him.
- Costly signals must be very difficult to fake. Otherwise, you end up with mimicry, and everyone wearing the same designer jeans and pheromone-laced perfume. The only way to know that a signal is honest, and not faked, is if that signal is costly.
- Costly signals, and the responses to them, are not conscious or calculated. They are closely linked to, and as primal as, the sexual reproduction instinct.
In this framework of evolutionary biology, altruistic punishment and strong reciprocation are the human equivalents of the peacock's feathers. They are driven by primal emotional impulses that often act against our own individual self-interest, are acted out publicly to be observed by potential mates, and are extraordinarily expensive to maintain biologically. The human urge to rate content on message boards is as closely related to our survival instinct as is a peacock's urge to fan his feathers and strut.
So, what does this say about community design? Here's my take:
- Make your rating system as natural and visceral as possible. Nobody watches a mugger on the street and remarks to his friends, "On a scale of 1-5, I rate that mugger a NEGATIVE 10, bro!" Stick with "likes".
- Don't put too much emphasis on aggregating, tabulating, or averaging people's feedback. Voting is too abstract and disconnected from the way human nature works. You can quietly mine the community's behavior for your own purposes, but don't put it in people's faces.
- Encourage content that is personal and judgeable. When people have a chance to judge another human being, they will. So make sure that they see the face and personality behind the content, and avoid being too passive.
- Make sure that the act of rating is public and personal. Ideally, the rating will be attached to the reputation of both the person who wrote the content, and the person who did the rating. You should be able to look at a person's profile and see who he rewarded (and punished). You should be able to look at an author's profile and see who rewarded them or punished them.
- Don't worry if it takes some effort. People often think of "likes" as being superior simply because people get instant gratification. I even wrote a whitepaper arguing this point back in 2000. But I think this is probably wrong. In fact, you could probably add a CAPTCHA or similar "expensive" step to your rating system and get better engagement. Costly signals are supposed to be costly, and people are already highly motivated to judge one another. It's not as if you're trying to entice them to do something unnatural, so don't worry if they need to work for it.
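To make the "public and personal" point above concrete, here is a minimal sketch of how such a rating store might look. Everything in it is an assumption for illustration: the names `Judgment`, `PublicRatings`, and `profile` are hypothetical and not taken from any real product. The idea is simply that each like or dislike is recorded once and shows up on both the rater's and the author's profile, with no averaging or tallying shown to users.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One public like or dislike (hypothetical record shape)."""
    rater: str       # person who made the judgment
    author: str      # person whose content was judged
    content_id: str  # which piece of content was judged
    liked: bool      # True = reward ("like"), False = punish ("dislike")


class PublicRatings:
    """Keeps every judgment visible on both people's profiles."""

    def __init__(self) -> None:
        self._given = defaultdict(list)     # rater -> judgments they made
        self._received = defaultdict(list)  # author -> judgments they got

    def rate(self, rater: str, author: str, content_id: str, liked: bool) -> Judgment:
        """Record one judgment under both the rater and the author."""
        judgment = Judgment(rater, author, content_id, liked)
        self._given[rater].append(judgment)
        self._received[author].append(judgment)
        return judgment

    def profile(self, person: str) -> dict:
        """Return who this person rewarded/punished, and who did so to them."""
        given = self._given.get(person, [])
        received = self._received.get(person, [])
        return {
            "rewarded": [j.author for j in given if j.liked],
            "punished": [j.author for j in given if not j.liked],
            "rewarded_by": [j.rater for j in received if j.liked],
            "punished_by": [j.rater for j in received if not j.liked],
        }
```

For example, after `ratings.rate("alice", "bob", "post-1", True)`, calling `ratings.profile("bob")` lists alice under `rewarded_by`, and `ratings.profile("alice")` lists bob under `rewarded`. The sketch deliberately exposes names rather than counts or averages.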
So, what do you think? Do you prefer "likes", a 5-star rating, or something entirely different? You can reply in the comments section here, or let me know via Twitter.



Follow the Conversation
7 Comments so far.
That explains why young people and singles do so much of it. Do people stop commenting on message boards after they stop having kids?
Like.
:-P
But seriously – I prefer the “Likes / Dislikes” system. I think it comes back to being able to instantly see my vote add something to the pool. With a 5-star “average”, whatever my vote, I’m not going to get that same level of feedback. Even though the vote is ultimately worth the same, it doesn’t appear to be.
I can see a future where we’ll have universal feedback systems that are contributors to feeds from Facebook, Windows Live and MySpace to draw visitors to the content publishers. These feedback systems will be intelligent and will understand the preference of the given user. For instance, do they prefer a like/dislike metaphor or more granular appraisals with a 5 star rating? This preference may even change depending on what type of content is being considered.
I prefer the simplicity of the likes/dislikes system. For me, the star rating system can become convoluted. Netflix movie ratings use stars, and their description for 3 stars is “Liked It”. For me, however, 3 stars is more “It was ok”, so I end up giving 2 stars or sticking with the 3, depending on whether it’s a good day (of course, I now feel my ratings are inconsistent).
@Grant – funny!
@Liam, @James – thanks for the comments; good points.
@Lisa – we did discuss Netflix, since more granularity assists with better recommendations when you are providing a high degree of personalization (one advantage of 5-star, although StumbleUpon does just fine without it). But I hadn’t even thought of the drawback you point out. The lack of clear alignment with mental categories like “OK”, “so-so”, etc. means that people will second-guess themselves and not be as sure. Thanks for sharing that insight.
I agree that people like to judge other people, and this often changes behavior too, so as to fit the most accepted characteristics. This shapes the evolutionary path of a species, or for that matter of a theory. Feedback shapes the future. I think the Heisenberg uncertainty principle holds true here as well, and we cannot get an absolute judgement from feedback systems…but yes, they help us evolve toward a better tomorrow. I personally favor the like/dislike system, as it is simple and the author gets a holistic view of his/her work as perceived by others.
Let’s look at it from a different perspective -
Because of our cognitive abilities (and biases), we humans are rather ‘infovores’. We have a natural tendency to ask is it worth my time and efforts? We also tend to filter out things that fail to provide us both value/utility and fun.
Coming back to user ratings: if I am about to try something new, I want to quickly find out more about it. A rating is one such piece of information, and a powerful one. Ratings instantly tell me Yes or No. Search engines, too, rate content relative to other content and list the better items at the top.
But when it comes to rating stuff myself, I would only make the effort as long as I see value or fun in it. Yes, ‘costly signals’ do reflect honest opinions, but the required effort should also be aptly compensated.
Search engines take so much pain cuz they make truck loads of money.
When it comes to the rating system, it depends on the number of items that can be rated, that is, on the competitiveness of the environment.
With less competition an absolute rating system would work well, and in a highly competitive environment the designer should choose relative grading of stuff.
About ‘likes’ and ‘5-stars’, I must emphasize that a rating is still a piece of information, and most people would interpret it differently, comparing it with the original ‘query’.