Do we need Metacritic?

No. No, we don’t.

It’s more complicated than that, of course. Before today, Metacritic was a good concept gone insane, a website for vengeful gamers to post zero-out-of-ten reviews on games they haven’t played as much as it was an aggregate service for professional reviews. That it carried weight with publishers made us laugh: seriously, publishers, you care a whit about an aggregate of critical acclaim? This makes sense in the world of film, where “prestige” is a major factor: you want to release films that make your studio look good, like it’s taking chances. There are no prestige games: there are indie games, which court artistic merit, and there are million sellers that make you billions of dollars. Games are too expensive, too difficult to move, for a publisher to blow twenty million dollars on “prestige”.

And yet here we are today, with Obsidian Entertainment, developers of Fallout: New Vegas, being forced to lay off people principally because they failed to meet an incentive clause in a contract: an 85 Metacritic score for Fallout: New Vegas. They had an 84 (well, an 83 and a fraction as of this writing, but let’s not be academic). Cue sad tuba.

This is a game that made a mega-publisher three hundred million dollars in revenue. It sold five million copies. And its developer is in dire financial straits because a bunch of reviewers didn’t give it an arbitrarily high number.

It doesn’t take a rocket scientist to see this system is fucked up. Go down the list of reviewers featured in the Metacritic score and you’ll see a lot of big names, sure, but you’ll also see small, hobbyist blogs: you’ll see a lot of low scores from these sites but relatively positive quotes (in line with the bigger sites), most likely because they don’t adhere to the ridiculous review score system created by larger publications. You’ll also note the absence of plenty of other major sites like Rock, Paper, Shotgun, which refuse to offer review scores (though, to be fair, RPS’ review was pretty scathing).

You’ll also note that 1Up, a very respected publication, gave the game a “B”, which translates to a 75 for some insane reason. Read their review and it’s more in line with the other 85s.

Judging a game based on a mishmash of randomly collected sources makes about as much sense as me asking all the people in a five hundred foot radius whether they like Fallout: New Vegas. There’s no statistical significance.

Furthermore, here’s the elephant that’s popped up in the room: now that the cover has been taken off publisher incentives for high Metacritic scores, reviewers are screwing people over by posting negative review scores. If you are a reviewer for a publication, your 65 might leave dozens of people out of a job. You might get a game canceled. In fact, if 1Up’s scores were calculated reasonably and a bunch of small sites used a traditional review score system, we probably wouldn’t be in this situation. If there is a “reviewer bias”, even more than advertising, it’s this: a bad review will lose people their jobs.

Good god, this whole situation is ludicrous. It stems from video game publishing’s biggest problem: the insane combination of the “prestige” film with the million seller. The “prestige” film, for the unacquainted, goes like this: major film studios fund artistic films, such as this year’s The Artist, not because they appeal to a broad audience but because they appeal to critics. The thinking is twofold. One: these movies tell critics that the particular studio is making important cinema instead of just cashing in on the Battleship fad. Two: they could win, or be nominated for, an Oscar, making millions of dollars.

Video games obviously don’t work this way. Big budget game award shows are more about teabagging and reveals than they are about critical victories; in fact, there’s no respected critical brain trust for games like there is for films. We don’t have someone like Roger Ebert, whose opinion can move mountains; we have quality reviewers, but no one with built-in cachet. There’s a reason Origin, in highlighting Mass Effect 3, used a review from a major newspaper rather than a gaming outlet: the paper’s name, without any reviewer attached, means more than any single video game reviewer.

These two factors combine to make video games a prestigeless field. Our The Artist is Journey, a game made by a small, independent company, distributed by Sony at very little risk. The last time a game company funded an “artistic” venture as a major release was probably Child of Eden or Rayman; of the big publishers, only Ubisoft seems to have missed the memo: you make more money pandering to the crowd, with commercial successes, than you do with critical ones.

So video games have created an odd hybrid, the critical/commercial success. Instead of releasing critically panned titles that sell millions of copies, publishers want it both ways: they want high Metacritic scores and millions of copies. Games like the aforementioned Rayman are seen as aberrations. Critical and commercial success are one and the same in the eyes of publishers.

It’s tough because it forces critics into the position of consumer advocate instead of, well, being critics. The majority of reviews focus on whether or not the consumer would love a game, whether it succeeds at its basic functions, rather than whether or not it is a worthwhile work of art. This slaughters games like Fallout: New Vegas, which brilliantly succeeds as an experience and slips a little bit at basic functions. It’s ugly, it was buggy at launch, but it’s the kind of game I’ll be playing ten years from now, as opposed to more technically proficient experiences without merit as video games. It’s the kind of game that’s successful in terms of critical reception, but took hits as a strictly consumer experience.

For it to be judged on its Metacritic score is patently ludicrous.

18 Comments

  1. Kimadactyl

    An example of a publisher using Metacritic in an incredibly questionable way doesn’t seem grounds to dismiss Metacritic in general. The whole argument is reductio ad absurdum to me. Of course any statistical average has its problems, and of course they can improve the way they aggregate the stats. That doesn’t invalidate the whole thing, though.

    I don’t know who “we” is, either — I don’t think you get to speak for the entire gaming public, which is the implication in your tone.

    • Tom Auxier

      The whole “we instead of I” concept has plenty of grounding in sensationalism (which I am admittedly indulging in). “Do I need Metacritic?” is masturbatory. “Do we need Metacritic?” makes sense, and rather than implying that I speak for everyone, it implies I’m talking to everyone. It’s saying you and I are both people who care enough to read (or write) an article about stupid internet drama.

      Damning Metacritic over one example is, of course, ludicrous. There are plenty of other examples of publishers inflating their own review scores, critics getting lambasted because they dared sully the Metacritic score of a popular game, et cetera. I’d be thrilled if they improved it, but improving it doesn’t really fix the situation created by games publishers.

      • Andrew McDonald

        *Cough*KaneAndLynchReviewGamespot*Cough*

  2. Matt Dodd

    I agree that that’s a ridiculous example, both of how Metacritic doesn’t really work the way it’s supposed to, and of how game reviews are deeply flawed. However, although the details are wrong, I think they may’ve had the right idea in mind.

    I can’t think of a more objective measure of a game’s quality than Metacritic. Of course, as you pointed out, it’s not a very objective measure of a game’s quality–but it’s still the best we have; by aggregating many subjective measures, it begins to approach objectivity. If there’s a more objective way to judge art (of any kind), I don’t know what it is.

    I say this as a huge fan of Obsidian. It’s awful that it hurt them like this, and the system needs radical overhaul. But in general, game companies incentivizing quality over pure sales is something that I want to see more of, not less.

    On a personal level, and not really related to what you’re saying in the article here, I also find some value in Metacritic simply as a convenient way to read reviews of a game I’m considering purchasing. Rather than use the raw score, as Metacritic ostensibly intends, however, I usually try to read one glowing review, one scathing review, and one mixed review to try getting an overview of the different opinions from professional reviewers.

    • Tom Auxier

      I’d love for companies to give bonuses for the quality of games, rather than sales. And Metacritic is, probably, the best way for that to happen. I think, though, it might be more worthwhile to say “X number of reviews above Y” rather than use an aggregate, since Metacritic lets in a wide range of reviewer quality. And I’d rather encourage games that get twenty great reviews and ten awful ones than those that get thirty 85s. That’s just me, though. To me an 80 is pretty much a kiss of death if I’m not already interested: it means the game is as advertised.
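      A toy comparison of those two bonus criteria (the numbers are invented; this is a sketch, not anything Metacritic computes):

      ```python
      from statistics import mean

      polarizing = [95] * 20 + [40] * 10   # twenty great reviews, ten awful ones
      consistent = [85] * 30               # thirty 85s

      def reviews_above(scores, y):
          # Count how many reviews meet the threshold Y.
          return sum(s >= y for s in scores)

      print(mean(polarizing), reviews_above(polarizing, 90))  # ~76.7 average, 20 raves
      print(mean(consistent), reviews_above(consistent, 90))  # 85 average, 0 raves
      ```

      An “85 aggregate” clause pays out only for the consistent game; a “twenty reviews above 90” clause pays out only for the polarizing one.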

      It’s something that doesn’t have an easy solution. That’s the problem with these sorts of controversies: they’re a lot more fun if everyone knows how to fix them.

      • Matt Dodd

        On further reflection, I may’ve thought of a better way, although it would depend on Metacritic doing some restructuring.

        Metacritic already has a function where you can declare a particular user review to be helpful or not helpful. The first step would be to expand that function to the critics’ reviews also. Then use the ratio of helpful to not helpful, and the total number of helpful votes, to create a “trustedness” score for each reviewer (user and critic).

        From that, expand from the current two Metacritic scores (critic and user) into four scores. The first would be averaged from just the top 10% highest ‘trustedness’ critic reviewers who reviewed the game. This would be the one most suitable for contract bonuses and the like. The second would be averaged from all critic reviews, just like it is now. The third would be from just the top 10% highest ‘trustedness’ user reviewers who reviewed the game. And the fourth would be averaged from all user reviews, again just like it is now.

        It might also be useful to have a fifth score that averaged both the first and the third.

        I think a system like that would serve most needs that could be served by aggregate numbers (rough sketch below).
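        Here’s a minimal sketch of that scheme in Python. The review data is invented, and the trustedness formula is one guess at combining the helpful ratio with vote volume, since the comment doesn’t pin down an exact formula:

        ```python
        from statistics import mean

        # Each review: (score 0-100, helpful votes, not-helpful votes). Invented data.
        critic_reviews = [(85, 40, 2), (60, 3, 9), (90, 25, 1), (70, 1, 0)]

        def trustedness(helpful, not_helpful):
            # Combine the helpful ratio with helpful-vote volume, so a reviewer
            # needs both a good ratio and real readership to rank highly.
            total = helpful + not_helpful
            return (helpful / total) * helpful if total else 0.0

        def top_decile_average(reviews):
            # Scores 1 and 3: average over the top 10% most "trusted" reviewers.
            ranked = sorted(reviews, key=lambda r: trustedness(r[1], r[2]), reverse=True)
            top = ranked[:max(1, len(ranked) // 10)]
            return mean(score for score, _, _ in top)

        def plain_average(reviews):
            # Scores 2 and 4: average over everyone, as Metacritic does today.
            return mean(score for score, _, _ in reviews)

        print(top_decile_average(critic_reviews))  # 85: only the most-trusted critic counts
        print(plain_average(critic_reviews))       # 76.25: everyone counts equally
        ```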

        • Fernando Cordeiro

          I’m pretty sure critics already have that “trustedness” level. It’s the weight used in their weighted average score, after all. The problem I see is that this trustedness is completely invisible.

          For the user score, I simply think we should use the median instead of the average. That way the user has no incentive to lie. I explained this idea once here: http://nightmaremode.net/2012/01/fixing-reviews-the-lying-score-14892/

          • Matt Dodd

            I did a little digging on Metacritic, and you’re right, they do apply a “weighted average” to the critic reviews. But not only is it invisible, it seems to be something they determine themselves, which is entirely unsuitable. There’s no way they’re informed enough about even a fraction of game reviewers to properly weight them manually. Even if they were, it’s the sort of thing that should change with time, so unless they were constantly tweaking it, its accuracy would naturally degrade.

            After reading your article, I agree there is some merit to the idea of using medians. However, I can’t say it’s without problems of its own, especially for games with few reviews.

            For example, consider a game that is a “love it or hate it” type of affair, and not popular enough to get a ton of reviews. A game with no reviews or a single review clearly isn’t going to work well for this anyway, so let’s skip those cases and begin with two scores, a 1 and a 10. The median is 5.5, same as the average. No problem: although somewhat misleading, it’s the best that can be done. What if a third review is added? Another 1 would make the median 1, which isn’t an accurate result. Another 10 would make the median 10, again an inaccurate result. The more reviews, the weaker this effect becomes (unless the game is truly so polarizing that only 1s and 10s are ever given), but it means a median score really isn’t workable until there are at least 20 reviews.
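            A quick check of that swing, using Python’s statistics module on the hypothetical scores above:

            ```python
            from statistics import mean, median

            # Hypothetical polarized review sets from the example above.
            for scores in ([1, 10], [1, 1, 10], [1, 10, 10]):
                print(scores, "mean:", mean(scores), "median:", median(scores))

            # [1, 10]      mean: 5.5  median: 5.5
            # [1, 1, 10]   mean: 4    median: 1
            # [1, 10, 10]  mean: 7    median: 10
            ```

            One extra review swings the median to either extreme, while the mean moves only moderately.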

  3. Fernando Cordeiro

    I do agree with Kimadactyl and Matt Dodd.

    Is the article reductio ad absurdum? Yes. It relies on the false hypothesis that 100% of Metacritic scores will be used as contract clauses, which is ridiculous. Furthermore, the question of whether or not we need Metacritic is tendentious. Of course we don’t. We survived without it so far, no? The question should be “do we want Metacritic?”

    Well, do we?
    Yes. A thousand times yes.

    The value of aggregation is incredible. Does Metacritic have its flaws? Yes: both with the user score and the critic score.

    Both are fixable.

    What the piece should be about is how idiotic it is to use something completely BEYOND your control as part of a KPI (Key Performance Indicator) in your contract clause. Even more stupid would be Obsidian Entertainment for accepting it (Yup, a contract implies BOTH parties are content with it. Crazy, I know!).

    Bonus payments should be about profits, long-term or short-term. No more, no less. They should be based only on revenues and development costs. Using the Metacritic scale is like pegging your bonus to the weather: the date of the cutoff and measurement will determine everything. This article should be about the halfwit directors who thought this was a good contract OR thought the risk of missing the bonus was derisory. Or perhaps the finance analyst who failed to see that the company’s cash flow would not survive without such a bonus.

    Either way, it’s not Metacritic’s, gamers’, developers’, or critics’ fault.
    The article should be named “Do we need nitwit executives?”

    (By the way, a critic who feels bad for giving a bad review should not be a critic. If you want a job that makes people happy, join the Red Cross.)

    • Matt Dodd

      I don’t know that just because they entered the contract, they were content with it. That might be the case. However, I can think of at least two types of contracts off the top of my head that I’ve entered into without being happy about it: house/apartment rental (because the terms are near universal and I prefer not being homeless) and just about any EULA.

      I also think the Metacritic score isn’t outside of their control, at least not totally. I’m sure you can easily imagine how they could’ve intentionally torpedoed their score by, for example, releasing the game in its early alpha state. In fact, you can envision the creation of the game as starting at an estimated Metacritic score of 0 and rising as development continued, albeit with diminishing returns. If they’d kept the game in development longer, perhaps the additional polish and fewer bugs would’ve put them over that 85 mark. Again, I say this as an Obsidian fan.

      I think a large part of his argument is that a lot of low-quality reviews drown out the influence of the high-quality reviews in the Metacritic score. As you replied to me above, and as I verified by poking around on Metacritic, they do attempt to compensate for that with their “weighted average” formula, but it’s clearly not working, because I agree with him that this does seem to be a problem.

      • Fernando Cordeiro

        For me it’s evident that they were content. I mean, it was signed of their own free will, so there were gains for Obsidian that made them think the risks were bearable. And once you sign a contract, the responsibility is on you. EULAs are different, though… and the market will soon find the need to regulate them.

        Metacritic is outside their control because you can’t control who votes. I mean, imagine you make a game for a given niche and, inside that niche, your game is a legend. It is profitable, glorious! However, the voting on Metacritic won’t be limited to that market segment. Everybody will chip in, including those who were clearly not the game’s intended audience.

        Every game is developed with some kind of player in mind. They are the game’s “clients”. Quality should be defined by the client… and the opinion conveyed by Metacritic won’t reflect that.

        As for the comment above… I dunno if the median is any worse than the average for games with few reviews. They’re equally bad, since they equally fail to represent the population. This is the small-sample-size problem any statistical analyst faces. But one can measure that with a “confidence” index that tells how well the sample represents the population; it would be a nice indicator.
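        One way to express that “confidence” index as code, assuming a normal approximation for the average score (a sketch only; Metacritic publishes nothing like this):

        ```python
        from math import sqrt
        from statistics import mean, stdev

        def ci95(scores):
            # Rough 95% confidence interval for the "true" average score.
            m = mean(scores)
            se = stdev(scores) / sqrt(len(scores))
            return (round(m - 1.96 * se, 1), round(m + 1.96 * se, 1))

        print(ci95([1, 10, 10]))          # tiny, polarized sample: a very wide interval
        print(ci95([8] * 30 + [7] * 10))  # larger, consistent sample: a narrow interval
        ```

        The wider the interval, the less the sample tells you about the population, which is exactly the caveat a score aggregate should carry.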

        • Dylan Holmes

          That’s an unrealistically myopic view of contracts. I agree that it would be very interesting to know how much of an effort Obsidian made to fight that clause. However, this wasn’t a case of them shopping around to publishers. Bethesda, as the license holder, is the only one they can go through. If they stick to their guns and demand an ideal contract, the result is that they don’t get to make the game at all. Bethesda is in the position of power here, and power matters in contracts. Do not confuse legal consent with actual contentment.

          An equivalent argument would be my contract with Comcast. I don’t like it; but my alternative is to say “fuck you guys” and have no internet in my home.

    • Tom Auxier

      Fern, you never approve of my sensationalism.

      At least I didn’t title it, “Is Metacritic a commodity?”

      • Fernando Cordeiro

        Not Metacritic, but data in general is 😉

      • aerothorn

        BREAKING NEWS: FERN DISAPPROVES OF STORY NOT GROUNDED IN VERIFIABLE DATA.

  4. Do you need Metacritic? No. Do I? It’s a useful way to see aggregated opinions, but I’d rather it just dropped numeric scores and gave me a snippet of text. Then again, I wish game reviews would drop the score completely, but media outlets love giving out hyperbolically high scores (PSXtreme or whatever) or hyperbolically low ones (Destructoid) just to gain attention.

    “Judging a game based on a mishmash of randomly collected sources makes about as much sense as me asking all the people in a five hundred foot radius whether they like Fallout: New Vegas. There’s no statistical significance.”

    This statement is ‘patently ludicrous’: game critics are not randomised; they are enthusiasts and fans of the medium! Asking everyone within a certain radius would be a more random sample, but if you knew anything about statistical significance, you’d know random sampling is actually very important (the opposite being a biased sample, naturally).

    I’m not trying to troll you: I’m honestly not sure what point you are trying to make, unless the point is “this Metacritic situation is so crazy that it makes it difficult for me to write a coherent blog about it”.

    You may be interested in Craig’s Metacritique series at http://www.split-screen.net/features/metacritique-ratings-the-inner-circle which offers some statistical analysis of how metascores pan out.

    • Tom Auxier

      That’s an interesting series on the topic; I’m glad to know someone better at statistics than I am is doing work on it.

      And it can be statistically significant. The critics who refuse to use review scores are very much a specific kind of enthusiast: they’re the kind of people willing to deeply engage with games and write reviews devoid of the “Game is adverbly adjective” paradigm. Metacritic excludes a very specific type of reviewer: the kind interested in criticism.

      Preferable to me would be a four- or five-star system. Let Metacritic value things at 100, 90, 80, 70, and 50. Why not? It makes me think 1Up, despite horribly misvalued review scores, has it right.

  5. All good points, Tom. Of course the blame lies with the publisher, as always, not with the site that simply aggregates already existing data.

    Everyone (read: reviewers) complains about how they equate different scores to come up with an average. But take 1Up, for instance.

    They can assign an A, B, C, D, or F. That’s a 1-5 scale; extrapolating a 75 from a B makes sense, as far as 1Up’s scheme goes. Whatever scale an outlet goes with will exist independent of whatever arbitrary designations they attribute to it. So no matter what 1Up says a B is, the fact of the matter is that it’s better than approximately 75% of the possible scores below it, by virtue of the scale they’ve chosen.
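    One way to see where the 75 comes from, assuming the five grades are spaced evenly across 0-100 (a guess at the conversion; Metacritic’s actual table may differ):

    ```python
    # Hypothetical even spacing of 1Up's letter grades onto 0-100 (low to high).
    grades = ["F", "D", "C", "B", "A"]
    grade_to_score = {g: 100 * i / (len(grades) - 1) for i, g in enumerate(grades)}

    print(grade_to_score)       # {'F': 0.0, 'D': 25.0, 'C': 50.0, 'B': 75.0, 'A': 100.0}
    print(grade_to_score["B"])  # 75.0: a B sits at the 75% mark of the scale
    ```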

    And in the end, whatever problems there are with Metacritic, and there are obviously plenty, they won’t ever become an issue until a pub/dev puts undue weight on them, in which case it will be the fault of that company, not the supplier of the info on which they wrongly based a decision.