As often happens when getting to know someone, the question inevitably comes up: “what’s your favorite movie?” Having favorite films is part of American culture and identity, as movies are one of our biggest and most popular exports, and even in times of great economic uncertainty, people have been piling into cinemas to experience new releases in all their silver screen glory.
My problem is that I’m indecisive, and what my favorite things are at any given time depends largely on my mood and what new things I’ve encountered that have impressed me.
But since I’m a data nerd, I wondered if there was a way to arrive at something at least approximating my favorite 100 films, ranked in some sort of reasonable order.
The methodology was pretty simple: list the first 100+ movies that came to mind, rate each one on a scale from 1 to 5, and see where everything fell after sorting. For reference and for fun, I included Rotten Tomatoes scores for comparison.
So here’s a ranked list of what the data shows are my favorite films, including my rating and each film’s Rotten Tomatoes score.
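The rate-and-sort methodology above can be sketched in a few lines of code. This is just a minimal illustration with made-up example titles and scores, not my actual data:

```python
# Hypothetical sketch of the ranking methodology: rate each film 1-5,
# keep the Rotten Tomatoes score alongside for comparison, then sort.
films = [
    {"title": "Film A", "my_score": 5, "rt_score": 96},
    {"title": "Film B", "my_score": 3, "rt_score": 88},
    {"title": "Film C", "my_score": 4, "rt_score": 91},
]

# Sort by my rating (descending); the Rotten Tomatoes score breaks ties.
ranked = sorted(films, key=lambda f: (f["my_score"], f["rt_score"]), reverse=True)

for rank, film in enumerate(ranked, start=1):
    print(f'{rank}. {film["title"]} (me: {film["my_score"]}/5, RT: {film["rt_score"]}%)')
```

With only a 1-to-5 scale, ties are common, which is why a secondary sort key (here, the RT score) is handy for producing a strict 1-to-100 ordering.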
So I got a Fitbit and started tracking my walks in all of July (or at least when I remembered to turn the GPS on). There’s definitely a walking pattern to my life.
This might not look like much of a geographic range, but I just earned my London Underground badge for 402km of lifetime distance since I started using the device. Of course, because I constantly forget to track it, my actual distance traveled on foot dwarfs that.
I need new shoes.
There have been a few attempts to make a directory of fake news sites to guide social media users away from reading and sharing misinformation.
So I still felt pretty unsatisfied. We could go so much deeper.
I’ve taken a stab at compiling a database of news websites – both real and fake – that I will maintain and update in response to new information and feedback.
This directory tracks not only which publications are fake, but also which are trustworthy, which are merely partisan (or hyperpartisan), and which are legitimate information sources regardless of their leanings.
It’s an attempt to categorize purported information sources in a more descriptive, fair and accurate way without necessarily demonizing outlets that focus more on opinion or political agendas.
I compiled data from the excellent Media Bias Fact Check project, a non-profit dedicated to scoping out the political leanings and veracity of news organizations, blogs and other online information hubs.
Judging how liberal or conservative a media source is can often be super subjective, completely dependent upon one’s personal sensibilities, but MBFC does a really good job, so this was my starting point.
From there I built out a database and assigned my own broader ratings based heavily on the MBFC ratings, while also adding another layer of depth by parsing out the partisan from the hyperpartisan. I also attempted to classify sources and publications by whether they were part of what’s nebulously termed “the mainstream media” (even though this is subjective and a misnomer4).
It’s always helpful to define some terms.
First, the political leanings:
Centrist: can be liberal or conservative, often wavering between the two situationally, but overall tries to stand in the middle ground.
Lean: Most often takes stances or reports stories that could be viewed as left/right of center, but still strives to be fair.
Partisan: representing the left or right is part of its identity. It owns being on one side, with either liberals or conservatives being their core audience. Can still be fair and self-critical.
Hyperpartisan: completely dyed-in-the-wool leftwing or rightwing, sometimes fringe, often unreasonable, never really critical of its own side but extremely hostile towards its perceived opposition. Not necessarily always wrong, can make valid points and share reasonable opinions, but rarely in a balanced way.
Next, the information types:
Reliable: generally tries to responsibly report the facts as they know them, applying at least some level of editorial rigor to its content.
Mixed: a mix of sensationalism, cursory reporting and unverified information occasionally interrupted by good, original stories.
Unreliable: generally untrustworthy, doesn’t try very hard to verify information, low editorial rigor, deals mostly in gossip, rumors, unsupported opinion and shoddy reporting.
Satire: those that pretend to be information sources for the sake of humor, parody and entertainment, not trying to be taken seriously or deceive.
Hoax: those that pretend to be information sources while actively attempting to deceive their audiences with false reports.
Conspiracy: utter, contemptible garbage, often mixing pseudoscience and outright untruths with wild speculation to generate some of the darkest, most unbelievable lies.
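The two axes of categories above – political leaning and information type – lend themselves naturally to a simple structured record per source. Here’s a hypothetical sketch of how one directory entry might be modeled; the field names, category labels, and example outlet are illustrative, not the actual database schema:

```python
# Hypothetical directory record combining the two axes described above.
LEANINGS = {"centrist", "lean-left", "lean-right", "partisan-left",
            "partisan-right", "hyperpartisan-left", "hyperpartisan-right"}
INFO_TYPES = {"reliable", "mixed", "unreliable", "satire", "hoax", "conspiracy"}

def make_entry(name, leaning, info_type, mainstream=False):
    """Validate the category labels and build one directory record."""
    assert leaning in LEANINGS, f"unknown leaning: {leaning}"
    assert info_type in INFO_TYPES, f"unknown info type: {info_type}"
    return {"name": name, "leaning": leaning,
            "info_type": info_type, "mainstream": mainstream}

# "Example Daily" is a made-up outlet for demonstration.
entry = make_entry("Example Daily", "lean-left", "reliable", mainstream=True)
```

Keeping leaning and reliability as separate fields is the point of the whole exercise: a source can be partisan yet reliable, or centrist yet unreliable, and a flat “fake/not fake” flag can’t express that.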
So what should you read?
Anything you like.
The problem isn’t that there’s a huge variety of different kinds of news sources. That can be an asset if harnessed properly.
The problem is a crippling lack of media literacy in the general population and the widening partisan cultural divide between right and left. The problem is people on the right or left believing their favorite information source is the only arbiter of truth and facts simply because it generally agrees with them.
The problem is noise, yes, but also a general inability for people to cut through it to find the signal.
So read anything you like. I would only implore you to know what it is, where it’s coming from, what kind of information it’s sharing – and not sharing – and what other sources are saying too.
What you like, what you believe and what you share with others is up to you. Just don’t claim that solid, trustworthy news reporting is being withheld from you or is hard to find. Unless you live in a totalitarian regime, it’s not.
Despite the constant ragging on the media landscape, there’s probably better, deeper investigative journalism reaching a larger audience today than there’s ever been. It’s just a matter of sifting through the mountains of garbage to find it.
My advice would be to read things from across these categories. The super basic 24-hour news grind can keep you apprised of the latest, reading newspapers can scrape at the layers beneath that story, partisan opinion sites can lend voice to each side of an issue and even the hyperpartisan hatchet jobs can lend insight into how the fringes are thinking.
If you need to rely on your Facebook and Twitter feeds, go for it – just Like or Follow a wide variety of news sources and the information will come to you. After that, you just need to read what trickles into your stream.
All that said, this guide isn’t perfect, this I know. It’s not meant to be perfect, as I’m not sure that’s an attainable goal. People will poke at it, and I gladly welcome constructive criticism. But I hope this attempt at a media guide can at least generate some good discussion and add even the slightest bit of clarity to an increasingly controversial subject.
At work, our Star Tribune team has been winning a number of awards for A Cry for Help – about police killing mentally ill people – a story I spent several months wrangling on the data and visualization end of things.
Yesterday the project won a first place National Headliner Award for web/interactive storytelling, and a couple weeks ago it won the Al Nakkula Award for Police Reporting. It’s also been nominated for the Silver Gavel Awards from the American Bar Association (winners to be announced in May).
Our coverage of the Philando Castile shooting was also a finalist for the 2017 IRE Awards, which was a tremendous honor for our team just to be mentioned.
The Strib also took seven other National Headliner Awards in yesterday’s haul, including an honorable mention for Andy Mannix’s fantastic Solitary: Way down in the hole, which I also put a few months of work into.
Of course, all of journalism waited with bated breath for the 2017 Pulitzer Prize announcements last week, which recognized a lot of great reporting from last year.
The most striking takeaway for me this year is how much great data-driven journalism has been recognized, and how deeply intertwined it’s been with traditional shoeleather enterprise reporting. For us, all of our biggest projects were huge team efforts tackling the challenges of traditional reporting, data reporting, photojournalism, graphic design and web design.
I don’t do this for awards. Some of the best, longest-working journalists haven’t amassed many. It’s an honor just to be working alongside my extremely talented colleagues on such vital storytelling.
I can literally see my house in Google Earth, in a full 3D rendering. The last time I used the app, this was definitely not the case. It even renders the trees.
My favorite mapping tool, Mapbox GL JS, also supports 3D buildings on its awesome vector maps now.
Eventually we’ll have a complete scale-model digital replica of the whole planet, complete with a timelapse going back decades, if not centuries. With vast datasets like these, augmented reality could give virtual reality a run for its money.
As I delve into exploring tools and libraries like these, along with Three.js, I’ll be seeking ways to implement 3D into my storytelling.
My goal: cover the map
Accurate. I’ll only confirm having fewer than seven nuclear weapons.
Last Week Tonight speaks to my soul.
The Iowa Caucus results1 were yet another example of what many of us data nerds have been insisting upon for a long time: polling during the presidential primary season – especially in its early stages – is an increasingly worthless endeavor.
Hillary Clinton in 2008 was seen as inevitable and unbeatable in the Democratic primaries, according to polls, until she wasn’t and fell to Barack Obama. The polls showed Mitt Romney was slated to beat Obama in 2012, until he didn’t. Strategists and pundits leaning on single polls like a crutch for political analysis mocked statistician Nate Silver for his extremely accurate 2012 election predictions based on polling aggregation and analysis, plus other weighted factors.2 The 2014 midterms were disastrous for media polling as GOP victories swept across the country. Then, of course, the 2016 Iowa Caucus – and likely the caucuses and primaries that follow – demonstrated that support for certain candidates is undervalued while support for others is exaggerated when viewed through the lens of rolling and aggregated polls.
The data is too sparse and unreliable, and the methodology too suspect, in an age when landline phones are nearing extinction and Internet surveys are only slightly more believable than unscientific online polls.
There was a heyday for election polling spanning the 20th Century, when willing participants were more common and more easily reached. But barring a shift away from outdated, inadequate or underfunded research models, polling seems to be in crisis and simply cannot be relied upon as a trustworthy means of taking the electorate’s temperature.
Pollsters well recognize3 this reality as more people use unlisted cellphones and increasingly don’t take the time to answer surveys, meaning that conducting quality research has become more difficult and expensive. This narrows the demographics of respondents to those older and less technologically connected, which is increasingly unrepresentative of the electorate.
I had hoped media organizations would have been given more pause when Gallup, which has tracked horse races since it accurately predicted Franklin Roosevelt’s 1936 landslide, announced it wouldn’t be engaging in primary polling4 this election cycle due to significant mismatches between reporting and outcomes.
There is a difference between a poll being accurate and one being predictive. Did the 600 people polled tell the truth about their candidate preferences? Probably, so the poll could be fairly accurate within the scope of the sample. But where the wheels fall off that bus is when media pundits and analysts try using such a snapshot to predict a race’s outcome when people start voting, and I would argue there simply isn’t enough reliable information to make that leap, especially in cases where actual voter turnout5 isn’t being accounted for.
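That accurate-versus-predictive distinction has a concrete statistical floor beneath it: even a flawlessly conducted 600-person sample carries a built-in margin of sampling error before any of the turnout problems above come into play. A minimal sketch using the standard simple-random-sample formula (this is textbook math, not anything from the pollsters discussed here):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error at 95% confidence (z=1.96) for a simple random
    sample of size n, assuming worst-case proportion p=0.5."""
    return z * math.sqrt(p * (1 - p) / n)

# A 600-person poll carries roughly a +/-4-point margin of error at
# 95% confidence -- and that's before turnout modeling, non-response
# bias, or any of the other problems discussed above.
print(f"+/- {margin_of_error(600) * 100:.1f} points")
```

And that figure only covers random sampling error; it says nothing about whether the sample of willing landline respondents resembles the people who will actually show up to caucus, which is the leap the pundits keep making.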
Many data journalists and statisticians, including the aforementioned Nate Silver of FiveThirtyEight, have been maligned for suggesting Donald Trump is unlikely to win the GOP nomination and that his soaring national poll numbers are likely inflated. This may or may not prove to be true. But Iowa was the first salvo showing Silver is probably more right than not, since Trump and Ted Cruz’s fortunes were reversed from expectations and Marco Rubio’s totals were significantly higher than polling aggregates suggested.
Source: HuffPost Pollster
I would argue any resemblance between the media’s polling and actual vote counts in primaries and caucuses is purely incidental. Even if we argue that aggregate polls for the Democratic race in Iowa showed a clear, narrowing trend line between the top two candidates over time, none suggested the series of coin tosses Hillary Clinton and Bernie Sanders found themselves engaged in. It was never supposed to be this close, according to the constant flow of media polls.
Source: HuffPost Pollster
We only have Iowa under our belt thus far this primary season, so sure, every other state could somehow fall into line with horse race trends reported by various media outlets. But I’m not holding my breath.
Ann Selzer – hailed as the best pollster in Iowa, with a very good track record6 polling for the Des Moines Register – showed a comfortable Trump caucus win in her final survey result (though Cruz had led the pack at various points over the preceding weeks, and the drill-down suggested he was more popular). Rather than being a knock on a very good pollster who has done well in a very hard-to-gauge state7, the Register’s poll uncharacteristically missing the mark simply adds to growing doubts about the practice of horse race polling overall and further suggests that Gallup knows something many media outlets aren’t widely acknowledging: primary polling is rather inaccurate.
Do I have a solution to close the gap between primary polls and voting outcomes, a golden measurement to predict political futures? No. And there are pretty good polls and surveys out there outside of the realm of early presidential races. Plus, as the campaigns wear on and the field narrows into the general election, polls become slightly more valuable and even more predictive – though even then aren’t beyond suspicion.8
Rather than tossing election polls out as utterly worthless just yet, I instead simply append a mental asterisk to new numbers and don’t let them guide my expectations. Because beyond the realm of data there are other, better indicators to determine how elections, nominations, primaries and caucuses will turn out – like waiting for people to vote.