I’m a data hoarder. Not afraid to admit it. Hard drives groan and buckle beneath the weight of countless gigabytes of vital information, fun datasets, invaluable photographic memories, stupid memes and completely worthless crap.
If my Cloud started raining, it would be Noah’s fucking Flood.
As I was moving stuff onto yet more hard drives for safekeeping, I got to wondering why I do this.
Data permanence, of course, is a massive technological and philosophical subject that very smart people think about for fun. My interest in the subject is much, much shallower.
I’m a minimalist in every other aspect of life. My apartment is clean, my closet organized, and I just finished getting rid of 90 percent of my personal belongings, which was a completely liberating, nearly spiritual experience that helped me understand myself on a deeper level. Also, coworkers who visit my desk are convinced the lack of clutter in my workspace is a certain sign I’m a serial killer.
And yet, none of my IRL proclivities toward cleanliness and less-is-more translate to the digital world.
Several things are at play. First, digital storage is cheap and infinite, unlike physical storage, so I’m never forced to choose between two files like I am two couches.
Secondly, I’m used to things going away. Most of the best things in my life have disappeared before I was ready for them to, so if I like it and I can save it, it stays.
Along those lines, Internet information decay is real. Ask anyone working for the Internet Archive or someone wishing they could retrieve their Geocities pages. Websites and online apps are constantly being updated, overwritten, removed or simply abandoned, so pages are here today and digital dust tomorrow.
Keeping things solely on social presents its own privacy dangers, and the risk of a platform going away and taking all my memories and information with it. This happened with MySpace long ago. And I’ve definitely shifted my blogging exploits away from things like SquareSpace, Medium or Tumblr because I don’t feel ownership of my own information in those places like I do somewhere either self-hosted or easily transferrable. It’s part of why I’m so appreciative that Google, Facebook and Twitter allow me to download my whole posting history.
News orgs also archive the living shit out of things periodically. Rather than rely on someone to lovingly save old articles in free and easy-to-access places forever, I would rather save PDFs of stuff I find interesting.
A dangerous flip side threatens our civilization’s gluttonous digitization of all human knowledge though. Unless data is engraved onto golden records and shot into space on a probe, every bit of information on Earth is in danger of permanent oblivion in the case of nuclear war or some disaster that either kills power sources or damages global electronics. In that case, books will live longer than iPads.
Also, cybersecurity is a constant issue. Just ask Equifax or Facebook right now, as they lose control of massive amounts of our personal information. And my own networks are laughably insecure compared to anyone else’s, despite my best efforts and due diligence.
And as cheap and helpful as the Cloud is, keeping things on someone else’s computer instead of my own doesn’t fill me with confidence.
But I guess the upsides to digital information storage vastly outweigh the hypothetical doomsday downsides.
In the event of global catastrophe giving way to a Mad Max hellscape, whether or not we can access old YouTube videos will be the least of our worries.
So I guess I’ll just keep saving stuff until the lights go out.
Bob is a generic name for a generic, non-descript guy. Bob is likely to be male, white, older than 35 and we know almost nothing else about him. Selecting him from a crowd is a challenge.
But we’ve known Bob for a long time. He’s been in commercials for nearly every company as an example of an everyman we’re all supposed to identify with or project upon. For the last 20 years, Bob has been synonymous with everyone and no one in particular. Bob even became a meme himself in 2009, signaling peak saturation.
But Bob is fading. I’ve been seeing him less and less over the last handful of years. In his place, another familiar name has started to rise.
This phenomenon gradually came to my attention as friends, snickering, would send me article after article at an increasing frequency: “You don’t know me Jeff”, “why do women keep sleeping with Jeff”, “Jeff has a lot of nukes”. There are countless other examples. Odds are, you’ve seen them too.
Over the months, I’ve come to realize something chilling: Jeff is the new Bob.
I’m not the only one who has noticed, especially among those of us named Jeff. And to be clear, I’m not upset by it at all. I think it’s pretty funny. But it also raised my curiosity as to what could be driving my humble first name to be on the tip of everyone’s tongue.
We’ve seen this with traditionally female names as well. Just ask the Karens and Felicias of the world. Though with those two examples, there seem to be points of cultural reference (movies, TV, etc.) that started the memification of those names. Emily is another extremely common name that gets used as a generic reference, too.
In the case of Jeff – and before him, Bob – I haven’t found such an origin story. It seems to have just arisen randomly from the hivemind. This is the name we’re using to drag the everyman until we get bored with it.
I’ve searched for anywhere this phenomenon might be described: why do names themselves become memes, and what makes a name the generic call-out? Is it the name’s commonness or popularity?
There are some data to answer that question. It comes from the Social Security Administration and U.S. Census Bureau, and snapshots Minnesota as a microcosm, which has proven to be broadly indicative of national trends. And the stats show Jeffrey has not become a more common name by any means; in fact, it has crashed significantly in popularity since its peak in the late 1960s.
So it’s not a matter of the name’s popularity. In fact, when looking at Robert, it’s possible that both Jeff and Bob are being picked on because they’re increasingly unpopular names. The number of males born and named Robert each year shows a similar downward rollercoaster trendline that peaked in the 1930s. Basically, what’s happened to Jeff happened to Bob a generation earlier.
Note: looking up the shortened versions of the names in question (Robert = Bob, Jeffrey = Jeff), doesn’t produce nearly as many datapoints.
So my imperfect, unscientific theory, based on data and personal experience (and not much else), is this: Jeff is the new Bob because it has vague resonance with young adults as the name of someone they could know, but don’t.
Here’s an explanation: both corporate marketing and meme sharing are targeted at and among Millennials, who comprise the coveted 18-to-35-year-old demographic that advertisers spend their waking moments figuring out how to manipulate into spending their money.
There was a time when that age bracket was made up of Generation X, who were largely born in the late 1960s through the 1970s. Jeff was a super popular name among GenX guys. Almost every other Jeff I meet is GenX. When I worked at the University of Minnesota and the Minnesota Daily in college, I was pretty unique in the Jeff category. At my real-world job, my workplace is now lousy with Jeffs. I’m the only Millennial I’ve encountered whose parents named him Jeffrey. I know there are more, but we’re comparatively rare and becoming rarer. So for Millennials, there are a lot of Jeffs out there – just few-to-none who they personally know. To today’s young adults, it seems like a really common name for someone else. You know, those other people that various things happen to that never actually happen to you.
Among GenX, when they were the golden young adult demographic, Bob would have been that guy: someone from their parents’ or grandparents’ generation who didn’t usually exist in their immediate friend circle. There were a lot of Bobs, but the hip generation of the moment didn’t hang out with them so much.
This is maybe why Bob, and now Jeff, have become fair game for memification. It’s common enough in general to resonate but not personally specific enough among young adults to be offensive.
What this idea doesn’t answer is why Bob? Why Jeff? Yes, they were common names at one point, but why not other common names? I can’t throw a rock without hitting a Boomer guy named some variation of James. Does it hit too close to home as someone’s dad? Why isn’t Katie or Ashley being picked on?
So my conclusion could be completely wrong, but it’s what I’ve sussed out so far. I’m interested to hear other thoughts for sure, and to see if Olivia and Henry someday become the Bobs and Jeffs for the next generation.
The Drudge Report made its first big splash with its break of the Monica Lewinsky scandal during the second term of the Clinton administration. Since then, it’s been not only a darling among rightwing news consumers, but a page to watch among mediaites across the country.
While Drudge has since broken very few big stories itself and does very little original reporting, the site is an extremely useful link aggregator. The site not only links to new stories from a huge variety of different outlets, but Drudge himself provides his own voice to the daily news cycle, shaping the dialogue by writing his own headlines that highlight – and some would argue sometimes distort – aspects of news stories that are important to him.
Drudge Report was literally the first thing I ever saw on the Internet, on the very same night Princess Diana died. It was how my family, newly connected online via a noisy 36k modem, learned about this tragic event. It went on to become a primary information source for my parents throughout my childhood and adolescence. To this day, I still read it daily, just to catch a glimpse of how a subsection of American culture is framing the day’s events.
The site’s look, feel and functionality have famously not changed significantly since the 1990s, its brutalist simplicity a favorite among minimalist designers everywhere. It’s not even mobile responsive. It has no real official social media presence, outside of Drudge’s enigmatic personal Twitter account.
As of July 23, 2017, Drudge was ranked 719 on Alexa and often tops 1 billion pageviews per month. The site is so integrated into Internet culture and history that getting “Drudge rushed” is a well-known phenomenon for sites fortunate – or unfortunate – enough to get linked on its front and only page.
The Drudge Conundrum
Feelings across the media landscape about the impact of Drudge are mixed. On the one hand, the site is a convenient aggregator that puts a lot of good journalism in front of people who otherwise might not seek it.
On the other hand, its priorities are strongly slanted towards not just the general rightwing, but Matt Drudge’s personal brand of right-leaning politics. News stories are sometimes misrepresented, and sometimes dubious sources like independent blogs and conspiratorial screed farms like InfoWars get prominent placement if they feed a particular narrative Drudge is trying to drive home. Drudge also tumbles down conspiratorial rabbit holes that don’t pan out, like his apparent obsession with Bill Clinton’s allegedly illegitimate son, a John Kerry intern scandal that didn’t exist and lots of Birther nonsense concerning Barack Obama’s heritage. He also very obviously promoted positive stories about then-candidate Donald Trump in an effort to help get him elected president.
And that’s not to say that other sites, publications and news sources don’t have slants. They do. But they are more often influenced by time, place, history, external and internal cultural forces and editorial mission, rather than the whims of a single person. In newspapers and other traditional media forms, news and opinion tend to be more clearly labeled (though people still get confused). On Drudge, it’s hard to tell where news ends and opinion begins when it comes to his presentation of story links.
Despite all of this, a refrain I often hear from friends and family is this: “I don’t get my news from the mainstream media, I get it from Drudge!”
First, Drudge IS the mainstream media when it attracts billions of views and helps set the tone for national conversations about global events and issues. It’s also definitely integrated into the rest of the mainstream media when its primary sources are major news organizations like The New York Times, Washington Post, Wall Street Journal, The Daily Mail, FOX News and CNN.
However, those aren’t the only sources on Drudge, as he also tends to elevate InfoWars, Breitbart, The Sun, The National Enquirer, The Daily Caller, The Gateway Pundit and other disreputable, tabloid-y or simply hyperpartisan news publications. So while the legacy news media is featured prominently on Drudge, so is the insurgent online rightwing media and other sources that get equal placement on the site.
Institutional voices are made to compete with alternative ones for people’s attention, usually without obvious attribution as to where any particular linked headline comes from. And to make things even more complicated, Drudge has also shown an aversion to linking directly to the original source of a story and will opt for wire versions or reblogs on other websites, whenever possible.
Turning Drudge into Data
So this raises some questions: what’s the typical composition of the news content aggregated on Drudge Report? How has it changed over time? Are there other patterns to be found in Drudge’s choices of headlines and information sourcing?
I decided to crunch some data to find out.
Drudge itself doesn’t keep any archives of its historical homepages or headlines. This is where Drudge Report Archive – an independently-run website that snapshots the website several times per day – becomes invaluable.
For this, using a specially-crafted Python scraper, I ripped down the headlines from morning snapshots for every available day of every year from January 2002 to October 2017. I could have gone longer or deeper, but it didn’t seem necessary to scrape everything in order to get a sizeable, representative sample. In the end, after eliminating duplicate links across days, ads and references in the huge directory at the bottom of the page, I ended up with about 200,000 story URLs spanning 16 years.
The scraper stored all of the links on the Drudge homepage from every targeted snapshot in time, breaking out their URLs (information source) and text (headline) and timestamping each entry.
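The core of such a scraper can be sketched with nothing but the standard library. This is a minimal reconstruction, not the actual script – the class and function names here are my own – but it shows the basic extraction step: pull every link out of a snapshot’s HTML and break it into source domain, headline and URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML snapshot."""
    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> tag we're currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html):
    """Return (source domain, headline, url) for every http link in a snapshot."""
    parser = LinkExtractor()
    parser.feed(html)
    return [(urlparse(href).netloc, text, href)
            for href, text in parser.links if href.startswith("http")]
```

Deduplication across days and filtering out ads and the directory links would then happen downstream, as described below.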
Using a bunch of Excel magic, I eliminated any duplicates (sometimes the same links last for days on Drudge) and any links that weren’t news headlines, such as ads and the long list of blogs and news orgs at bottom of the page.
Pivoting on multiple metrics, I produced a number of summary tables breaking out the trends found in the data by year, headline, link source and more. Charts were created using C3.js.
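The pivoting step can be sketched in a few lines, assuming each scraped entry has been reduced to a (year, domain, headline) tuple – a hypothetical shape of my own choosing, not the actual spreadsheet layout:

```python
from collections import Counter

def summarize(rows):
    """rows: iterable of (year, domain, headline) tuples.

    Returns two Counters mimicking the pivot tables described above:
    overall link counts per source domain, and counts per (year, domain).
    """
    by_domain = Counter(domain for _, domain, _ in rows)
    by_year = Counter((year, domain) for year, domain, _ in rows)
    return by_domain, by_year
```

`by_domain.most_common(20)` would then give the favorite-sources table, and `by_year` feeds the per-year trendlines.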
Drudge’s favorite sources
Drudge takes in a vast variety of different news sources – more than 5,000 distinct web domains appear in the data.
But there are some clear favorites. Overwhelmingly, Drudge relies on wire services with some rightwing commentary and analysis from sites like Breitbart and InfoWars thrown in.
Nearly 20 percent of the links in the data sample come through wire services like the Associated Press and Reuters, and wire-heavy news sites like Yahoo! News.
About 6 percent of the links came from Breitbart News – slightly more than The New York Times and The Washington Post combined (though they also rate relatively highly compared to other sources).
While Drudge certainly does give voice to lots of smaller rightwing blogs and columnists, the bulk of its content is a selection from the mainstream reporting provided by the same newswires that help power the reporting of major news organizations.
This also shows that Drudge – while often self-referential and instantly springing upon any story mentioning the website – doesn’t often link internally to its own domain, which makes sense since, as previously mentioned, the site is not a source of much original reporting. The “Drudge exclusive” is a rare thing indeed. However, Drudge did link to the stored pages on Drudge Report Archives about 4,000 times from 2002 through 2017.
Determining Drudge’s tilt
Analyzing just the news websites that comprise 1 percent or more of a 200,000-link Drudge dataset shows the site favors center, center-right and rightwing sources over left-leaning or leftwing sources.
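That 1 percent cutoff is easy to sketch in Python. `major_sources` is a hypothetical helper written for illustration, not part of the actual analysis code:

```python
from collections import Counter

def major_sources(domains, threshold=0.01):
    """Keep only domains accounting for at least `threshold` of all links.

    domains: iterable of domain strings, one per link in the dataset.
    Returns {domain: share} for every domain at or above the cutoff.
    """
    counts = Counter(domains)
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items() if n / total >= threshold}
```

Applied to the full 200,000-link dataset, this leaves only the heavy hitters to be sorted into left, center and right buckets.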
Basically, Drudge has a center-right slant with some alt-right occasionally thrown in.
For a deeper, more detailed analysis of media bias and political alignment, check out my lookup tool.
Hot Drudge topics
Drudge headlines are very often written by staff, or even by Matt Drudge himself, instead of using the story titles provided by news agencies. Just as they select the stories featured on Drudge, they craft the headlines describing the linked content.
Any frequent reader of Drudge knows about the site’s preoccupation with various recurring topics, such as robots, sex, sex robots, AI, demonic possession, apocalypses, general news of the weird and Hollywood buzz. The site is also traditionally LGBT-friendly, has little patience for overtly Christian politicians and doesn’t embrace traditional religious conservatism.
But more than anything, Drudge has been primarily obsessed with covering the lives, words and policies of U.S. presidents. This is understandable, as the site rose to fame breaking Bill Clinton’s scandals. And it’s yet another respect where Drudge reflects the mainstream media it draws information from, since presidential administrations get a lot of coverage – even more so these days under the Trump administration.
The angle from which Drudge describes these stories differs, though: to even a casual observer, it’s decidedly more anti-Clinton and anti-Obama and much more pro-Trump.
And Drudge has had a particular obsession with covering Obama.
Running a textual analysis of Drudge headlines (and discarding those words with very low frequency) reveals the name “Obama” appearing nearly 9,000 times over about a decade.
By comparison, over the 16-year time period, “Bush” appeared about 2,700 times, “Clinton” 2,155 times, “Trump” about 2,000 times and “Hillary” nearly 1,800 times. For non-presidential context, the word “sex” appeared about 2,500 times.
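The frequency count behind those numbers can be sketched as follows – a hypothetical reconstruction, where the tokenization and the exact low-frequency cutoff are my own assumptions rather than the original methodology:

```python
import re
from collections import Counter

def headline_word_counts(headlines, min_count=5):
    """Count word frequencies across headlines, discarding rare words
    (the low-frequency cutoff mentioned above; 5 is an assumed value)."""
    words = Counter()
    for h in headlines:
        # Uppercase to match Drudge's all-caps style, keep apostrophes
        words.update(re.findall(r"[A-Za-z']+", h.upper()))
    return {w: n for w, n in words.items() if n >= min_count}
```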
The bottom line is this: The Drudge Report could not exist without the mainstream media. There wouldn’t be any content. First, the bulk of Drudge links that come from wire services would vanish, as would its reliance on news broadcasters, The New York Times and The Washington Post. And since most of the blogs and alternative news sites Drudge links to also draw heavily upon those sources for information, spin and reaction, those sites would also diminish.
And it bears repeating that Drudge IS the mainstream media in terms of sources, traffic and reach, with some non-mainstream links and right-biased snark thrown in to give its content cocktail a distinctly rightwing flavor – one that makes the news more palatable to a conservative audience who feel western journalism isn’t serving their interests.
There are still questions I can’t find answers to within this dataset: how can we quantify what important stories Drudge doesn’t feature at all? Does page placement of a link affect readership? How many readers only see the headlines versus actually clicking through on a story link? How often does Drudge purposely link to websites that only echo original reporting instead of to a story’s actual source? These seem like questions only those living within the core of the Drudge world would know, or would require a much more massive research lift to understand.
At the end of the day, this is simply a means of demonstrating with data what we already knew anecdotally: that Drudge Report is a useful news aggregation tool with a distinct rightwing tilt, its content drawn from – and therefore reflecting – its favorite mainstream and non-mainstream information sources.
What I imagine Batman’s Fitbit stats look like on a slow night.
Our knowledge of genetics, and our ability to manipulate it, has come a long way.
And you know the tech is good when it’s in the hands of consumers, and with that comes exciting possibilities for the present and the future.
It also provides the opportunity to win an argument by bringing down the full hammer of science upon the heads of your opponents.
Heritage is important. Maybe not nearly as important as who we are now, and where we’re going, but understanding where we come from can help inform those paths.
Up to this point, I’ve identified myself as equal parts English, Irish, Scottish and German. This is based purely on family lore, which says the first Hargarten to travel from Germany to America was my great-great grandfather William Hargarten near the turn of the 20th century. He was escaping conscription into the German military, the story goes, and brought his family to the United States to start a new life. My grandfather joked that despite living in a German household, his father, my great-grandfather, could only speak two words of German: “ja” and “nein.”
On my mom’s side are lots of Carters – mutts from the British Isles with a cocktail of different tribes in them, Scottish, Irish and English predominantly. The Carters have been in the U.S. forever, people with that name are very common and the family trees are huge and often disconnected, so tracing them back gets easily muddled in uncertainty.
Because of this, German heritage is how my family has most strongly identified over the past century. There’s even a town in Germany called Hargarten just outside of Trier, which I’ve visited, and there’s lore of a Hargarten University where anyone named Hargarten could attend free of charge, until the institution was bombed to ash in World War I.
The thing about family lore is that it tends to fall apart under scrutiny. There are certainly grains of truth in the legends, but the official story starts to crumble upon inspection.
First, these stories provide a very limited view of our family’s past, shedding no light on anything that happened before the 1800s. Additionally, the verifiable details that my family can point to suggest some very different stories.
For instance, there is indeed a town called Hargarten in Germany. But there’s another one in France, and perhaps even more than that spread across the area historically known as the Rheinland. And the Rheinland, which Hitler reclaimed for Germany in World War II, had traditionally been home not only to native Germans, but many French and Scandinavian peoples as well.
Also, the name Hargarten isn’t even German, but just sounds that way to English-speaking ears. It’s actually closer to something Norwegian or otherwise Scandinavian. And while William Hargarten was from Germany, and did speak German, that’s not necessarily where his family originally came from – and that’s where the trail goes cold when my relatives try to reconstruct our lineage. William just didn’t impart a lot of knowledge to his kids, and they in turn left even less to theirs, leaving my immediate and extended family members flailing in the dark. This was partially due to rising anti-German sentiment in the United States across two global wars; the Hargarten family did what they could to submerge their cultural heritage, just short of changing the name.
Then there’s my dad, the amateur genealogy sleuth who has his own battery of theories about our family’s origins, including a bunch of nonsense about British-Israelism, how there can’t be a single drop of actual German blood in our veins and other things not really born of objective logic or evidence.
So basically, I’ve wanted to address some of these open-ended questions for years, and after consulting friends who went through a similar process, I decided to order 23andMe.
It was a simple process. I ordered the kit for about $80, sent in a saliva sample to their lab and within two months received access to an online report explaining my genetic heritage.
So what did it find?
Science has confirmed what has long been suspected: that I am the whitest person alive.
It turns out, in this case, family lore is pretty close to the truth, at least in broad strokes.
I am 100 percent European, which breaks down to 55 percent British and Irish, 19 percent French and German, 7 percent Scandinavian and 16 percent “Broadly Northwestern European” – whatever that means.
The most unexpected result was the 2 percent of my genome that’s Iberian – easily the least-white part of my map.
And unlike many other white people from Europe, I don’t seem to have any Mongolian in me - a common trait due to Genghis Khan’s sweeping conquest of the world.
The timeline is also really interesting, showing how many generations ago my most recent ancestors from each population lived.
It’s fascinating too that the likely Scandinavian root of my last name dovetails with my most recent fully Scandinavian ancestor, who lived in the 1830s – William Hargarten would have arrived in the U.S. only some decades later.
There’s also a small, vague part – only 0.4 percent – of me labeled Eastern European, which ranges between five and eight generations back in time. This is explained by 23andMe thusly: “You most likely had a third great-grandparent, fourth great-grandparent, fifth great-grandparent, sixth great-grandparent, or seventh great (or greater) grandparent who was 100% Eastern European. This person was likely born between 1710 and 1830.”
This all leaves a lot of room for long-held family concepts of our lineage to be true, at least from 50,000 feet. The genetic analysis unfortunately can’t isolate all the gritty details of my heritage, but it leaves a lot of interesting questions to ponder.
The maternal and paternal migrations also hold some fascinating findings, including how those groups moved from Africa and into Europe, and the fact that I’m related to King Louis XVI of France and the House of Bourbon.
It also demonstrates that no matter how white someone is, we’re all descended from eastern Africa, the cradle of humankind.
If there was an actual Eve for humanity, it would be a single woman who lived in eastern Africa up to 200,000 years ago. The mitochondrial DNA lines of her contemporaries have all since died out, making her our universal mom.
Additionally, everyone’s paternal lines trace back more than 275,000 years to a single man who was the common ancestor of haplogroup A. Rather than being analogous to the fabled Adam of western world religions, this guy was one of many eastern Africans. It’s just that his Y chromosome was far more successful than those of his brethren, whose lineages have all since died out.
One final tidbit of interest is that I have 305 Neanderthal variants in my genome, which is reportedly more than 87 percent of 23andMe customers, while accounting for less than 4 percent of my DNA.
But we are in the Golden Age of prestige television, a world where each episode can cost well over $8 million. So getting sucked into the thrills of high-production, serialized storytelling is pretty much unavoidable for me.
So here’s a ranked list of what the data shows are my favorite TV programs, including my rating and each show’s RottenTomatoes score.
Food is cheap and plentiful in wealthy western countries, so it could be argued that it’s pretty off-base for me to complain about food at all.
But how well are we eating?
There’s an ocean of conflicting advice from doctors, researchers, regulators, businesses, health gurus and various hucksters trying to make a quick buck off a populace with absolutely zero notion of what a healthy diet actually looks like.
And I’m one of those people.
Then there’s the parade of fad diets like Dukan, Atkins, Paleo and other programs aimed at shedding pounds and promoting general health that simply don’t work. And everyone’s got a dietary plan, it seems, each claiming to hold the true key to health through proper dietary nutrition.
Much of the messaging is built around the idea that if food tastes good, or is purchased from a store or restaurant, there’s something inherently wrong with it. For some, unless you’re shaking fresh soil off a piece of organic produce, you’re eating poison.
And what about quality of life? If we’re all going to die anyway, why does it matter what we eat or don’t eat, ultimately? Shouldn’t we just enjoy ourselves? Just don’t eat in ways that will cause you suffering down the road, and pay attention to your individual needs rather than broadly prescribed guidelines.
My solution so far has been simple: everything in moderation. Unless I’m legitimately allergic to it, I’ll eat or drink anything edible – just not too much of it. But even this idea, which seems to work for me, has come under fire.
So I’ve come to the conclusion that nothing is safe, nothing is true and if that’s the case, it might as well taste good.
But that hasn’t stopped me from at least being somewhat conscious of what I’m putting in my body. I’ve successfully eliminated soda and beer from my life and drink water, tea and various juices instead. I train every day, walk everywhere and consume calories in balance with what I burn. This simple formula has me feeling much better and more energetic.
Health gurus might faint dead away at the notion of things being this uncomplicated, but I have to say, at this point, I really don’t care.
A lot of people – and a lot of gluttonous orange cats – lament the existence of Mondays.
But what if I told you that Mondays changed my life drastically for the better?
I grew up in a household where Saturday was absolutely the seventh day, entirely for religious reasons. I didn’t even know throughout most of my childhood that this wasn’t a universally accepted fact, or that calendars existed where Sunday was the last day of the week instead.
The Gregorian calendar, obviously, is extremely popular and ubiquitous in the U.S. But I never fathomed until I was already an adult that ISO 8601 existed as well – an international standard that places Monday in the week’s first slot.
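Programming languages have quietly taken sides here, too. Python’s `datetime` module, for instance, follows the ISO 8601 convention:

```python
from datetime import date

# ISO 8601 weekday numbering: Monday is 1, Sunday is 7.
monday = date(2017, 7, 24)
sunday = date(2017, 7, 30)

assert monday.isoweekday() == 1  # first day of the ISO week
assert sunday.isoweekday() == 7  # last day of the ISO week

# isocalendar() returns (ISO year, ISO week number, ISO weekday);
# a Monday and the following Sunday share the same week number.
assert monday.isocalendar()[1] == sunday.isocalendar()[1]
```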
And that makes more sense, doesn’t it? If Friday, Saturday and Sunday are considered “the weekend” – when most people have work and school off – why would Sunday also be considered the start of a new week?
Making Sunday the beginning of a new week comes with a lot of problems, especially for someone like me whose Saturdays as a child were dominated by religious obligations instead of what the other kids described as “fun.”
Ridding myself of that nonsense and shifting the first day of my week to Monday changed everything for me. Suddenly my schedules started to make sense. Chunking out weekly time blocks from Monday to Sunday just fits perfectly with a two-day wind-down at the end before starting another weekly cycle. My calendar apps don’t look confusing to me anymore.
Saturday is now used as another day to be productive, and Sunday is a rest day. Using Sunday as a fulcrum to recover from the current week and prepare for the next one has helped me immeasurably.
As often happens when getting to know someone, the question inevitably comes up: “what’s your favorite movie?” Having favorite films is part of American culture and identity, as movies are one of our biggest and most popular world exports, and even in times of great economic uncertainty, people have been piling into cinemas to experience new releases in all their silver screen glory.
My problem is that I’m indecisive, and what my favorites are at any given time depends largely on my mood and what new things I’ve encountered that have impressed me.
But since I’m a data nerd, I wondered if there was a way to arrive at something at least approximating my favorite 100 films, ranked in some sort of reasonable order.
The methodology was pretty simple: list the first 100+ movies that come to mind, rate each one on a scale of 1 to 5 and see where everything falls after sorting. For reference and for fun, I included Rotten Tomatoes scores for comparison.
So here’s a ranked list of what the data shows are my favorite films, including my rating and each film’s Rotten Tomatoes score.
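The sort itself is trivial. Here’s a minimal sketch of the approach; the titles and numbers are invented placeholders, and breaking ties with the Rotten Tomatoes score is my own assumption, not part of the original methodology:

```python
# Each entry: (title, my 1-5 rating, Rotten Tomatoes score).
# All values here are made up for illustration.
movies = [
    ("Movie A", 4.5, 92),
    ("Movie B", 5.0, 88),
    ("Movie C", 3.0, 95),
    ("Movie D", 4.5, 97),
]

# Sort by personal rating first, RT score as tiebreaker, highest first.
ranked = sorted(movies, key=lambda m: (m[1], m[2]), reverse=True)

for rank, (title, mine, rt) in enumerate(ranked, start=1):
    print(f"{rank}. {title} - me: {mine}/5, RT: {rt}%")
```

With a 100+ row spreadsheet the same one-liner sort does all the work.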
So I got a Fitbit and started tracking my walks in all of July (or at least when I remembered to turn the GPS on). There’s definitely a walking pattern to my life.
This might not look like much of a geographic range, but I just earned my London Underground badge for 402km of lifetime distance since I started using the device. Of course, because I constantly forget to track it, my actual distance traveled on foot dwarfs that.
I need new shoes.
There have been a few attempts to make a directory of fake news sites to guide social media users away from reading and sharing misinformation.
But I still felt pretty unsatisfied. We could go so much deeper.
I’ve taken a stab at compiling a database of news websites – both real and fake – that I will maintain and update in response to new information and feedback.
This directory not only tracks which publications are fake, but which are trustworthy, those that are merely partisan (or hyperpartisan) and those that are legitimate information sources regardless of their leanings.
It’s an attempt to categorize purported information sources in a more descriptive, fair and accurate way without necessarily demonizing outlets that focus more on opinion or political agendas.
I compiled data from the excellent Media Bias Fact Check project, a non-profit dedicated to scoping out the political leanings and veracity of news organizations, blogs and other online information hubs.
Judging how liberal or conservative a media source is can often be super subjective, completely dependent upon one’s personal sensibilities, but MBFC does a really good job, so this was my starting point.
From there I built out a database and assigned my own broader ratings based heavily on the MBFC ratings, while also adding another layer of depth by parsing out the partisan from the hyperpartisan. I also attempted to classify sources and publications by whether they were part of what’s nebulously termed “the mainstream media” (even though this is subjective and a misnomer4).
It’s always helpful to define some terms.
First, the political leanings:
Centrist: can be liberal or conservative, often wavering between the two situationally, but overall tries to stand in the middle ground.
Lean: Most often takes stances or reports stories that could be viewed as left/right of center, but still strives to be fair.
Partisan: representing the left or right is part of its identity. It owns being on one side, with either liberals or conservatives as its core audience. Can still be fair and self-critical.
Hyperpartisan: completely dyed-in-the-wool leftwing or rightwing, sometimes fringe, often unreasonable, never really critical of its own side but extremely hostile towards its perceived opposition. Not necessarily always wrong, can make valid points and share reasonable opinions, but rarely in a balanced way.
Next, the information types:
Reliable: generally tries to responsibly report the facts as they know them, applying at least some level of editorial rigor to its content.
Mixed: a mix of sensationalism, cursory reporting and unverified information occasionally interrupted by good, original stories.
Unreliable: generally untrustworthy, doesn’t try very hard to verify information, low editorial rigor, deals mostly in gossip, rumors, unsupported opinion and shoddy reporting.
Satire: those that pretend to be information sources for the sake of humor, parody and entertainment, not trying to be taken seriously or deceive.
Hoax: those that pretend to be information sources while actively attempting to deceive their audiences with false reports.
Conspiracy: utter, contemptible garbage, often mixing pseudoscience and outright untruths with wild speculation to generate some of the darkest, most unbelievable lies.
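One nice property of splitting leaning and information type into two independent axes is that the directory becomes easy to represent and query as structured data. A hypothetical sketch of the schema – the class name, fields and example entries are all invented for illustration, not the real dataset:

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    leaning: str     # centrist | lean left/right | partisan | hyperpartisan
    info_type: str   # reliable | mixed | unreliable | satire | hoax | conspiracy
    mainstream: bool # part of the nebulously termed "mainstream media"?

# Invented example entries for illustration only.
directory = [
    Source("Example Gazette", "lean left", "reliable", True),
    Source("Example Patriot Blog", "hyperpartisan", "unreliable", False),
    Source("Example Onion-alike", "centrist", "satire", False),
]

# The payoff of two axes: filter for reliability regardless of leaning.
reliable = [s.name for s in directory if s.info_type == "reliable"]
print(reliable)  # ['Example Gazette']
```

A partisan outlet can land in the “reliable” bucket and a centrist one in “unreliable” – the whole point of not conflating politics with trustworthiness.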
So what should you read?
Anything you like.
The problem isn’t that there’s a huge variety of different kinds of news sources. That can be an asset if harnessed properly.
The problem is a crippling lack of media literacy in the general population and the widening partisan cultural divide between right and left. The problem is people on the right or left believing their favorite information source is the only arbiter of truth and facts simply because it generally agrees with them.
The problem is noise, yes, but also a general inability for people to cut through it to find the signal.
So read anything you like. I would only implore you to know what it is, where it’s coming from, what kind of information it’s sharing – and not sharing – and what other sources are saying too.
What you like, what you believe and what you share with others is up to you. Just don’t claim that solid, trustworthy news reporting is being withheld from you or is hard to find. Unless you live in a totalitarian regime, it’s not.
Despite the constant ragging on the media landscape, there’s probably better, deeper investigative journalism reaching a larger audience today than there’s ever been. It’s just a matter of sifting through the mountains of garbage to find it.
My advice would be to read things from across these categories. The super basic 24-hour news grind can keep you apprised of the latest, reading newspapers can scrape at the layers beneath that story, partisan opinion sites can lend voice to each side of an issue and even the hyperpartisan hatchet jobs can lend insight into how the fringes are thinking.
If you need to rely on your Facebook and Twitter feeds, go for it – just Like or Follow a wide variety of different news sources and the information will come to you. After that, you just need to read what trickles into your stream.
All that said, this guide isn’t perfect, this I know. It’s not meant to be perfect, as I’m not sure that’s an attainable goal. People will poke at it, and I gladly welcome constructive criticism. But I hope this attempt at a media guide can at least generate some good discussion and add even the slightest bit of clarity to an increasingly controversial subject.
Update August 8, 2017: Buzzfeed News also has a great analysis of this subject.
At work, our Star Tribune team has been winning a number of awards for A Cry for Help – about police killing mentally ill people – a story I spent several months wrangling on the data and visualization end of things.
Yesterday the project won a first place National Headliner Award for web/interactive storytelling, and a couple weeks ago it won the Al Nakkula Award for Police Reporting. It’s also been nominated for the Silver Gavel Awards from the American Bar Association (winners to be announced in May).
Our coverage of the Philando Castile shooting was also a finalist for the 2017 IRE Awards, which was a tremendous honor for our team just to be mentioned.
The Strib also took seven other National Headliner Awards in yesterday’s haul, including an honorable mention for Andy Mannix’s fantastic Solitary: Way down in the hole, which I also put a few months of work into.
Of course, all of journalism waited with bated breath for the 2017 Pulitzer Prize announcements last week, which recognized a lot of great reporting from last year.
The most striking takeaway for me this year is how much great data-driven journalism has been recognized, and how deeply intertwined it’s been with traditional shoeleather enterprise reporting. For us, all of our biggest projects were huge team efforts tackling the challenges of traditional reporting, data reporting, photojournalism, graphic design and web design.
I don’t do this for awards. Some of the best, longest-working journalists haven’t amassed many. It’s an honor just to be working alongside my extremely talented colleagues on such vital storytelling.
I can literally see my house from Google Earth, in a full 3D rendering. The last time I used the app, this was definitely not the case. It even renders the trees.
My favorite mapping tool, Mapbox GL JS, also supports 3D buildings on its awesome vector maps now.
Eventually we’ll have a full scale-model digital replica of the whole planet, complete with timelapses going back decades, if not centuries. With datasets this vast, augmented reality could give virtual reality a run for its money.
As I delve into libraries and tools like these and Three.js, I’ll be seeking ways to work 3D into my storytelling.
My goal: cover the map
Accurate. I’ll only confirm having fewer than seven nuclear weapons.
Last Week Tonight speaks to my soul.
The Iowa Caucus results1 were yet another example of what many of us data nerds have long been insisting: polling during the presidential primary season – especially in its early stages – is an increasingly worthless endeavor.
Hillary Clinton in 2008 was seen as inevitable and unbeatable in the Democratic primaries, according to polls, until she wasn’t and fell to Barack Obama. The polls showed Mitt Romney was slated to beat Obama in 2012, until he didn’t. Strategists and pundits leaning on single polls like a crutch for political analysis mocked statistician Nate Silver for his extremely accurate 2012 election predictions based on polling aggregation and analysis, plus other weighted factors.2 The 2014 midterms were disastrous for media polling as GOP victories swept across the country. Then, of course, the 2016 Iowa Caucus – and likely the caucuses and primaries that follow – demonstrated that support for certain candidates is undervalued while support for others is exaggerated when viewed through the lens of rolling and aggregated polls.
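For context, “polling aggregation” just means combining many polls rather than trusting any single one, usually weighting them by quality, size and freshness. A toy sketch of the idea – every number and the half-life constant below are invented for illustration, not FiveThirtyEight’s actual model:

```python
import math

# Toy poll aggregation: weight each poll by sample size and recency.
# All figures are made up; this illustrates the concept only.
polls = [
    # (candidate's share %, sample size, days before the vote)
    (28.0, 600, 2),
    (31.0, 450, 5),
    (25.0, 800, 10),
]

def weight(n, days_out, half_life=7.0):
    # Larger samples count more (with diminishing returns via sqrt);
    # older polls decay exponentially toward zero influence.
    return math.sqrt(n) * 0.5 ** (days_out / half_life)

total = sum(weight(n, d) for _, n, d in polls)
estimate = sum(share * weight(n, d) for share, n, d in polls) / total
print(round(estimate, 1))
```

Even done well, this still only smooths the inputs – if the underlying samples are skewed, the aggregate inherits the skew.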
The data is too sparse and unreliable, and the methodology suspect, in an age where landline phones are nearing extinction and Internet surveys are only slightly more believable than Internet polls.
There was a heyday for election polling spanning the 20th Century, when willing participants were more common and more easily reached. But barring a shift away from outdated, inadequate or underfunded research models, polling seems to be in crisis and simply cannot be relied upon as a trustworthy means of taking the electorate’s temperature.
Pollsters themselves recognize3 this reality: as more people use unlisted cellphones and increasingly don’t take the time to answer surveys, conducting quality research has become more difficult and expensive. This narrows the demographics of respondents to the older and less technologically connected, a group increasingly unrepresentative of the electorate.
I had hoped media organizations would have been given more pause when Gallup, which has tracked horse races since it accurately predicted Franklin Roosevelt’s 1936 landslide, announced it wouldn’t be engaging in primary polling4 this election cycle due to significant mismatches between reporting and outcomes.
There is a difference between a poll being accurate and one being predictive. Did the 600 people polled tell the truth about their candidate preferences? Probably, so the poll could be fairly accurate within the scope of the sample. But where the wheels fall off that bus is when media pundits and analysts try using such a snapshot to predict a race’s outcome when people start voting, and I would argue there simply isn’t enough reliable information to make that leap, especially in cases where actual voter turnout5 isn’t being accounted for.
Many data journalists and statisticians, including the aforementioned Nate Silver of FiveThirtyEight, have been maligned for suggesting Donald Trump is unlikely to win the GOP nomination and that his soaring national poll numbers are likely inflated. This may or may not prove to be true. But Iowa was the first salvo showing Silver is probably more right than not, since Trump and Ted Cruz’s fortunes were reversed from expectations and Marco Rubio’s totals were significantly higher than polling aggregates suggested.
Source: HuffPost Pollster
I would argue any resemblance between the media’s polling and actual vote counts in primaries and caucuses is purely incidental. Even if we argue that aggregate polls for the Democratic race in Iowa showed a clear, narrowing trend line between the top two candidates over time, none suggested the series of coin tosses Hillary Clinton and Bernie Sanders found themselves engaged in. It was never supposed to be this close according to what the constant flow of media polls suggested.
Source: HuffPost Pollster
We only have Iowa under our belt thus far this primary season, so sure, every other state could somehow fall into line with horse race trends reported by various media outlets. But I’m not holding my breath.
Ann Selzer – hailed as the best pollster in Iowa, with a very good track record6 polling for the Des Moines Register – showed a comfortable Trump caucus win in her final survey (though Cruz had led the pack at various points over the preceding weeks, and the drill-down suggested he was more popular). Rather than a knock on a very good pollster who has done well in a very hard-to-gauge state7, the Register’s poll uncharacteristically missing the mark simply adds to growing doubts about the practice of horse race polling overall and further suggests that Gallup knows something many media outlets aren’t widely acknowledging: primary polling is rather inaccurate.
Do I have a solution to close the gap between primary polls and voting outcomes, a golden measurement to predict political futures? No. And there are pretty good polls and surveys out there outside of the realm of early presidential races. Plus, as the campaigns wear on and the field narrows into the general election, polls become slightly more valuable and even more predictive – though even then aren’t beyond suspicion.8
Rather than tossing election polls out as utterly worthless just yet, I instead simply append a mental asterisk to new numbers and don’t let them guide my expectations. Because beyond the realm of data there are other, better indicators to determine how elections, nominations, primaries and caucuses will turn out – like waiting for people to vote.