One Friday night, five strangers came together at Comedy Hack Day LA: a comedian and self-professed muckraker, a UX designer with a Reddit bone to pick, a back-end developer who knows how to put Ruby on Rails, an electrical engineer, and a copywriter. It was not easy. Egos were bruised and personalities were wrangled, but by Saturday night they had launched a web-based app that measures how often Reddit users use questionable speech and participate in, shall we say, “less than pleasant” subreddits.
It was built in response to a question you probably ask yourself most days if (like most of our team) you are a woman, a person of color, queer, nerdy, portly, and so on: how do I spot bigots and avoid them?
For example: let’s say you’re considering applying for a job. You probably want to check that your potential new boss isn’t saying creepy shit on the Internet. Useful, right?
Initially, we explored the idea of scanning Twitter: looking at the social graph and keywords, and running some basic similarity analysis against tweets and accounts we considered hateful and bigoted (the kind of people you’d want to avoid unless you’re exactly like them). However, Twitter’s API turned out to be too restrictive, so we pivoted to Reddit.
Hallelujah! Not only is Reddit’s API quite full-featured, but Reddit itself is a goldmine of easily rattled, wannabe white supremacist overlords whose only contributions to the world are flecks of spittle and hobbyist-level word clouds of routinely misspelled hate speech.
Type a Reddit username into the search box. If that user’s comments are already in our database, you are redirected straight to their report. If not, the site prompts you to connect your Reddit account so it can use the Reddit API to scan the target user’s comment history.
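The lookup flow above boils down to three branches: a cached report, a request to authorize, or a fresh fetch. Here is a minimal sketch of that decision logic. The in-memory dict standing in for the database, the `fetch_comments(token, username)` callback, and the lowercase key normalization are all hypothetical stand-ins, not the app's actual Rails code or Reddit client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class LookupResult:
    """Outcome of a username search."""
    username: str
    comments: Optional[List[str]] = None  # set when a report can be shown
    needs_auth: bool = False              # True when the visitor must connect Reddit first

def lookup_user(
    username: str,
    database: Dict[str, List[str]],
    fetch_comments: Callable[[str, str], List[str]],  # hypothetical (token, username) -> comment bodies
    reddit_token: Optional[str] = None,
) -> LookupResult:
    # Assumption: usernames are normalized to lowercase before being used as a key.
    key = username.lower()
    # 1. Already scanned: go straight to the stored report.
    if key in database:
        return LookupResult(username, comments=database[key])
    # 2. Not scanned and no Reddit authorization yet: ask the visitor to connect.
    if reddit_token is None:
        return LookupResult(username, needs_auth=True)
    # 3. Authorized: fetch the history via the API and store it for next time.
    comments = fetch_comments(reddit_token, username)
    database[key] = comments
    return LookupResult(username, comments=comments)
```

Because the fetch is injected as a callback, the same branching can be exercised without touching the real API; only the third branch ever spends an API call.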
Our system was pre-programmed with a list of “bad” subreddits: communities that typically hate on a certain kind of person. For example, WhitesWinFights is a white supremacist subreddit. A few of those choices were controversial, but 99.9% of them were fairly cut-and-dried.
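Checking participation against such a list is a simple tally over a user’s comments. This sketch assumes each comment is a dict with a `subreddit` key (roughly the shape of comment data in Reddit’s JSON listings) and uses a hypothetical one-entry blocklist; the team’s real list is not reproduced in this post.

```python
from collections import Counter
from typing import Iterable, Mapping, Set

# Hypothetical blocklist for illustration, stored lowercase so matching is case-insensitive.
BAD_SUBREDDITS: Set[str] = {"whiteswinfights"}

def bad_subreddit_activity(
    comments: Iterable[Mapping[str, str]],
    blocklist: Set[str] = BAD_SUBREDDITS,
) -> Counter:
    """Count how many of a user's comments were posted in blocklisted subreddits."""
    counts: Counter = Counter()
    for comment in comments:
        sub = comment["subreddit"].lower()  # assumption: every comment dict carries "subreddit"
        if sub in blocklist:
            counts[sub] += 1
    return counts
```

Returning per-subreddit counts rather than a single yes/no lets a report show *where* someone hangs out, not just whether they do.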
We also made lists of hate speech terms. Again, few surprises here: if someone says “feminazi” a lot on the Internet and you’re a woman, you probably don’t want to be around them.
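Term matching can be sketched as a single case-insensitive regular expression with word boundaries, so a term is not counted when it merely appears inside a longer, unrelated word. The one-word list and the choice to also accept a trailing plural “s” are assumptions for illustration, not the team’s actual matching rules.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical term list; the real lists are not reproduced in this post.
HATE_TERMS = ["feminazi"]

def count_terms(bodies: Iterable[str], terms: Iterable[str] = HATE_TERMS) -> Counter:
    """Count occurrences of each listed term across comment bodies."""
    # Longest terms first so a longer term is preferred over a shorter prefix of it.
    ordered = sorted((t.lower() for t in terms), key=len, reverse=True)
    # \b...\b requires whole-word matches; "s?" also accepts a simple plural (assumption).
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")s?\b", re.IGNORECASE)
    counts: Counter = Counter()
    for body in bodies:
        for match in pattern.finditer(body):
            counts[match.group(1).lower()] += 1
    return counts
```

A plain word list like this will miss creative misspellings, which, as noted above, are common; it is a starting point rather than a classifier.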
Once we had all the comments from a user, we generated a report containing:
So that’s it! Fairly straightforward, and with lots of caching and whatnot we were able to withstand all the traffic we got from various corners of the Internet, mostly via articles on Engadget, Vice, fastcodesign, etc.
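The post doesn’t detail what the caching looked like. One common pattern for absorbing a traffic spike is a time-to-live (TTL) cache in front of an expensive step such as building a report. Here is a minimal in-process sketch; the `TTLCache` class, its 60-second example TTL, and the injectable clock are hypothetical choices, not a description of the app’s real setup.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")

class TTLCache(Generic[V]):
    """Keep computed values for ttl_seconds, then recompute on the next request."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], V]) -> V:
        now = self.clock()
        entry = self._entries.get(key)
        # Serve the stored value while it is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        # Missing or stale: rebuild and remember when it was built.
        value = compute()
        self._entries[key] = (now, value)
        return value
```

For example, `reports = TTLCache(60)` followed by `reports.get_or_compute(username, lambda: build_report(username))` (where `build_report` is a hypothetical report builder) means a username shared in a popular article is rendered at most once a minute, no matter how many people click through.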