You Belong to an Algorithm

p.c. NYMag

Disclaimer: I have, at best, passing knowledge of what is discussed in this post. If there are errors, please point them out and I will correct accordingly.

I’ve always loved listening to the radio. It’s a great way to discover new music, and it takes very little brain power: you don’t have to think about what plays next. But there’s also something special that the radio does once in a blue moon: it plays that song. You know, your favorite song. When that song comes on, you feel validated, as if someone else out there has good enough taste in music to get you, as if maybe they’ve walked some of the same path as you, as if they understand. But in an age when we can all walk around with almost all the music ever produced in our pockets, it can be really difficult to find that feeling. With so many songs, so much content, if we all listen to completely different subsets of music, how can anyone ever capture that special feeling of belonging?

The answer, as you may have guessed, is algorithms. It is generally impossible for humans to catalog and organize, let alone consume, all of the data and content produced these days. Every minute, hundreds of hours of content are uploaded to YouTube alone. To manage all of this stuff, the traditional engineering solution has been to automate these processes. Where computers are concerned (and in general, though typically not as explicitly), we use algorithms to do this.

“Algorithm” these days is somewhat of a buzzword, but in reality an algorithm is just a discrete set of steps for doing something. Recipes for cooking are algorithms: you prepare all of your ingredients (inputs), combine them in a specific order using specific techniques (the algorithm), and at the end of it you wind up with some sort of prepared dish (the output). When we abstract out the actual process for doing something, we can learn a lot about how most efficiently or effectively to do it, whatever it may be. There are very sophisticated techniques for studying and writing algorithms that are well beyond our scope here, but if you’ve ever wondered what some of us computer scientists do with our time, reasoning about algorithms ranks pretty high on the list.

Algorithms are rapidly coming to rule most of our daily lives. If you happened to catch any of Mark Zuckerberg’s testimony in front of Congress last month, you probably heard the term algorithm quite a bit. In a world with so much content, companies have realized that algorithms are not only essential to making their services viable but can also be used to make their businesses profitable. Herein lies our problem: companies are attempting to maximize their profits by using algorithms to optimize the experience we, the users, have when using their products and services.

As we’ve recently seen with stories like Equifax and Cambridge Analytica, companies are willing to scoop up as much of your data as possible if they think it will make them money, either by selling it or by using it to convince you to use their service more. For companies like Facebook and Twitter, you are the product, and the more you use their service the better off they are. As such, companies like this are always looking for a way to grab your attention. More and more frequently, tuning into a sense of belonging is how they do it.

In short, your Twitter and Facebook feeds are engineered to make you feel like you belong, so you’ll keep using the service. This fabrication of a sense of belonging is often done by using a large swath of data about you, your friends, and currently trending topics. It can be very difficult to disentangle all the factors impacting your use of services that have your data, how those services use your data, and what can go wrong as a result.

In this post, my aim is to explain how you are represented as a digital facsimile of yourself, how companies acquire your data and facsimile, and how they use it to benefit themselves, often putting you at great risk. With this knowledge, hopefully you can have a better grasp on the current landscape of digital society, your role in it, and what you can do to better protect yourself in it.

Full disclosure, if you haven’t figured it out by now I have somewhat of a bleak outlook on all of this. It is important to remember that many of the services harangued below do provide a lot of value to a lot of people, myself included. My aim is not strictly to try to tear down these companies; my goal is to foster better understanding of how your data gets used and what you can do about it. With that being said, let’s talk about what “your data” is and why it matters.

Everything is a Number

By now I’m sure you’ve heard the digital cliche, “it’s all just 1s and 0s.” This is a core fact about how computers operate, but, in a much grander sense, it is a core fact about the world too. Everything is quantifiable. Everything can be assigned a number, including you and everything about you. When people talk about “your data,” typically what they mean is the amalgamation of numbers that represent you as a person. Not just your driver’s license number or your SSN — it goes much deeper than that. Companies build up a digital persona of you, a fingerprint that corresponds to only you and contains everything they could want to know about you.

Everything about you can be represented as a number. Height or age? Well, those are gimmes. Eye color? 1, 2, or 3. If blue, green, and brown don’t describe your eyes well enough, we can assign numbers in between 1, 2, and 3, or use a different representation altogether. Name? Mine is 77 97 116 116. Location? That’s what coordinates are for. Political preference? Ever wondered what these were for?
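To make this concrete, here is a minimal sketch of turning a person into a list of numbers. The name encoding uses standard character codes, which is how 77 97 116 116 spells "Matt". Everything else (the function names, the eye-color codes, and the sample values) is made up for illustration and is not any company's actual scheme.

```python
# Illustrative only: the eye-color codes and field order are assumptions.
EYE_COLOR_CODES = {"blue": 1, "green": 2, "brown": 3}


def encode_name(name: str) -> list[int]:
    """Turn each character of a name into its character code."""
    return [ord(ch) for ch in name]


def decode_name(codes: list[int]) -> str:
    """Reverse encode_name: turn character codes back into text."""
    return "".join(chr(code) for code in codes)


def encode_person(name, height_cm, age, eye_color, latitude, longitude):
    """Flatten a handful of attributes into one list of numbers."""
    return encode_name(name) + [
        height_cm,
        age,
        EYE_COLOR_CODES[eye_color],
        latitude,
        longitude,
    ]


print(encode_name("Matt"))                    # [77, 97, 116, 116]
print(decode_name([77, 97, 116, 116]))        # Matt
# Hypothetical person with made-up values.
print(encode_person("Al", 180, 30, "green", 40.0, -75.0))
```

Real profiles are much wider than this, but the idea is the same: once every attribute is a number, profiles can be stored, compared, and fed to statistical models.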

What’s scarier than just intrinsic properties being encoded in numbers is that less obvious things can also be represented this way. The way you move your computer mouse, the particular WiFi networks your cellphone tries to connect to, even the way you walk are all fairly uniquely identifying things about you, and things that can be easily collected from the device in your pocket. These numbers can be used to represent and identify you as well as any other attributes. Moreover, they are frequently used to study your behavior and to group you with others like you, as we shall see later on.

All the hands are in all the cookie jars

Figuring out which numbers are important, and whose numbers belong to whom, is big business. So much so that most of the sites you visit, including this one, go to great lengths to collect and associate as much data about you as they can. Obviously you willingly tell sites like Facebook a lot of information about you. But Facebook also knows a lot about you that you didn’t knowingly tell them. How do they do that?

To some degree, just visiting a website can reveal a lot about you. The kind of web browser you use, the operating system you’re using, what type of computer you have, even where you are in the world to some degree are all things that you almost have to tell any given website in order for it to work. The site needs to know what content you are capable of displaying (you know, those obnoxious flash player warnings and the like), and this information is very useful in making that determination. There are ways around this, of course, but for the average person, a fair amount of your data gets leaked every single time you click on a link or go to a new web page.
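As a rough sketch of how much a single page request gives away, the snippet below takes the headers a browser typically sends (User-Agent, Accept-Language, Accept-Encoding) plus the connecting IP address and boils them down to a short identifier. The function name, the choice of signals, and the sample values are assumptions for illustration; real fingerprinting uses many more signals.

```python
import hashlib


def passive_fingerprint(headers: dict, ip_address: str) -> dict:
    """Collect signals a site sees without asking, plus a hash of them."""
    signals = {
        "user_agent": headers.get("User-Agent", ""),       # browser and OS
        "language": headers.get("Accept-Language", ""),    # preferred languages
        "encodings": headers.get("Accept-Encoding", ""),   # supported compression
        "ip_address": ip_address,                          # rough location
    }
    # Join the signals in a fixed order so the same visitor hashes the same way.
    raw = "|".join(f"{key}={value}" for key, value in sorted(signals.items()))
    fingerprint = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    return {**signals, "fingerprint": fingerprint}


# Made-up example request.
request_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
}
visit = passive_fingerprint(request_headers, "203.0.113.7")
print(visit["fingerprint"])
```

The same browser on the same network produces the same identifier on every visit, with no cookies and no login involved.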

Unfortunately, that information is not particularly useful to, say, Amazon, if they’re trying to figure out what to sell you. This is where a lot of the much more invasive tracking comes into play. A primary vehicle sites use for tracking is cookies, which are essentially small pieces of data that sites store in your browser. Cookies are useful for many things: they prevent you from having to log in every time to visit a new page on a website, they keep track of what items you’ve put in your shopping cart, they allow you to authorize one website to access your account on another website (all those “Log in with Facebook” buttons you see), and more.

Many sites store cookies to let you authenticate yourself to third parties. Credit.

But because cookies are stored in your browser across interactions with many different websites, they also allow sites to track you. Because websites often load content from other places, e.g. advertisements, it’s not uncommon for third-party cookies to be set in your browser to enable many different sites to know who you are. With this information, they can display ads that you have a high likelihood of clicking on (this is why when you look for kitchen supplies on Amazon you see ads for kitchen supplies on Facebook). This has a side effect of telling advertisers all the sites you’ve visited, which enables them to build up a profile of you.
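Here is a toy simulation of that side effect. It is a sketch under simplifying assumptions, not how any real ad network is built: the class names, the `ads.example` domain, and the site names are all hypothetical, and real browsers apply many more rules about when cookies are sent.

```python
import uuid
from collections import defaultdict


class AdNetwork:
    """A hypothetical third party whose ads are embedded on many sites."""

    def __init__(self):
        # cookie id -> list of sites where this browser loaded one of our ads
        self.history = defaultdict(list)

    def serve_ad(self, cookie_id, embedding_site):
        if cookie_id is None:
            # First time we see this browser: hand it a fresh identifier.
            cookie_id = uuid.uuid4().hex
        self.history[cookie_id].append(embedding_site)
        return cookie_id


class Browser:
    """Stores cookies per domain and sends them back on later requests."""

    def __init__(self):
        self.cookies = {}

    def visit(self, site, network, network_domain="ads.example"):
        # Loading the page also loads the embedded ad, which carries our
        # existing cookie for the ad network's domain (if we have one).
        existing = self.cookies.get(network_domain)
        self.cookies[network_domain] = network.serve_ad(existing, site)


network = AdNetwork()
browser = Browser()
for site in ["shop.example", "news.example", "recipes.example"]:
    browser.visit(site, network)

# The ad network now holds this browser's trail across unrelated sites.
print(network.history[browser.cookies["ads.example"]])
```

None of the three sites shared anything with each other. The browsing trail exists only because all of them embedded content from the same third party.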

It’s not all just cookies, though. There are many other ways that sites can track you. Tracking pixels are a popular choice: tiny, practically invisible images that can indicate that you’ve opened an email or scrolled to a particular part of a webpage. Sites can even track you across devices, and apps frequently track your physical location. Facebook was recently found to scrape call and messaging data on Android phones. For brevity I’m not going to go into too much depth on this, but Me and My Shadow is a good resource for further reading. To find out how much your browser lets you be tracked, visit the EFF’s Panopticlick. There are many defenses against tracking, like Ghostery, uBlock, and Brave, but again that discussion is a bit off topic for our purposes.

The long and short of it is that companies have the ability to collect a lot of your data, data you give them knowingly and otherwise. Any given company can likely create a fine-grained profile for you, and it is frequently in their interests to do so.

But what’s the use of looking when you don’t know what they mean?

So everything about you is a number, and lots of companies have all your numbers. So what? They’ve got everyone else’s numbers too, right? As I mentioned in my opening, this is a lot of data. We, as a civilization, barely know how to process all this data, and we’re just getting a grip on how useful it can be.

The primary driver of the Internet at present is advertising, so it’s no surprise that companies like Facebook and Google are trying to convert your data into ad revenue, and retailers are trying to figure out what to sell you before you know you want to buy anything at all. They build up profiles, and then use statistical techniques to find interesting correlations between profiles. For example, Walmart knows to stock up on Pop-Tarts prior to hurricanes. Target can reportedly work out that you’re pregnant before you’ve told your own family. Companies can do all of this just because they’ve collected enough data from their customers to build powerful models of human behavior.

This is where a lot of the buzzwords you’ve heard come into play. At a very high level, the ways companies process data involve some degree of artificial intelligence and machine learning. Turning all of this raw data into useful user profiles takes a lot of work, and usually relies on techniques from machine learning. Once you have a bunch of user profiles, each made up of numbers corresponding to many attributes, you train algorithms to predict how users with a particular profile will behave (this is the artificial intelligence part).

For instance, let’s say Amazon wants to figure out how to predict who is going to buy George Foreman grills. They can look at the user profiles of all the people who have bought grills in the past and find out what specific attributes these people have in common. They can also look at other purchases these people have made to find more similarities. Once they’ve done this, they can build a “proto-profile,” i.e. a profile of a fake user who would buy a George Foreman grill 100% of the time.

Once they have this profile, they can then compare against all other users who haven’t bought the grill yet. If some of these users have a very high similarity to the proto-profile, that indicates that they might soon buy a grill, and that they can likely be persuaded to buy a grill. Since Amazon wants to make as much money as possible, they can send these people ads for the grill to get their attention. They can also look at anyone who recently bought a grill and suggest related products that many other users have also purchased, again based on comparison with this proto-profile. For instance, maybe lots of people who buy the grill also buy bubble wrap, so Amazon can target grill-buying users with ads and recommendations for bubble wrap too.
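The proto-profile idea can be sketched in a few lines. To be clear, this is not Amazon's actual method. It assumes each profile is a short list of numbers, builds the proto-profile by averaging past buyers, and uses cosine similarity as the comparison; the attribute meanings, user names, and the 0.9 threshold are all invented for illustration.

```python
import math


def build_proto_profile(buyer_profiles):
    """Average the buyers' profiles, attribute by attribute."""
    count = len(buyer_profiles)
    return [sum(column) / count for column in zip(*buyer_profiles)]


def cosine_similarity(a, b):
    """1.0 means the profiles point the same way; 0.0 means nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def likely_buyers(buyer_profiles, candidates, threshold=0.9):
    """Return (name, score) for candidates who resemble past buyers."""
    proto = build_proto_profile(buyer_profiles)
    scored = [
        (name, cosine_similarity(proto, profile))
        for name, profile in candidates.items()
    ]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical attributes: [owns a home, bought charcoal, bought cookware]
past_grill_buyers = [[1, 1, 1], [1, 0, 1], [1, 1, 0]]
not_yet_buyers = {"alice": [1, 1, 1], "bob": [0, 0, 1]}

for name, score in likely_buyers(past_grill_buyers, not_yet_buyers):
    print(name, round(score, 2))
```

With these made-up numbers only alice clears the threshold, so she is the one who would start seeing grill ads. Real systems use far more attributes and more sophisticated models, but the compare-to-known-buyers logic is the same shape.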

This is cutting out a lot of the details and oversimplifying greatly, but hopefully you can see how using your data is hugely profitable. Companies are driven to maximize their profit, and they have naturally realized that the more data they collect, the more accurate their predictions become and therefore the more sales they can make. This incentivizes them to collect as much data as possible, regardless of how useful it may seem at the time.

Cambridge Analytica is perhaps the most infamous example of how this profit-driven data collection and analysis can present serious security risks to you and me. It is not in Facebook’s best interest to protect our data; our data makes them all their money. It’s important to stress that the data was originally gathered through an app using access that Facebook’s platform offered developers at the time. Facebook has said that handing that data on to Cambridge Analytica broke its rules, but the collection itself was largely just using Facebook as it was intended.

You Are Fake News

In order to deal with the loss of user trust in the fallout of the Cambridge Analytica story, Mark Zuckerberg went before Congress and pledged that Facebook is working hard on protecting its users from future data exfiltration. But make no mistake, selling your data is what Facebook does, so I am highly skeptical that we’ll see major reforms in that domain if Facebook is left to its own devices.

However, in order to keep users happy and increase the likelihood of Facebook continuing to provide profitable marketing services, Zuckerberg also indicated that there is another major issue at play: Fake News. During the 2016 election cycle in the United States, we saw a massive proliferation of fake news on sites like Facebook. This has called into question how Facebook decides which content to show to which users. The usual answer is something like “algorithms.”

Because it’s Facebook’s business to make users happy so they’ll buy the products advertised on the site, the algorithms they use optimize for keeping users happy. Hence, when a fake story claiming that President Obama ordered the U.S. Army to occupy the state of Texas and impose martial law gets massive amounts of attention from users with certain similar attributes in their profiles, Facebook’s algorithm propagates the story far and wide amongst other people with those same attributes in their profiles.

But it’s not just news proliferation: Facebook recommends friends and pages to you based on its profile of you as well. People are happiest when they’re surrounded by friends and family who share their viewpoints, and happy users click on ads. So Facebook is optimized to create filter bubbles to a great degree. Facebook isn’t the only culprit either; Apple, Google, Twitter, Microsoft, Medium, and many others all optimize to show you what they think you want to see.

This has the apparent effect of causing polarization. The world looks more purple to people who like purple and more green to people who like green because the algorithms, processing the data for their users to turn a profit, realize that putting purple people with other purple people and green people with other green people makes the company more money.

Voices for the Voiceless

But I think it goes beyond polarization. Technology, and specifically the Internet and social media, has empowered many more people than ever before to develop and express complex opinions. We each have our own platform to voice our opinions, and we hear far more opinions from others than in years past. Because these platforms are incentivized to make us happy, we most frequently hear voices that mostly align with our own.

Whereas before, fringe opinions were almost never represented and rarely heard, social media platforms can now create the illusion that they are commonly held beliefs. Worse, entities can manipulate these platforms to make it appear as if these fringe beliefs are more common than they are, sowing discord amongst targeted groups. Because companies are so willing to collect our data and use it in ways that make them profit, we are all at risk of being manipulated through, and because of, the ways we think.

To be sure, this is a symptom of the problem, not the problem itself. The Internet is not constructed to prevent companies from scooping up all our data and selling it, and instead incentivizes them to do whatever they can to make us want to give up our data. To fix this problem would require a fundamental change to the business model of the Internet, a change that is unlikely to come solely from the corporations and other entities that make the Internet what it is. Europe’s new privacy laws are about to go into effect, and certainly provide a great incentive to force companies to rethink their practices. However, until other major players, like the U.S., step up and do the same, it’s unlikely we’ll see much change in the way business is done.

We can each do our best to protect ourselves and our data, but at the end of the day Facebook and Twitter and Google are useful. Tech companies will continue to mine our data and continue to create a false sense of belonging to encourage us to give up even more of it. This makes it really important to remember: whatever people or groups appear on these platforms are not actual people or groups of people, merely projections of people into the digital world. Belonging to one of these groups is not truly belonging. On the Internet, you don’t belong to a group of complex, reasonable people. On the Internet, you belong to an algorithm.

Thanks to Ian and Letty for helping edit this post.

Matt Bernhard
Not a professional driver on a closed course.