James Gallagher

why owning your data is so important

Published on June 12, 2020 in privacy

I have been taking control over my data through this blog. Every post you see here today is published on infrastructure I have built myself. This blog is by no means perfect, and depending on your taste my site may not appeal to you as much as, say, a Medium publication. To me, this site serves my needs, and that's all I care about.

One of the core facets of the IndieWeb that I am really excited about is the idea of taking control of your own data. You should be able to own your data. This seems obvious -- of course you should own the data you create -- but when you think about the role of centralized platforms on the internet, it becomes clear that we actually don't own as much of our data as we think.

Owning your data gives you complete control over it. You can decide whether to delete it, modify it, or create more of it. Every decision about your data is up to you. Contrast this with the experience of using a platform like Facebook or Twitter. You may be able to edit a post on Facebook, but that doesn't mean the platform no longer has a record of your old post somewhere. You can't even edit a Tweet. What you publish on Twitter stays in its original form unless you delete it. Even then, who is to say that Twitter doesn't somehow keep a record of deleted Tweets? Maybe it doesn't, but not every platform is as trustworthy as Twitter, if "trustworthy" is even the right word.

I don't have to ask these questions about the data on my site, because I control it. I have written every line of code for this website, so I know how it works and how it handles my data. I know which web services it talks to and what it should not do.

One of the biggest problems with centralized platforms is their inherently closed nature. To grow, platforms like Twitter and Facebook have had to develop their own secret sauce: algorithms that grab our attention. From a business perspective, this was the right move for them; more targeted advertising means more revenue. For users, it is not ideal. You submit your data to a platform, and somehow the platform is the one that profits from it.

Yesterday, a somewhat crazy idea came to mind: what if I were to willingly sell my data to these platforms? At least that way I would get something out of it. Unfortunately, I can't imagine Twitter being receptive to me calling them up and asking whether they would like to buy more data on me. Maybe a few companies out there would be interested; who knows. I can't believe how much money centralized platforms have been able to make from data they did not create. They store it, but that doesn't justify the immense profits they make or the ways in which they make that money.

Owning your data means that you are not beholden to an external platform for the integrity of your data. If you host your blog yourself, you know you are not going to wake up one day to find all of your posts ranked in a different order. You know that everything will appear exactly the way you want it to, and you can change your website at any time. That's not the case with centralized platforms. Who is to say that Twitter will not drastically change its ranking algorithm tomorrow? Does that sound unrealistic? Consider how often Google changes its algorithms. I would assume Twitter changes its own quite frequently as well.

Another reason owning your own data is important is that a platform you rely on today could close down tomorrow. I can't think of many examples, because I'm still a relatively young citizen of the modern web, but I do know that many platforms have been acquired and then shut down. In other cases, like TinyLetter, a platform has been acquired and then essentially abandoned by its parent company, so it rarely receives updates.

Imagine if Twitter shut down tomorrow. That's unlikely, but hear me out. What would you do? All of the Tweets you wrote might be gone forever. "Sure, James, but the Internet Archive will have captured them." Will it? Did you create a snapshot of your profile on the Internet Archive? Even if you did, it will not include all of your Tweets, only the few that were displayed when the snapshot was taken. Your data would be gone.

That's not something I really have to worry about with my blog. My blog could go down because of a systems problem, but I could just boot it back up again. I don't have to worry about losing my data if a company goes out of business. If my domain provider went out of business, that could be a problem, but I would still own this domain. It's mine. Although it's unlikely that the big centralized platforms will shut down, it could happen. The internet has only been around for a few decades. Think about how often companies shut down; I assume websites shut down at a similar rate.

In addition, owning your data gives you full control over how it is formatted. A few months ago, someone emailed me asking where all the content from my old blog had gone. I did get that content up and running on this blog, but it wasn't structured the way I wanted. Squarespace only allowed me to export to WordPress, and the export archives arrived in a suboptimal format. I struggled to convert my data into the format this blog needed. As a result, I decided to start afresh with this blog and write completely new content.

On this blog, all my writing is formatted using Markdown. Markdown is a widely used standard, so I know it will be around for a while. If there's one thing I know for certain, it's that the internet is going to change. If I don't have control over my data, I will not have control over how it evolves as the internet changes. I do have control over this blog, so I know that even as standards change, I'll be able to adapt my data. My writing sits in raw Markdown files, which are available in the open-source repository associated with this blog. That makes me feel a bit better about the long-term preservation of my data. It's in a format I understand and know I can use in different ways.

The modern web has become more centralized than I would like, and I am happy to see more movements emerge to take back control of data. I don't quite understand the technology behind blockchain, but what I do know is that it can support decentralized ownership of data. Blockstack and similar platforms may not have wide adoption yet, but every notable technology starts small. I think they've still got a long way to go in figuring out how to articulate their offerings to the average internet user -- and in convincing people that the inconvenience of switching platforms is worth it -- but their work is a step in the right direction.

On my ideal internet, I would have complete control over all my data. At the press of a button, I would be able to revoke access to my data from any platform to which I have submitted it. That means companies should not store it in a centralized database. It should be owned by me, preferably on my own infrastructure. I wouldn't go so far as to say that I want a server in my house that stores my data, but I would want control over the infrastructure on which it is hosted. I don't want to be left with no recourse once I submit data to a company.

I know that legislation like the California privacy law that was recently passed is, again, a step in the right direction. It's not enough, however. That law only applies to California residents, and I am going to assume that tech companies have spent months figuring out "hacks" to ensure that they comply with the absolute minimum requirements imposed by the law. We need more than legislation. We need companies that are willing to respect the ownership of their users' data.

What about profits? That's a good question. On my ideal internet, companies would not profit by running my data through algorithms or selling it to other companies or advertisers. They would either make a profit from users directly by charging them, or they would not make a profit at all. We're seeing that the open-source community has been able to do quite well with patronage, sponsorships, grants, and other revenue streams. While the status quo is not perfect, the forms of funding being explored in open-source indicate that advertising is definitely not the only option companies can pursue if they want or need to make money from their creations.

I am proud to say that I own the data on my website. I own the metadata, too. I know that no platform shutting down tomorrow is going to destroy this blog. I know that no algorithm change will affect how you see content on this site. I have complete control over the user experience. It feels great to know that if I am unsatisfied with something on this blog, I can change it.

What concerns me is that many people just don't know what is possible in terms of data ownership. People are not aware of the extent to which big companies are using their data. If I am being honest, I have no idea how the big companies are using my data. They are like black boxes, and we're probably never going to know exactly how our data is used. Companies hide behind privacy policies and terms of service that have been carefully drafted by lawyers to shield the platforms from legal liability. I wish I had known sooner about initiatives like owning your own domain name and building your personal website.

There are a few things I am doing to take control of my data. First, I own my own domain name. jamesg.app is mine (wipe your feet when you come in, and sign the guestbook on your way out). Second, this blog is the primary platform on which I post content. I do use platforms like Hacker News, but I don't rely on them to share my views about the world. That's what this blog is for; it is my home on the internet. If I used other platforms enough, I could set up programs to syndicate my content -- this is called POSSE, "Publish (on your) Own Site, Syndicate Elsewhere," in IndieWeb terms -- but right now I am satisfied with the way I am treating my data. I could go further. I'm even considering moving to ProtonMail for my personal email. I'm still trying to figure out where I stand on the spectrum between data ownership and convenience.

You can own your own data. Set up a domain name. Build your own personal website. Syndicate your content to other platforms, but publish it on your own site first. Do whatever it takes to take ownership of your data. It is yours, after all. The pictures I upload online are mine. The text I write is mine. Why should I entrust the integrity of that data to a big, centralized silo I don't know much about? The modern web has brought a lot of convenience, but at the cost of data ownership. It's time to start rethinking how we submit data to big companies.

Made by @jamesg_oca. Code on GitHub.