Asserting My Right to be Forgotten (on Reddit)

Abstract

Reddit provides no easy way to delete your posting or comment history. You can delete your account, but the contents of the account remain.

One imagines that this is by design, as Reddit’s business model is predicated on both an influx of new content and a historical record of past content.

I find this disagreeable, as sometimes I just want to wipe my account and start over. So I wrote a bit of Python that does exactly that.

The Solution

In a nutshell, here’s how the script works:

  1. It borrows the cookies from an extant login session stored in the browser. In this case I’m using Firefox.
  2. It retrieves the account history of the user from Reddit. This isn’t an API that returns nicely formatted JSON, but is instead an ugly HTML page.
  3. It parses the HTML for comment and post IDs. Comment IDs start with t1, post IDs start with t3. (There’s a minimal sketch of this step just after this list.)
  4. Using the list of content IDs, it submits each of those IDs in turn to the Shreddit API endpoint. The API endpoint for each is the same, but the “operation” parameter changes.
  5. So long as there are items in the comment_ids list, it loops and destroys.
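Before the full script, here’s a minimal, self-contained sketch of the parsing step. The markup below is illustrative only (the tag names are assumptions for the example, and the real page is far noisier); the only thing the script actually relies on is the thing-id attribute:

from bs4 import BeautifulSoup

# Illustrative markup only -- not Reddit's verbatim output
html = '''
<shreddit-post thing-id="t3_abc123"></shreddit-post>
<shreddit-comment thing-id="t1_def456"></shreddit-comment>
'''

soup = BeautifulSoup(html, 'html.parser')
ids = [el['thing-id'] for el in soup.find_all(attrs={'thing-id': True})]
print(ids)  # ['t3_abc123', 't1_def456']

And here’s the full script: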
import requests
from bs4 import BeautifulSoup
import browser_cookie3

# Fetch cookies from Firefox
firefox_cookies = browser_cookie3.firefox()

# Define the domain for which you need cookies
domain = ".reddit.com"

# Define the target reddit username
reddit_username = "<>"

reddit_session_cookie = ""
csrf_token_cookie = ""

for cookie in firefox_cookies:
    if "reddit" in cookie.domain:
        if cookie.name == "reddit_session":
            reddit_session_cookie = cookie.value
        elif cookie.name == "csrf_token":
            csrf_token_cookie = cookie.value

# Assemble the Cookie header from the two session cookies
exp_cookie = f'reddit_session="{reddit_session_cookie}"; csrf_token={csrf_token_cookie};'

# Request the account's post and comment history (up to 10000 items)
url = f'https://www.reddit.com/svc/shreddit/profiles/profile_overview-more-posts/new/?t=ALL&name={reddit_username}&feedLength=10000'
headers = {
    'Accept': 'text/vnd.reddit.hybrid+html, text/html;q=0.9',
    'Accept-Language': 'en-US,en;q=0.5',
    # Let requests negotiate Accept-Encoding itself; advertising codecs
    # it can't decode (such as zstd) can garble the response body
    'Cookie': exp_cookie,
}

# Send an HTTP GET request, to return the list of posts and comments
response = requests.get(url, headers=headers)

# Parse the HTML response with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find all elements with the thing-id attribute
elements = soup.find_all(attrs={'thing-id': True})

if elements:
    # Create the list of content IDs, keeping only unique values by
    # converting to a set and back to a list
    comment_ids = list(set(element['thing-id'] for element in elements))

    while comment_ids:
        print(f"{len(comment_ids)} items remaining")

        url = 'https://www.reddit.com/svc/shreddit/graphql'
        headers = {
            'Accept': 'application/json',
            'Accept-Language': 'en-US,en;q=0.5',
            'Content-Type': 'application/json',
            'Cookie': exp_cookie,
        }

        # Content that starts with t1 is a comment
        # Content that starts with t3 is a post
        # The JSON for each is different, so they have to be handled differently
        # Iterate over a snapshot of the list, so that removing items
        # doesn't skip entries mid-iteration
        for comment in list(comment_ids):
            if comment.startswith("t1"):
                data = {
                    "operation": "DeleteComment",
                    "variables": {
                        "input": {
                            "commentId": comment
                        }
                    },
                    "csrf_token": csrf_token_cookie
                }
            elif comment.startswith("t3"):
                data = {
                    "operation": "DeletePost",
                    "variables": {
                        "input": {
                            "postId": comment
                        }
                    },
                    "csrf_token": csrf_token_cookie
                }
            else:
                # Neither a comment nor a post ID; drop it and move on
                comment_ids.remove(comment)
                continue

            comment_ids.remove(comment)
            response = requests.post(url, headers=headers, json=data)

            try:
                # Print the response, catching any errors resulting from
                # the response not being JSON-formatted
                print(f"Content id: {comment} - {response.json()}")
            except Exception as e:
                print(f"Error processing content id '{comment}' - {e}")
else:
    print("Reached the end of the content")
