How to Remove Yourself from OpenAI and Other AI Companies
May 23, 2023
Welcome to the Age of AI
According to Bill Gates, the age of AI has begun. Statistical models have mastered the rules of logic and linguistics well enough to generate (mostly) factual, human-sounding responses to prompts. Bill’s reaction to ChatGPT, the powerful chatbot built by OpenAI, says it all: "The most important advance in technology since the graphical user interface." ChatGPT can be your tutor, your ghostwriter, your translator, your pair programmer, or your assistant. But will it be a privacy nightmare?
It seems effortless now, but its power comes from billions of bits of human-generated content that engineers gathered, organized, and used to train and tune the system. Some data sources are harmless, like encyclopedic Wikipedia pages, but many sources include copyrighted material, sensitive personal information, or biased opinions that may cause large language models (LLMs) to produce harmful results. According to OpenAI, they don’t seek out personal data to train models, but a lot of public data includes personal data and they can’t filter it all out. So, if you notice an error, harmful material, or exposed sensitive information in an LLM's responses, you can and should take action.
Luckily, when I asked ChatGPT directly for personal contact information, it respectfully declined, as it had been trained to do, saying, "Please note that personal contact information, such as email addresses or phone numbers, is considered private information and cannot be provided by me as an AI language model. It's important to respect privacy and use appropriate channels to reach out to individuals in a professional and respectful manner." But just because OpenAI put this safeguard in place doesn't mean future companies will play by the same rules.
How to Check for Your Information
Asking the LLM about yourself may seem like a good idea, but proceed with caution. By sharing personal information like your home address or birthday, you may unintentionally train the model to associate that sensitive information with you, so that anyone could receive it in a response. It's safe to start with general queries like "Who is [enter your name]?" to see whether it provides accurate information about you.
Another way to check for your information is to email the company directly. This is called a Data Subject Access Request. For example, contacting OpenAI at [email protected] starts their process, which will likely involve:
Verifying proof of jurisdiction like proof of address, residency, or citizenship
Verifying your ownership of an account (if applicable)
Explaining any issues or violations and providing supporting evidence
Keep in mind that your rights under a data subject access request differ depending on whether GDPR or CCPA privacy laws apply to you. Companies may have the right to continue using data about you that they collected from third parties for verification or anti-fraud purposes.
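If you'd rather not write the request from scratch, the steps above can be sketched as a small Python helper that drafts the email for you. This is only an illustration: the `AccessRequest` class, the `draft_request` function, and the example addresses are hypothetical names made up for this sketch, not a format any company requires. It builds the message but does not send it.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage


@dataclass
class AccessRequest:
    """Hypothetical container for the details a company is likely to ask for."""
    full_name: str
    jurisdiction: str            # the privacy law you believe applies, e.g. "GDPR" or "CCPA"
    proof_of_jurisdiction: str   # e.g. "proof of address"
    account_email: str = ""      # leave empty if you have no account with the company
    issues: list = field(default_factory=list)


def draft_request(req: AccessRequest, to_address: str, from_address: str) -> EmailMessage:
    """Build (but do not send) a plain-text data subject access request."""
    lines = [
        "To whom it may concern,",
        "",
        f"My name is {req.full_name}.",
        f"I am submitting a data subject access request under {req.jurisdiction}.",
        "",
        f"Proof of jurisdiction I can provide: {req.proof_of_jurisdiction}.",
    ]
    if req.account_email:
        lines.append(f"Account I own: {req.account_email}.")
    if req.issues:
        lines.append("")
        lines.append("Issues I would like addressed:")
        lines.extend(f"- {issue}" for issue in req.issues)
    lines += [
        "",
        "Please confirm receipt and let me know what verification you need.",
        "",
        req.full_name,
    ]

    msg = EmailMessage()
    msg["To"] = to_address
    msg["From"] = from_address
    msg["Subject"] = f"Data Subject Access Request ({req.jurisdiction})"
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    # Placeholder addresses; replace them with the company's real privacy contact.
    request = AccessRequest(
        full_name="Jane Doe",
        jurisdiction="GDPR",
        proof_of_jurisdiction="proof of address",
        issues=["A model response associated my name with my home address"],
    )
    print(draft_request(request, "privacy@example.com", "jane@example.com"))
```

Review the draft before sending it, and share only the evidence the company actually asks for.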
How to Flag Content and Request Removal
All major LLM companies are required by law to provide individuals with a place to report cybercrime, copyright infringement, and privacy violations. Kanary does not yet automate detection and removal of personal information from LLMs, but this guide is meant to help you get it done until we’ve built that capability. Here are helpful resources for contacting some major AI companies directly. If you’d like support with one not listed here, just let us know ([email protected]).
OpenAI
Summary: Uses user inputs to train system, will delete data from training sets, will require verification, provides contact for data subject access requests, provides ability to delete chat history.
Object to processing your data: https://share.hsforms.com/1UPy6xqxZSEqTrGDh4ywo_g4sk30
Google Bard
Summary: Uses user inputs to train system, provides ability to delete chat history, unclear if you can request or delete data from training sets.
Information Collected: https://bard.google.com/faq
Deleting your activity: https://support.google.com/bard/answer/13278892?hl=en
Nvidia
Summary: Uses user inputs to train system, will delete data from training sets, will require verification, provides ability to request data, can control account privacy settings.
How your data is used (using Nvidia products): https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
Exercise your right to erasure: email [email protected]
Midjourney
Summary: Uses user inputs to train system (focused on image generation), will delete data from training sets, will require verification, provides ability to request and delete data.
Information they collect: https://docs.midjourney.com/docs/privacy-policy#:~:text=Midjourney
Email to request removal: [email protected]
You Decide How Powerful LLMs Will Be
According to The Verge, OpenAI’s legal problems are just beginning. The debate focuses on two major topics: the source of the data and the recommendations the systems produce. As regulators grapple with this fast-moving technology, you can get involved. Let regulators know that it should be illegal for private citizens' personal data to surface from these models and that the liability for such a breach should fall on the companies. Consider contributing to the latest request for information from the Consumer Financial Protection Bureau about data brokers.
If the activist route isn’t your style, keep these simple tips in mind for staying private and protected from LLMs and bad actors alike:
Clean up your personal info on public sites and search engines. This is where LLMs find training data. Kanary helps with this!
Avoid sharing confidential or sensitive information during conversations with LLMs as they may use data you provide to inform conversations with other users.
Report harmful or biased responses generated by LLMs to their owners.
If your requests are not respected, do not be afraid to escalate. Check out our removal research guide to learn how we escalate when sites are unresponsive to removal requests.
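To know when it's time to escalate, it helps to keep a simple log of the requests you've sent. The sketch below is one hypothetical way to do that in Python. The `RemovalRequest` class, the `needs_escalation` function, and the 30-day default waiting period are assumptions chosen for illustration, not a legal deadline or a description of Kanary's own process.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RemovalRequest:
    """Hypothetical record of one removal request you have sent."""
    company: str
    sent_on: date
    resolved_on: Optional[date] = None  # set once the company confirms removal


def needs_escalation(request: RemovalRequest, today: date, wait_days: int = 30) -> bool:
    """Return True if the request is unresolved and older than wait_days.

    wait_days is an assumed patience window; adjust it to the response
    time the company or the applicable privacy law actually promises.
    """
    if request.resolved_on is not None:
        return False
    return today - request.sent_on > timedelta(days=wait_days)


if __name__ == "__main__":
    log = [
        RemovalRequest("ExampleAI", date(2023, 5, 1)),
        RemovalRequest("OtherAI", date(2023, 5, 1), resolved_on=date(2023, 5, 10)),
    ]
    for entry in log:
        if needs_escalation(entry, today=date(2023, 6, 15)):
            print(f"Follow up with {entry.company}")
```

Even a spreadsheet works just as well; the point is to record the date of every request so you have evidence when you escalate.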
Feel free to reach out to our team or try Kanary for free.