Why ChatGPT's browsing capabilities are limited
We can conclude some things about how OpenAI is thinking about alignment/safety
I asked ChatGPT about its[1] new browsing capabilities because they are obviously limited. I wanted to know what it could and could not do with its browsing plugin and, further, why its limits were set where they are. This information is useful because it gives us a view into (1) how ChatGPT explains the constraints under which it must operate and (2) how OpenAI thinks about those constraints. It is, in other words, a window into OpenAI's views on AI alignment/safety.
Here’s how it explains itself:
As an AI developed by OpenAI, I have access to a simplified web browsing tool that can perform several key tasks:
Search: I can issue a query to a search engine. This means I can find the most up-to-date information on the internet, including information that was published after my last training cut-off in September 2021.
Click: I can open a webpage from the search results. This allows me to read the text of webpages, but note that I can only interpret and interact with plain text. I can't understand or interact with images, videos, or interactive elements (like forms or dropdown menus) on webpages.
Quote: If I find useful information on a webpage, I can store a text span from it. This allows me to refer back to the information later and provide a citation for where I found it.
Back: I can go back to the previous page, similar to how you might use the back button in a browser.
Scroll: I can scroll up or down on an open webpage, which helps me read more of the text that isn't visible initially.
Open URL: If provided with a specific URL, I can open it directly.
These capabilities allow me to find and provide more recent information than what I was trained on. However, they don't allow me to perform more complex interactions with the web, such as submitting forms, interacting with complex UI elements, viewing or understanding images or videos, or executing JavaScript on webpages.
Once you get past the insipid "As an AI developed by OpenAI…" preamble, you learn some useful information about how OpenAI thinks about guardrails. Specifically, ChatGPT's browsing plugin appears to be a stunted, text-only browser that permits only a fixed set of browsing actions. Its constraints are fairly straightforward: it can only deal with text, it can't fill out forms, it can't interact with complex UIs, and so on.
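Put another way, the tool works like an allowlist: it exposes a fixed set of text-only actions and nothing else. The Python sketch below illustrates that idea under stated assumptions. It is not OpenAI's implementation. The class `TextOnlyBrowser`, the `Action` enum, the in-memory `pages` dictionary standing in for the web, the naive substring search, and the 80-character viewing window are all hypothetical choices made for illustration; only the six action names come from ChatGPT's own description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Action(Enum):
    # The six actions ChatGPT listed. Anything else is simply not representable.
    SEARCH = "search"
    CLICK = "click"
    QUOTE = "quote"
    BACK = "back"
    SCROLL = "scroll"
    OPEN_URL = "open_url"


@dataclass
class TextOnlyBrowser:
    """Hypothetical text-only browser: pages are plain strings held in memory."""
    pages: dict[str, str]            # url -> plain text (stand-in for the web)
    window: int = 80                 # characters visible at once (assumed value)
    history: list[str] = field(default_factory=list)
    offset: int = 0
    results: list[str] = field(default_factory=list)
    quotes: list[tuple[str, str]] = field(default_factory=list)

    def _current(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def _view(self) -> str:
        # Return only the visible slice of the current page's text.
        url = self._current()
        if url is None:
            return ""
        return self.pages[url][self.offset:self.offset + self.window]

    def _open(self, url: str) -> str:
        if url not in self.pages:
            raise KeyError(f"unknown url: {url}")
        self.history.append(url)
        self.offset = 0
        return self._view()

    def do(self, action: str, arg=None) -> str:
        # Allowlist check: unknown actions (e.g. "submit_form") are refused.
        try:
            act = Action(action)
        except ValueError:
            raise PermissionError(f"action not permitted: {action!r}") from None

        if act is Action.SEARCH:
            # Naive case-insensitive substring search (an assumption).
            q = str(arg).lower()
            self.results = [u for u, t in self.pages.items() if q in t.lower()]
            return "\n".join(f"[{i}] {u}" for i, u in enumerate(self.results))
        if act is Action.CLICK:
            return self._open(self.results[int(arg)])
        if act is Action.OPEN_URL:
            return self._open(str(arg))
        if act is Action.SCROLL:
            # Move by whole windows, clamped to the start and end of the text.
            url = self._current()
            if url is None:
                return ""
            last = max(len(self.pages[url]) - self.window, 0)
            self.offset = min(max(self.offset + int(arg) * self.window, 0), last)
            return self._view()
        if act is Action.BACK:
            if self.history:
                self.history.pop()
            self.offset = 0
            return self._view()
        # Action.QUOTE: store a span only if it really appears on the current page.
        url = self._current()
        if url is None or str(arg) not in self.pages[url]:
            raise ValueError("span not found on current page")
        self.quotes.append((url, str(arg)))
        return str(arg)
```

Because the dispatcher is an allowlist rather than a blocklist, a request such as `do("submit_form", ...)` fails with `PermissionError` before anything touches a page, and any new capability would have to be added deliberately as a new `Action` member. That is the blunt trade-off this post is about: predictability bought at the cost of flexibility.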
I then asked ChatGPT why it had these constraints, and its response is interesting:
There are several reasons why I, as an AI model, can't perform more complex interactions such as submitting forms, interacting with complex UI elements, or executing JavaScript on webpages.
Technological Limitations: The web browsing tool I use is text-based and doesn't interpret or interact with the visual, interactive elements of web pages. It's designed to pull in and process text, not to handle the wide array of other data types and interactive elements found on the web.
Security and Privacy: Allowing an AI to submit forms or interact with websites in more complex ways could pose significant security and privacy risks. For instance, an AI shouldn't be able to create accounts, make purchases, or submit personal data on behalf of users. This would not only risk violating users' privacy, but could also potentially be misused for activities like spamming or even fraud.
AI Understanding and Context: AI models like me interpret and generate text based on patterns learned during training, but we don't have the same kind of understanding or context that a human would have. Handling complex interactions often requires a deeper understanding of context and an ability to make judgments that AI currently lacks. For example, filling out a form often requires understanding the specific context of the form and making decisions based on that context.
Adherence to Web Standards and Policies: Web services often have terms of service that restrict automated interactions, including scraping and form submission. By limiting the browsing capabilities of the AI, OpenAI ensures that its use is compliant with such terms and respectful of the services and resources being accessed.
Control and Predictability: Limiting the browsing capabilities of the AI helps to ensure that its behavior is predictable and under control. If an AI could freely interact with web services, it would be much harder to anticipate and manage its behavior. This is especially important given the potential for misuse or unintended consequences.
These are some of the key reasons why the AI's web browsing capabilities are designed to be relatively simple and focused on retrieving and processing text.
You can read more about OpenAI's thoughts on alignment here. OpenAI seems to have concluded that stunting ChatGPT's browsing capabilities conforms to its alignment principles. In practice this makes some sense: a malicious actor could, for example, craft a prompt that exploits flaws in a website's security. Why risk a malicious actor using an AI tool we don't fully understand when we can eliminate that risk by constraining the set of actions the browser can perform? In some sense this is a blunt and inelegant tool, but sometimes a hammer is the most effective tool available.
[1] I can't (yet) justify personifying it by referring to it as "he" or "she", as some are wont to do.