AI & data leakage: the corporate institutionalist's nightmare
Corporations want to control how their employees use AI technology. At the same time, they need to reorganize rapidly to keep up with competitors who are using that same technology.
An interesting observation about artificial intelligence is that, as it becomes increasingly powerful, it devolves control away from corporate institutionalists. Let’s unpack that statement a bit: a lot of corporate employees are now using generative AI tools like ChatGPT, sometimes surreptitiously, in order to be more productive. When they do so surreptitiously, these employees assert control over their work in a way that is unfamiliar to, and uncomfortable for, many employers.
This kind of bottom-up strategy has been used before, particularly by Stripe in its early days. However, what’s different about tools like ChatGPT is that they’re useful for all employees, not just technical people looking to rapidly build out payments capabilities. Add to this that the best uses of ChatGPT and related tech are not currently well-understood, and, well, you have the stuff of a corporate institutionalist’s nightmare. The corporate institutionalist acts to preserve corporate, institutional prerogatives. AI technology devolves control away from these institutionalists.
Ethan Mollick has a long post about how corporations and other large organizations are going to run into trouble when they start to deal with the implications of this technology:
Each new wave of technology ushered in a new wave of organizational innovation. Henry Ford took advantage of advances in mechanical clocks and standardized parts to introduce assembly lines. Agile development was created in 2001, taking advantage of new ways of working with software and communicating via the internet, in order to introduce a new method for building products.
All of these methods are built on human capabilities and limitations. That is why they have persisted so long—we still have organizational charts and assembly lines. Human attention remains finite, our emotions are still important, and workers still need bathroom breaks. The technology changes, but workers and managers are just people, and the only way to add more intelligence to a project was to add people or make them work more efficiently.
But this is no longer true. Anyone can add intelligence, of a sort, to a project by including an AI. And every evidence is that they are already doing so, they just aren’t telling their bosses about it: a new survey found that over half of people using AI at work are doing so without approval, and 64% have passed off AI work as their own.
This sort of shadow AI use is possible as LLMs are uniquely suited to handling organizational roles — they work at a human scale. They can read documents and write emails and adapt to context and assist with projects without requiring users to have specialized training or complex custom-built software. While large-scale corporate installations of LLMs may add some advantages, like integration with a company’s data (though I wonder how much value this adds), anyone with access to GPT-4 can just start having AI do work for them. And they are clearly doing just that.
Consider one of ChatGPT’s most powerful features. You can upload any kind of file to ChatGPT and ask it questions about the document. No longer do you have to wade through a long PDF to extract relevant information. If you’ve ever had to comb through a hundred-page contract to extract the financial terms you needed in order to bill a client, you’ve encountered this issue. No longer do you have to wade through thousands of rows of an Excel spreadsheet to find whatever data you are looking for. Now all you need to do is upload the file to ChatGPT and give it a natural language prompt: “Tell me what data this file contains, and propose some analyses to run on it.” This is extraordinarily powerful technology for the average corporate employee.
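To make the workflow concrete, here is a minimal sketch of the same thing done programmatically rather than through the ChatGPT interface, assuming the official openai Python package (v1.x); the file name, model name, and prompt are illustrative assumptions, not recommendations.

```python
# A minimal sketch, assuming the official `openai` Python package (v1.x).
# File name, model name, and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assume the document's text has already been extracted to a plain-text file;
# every byte of it is sent to OpenAI's servers as part of the request.
with open("contract.txt", encoding="utf-8") as f:
    document_text = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; substitute whatever your account offers
    messages=[
        {"role": "system",
         "content": "You answer questions about the document supplied by the user."},
        {"role": "user",
         "content": "Tell me what data this file contains, and propose some "
                    "analyses to run on it.\n\n" + document_text},
    ],
)
print(response.choices[0].message.content)
```

The point of the sketch is not the API details but the data flow: whatever is in the file leaves the corporate boundary the moment the request is made.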
But, to the corporate institutionalist mindset, this presents a number of risks, chief among them the possibility that proprietary and confidential data is leaked to OpenAI. To the corporate institutionalist, that is a non-starter, and it is why a number of companies have banned ChatGPT. This risk is referred to as ‘data leakage,’ and I want to focus a bit on what it is, and why corporate institutionalists are so concerned about it.
Data leakage in the context of files uploaded to ChatGPT refers to the unintended exposure or transfer of sensitive information through the files that users upload for processing. This can occur in various ways:
Inherent File Content: The file itself contains sensitive or personal information (names, addresses, financial details, and so on).
Metadata Leakage: Files often contain metadata—information about the file itself, like the author’s name, the computer used to create the file, or location data in images. This metadata might be overlooked by the user but could be accessed or processed by the AI, leading to unintentional data exposure. (A short sketch after this list shows the kind of metadata involved.)
Indirect Leakage via Model Learning: In some AI systems, the data provided during interactions (including file uploads) could be used to further train and refine the AI model. This might lead to situations where the model indirectly learns and later exposes sensitive patterns or information.
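To make the metadata point concrete, here is a minimal sketch of the information that silently travels along with ordinary files, assuming the third-party Pillow and python-docx packages; the file names are hypothetical.

```python
# A minimal sketch of metadata that travels with common file types.
# Assumes the Pillow and python-docx packages; file names are hypothetical.
from PIL import Image
from PIL.ExifTags import TAGS
from docx import Document

# Image files can carry EXIF data: camera model, timestamps, even GPS coordinates.
img = Image.open("photo.jpg")
for tag_id, value in img.getexif().items():
    print(TAGS.get(tag_id, tag_id), value)

# Office documents carry core properties such as the author and revision history.
doc = Document("report.docx")
props = doc.core_properties
print(props.author, props.last_modified_by, props.created)
```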
Given all of this, how can data leakage be prevented? Here are two tips:
Users should ensure that no sensitive information is included in files they upload to tools like ChatGPT (one way to automate such a check is sketched after these tips).
Users should understand how their data will be used and stored, and they should consent to these uses.
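As a concrete illustration of the first tip, here is a minimal sketch of a pre-upload check; the regular expressions are illustrative only, and a real deployment would need far broader pattern coverage, or a dedicated data loss prevention tool.

```python
# A minimal sketch of a pre-upload check for obviously sensitive patterns.
# The patterns below are illustrative; real coverage would be much broader.
import re
import sys

PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "possible card number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan(path: str) -> list[str]:
    """Return the names of any sensitive patterns found in the file at `path`."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]

if __name__ == "__main__":
    findings = scan(sys.argv[1])
    if findings:
        print("Do not upload; found:", ", ".join(findings))
    else:
        print("No obvious sensitive patterns found.")
```

Even a check like this only addresses the first tip; it says nothing about how the provider will use whatever does get uploaded, which is what the second tip is about.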
While these two points make theoretical sense, the corporate institutionalists to whom I repeatedly refer in this post will have a much different view. Their view is that they can’t trust their employees to be knowledgeable about these issues, and, to the extent that employees are knowledgeable, the corporate institutionalists don’t trust them to behave responsibly. In other words, the corporate institutionalist thinks it is inevitable that private and confidential data will be leaked, inadvertently or not, by her employees.
Thus, the blanket bans on ChatGPT.
The more fundamental problem is, as Ethan Mollick intimates in the post to which I link above, that this technology is here, it is rapidly becoming much more powerful, and no corporation will be able to ignore it. Those corporations which contend with the uncertainty inherent in this new technology will fare well. Those corporations which submit to the centralized and institutionalist imperative will likely fare poorly, as competitors rapidly eclipse them, and employees leave for greener pastures.
Here are some relevant news articles which provide more information about this issue.
South China Morning Post: This article discusses how companies using generative AI tools like ChatGPT risk exposing confidential customer information and trade secrets. It emphasizes the potential vulnerability of corporate secrets in the face of rapidly advancing AI technologies.
Bloomberg Law: Samsung Electronics Co. banned the use of tools like ChatGPT by its employees after discovering that staff had uploaded sensitive code. This is an example of a direct corporate response to the risk of data leakage.