Can ChatGPT kill paperwork? Sort of...
Let's explore using ChatGPT to understand complex documents
Introduction
A potentially powerful use case for ChatGPT is to upload complex PDFs to it, and ask the AI questions about the document. The thinking is something like this: uploading a PDF to ChatGPT and asking it questions about the document is akin to, but cheaper than, asking your bright employee to analyze the document. What could go wrong?
Well, a lot. Though the most recent version of GPT-4 is less prone to hallucinations than previous versions, it still occasionally hallucinates. It seems especially susceptible to hallucinations when parsing long PDFs.
So is there a reliable way to use ChatGPT to parse PDFs, given its tendency to hallucinate? To some extent, the answer is ‘yes’, but it requires that you learn a particular style of working with the AI to get the best performance from it. And, after reading through this post, you might well conclude that this isn’t (yet?) a good use case for ChatGPT.
Ideally, you would be able to ask ChatGPT any open-ended question about the PDF, and it could respond reliably and without hallucination. This is, after all, what you would expect of any employee tasked with reviewing a document.
So let’s consider the situation. You have a long PDF, you have ChatGPT, and you want to marry the two. What are the best practices? Let’s ask ChatGPT: Assume I upload a PDF to you. What steps can you and I take to minimize any hallucinations that arise from you answering questions that I have about the PDF document?
Its answer follows after the section break. After the answer, the rest of this post will review some of the suggestions by testing them against an actual PDF. Hopefully this will provide you with sufficient information to determine whether this use case is presently useful to you, or whether you would be better off waiting for an improved version of ChatGPT that is less subject to hallucinations.
There are two important considerations to keep in mind while reading this post:
Non-determinism: ChatGPT, like other large language models as they are typically deployed, is non-deterministic. This means that if you try to replicate what I do in this post, you probably won’t get exactly the same response. ChatGPT may offer you a substantively similar response, or it may not.
Infrastructure-related limitations: ChatGPT frequently has very heavy loads placed on it due to scaling issues that OpenAI seems to have trouble dealing with. So you may find that getting a response from it is slow, or that it fails entirely. This happens more frequently when you ask it to parse a PDF than with other uses. I believe that this particular issue has less to do with limitations of the underlying LLM technology, and more to do with limitations of OpenAI’s infrastructure. If OpenAI is able to get a handle on its scaling issues, this issue should go away.
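If you do try to replicate these tests, one rough way to judge whether two runs of the same prompt are "substantively similar" is to compare the response texts directly. The following is a minimal sketch using Python's standard difflib module. The sample responses are made up for illustration, and character-level similarity is only a crude proxy: two answers can be worded differently and still mean the same thing.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def response_similarity(first: str, second: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two responses."""
    return SequenceMatcher(None, normalize(first), normalize(second)).ratio()


# Hypothetical responses from repeated runs of the same prompt.
run_a = "The section covers interest rate risk and foreign exchange rate risk."
run_b = "The section covers  interest rate risk and foreign exchange rate risk."
run_c = "Apple sells smartphones, tablets, and wearables."

print(response_similarity(run_a, run_b))  # differs only in spacing
print(response_similarity(run_a, run_c))  # different content
```

A low score is a signal to read both responses carefully against the PDF, not proof that either one is wrong.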
ChatGPT’s response to my prompt
To minimize the risk of hallucinations or inaccuracies when you upload a PDF and ask questions about its contents, you and I can take several steps. Hallucinations in this context refer to instances where the language model (like ChatGPT) generates incorrect or misleading information that is not supported by the content of the PDF. Here are steps to mitigate this risk:
Provide Specific References: When asking questions, refer to specific sections, pages, or paragraphs in the PDF. This helps me focus my analysis on the relevant parts of the document and reduces the chance of drawing on unrelated or incorrect information.
Use Direct Quotes: If possible, include direct quotes from the PDF in your questions. This gives me a clear context and helps ensure that the answers are directly related to the content you’re referring to.
Ask for Direct Citations: Request that I provide the page number or section where the answer is found in the PDF. This allows you to cross-reference my responses with the actual content.
Clarify Ambiguities: If any part of the PDF is ambiguous or open to interpretation, please highlight these ambiguities in your questions. This helps in providing more accurate and context-aware responses.
Sequential Reading: If the content of the PDF needs to be understood sequentially (like a report or a narrative), ask questions in the order of the content. This approach maintains the context throughout the interaction.
Iterative Clarification: If an answer seems off or not fully accurate, feel free to ask follow-up questions for clarification. This iterative process helps in refining the responses and correcting any misunderstandings.
Limit the Scope: Keep the questions tightly related to the content of the PDF. Avoid asking for opinions or interpretations that go beyond what is explicitly stated in the document.
Validate Against the PDF: After receiving an answer, it’s a good practice to verify it against the PDF. This not only helps in catching any potential errors but also in understanding how the model is interpreting the document.
Be Cautious with Complex Interpretations: If your questions involve complex interpretation or analysis of the PDF’s content (like inferring the author’s intentions or predicting implications), be aware that these tasks are more prone to inaccuracies.
Report Inconsistencies: If you notice any inconsistencies or errors in the responses, bringing them to my attention helps in adjusting the approach for future questions.
By following these steps, we can work together to ensure that the information provided is as accurate and reliable as possible, given the constraints and capabilities of the language model.
Reviewing ChatGPT’s suggestions
So how well does all of this work? Let’s test some of these bullet points, and see what kind of results we get. The PDF that we will use is Apple’s most recent annual report. You can find the PDF here.
Provide Specific References
My prompt: Refer to the section in this document titled “Item 7A. Quantitative and Qualitative Disclosures About Market Risk”. Summarize this section for me.
Its response: The “Quantitative and Qualitative Disclosures About Market Risk” section in the document primarily addresses the company’s exposure and management strategies related to interest rate risk and foreign exchange rate risk.
Interest Rate Risk: The company is exposed to fluctuations in U.S. interest rates, impacting its investment portfolio and term debt. An increase in interest rates can negatively affect the fair value of the investment portfolio and increase the interest expense of the term debt. To mitigate this risk, the company may employ derivative instruments, balance interest rate-sensitive assets and liabilities, or control the duration of its investment and debt portfolios. A hypothetical increase of 100 basis points in interest rates could result in a decline in fair value of the investment portfolio ($3.089 billion in 2023 and $4.022 billion in 2022) and an increase in annual interest expense ($194 million in 2023 and $201 million in 2022).
Foreign Exchange Rate Risk: The company, as a net receiver of currencies other than the U.S. dollar, is susceptible to foreign exchange rate risks. Strengthening of the U.S. dollar could negatively impact the company’s net sales and gross margins when expressed in U.S. dollars, and affect the fair values of certain assets and liabilities. To manage this risk, the company might use derivative instruments, offset exposures, or adjust local currency pricing of its products and services. It is noted, however, that the company might choose not to hedge certain foreign currency exposures due to various reasons, including accounting considerations or prohibitive costs. Using a value-at-risk (VAR) model, which includes a Monte Carlo simulation, the company estimated a maximum one-day loss in fair value of its foreign currency derivative positions to be $669 million in 2023 and $1.0 billion in 2022, with a 95% confidence level. It’s important to note that changes in the company’s underlying foreign currency exposures generally offset changes in the fair values of its foreign currency derivatives.
My response to this output: While comprehensive, this is not a summary of the text. Rather, it is a rephrasing of the text. ChatGPT often does not appear to have any sense of what is meant by ‘summarize’. Instead, it dutifully rephrases a text in an equally verbose manner. While this can sometimes be useful, it’s not what I asked for.
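One way to put a number on the "summary versus rephrasing" complaint is to compare the length of the response to the length of the source section. The sketch below is only an illustration: the function names and the 0.5 threshold are my own assumptions, not an established standard, and word count says nothing about whether the content is accurate.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def compression_ratio(source: str, summary: str) -> float:
    """Return summary length as a fraction of source length, measured in words."""
    source_words = word_count(source)
    if source_words == 0:
        raise ValueError("source text is empty")
    return word_count(summary) / source_words


def looks_like_summary(source: str, summary: str, max_ratio: float = 0.5) -> bool:
    """Treat the response as a summary only if it is much shorter than the source.

    max_ratio is an arbitrary cutoff chosen for illustration.
    """
    return compression_ratio(source, summary) <= max_ratio
```

To use it, paste the text of the section from the PDF as source and ChatGPT's response as summary. A ratio close to 1.0 suggests a rephrasing rather than a summary.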
Use Direct Quotes
My prompt: Find this quote in the document: “The Company is focused on expanding its market opportunities related to smartphones, personal computers, tablets, wearables and accessories, and services.” Tell me a bit about the section in which it is located.
Its response: The quote "The Company is focused on expanding its market opportunities related to smartphones, personal computers, tablets, wearables and accessories, and services" is found in a section of the document that discusses the company's competitive strategies and the factors that are crucial for its success in the market. This section highlights the company's commitment to continuous innovation and timely introduction of new products, services, and technologies. It underscores the importance of designing and developing comprehensive solutions, including hardware, operating systems, software applications, and related services. The section also outlines the key competitive factors for the company, such as pricing, product and service features, quality and reliability, design innovation, a robust ecosystem of third-party software and accessories, marketing and distribution capabilities, and a strong reputation for service and support. This focus on expansion in various technology segments underlines the company's strategy to maintain and enhance its market position.
My response: This is a comprehensive response. It’s overly verbose, as was the case with the previous test. However, we avoid hallucinations with this kind of structured inquiry. It’s not perfect, but it’s certainly impressive.
Ask for Direct Citations
My prompt: Tell me where in this document I can find Management’s Discussion and Analysis of Financial Condition. Provide a page number citation for me.
Its response: Management’s Discussion and Analysis of Financial Condition and Results of Operations can be found starting on page 16 of the document.
My response: This is a hallucination. The section starts on page 20 of the PDF. See this screenshot:
I don’t know why this particular hallucination occurred, but it shows an important current limitation of using ChatGPT as a replacement for humans.
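A page citation like this one can be checked mechanically if you have the text of each page. The following is a minimal sketch, assuming you have already extracted the PDF into a list of page strings with a tool of your choice (the extraction step is not shown). The function names and the sample pages are hypothetical stand-ins, not the actual report.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Unify curly quotes, collapse whitespace, and lowercase for matching."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"),
                            ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def pages_containing(pages: List[str], phrase: str) -> List[int]:
    """Return 1-based positions of the pages whose text contains the phrase."""
    target = normalize(phrase)
    return [number for number, page in enumerate(pages, start=1)
            if target in normalize(page)]


def check_citation(pages: List[str], phrase: str, claimed_page: int) -> bool:
    """Return True if the phrase appears on the page the model cited."""
    return claimed_page in pages_containing(pages, phrase)


# Made-up stand-in for extracted page text.
pages = [
    "Table of Contents",
    "Item 7. Management\u2019s Discussion and Analysis of Financial "
    "Condition and Results of Operations\nThe following discussion ...",
    "Item 7A. Quantitative and Qualitative Disclosures About Market Risk",
]
heading = "Management's Discussion and Analysis"
print(pages_containing(pages, heading))
print(check_citation(pages, heading, claimed_page=1))
```

Two caveats apply. A heading usually also appears in the table of contents, so expect more than one match in a real document. And the position of a page in the list may not match the page number printed on the page, so be clear about which numbering you asked ChatGPT to use.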
Clarify Ambiguities
My prompt: There is a section of this document titled “Evaluation of Disclosure Controls and Procedures.” I find this ambiguous. Can you clarify for me what the intent of this section is?
Its response: The "Evaluation of Disclosure Controls and Procedures" section in the document explains that the company's management, including its principal executive officer and principal financial officer, concluded that the company's disclosure controls and procedures were effective as of September 30, 2023. These controls and procedures are defined in Rules 13a-15(e) and 15d-15(e) under the Exchange Act. The purpose of these controls and procedures is to ensure that information required to be disclosed by the company in reports it files or submits under the Exchange Act is recorded, processed, summarized, and reported within the specified time periods of the SEC rules and forms. Furthermore, these controls and procedures are designed to accumulate and communicate information to the company’s management, including its principal executive officer and principal financial officer, to allow timely decisions regarding required disclosure. This section also acknowledges the inherent limitations of internal controls over financial reporting, which are designed to provide reasonable assurance regarding the reliability of financial reporting and the preparation of financial statements.
My response: This is verbose, but its verbosity is reasonable, given that I asked for a clarification of the text. Overall, I think that the explanation offered is accurate and relevant.
Concluding thoughts
This post is getting long, so I want to stop here and offer some concluding thoughts. Overall, a highly structured manner of interacting with ChatGPT seems to work reasonably well for understanding complex PDFs, though, as the page citation test showed, it does not eliminate hallucinations. The ‘highly structured’ part is key. A more open-ended method of just asking questions that come to mind won’t necessarily work as well. As with much of AI technology, I suspect that over time, as LLMs become more powerful, you will be able to be much more flexible and open-ended in your interactions, and less structured.