Artificial Intelligence News

Use a generative AI foundation model for summarization and question answering using your own data


Large language models (LLMs) can be used to analyze complex documents and provide summaries and answers to questions. The post Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data describes how to fine-tune an LLM using your own dataset. Once you have a solid LLM, you’ll want to expose it to business users so they can process new documents, which can be hundreds of pages long. In this post, we demonstrate how to build a real-time user interface that lets business users process a PDF document of arbitrary length. After the file has been processed, you can summarize the document or ask questions about its content. The sample solution described in this post is available on GitHub.

Work with financial documents

Financial reports such as quarterly earnings reports and annual reports to shareholders are often tens or hundreds of pages long. These documents contain a lot of boilerplate language such as disclaimers and legal language. If you want to extract key data points from one of these documents, you need both time and some familiarity with the boilerplate language so you can identify the interesting facts. And of course, you can’t ask an LLM questions about a document it has never seen.

LLMs used for summarization limit the number of tokens (units of text such as words or word fragments) that can be passed into the model and, with a few exceptions, the limit is typically no more than a few thousand tokens. That normally rules out summarizing longer documents in a single call.

Our solution handles documents that exceed an LLM’s maximum token sequence length, and makes those documents available to the LLM for question answering.

Solution overview

Our design has three important parts:

  • It has an interactive web application for business users to upload and process PDFs
  • It uses the langchain library to split large PDFs into more manageable chunks
  • It uses the retrieval augmented generation technique to let users ask questions about new data that the LLM has never seen

As shown in the following diagram, the front end is a React JavaScript application hosted in an Amazon Simple Storage Service (Amazon S3) bucket fronted by Amazon CloudFront. The front-end application lets users upload PDF documents to Amazon S3. After the upload is complete, you can trigger a text extraction job powered by Amazon Textract. As part of the post-processing, an AWS Lambda function inserts special markers into the text to indicate page boundaries. When that job is done, you can invoke an API that summarizes the text or answers questions about it.
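The marker insertion step can be sketched in a few lines. This is a minimal illustration rather than the sample's actual Lambda code: the function name `add_markers` and the `pages_per_chunk` parameter are hypothetical, and the marker strings are assumed to match the separators used by the text splitter later in this post.

```python
from typing import List

PAGE_MARKER = "<PAGE>"    # assumed marker for a page boundary
CHUNK_MARKER = "<CHUNK>"  # assumed marker for a group of pages


def add_markers(pages: List[str], pages_per_chunk: int = 5) -> str:
    """Join per-page text into one string with page and chunk markers."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    parts = []
    for i, page in enumerate(pages):
        if i > 0:
            # Every pages_per_chunk pages, emit a chunk marker instead of a page marker.
            parts.append(CHUNK_MARKER if i % pages_per_chunk == 0 else PAGE_MARKER)
        parts.append(page.strip())
    return "".join(parts)


marked = add_markers(["page one", "page two", "page three"], pages_per_chunk=2)
print(marked)
```

Keeping the chunk size configurable lets advanced users trade off the number of summarization calls against the amount of text in each call.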

Because some of these steps may take a while, the architecture uses a decoupled asynchronous approach. For example, the call to summarize a document invokes a Lambda function that posts a message to an Amazon Simple Queue Service (Amazon SQS) queue. Another Lambda function picks up that message and starts an Amazon Elastic Container Service (Amazon ECS) task on AWS Fargate. The Fargate task calls the Amazon SageMaker inference endpoint. We use a Fargate task here because summarizing a very long PDF may take more time and memory than a Lambda function has available. When the summarization is done, the front-end application can retrieve the results from an Amazon DynamoDB table.
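The first hop of this asynchronous flow, posting a job message to the queue, might look like the following sketch. The function name, the message fields (`job_id`, `bucket`, `key`), and the queue URL are assumptions for illustration, not the sample's actual schema. The client is passed in so the sketch runs without AWS credentials; an in-memory stand-in replaces the real SQS client here.

```python
import json
import uuid
from typing import Any, Dict


def enqueue_summary_job(sqs_client: Any, queue_url: str, bucket: str, key: str) -> str:
    """Post a summarization request to a queue and return its job ID."""
    job_id = str(uuid.uuid4())
    message: Dict[str, str] = {"job_id": job_id, "bucket": bucket, "key": key}
    # The caller returns immediately; a worker processes the message later.
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(message))
    return job_id


class FakeSQS:
    """In-memory stand-in for an SQS client, used only for this demo."""

    def __init__(self) -> None:
        self.sent = []

    def send_message(self, QueueUrl: str, MessageBody: str) -> dict:
        self.sent.append((QueueUrl, MessageBody))
        return {"MessageId": str(len(self.sent))}


fake = FakeSQS()
job_id = enqueue_summary_job(fake, "https://example.invalid/queue", "my-bucket", "report.txt")
print(job_id, json.loads(fake.sent[0][1])["key"])
```

In a real Lambda function you would pass `boto3.client("sqs")` instead of the stand-in. Returning the job ID gives the front end a key it can use to look up the result once the worker has written it.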

For summarization, we use AI21’s Summarize model, one of the foundation models available through Amazon SageMaker JumpStart. Although this model handles documents of up to 10,000 words (approximately 40 pages), we use langchain’s text splitter to make sure that each summarization call to the LLM is no more than 10,000 words long. For text generation, we use Cohere’s Medium model, and we use GPT-J for embeddings, both via JumpStart.

Summarization

When handling larger documents, we need to decide how to split the document into smaller pieces. When we get the text extraction results back from Amazon Textract, we insert markers for larger chunks of text (a configurable number of pages), individual pages, and line breaks. Langchain splits on those markers and assembles smaller documents that fall under the token limit. See the following code:

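from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on chunk markers first, then page markers, then line breaks
text_splitter = RecursiveCharacterTextSplitter(
    separators=["<CHUNK>", "<PAGE>", "\n"],
    chunk_size=int(chunk_size),
    chunk_overlap=int(chunk_overlap))

with open(local_path) as f:
    doc = f.read()
texts = text_splitter.split_text(doc)
print(f"Number of splits: {len(texts)}")

llm = SageMakerLLM(endpoint_name=endpoint_name)

# Summarize each split separately, then join the partial summaries
responses = []
for t in texts:
    r = llm(t)
    responses.append(r)
summary = "\n".join(responses)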
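To make the splitting behavior concrete, here is a simplified, dependency-free sketch of the idea behind recursive splitting: try the coarsest separator first, merge neighboring parts while they fit, and fall back to finer separators only for parts that are still too long. It is not langchain's implementation. It measures length in characters, drops the separator between chunks, and ignores chunk overlap; the function name `recursive_split` is hypothetical.

```python
from typing import List, Sequence


def recursive_split(text: str, separators: Sequence[str], chunk_size: int) -> List[str]:
    """Split text so that every returned chunk has at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    last_is_whole = False  # True when chunks[-1] is built only from whole parts
    for part in text.split(head):
        if not part:
            continue
        if len(part) <= chunk_size:
            # Merge with the previous chunk while the combined text still fits.
            if last_is_whole and len(chunks[-1]) + len(head) + len(part) <= chunk_size:
                chunks[-1] += head + part
            else:
                chunks.append(part)
            last_is_whole = True
        else:
            # Still too long: retry this part with the finer separators.
            chunks.extend(recursive_split(part, rest, chunk_size))
            last_is_whole = False
    return chunks


sample = "aaaa<PAGE>bbbb<CHUNK>" + "c" * 30 + "\n" + "d" * 10
chunks = recursive_split(sample, ["<CHUNK>", "<PAGE>", "\n"], chunk_size=20)
for c in chunks:
    print(len(c), repr(c))
```

Ordering the separators from coarse to fine keeps related pages together whenever they fit, so the model sees as much contiguous context as the limit allows.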

The LLM in the summary chain is a thin wrapper around our SageMaker endpoint:

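from typing import List, Optional
import ai21
from langchain.llms.base import LLM

class SageMakerLLM(LLM):
    endpoint_name: str

    @property
    def _llm_type(self) -> str:
        return "summarize"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # Argument names below assume the AI21 SDK's SageMaker interface
        response = ai21.Summarize.execute(
            source=prompt, sourceType="TEXT", sm_endpoint=self.endpoint_name)
        return response.summary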

Question answering

In the retrieval augmented generation method, we first split the document into smaller segments. We create embeddings for each segment and store them in the open-source Chroma vector database via langchain’s interface. We save the database in an Amazon Elastic File System (Amazon EFS) file system for later use. See the following code:

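documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,
                                               chunk_overlap=0)
texts = text_splitter.split_documents(documents)
print(f"Number of splits: {len(texts)}")

# SMEndpointEmbeddings is the sample's embedding wrapper; endpoint_name is an assumed argument
embeddings = SMEndpointEmbeddings(endpoint_name=endpoint_name)
vectordb = Chroma.from_documents(texts, embeddings,
                                 persist_directory=persist_directory)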

When the embeddings are ready, the user can ask a question. We search the vector database for the text snippets that most closely match the question:

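# endpoint_name is an assumed argument of the sample's embedding wrapper
embeddings = SMEndpointEmbeddings(endpoint_name=endpoint_name)
vectordb = Chroma(persist_directory=persist_directory,
                  embedding_function=embeddings)
docs = vectordb.similarity_search_with_score(question)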
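Conceptually, the vector database compares the embedding of the question with the stored embedding of each segment. The following toy sketch shows that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, texts, and function names are made up for illustration; a real embedding model produces much longer vectors, and Chroma uses its own index and distance metric.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored texts most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "vector database": (segment text, embedding) pairs.
store = [
    ("Revenue grew in the third quarter.", [0.9, 0.1, 0.0]),
    ("The company opened a new office.", [0.1, 0.9, 0.0]),
    ("Forward-looking statements disclaimer.", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```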

We take the snippet that fits best and use it as context for the text generation model to answer the question:

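cohere_client = Client(endpoint_name=endpoint_qa)
# docs holds (document, score) pairs; high_score_idx is the index of the best match
context = docs[high_score_idx][0].page_content.replace("\n", "")
qa_prompt = f'Context={context}\nQuestion={question}\nAnswer='
# Generation parameters such as max_tokens or temperature can also be passed here
response = cohere_client.generate(prompt=qa_prompt)
answer = response.generations[0].text.strip().replace('\n', '')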
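The excerpt above relies on an index of the best match without showing how it is computed. One way to choose it, and to assemble the prompt, is sketched below. The helper names are hypothetical. The sketch assumes the score is a distance, where a lower value means a closer match, which is how langchain's Chroma wrapper documents it; check the convention of your vector store and flip `lower_is_better` if needed.

```python
from typing import Any, List, Tuple


def best_match_index(docs: List[Tuple[Any, float]], lower_is_better: bool = True) -> int:
    """Return the position of the best-scoring (document, score) pair."""
    if not docs:
        raise ValueError("no search results to choose from")
    scores = [score for _, score in docs]
    target = min(scores) if lower_is_better else max(scores)
    return scores.index(target)


def build_qa_prompt(context: str, question: str) -> str:
    """Format the context and question in the layout used by the snippet above."""
    context = context.replace("\n", "")
    return f"Context={context}\nQuestion={question}\nAnswer="


# Plain strings stand in for document objects in this demo.
docs = [("Net income was flat.", 0.42), ("Revenue rose 12 percent.\n", 0.17)]
idx = best_match_index(docs)
prompt = build_qa_prompt(docs[idx][0], "How did revenue change?")
print(prompt)
```

Passing only the single best snippet keeps the prompt short; concatenating several top snippets is a common variation when the answer may span more than one segment.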

User experience

Although LLMs represent advanced data science, most of the use cases for LLMs ultimately involve interaction with non-technical users. Our sample web application handles an interactive use case in which business users can upload and process a new PDF document.

The following diagram shows the user interface. A user starts by uploading a PDF. After the document is stored in Amazon S3, the user can start the text extraction job. When that’s complete, the user can invoke the summarization task or ask questions. The user interface exposes some advanced options, such as the chunk size and chunk overlap, which are useful for advanced users who are testing the application on new documents.

User interface

Next steps

LLMs provide significant new information retrieval capabilities. Business users need convenient access to those capabilities. There are two directions for future work to consider:

  • Take advantage of the powerful LLMs already available as JumpStart foundation models. With just a few lines of code, our sample application can deploy and use advanced LLMs from AI21 and Cohere for text summarization and generation.
  • Make these capabilities accessible to non-technical users. A prerequisite to processing PDF documents is extracting text from the document, and summarization jobs may take several minutes to run. That calls for a simple user interface with asynchronous backend processing, which is easy to design using cloud-native services like Lambda and Fargate.

We also note that a PDF document is semi-structured information. Important cues like section headings are difficult to identify programmatically, because they rely on font sizes and other visual indicators. Identifying the underlying structure of the information helps the LLM process the data more accurately, at least until LLMs can handle input of unbounded length.


Conclusion

In this post, we showed how to build an interactive web application that lets business users upload and process PDF documents for summarization and question answering. We saw how to take advantage of JumpStart foundation models to access advanced LLMs, and how to use text splitting and retrieval augmented generation techniques to process longer documents and make them available as information to the LLM.

At this point in time, there’s no reason not to make these powerful capabilities available to your users. We encourage you to start using the JumpStart foundation models today.

About the Author

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

