In modern enterprise AI applications, strong data and AI capabilities are a prerequisite for technological breakthroughs. The HaxiTAG Intelligent Application Platform has built a comprehensive LLM technology supply chain and software ecosystem that integrates knowledge data, local data, device-edge hosted data, and the extended data required for API-hosted inference, providing efficient data management and inference capabilities.
We provide data analysis, screening, evaluation, and due diligence services to several financial institutions, particularly for corporate background checks and investment target analysis. Securitization documents are hard to navigate because of their intricate legal details and maturity terms. During due diligence, investors, traders, and sales personnel must analyze every aspect of a security, including its overall structure, the mechanics of the individual underlying loans, and its seniority structure. Equity-structured notes pose a different problem: understanding them requires precise interpretation of the nuanced terminology each issuer uses. Although these documents are relatively short, clients must quickly identify key elements such as guarantee/protection mechanisms, payment formulas, and governing law. Today, investors mostly rely on keyword searches in PDFs, which is slow and inefficient when they need a precise answer together with its surrounding context.
Advantages of Large Language Models
LLMs are well suited to these challenges because they provide a natural language interface that can return contextually relevant answers. The difficulty is that an LLM has not "learned" the specific transaction documents a client is asking about, so it may produce plausible but misleading answers. A common solution is a Retrieval-Augmented Generation (RAG) system: the documents are stored as text snippets in a vector database, a similarity search selects the snippets most relevant to the user's query, and prompt engineering places those snippets in front of the LLM so that it can generate an accurate answer grounded in the source text.
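The retrieve-then-prompt loop can be sketched in a few lines. This is a minimal, dependency-free illustration, not the platform's implementation: a bag-of-words term-count vector stands in for a real embedding model, an in-memory list stands in for the vector database, the prompt wording is an assumption, and the actual LLM call is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for a vector-database search)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded prompt; the template wording is illustrative only."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the numbered excerpts below. "
        "If the answer is not in them, say so.\n\n"
        f"{numbered}\n\nQuestion: {query}\nAnswer:"
    )


chunks = [
    "The notes are governed by English law.",
    "Interest is payable quarterly on each payment date.",
    "Class A notes rank senior to Class B notes.",
]
question = "Which law governs the notes?"
context = retrieve(question, chunks, k=2)
prompt = build_prompt(question, context)
print(context[0])  # The notes are governed by English law.
```

In a real deployment, `embed` would call an embedding model and `retrieve` would query the vector database; the instruction to answer only from the numbered excerpts is what ties the generated answer back to the source documents.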
Reproducibility and accuracy in these experiments are essential if the approach is to scale. While RAG has been studied extensively for general use cases, its application in specialized, deep-domain settings, particularly finance, warrants further exploration. This study aims to identify the optimal ML system setup for such use cases by:
- Defining the right evaluation standard through a set of appropriate questions.
- Weighing the trade-offs between long-context LLMs and RAG solutions in different scenarios (e.g., OpenAI’s recently released 128k-context GPT-4).
- Analyzing the components of this system: vector database similarity search, LLM context comprehension, and the quality of LLM-generated answers.
- Identifying additional components necessary for an optimal system setup, such as UI/UX elements and LLM methodologies.
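To make the long-context versus RAG trade-off concrete, the sketch below compares the input tokens sent per query when the whole document goes into the prompt versus only the top-k retrieved chunks. The document size, chunk size, prompt overhead, and 128,000-token window check are illustrative assumptions, not figures from this study; real counts depend on the model's tokenizer.

```python
def tokens_per_query(doc_tokens: int, chunk_tokens: int, k: int,
                     prompt_overhead: int = 200) -> tuple[int, int]:
    """Input tokens per query: (whole document in context, top-k RAG chunks only).

    prompt_overhead approximates the instructions plus the question (assumed value).
    """
    full_context = doc_tokens + prompt_overhead
    rag_context = k * chunk_tokens + prompt_overhead
    return full_context, rag_context


def fits_window(tokens: int, window: int = 128_000) -> bool:
    """Whether a prompt of this size fits in the model's context window."""
    return tokens <= window


# Hypothetical deal document of ~90k tokens, split into 500-token chunks, k = 5.
full, rag = tokens_per_query(doc_tokens=90_000, chunk_tokens=500, k=5)
print(full, rag, fits_window(full))  # 90200 2700 True
```

Even when the whole document fits in the window, each query in this example carries roughly 33 times more input tokens, which is the cost and latency side of the trade-off; answer accuracy has to be measured separately.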
Model Evaluation and Results
To assess the model's capabilities, subject matter experts (SMEs) selected a set of high-value questions related to investment due diligence. These questions targeted key features of the securities, such as the assets provided, their principal distribution/nominal value, the identity of relevant entities, and geographic distribution. Beyond focusing on key details in the provided documents, the questions were designed to test the LLM’s ability to comprehend various language challenges, including names, dates, places, lists, and tables. This diverse set of questions aimed to highlight the model's strengths and limitations.
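One way to surface those strengths and limitations is to tag each benchmark question with the language challenge it tests and report accuracy per tag. The sketch below assumes a simple correct/incorrect judgement per question; the `EvalQuestion` class, its fields, and the example questions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalQuestion:
    text: str
    feature: str    # security feature targeted, e.g. "nominal value" (hypothetical labels)
    challenge: str  # language challenge: "name", "date", "place", "list", or "table"


def accuracy_by_challenge(questions: list[EvalQuestion], correct: list[bool]) -> dict[str, float]:
    """Share of correctly answered questions for each language-challenge tag."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for question, ok in zip(questions, correct):
        totals[question.challenge] += 1
        hits[question.challenge] += int(ok)
    return {tag: hits[tag] / totals[tag] for tag in totals}


questions = [
    EvalQuestion("What is the issue date?", "dates", "date"),
    EvalQuestion("When do the notes mature?", "maturity", "date"),
    EvalQuestion("What is the geographic split of the pool?", "geography", "table"),
]
print(accuracy_by_challenge(questions, [True, False, True]))  # {'date': 0.5, 'table': 1.0}
```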
We organized the experiments around the three major components of a functional RAG tool:
Similarity Search Experiment: The goal was to find the portions of the documents relevant to each query. We found that the top five search results were typically sufficient to build a representative context for the model. Keeping the context this small reduces the amount of text sent to the LLM, which lowers both operational costs and system latency.
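The report does not say how the cut-off of five results was chosen. One common way to make that choice reproducible is to measure recall@k, the share of questions whose answer-bearing chunk appears in the top k results, and pick the smallest k that meets a target. The ranked results and gold chunk IDs below are made-up toy data, not the study's.

```python
from typing import Optional


def recall_at_k(ranked_ids: list[list[str]], gold_ids: list[str], k: int) -> float:
    """Fraction of questions whose gold (answer-bearing) chunk is within the top k results."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_ids, gold_ids))
    return hits / len(gold_ids)


def smallest_sufficient_k(ranked_ids: list[list[str]], gold_ids: list[str],
                          target: float, max_k: int = 20) -> Optional[int]:
    """Smallest k whose recall@k reaches the target, or None if no k up to max_k does."""
    for k in range(1, max_k + 1):
        if recall_at_k(ranked_ids, gold_ids, k) >= target:
            return k
    return None


# Toy data: similarity-search ranking per question and the chunk that holds the answer.
ranked = [
    ["c3", "c1", "c7", "c2", "c9", "c4"],
    ["c5", "c2", "c8", "c1", "c6", "c3"],
    ["c4", "c9", "c1", "c2", "c7", "c5"],
]
gold = ["c1", "c6", "c7"]
print([round(recall_at_k(ranked, gold, k), 2) for k in (1, 2, 5)])  # [0.0, 0.33, 1.0]
print(smallest_sufficient_k(ranked, gold, target=1.0))  # 5
```

A smaller k keeps prompts short, so the target recall is effectively a knob for trading retrieval completeness against cost and latency.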
Context Comprehension Experiment: We evaluated how accurately the LLM identified supporting evidence within the text snippets returned by the similarity search. In some cases, it was useful to quote the source documents directly or to back the LLM-generated answer with the original text. On average, the model correctly identified the snippet containing the answer 76% of the time and correctly ignored irrelevant paragraphs 91% of the time.
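These two figures can be read as a hit rate on answer-bearing snippets and a rejection rate on irrelevant ones. The report does not define its exact scoring procedure, so the sketch below is one plausible definition; it assumes each retrieved snippet carries an SME label (contains the answer or not) and a flag for whether the model cited it. The labels in the example are invented.

```python
from dataclasses import dataclass


@dataclass
class SnippetJudgement:
    contains_answer: bool  # ground truth, e.g. from SME annotation (assumed)
    cited_by_model: bool   # whether the LLM relied on or quoted this snippet


def comprehension_rates(judgements: list[SnippetJudgement]) -> tuple[float, float]:
    """Return (answer-snippet hit rate, irrelevant-snippet rejection rate).

    Assumes at least one snippet of each kind is present.
    """
    relevant = [j for j in judgements if j.contains_answer]
    irrelevant = [j for j in judgements if not j.contains_answer]
    hit_rate = sum(j.cited_by_model for j in relevant) / len(relevant)
    rejection_rate = sum(not j.cited_by_model for j in irrelevant) / len(irrelevant)
    return hit_rate, rejection_rate


# Invented labels: 4 answer-bearing snippets (3 cited), 5 irrelevant ones (1 cited).
labels = (
    [SnippetJudgement(True, True)] * 3
    + [SnippetJudgement(True, False)]
    + [SnippetJudgement(False, False)] * 4
    + [SnippetJudgement(False, True)]
)
print(comprehension_rates(labels))  # (0.75, 0.8)
```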
Answer Quality Assessment: We analyzed the responses for two distinct tasks: value extraction (answers consisting of specific values such as nominal amounts, dates, or issue size) and textual answers (answers in sentence or paragraph form). For both tasks, we compared GPT-3.5 with GPT-4, and GPT-4 consistently delivered better results. On value extraction, GPT-4's accuracy ranged from 75% to 100%; on text-based answers, response quality ranged from 89% to 96%, depending on the complexity of the task. In these cases, the 128k context window generally performed on par with, or slightly worse than, conventional shorter context windows.
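Scoring value extraction needs a rule for when an extracted value counts as correct, because the same date or amount can be written in several ways. The sketch below uses normalized exact match; the accepted date formats (day-first for slashed dates), the comma stripping, and the example values are assumptions for illustration, not the study's scoring rules.

```python
import re
from datetime import datetime

# Assumed date formats; "%d/%m/%Y" treats slashed dates as day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y", "%d/%m/%Y")


def normalize_value(text: str) -> str:
    """Map equivalent spellings of a value to one canonical string."""
    s = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s, fmt).date().isoformat()
        except ValueError:
            continue
    s = s.lower().replace(",", "")  # drop case and thousands separators
    return re.sub(r"\s+", " ", s)   # collapse runs of whitespace


def value_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of predicted values that match the gold value after normalization."""
    matches = sum(normalize_value(p) == normalize_value(g) for p, g in zip(predicted, gold))
    return matches / len(gold)


predicted = ["15 March 2027", "EUR 500,000,000", "English law"]
gold = ["2027-03-15", "eur 500000000", "New York law"]
print(round(value_accuracy(predicted, gold), 2))  # 0.67
```

Textual answers do not reduce to a single canonical string, which is why they are usually graded by SMEs or against a rubric rather than by exact match.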
Conclusion
In this study, we analyzed how different designs and configurations affect retrieval-augmented generation (RAG) systems used for investment due diligence on documents for various financial instruments. Such systems are likely to become integral reasoning components in LLM agent design and in delivering comprehensive AI experiences for our clients. Our experiments show promising results in identifying the correct context and extracting relevant information, suggesting that RAG is a viable tool for LLM conversational agents to call on when users need to extract specific transaction definitions from large volumes of financial documents. Overall, these findings lay a solid foundation for designing future LLM question-answering tools. However, effective retrieval and generation are only one part of a fully integrated conversational process. LLM agents will likely use a suite of such tools to understand and contextualize a wide range of customer needs, and the right user experience design will play a crucial role in delivering timely, information-rich financial due diligence to our clients.
The HaxiTAG Intelligent Application Platform is not limited to applications in the financial sector; it also offers extensive potential for complex document analysis in other industries, such as healthcare and legal. With its advanced data collaboration and AI intelligence capabilities, the platform is poised to play a critical role in driving digital transformation across various sectors.