Revolutionizing Legal Contracts: A Deep Dive into Next-Generation AI with Intel Developer Cloud and oneAPI Toolkits
Legal contracts are often complex and tedious to analyze, and the work becomes harder still when several versions must be compared to identify the important changes made over time. The reader has to detect not only added or deleted words but also how a change in phrasing can alter the meaning of the whole document. This demands meticulous scrutiny, and the task is time-consuming, exhausting, and prone to human error. Generative AI and large language models (LLMs) can make this process easier.
In this article, we will explore how we were able to leverage these AI tools, specifically large language models (LLMs), to analyze and compare different versions of legal contracts. Though a large number of LLMs are available, a lot of them are proprietary, and developing solutions using them can be cost-intensive. Our goal is to provide a simple and efficient tool for regular users so that they can run it easily, even on their desktops. To achieve this, we decided to focus on open-source LLMs.
Open-source LLMs have recently gained popularity for their proficiency in understanding and generating high-quality human-like text. The AI landscape is rapidly evolving, with new and more powerful models released monthly. These models can be trained to identify key changes in contract versions, highlighting important alterations and additions.
However, a major challenge in employing them is that they are memory- and computation-intensive, which poses a barrier for regular users. Optimizing them for desktop hardware goes a long way toward removing this barrier, bringing sophisticated AI capabilities to regular users without the need for extensive resources.
This is where services from Intel, including the Intel Developer Cloud and oneAPI toolkits, can help. The Intel Developer Cloud offers a robust platform for developing, testing, and deploying AI models. It provides access to Intel’s advanced hardware and software technologies, enabling optimized performance for AI workloads. While developing this solution, we used the Intel Developer Cloud and oneAPI toolkits to optimize the performance of our large language model. We would like to demonstrate how these tools can not only improve the efficiency of legal contract analysis but also offer insights that would be difficult to achieve through manual evaluation.
Why contract analysis?
At Affine, as a medium-sized organization, we face the challenge of managing legal agreements without a team of dedicated legal experts. When we interact with clients, we are required to sign several agreements. For each agreement, the Affine team must go through several versions and ensure everything is in place. Recognizing that this is a common hurdle for similar organizations, we were motivated to create a solution not only for our own needs but also to assist others facing similar challenges.
How do large language models assist in contract analysis, and what are the benefits?
Large language models can perform a wide variety of tasks. An LLM can be repurposed for a new task with simple prompt engineering, whereas traditional models require the user to set up an entire training and testing cycle for each new solution. Additionally, since LLMs are already trained on a large corpus of data, they excel at understanding data from diverse domains and at natural language tasks.
With LLMs, we can analyze and compare different versions of contracts, highlighting key changes. This process can be optimized using tools like the Intel Developer Cloud and oneAPI toolkits, which provide a powerful platform for developing, testing, and deploying these AI models. Leveraging these technologies makes the task of contract analysis and comparison more efficient and less prone to human error. They also facilitate automating routine tasks such as contract review and due diligence, saving time and reducing errors.
In summary, our LLM-based solution:
- Enables comprehensive comparison across various legal versions of a file.
- Goes beyond mere textual changes and includes advanced detection of contextual changes within the legal documents.
- Visualizes the comparison results in a user-friendly graphical interface, presenting a side-by-side view for easy identification of changes.
- Is capable of running on local systems, providing flexibility and convenience to users.
- Supports deployment and optimization of state-of-the-art generative AI (GenAI) models for enhanced accuracy and sophistication in document comparison.
Before we dive into the technical aspects of contract analysis, it’s important to first understand the tools we’ll be using: the Intel Developer Cloud and oneAPI. Let’s take a closer look at these platforms and explore their features, benefits, and how they stand out in the landscape of AI development tools.
Overview of Intel Developer Cloud and oneAPI
The Intel Developer Cloud (IDC) is a robust platform designed to empower developers with the latest Intel hardware and software.
Here’s an overview of IDC and its unique features:
- Powerful Environment: IDC provides a comprehensive environment for developers, especially those working in AI and machine learning, to learn, prototype, test, and run their applications.
- Latest Intel Hardware and Software: IDC stands out in the cloud computing landscape due to its access to the latest Intel hardware and software, making it an attractive choice for developers. The platform provides access to a variety of virtual machines, bare metal systems, edge devices, and platforms designed for AI training.
- Comparison with Other Platforms: Compared to platforms like Google Colab and Amazon SageMaker, IDC differs in its hardware integration, its focus on edge computing, and its customizability, as discussed in the next section.
- oneAPI Toolkits: IDC utilizes oneAPI toolkits, particularly the Intel Extension for PyTorch, to enhance LLM inference. This allows for more efficient and effective AI application development and deployment.
- Advanced Tools: Users benefit from Kubernetes environments, JupyterLab, direct SSH connections, and a full-stack machine learning operating system.
In essence, IDC combines a comprehensive environment for learning, prototyping, testing, and running applications with access to the latest Intel hardware and software, which makes it an attractive choice for developers working in AI.
Intel Developer Cloud’s Unique Advantage over similar platforms
While working on Intel Developer Cloud (IDC), it was natural to compare it with similar platforms such as Google Colab and Amazon SageMaker. Compared with these platforms, IDC offers several advantages that make it stand out:
- Hardware and Software Synergy: IDC’s integration with Intel hardware, such as CPUs, GPUs, and FPGAs, provides an optimized environment for applications requiring high computational power.
- Focus on Edge Computing: Unlike Colab and SageMaker, which focus more on cloud and server-based computing, IDC emphasizes edge computing, making it ideal for IoT and edge AI applications.
- oneAPI Toolkits: IDC leverages Intel’s oneAPI toolkits, offering a unified programming model across multiple architectures. This feature is less emphasized in Colab and SageMaker.
- Customizable and Scalable: IDC is more customizable and scalable for different workload needs compared to the more fixed configurations in Colab and SageMaker.
These features make IDC a powerful tool for developers working in Generative AI and machine learning, offering an excellent environment for learning, prototyping, testing, and running applications.
Overview of Unique Approach to Document Comparison Using LLM
In this section, we explain how our legal document analysis solution works. Users upload an old version of a document (V1) to be compared against a new version (V2). The solution uses Intel oneAPI-optimized large language models. For document embedding, we employ an open-source, Intel oneAPI-optimized MiniLM-L6-v2 model.
Here’s the step-by-step process of our unique solution:
- Chunk Division: The V1 and V2 legal documents are divided into manageable chunks, typically paragraphs, so that they can be processed by large language models (LLMs).
- Embedding Storage: An embedding is computed and saved for each V1 chunk. An embedding is a numerical vector that captures the semantic meaning of the text.
- Similarity Search: Each chunk from V2 is used as the query in a similarity search, which identifies the most similar chunks from V1 based on the embeddings saved in the previous step.
- Change Detection: The V2 chunk and its most similar V1 chunks are passed to the LLM, which is tasked with detecting any changes between them.
- Highlighting Changes: Changes detected by the LLM are highlighted on both the V1 and V2 legal documents, providing a clear visualization of alterations between the two versions. A detailed record of the changes is also available in structured form.
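The first three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code of our solution: the function names are hypothetical, and the `embed` function is a toy word-count stand-in for the MiniLM embedding model so that the example runs with the standard library alone.

```python
import math
import re
from collections import Counter


def split_into_chunks(document: str) -> list[str]:
    """Chunk division: split a document into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts.

    A real pipeline would call a sentence-embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar_chunks(query, index, top_k=1):
    """Similarity search: return the top_k (score, chunk) pairs from the index."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


v1 = """The Supplier shall deliver the goods within 30 days of the order.

Either party may terminate this agreement with 60 days written notice."""

v2 = """The Supplier shall deliver the goods within 45 days of the order.

Either party may terminate this agreement with 60 days written notice."""

# Embedding storage: keep each V1 chunk together with its vector.
v1_index = [(chunk, embed(chunk)) for chunk in split_into_chunks(v1)]

for v2_chunk in split_into_chunks(v2):
    score, v1_match = most_similar_chunks(v2_chunk, v1_index)[0]
    print(f"{score:.2f} | V2: {v2_chunk!r} -> V1: {v1_match!r}")
```

The delivery clause pairs with its V1 counterpart at a similarity below 1.0, while the unchanged termination clause scores approximately 1.0. Word counts cannot capture changes in meaning, which is why the actual solution relies on semantic embeddings and an LLM.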
With this approach, we were able to efficiently utilize the capabilities of Intel oneAPI-optimized LLMs for accurate legal document comparison, emphasizing key changes and differences.
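The change detection step could be framed as a prompt built from a V2 chunk and its retrieved V1 chunks. The sketch below is an assumption for illustration only: the function name `build_change_detection_prompt`, the prompt wording, and the `NO CHANGE` convention are hypothetical and are not the prompt used in our solution.

```python
def build_change_detection_prompt(v1_chunks: list[str], v2_chunk: str) -> str:
    """Assemble a prompt asking an LLM to compare old clauses with a new one."""
    # Label each retrieved V1 chunk so the model can refer to it by number.
    old_text = "\n\n".join(
        f"[V1-{i}] {chunk}" for i, chunk in enumerate(v1_chunks, start=1)
    )
    return (
        "You are reviewing two versions of a legal contract.\n"
        "Compare the OLD clauses with the NEW clause and list every change, "
        "including changes in meaning caused by rephrasing.\n"
        "If nothing changed, answer exactly: NO CHANGE.\n\n"
        f"OLD (V1):\n{old_text}\n\n"
        f"NEW (V2):\n{v2_chunk}\n\n"
        "Changes:"
    )


prompt = build_change_detection_prompt(
    ["Payment is due within 30 days of invoice."],
    "Payment is due within 45 days of receipt of invoice.",
)
print(prompt)
```

The resulting string can be sent to whichever LLM is in use. Asking for a fixed answer when nothing changed makes unchanged chunks easy to filter out before the highlighting step.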
For those who are interested, we are also attaching the architecture:
As can be seen, all the libraries and models we have employed are open source, which empowers the end users. The IDC environment also helped us develop this solution efficiently and in good time. During development, we used the Intel oneAPI extension to optimize both the embeddings model and the LLMs, which gave us a 2x speed-up in embedding processing. We were also able to load LLMs in IDC that we could not load on our personal systems, which made development considerably easier.
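For readers who want to try a similar optimization, the sketch below shows one common way to apply the Intel Extension for PyTorch to a model before inference. It is an assumption about usage rather than the exact code of our solution: the helper name `optimize_for_intel` is hypothetical, and the fallback branch is there only so the same script still runs on machines where the extension is not installed.

```python
def optimize_for_intel(model):
    """Apply Intel Extension for PyTorch optimizations when available.

    `model` is expected to be a torch.nn.Module. If torch or
    intel_extension_for_pytorch cannot be imported, the model is
    returned unchanged.
    """
    try:
        import torch  # noqa: F401  (the extension requires PyTorch)
        import intel_extension_for_pytorch as ipex
    except ImportError:
        return model  # no extension available: run unoptimized

    model.eval()  # switch to inference mode before optimizing
    # ipex.optimize also accepts a dtype argument (for example
    # torch.bfloat16) on hardware that supports lower precision.
    return ipex.optimize(model)
```

A call such as `model = optimize_for_intel(model)` would be placed after loading the model and before running inference. Any speed-up depends on the model and hardware, so it is worth timing the optimized and unoptimized versions on your own workload.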
In short, the fusion of Generative AI and the robust capabilities of the Intel Developer Cloud and oneAPI toolkits has helped us develop a novel solution for legal contract analysis. We hope this innovative solution empowers legal practitioners and organizations, especially smaller ones like us, by streamlining the arduous task of comparing contract versions.
By leveraging open-source LLMs alongside Intel’s optimized hardware and software, our solution facilitates efficient document comparison, detecting nuanced changes. This unique approach not only enhances accuracy but also significantly reduces human error and the overall time invested in the process. We hope this advanced solution will empower smaller organizations by providing them access to sophisticated contract analysis tools previously available only to larger entities with extensive resources.