Generative AI, LLMs, Chat GPT and GPT-4 – What’s the difference? Find out with Ascertus

Ascertus | Resource | 28 June 2023

There is a lot of talk within the legal and other business sectors around how generative AI can be harnessed to reduce delivery timelines, increase accuracy and maximise efficiency. We are at the start of a journey of exponential growth in this emerging technology field and will see more rapid developments and change than ever before in a shorter period of time. But what do you need to know and consider for your strategy over the next 12 months?

What is Generative AI?

Generative AI is a type of machine learning that is able to produce text, video, images and other types of content based on user-given prompts or dialogue which are then delivered to the user through their chosen application. Currently, the majority of end users’ day-to-day interaction with generative AI is through OpenAI’s ChatGPT application.

OpenAI is a not-for-profit American artificial intelligence research laboratory with a for-profit subsidiary, OpenAI Limited Partnership. OpenAI has developed models and products using generative AI including GPT (various versions), ChatGPT and DALL-E. There are other players in this market such as Google Bard, Microsoft Research AI, DeepMind, IBM Watson, Anthropic’s Claude and Amazon AI. For this article we have chosen to focus on OpenAI. Their models are currently creating the most conversation as they are predominantly open to ‘end users’ to utilise and with their open source ChatGPT plug ins and GPT APIs are beginning to be included within and connect with software and solutions utilised by legal, commercial, regulatory, risk and HR teams.

What’s the difference between ChatGPT and GPT-4?

ChatGPT and GPT-4 (or previous versions i.e., GPT-3) are not the same thing!

GPT-4 (released in March 2023) is a large language model (LLM) that gives AI the ability to generate text, video, images and other types of content by learning patterns and structures from existing data. GTP-4 also has the ability to automate human tasks, mimicking human like creativity and the decision-making process, these can include automating workflows or drafting documents and contracts. Users can connect with GPT-4 through an API and surface that information into ChatGPT (GPT-4 API release date yet to be confirmed, only GPT-3 openly available), or an application of their choice. GPT-4 is a significant advancement on GPT-3 as it has advanced reasoning abilities, it achieves this as the deep learning approach leverages more data and processing to create more sophisticated language models. OpenAI publicises that GPT-4 not only passed the US Bar Exam but scored in the top 10% of test takers (GPT-3 scored in the bottom 10%). Previous versions predominantly had the ability to only generate text – so the version used is critical to the experience and benefits delivered. GPT-3 is the industry standard for language models currently (GPT-4 will likely be the standard soon) and was trained using a huge amount of text from various sources such as books, articles, websites, and social media posts, and ChatGPT is the industry standard for AI chatbots.

By contrast ChatGPT is a web application designed specifically for chatbot applications and optimised for dialogue and text. It relies on GPT to produce a text response for the user based on the question or prompt asked/inputted. It can be tailored to build different functions such as summarising text, copywriting and language translation. It has an open API that lets anyone tap into GPT-3 or GPT-4 (with an advanced subscription only) to build their own AI applications with its functions. Even though GPT is a language model and ChatGPT is a chatbot, they each have their own open API, which lets other apps connect to them. But a point to note is that ChatGPT’s training data, based on GPT-3, only goes up to mid-2021 because the process of training a LLM model like ChatGPT is extremely computationally intensive and time-consuming and that was the data set cut off.

So, what does this mean for you?

LLMs are here to stay and progressing at a fast pace. Most industry software and solution providers have already or are at advanced stages of considering their integration into products as standard. So, whether you elect to develop bespoke applications or simply upgrade existing solutions (or purchase new systems) you will likely be using a LLM in some form. How can you harness the benefits whilst managing the challenges and risks?

Benefits of using AI

Many of us are already utilising real life applications of generative AI in some form with specific use cases, such as AI legal assistants, contract lifecycle management or automating legal tasks. However much of this is based on closed networks or the use of consolidated data from a few vendors. LLMs are changing the volume of data readily available to us at the touch of a button and reducing the time and administration spent searching for and collating this data. The benefits we can recognise and look to utilise now include:

From an individual’s perspective:

Generative AI is an on-demand resource to learn the basics about a topic or concept
LLM can assist in reframing or repurposing information such as ‘reword this text to remove legal language’ or ‘extract and summarise the key obligations in X document’
Turning meeting minutes, notes or briefings into other formats, such as a presentation, can also be achieved, as can reducing key information into a concise format
Generative AI can be used to facilitate the creation of sophisticated experiences, tasks and automation with almost no coding knowledge required
The time spent on research is reduced, such as collating information on laws, regulations and precedents. Generative AI is increasing its level of accuracy to provide information that can then be reviewed by legal professionals

From a legal and an enterprise level perspective:

OpenAI’s ChatGPT plugin and GPT-3 APIs (and GPT-4) can enable software vendors to build their own product specific integrations to super-charge existing web-enabled applications for specific use cases, saving time and creating efficiencies
Current integrations into existing solutions can reduce the time spent drafting and reviewing documents, analysing contracts or other legal documents and facilitating contract reviews to identify important clauses, flag potential issues, errors, inconsistencies and suggest revisions shortening the negotiation process duration and reducing the risk of future disputes
Integrated GPT can support the analysis of legal data, providing predictive insights that allow in-house legal teams to make better-informed decisions
LLM enabled systems can monitor changing regulations and ensure that businesses remain compliant with relevant laws and industry standards, alerting them to any changes that may impact their operations and reducing risk – this has huge potential impacts for contract lifecycle management and risk planning
Generative AI can help in the process of patent analysis, trademark searches and infringement detection, making it simpler and faster to manage intellectual property portfolios.

With the speed of the development and growth of this area we will soon be using integrated LLM systems within our core business systems, as part of our everyday work life in ways we cannot currently predict or forecast. With this in mind there are considerations and questions we should be asking regarding organisational policies, security, compliance, privacy and IP retention to ensure that our organisations aren’t exposed to additional risks.

Considerations, caution and ethics

ChatGPT, GPT-3 or GPT-4? If you are working with vendors who state they have integrated ‘GPT’ into their solution, be aware of which version they have integrated it into, and if it is a plug-in or an API. At the time of writing this white paper, the GPT-4 API was not publicly available. Not only will the experience be different but there will be different security considerations and you may need to be aware of and control the information your users are inputting and sharing.

GPT-3 was trained on ‘dialogue and human demonstration datasets’ with a supervised finetuning layer of reinforcement learning from human feedback (RLHF) to evaluate the accuracy of the data (the more samples the more accurate the output). Hence the learning data is from up to 2021.

GPT-4 has been trained on additional public and third-party data (knowledge up to August 2022 only) and OpenAI has not released the model or contents of the data set they have used for finetuning and evaluating this data and have specifically excluded stating RLHF was utilised. As a result, we cannot confirm if GPT-4 can understand every legal nuance or contains known biases. So, as time goes on, the information may not be the most up to date, as we don’t know the data set update frequency, therefore caution is advised.

“I’ve tried GPT-4 and it is an impressive leap over GPT-3 in terms of capability but questions about data security still remain. In OpenAI’s 98-page document introducing GPT-4, they proudly declare that they won’t disclose any details about the contents of their training set.” Cameron Coles, Cyberhaven

Accuracy

When using generative AI in a corporate environment, the truthfulness and accuracy of the data returned is paramount. Generative AI uses machine learning to ‘infer’ information, not necessarily return only checked factual information, therefore creating potential inaccuracy issues. Current GPT models also give no indication of the weighting or rating given to different data sources. Pre-trained LLMs are not yet dynamic in terms of keeping up with new information and may, unchecked, proliferate inaccurate details and information, that become stated as fact. OpenAI states on its own website that “GPT-4 still has many known limitations that we are working to address, such as social biases*, hallucinations**, and adversarial prompts***. We encourage and facilitate transparency, user education, and wider AI literacy as society adopts these models. We also aim to expand the avenues of input people have in shaping our models.”

*AI biases – Algorithms are not neutral. AI bias refers to the inclination of algorithms to reflect human biases, make erroneous assumptions (machine learning process) and then in turn deliver systematically biased results as a consequence, further reinforcing and perpetuating biases.

**Hallucinations – AI can ‘hallucinate’ the truth. AI hallucination occurs when an AI model generates outputs different from what is expected, returning made up statements or incorrect facts which it presents with assertion and absolute authority as the truth.

***Adversarial prompts – AI models can encounter both incidental adversity prompts (i.e., data becomes corrupted) and intentional adversity prompts or prompt injections (i.e,. active sabotage / security exploitation). Adversarial prompts can mislead an AI model into delivering incorrect predictions, results or statements. Adversarial robustness refers to a model’s ability to resist being fooled.

As a result of this, there is the potential for knowledge managers and team members to spend additional time to become ‘truth analysts’. In addition, you also have the potential risk of ‘a little knowledge is a dangerous thing’ if your business users believe that they can simply search for the answer to a legal query and take the answer returned as a given and not engage the legal team before they respond.

Data Misuse

As shown above, AI can produce misleading, harmful or misappropriate content in any context. A recent well publicised case, featured in various US publications in late May 2023, highlights the need for users and lawyers to verify the insights generated by AI powered tools:

A New York lawyer at Levidow, Levidow and Oberman, cited fake cases generated by OpenAI’s ChatGPT in a legal brief filed in federal court, and as a result may face professional and legal sanctions. The incident involved a personal injury lawsuit filed against airline Avianca. The lawyer stated in an affidavit that he ‘consulted ChatGPT to supplement legal research he performed’ when preparing a response to a motion to dismiss. The Judge found that “six of the submitted cases appear to be bogus judicial decisions with bogus quotes and bogus internal citations.” What is most concerning is that the lawyer stated that, when queried with a prompt, ChatGPT provided the legal sources, assured him of the reliability of the opinions and citations and also responded that the cases “can be found in reputable legal databases such as LexisNexis and Westlaw”. This leads to the question ‘how much search and fact checking will be required’ when using these tools to ensure that the returned data is the truth and who in your organisation will be qualified to undertake this task?

Hallucinations, fake, false and incorrect information can have significant legal and risk implications leading to financial and reputational damage both at the individual and organisational level.

Confidentiality

Before any generative AI technology can be used in an ‘enterprise-grade’ solution or application, confidentiality and the risk of exposing sensitive data must be considered, especially for solutions that claim to offer enterprise level security capabilities. For example, if anyone in the contract management space (legal or otherwise) enters terms into a generative AI platform they could be breaking confidentiality terms. A further consideration is that any information returned by an LLM may have not adhered to the terms of usage of websites, other data sets or privacy policies it has gathered the data from making it impossible to ensure the data you may be repurposing is IP or copyright free, carrying a risk of being the subject to litigation (ChatGPT indemnifies itself from this under its Terms of Use).

A legal AI GPT solution, currently still in BETA but being trialled by a number of law firms, utilised general legal internet data from the GPT model and was initially trained using general legal data, it now is trained by the all the beta firms own work outputs and templates. Data privacy and compliance is being managed by anonymising user information and the removal of data after a specific time period. GPT still follows the same principle of other technologies – the quality of the output depends on what you put into it on a continuous use basis.

“Sensitive data makes up 11% of what employees paste into ChatGPT. This most common types of confidential data leaking to ChatGPT are sensitive/internal only data, source code and client data. Between the weeks of February 26 and April 9 the number of incidents per 100,000 employees where confidential data went to ChatGPT increased by 60.4%” Cyberhaven Research Report Q2 2023.

Privacy and security issues may become less of an issue in the future for generative AI as OpenAI, Amazon, Microsoft and other organisations have announced they are working on enterprise-level, private cloud LLM GPT options, but there are no announced release dates as yet.

Copyright & IP

Copyright ambiguities because of generative AI create a new avenue of ethical concern and legal debate (and potential revenue generation) whilst also creating new challenges for legal and corporate teams regarding their own data and knowledge IP retention and their use of generative AI in producing advice, contracts and other policy or legal documents. There is ambiguity, untested in the courts, regarding who owns the copyright and intellectual property rights to content or creative works, and how they can be used, when they are AI generated or part human expert and part AI authored. There are currently conversations among GPT experts about how the humble ‘prompt’ itself can be important IP and may be a key consideration for new future startups and their product offerings.

Contracting

Contract lifecycle management (CLM) is likely to be impacted and benefit the most in the short term from generative AI. Contract lawyers and contract management teams will maintain the need to have signed contracts in one place, ensure high levels of contract data accuracy and be able to search and report on active and archived contract content.

The change will come, and the required skill sets will evolve with GPT, regarding the development of contract management policies and playbooks in CLM that reflect clients’ and firms’ risk appetite and objectives. If your team is not upskilled in the correct prompts, inputs and analysis of information it will become more challenging to generate accurate content for the point of delivery. The old adage of ‘rubbish in, rubbish out’ will still apply and the benefits gained using GPT won’t be useful even for simple tasks.

Security

Even as AI and tech evangelists celebrate GPT-4’s advancement of deep learning, many cybersecurity and privacy concerns remain. ‘Threat actors’ are already exploiting deep learning neural network LLMs to spread malware and scams via online platforms. There is also the potential for internally generated malicious content (by employees) to be easily shared externally which has the potential to become regarded as ‘fact’ by others external to your organisation. Not all users will have good intentions and there is no simple process to request the removal of this content.

Counterfeiting ‘people’ is also a key consideration, which will impact the future security access to many systems among other considerations. Voice cloning from available recordings can be used and repurposed by LLMs. Have you recently set up telephone security for your bank account using samples of your voice? This is already happening in deep fake videos with audio , but can be done on a wider scale. Do you participate in webinars or produce video for your clients or employees? What if a previous recording was doctored with highly damaging content and released?

In response, individuals and organisations will need to become more vigilant and review published code, content and communications more closely to try and identify AI-assisted attacks and threats. Organisations must also take proactive measures to prevent misuse by implementing the appropriate safeguards, detection methods and ethical guidelines, which will need to continually evolve.

Signing up to Terms & Conditions

Being aware of who, what and which jurisdiction you are agreeing to be bound by when you register or sign up to use an LLM is of paramount importance. “By using the ChatGPT tool you are confirming you are authorised to enter into the contract and are providing indemnities which will be governed by the Laws of the State of California. You may or may not be comfortable with this but obviously many organisations have people blindly signing up binding their organisations. The implications could be serious which has caused some firms to ban the tool.” Derek Southall, Hyperscale Group

The next 12 months for AI

The opportunities to proactively and successfully harness generative AI and LLM are vast and varied with many different benefits. While OpenAI’s GPT-3 and GPT-4 are the most popular LLMs today, there will be a lot more competition very soon. For example Bard, Google’s AI chatbot which is powered by its own Language Model for Dialogue Applications (LaMDA) operates slightly differently from OpenAI’s LLM and Microsoft CoPilot. But who will win the overall ‘space race’ and what impact will this have on the solutions and software that you may be looking to implement? Should we take a breath and see what plays out over the next 6-12 months especially in light of the likely standards, regulations and legislation (The AI Act (EU), AI Regulation White Paper (UK)) etc. that are in development?

How technology will evolve within the legal industry remains to be seen, but it will be a significant game changer, and the emergence of private cloud LLM technology (rather than public cloud) will see a momentous shift in how we all workday to day and commoditise our knowledge and outputs. Your organisational level strategy, policies, governance and risk appetite will become an important roadmap for if, how, where and when you introduce and/or upgrade your current technology stack to integrate which ever generative AI solution and vendor solutions you choose.

Currently there are a lot of unknowns and more questions than answers. The key considerations will be how you balance knowledge sharing and access with your security structures and policies, how you govern and monitor your content research, inflows and outflows to reduce privacy breeches, and how you detect new AI based threats and prevent IP loss. The secure storage and retention of client and organisation wide proprietary and confidential information is paramount – even if you are not yet using an enterprise or team LLM do you know what your employees are using and sharing data with?

Many clients are delaying making decisions and holding back on implementing new solutions likely to be significantly impacted by emerging generative AI, such as contract lifecycle management, until they see what emerges. Are you willing to be an early adopter of a technology that is still in its infancy? Do you have the appetite from your project sponsors, buy in from your IT security team and users with the knowledge to program and adopt it? What are your use cases that will deliver value, innovation and growth for you and your clients?

If you want to benefit from early efficiencies, pick the most applicable solution for your needs now, from a vendor that also has a documented roadmap and proven history of delivering timely tech advances (successfully!). Realise the benefits and ROI now and plan to upgrade when your core and chosen solution(s) releases their proven secure generative AI based offering.

Ascertus

Contact:

Sonya Gosling

07874 863 161

Leading iManage partner & document lifecycle expert providing solutions, analysis, data migration, consultancy & training to law firms.