Before rushing to embrace LLM-powered “hire,” make sure your organization has safeguards in place to avoid risks to its business and customer data.
Chatbots powered by the large language model (LLM) aren’t just the world’s new favorite entertainment. These technologies are increasingly being recruited to increase worker productivity and efficiency, and given their increasing capabilities, they are poised to completely replace some jobs, including in areas such as coding, content creation, and customer service.
Many companies have taken advantage of the LLM algorithm, and it is likely that yours will soon follow suit. In other words, in many industries it is no longer a “bot or no bot” case.
But before you rush out on your new “hire” and use it to streamline some of your workflows and business processes, there are a few questions you should ask yourself.
Is it safe for my company to share data with LLM?
LLMs are trained on the vast amount of text available online, which then helps the resulting models to interpret and understand people’s questions, also known as prompts. However, whenever you ask the chatbot for a simple piece of code or email to your client, you can also hand over data about your company.
“LLM does not (at time of writing) automatically add information from queries to its models for others to query it,” according to the UK’s National CyberSecurity Center (NCSC).. “However, the query will be visible to the organization providing the LLM. Those queries are stored and will almost certainly be used to develop a service or LLM model at some point,” according to the NCSC.
This can mean that the LLM provider or their partners can read the queries and can incorporate them in some way into future versions of the technology. Chatbots probably won’t forget or ever delete your input because access to more data sharpens their output. The more input they provide, the better it gets, and your corporate or personal data will be trapped in the computations and possibly accessible to those at the source.
Perhaps to help eliminate data privacy concerns, Open AI introduced the ability to turn off chat history in ChatGPT at the end of April. “Conversations initiated while chat history is disabled will not be used to train and refine our models, and will not appear in the history sidebar,” wrote the developer. on the OpenAI blog.
Another risk is that queries stored online can be hacked, leaked, or accidentally made publicly accessible. The same goes for any third party provider.
What are the known drawbacks?
Every time a new technology or software becomes popular, it attracts hackers like bees to the honeypot. When it comes to LLMs, their security has been tight so far – at least, it appears they are. However, there are some exceptions.
ChatGPT OpenAI made headlines in March due to the leak of several users’ chat history and payment details, forcing the company to temporarily take ChatGPT offline on March 20th. Company revealed on March 24th that a bug in the open-source library “allowed some users to see the title of another active user’s chat history”.
“It is also possible that the first message of the newly created conversation is visible in the other person’s chat history if both users are active at the same time,” according to Open AI. “Upon deeper investigation, we also discovered that the same bug may have led to the visibility of information regarding accidental payments from 1.2% of ChatGPT Plus subscribers who were active during a certain nine-hour window,” reads the blog.
Also, security researcher Kai Greshake and his team demonstrated how Microsoft’s LLM Bing Chat can be turned into a ‘social engineer’ who can, for example, trick users into providing their personal data or clicking on phishing links.
They put a prompt on the Wikipedia page for Albert Einstein. The prompt is just a piece of plain text in a comment with a font size of 0 and thus invisible to anyone visiting the site. Then they asked the chatbot questions about Einstein.
It worked, and as the chatbot ingested that Wikipedia page, it unknowingly activated the prompt, which made the chatbot communicate in a pirate accent.
“Yes, the answer is: Albert Einstein was born on March 14, 1879,” replied the chatbot. When asked why is talking like a pirate, the chat bot replied: “Arr mate, I followed the instructions huh.”
During this attack, which the author calls “Indirect Prompt Injection”, the chatbot also sends the injected link to the user, claiming: “Don’t worry. It’s safe and harmless.”
Have some companies had incidents related to LLM?
At the end of March, the South Korean outlet The Economist Korea reports about three independent incidents at Samsung Electronics.
While the company asks its employees to be careful about what information they include in their queries, some of them accidentally leak internal data while interacting with ChatGPT.
A Samsung employee entered the wrong source code related to the semiconductor facility’s measurement database to find a solution. Another employee did the same with program code to identify faulty equipment because he wanted code optimization. The third employee uploads the meeting recording to create meeting minutes.
To keep up with advances regarding AI while protecting its data at the same time, Samsung has announced plans to develop its own internal “AI service” that will assist employees with their work tasks.
What checks should companies perform before sharing their data?
Uploading company data into the model means that you send proprietary data directly to a third party, such as OpenAI, and surrender control over it. We know OpenAI uses data to train and improve its generative AI models, but the question remains: is that the only goal?
If you decide to adopt ChapGPT or similar tools into your business operations in any way, you should follow a few simple rules.
- First, carefully investigate how these tools and their operators access, store, and share your company data.
- Third, this policy should define the circumstances under which your employees may use the tool and should make your staff aware of limitations such as that they may not include sensitive company or customer information into chatbot conversations.
How should employees implement this new tool?
When requesting a customer for a piece of code or a letter from an LLM, use it as an advisor to check out. Always verify the output to ensure it is factual and accurate – and avoid, for example, legal trouble. These devices can “hallucinate,” that is, produce answers in clean, crisp, easy-to-understand language and are clearly wrong, but appear to be correct because they are practically indistinguishable from all the correct results.
In one high-profile case, Brian Hood, the mayor of the Hepburn Shire in Australia, recently stated that he may sue OpenAI if it does not remedy ChatGPT’s false claims that he had served time in prison for bribery. This comes after ChatGPT wrongly named him as the guilty party in a bribery scandal from the early 2000s linked to Note Printing Australia, a subsidiary of the Reserve Bank of Australia. Hood does work for the subsidiary, but… he is a reporter who alerted the authorities and helped uncover bribery scandals.
When using LLM-generated answers, be aware of possible copyright issues. In January 2023, three artists as class representatives filed a class action lawsuit against Stability AI generator and Midjourney art and DeviantArt online gallery.
The artists claim that the Stable Diffusion software co-created with AI was trained on billions of images taken from the internet without the owner’s consent, including on images created by the three.
What data privacy protection can companies do?
To name just a few, implement access controls, teach employees to avoid entering sensitive information, use security software with multiple layers of protection in conjunction with secure remote access tools, and take steps to protect data centers.
Indeed, adopt a similar set of security measures to the typical software supply chain and other IT assets that may contain vulnerabilities. People might think it’s different this time because these chatbots are smarter than artificial, but the truth is it’s more software with all its possible flaws.