In today’s column, I will explain three new best practices for coping with prompt wording sensitivities when using generative AI and large language models (LLMs).
The deal is this. It is widely known that you must word your prompts cautiously to ensure that the AI gets the drift of what you are asking it to do. Sometimes, just a meager word or two can radically change how the AI interprets your question or instruction. Generative AI can be hypersensitive to what you say in your prompts. It is often a touch-and-go proposition.
This can be exasperating.
Very exasperating.
Plus, there is a potential cost involved. Namely, if you are paying to use generative AI, you pay for an off-target response just the same as an on-target one, regardless of whether the AI grasped your intention. As they say, all sales are final. The same goes for snubbed or misunderstood prompts.
Savvy prompt engineers know that they must mindfully word their prompts. Casual users typically catch onto this consideration after a considerable amount of muddling around, along with lots of trial and error. No worries, I’ve got your back and will provide straight-ahead advice on what you should be doing with your prompts.
My inspiration is due to a recently posted research study that sought to empirically unpack the prompt sensitivities associated with several generative AI and LLM apps. The good news is that a set of best practices can be squeezed or distilled from the results. I will show you my distillation of three prompting rules or techniques that you can add to your prompting skillset and toolkit.
Let’s talk about it.
This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here). For those of you specifically interested in prompting and prompt engineering, you might want to take a look at my comprehensive analysis and description of over fifty notable prompting techniques at the link here.
My Proffered Best Practices On This Topic
I will jump right into my three key recommendations. After doing so, I will cover some examples so that you can tangibly see what goes on and what to do. Finally, I’ll do a rundown of the research study that inspired these bits of sage advice.
One quick point. Unlike a lot of the prompt engineering techniques that rely upon a certain phrasing or catchphrase, such as invoking chain-of-thought or CoT via telling the AI to work on a stepwise basis (see for example the link here), these tips or rules are broader and not a specific set of commands or wordings.
In a sense, these are about how to fish and the ways to use a rod and reel. I’m sure you’ll be able to mentally convert these into handy habits for suitably wording your prompts. That also applies to avoiding the gotchas of less amenable prompts.
Here are my three recommendations:
- Best Practice #1: Know that larger AI is better at handling prompts.
- Best Practice #2: Know that topic-based AI data training makes or breaks prompting.
- Best Practice #3: Use at least one example in your prompt to help boost results.
I shall tackle each one, one at a time.
#1: Best Practice That Larger AI Gets The Drift
As noted above, the first rule is this:
- Best Practice #1: Know that larger AI is better at handling prompts.
A state-of-the-art prompt engineer realizes that larger AI models tend to be better at prompt interpretation.
Larger LLMs often do a better job of interpreting user prompts, which makes life easier for users since the generative AI can more readily get the drift of what the user wants or is asking. If you end up using a lesser-sized AI, you’ll need to be especially mindful of how carefully you word your prompts. When switching among different AI models, find out whether the AI is in the larger category or the less large category, so you’ll know how careful you need to be with your prompts.
You can rest somewhat easier with larger AI as to wording your prompts. Even the barest prompt will potentially hit home. With the lesser-sized AI, you need to really watch your wording and be especially precise. In that sense, it is easier to shift to a larger AI since you are already used to wording things precisely and can either continue doing so or opt to be looser. Those who drop down to a lesser-sized AI will need to wake up and start being precise again, assuming they got accustomed to the looseness of the larger-sized AI.
Why are larger generative AI apps better at interpreting prompts?
The answer is a mixture of having a larger base of data training and a more extensive pattern-matching structure. They’ve seen it all, in a manner of speaking, and can parse your prompts with greater acuity. I’m not saying that a larger AI is always better at this. There are certainly going to be times when this doesn’t pan out. By and large, it’s a reasonable rule of thumb and worth knowing.
I went ahead and came up with a prompt that shows the difference between interpretation by a larger AI and a lesser-sized AI.
Here we go.
- My entered prompt: “What is a creative yet inexpensive way to celebrate a milestone with my team?”
- Less large generative AI response: “You could have a celebration or a party with your team. Maybe get some snacks and drinks or consider going out to dinner.”
- Large generative AI response: “For a fun, budget-friendly team celebration, consider a ‘mini awards ceremony’ where everyone can nominate each other for lighthearted categories (like ‘Most Likely to Save the Day’). You could also organize a creative potluck where team members bring their favorite snacks or foods from different cultures. These options foster team bonding and make everyone feel included without breaking the bank.”
Take a close look at the two responses.
The response by the lesser-sized AI is generally apt, but not nearly as on-target as the response by the larger AI. The larger AI got my drift that I wanted creative ideas. The lesser-sized AI seemed to miss that component of my prompt.
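If you want to test this yourself, a simple approach is to feed the identical prompt to a larger model and a smaller model and compare the replies side by side. Below is a minimal Python sketch of that comparison; the query_llm helper and the model names are placeholders for whichever AI service and model tiers you actually have access to, not a real API.

```python
# Minimal sketch: send the same prompt to a smaller and a larger model tier
# and eyeball how well each one gets the drift of what you are asking.

def query_llm(prompt: str, model: str) -> str:
    """Placeholder: replace the body with a call to your own LLM client or SDK."""
    return f"[response from {model} would appear here]"

PROMPT = "What is a creative yet inexpensive way to celebrate a milestone with my team?"

# Hypothetical model names standing in for a lesser-sized and a larger-sized AI.
for model in ["example-small-model", "example-large-model"]:
    print(f"--- {model} ---")
    print(query_llm(PROMPT, model=model))
```

The same side-by-side habit works just as well manually, by pasting one prompt into two different AI apps and comparing what comes back.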
#2: Best Practice That Data Training Is King
You might recall that the second of my best practices said this:
- Best Practice #2: Know that topic-based AI data training makes or breaks prompting.
A state-of-the-art prompt engineer realizes that if an LLM has deeper knowledge of the topic that a prompt pertains to, the AI will almost surely do a better job of interpreting the meaning of the prompt and will tend to provide a better response accordingly (all else being equal).
Allow me to elaborate.
Suppose you aim to ask a question about an obscure type of snail that normally isn’t discussed in everyday literature on such creatures. Most of the generic generative AI apps are data-trained across the board and not versed in narrow niches. The odds of generic generative AI having covered that rare snail are probably low.
There are newly emerging generative AI apps that are gradually being augmented with specialized niche data, or that are built from day one with such data. Thus, a generative AI app that is devised for, say, biologists is likely to have content about the uncommon snail.
The gist is that if the AI you are using is relatively versed in the question or matter you are asking about, the capability of interpreting your prompt tends to go up. The AI has a basis for getting the drift. In contrast, an AI that doesn’t have data training on the given topic will likely flail around to figure out what in the world you are asking about.
Please note that this does not necessarily have anything to do with the size of the LLM. It is more about what the LLM has been data trained on. When using generative AI, find out what data training the AI has had on the topics you want to ask about. If the generative AI isn’t versed in those topics, you’ll need to go all out in your prompts to specify exactly what you want.
Here is my example of this phenomenon.
- My entered prompt: “What are some strategies in soil carbon sequestration for combating climate change?”
- Generative AI not versed in the topic: “Soil carbon sequestration sounds like a way to capture carbon in the soil to reduce climate change effects. To sequester carbon, it’s generally beneficial to plant more trees, as they absorb carbon dioxide. Reducing pollution and using fewer fossil fuels are also good strategies for addressing climate change.”
- Generative AI versed in the topic: “To enhance soil carbon sequestration, consider implementing practices like cover cropping, no-till farming, and agroforestry. Cover crops, such as legumes, help increase organic carbon in the soil, while no-till farming reduces soil disturbance, which helps retain carbon. Additionally, agroforestry combines crops with trees to increase carbon capture both in plants and soil, offering long-term benefits for soil health and climate mitigation.”
The generative AI app that wasn’t versed in the topic at hand was reaching for an answer and wasn’t fully on target with the posed question. You might observe that the AI versed in the topic had a more pertinent vocabulary, such as noting the importance of cover cropping and no-till farming. The response by the less-versed AI is not incorrect; the AI simply couldn’t grasp the totality of the question being asked.
The response suffers accordingly.
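One way to put this best practice into action when you are stuck with an AI that is thin on your topic is to pack the needed background and vocabulary directly into the prompt. Here is a minimal Python sketch of that idea; the build_domain_prompt helper, the background sentence, and the prompt layout are merely illustrative assumptions, not a prescribed format.

```python
# Minimal sketch: compensate for an LLM that lacks topic-specific data training
# by carrying the background and key vocabulary inside the prompt itself.

def build_domain_prompt(question: str, background: str, terms: list[str]) -> str:
    """Assemble a prompt that supplies its own topical context (illustrative layout)."""
    glossary = "\n".join(f"- {term}" for term in terms)
    return (
        f"Background you should rely on:\n{background}\n\n"
        f"Key terms to use where relevant:\n{glossary}\n\n"
        f"Question: {question}"
    )

prompt = build_domain_prompt(
    question="What are some strategies in soil carbon sequestration for combating climate change?",
    background="Soil carbon sequestration means storing atmospheric carbon in soil organic matter.",
    terms=["cover cropping", "no-till farming", "agroforestry"],
)
print(prompt)  # Send this enriched prompt to whichever generative AI you use.
```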
#3: Use An Example If You Can Come Up With One
The third of my stated best practices said this:
- Best Practice #3: Use at least one example in your prompt to help boost results.
A state-of-the-art prompt engineer realizes that if you give an LLM even just one example in a prompt, suggesting what the user intends, this makes a significant difference regarding how well the AI responds to the prompt.
Whoa, some snarky prompting pros are undoubtedly exclaiming, everyone knows or ought to know that by including examples in a prompt you are likely going to get a better interpretation and better results (see my coverage of using examples in prompts, at the link here). Sure, I know you know that. First, not everyone does. Secondly, some overly zealous pros proclaim that the user must enter several examples.
According to the latest research, one example will often do. Yes, just one example, known as a one-shot or single-shot approach. I want to clarify that I am not saying you should not provide multiple examples. Nope, I didn’t say that. I am merely emphasizing that even just one example can make a whale of a difference. I guess it’s like buying a lottery ticket — if you don’t get one then you aren’t in the game. Once you’ve got one, you can get more, which might increase your odds, but sometimes the added odds aren’t worth the extra effort.
If you have more than one example handily available, sure, try undertaking a few-shot or multi-shot approach of listing them all. But do so mindfully.
Things can readily veer into outer space with multi-shot examples. For example, two major disconcerting problems often occur. I see this lamentably happen all the time while instructing classes on prompt engineering and guiding attendees to unlearn adverse practices that they’ve formed as bad habits.
The first qualm is that you accidentally overload your prompt with too many examples, and it gets confusingly bloated. The AI can’t make heads or tails of what you are saying. Secondly, and worse still, you might give examples that aren’t relevant. This can throw the AI for a loop and cause it to land in some oddball realm that no longer matches your topic at hand.
Whew, that was a lot.
Time to showcase an example. Let’s explore a prompt that doesn’t have an included example and then take another attempt but with an included example.
Here we go.
- My entered prompt that offers no example: “What are some ways to thank clients for their loyalty?”
- Generative AI response: “To thank your clients, you could send them a gift basket, a personalized thank-you card, or a discount on their next purchase.”
Next, take a look at what happened when I tried again and included an example in the prompt.
- My entered prompt with an example included: “What are some ways to thank clients for their loyalty? For example, maybe something like a personalized video message from the team.”
- Generative AI response: “You could offer clients a custom ‘insider experience,’ like inviting them to a private webinar where your CEO or top team members share upcoming projects or trends. Another idea is to create a ‘client spotlight’ segment on your social media or website, featuring the stories or testimonials of your long-time clients (with their permission). Alternatively, consider a subscription box tailored to each client’s tastes and interests, delivered seasonally, as a thoughtful way of saying thank you.”
I assume you can plainly see that the prompt that included an example spurred a more extensive answer. The prompt that lacked an example was probably too sparse for the AI. The AI was unable to latch onto what I wanted. My prompt should have been more descriptive. One way to be descriptive is to include an example. The example readily clued in the AI about what I was seeking.
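For readers who assemble prompts programmatically, the one-shot tactic boils down to appending a single short example to the question before sending it off. The Python sketch below shows the mechanics; the wording and the one_shot helper are illustrative assumptions rather than a required template.

```python
# Minimal sketch: turn a bare (zero-shot) prompt into a one-shot prompt by
# attaching a single example of the kind of answer you are hoping for.

def one_shot(question: str, example: str) -> str:
    """Append one example to the question (one-shot / single-shot prompting)."""
    return f"{question} For example, maybe something like {example}."

zero_shot_prompt = "What are some ways to thank clients for their loyalty?"
one_shot_prompt = one_shot(
    "What are some ways to thank clients for their loyalty?",
    "a personalized video message from the team",
)

print(zero_shot_prompt)
print(one_shot_prompt)  # Send both to your AI of choice and compare the responses.
```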
Leading Research On Prompt Sensitivity
A recently posted study caught my eye in terms of some of the latest work on prompt sensitivities, in a paper entitled “ProSA: Assessing And Understanding The Prompt Sensitivity Of LLMs” by Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen, arXiv, October 16, 2024, which made these salient points (excerpts):
- “Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their performance is highly sensitive to the prompts utilized.”
- “Even minor alterations to prompts can lead to substantial declines in model performance.”
- “Our extensive study, spanning multiple tasks, uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.”
- “We observe that few-shot examples can alleviate this sensitivity issue, and subjective evaluations are also susceptible to prompt sensitivities, particularly in complex, reasoning-oriented tasks.”
- “Furthermore, our findings indicate that higher model confidence correlates with increased prompt robustness.”
You are encouraged to read their full study to garner more nuances about dealing with prompt sensitivities.
As I mentioned at the outset, my three best practices come from my own experience and aren’t necessarily codified in the same way in the research study. I took a pinch of salt and pepper, added some additional spices, and came up with a distinctive meal that I hope will be to your satisfaction.
Where To Go Next With This
My follow-up assignment for you is to take a few moments, if you can, and try out these best practices in whichever generative AI you are currently using. Here’s why. Perhaps the most important rule of thumb about prompt engineering consists of three crucial words: practice, practice, practice.
So, please try practicing the matters covered here.
One last remark before we end today’s discussion.
As much as possible, I prefer to adopt prompting techniques that have been empirically explored. We can all come up with seat-of-the-pants ideas about good prompts versus bad prompts. Leveraging research is a valuable way to feel at least modestly secure that your ad hoc sense of prompting seems to be reasonably well-grounded. That’s what I always try to do. It is an aspirational goal.
Okay, the final thought for now is based on a favorite quote from the popular but rather unnerving movie Training Day, by a rambunctious character played by famed actor Denzel Washington, in which he utters an immortal line: “It’s not what you know; it’s what you can prove.”
To some degree, I urge prompt engineers to strive in that direction, whenever feasible.