Published
June 26, 2024
Introduction
Large language models (LLMs) like ChatGPT have demonstrated remarkable capabilities across various tasks, from answering simple and complex questions to generating code. Yet, the process of effectively interacting with them, particularly in crafting optimal instructions or prompts, can be unclear to many users. In the paper, “Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4” the authors aim to clarify this process for developers and general users, ultimately improving the quality of responses from pretrained LLMs through better prompts.
Given the impracticality of directly finetuning LLMs for specific tasks for most users, attention has shifted towards optimizing prompts. Prompt engineering, the practice of crafting precise, task-specific instructions, has become a central focus. Despite these efforts, reliably guiding LLMs to produce specific responses remains challenging.
This work presents comprehensive principles for enhancing prompt quality for LLMs. The authors investigate how model behavior changes with different types of prompts, such as those that specify the intended audience or account for other characteristics of LLMs.
Design Principles
The study established a number of guiding principles for engineering effective prompts for pretrained LLMs. Some of them are:
1. Conciseness and Clarity: Make prompts clear and concise for relevant responses. Be specific.
2. Contextual Relevance: Clarify meaning with domain-specific terms and situational descriptions.
3. Task Alignment: Use appropriate language and structure, such as questions, commands, or fill-in-the-blank statements, to ensure prompt task alignment.
4. Example Demonstrations: Include examples within prompts, especially for complex tasks. This helps illustrate the desired formats or responses expected from the model.
5. Avoiding Bias: When creating prompts, use neutral language and consider ethical implications to minimize bias activation, especially for sensitive topics.
6. Incremental Prompting: Structure prompts to guide models through sequential tasks, adjusting them based on model performance and user feedback.
7. Advanced Prompting: When dealing with complex tasks, consider adding programming-like logic to prompts. This approach includes the use of conditional statements or pseudo-codes for better understanding and clarity.
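Principle 7 can be illustrated with a small sketch. The template below is a hypothetical example (not from the paper) of a prompt that embeds programming-like conditional logic, assuming a simple sentiment-classification task:

```python
def build_conditional_prompt(text: str) -> str:
    """Wrap a task in pseudo-code-style conditional instructions,
    as suggested by the 'Advanced Prompting' principle."""
    return (
        "Classify the sentiment of the text below.\n"
        "IF the sentiment is positive THEN reply 'POS'.\n"
        "ELSE IF the sentiment is negative THEN reply 'NEG'.\n"
        "ELSE reply 'NEUTRAL'.\n\n"
        f"Text: {text}"
    )

print(build_conditional_prompt("The battery life is fantastic."))
```

The explicit IF/ELSE structure leaves less room for the model to improvise an output format.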
As large language models evolve, prompt engineering continues to advance. Researchers are continually refining and expanding these principles, pushing the boundaries of what’s achievable.
Prompt Principles for Instructions
The authors presented 26 principles to improve the quality of text generated by LLMs, dividing them into five categories according to their nature. Let’s look at the principles category by category.
Prompt Structure and Clarity
- Integrate the intended audience in the prompt. For example, “The audience is an expert in the field.”
- Employ affirmative directives such as ‘do’ while steering clear of negative language like ‘don’t’. For example, “Provide a detailed explanation of the concept” instead of “Don’t provide vague responses.”
- Use leading words like “think step by step.” For example, “Think step by step and describe the process of solving the equation.”
- Use output primers, which involve concluding your prompt with the beginning of the desired output. For example, “Explain the impact of climate change on coastal communities, starting with the rise in sea levels…”
- Use Delimiters. For example, “Discuss the pros and cons of renewable energy sources. Begin your response with ‘Pros:’ for advantages and ‘Cons:’ for disadvantages.”
- When formatting your prompt, start with ‘###Instruction###’, followed by either ‘###Example###’ or ‘###Question###’ if relevant. Subsequently, present your content, using one or more line breaks to separate instructions, examples, questions, context, and input data.
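The formatting principle above can be sketched as a small helper that assembles the headed sections with blank lines between them (a minimal illustration; the function name and signature are our own):

```python
def format_prompt(instruction: str, example: str = "", question: str = "") -> str:
    """Assemble a prompt with ###Instruction###, ###Example###, and
    ###Question### headers, separated by blank lines."""
    parts = [f"###Instruction###\n{instruction}"]
    if example:
        parts.append(f"###Example###\n{example}")
    if question:
        parts.append(f"###Question###\n{question}")
    return "\n\n".join(parts)

print(format_prompt(
    instruction="Translate the given word to French.",
    question="What is the French word for 'book'?",
))
```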
Specificity and Instruction
- Implement example-driven prompting (use few-shot prompting). For example, “Generate a product description for a wireless mouse based on the following examples:”
  - Example: “Product: Wireless Earbuds”
    Description: “Enjoy crystal-clear sound and seamless connectivity with our immersive audio experience wherever you go.”
  - Example: “Product: Smartwatch”
    Description: “Stay connected and organized with our sleek smartwatch. Track your fitness goals, receive notifications, and personalize your style with interchangeable straps.”
- When you need clarity or a deeper understanding of a topic, idea, or any piece of information, utilize the following prompts:
- Explain [insert specific topic] in simple terms.
- Explain to me like I’m 11 years old.
- Explain to me as if I’m a beginner in [field].
- Write the [essay/text/paragraph] using simple English like you’re explaining something to a 5-year-old.
- Add to your prompt the following phrase “Ensure that your answer is unbiased and avoids relying on stereotypes.” For example,
Provide an example of a workplace conflict resolution scenario involving two employees with differing perspectives on a project deadline. Ensure that your answer is unbiased and avoids relying on stereotypes.
- To write any text, such as an essay or paragraph, that is intended to be similar to a provided sample, include the following instruction: Use the same language based on the provided [paragraph/title/text/essay/answer].
For example, Use the same language based on the provided paragraph: “Climate change is one of the most pressing challenges of our time. Its impacts are far-reaching, affecting ecosystems, economies, and human livelihoods. Rising global temperatures, extreme weather events, and sea-level rise are just some of the consequences we face. Addressing climate change requires urgent and coordinated action at local, national, and international levels.”
- When you want to initiate or continue a text using specific words, phrases, or sentences, utilize the following prompt: I’m providing you with the beginning [song lyrics/story/paragraph/essay…]: [Insert lyrics/words/sentence]. Finish it based on the words provided. Keep the flow consistent. For example, I’m providing you with the beginning of a short story:
“It was a dark and stormy night, the wind howling outside the window, rattling the old wooden shutters. Sarah huddled closer to the fireplace, clutching her blanket tightly around her. Suddenly, there was a knock on the door…”
Finish the story based on the words provided. Keep the flow consistent.
- Clearly state the requirements that the model must follow in order to produce content, in the form of keywords, regulations, hints, or instructions. For example,
Write a concise argumentative essay on the effects of social media on teenage mental health. Ensure content follows specific guidelines:
- Use keywords like “social media,” “teenagers,” “mental health,” and related terms.
- Support arguments with recent research and statistics.
- Respect ethical guidelines, avoid personal attacks, and ensure privacy.
- Present a balanced view of both positive and negative impacts.
- Consider regulatory guidelines from authoritative sources.
- Offer recommendations for promoting healthy social media use.
- Maintain clarity and coherence throughout the essay.
- Properly cite reputable sources using a specified citation style.
- To inquire about a specific topic, idea, or any piece of information while testing your understanding, use the following phrase: “Teach me any [theorem/topic/rule name] and include a test at the end, and let me know if my answers are correct after I respond, without providing the answers beforehand.” For example, Teach me any mathematical theorem and include a test at the end, and let me know if my answers are correct after I respond, without providing the answers beforehand.
- To write an essay/text/paragraph/article or any type of text that should be detailed: “Write a detailed [essay/text/paragraph] for me on [topic] in detail by adding all the necessary information.” For example, Write a detailed essay for me on the topic of climate change by adding all the necessary information.
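The example-driven (few-shot) prompting principle from this section can be sketched as a small builder that interleaves example pairs before the new query. The product/description format mirrors the wireless-mouse example above; the function itself is our own illustration:

```python
def few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt from (product, description) example pairs,
    ending with the new product the model should describe."""
    lines = [task, ""]
    for product, description in examples:
        lines.append(f'Example: "Product: {product}"')
        lines.append(f'Description: "{description}"')
        lines.append("")
    lines.append(f'Now write: "Product: {query}"')
    return "\n".join(lines)

examples = [
    ("Wireless Earbuds", "Enjoy crystal-clear sound and seamless connectivity."),
    ("Smartwatch", "Stay connected and organized with our sleek smartwatch."),
]
print(few_shot_prompt("Generate a product description.", examples, "Wireless Mouse"))
```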
User Interaction and Engagement
- Allow the model to elicit precise details and requirements from you by asking you questions until it has enough information to provide the needed output:
“From now on, I would like you to ask me questions to …” For example, “From now on, I would like you to ask me questions to help me plan a surprise birthday party for my friend. Gather all the necessary details, including their preferences, interests, and any specific requests they may have.”
Content and Language Style
- To correct/change specific text without changing its style, the prompt itself serves as the example: “Try to revise every paragraph sent by users. You should only improve the user’s grammar and vocabulary and make sure it sounds natural. You should maintain the original writing style, ensuring that a formal paragraph remains formal.”
- Incorporate the following phrases: “Your task is” and “You MUST.” For example, Your task is to write a persuasive speech advocating for the importance of environmental conservation. You MUST emphasize the urgency of taking action, provide compelling evidence of environmental degradation, and propose actionable solutions for addressing these issues.
- Incorporate the following phrase: “You will be penalized.” For example, “Give an example of why climate change is a big problem. You will be penalized for providing an answer without bullet points.”
- Assign a role to the language model. For example, “You are representing a developed country. Give your views on climate change.”
- Use the phrase “Answer a question given in natural language form” in your prompts. For example, Answer a question given in natural language form: “What are the main causes of climate change, and how do they impact the environment?”
- There is no need to be polite with an LLM, so skip phrases like “please,” “if you don’t mind,” “thank you,” and “I would like to,” and get straight to the point.
- Repeat a specific word or phrase multiple times within a prompt.
- Add “I’m going to tip $xxx for a better solution!” For example, “I am going to tip you $1000 for providing me a step-by-step solution for this problem.”
Complex Tasks and Coding Prompts
- Break down complex tasks into a sequence of simpler prompts in an interactive conversation.
- When you have a complex coding task that may span multiple files: “From now on, whenever you generate code that spans more than one file, generate a [programming language] script that can be run to automatically create the specified files or make changes to existing files to insert the generated code. [your question].”
- Combine chain-of-thought (CoT) prompting with few-shot prompts.
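Combining CoT with few-shot prompting means each exemplar carries a worked reasoning chain, and the new question ends with a “think step by step” cue. A minimal sketch (the function and exemplar are our own illustration):

```python
def cot_few_shot_prompt(exemplars, question):
    """Build a prompt whose few-shot exemplars each include a worked
    chain-of-thought, then pose a new question with a step-by-step cue."""
    blocks = []
    for q, reasoning, answer in exemplars:
        blocks.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    blocks.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(blocks)

exemplars = [
    ("What is 3 + 4 * 2?",
     "Multiplication comes first: 4 * 2 = 8. Then 3 + 8 = 11.",
     "11"),
]
print(cot_few_shot_prompt(exemplars, "What is 10 - 2 * 3?"))
```

The exemplar demonstrates the reasoning format the model is expected to imitate before producing its final answer.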
Experiment Setup and Metrics
The authors evaluated LLMs using ATLAS, a manual benchmark for prompt evaluation. It included a standard subset with questions from diverse domains and a challenging subset for complex tasks. They used a single response per question and compared those with and without principled prompts. Each subset had 20 human-selected questions. The authors assessed LLM outputs at different scales through human evaluation.
To assess the effectiveness of the principles, the authors tested various base models, such as LLaMA-1, LLaMA-2, LLaMA-2-70B-chat, GPT-3.5, and GPT-4, categorized by scale (small, medium, and large). They evaluated these models in Boosting and Correctness settings to comprehensively assess their performance. Boosting measured quality improvements in simpler tasks, while Correctness evaluated precision in complex reasoning tasks. This distinction ensures a better understanding of the capabilities of models of different scales and the impact of the principles on prompts.
Boosting refers to the improvement in response quality when the principles are applied. The authors measured the enhancement through human evaluation, comparing responses to principled prompts against those to the original prompts. Boosting confirms that a model’s performance has improved due to structured instructions.
Correctness refers to the accuracy and error-free nature of the model’s outputs. Human evaluators verify this aspect, which is crucial for ensuring the model’s accuracy.
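The two metrics reduce to simple proportions over human judgments. The sketch below is our own interpretation of the setup (the rating lists are hypothetical, sized like the paper’s 20-question subsets):

```python
def boosting_rate(preferred_principled: list[bool]) -> float:
    """Fraction of questions where raters preferred the response to the
    principled prompt over the response to the original prompt."""
    return sum(preferred_principled) / len(preferred_principled)

def correctness_rate(is_correct: list[bool]) -> float:
    """Fraction of responses judged accurate and error-free by evaluators."""
    return sum(is_correct) / len(is_correct)

# Hypothetical ratings over a 20-question subset.
prefs = [True] * 15 + [False] * 5
correct = [True] * 8 + [False] * 12
print(f"Boosting: {boosting_rate(prefs):.0%}, "
      f"Correctness: {correctness_rate(correct):.0%}")
# → Boosting: 75%, Correctness: 40%
```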
Results on Various Scales and Individual LLMs
The authors showed improved Boosting performance after applying the introduced principles, finding that the principles can significantly improve LLMs across all three scales, as shown in the figure below.
In terms of correctness,
- Absolute accuracy (see figure below)
  - between 10% and 40% for small and medium models
  - more than 40% for larger models
  - averaged accuracy between 20% and 40%
- Relative accuracy (see figure below)
  - over 10% across different models on average
  - more than 20% for larger models
Figure: Boosting of LLM response quality
Figure: Absolute correctness of LLM response quality
Figure: Relative correctness improvement of LLM response quality
Results on Individual LLMs
The authors also showed an improvement in response quality for individual models and principles after using the revised prompts, finding a stable 50% improvement on average across different LLMs. They also reported the absolute correctness accuracy and relative enhancements in accuracy across different sizes of LLMs, noting a clear trend: the larger the model, the greater the improvement in correctness.
Figure: Illustration of heatmap for LLMs boosting percentages
Figure: Illustration of heatmap for LLMs absolute correctness percentages
Figure: Illustration of heatmap for LLMs relative correctness improvement percentages
Discussion & Conclusion
The authors presented 26 principles to improve Large Language Models (LLMs) by focusing on key elements in the input context, resulting in better responses. By applying these principles before processing input, they aim to boost response quality. Empirical evidence shows this approach enhances response relevance, brevity, and objectivity. The authors suggest exploring methods like finetuning, reinforcement learning, or different prompts to align base models with principled instructions. These strategies could be integrated into standard LLM operations for better performance.
The authors highlighted that while the 26 principles aim to improve LLM responses across a wide range of queries, they might not perform as well with highly complex or specialized questions. This limitation depends on individual model reasoning capabilities and training. To address this, the authors extensively tested the principles across various scales. However, they caution that their evaluation involved only seven specific language models, and different architectures could yield different results. Moreover, their assessment was based on a limited set of questions, suggesting a need for broader research for more general conclusions. They also acknowledged potential variations in assessments among different personnel evaluating model responses.
The project page is available at: https://github.com/VILA-Lab/ATLASv
References
- Bsharat, S. M., Myrzakhan, A., & Shen, Z. (2023). Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4. arXiv:2312.16171