This topic describes how to create information extraction templates to improve the accuracy of information extraction jobs.
1. Template types
AI Doc provides two types of information extraction templates to meet the requirements in various scenarios.
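Template type | Scenario | Core features |
Prompt template (recommended) | Tasks that require flexibility, contextual understanding, free-form generation, or complex reasoning | Guides the model through natural language instructions to output structured or unstructured content |
Key-value template | Extracting specific information from complex, information-dense long documents such as ESG reports, annual reports, and contracts | Defines each item as a keyword with optional synonyms, keyword options, and FAQ rules, and outputs results in a standardized form |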
2. Prompt template creation guide
Scenarios
Prompt templates use natural language instructions to guide the model to output structured or unstructured content. They suit a wide range of tasks and excel at tasks that require flexibility, contextual understanding, free-form generation, or complex reasoning.
Prompt writing techniques
Direct questioning method
Scenarios
Suitable for scenarios with clear objectives, simple questions, and definite answers.
Writing key points
Concise: Express each question as briefly as possible. Overly long questions may contain redundant information that causes the model to misunderstand or answer incorrectly.
Before optimization: "Our company recently received an invoice from a supplier, and we need you to extract some key information from it. Please carefully read this invoice document and tell me the supplier's company name, invoice number, and issue date. Additionally, if there is a total amount due on the invoice, please extract that as well. We need this information for accounting processing and payment arrangements."
After optimization: "Extract from supplier invoice: company name, invoice number, issue date, total amount due."
Specific: In information extraction scenarios, list every item to be extracted and describe the characteristics of each item in detail.
Before optimization: "Analyze this article and extract key information."
After optimization: "Extract the following information from this article: the author's full name and any relevant titles (such as 'Professor', 'Researcher', etc.), the author's views on global warming, and 3-5 keywords."
Avoid ambiguity: If a word or phrase may have multiple meanings, either clarify its meaning or rephrase to eliminate ambiguity.
Before optimization: "What is a tree?"
After optimization: "Explain the concept of 'tree' in computer science."
Detailed context: If the question involves specific context or background information, provide sufficient details to help the model understand.
Clear logic: Questions should be logically coherent with a clear hierarchical structure.
Before optimization: "Extract applicant name, claim amount, when the accident occurred, policy number, what documents were submitted, what kind of accident happened?"
After optimization: "From the claim application document, extract the following information in sequence: 1. Customer basic information: name, policy number 2. Accident-related information: accident date, accident type 3. Claim amount requested by the customer."
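When the field list is long, the grouped, numbered structure of the optimized claim prompt can also be generated from data, which keeps the order consistent across templates. The following is a minimal Python sketch; `build_grouped_prompt` is a hypothetical helper, not an AI Doc API, and the generated wording simply mirrors the example above.

```python
def build_grouped_prompt(source: str, groups: dict[str, list[str]]) -> str:
    """List the fields group by group, numbered in insertion order (dicts keep it)."""
    lines = [f"From the {source}, extract the following information in sequence:"]
    for number, (title, fields) in enumerate(groups.items(), start=1):
        # A group with no sub-fields is itself a single item, so it gets no colon.
        lines.append(f"{number}. {title}: {', '.join(fields)}" if fields else f"{number}. {title}")
    return "\n".join(lines)


prompt = build_grouped_prompt(
    "claim application document",
    {
        "Customer basic information": ["name", "policy number"],
        "Accident-related information": ["accident date", "accident type"],
        "Claim amount requested by the customer": [],
    },
)
print(prompt)
```

Keeping the groups in an ordered mapping means that adding or reordering a group changes the numbering automatically instead of requiring manual edits to the prompt text.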
Choose an appropriate output method:
JSON output (recommended for complex, structured tasks)
Applicable scenarios
You need to extract multiple fields or structured information (such as times, locations, people, and events)
You need to organize the extracted information for database storage or for sending to other systems
You want to verify whether the output is complete and correct (for example, no missing fields or incorrect types); JSON output can be validated against a JSON Schema
When the extracted information is complex (for example, an event involving multiple participants, time points, and locations), organize the data by category in a structured format. The following example shows a JSON output format defined in a prompt for a complex information extraction scenario.
Please extract relevant information from the text and output in the following JSON format:
{
"event_name": "Event name",
"dates": [
{
"type": "Time type (such as opening ceremony, closing ceremony)",
"date": "Date (format YYYY-MM-DD)"
}
],
"locations": [
{
"name": "Location name",
"address": "Detailed address"
}
],
"participants": ["Participant 1", "Participant 2", ...]
}
Notes
Include every item to be extracted in the JSON format to ensure the extracted information is complete.
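Because JSON output makes missing fields and wrong types detectable, a response can be checked before it is stored or forwarded. The following is a minimal sketch, assuming Python 3.9+ and the event format above, that uses only the standard library; `check_event_output` and the sample response are hypothetical. If you already maintain a JSON Schema, a schema validator (for example, the third-party `jsonschema` package) is an alternative.

```python
import json
import re
from datetime import date

# Top-level fields from the event-extraction format, with their expected JSON types.
REQUIRED_FIELDS = {"event_name": str, "dates": list, "locations": list, "participants": list}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_event_output(raw: str) -> list[str]:
    """Return the problems found in a model's JSON output; an empty list means none."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")

    # Each date entry must be an object whose "date" uses the YYYY-MM-DD format.
    dates = data.get("dates")
    if isinstance(dates, list):
        for i, entry in enumerate(dates):
            value = entry.get("date") if isinstance(entry, dict) else None
            if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
                problems.append(f"dates[{i}].date is not in YYYY-MM-DD format")
                continue
            try:
                date.fromisoformat(value)  # rejects impossible dates such as 2024-13-01
            except ValueError:
                problems.append(f"dates[{i}].date is not a real calendar date")
    return problems


# Hypothetical model response: "participants" is missing and the month is invalid.
sample = '{"event_name": "Sample Expo", "dates": [{"type": "opening ceremony", "date": "2024-13-01"}], "locations": []}'
print(check_event_output(sample))
```

For the sample response, the check reports the missing `participants` field and the impossible date, which are exactly the kinds of errors the completeness note above is meant to prevent.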
Text output
Applicable scenarios
You only need to extract a small amount of information for direct display
You need to generate summaries, notes, or other unstructured content
The results are used for webpage display, printing, or other direct presentation
For example, extracting the title, time, and location from a news article and generating a one-sentence summary requires no structured processing; the result can be output in natural language
Examples
Type | Example | Result prediction | Problem analysis |
Unoptimized example | Extract dates from the contract | | |
Optimized example | Please extract the following two pieces of information from Section 3.1 of the main text of the contract, and output them in JSON format. 【Output JSON format】 [{ "Service start date": "***", "Service end date": "***" }] | Output: [{ "Service start date": "2014-12-31", "Service end date": "2024-12-31" }] | |
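Because the optimized prompt returns JSON, its result can be consumed directly by a program. The following is a minimal sketch, assuming the model returns a one-element array with ISO (YYYY-MM-DD) dates as in the example output; `parse_service_period` is a hypothetical helper, not part of AI Doc.

```python
import json
from datetime import date


def parse_service_period(raw: str) -> tuple[date, date]:
    """Parse the contract-date output and return (start, end) as date objects."""
    records = json.loads(raw)
    # Assumption: the output format defines an array holding exactly one object.
    if not isinstance(records, list) or len(records) != 1:
        raise ValueError("expected a JSON array with exactly one object")
    record = records[0]
    start = date.fromisoformat(record["Service start date"])
    end = date.fromisoformat(record["Service end date"])
    if end < start:
        raise ValueError("service end date is earlier than the start date")
    return start, end


output = '[{ "Service start date": "2014-12-31", "Service end date": "2024-12-31" }]'
start, end = parse_service_period(output)
print(start, end, (end - start).days)  # 2014-12-31 2024-12-31 3653
```

A sanity check such as end-after-start catches swapped fields, which a vague prompt like "Extract dates from the contract" gives no way to detect.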
Direct questioning checklist
Before asking, please confirm:
Are the items to be extracted and their characteristics clearly described?
Is the logic coherent, with each point clear and unambiguous?
Is industry jargon avoided? (Replace internal abbreviations with standard terminology.)
Is structured output needed? If so, does the output format list every item to be extracted?
Advanced questioning method
Scenarios
Suitable for scenarios involving complex problems, multi-step tasks, or requiring thinking or reasoning.
Writing key points
Concise: Even for complex tasks, express core requirements as concisely as possible, avoiding redundant explanations that may interfere with the model's understanding.
Structured: Break down complex problems into clear steps to help the model process information in an orderly manner.
Role positioning: Tell the model what role to play. For example, "You are now a doctor" prompts the model to think about and answer questions from a doctor's perspective.
Few-shot learning (in-context learning): Provide a few high-quality output examples for the model to reference.
Multi-dimensional constraints: State each of your requirements for the response, such as its format and the content it must include. For example, in a financial report analysis scenario, you might require amounts accurate to the nearest million dollars and the use of professional financial terminology.
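The elements above (role, structured steps, constraints, a few-shot example, and an output format) can be assembled into a single prompt. The following is a minimal sketch; `build_advanced_prompt`, the 【Requirements】 label, and the step texts are illustrative assumptions, not a format that AI Doc requires.

```python
import json


def build_advanced_prompt(role, task, steps, constraints, example, output_format):
    """Combine role, task, numbered steps, constraints, a few-shot example,
    and the output JSON format into a single prompt string."""
    lines = [f"【Task description】 As {role}, {task}", "Please proceed with the following steps:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("【Requirements】")  # hypothetical label for the constraints
    lines += [f"- {rule}" for rule in constraints]
    # ensure_ascii=False keeps non-ASCII text readable instead of \u escapes.
    lines.append(f"【Output example】 {json.dumps(example, ensure_ascii=False)}")
    lines.append(f"【Output JSON format】 {json.dumps(output_format, ensure_ascii=False)}")
    return "\n".join(lines)


prompt = build_advanced_prompt(
    role="a professional financial analyst",
    task="please complete the sales data analysis task based on the provided Q4 sales report.",
    steps=[
        "Summarize the overall trend",
        "Identify the key drivers",
        "Assess the seasonal impact",
        "Give preliminary recommendations",
    ],
    constraints=[
        "Round amounts to the nearest million dollars",
        "Use professional financial terminology",
    ],
    example=[{"Overall trend": "Q4 sales increased by 6% compared to Q3."}],
    output_format=[{"Overall trend": "Description...", "Key drivers": ["Factor 1", "Factor 2"]}],
)
print(prompt)
```

Serializing the example and format with `json.dumps` guarantees that both are valid JSON, so the model is never shown a malformed template to imitate.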
Examples
Type | Example | Result prediction | Problem analysis |
Unoptimized example | Analyze Q4 sales data and provide insights | | |
Optimized example | 【Task description】 As a professional financial analyst, please complete the sales data analysis task based on the provided Q4 sales report. Please proceed with the following steps: 【Output example】 [{ "Overall trend": "Q4 sales increased by 6% compared to Q3, mainly driven by year-end promotions and holiday spending, but lower than the 8% in the same period last year.", "Key drivers": ["Strong demand for high-end electronics, whose share increased to 35%", "Online channel sales increased by 18% year-over-year"], "Seasonal impact": "Black Friday and Christmas promotions significantly boosted sales of consumer electronics.", "Preliminary recommendations": ["Increase inventory of high-margin products", "Optimize online platform user experience to improve conversion rates"] }] 【Output JSON format】 [{ "Overall trend": "Description...", "Key drivers": ["Factor 1", "Factor 2"], "Seasonal impact": "Analysis...", "Preliminary recommendations": ["Recommendation 1", "Recommendation 2"] }] | Output: A clear, organized output covering four key items: overall trend, key drivers, seasonal impact, and preliminary recommendations. | |
Advanced questioning checklist
Before asking, please confirm:
Is the core task articulated concisely?
Does the role setting match the nature of the task?
Are the thinking path and reasoning steps clearly defined? (Avoid logical leaps.)
Is sufficient background information provided? (Complex questions require adequate context)
Are output format requirements clear?
For more complex output requirements, are clear output examples provided for the model to reference?
3. Key-Value template writing guide
Scenarios
A Key-Value template is a structured format that specifies the content fields to be extracted and their corresponding values. It is particularly suitable for extracting specific information from complex, information-dense long documents (such as ESG reports, company annual reports, contracts, research papers, and technical whitepapers) and outputting it in a standardized form.
Key-Value template writing techniques
Writing key points
Precise definition: Define a clear name for each piece of information to be extracted. For example, instead of using vague terms like "basic information," use specific terms like "name" and "age."
Semantic extension: For each key to be extracted, list related synonyms and near-synonyms. For example, when extracting "job," also add "occupation," "position," and so on. This widens the range of text that can be matched.
Limit output results: Use keyword options to set a clear answer range for each extraction item and constrain the model's output.
Clear extraction rules: Add common questions or extraction rules as examples in the FAQ column to help the model understand when each key applies and how to extract it.
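To make the four elements concrete, here is a minimal sketch of one Key-Value item as a Python data class. The class and field names are hypothetical, not AI Doc's schema, and plain case-insensitive substring matching stands in for whatever matching AI Doc actually performs; the sketch only illustrates how synonyms widen the match and how options narrow the answer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyValueItem:
    """One Key-Value template row (hypothetical structure, not AI Doc's schema)."""
    keyword: str                                        # precise definition
    synonyms: list[str] = field(default_factory=list)  # semantic extension
    options: list[str] = field(default_factory=list)   # limits output; empty = free text
    faq: str = ""                                       # extraction rules

    def find_label(self, text: str) -> Optional[str]:
        """Return the keyword or the first synonym found in text (case-insensitive)."""
        lowered = text.lower()
        for term in [self.keyword, *self.synonyms]:
            if term.lower() in lowered:
                return term
        return None

    def constrain(self, answer: str) -> Optional[str]:
        """Keep the answer only if it is an allowed option (or no options are set)."""
        if not self.options or answer in self.options:
            return answer
        return None


job = KeyValueItem("Job", synonyms=["Occupation", "Position"])
print(job.find_label("Occupation: software engineer"))  # Occupation
green = KeyValueItem("Whether green clauses are signed", options=["Yes", "No"])
print(green.constrain("Yes"), green.constrain("Probably"))  # Yes None
```

Without the synonyms, the label "Occupation" would not match the key "Job"; without the options, a hedged answer such as "Probably" would pass through unchanged.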
Examples
Example of a contract information extraction Key-Value template
Column descriptions:
Keyword (required): A noun or phrase that accurately represents the content to be extracted.
Synonyms: Synonyms or near-synonyms of the keyword, treated as equivalent during information extraction. Optional; filling them in helps improve information extraction quality.
Keyword options: Possible results for the keyword. Fixed answer options are recommended, such as "Yes/No" or "Compliant/Partially compliant/Non-compliant". Optional.
FAQ: Clear extraction rules, such as which conditions produce which results, used for judgment or summarization. Optional.
No. | Keyword (required) | Synonyms | Keyword options | FAQ |
1 | Contract type | | | If the document title is the contract type, please output the contract type directly. If the document title is not the contract type, please summarize the contract type based on the document content. |
2 | Payment method | | | What is the payment method in the contract? |
3 | Whether green clauses are signed | | | Does the document contain clauses with the following content: "Party A is committed to implementing green sustainable development, using energy most efficiently, minimizing environmental impact, and has developed green decoration guidelines for tenants to reference and implement." If yes, return "Yes"; if no, return "No" |
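If you draft a template like this in a spreadsheet, a CSV export can be sanity-checked before you enter it. The following is a minimal sketch under several assumptions: the CSV layout mirrors the table columns, synonyms are separated by ";" and options by "/", the FAQ texts are abbreviated, and the Yes/No options in row 3 are inferred from its FAQ. This is not an AI Doc import format.

```python
import csv
import io

# Hypothetical CSV export of the contract template above (FAQ texts abbreviated).
# Synonyms are separated by ";" and keyword options by "/"; both are assumptions.
TEMPLATE_CSV = """No.,Keyword,Synonyms,Keyword options,FAQ
1,Contract type,,,"If the document title is the contract type, output it directly; otherwise, summarize the contract type from the content."
2,Payment method,,,What is the payment method in the contract?
3,Whether green clauses are signed,,Yes/No,"Does the document contain the green clause? If yes, return Yes; if no, return No."
"""


def split_cell(cell, separator: str) -> list[str]:
    """Split an optional multi-value cell, dropping blanks; an empty cell yields []."""
    return [part.strip() for part in (cell or "").split(separator) if part.strip()]


def load_template(text: str) -> list[dict]:
    """Parse the CSV into template items, enforcing that Keyword is filled in."""
    items = []
    # Line numbers assume no multi-line cells; the header is line 1.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        keyword = (row["Keyword"] or "").strip()
        if not keyword:
            raise ValueError(f"line {line_no}: Keyword is required")
        items.append({
            "keyword": keyword,
            "synonyms": split_cell(row["Synonyms"], ";"),
            "options": split_cell(row["Keyword options"], "/"),
            "faq": (row["FAQ"] or "").strip(),
        })
    return items


template = load_template(TEMPLATE_CSV)
for item in template:
    print(item["keyword"], item["options"])
```

Empty optional cells become empty lists, which keeps the "can be left blank" columns distinct from the one required column.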
Key-value template configuration checklist
Before configuring, please confirm:
Is the key definition specific enough with clear boundaries? (Avoid broad and vague expressions)
Do the synonyms truly express the same or similar concepts? (Ensure semantic consistency)
Do the keyword options cover the main options for expected answers?
Do the FAQs reflect the questions that real users actually ask?
Is there any duplication or contradiction between information items (keys)? (Ensure each information item is independent)
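The duplication check in the last item can be partly automated. The following is a minimal sketch; `find_overlapping_terms` and the sample template are hypothetical. It only flags a keyword or synonym that appears under more than one key (case-insensitive); it cannot detect semantic contradictions between keys.

```python
from collections import defaultdict


def find_overlapping_terms(items: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each term (keyword or synonym, case-insensitive) used by more than one key
    to the keys that share it. `items` maps keyword -> synonyms."""
    owners = defaultdict(list)
    for keyword, synonyms in items.items():
        # A set avoids counting the same term twice within one key.
        for term in {keyword.lower(), *(s.lower() for s in synonyms)}:
            owners[term].append(keyword)
    return {term: keys for term, keys in owners.items() if len(keys) > 1}


template = {
    "Job": ["Occupation", "Position"],
    "Job title": ["Position", "Rank"],
    "Name": ["Full name"],
}
print(find_overlapping_terms(template))  # {'position': ['Job', 'Job title']}
```

A shared synonym such as "Position" means both keys could claim the same text, so one of them should be renamed or its synonyms narrowed.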