Energy Expert: Guide to creating information extraction templates

Last Updated: Jul 09, 2025

This topic describes how to create information extraction templates that improve the accuracy of information extraction jobs.

1. Template types

AI Doc provides two types of information extraction templates to meet the requirements in various scenarios.

Prompt template (recommended)

  Scenario:

    1. Suitable for extraction jobs on various types of documents.

    2. Suitable for extraction jobs with complex rules, such as extracting and summarizing multi-dimensional, comprehensive information. In such cases, a prompt template can describe the job details more precisely.

  Core features:

    1. Instructions are written in natural language, which keeps configurations flexible and easy to understand.

    2. Supports all extraction methods: retrieval-augmented generation (RAG), image processing, and long text understanding.

    3. Supports concurrent extraction for multiple jobs. Multiple prompts can be configured in one prompt template.

Key-value template

  Scenario:

    1. Suitable for extracting various kinds of structured information from long documents.

    2. Suitable for scenarios where extraction fields must be specified and standardized answers must be provided. For example, you can select this template type if you need to check "whether worker protection content is included" in an ESG report and require standardized options such as "Yes" or "No".

  Core features:

    1. The tabular configuration provides a more standardized configuration process and more standardized output, at the cost of flexibility. Extraction items are displayed in a table, which is more intuitive.

    2. Supports only the RAG extraction method.

    3. Supports parallel extraction of multiple key values.


2. Prompt template creation guide

Scenarios

Prompt templates use natural language instructions to guide the model to output structured or unstructured content. They suit a wide range of tasks and are especially effective for tasks that require flexibility, contextual understanding, free-form generation, or complex reasoning.

Prompt writing techniques

Direct questioning method

Scenarios

Suitable for scenarios with clear objectives, simple questions, and definite answers.

Writing key points

  • Concise: Express questions in the shortest way possible. Overly lengthy questions may contain redundant information, causing the model to misunderstand or answer incorrectly.

    • Before optimization: "Our company recently received an invoice from a supplier, and we need you to extract some key information from it. Please carefully read this invoice document and tell me the supplier's company name, invoice number, and issue date. Additionally, if there is a total amount due on the invoice, please extract that as well. We need this information for accounting processing and payment arrangements."

    • After optimization: "Extract from supplier invoice: company name, invoice number, issue date, total amount due."

  • Specific: In information extraction scenarios, clearly list all items to be extracted and provide detailed characteristic descriptions for each item.

    • Before optimization: "Analyze this article and extract key information."

    • After optimization: "Extract the following information from this article: the author's full name and any relevant titles (such as 'Professor', 'Researcher', etc.), the author's views on global warming, and 3-5 keywords."

  • Avoid ambiguity: If a word or phrase may have multiple meanings, either clarify its meaning or rephrase to eliminate ambiguity.

    • Before optimization: "What is a tree?"

    • After optimization: "Explain the concept of 'tree' in computer science."

  • Detailed context: If the question involves specific context or background information, provide sufficient details to help the model understand.

  • Clear logic: Questions should be logically coherent with a clear hierarchical structure.

    • Before optimization: "Extract applicant name, claim amount, when the accident occurred, policy number, what documents were submitted, what kind of accident happened?"

    • After optimization: "From the claim application document, extract the following information in sequence: 1. Customer basic information: name, policy number 2. Accident-related information: accident date, accident type 3. Claim amount requested by the customer."

  • Choose an appropriate output method:

    • JSON output (recommended for complex, structured tasks)

      • Applicable scenarios

        • You need to extract multiple fields or structured information (such as time, location, people, and events).

        • You need to organize the extracted information for database storage or for sending to other systems.

        • You want to verify that the output is complete and free of errors (such as missing fields or incorrect types). JSON output can be validated with JSON Schema.

        • The extracted information is complex (such as an event involving multiple participants, time points, and locations). In this case, organize the data by category in a structured format. The following example shows the JSON output format defined in a prompt for a complex information extraction scenario.

Please extract relevant information from the text and output in the following JSON format:

{
  "event_name": "Event name",
  "dates": [
    {
      "type": "Time type (such as opening ceremony, closing ceremony)",
      "date": "Date (format YYYY-MM-DD)"
    }
  ],
  "locations": [
    {
      "name": "Location name",
      "address": "Detailed address"
    }
  ],
  "participants": ["Participant 1", "Participant 2", ...]
}

      • Notes

        • Make sure the JSON format includes every item to be extracted so that the extracted information is complete.

    • Text output

      • Applicable scenarios

        • You only need to extract a small amount of information for direct display.

        • You need to generate summaries, notes, or other unstructured content.

        • The results are used for webpage display, printing, or other direct presentation purposes.

        • Example: Extract the title, time, and location from a news article and generate a one-sentence summary. No structured processing is needed, so the output is written in natural language.
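The completeness check mentioned for JSON output (missing fields, incorrect types) can be sketched in a few lines. The following is a minimal illustration that uses only the Python standard library instead of a full JSON Schema validator. The `EXPECTED_FIELDS` mapping and the `check_extraction` function are hypothetical names introduced here; they mirror the event format shown above and are not part of the product.

```python
import json

# Expected top-level fields and their Python types, mirroring the event
# format in the prompt above. This mapping is an assumption for illustration.
EXPECTED_FIELDS = {
    "event_name": str,
    "dates": list,
    "locations": list,
    "participants": list,
}


def check_extraction(raw_output: str) -> list:
    """Return a list of problems found in the model's JSON output."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return problems


# "locations" has the wrong type and "participants" is absent.
sample = '{"event_name": "Expo", "dates": [], "locations": "Hall A"}'
print(check_extraction(sample))
# ['wrong type for locations: expected list', 'missing field: participants']
```

An empty list means the output passed these checks. Nested objects (for example, each entry in `dates`) are not inspected here; a JSON Schema validator would be the more thorough choice for that.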

Examples

Unoptimized example

  Example:

  Extract dates from the contract

  Result prediction:

    1. May output all dates (signing date/effective date/expiration date)

    2. May include irrelevant dates (attachment dates, etc.)

    3. Non-standardized date formats

  Problem analysis:

    1. Date type not specified

    2. Clause location not specified

    3. Lack of format requirements

Optimized example

  Example:

  Please extract the following two pieces of information from Section 3.1 of the contract main text, and output in JSON format.

    1. Service start date: Date in YYYY-MM-DD format, mark as N/A if not explicitly stated in the clause;

    2. Service end date: Date in YYYY-MM-DD format, mark as N/A if not explicitly stated in the clause.

  【Output JSON format】

  [{
  "Service start date": "***",
  "Service end date": "***"
  }]

  Result prediction:

  Output:

  [{
  "Service start date": "2014-12-31",
  "Service end date": "2024-12-31"
  }]

  Problem analysis:

    1. Clear clause location (Section 3.1)

    2. Specified date types (service start date, end date)

    3. Specified format requirements
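Because the optimized prompt fixes both the date format and the N/A fallback, the returned values can be checked mechanically. The following sketch assumes the model returns the JSON list shown in the optimized example; `normalize_date` and `check_contract_dates` are hypothetical helper names.

```python
import json
from datetime import datetime


def normalize_date(value: str) -> str:
    """Return "N/A" unchanged, or the date re-rendered as YYYY-MM-DD.

    Raises ValueError if the value is neither "N/A" nor a real calendar date.
    """
    if value == "N/A":
        return value
    # strptime rejects impossible dates such as 2024-02-30.
    parsed = datetime.strptime(value, "%Y-%m-%d")
    return parsed.strftime("%Y-%m-%d")


def check_contract_dates(raw_output: str) -> list:
    """Parse the model output and normalize every date field in every record."""
    records = json.loads(raw_output)
    return [
        {key: normalize_date(value) for key, value in record.items()}
        for record in records
    ]


raw = '[{"Service start date": "2014-12-31", "Service end date": "N/A"}]'
print(check_contract_dates(raw))
# [{'Service start date': '2014-12-31', 'Service end date': 'N/A'}]
```

A `ValueError` here signals that the model ignored the format requirement, which is usually a cue to tighten the prompt rather than to repair the value by hand.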

Direct questioning checklist

Before asking, please confirm:

  1. Is the information to be extracted and its characteristics clearly described?

  2. Is the logic coherent, with each point clear and unambiguous?

  3. Is industry jargon avoided? (Replace internal abbreviations with standard terminology)

  4. Is structured output needed? If so, are all items to be extracted listed in the output format?
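One way to satisfy the last checklist item is to generate the item list and the output format from the same field list, so that no item can be missing from the JSON format. The sketch below is an illustration only; `build_extraction_prompt` is a hypothetical helper, and the wording it produces follows the optimized contract example above.

```python
import json


def build_extraction_prompt(source: str, fields: dict) -> str:
    """Build a direct-question prompt from one shared mapping of
    field name -> description."""
    items = "\n".join(
        f"{number}. {name}: {description}"
        for number, (name, description) in enumerate(fields.items(), start=1)
    )
    # The output format is derived from the same mapping, so every listed
    # item also appears in the JSON format.
    output_format = json.dumps(
        [{name: "***" for name in fields}], ensure_ascii=False, indent=2
    )
    return (
        f"Please extract the following information from {source}, "
        "and output in JSON format.\n"
        f"{items}\n"
        "【Output JSON format】\n"
        f"{output_format}"
    )


date_rule = "Date in YYYY-MM-DD format, mark as N/A if not explicitly stated in the clause"
prompt = build_extraction_prompt(
    "Section 3.1 of the contract main text",
    {"Service start date": date_rule, "Service end date": date_rule},
)
print(prompt)
```

Adding a third field to the mapping updates both the numbered list and the JSON format in one place.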

Advanced questioning method

Scenarios

Suitable for scenarios that involve complex problems, multi-step tasks, or tasks that require thinking or reasoning.

Writing key points

  • Concise: Even for complex tasks, express core requirements as concisely as possible, avoiding redundant explanations that may interfere with the model's understanding.

  • Structured: Break down complex problems into clear steps to help the model process information in an orderly manner.

  • Role positioning: Tell the model what role it should play. For example, saying "you are now a doctor" will make the model think and answer questions from a doctor's perspective.

  • In-context learning (few-shot): Provide the model with some good output examples for reference and learning.

  • Multi-dimensional constraints: Clearly state your requirements for the model's response, such as the expected format and the content that must be included. For example, in a financial report analysis scenario, you might require that amounts be accurate to the nearest million dollars and that professional financial terminology be used.
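The key points above (role, ordered steps, constraints, and a reference example) can be assembled into one prompt programmatically. This is a minimal sketch, not a required layout: `build_advanced_prompt` is a hypothetical helper, and the 【Requirements】 section marker is an assumption made for this illustration.

```python
def build_advanced_prompt(role, task, steps, constraints, example_output):
    """Assemble role, numbered steps, constraints, and a few-shot example."""
    lines = [
        "【Task description】",
        f"As {role}, {task}",
        "Please proceed with the following steps:",
    ]
    # Structured: one numbered line per step, in the order given.
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    if constraints:
        # Multi-dimensional constraints (hypothetical section marker).
        lines.append("【Requirements】")
        lines += [f"- {constraint}" for constraint in constraints]
    # In-context learning: a sample of the desired output.
    lines += ["【Output example】", example_output]
    return "\n".join(lines)


prompt = build_advanced_prompt(
    role="a professional financial analyst",
    task="please complete the sales data analysis task based on the provided Q4 sales report.",
    steps=["Assess overall trends", "Identify key drivers"],
    constraints=[
        "Amounts accurate to the nearest million dollars",
        "Use professional financial terminology",
    ],
    example_output='[{"Overall trend": "Description...", "Key drivers": ["Factor 1", "Factor 2"]}]',
)
print(prompt)
```

Keeping each element as a separate argument makes it easy to see at a glance whether a prompt is missing its role, its steps, or its example.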

Examples

Unoptimized example

  Example:

  Analyze Q4 sales data and provide insights

  Result prediction:

    1. May output overly broad analysis

    2. Lack of structured output

    3. Cannot ensure focus on key metrics

  Problem analysis:

    1. Lack of professional role definition

    2. Analysis steps not specified

    3. No output format requirements

Optimized example

  Example:

  【Task description】

  As a professional financial analyst, please complete the sales data analysis task based on the provided Q4 sales report. Please proceed with the following steps:

    1. Assess overall trends: Describe the direction of sales changes and briefly explain their reasonableness;

    2. Identify key drivers: Point out the main product lines or market regions driving or dragging down sales performance;

    3. Analyze seasonal impacts: Evaluate the impact of holidays, promotional activities, etc. on sales results;

    4. Provide preliminary recommendations: Offer 1-2 targeted suggestions for developing sales strategies for the next quarter.

  【Output example】

  [{
  "Overall trend": "Q4 sales increased by 6% compared to Q3, mainly driven by year-end promotions and holiday spending, but lower than the 8% in the same period last year.",
  "Key drivers": ["Strong demand for high-end electronics, increasing to 35% of share", "Online channel sales increased by 18% year-over-year"],
  "Seasonal impact": "Black Friday and Christmas promotions significantly boosted sales of consumer electronics.",
  "Preliminary recommendations": ["Increase inventory of high-margin products", "Optimize online platform user experience to improve conversion rates"]
  }]

  【Output JSON format】

  [{
  "Overall trend": "Description...",
  "Key drivers": ["Factor 1", "Factor 2"],
  "Seasonal impact": "Analysis...",
  "Preliminary recommendations": ["Recommendation 1", "Recommendation 2"]
  }]

  Result prediction:

  Output: Clear, organized output of the following four key contents:

    1. Overall sales trend analysis

    2. Core product and market performance

    3. Seasonal factor analysis

    4. 1-2 targeted sales strategy recommendations

  Problem analysis:

    1. Clear role positioning

    2. Clear core task: sales data analysis

    3. Set clear analysis steps, each step with concise description

    4. Provide output examples for model reference

    5. Specify output format

Advanced questioning checklist

Before asking, please confirm:

  1. Is the core task articulated concisely?

  2. Does the role setting match the nature of the task?

  3. Are the thinking path and reasoning steps clearly defined? (Avoid leaps in logic)

  4. Is sufficient background information provided? (Complex questions require adequate context)

  5. Are output format requirements clear?

  6. For more complex output requirements, are clear output examples provided for the model to reference?

3. Key-Value template writing guide

Scenarios

A Key-Value template is a structured format that specifies the fields to be extracted and their corresponding values. It is particularly suitable for extracting specific information from complex, information-dense long documents (such as ESG reports, company annual reports, contract documents, research papers, and technical whitepapers) and outputting it in a standardized form.

Key-Value template writing techniques

Writing key points

  • Precise definition: Define a clear name for each piece of information to be extracted. For example, instead of using vague terms like "basic information," use specific terms like "name" and "age."

  • Semantic extension: For each piece of information (key) to be extracted, think about what other related synonyms/near-synonyms exist. For example, when retrieving "job," also consider words like "occupation," "position," etc. This expands the matching range.

  • Limit output results: Through keyword options, set a clear answer range for each extraction item to constrain the model's output results.

  • Clear extraction rules: Add some common questions as examples in the FAQ section to help the model understand the application scenarios and extraction methods for the keys.
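The four key points map onto four pieces of data per extraction item. The following sketch models one item as a small data structure to make the ideas of "expanding the matching range" and "constraining the output" concrete. It is an illustration only and does not describe how the product processes templates internally; `KeyValueItem` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyValueItem:
    keyword: str                                   # precise definition
    synonyms: list = field(default_factory=list)   # semantic extension
    options: list = field(default_factory=list)    # limited output results
    faq: str = ""                                  # clear extraction rules

    def search_terms(self) -> list:
        """Keyword plus synonyms: the expanded matching range."""
        return [self.keyword, *self.synonyms]

    def constrain(self, answer: str) -> Optional[str]:
        """Map a raw answer onto a configured option, ignoring case and
        surrounding whitespace. Returns None if no option matches."""
        cleaned = answer.strip()
        if not self.options:
            return cleaned  # no options configured: free-form answer
        for option in self.options:
            if option.lower() == cleaned.lower():
                return option
        return None


job = KeyValueItem(keyword="job", synonyms=["occupation", "position"])
print(job.search_terms())        # ['job', 'occupation', 'position']

protection = KeyValueItem(
    keyword="Whether worker protection content is included",
    options=["Yes", "No"],
)
print(protection.constrain(" yes "))   # Yes
print(protection.constrain("Maybe"))   # None
```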

Examples

Example of a contract information extraction Key-Value template

Each extraction item in the template has the following columns:

  • No.

  • Keyword (Required): Fill in nouns or phrases that accurately represent the content to be extracted.

  • Synonyms: Synonyms or near-synonyms of the keyword, treated as equivalent during information extraction. Can be left blank; filling them in helps improve information extraction quality.

  • Keyword options: Possible results for the keyword. It is recommended to fill in fixed answer options, such as "Yes/No" or "Compliant/Partially compliant/Non-compliant". Can be left blank.

  • FAQ: It is recommended to fill in clear extraction rules, such as which conditions produce which results, used for judgment or summary. Can be left blank.

1. Keyword: Contract type

  Synonyms:

    • Agreement type

    • Contract form

  Keyword options:

    • Equipment lease contract

    • Equipment maintenance contract

  FAQ: If the document title is the contract type, please output the contract type directly. If the document title is not the contract type, please summarize the contract type based on the document content.

2. Keyword: Payment method

  Synonyms:

    • Collection method

    • Transaction method

  Keyword options:

    • Independent cashier

    • Independent cashier, third-party interface

    • Party A's POS machine

    • None

  FAQ: What is the payment method in the contract?

3. Keyword: Whether green clauses are signed

  Synonyms: (left blank)

  Keyword options:

    • Yes

    • No

  FAQ: Does the document contain clauses with the following content: "Party A is committed to implementing green sustainable development, using energy most efficiently, minimizing environmental impact, and has developed green decoration guidelines for tenants to reference and implement." If yes, return "Yes"; if no, return "No"

Key-value template configuration checklist

Before configuring, please confirm:

  1. Is the key definition specific enough with clear boundaries? (Avoid broad and vague expressions)

  2. Do the synonyms truly express the same or similar concepts? (Ensure semantic consistency)

  3. Do the keyword options cover the main options for expected answers?

  4. Do the FAQs simulate real user query intentions?

  5. Is there any duplication or contradiction between information items (keys)? (Ensure each information item is independent)
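The last checklist item (no duplication between keys) is easy to check before configuring a template. The sketch below flags any term that is used as a keyword or synonym by more than one extraction item. `find_overlaps` is a hypothetical helper, and the "Settlement method" key is invented here only to trigger an overlap.

```python
def find_overlaps(items: dict) -> list:
    """Return (term, first_key, second_key) for every keyword or synonym
    that is shared by two different extraction items.

    `items` maps each keyword to its list of synonyms.
    """
    owner_of = {}
    overlaps = []
    for keyword, synonyms in items.items():
        for term in [keyword, *synonyms]:
            normalized = term.strip().lower()
            # Record the first key that used this term.
            owner = owner_of.setdefault(normalized, keyword)
            if owner != keyword:
                overlaps.append((normalized, owner, keyword))
    return overlaps


template = {
    "Contract type": ["Agreement type", "Contract form"],
    "Payment method": ["Collection method", "Transaction method"],
    # Hypothetical key whose synonym collides with an existing keyword.
    "Settlement method": ["Payment method"],
}
print(find_overlaps(template))
# [('payment method', 'Payment method', 'Settlement method')]
```

This only catches literal overlaps. Keys that are worded differently but mean the same thing still need a manual review.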