
Overview of the elements

Set Up API Keys

  • Obtain API keys for Pinecone and OpenAI.
  • Store the keys securely, for example in a credentials.py file kept out of version control, as sketched below.
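
A minimal sketch of the credentials.py approach (the file and variable names here are illustrative, not a fixed convention):

# credentials.py -- keep this file out of version control (e.g. list it in .gitignore)
OPENAI_API_KEY = "your_openai_api_key"
PINECONE_API_KEY = "your_pinecone_api_key"

# In application code, import the keys rather than hard-coding them:
from credentials import OPENAI_API_KEY, PINECONE_API_KEY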


Install Required Libraries

Install the necessary Python libraries: Pinecone's Python client (pinecone-client), OpenAI's Python client (openai), and LangChain (langchain), which the question-answering chains tested below are built with.

pip install pinecone-client openai langchain


Create a Language Model Chain

Use OpenAI's language model (e.g., GPT-4) to understand and process questions.

Utilise Pinecone for efficient similarity search to find relevant documents based on the processed questions.
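
A minimal sketch of the chain construction, assuming the classic langchain package; the same ChatOpenAI / load_qa_chain pattern appears in the tests further down:

from langchain.chat_models import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain
from credentials import OPENAI_API_KEY  # as set up above

llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY, model_name="gpt-4")
# chain_type may be "stuff", "map_reduce", "map_rerank" or "refine" -- compared below
chain = load_qa_chain(llm, chain_type="stuff")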


Process Questions with OpenAI

Use the OpenAI API to process questions and generate relevant information.

import openai
# Set your OpenAI API key
openai.api_key = "your_openai_api_key"
# Example: generate a response from GPT-4 (pre-1.0 openai chat completions API)
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100,
)
answer = response.choices[0].message.content.strip()


Index Documents with Pinecone

Index your documents in Pinecone to make them searchable.

from pinecone import Pinecone
# Set your Pinecone API key
pinecone_api_key = "your_pinecone_api_key"
# Create a Pinecone client and connect to an existing index
pc = Pinecone(api_key=pinecone_api_key)
index = pc.Index("your_index_name")
# Example: index (upsert) a document
document_vector = [0.1, 0.2, 0.3]  # Replace with your actual document embedding
document_id = "doc_1"  # Replace with your actual document ID
index.upsert(vectors=[(document_id, document_vector)])
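
In practice, document_vector comes from an embedding model rather than being typed by hand, and the Pinecone index must be created with the same dimension as the embeddings. A sketch using OpenAI's text-embedding-ada-002 model (1536 dimensions) with the pre-1.0 openai API used above:

# Sketch: embed a document's text before upserting it
embedding_response = openai.Embedding.create(
    input="Full text of the document to index",
    model="text-embedding-ada-002",
)
document_vector = embedding_response["data"][0]["embedding"]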


Search for Relevant Documents with Pinecone

Use Pinecone to find relevant documents based on the processed questions. The query vector must be produced with the same embedding model that was used to index the documents.

# Example: search for relevant documents
query_vector = [0.2, 0.1, 0.4]  # Replace with your actual query embedding
response = index.query(vector=query_vector, top_k=5, include_metadata=True)
relevant_documents = response.matches  # Each match carries an id, a similarity score and any stored metadata


Combine Results and Provide Answers

Combine the results from OpenAI and Pinecone to generate final answers.

# Combine OpenAI and Pinecone results
question = "What is the capital of France?"  # The question processed earlier
final_answer = f"Question: {question}\nAnswer (from OpenAI): {answer}\nRelevant documents: {relevant_documents}"
print(final_answer)
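
Rather than only concatenating the two outputs, the retrieved documents can be fed back to the model so the answer is grounded in them. A sketch, assuming each vector was upserted with a "text" metadata field (a hypothetical convention, not shown above):

# Ground the answer in the retrieved documents
context = "\n\n".join(match.metadata["text"] for match in relevant_documents)
grounded = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(grounded.choices[0].message.content.strip())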


Testing responses for the LLM question/answer chain

Common settings

temperature = 0
model = "gpt-4"
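
Each test below constructs a chain with a different chain_type and runs it over the documents retrieved for the question. The invocation pattern, assuming docsearch is a Pinecone-backed LangChain vector store built from the indexed documents:

query = "Harvest sources was there representative samples for Torres Strait?"
docs = docsearch.similarity_search(query)
response = chain.run(input_documents=docs, question=query)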


Question - "Harvest sources was there representative samples for Torres Strait?"


Responses

Test 1 - map rerank

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response

This document does not answer the question


Test 2 - map reduce

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response

The text does not provide information on whether there were representative samples for Torres Strait.


Test 3 - refine

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response

The text provides additional information about the difficulties encountered in organizing sample collection in Queensland. Despite positive conversations with several industry members and the dispatch of sample kits, further Queensland sample collection was not undertaken due to the white spot issue, movement restrictions, and workload for industry and regulatory stakeholders. In total, 273 samples were collected from 20 of a possible 31 sources. However, the text still does not specifically mention if any of these samples were from Torres Strait.


Test 4 - stuff

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response

The text does not provide information on whether there were representative samples for Torres Strait.


Chain type set to refine

temperature = 0
chain = load_qa_chain(llm, chain_type="refine")

The most comprehensive outcome was achieved when chain_type was set to 'refine'. So what happens when the temperature is changed to 0.7?
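
The corresponding configuration, following the same pattern as the tests above:

llm = ChatOpenAI(temperature=0.7, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response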

The text still does not specify whether any samples were collected from Torres Strait. It only mentions that there were challenges in collecting samples from Queensland due to factors like the white spot disease outbreak and movement restrictions. In summary, 273 samples were collected from 20 out of a potential 31 sources, but it's not clear if Torres Strait was one of these sources.


Question - "What was the process for collecting the prawns?"

 

Responses

Test 1 - refine

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="refine")
Response

The process for collecting prawns involved an initial assessment of the Australian Council of Prawn Fisheries (ACPF) and Australian Prawn Farmers Association (APFA) harvest and production areas. This led to the identification of 35 wild harvest sources and 19 farmed prawn sources. Each source was defined as a specific harvest area and a single species harvested or farmed from that area. If more than one species was harvested, a single harvest area could be defined as several sources. For instance, Shark Bay was one harvest area but was considered three or potentially four sources due to the harvesting of Western Kings, Brown Tigers, Endeavours, and Coral prawns from that area.

A sampling and chain of custody protocol was developed based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawn samples. Between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively. These samples were then forwarded to the Seafood CRC Company (SCI) for analysis.


Test 2 - map reduce

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_reduce")
Response

The process for collecting the prawns was based on a sampling and chain of custody protocol and documentation developed for prawn collection. This was based on forensic sample collection and analysis. A variety of different collection methods and personnel were used.


Test 3 - map rerank

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="map_rerank")
Response

A sampling and chain of custody protocol and documentation for prawn collection was developed, based on forensic sample collection and analysis. Different collection methods and personnel were used, and between 2016 and 2019, 273 wild harvest and 136 farmed prawn samples were collected from 21 wild harvest sources and 17 farmed prawn sources respectively and forwarded to SCI for analysis.


Test 4 - stuff

llm = ChatOpenAI(temperature=temperature, openai_api_key=OPENAI_API_KEY, model_name=model)
chain = load_qa_chain(llm, chain_type="stuff")
Response

A sampling and chain of custody protocol was developed for prawn collection, based on forensic sample collection and analysis. Different collection methods and personnel were used to collect the prawns. The prawns were then forwarded to SCI for analysis.
