
OpenAI API + LangChain: Analyzing Small PDF Documents

Note: this version of the code worked as of 2024-08-23.

The code is as follows:

```python
import getpass
import os

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


class QA:
    """A class to handle question-answering tasks on a given PDF document.

    Attributes:
        question (str): The question to be answered about the PDF.
        pdf_path (str): Path to the PDF file.
        model_name (str): Name of the chat model used for analysis (e.g. "gpt-4o").
        docs (list): Loaded PDF documents.
        vecstore (Chroma): The vector store object for storing document embeddings.

    Methods:
        set_environ(): Set environment variables for the OpenAI API.
        load_file(): Load a PDF file using PyPDFLoader.
        split_and_store(): Split the PDF text and store embeddings using Chroma.
        retrieve_pdf(): Retrieve and answer questions based on the PDF content.
    """

    def __init__(self, question, pdf_path, model_name):
        """Initializes the QA object with the provided question, PDF path, and model name.

        Parameters:
            question (str): The question to be answered about the PDF.
            pdf_path (str): Path to the PDF file.
            model_name (str): Name of the chat model used for analysis.
        """
        self.question = question
        self.pdf_path = pdf_path
        self.model_name = model_name
        self.docs = None
        self.vecstore = None

    def set_environ(self):
        """Sets the environment variables necessary for OpenAI API authentication."""
        # getpass keeps the key from being echoed to the terminal
        os.environ['OPENAI_API_KEY'] = getpass.getpass("Your API key: ")
        # The author's local HTTP proxy; change or delete this line to match your network
        os.environ['OPENAI_PROXY'] = 'http://127.0.0.1:20171'

    def load_file(self):
        """Loads the PDF at pdf_path with PyPDFLoader (one Document per page)."""
        loader = PyPDFLoader(self.pdf_path)
        self.docs = loader.load()

    def split_and_store(self):
        """Splits the loaded PDF text into overlapping chunks and stores their embeddings in Chroma."""
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        splits = text_splitter.split_documents(self.docs)
        self.vecstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

    def retrieve_pdf(self):
        """Retrieves context from the vector store and answers the question
        with a retrieval-augmented generation (RAG) chain."""
        retriever = self.vecstore.as_retriever()
        # Use the model the user entered instead of a hard-coded "gpt-4o"
        llm = ChatOpenAI(model=self.model_name)
        system_prompt = (
            "You are an assistant for question-answering tasks. "
            "Use the following pieces of retrieved context to answer "
            "the question. If you don't know the answer, say that you "
            "don't know. Use three sentences maximum and keep the "
            "answer concise."
            "\n\n"
            "{context}"
        )
        prompt = ChatPromptTemplate.from_messages(
            [
                ("system", system_prompt),
                ("human", "{input}"),
            ]
        )
        # "Stuff" every retrieved chunk into {context}, then put the retriever in front
        question_answer_chain = create_stuff_documents_chain(llm, prompt)
        rag_chain = create_retrieval_chain(retriever, question_answer_chain)
        results = rag_chain.invoke({"input": self.question})
        print(results['answer'])

    def run(self):
        """Runs the whole pipeline: credentials, load, split and embed, answer."""
        self.set_environ()
        self.load_file()
        self.split_and_store()
        self.retrieve_pdf()


def main():
    """Prompts for the question, PDF path, and model name, then processes the PDF."""
    question = input("Your question: ")
    pdf_path = input("Enter the path of the PDF file: ")
    model_name = input("Enter the model name (e.g. gpt-4o): ")
    qa = QA(question, pdf_path, model_name)
    qa.run()


if __name__ == "__main__":
    main()
```
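To make the four steps behind split_and_store() and retrieve_pdf() easier to follow (split, embed, retrieve, stuff into the prompt), here is a minimal, dependency-free sketch. It is not LangChain's implementation, only an illustration under stated assumptions: split_text uses plain fixed-size character windows, whereas RecursiveCharacterTextSplitter prefers to break on paragraph, line, and word boundaries; embed is a hypothetical bag-of-words stand-in for OpenAIEmbeddings; and all function names here are made up for the example.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size windows; consecutive chunks share chunk_overlap characters.
    Simplified stand-in for RecursiveCharacterTextSplitter (no boundary handling)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


def embed(text):
    """Hypothetical toy 'embedding': lowercase word counts, not a real OpenAI embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=4):
    """Return the k chunks most similar to the question (the as_retriever() step)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def stuff_prompt(system_template, question, docs):
    """'Stuff' strategy: join all retrieved chunks into the {context} slot."""
    context = "\n\n".join(docs)
    return [("system", system_template.format(context=context)), ("human", question)]


# Usage: a fake "document" whose only useful fact sits at the very end
text = "LangChain splits text into chunks. " * 50 + "The invoice total is 42 euros."
chunks = split_text(text, chunk_size=200, chunk_overlap=40)
question = "What is the invoice total?"
top = retrieve(question, chunks, k=1)
messages = stuff_prompt("Answer using this context:\n\n{context}", question, top)
print(len(chunks), "chunks; best match:", repr(top[0][-30:]))
```

The overlap is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk, which is the reason the original code passes chunk_overlap=200 alongside chunk_size=1000.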


http://www.mrgr.cn/news/11023.html
