KnowledgeGPT System

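A KnowledgeGPT System is a knowledge extraction Python library (knowledgegpt) that gathers text from sources such as websites, documents, and YouTube videos and uses the most relevant passages to build prompts for OpenAI's GPT-3.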

Context:
- It can (typically) have Extractor Classes, such as: WebScrapeExtractor, PDFExtractor, DocsExtractor, YouTubeAudioExtractor.
- ...
See: DocsGPT.
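The Extractor Classes suggest a shared pattern in which each class turns one kind of source into plain text for the later embedding step. The sketch below illustrates that pattern under stated assumptions: `BaseExtractor`, `InMemoryExtractor`, `load_text`, `chunk`, and the `max_words` chunking rule are hypothetical names chosen for illustration, not knowledgegpt's actual API, and an in-memory string stands in for a real web page, PDF, or transcript.

```python
from abc import ABC, abstractmethod


class BaseExtractor(ABC):
    """Hypothetical common interface for source-specific extractors."""

    def __init__(self, max_words: int = 50) -> None:
        # Maximum number of words per passage (an assumed chunking rule).
        self.max_words = max_words

    @abstractmethod
    def load_text(self) -> str:
        """Return the raw text of the source (web page, PDF, subtitles, ...)."""

    def chunk(self) -> list[str]:
        """Split the loaded text into passages of at most max_words words."""
        words = self.load_text().split()
        return [
            " ".join(words[i : i + self.max_words])
            for i in range(0, len(words), self.max_words)
        ]


class InMemoryExtractor(BaseExtractor):
    """Hypothetical stand-in for WebScrapeExtractor, PDFExtractor, etc."""

    def __init__(self, text: str, max_words: int = 50) -> None:
        super().__init__(max_words)
        self.text = text

    def load_text(self) -> str:
        # A real extractor would scrape, parse, or transcribe its source here.
        return self.text


extractor = InMemoryExtractor("one two three four five", max_words=2)
print(extractor.chunk())  # ['one two', 'three four', 'five']
```

In this design only `load_text` varies between sources, so adding a new source type means writing one method while chunking stays shared.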

References

2023

https://github.com/geeks-of-data/knowledge-gpt
- QUOTE: knowledgegpt is designed to gather information from various sources, including the internet and local data, which can be used to create prompts. These prompts can then be utilized by OpenAI's GPT-3 model to generate answers that are subsequently stored in a database for future reference.
  To accomplish this, the text is first transformed into a fixed-size vector using either open source or OpenAI models. When a query is submitted, the text is also transformed into a vector and compared to the stored knowledge embeddings. The most relevant information is then selected and used to generate a prompt context.
  knowledgegpt supports various information sources including websites, PDFs, PowerPoint files (PPTX), and documents (Docs). Additionally, it can extract text from YouTube subtitles and audio (using speech-to-text technology) and use it as a source of information. This allows for a diverse range of information to be gathered and used for generating prompts and answers.
  Pypi Link: https://pypi.org/project/knowledgegpt/
- SUMMARY:
  - knowledgegpt is a Python library that extracts knowledge from sources such as websites, PDFs, PPTX and DOCX files, and YouTube videos to generate prompts for GPT-3.
  - It supports extracting text from web pages, PDF files, PowerPoint presentations, and Word documents. It can also use YouTube subtitles and transcribe YouTube audio using speech-to-text.
  - The extracted text is converted into fixed-size vectors using open-source models such as SBERT or using OpenAI models. These vectors are stored and used to find relevant context when answering queries.
  - The relevant text is used to generate a prompt for GPT-3. The GPT-3 response is returned as the answer to the query.
  - The library has various Extractor classes, such as WebScrapeExtractor, PDFExtractor, DocsExtractor, and YouTubeAudioExtractor, for extracting information from different sources.
  - It provides both a Python library to use programmatically and a REST API endpoint using FastAPI.
  - Key features include extracting text, generating prompts, getting answers from GPT-3, and storing the extracted information and prompts.
  - It supports multiple languages, although English is the most developed. Core dependencies are spaCy, Transformers, and the OpenAI API.
  - The project is open source under the MIT license and welcomes contributions. The main next steps are adding more knowledge sources and improving the infrastructure and testing.
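The retrieval flow described in the quote and summary above (embed the stored passages, embed the query, compare the vectors, and place the most relevant passages in the prompt) can be sketched in plain Python. This is a toy illustration, not knowledgegpt's code: a word-count vector over a fixed vocabulary stands in for SBERT or OpenAI embeddings, and the functions `tokenize`, `build_vocab`, `embed`, `cosine`, `top_k`, and `build_prompt`, along with the prompt template, are hypothetical.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(passages: list[str]) -> list[str]:
    # Sorted vocabulary over all stored passages; this fixes the vector size.
    return sorted({tok for p in passages for tok in tokenize(p)})


def embed(text: str, vocab: list[str]) -> list[float]:
    # Toy "embedding": word counts over the fixed vocabulary
    # (a stand-in for SBERT or OpenAI embedding models).
    tokens = tokenize(text)
    return [float(tokens.count(word)) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: str, passages: list[str], k: int = 1) -> list[str]:
    # Embed the stored passages and the query, then rank by cosine similarity.
    vocab = build_vocab(passages)
    query_vec = embed(query, vocab)
    scored = [(cosine(query_vec, embed(p, vocab)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    # Hypothetical prompt template: retrieved passages first, then the question.
    joined = "\n".join(context)
    return f"Answer using the context below.\n\nContext:\n{joined}\n\nQuestion: {query}\nAnswer:"


passages = [
    "knowledgegpt can read PDF, PPTX and DOCX files.",
    "YouTube audio is transcribed with speech-to-text.",
    "The project is released under the MIT license.",
]
question = "Which license does the project use?"
best = top_k(question, passages, k=1)
print(build_prompt(question, best))
```

The resulting prompt string would then be sent to the language model. A real deployment would compute and store the passage embeddings once instead of recomputing them on every query as this toy version does.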