OpenBean Inc. is seeking an experienced AI/LLM Developer to build a system that can intelligently search and extract business rules from a large corpus of PDF documents. The ideal candidate will have strong skills in Python, NLP/LLMs, document processing, and machine learning, and be capable of building end-to-end solutions using modern AI toolchains.
Responsibilities:
- Develop a solution to ingest and index hundreds of PDF files.
- Build a natural language search interface powered by a Large Language Model (LLM).
- Enable the system to infer, extract, and summarize business rules from PDF text.
- Fine-tune or prompt-engineer LLMs (e.g., models from OpenAI or Mistral, Claude, or open-source models such as LLaMA) for domain-specific rule extraction.
- Implement techniques for semantic search, embedding generation, and retrieval-augmented generation (RAG).
- Work with vector databases (e.g., Pinecone, FAISS, Chroma) for efficient search.
- Design and implement APIs or a UI for querying and visualizing results.
- Handle scanned PDFs with OCR where needed (e.g., Tesseract, Amazon Textract).
Skills Required:
- Strong Python programming skills.
- Experience with LLMs and related tooling (OpenAI, Hugging Face, LangChain, LlamaIndex).
- Proficiency with PDF processing libraries (e.g., PyMuPDF, pdfplumber, or Adobe APIs).
- Knowledge of text embeddings, vector search, and similarity matching.
- Familiarity with document classification and information extraction.
- Experience with cloud services (AWS, GCP, or Azure) is a plus.
- Strong problem-solving, communication, and documentation skills.