CrystaLenz

Sep 2025
Python, Google ADK, RAG, pymatgen, mp-api, SciPy, Node.js, React, Tailwind CSS, Plotly

CrystaLenz is an AI-assisted workflow for X-ray diffraction (XRD) analysis that combines an agentic backend (Starlette + Google ADK) with a Next.js web UI. It runs an end-to-end XRD pipeline: loading and preprocessing data, detecting peaks, computing crystallite size and microstrain via Scherrer and Williamson-Hall analysis, cross-checking against Materials Project references, and generating plots and reports. It also mines research papers via Google Programmable Search, downloads the PDFs, and extracts their text for literature triage. A RAG pipeline builds a FAISS vector index from the extracted text using OpenAI embeddings, then retrieves the most relevant chunks so that agent responses can cite supporting evidence. Results stream to the web UI in real time as the pipeline progresses.

Achievement

CrystaLenz was ranked 4th among the top-scoring overall projects at the 2025 Hackathon on LLMs for Materials Science & Chemistry, a global competition with over 1,000 participants across 100+ teams worldwide.

Overview

CrystaLenz is an AI-assisted workflow for X-ray diffraction (XRD) analysis. It combines an agentic backend (Starlette + Google ADK) with a Next.js web UI. The system can run an end-to-end XRD pipeline, mine research papers, and use Retrieval-Augmented Generation (RAG) to ground agent responses in cited evidence, streaming results to the web UI as the pipeline progresses.

Architecture

The backend is a Starlette application that exposes the agent runtime over HTTP and WebSocket endpoints, streams agent events to the UI, and serves generated artifacts. Agents are defined with Google ADK: a top-level root agent runs the research and XRD pipelines in parallel, then hands their results to a final analysis stage.

  • Root agent orchestrates the full workflow via a sequential pipeline agent.
  • Parallel research analysis agent runs the research specialist and XRD specialist concurrently.
  • Final analyzer agent synthesizes results from both pipelines into a unified report.
  • The web UI (Next.js) posts a prompt, then receives streaming events and renders plots and links in real time.
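
The control flow described above (a sequential root pipeline, a parallel stage with two specialists, then a final analyzer) can be illustrated with plain asyncio. This is a minimal sketch, not the project's code: it does not use the Google ADK API, and every function name, the event dictionary shape, and the `emit` callback are hypothetical stand-ins for the real agents and the WebSocket event stream.

```python
import asyncio


async def research_specialist(prompt: str) -> dict:
    # Hypothetical stand-in for the literature-mining branch.
    await asyncio.sleep(0)
    return {"agent": "research", "summary": f"papers related to: {prompt}"}


async def xrd_specialist(prompt: str) -> dict:
    # Hypothetical stand-in for the XRD analysis branch.
    await asyncio.sleep(0)
    return {"agent": "xrd", "summary": f"XRD analysis for: {prompt}"}


async def final_analyzer(results: list) -> str:
    # Merge both branch outputs into one report string.
    return " | ".join(f"{r['agent']}: {r['summary']}" for r in results)


async def root_pipeline(prompt: str, emit) -> str:
    # Stage 1: run both specialists concurrently.
    emit({"stage": "parallel", "status": "started"})
    results = await asyncio.gather(
        research_specialist(prompt),
        xrd_specialist(prompt),
    )
    for result in results:
        emit({"stage": result["agent"], "status": "done"})
    # Stage 2: synthesize a single report from both branches.
    report = await final_analyzer(list(results))
    emit({"stage": "final", "status": "done"})
    return report


events = []  # in a real server, events would be pushed to the client instead
report = asyncio.run(root_pipeline("anatase TiO2", events.append))
```

Here `emit` simply appends to a list; in a streaming setup the same callback could forward each event to the connected UI as soon as it is produced.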

XRD Pipeline

The XRD pipeline is a multi-agent system that handles the full diffraction analysis workflow. It starts with data loading, then runs a hyperparameter optimization loop before executing the core analysis stages:

  • Data loading and preprocessing: Loads raw XRD CSV data and applies preprocessing steps.
  • Peak detection: Identifies diffraction peaks in the preprocessed data.
  • Scherrer and Williamson-Hall analysis: Computes crystallite size and microstrain from peak broadening.
  • Materials Project reference checking: Cross-checks detected peaks against known reference patterns using the mp-api.
  • Reporting: Generates interactive Plotly plots.
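
To make the size and strain stage concrete, here is a minimal sketch of the two standard formulas: the Scherrer equation D = Kλ / (β cos θ), and the Williamson-Hall relation β cos θ = Kλ/D + 4ε sin θ fitted by ordinary least squares. The Cu Kα wavelength, the shape factor K = 0.9, the function names, and the choice to ignore instrumental broadening are all assumptions made for illustration; the project's own implementation (which lists NumPy, SciPy, and lmfit in its stack) may differ.

```python
import math

CU_K_ALPHA_ANGSTROM = 1.5406  # assumed X-ray source (Cu K-alpha), in angstroms
SHAPE_FACTOR = 0.9            # assumed Scherrer constant K


def _theta_beta(two_theta_deg, fwhm_deg):
    # Convert peak position (2-theta) and FWHM from degrees to radians.
    return math.radians(two_theta_deg / 2.0), math.radians(fwhm_deg)


def scherrer_size(two_theta_deg, fwhm_deg,
                  wavelength=CU_K_ALPHA_ANGSTROM, k=SHAPE_FACTOR):
    """Crystallite size (same unit as wavelength) from a single peak."""
    theta, beta = _theta_beta(two_theta_deg, fwhm_deg)
    return k * wavelength / (beta * math.cos(theta))


def williamson_hall(peaks, wavelength=CU_K_ALPHA_ANGSTROM, k=SHAPE_FACTOR):
    """Return (size, strain) from a list of (two_theta_deg, fwhm_deg) peaks.

    Fits y = intercept + slope * x with x = 4 sin(theta), y = beta cos(theta),
    so slope is the microstrain and intercept equals K * wavelength / size.
    """
    if len(peaks) < 2:
        raise ValueError("Williamson-Hall needs at least two peaks")
    xs, ys = [], []
    for two_theta_deg, fwhm_deg in peaks:
        theta, beta = _theta_beta(two_theta_deg, fwhm_deg)
        xs.append(4.0 * math.sin(theta))
        ys.append(beta * math.cos(theta))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("peaks must span more than one diffraction angle")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if intercept <= 0:
        raise ValueError("non-positive intercept: size is undefined")
    return k * wavelength / intercept, slope
```

Both functions expect the FWHM in degrees of 2θ and return the size in the unit of `wavelength` (angstroms with the default). The Scherrer estimate attributes all broadening to size, so it will generally read smaller than the Williamson-Hall size when strain is present.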

Research Mining

The research mining pipeline automates literature triage by searching for relevant papers, downloading them, and extracting their text content for downstream RAG processing.

  • Paper search: Queries Google Programmable Search (CSE) to find relevant research papers.
  • PDF download: Automatically downloads discovered PDFs to a local directory.
  • Text extraction: Uses PyMuPDF (fitz) to extract text from downloaded PDFs, saving results for vector store ingestion.
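
The search step can be sketched as two small helpers: one that builds a request URL for the Google Custom Search JSON API, and one that picks PDF links out of a parsed response. This is an illustration under stated assumptions, not the project's code: the helper names, the choice to restrict results with `fileType=pdf`, and the sample response below are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query, api_key, engine_id, num=10):
    """Build a Custom Search JSON API request URL restricted to PDFs."""
    params = {
        "key": api_key,               # API key (placeholder in this sketch)
        "cx": engine_id,              # Programmable Search Engine ID
        "q": query,
        "num": min(max(num, 1), 10),  # the API returns at most 10 per page
        "fileType": "pdf",
    }
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def pdf_links(response):
    """Collect unique PDF links from a parsed search response dict."""
    links = []
    for item in response.get("items", []):
        link = item.get("link", "")
        path = link.lower().split("?")[0]
        is_pdf = item.get("mime") == "application/pdf" or path.endswith(".pdf")
        if link and is_pdf and link not in links:
            links.append(link)
    return links


# Hypothetical response fragment, used only to exercise pdf_links.
sample_response = {
    "items": [
        {"title": "Paper A", "link": "https://example.org/a.pdf"},
        {"title": "Landing page", "link": "https://example.org/a"},
        {"title": "Paper B", "link": "https://example.org/get?id=2",
         "mime": "application/pdf"},
        {"title": "Paper A (duplicate)", "link": "https://example.org/a.pdf"},
    ]
}
url = build_cse_url("XRD anatase", "API_KEY", "ENGINE_ID")
links = pdf_links(sample_response)
```

The returned links would then be passed to the download step; deduplicating here avoids fetching the same PDF twice when several results point at one file.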

RAG Pipeline

The RAG (Retrieval-Augmented Generation) pipeline enables the agent to ground its responses in actual research literature. It builds a searchable vector database from extracted paper texts and retrieves the most relevant passages at query time.

  • Vector store creation: Chunks extracted texts (1200 characters, 200 overlap), embeds them with OpenAI text-embedding-3-large, L2-normalizes the vectors, and stores them in a FAISS IndexFlatIP index.
  • Semantic retrieval: Embeds the user query with the same model, searches the FAISS index for top-k similar chunks, and returns ranked results with source metadata and similarity scores.
  • Context augmentation: Retrieved chunks are injected into the agent's context, enabling cited and evidence-based responses.
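
The chunking and retrieval steps above can be sketched without any external dependencies. The example below uses the stated chunk size (1200 characters) and overlap (200), L2-normalizes vectors, and ranks by inner product, which on unit vectors equals cosine similarity. It is a minimal sketch under assumptions: a brute-force Python search stands in for the FAISS IndexFlatIP index, the toy 2-D vectors stand in for OpenAI embeddings, and all function names are hypothetical.

```python
import math


def chunk_text(text, size=1200, overlap=200):
    """Split text into fixed-size character chunks that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop before a trailing chunk that would contain only overlap text.
    stop = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, stop, step)]


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def top_k(query_vector, index_vectors, k=3):
    """Return [(position, score)] for the k highest inner products."""
    scores = [
        (pos, sum(q * v for q, v in zip(query_vector, vec)))
        for pos, vec in enumerate(index_vectors)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy vectors standing in for chunk embeddings.
index = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
hits = top_k(l2_normalize([2.0, 1.0]), index, k=2)
```

Each returned position can be mapped back to its chunk and source metadata, which is how retrieved passages can carry citations into the agent's context.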

Tech Stack

CrystaLenz is built on a stack spanning backend agent orchestration, scientific computing, and a web frontend:

  • Backend: Python, Starlette, Uvicorn, Pydantic, Google ADK, google-genai.
  • XRD analysis: NumPy, SciPy, lmfit, pymatgen, mp-api.
  • Visualization: Plotly + Kaleido for interactive and static plot export.
  • RAG: FAISS (faiss-cpu), OpenAI Embeddings (text-embedding-3-large), NumPy.
  • Research mining: requests, PyMuPDF (fitz) for PDF text extraction.
  • Frontend: Next.js, React, SWR, Tailwind CSS.