Created at 1pm, Mar 28
Ms-RAGScience
Comprehensive Lipidomic Automation Workflow using Large Language Models
GYUDF50xSqsUrjDQ1P_HY-e9vrmn8YkDE3bONjZ4pag
File Type
PDF
Entry Count
148
Embed. Model
jina_embeddings_v2_base_en
Index Type
hnsw

Connor Beveridge1,#, Sanjay Iyer1,#, Caitlin E. Randolph1,#, Matthew Muhoberac1, Palak Manchanda1, Amy C. Clingenpeel2, Shane Tichy3, Gaurav Chopra1,4
1- Department of Chemistry, 560 Oval Drive, Purdue University, West Lafayette, IN, 47907; 2- ExxonMobil Technology and Engineering Company, Annandale, NJ, 08801; 3- Agilent Technologies Inc., Santa Clara, CA 95051; 4- Department of Computer Science (by courtesy), Purdue University, West Lafayette, IN, 47907
# These authors contributed equally to this work
Corresponding Author: gchopra@purdue.edu
Abstract
Profiling the lipidome of biological systems generates large amounts of data, which makes manual annotation and interpretation time consuming and challenging. Moreover, the vast chemical and structural diversity of the lipidome, compounded by structural isomers, further complicates annotation. Although several commercial and open-source software packages for targeted lipid identification exist, they lack automated method generation workflows and integration with existing statistical and bioinformatics tools. We have developed the Comprehensive Lipidomic Automated Workflow (CLAW) platform with an integrated workflow for parsing, detailed statistical analysis, and lipid annotations based on custom multiple reaction monitoring (MRM) precursor and product ion pair transitions. CLAW is developed with several modules, including the ability to identify carbon-carbon double bond position(s) in unsaturated lipids when combined with ozone electrospray ionization (OzESI)-MRM methodology.1,2 To demonstrate the utility of the automated workflow in CLAW, large-scale lipidomics data were collected with traditional and OzESI-MRM profiling on biological and non-biological samples. Specifically, a total of 1497 transitions organized into 10 MRM-based mass spectrometry methods were used to profile lipid droplets isolated from different brain regions of 18–24 month-old Alzheimer’s disease mice and age-matched wild-type controls. Additionally, triacylglycerol (TG) profiles with carbon-carbon double bond specificity were generated from canola oil samples using OzESI-MRM profiling. We also developed an integrated language user interface with large language models using artificially intelligent (AI) agents that permits users to interact with the CLAW platform using a chatbot terminal to perform statistical and bioinformatic analyses. We envision the CLAW pipeline being used in high-throughput lipid structural identification tasks, helping users generate automated lipidomics workflows ranging from data acquisition to AI agent-based bioinformatic analysis.

To demonstrate the capabilities of CLAW, we initially used the developed pipeline on biological samples from four distinct mouse brain regions. In the first example, traditional MRM-profiling of roughly 1500 individual lipid species was conducted on lipid droplets (LDs) isolated from specific brain regions of 5xFAD and WT mice. The data show clear lipidome distinctions among LDs obtained from aged and AD-diseased brains, indicating that the LDs related to aging and to AD are not the same. Furthermore, distinct lipid signatures for the LDs isolated from the hippocampus, cortex, cerebellum, and diencephalon regions were observed.
id: 23850c73d9e6085d7c6118680144e619 - page: 16
In the second example, an online LC-OzESI-MRM method previously developed in-house was used to examine TGs with C=C specificity from several samples of canola oil taken at various stages of refinement. TGs profiled using OzESI-MRMs revealed little to no effect of the refinement process on TG isomer composition. While this is not surprising, the utilization of OzESI-MRMs in conjunction with CLAW successfully resolved, identified, and relatively quantified isomeric populations of TG molecular species that would otherwise remain unidentified using conventional workflows.
id: 5964a706cd0495107a886f0b21b50309 - page: 16
Future work aims to extend CLAW's worklist generator to other acquisition software beyond MassHunter. Specifically, the parsing and annotation will be extended beyond raw data files from Agilent instruments to other types of mass spectrometers. In addition, the integration of AI agents as co-pilots and agent-to-agent communications will be developed for robust and responsible planning and execution of automated experiments utilizing the LLMs and tools framework outlined in this work. We also aim to enhance our use of the edgeR methodology by incorporating multi-factor experimental designs and evaluating both trended and tagwise dispersion estimates, as opposed to solely using common dispersion, and to compare the likelihood ratio test with the quasi-likelihood (QL) F-test. Software support will also be developed for other types of instrument modifications. Specifically, in addition to the OzESI module, new modules are being developed to support lipid identification combined with chemical conjugation techniques.
Code Availability
id: 8103bcba7236fd08a302854b7a97f126 - page: 16
com/chopralab/CLAW.git
Acknowledgements
id: 88f7dd0be884b2c793fc1792ccd0063a - page: 16
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "GYUDF50xSqsUrjDQ1P_HY-e9vrmn8YkDE3bONjZ4pag", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "GYUDF50xSqsUrjDQ1P_HY-e9vrmn8YkDE3bONjZ4pag", "level": 2}'
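The search request above hard-codes its JSON body, so a query containing a double quote or backslash would produce an invalid payload. The following is a minimal sketch, not part of the Dria API, of a hypothetical helper (`build_search_body`, a name introduced here for illustration) that assembles the same body from a query string and an optional `top_n`. It assumes the query contains no control characters or newlines; only double quotes and backslashes are escaped.

```shell
# Contract ID copied from the entry metadata above.
CONTRACT_ID="GYUDF50xSqsUrjDQ1P_HY-e9vrmn8YkDE3bONjZ4pag"

# Hypothetical helper: print the JSON body for the /hnsw/search request.
#   $1 = query text, $2 = top_n (defaults to 10, as in the example above)
build_search_body() {
  # Escape backslashes first, then double quotes, so the query is a valid JSON string.
  query=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  top_n=${2:-10}
  printf '{"rerank": true, "top_n": %s, "contract_id": "%s", "query": "%s"}' \
    "$top_n" "$CONTRACT_ID" "$query"
}

# Build a body for a question about this entry and print it for inspection.
body=$(build_search_body "Which mouse brain regions were profiled with CLAW?" 5)
printf '%s\n' "$body"

# To send it, pass the body to the search call shown above:
# curl -X POST "https://search.dria.co/hnsw/search" \
#   -H "x-api-key: <YOUR_API_KEY>" \
#   -H "Content-Type: application/json" \
#   -d "$body"
```

Printing the body before sending it makes it easy to check the payload without spending an API call; the curl invocation is left commented out for that reason.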