Coder • Researcher • Math-Enthusiast • Artist
Hello! I am a Ph.D. student at Carnegie Mellon University in the
Machine Learning Department. I conduct research on how machine
learning and generative AI can support personalization in
education. I am currently advised by
Dr. Zachary Lipton.
Previously, I worked on AI safety and large language model
projects through internships at Microsoft Research (advised by
Dr. Adam Kalai
&
Dr. Rosa Arriaga), Indico Data (advised by
Madison May), and The MITRE Corporation (advised by
Dr. John Henderson).
I completed my undergraduate degree at
Olin College of Engineering
majoring in
Engineering: Computing. There, I worked as a computing researcher in Olin's
Microbiology and Bioinformatics lab advised by
Dr. Jean Huang, in Olin's Satellite + Spectrum Technology & Policy group
advised by
Dr. Whitney Lohmeyer, and on a senior capstone research project advised by
Fidelity Center for Applied Technology.
Research Interests
Probabilistic NLP Models
Large language models trained at scale show emergent intelligent behavior, such as coherent and grammatical structures, cultural knowledge, and abstract reasoning capabilities.
- @ Microsoft Research, I used GPT-3 and Turing-NLG to simulate distributions of responses to psychology experiments.
Model Robustness
Adversarial attacks may manipulate the behavior of AI systems to serve a malicious end goal.
- @ The MITRE Corporation, I prototyped a Docker-containerized adversarial attack testing platform and populated a public information resource.
- @ MITRE NLP Lab, I supported research into practical attacks on machine translation using paraphrase.
Cross-Source Information Extraction
Extracting information from documents requires the ability to link events, entities and associated relations across multiple sources.
- @ Indico Data R&D, I worked on deep learning NLP and CV approaches to PDF information extraction.
- @ Olin Satellite Lab, I consolidated multiple possibly contradictory data sources when scraping the FCC's international filings database.
Event Sequence Modeling
With language, voice, and time-series data, each data item depends on the items that come before or after it.
- @ Olin Microbiology Lab, I characterized time-series data from perturbed and recovering microbial communities using methods from compositional data analysis.
- @ Fidelity R&D, I analyzed distributions of cryptocurrency technical trading indicators over time.
Experience
Undergraduate Researcher @ Olin Bioinformatics & Microbiology Lab
January 2021 - January 2023, part-time
Advised by Professor Jean Huang
- Led research on analyzing composition shifts in time-series of cultured, perturbed microbiomes.
- Conducted literature review to find, apply, and analyze limitations of Random Matrix Theory, Compositional Data Analysis, and network analysis approaches.
- Presented poster at Northeastern Microbiologists: Physiology, Ecology, and Taxonomy (NEMPET).
- Led project on cleaning and interpreting 2D Fourier analyses of bacterial surface images to identify the pattern and shape of surface proteins.
Undergraduate Researcher @ Olin Satellite + Spectrum Technology & Policy Group
September 2021 - Present, part-time
Advised by Professor Whitney Lohmeyer
- Worked on undergraduate and industry research team to identify factors driving value in 5G spectrum auctions.
- Applied statistical tests (ANOVA) and analyzed auction context and mechanics.
- Presented session at Research Conference on Communication, Information, and Internet Policy (TPRC).
- Led project to automate web-scraping, PDF-text extraction, and data cleaning of satellite filings from the FCC's International Bureau Filing System, creating data sets to support two new analysis projects and generate yearly review of satellite filings.
Senior Engineering Capstone @ Fidelity Center for Applied Technology
September 2022 - Present, part-time
- Worked on undergraduate and industry research team to build a robust cryptocurrency algorithmic trading analysis and backtesting library.
- Led research into technical trading strategies, indicators, and evaluation methods.
- Designed visualizations to compare distributions of strategy performance.
- Supported 100% test coverage of backtesting library.
Undergrad Research Intern @ Microsoft Research
May - August 2022
Advised by Dr. Adam Kalai (Microsoft Research) and Professor Rosa I. Arriaga (Georgia Institute of Technology)
- Led research on using large language models (GPT-3, Turing-NLG) to simulate demographically-aligned distributions of human behavior on behavioral economics, psycholinguistics, and social psychology experiments, resulting in paper (preprint on arXiv).
- Designed zero-shot prompt methodology and compared predicted distributions to literature on the Ultimatum Game, Garden Path sentence comprehension, and the Milgram Shock Experiment.
- Designed and ran novel alternatively worded prompts to mitigate risk of models regurgitating training data.
- Working with cognitive psychology and behavioral language model researchers Professor Rosa I. Arriaga (Georgia Institute of Technology) and Professor Michael Kearns (University of Pennsylvania) to evaluate limitations of the method before submitting to a top-tier journal.
SWE Intern @ Indico Data Solutions
May - August 2021
Full-stack software engineering intern advised by Madison May (Co-founder and Machine Learning Architect)
- Improved deep-learning document extraction capabilities.
- As full-stack software engineer supporting a team of four UXD and product interns, prototyped and user-tested a novel high-fidelity React.js GUI for predicting and correcting groups of text extractions.
- In independent research project with R&D team, adapted object detection Faster R-CNN model to classify handwriting on business documents and incorporated methods for alternate pre-training, multi-label tasks, and small object detection.
- Both features were highlighted among the top five features of Indico's fifth major release.
SWE Intern @ The MITRE Corporation
September - December 2020
Software engineering intern working on Practical Attacks on Machine Translation using Paraphrase, advised by Dr. John Henderson (Principal Investigator)
- Researched new methods to exploit vulnerabilities in natural language machine learning systems.
- Revived and adapted an academic lab's research code for generating a paraphrase database with the bilingual pivoting technique: debugged it in a new environment, incorporated newer software packages, ran timing experiments to determine hardware needs and Hadoop configuration for a resource-intensive computation on 4x more data, and modified the code to run faster for our specific use case.
- Augmented dataset with 900 million segments of parallel text, performed word-alignment, and implemented methods for scrubbing poorly aligned segments.
- Used logistic regression for classification on an imbalanced dataset and improved performance through feature-engineering ablation studies.
SWE Intern @ Cumulus Digital Systems
May - August 2020
Software engineering intern on the backend team
- Developed externally facing REST API to let Cumulus's clients interface with Cumulus's system directly.
- Created an Amazon Web Services SNS, Lambda, and DynamoDB webhooks system to handle real-time events.
- Implemented secure API endpoints using Serverless microservice, REST API, Swagger, and AWS CloudFront.
- Implemented request and response validation, automated documentation, and API documentation page.
SWE Intern @ The MITRE Corporation
June - August 2019
Software engineering intern on the Secure Assured Intelligent Learning Systems (SAILS) federally funded research project
- Worked on initial proof-of-concept platform to systematically benchmark machine learning models' security vulnerabilities against adversarial attacks.
- Built gRPC interface for communication between Docker-containerized attacks and models.
- Improved speed of sending batches of images from 17 minutes to 17 seconds.
- Conducted literature review and consolidated information on adversarial attacks and defenses to populate a public education resource.
- Prototype work secured future funding: project now popularly known as Adversarial Threat Landscape for Artificial-Intelligence Systems (MITRE ATLAS).
SWE Intern @ Boston University
July - August 2018
High school researcher in the Software & Application Innovation Lab, advised by Lucy Qin (PhD candidate), Kinan Dak Albab (PhD candidate), and Dr. Andrei Lapets (Principal Investigator)
- Evaluated secure multi-party computation JavaScript library (JIFF) on sorting and set-intersection algorithms.
- Wrote demos with different algorithm implementations for data-oblivious sorting and set-intersection tasks, calculated throughput and latency, and implemented benchmark tests.
- Contributed to the development of new functionalities that performed 92-95% faster than the existing JIFF functions on common use-cases.
- Presented poster at the Greater Boston Research Opportunities for Women (GROW) research conference.
Peer-Reviewed Publications
Using large language models to simulate multiple humans and replicate
human subject studies
G. Aher, R. I. Arriaga, and A. T. Kalai.
ICML 2023 (Oral).
Evaluating the FCC's $10 Billion Gamble: Successfully Accelerating
Access to Spectrum in Auction 107
G. Aher, P. Post, P. Boyalakuntla, G. Miner, L. Heinrich, Y. Mao,
J. A. Musey, W. Lohmeyer.
Journal of Information Policy (JIP 2023).
Analysis of Geostationary Federal Communication Commission Satellite
Applications from 2000 to 2022
P. Post, K. Fleming, K. Canavan, S. Cho, G. Aher, W. Lohmeyer.
Journal of Spacecraft and Rockets (2023).
Posters
What Factors Affect Microbial Community Composition?
Northeastern Microbiologists: Physiology, Ecology, and Taxonomy (NEMPET
2021)
SOARing with Drones in Education
Massachusetts Computer Using Educators (MassCUE 2018)
Refining Private Set Intersection Under Secure Multi-Party
Computation
Boston University, Greater Boston Research Opportunities for Women
(GROW 2018)
Artificial Intelligence, Chatbots, and Amazon Web Services
International Society for Technology in Education (ISTE 2018)
Projects
React.js + DigitalOcean + SQLite + Auth0 Football Pick-Em' site used by 40+ active weekly users
Fullstack React Development
Algorithms and implementations for small-world (local clustering) and scale-free (hubs) graphs
Generating Realistic Graphs
Deep learning object-detection trials on pre-training, multi-label, small & imbalanced targets
Faster R-CNN for Handwriting Detection
Constant time querying, compressing huge index numbers, and bypassing the curse of global updates
Data Structures for Large Scale Information Retrieval
Characterizing repeating protein patterns in bacterial images with the 2D Fourier Transform
Fourier Transform Detective Story
Teaching, Leadership, and Academic Service
ENGR3599A-SL Olin College (Instructor Student-Led Course, Spring 2023): Advanced Algorithms
MTH2110 Olin College (Teaching Assistant Head Grader, Fall 2022): Discrete Mathematics
GirlsWhoCode Olin College (Branch Leader, Fall 2022)
Data Science and ML Lunch-and-Learn Olin College (Organizer & Presenter, Fall 2021)
ENGR2510 Olin College (Teaching Assistant, Fall 2022)
Einstein's Workshop Coding & STEM Classes (Teaching Assistant, 2017 - 2019)
Shishu Bharati Indian Language K-8 (Teaching Assistant, 2015 - 2019)
FIRST Lego League Robotics (Mentor, Fall 2018)
Some recent art...
My hobbies include drawing, dance, long-distance running, and playing four instruments :)