Human Technology Interaction

DS4101 | Monsoon 2025


Adaptive Personalized Fitness Assistant

Amog Rao, Shivangi Agarwal, Zoya Ghoshal

CODE PDF

Maintaining fitness consistency is challenging due to the high cost of personal trainers and the static nature of existing applications, often resulting in unused memberships and injury risks. To address this gap, we present Dumbl, a market-ready solution built on our research system FlexAI: a real-time, multimodal adaptive framework that integrates computer vision, bio-sensing, and Large Language Models (LLMs) to provide instant, personalized interventions. Informed by a formative study of 90 participants, our methodology employs MediaPipe for pose estimation, a fine-tuned ResNet-18 model for pain classification (achieving 79.3% accuracy), and audio analysis for fatigue detection. An integrated LLM synthesizes these physiological inputs to generate context-aware, tone-adaptive feedback. A comparative user study with 20 participants revealed that the system significantly reduced negative emotional states, with users reporting lower exhaustion (p=0.0103) and discouragement (p=0.036) than a control group. Participants also experienced significantly less boredom (p=0.0258) and higher enjoyment, demonstrating the system's ability to simulate expert human coaching through dynamic physiological adaptation.
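
As a rough sketch of this fusion step (the function names, signal ranges, and thresholds below are our illustrative assumptions, not the authors' code), the per-rep classifier outputs could be combined into a tone-adaptive LLM prompt along these lines:

```python
# Minimal sketch (assumed names/thresholds): fusing per-rep signals into a
# tone-adaptive coaching prompt, in the spirit of the FlexAI pipeline.
from dataclasses import dataclass

@dataclass
class RepSignals:
    pain_prob: float      # output of the pain classifier (e.g., ResNet-18)
    fatigue_score: float  # output of the audio fatigue detector, 0..1
    form_error_deg: float # joint-angle deviation from the reference pose

def build_coaching_prompt(s: RepSignals) -> str:
    """Synthesize physiological inputs into a context-aware LLM prompt."""
    if s.pain_prob > 0.7:                      # threshold is illustrative
        tone, advice = "calm and cautious", "suggest stopping the set"
    elif s.fatigue_score > 0.6:
        tone, advice = "encouraging", "suggest a short rest"
    elif s.form_error_deg > 15.0:
        tone, advice = "instructive", "correct the posture deviation"
    else:
        tone, advice = "upbeat", "reinforce the current form"
    return (f"You are a fitness coach. Tone: {tone}. "
            f"Signals: pain={s.pain_prob:.2f}, fatigue={s.fatigue_score:.2f}, "
            f"form error={s.form_error_deg:.1f} deg. In one sentence, {advice}.")

print(build_coaching_prompt(RepSignals(0.1, 0.8, 6.0)))
```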


Altrus X Plaksha: Waste Management Bot

Chanakya Rao, Moksh Soni, Vaibhav Chopra

CODE PDF

This project entails the design, development, and preliminary deployment of an autonomous indoor waste-disposal robot intended for biomedical waste handling in hospital environments. Biomedical waste poses significant hygiene and safety risks, and existing manual collection systems increase the likelihood of contamination. To address this, the authors developed a robotic platform capable of identifying, collecting, and transporting waste bins with minimal human intervention. The robot integrates a 4-wheel-drive aluminium chassis, a Raspberry Pi 4 compute unit, an RPLIDAR A1M8 sensor for 2D SLAM, a webcam for bin detection via YOLOv8, and a single-degree-of-freedom lifting mechanism actuated by dual linear motors. Navigation is executed using Hector SLAM and the ROS2 NAV2 stack, enabling obstacle detection and path planning in narrow clinical corridors.
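
A minimal sketch of the bin-detection loop, assuming the ultralytics YOLOv8 API; the fine-tuned weights file named here is hypothetical, as the abstract does not publish the model details:

```python
# Illustrative sketch: webcam-based bin detection with YOLOv8 (ultralytics).
import cv2
from ultralytics import YOLO

model = YOLO("bin_detector.pt")  # hypothetical fine-tuned dustbin weights
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    for box in results[0].boxes:
        if float(box.conf) > 0.5:            # confidence gate (illustrative)
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("bin detection", frame)
    if cv2.waitKey(1) == 27:                 # Esc to quit
        break
cap.release()
```

Detected bin poses would then be handed to the NAV2 stack as navigation goals for approach and lifting.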

A structured experimental protocol was undertaken in controlled laboratory environments and subsequently at Altrus Healthcare, where the robot was tested in realistic operational settings. Trials revealed key implementation challenges, including torque differentials in ageing motors, chassis tolerance limitations, drift due to low-grip tyres, and occlusion of the LiDAR field of view by dustbins. Despite these constraints, the robot reliably executed bin recognition, lifting, and transport tasks. A qualitative survey of hospital staff indicated strong support for future robotic integration in healthcare, with staff viewing the prototype as a valuable proof of concept. The findings inform the next iteration, which will feature improved actuation, onboard waste consolidation, and enhanced autonomy.


ATLAS: Adaptive Technology for Location and Spatial Support

Nikhil Henry, Pranjal Rastogi, Tanmay Nanda

CODE PDF

Visually impaired individuals often face significant challenges navigating dynamic outdoor and unfamiliar indoor environments, frequently relying on human assistance due to the limitations of existing technology. Current solutions like GPS and beacon-based systems struggle with indoor precision and manual mapping requirements, while wearable options like smart glasses suffer from high latency and connectivity dependence. Although previous literature explores Visual Language Models (VLMs) for navigation, these often grapple with hallucinations and dangerous inaccuracies. This project addresses these gaps by developing a proactive, hybrid navigation pipeline that integrates a validation loop and an offline fallback mechanism to ensure reliability across all network conditions.

The methodology employs the Habitat Matterport3D (HM3D) dataset for embodied AI simulation and the VizWiz dataset for benchmarking Visual Question Answering (VQA). The proposed system architecture uses an "Orchestrator" to route inputs to specialized components: a VQA LLM for descriptive tasks and a Navigation LLM for path planning. We selected Gemini 2.5 Flash Lite for its multimodal capabilities and low latency, alongside a reinforcement-learning-based Semantic Exploration agent that provides instructions for finding objects in unknown environments.
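
The routing idea can be sketched as follows; a keyword heuristic stands in for the actual LLM-based Orchestrator, and the handler names are hypothetical:

```python
# Minimal sketch of the "Orchestrator" routing idea. The real system routes
# with Gemini 2.5 Flash Lite; a keyword heuristic stands in here so the
# control flow is visible. Handler names are hypothetical.
NAV_KEYWORDS = {"go", "take", "navigate", "find", "where", "route"}

def vqa_llm(image_path: str, question: str) -> str:
    return f"[VQA answer about {image_path}: {question}]"   # stub

def navigation_llm(goal: str) -> str:
    return f"[step-by-step path plan toward: {goal}]"       # stub

def orchestrate(user_input: str, image_path: str = "frame.jpg") -> str:
    """Route descriptive queries to the VQA LLM and movement goals to the
    Navigation LLM; a validation loop would then check the answer."""
    words = set(user_input.lower().split())
    if words & NAV_KEYWORDS:
        return navigation_llm(user_input)
    return vqa_llm(image_path, user_input)

print(orchestrate("What is on the table in front of me?"))
print(orchestrate("Navigate to the nearest exit"))
```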

Performance is evaluated through objective metrics such as VQA accuracy and Success weighted by Path Length (SPL), alongside subjective feedback from sighted and blindfolded individuals to assess real-world utility. Preliminary baselines indicate that smaller, efficient models can outperform larger counterparts in single-image understanding, achieving an average SPL score of 0.66 in simulations. This solution significantly enhances accessibility by providing a seamless, hallucination-resistant navigation aid capable of operating proactively in real-time.
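
For reference, SPL is the standard embodied-navigation metric: it averages per-episode success weighted by the ratio of shortest-path length to the path actually taken.

```python
# Success weighted by Path Length (SPL):
# SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
# where S_i is success (0/1), l_i the shortest-path length, p_i the path taken.
def spl(successes, shortest_lengths, path_lengths):
    terms = [s * l / max(p, l)
             for s, l, p in zip(successes, shortest_lengths, path_lengths)]
    return sum(terms) / len(terms)

# Example: two successful episodes (one with a detour) and one failure.
print(spl([1, 1, 0], [10.0, 8.0, 12.0], [12.5, 8.0, 20.0]))  # 0.60
```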


CognisAI: A Multimodal Neuroadaptive AI Tutor for Personalized Learning

Ananya Shukla, Chaitanya Modi, Satvik Bajpai

CODE PDF

Current Large Language Model (LLM) tutoring systems, while linguistically capable, fail to detect a learner's cognitive state, often leading to disengagement or cognitive overload. Unlike previous studies such as GazeTutor or NeuroChat, which rely on single modalities or intrusive sensors like EEG, CognisAI integrates non-intrusive multimodal biosensing—specifically gaze tracking, heart rate variability (HRV), and posture analysis—to create a closed-loop neuroadaptive learning environment.

The methodology involved a formative study (N=66) followed by a controlled lab experiment (N=25) to collect physiological and behavioral data. We employed computer vision algorithms to track body landmarks for posture classification and signal-processing techniques (RMSSD, SDNN) to interpret physiological stress signals in real time. These lightweight algorithms were selected to minimize latency, enabling immediate, seamless intervention without the computational overhead of complex deep learning models.
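
For reference, RMSSD and SDNN are standard HRV statistics computed from successive RR intervals; a minimal version follows (the actual system's windowing and artifact rejection may differ):

```python
# Standard HRV features from RR intervals (milliseconds).
import numpy as np

def rmssd(rr_ms: np.ndarray) -> float:
    """Root mean square of successive RR-interval differences."""
    return float(np.sqrt(np.mean(np.diff(rr_ms) ** 2)))

def sdnn(rr_ms: np.ndarray) -> float:
    """Standard deviation of RR (NN) intervals."""
    return float(np.std(rr_ms, ddof=1))

rr = np.array([812, 790, 845, 830, 801, 779, 856], dtype=float)  # toy series
print(f"RMSSD={rmssd(rr):.1f} ms, SDNN={sdnn(rr):.1f} ms")
```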

Results demonstrate the efficacy of this approach in enhancing the learning experience. Performance metrics using the NASA-TLX scale revealed a significant reduction in Frustration (Δ = 0.54) and Mental Demand (Δ = 0.49), alongside a notable increase in Perceived Performance (Δ = 0.34). These findings mark a shift from reactive text generation to proactive, state-aware pedagogical support, indicating that physiologically adaptive AI can significantly improve user satisfaction and learning outcomes.


Evaluating the Impact of Unified Customer Data Platform (CDP) Interfaces on Cognitive Load and Operational Efficiency in Healthcare

Anirudh Chauhan, Ayush Sharma, Manan Chawla, Vijeta Raghuvanshi

CODE PDF

Healthcare organizations face significant challenges due to fragmented patient data across disparate systems, leading to high staff cognitive overload and compromised decision-making. This project addressed these Human-Technology Interaction (HTI) issues by developing and evaluating a Unified Customer Data Platform (CDP) interface against an existing two-dashboard system. Our evaluation leveraged objective physiological metrics (pupil dilation, fixations) and subjective assessments (NASA-TLX).
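
For reference, the standard weighted NASA-TLX score combines six subscale ratings (0 to 100) with weights from 15 pairwise comparisons; the values below are toy numbers, not study data:

```python
# Weighted NASA-TLX: overall workload = sum(rating * weight) / 15,
# where the six weights come from 15 pairwise subscale comparisons.
def nasa_tlx(ratings: dict, weights: dict) -> float:
    assert sum(weights.values()) == 15, "pairwise weights must total 15"
    return sum(ratings[k] * weights[k] for k in ratings) / 15

ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 50}
weights = {"mental": 4, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 3, "frustration": 2}
print(f"Weighted TLX = {nasa_tlx(ratings, weights):.1f}")  # 55.0
```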

The Unified CDP (Dashboard 1) demonstrated a statistically significant and superior HTI experience. It resulted in a 53% reduction in Task Duration and a 50% reduction in visual fixations for some tasks, indicating improved efficiency and less visual effort. Subjectively, Dashboard 1 achieved up to a 69% reduction in user frustration and a 65% reduction in effort. These findings provide evidence-based guidelines for designing next-generation healthcare data platforms that prioritize data unification and human factors to enhance operational efficiency and staff wellbeing.


FocusGuard AI: Context-Aware Productivity & Workload Management System

Dipit Golechha, Krishnav Mahansaria, Vardan Vij

CODE PDF

In an era of fragmented digital workflows, traditional time-tracking solutions fail to distinguish between productive research and digital distraction because they lack contextual awareness. This project introduces FocusGuard AI, a context-aware workload manager that uses sensor fusion, combining operating-system telemetry (active window, typing cadence), computer vision (facial presence), and environmental audio analysis, to quantify user focus in real time.

To validate the solution, we conducted a controlled observational study under Track 1 (Experimentation) with N=36 undergraduate students (CS-AI and DSEB majors). We captured high-frequency telemetry at a 1 Hz sampling rate, yielding a dataset of 64,800 unique observations. The analysis validated the algorithm's ability to accurately visualize "Flow States" versus "Distraction Valleys" using a weighted impact scoring system.
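
The weighted impact scoring idea might look like the sketch below; the weights, per-app impact values, and normalization constants are our own illustrative assumptions, not the authors' calibration:

```python
# Illustrative fusion of 1 Hz samples (OS telemetry, webcam presence, audio)
# into a single focus score. All constants are hypothetical.
APP_IMPACT = {"vscode": +1.0, "chrome": +0.3, "discord": -0.8, "instagram": -1.0}

def focus_score(active_app: str, typing_cps: float,
                face_present: bool, ambient_db: float) -> float:
    """Return a focus estimate in [-1, 1] for one 1-second sample."""
    app = APP_IMPACT.get(active_app, 0.0)
    typing = min(typing_cps / 5.0, 1.0)          # normalize typing cadence
    presence = 1.0 if face_present else -0.5
    noise = -min(max(ambient_db - 50.0, 0.0) / 30.0, 1.0)
    score = 0.4 * app + 0.25 * typing + 0.25 * presence + 0.1 * noise
    return max(-1.0, min(1.0, score))

print(focus_score("vscode", 4.2, True, 42.0))      # likely a "flow" sample
print(focus_score("instagram", 0.0, False, 65.0))  # a "distraction valley"
```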

Key findings revealed distinct behavioral personas: Computer Science students primarily succumbed to communication-based distractions (e.g., Discord), while Design students favored visual media (e.g., Instagram). Furthermore, despite noted privacy reservations regarding webcam usage, 58.3% of participants expressed willingness to adopt the tool for daily or weekly use. These results demonstrate that fusing biometric presence with digital context provides a superior, granular metric for productivity compared to existing unimodal software loggers.


Haptic Feedback Assistant for Language Learning

Aman Paliwal, Angad Singh, Mudit Surana

CODE PDF

This project addresses the persistent challenge of teaching English pronunciation and prosody to non-native speakers, particularly the difficulty in perceiving and producing accurate stress, rhythm, and intonation patterns. There exists a strong need for intuitive, multimodal feedback; current language learning technologies often rely on audio-visual cues, which can cause cognitive overload and fail to provide immediate, embodied feedback. Prior studies have explored haptic-enhanced pronunciation training using specialized hardware, such as multi-motor wearables and fingertip devices, yielding notable gains in articulation accuracy and learner engagement. However, these solutions have been limited by the complexity of custom equipment and low accessibility for general learners.

In contrast, our approach leverages the built-in haptic engine of standard smartphones (iPhone 8 and above) to deliver real-time vibrotactile cues mapped to prosodic features, without requiring extra hardware. Data for evaluation is collected through app-based interventions with beginner-to-intermediate English learners in randomized control and experimental groups. Speech samples are assessed with objective metrics: Goodness of Pronunciation (GoP), Lexical Stress Ratio (LSR), and F0 Root-Mean-Square Error, alongside subjective ratings from native speakers on a Likert scale. NASA-TLX scores and user feedback further quantify cognitive load and engagement. Preliminary results show marked improvements in pronunciation accuracy and participant experience, establishing the significance of accessible haptic feedback in mobile language learning.
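
Of these objective metrics, F0 root-mean-square error is the simplest to illustrate: the RMSE between a learner's pitch contour and a time-aligned native reference, computed over voiced frames (the contours below are toy values):

```python
# F0 RMSE over voiced frames. In practice, F0 would be extracted with a
# pitch tracker and the two utterances time-aligned first.
import numpy as np

def f0_rmse(f0_learner: np.ndarray, f0_reference: np.ndarray) -> float:
    voiced = (f0_learner > 0) & (f0_reference > 0)  # skip unvoiced frames
    diff = f0_learner[voiced] - f0_reference[voiced]
    return float(np.sqrt(np.mean(diff ** 2)))

learner = np.array([0, 180, 190, 200, 0, 210], dtype=float)    # Hz per frame
reference = np.array([0, 170, 185, 215, 0, 205], dtype=float)
print(f"F0 RMSE = {f0_rmse(learner, reference):.1f} Hz")
```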


Maintaining Flow in VR Conversations: A Modality-Level Analysis of LLM Delay Feedback on Perceived Latency

Abhinav Lodha, Pratham Arora

CODE PDF

The integration of LLM-based conversational agents into VR presents a fundamental challenge: processing delays that interrupt the illusion of natural conversation and presence. Recent work by Elfleet and Chollet (2024) showed that combined multimodal feedback significantly improves presence and immersion during LLM processing delays. However, their bundled implementation of verbal, gestural, and visual feedback leaves a critical question unanswered: which individual modality is most effective?

This study compares each feedback mechanism (verbal fillers, avatar gestures, and visual cues) against a baseline condition to identify the single approach most effective at maintaining user engagement during inevitable LLM processing delays. By isolating the impact of each modality, we aim to determine which feedback strategy best preserves immersion and conversational flow in LLM-powered VR environments. The findings will offer practical, evidence-based design guidelines for future conversational VR applications.


Mitigating Information Asymmetry in Patient-Hospital Interactions

Harsh Siroya, Usman Akinyemi, Vandita Lodha

CODE PDF

Patients navigating healthcare portals currently face high cognitive load and complex navigation, resulting in near-zero contact form submissions. Previous studies indicate that while Large Language Models (LLMs) can automate inquiries, they often suffer from hallucinations, lack transparency, and rely on English-centric datasets that exclude diverse populations. Uniquely, this project addresses these gaps by deploying a multilingual (Hindi/English) Retrieval-Augmented Generation (RAG) chatbot that integrates directly with legacy systems to ensure grounded, real-time accuracy and data privacy.

The dataset was acquired by scraping and parsing the Altrus Healthcare website to construct a domain-specific knowledge base. Pre-processing involved text chunking and vector embedding generation to create a searchable semantic index. A RAG architecture was chosen over standalone LLMs to dynamically retrieve structured data (like doctor schedules), thereby minimizing hallucinations and enabling the system to provide factual, verifiable responses.
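
A minimal sketch of this retrieval step, assuming a sentence-transformers embedding model (the abstract does not name the embedding model used) and toy chunks in place of the scraped knowledge base:

```python
# Embed text chunks, build a semantic index, retrieve top matches to ground
# the LLM's answer. Model choice and chunk contents here are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "Dr. Mehta (cardiology) is available Monday and Thursday, 10am-1pm.",
    "The billing desk on the ground floor handles insurance queries.",
    "Visiting hours for general wards are 4pm to 7pm daily.",
]
index = model.encode(chunks, normalize_embeddings=True)   # (n_chunks, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                                    # cosine similarity
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks would be inserted into the LLM prompt as grounding
# context, keeping answers factual and verifiable.
print(retrieve("When can I see the heart doctor?"))
```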

Post-deployment performance metrics confirmed a response latency of 2 seconds and stable scalability for 100+ concurrent users. Functionally, the chatbot reduced the average "Time to Information" from 5 minutes to under 1 minute. These objective metrics, combined with a significant increase in form submissions, demonstrate the solution's effectiveness in lowering cognitive load and successfully converting passive website visitors into potential patients through intuitive, natural language interaction.


Physiological-driven Elderly Care AI Assistant

Abhivarya Kumar, Bharat Jain, Gaurav Agarwal

CODE PDF

Problem: The elderly population is growing, leading to gaps in continuous monitoring, symptom reporting, medication adherence, and emotional support. Current tools are not intelligent or flexible enough to handle real-world nursing and elderly-care needs.

Goal: Develop an AI-driven framework using Large Language Models (LLMs) to support nurses and assist elderly individuals through intelligent, conversational, and context-aware guidance.

Potential Applications: 24/7 conversational support for elderly patients, medication and routine reminders, and early detection of health risks.

Impact: Improved care quality, reduced nurse workload, enhanced patient safety, and more scalable elderly-care systems.

Productivity, Trust and Awareness across Levels of AI Autonomy

Jiya Agrawal, Liza Wahi, Madhvendra Singh

CODE PDF

As AI systems rapidly transition from passive tools to semi-autonomous collaborators, understanding how different levels of autonomy affect human trust, workload, and decision-making has become an essential research question. While prior studies have examined concepts such as automation bias, overreliance, and the effects of decision support systems, most focus on fixed-autonomy settings or narrow domains. There is limited empirical work that compares graduated levels of AI autonomy within the same task and evaluates how these levels shape human behaviour, perceived workload, and trust.

This project addresses that gap by designing a controlled user study with four autonomy conditions (L1–L4), ranging from human-led to AI-led problem solving. Data is collected through participant interactions with a custom problem-solving interface, supported by logging of task performance, in-task behaviours, and post-task survey responses. Pre-processing includes normalization, timestamp alignment, and interaction-pattern extraction, techniques chosen for their suitability for datasets where behavioural granularity and temporal structure matter.
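
A hedged sketch of these pre-processing steps (the column names and log format are hypothetical, not the study's actual schema):

```python
# Align interaction logs to a common per-participant clock and z-normalize
# response times before interaction-pattern extraction.
import pandas as pd

log = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2"],
    "timestamp": ["2025-10-01 10:00:03", "2025-10-01 10:00:41",
                  "2025-10-01 11:12:09", "2025-10-01 11:13:30"],
    "response_time_s": [12.0, 30.0, 8.0, 20.0],
})
log["timestamp"] = pd.to_datetime(log["timestamp"])
# Timestamp alignment: seconds since each participant's first event.
log["t_rel"] = (log["timestamp"]
                - log.groupby("participant")["timestamp"].transform("min")
                ).dt.total_seconds()
# Normalization: per-participant z-scores of response times.
g = log.groupby("participant")["response_time_s"]
log["rt_z"] = (log["response_time_s"] - g.transform("mean")) / g.transform("std")
print(log)
```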

The study uses NASA-TLX (for workload), Likert-scale trust measures, and objective task metrics (time, errors, dependency patterns) to quantify outcomes. The analysis aims to determine how autonomy shifts influence the balance between user confidence, cognitive effort, and task performance, ultimately informing the design of safer, more reliable, and user-aware AI systems.


Studying the Effect of Modality and Topic on an Individual's Learning Ease and Performance Using Physiological Biomarkers

Shruti Laddha, Suhani Jain, Utkarsh Agarwal

CODE PDF

Conventional research on learning styles relies largely on self-reported preferences and overlooks the physiological effort associated with learning. This study investigates whether instructional modality influences objective performance and the ease of learning, using multimodal physiological biomarkers to quantify cognitive load. Nine participants engaged with three instructional modalities—reading, audio, and kinesthetic—across three cell-biology topics of comparable difficulty. Eye-tracking and ECG data were collected continuously, alongside subjective workload (NASA-TLX) and satisfaction ratings. A composite metric, Learning Ease, was developed by standardizing subjective workload and satisfaction with physiological markers (Δ heart rate and Δ pupil diameter). Repeated measures ANOVA and non-parametric tests showed no significant effect of modality on either objective performance or Learning Ease. However, Learning Ease was positively correlated with performance (R² = 0.58), indicating that reduced cognitive strain supports better outcomes. A significant interaction was observed between modality and topic order (F(8,18) = 2.87, p = 0.03), demonstrating that modality effectiveness depends on context, such as fatigue. These findings shift the focus from static learning styles toward adaptive learning strategies that align modality with the learner’s cognitive state and session phase.
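
A sketch of how such a composite could be formed by standardizing the four inputs; the sign convention and equal weighting below are our assumptions, as the paper defines the exact combination:

```python
# Learning Ease composite: z-score each component, then combine so that
# higher satisfaction raises the score while higher workload and larger
# physiological deltas (heart rate, pupil diameter) lower it.
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std(ddof=1)

def learning_ease(tlx, satisfaction, d_hr, d_pupil):
    tlx, sat, hr, pup = map(np.asarray, (tlx, satisfaction, d_hr, d_pupil))
    return zscore(sat) - zscore(tlx) - zscore(hr) - zscore(pup)

ease = learning_ease(tlx=[55, 40, 70], satisfaction=[3, 4, 2],
                     d_hr=[5.0, 2.0, 9.0], d_pupil=[0.4, 0.1, 0.6])
print(np.round(ease, 2))  # one Learning Ease value per session
```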


Trust-Oriented Design for Real-Time AI Voice Agents

Agaaz Singhal, Vedant Singh

CODE PDF

Voice-based AI agents frequently exhibit response delays and interaction breakdowns that degrade user trust and perceived system reliability. While prior work has characterized how users react to discrete categories of voice assistant failures, the field lacks integrated, quantitative frameworks that link failure characteristics to measurable trust dynamics. Building on an established dataset of real-world voice assistant failures and trust responses across twelve failure types, we extend this foundation through a focused empirical study introducing two contributions: a severity index that captures the graded impact of different failure manifestations, and a set of measurable trust metrics that quantify changes in user confidence, perceived reliability, and behavioural repair strategies.

Our framework incorporates assessments of failure severity, task disruption, and post-error behaviour alongside trust indicators such as perceived ability, benevolence, and integrity. By aligning these metrics with documented failure modes (including misunderstanding, overcapture, and response-execution errors), we provide a more granular model of how trust attenuates and recovers across interaction contexts. The resulting analysis highlights systematic relationships between failure severity, task type, and trust resilience.

We hope to advance the measurement of trust in voice-based AI systems by integrating severity-based modelling with empirically grounded failure taxonomies. The findings inform the development of adaptive timing, clarification, and repair strategies for high-stakes domains such as healthcare and education, supporting the design of trustworthy and resilient voice assistants.