| Abstract | We propose a smart home health care system, toward the realization of smart cities, that meets elderly people's need for continued care. An elderly person should be monitored constantly, especially if he or she has previously been diagnosed with health-related problems. In the proposed system, a patient's condition is monitored using multimodal inputs, specifically speech and video. Video cameras and microphones installed in the smart home continuously capture the patient's video and speech and transmit them to a dedicated cloud. In the cloud, the data are processed, and a classification score is produced indicating whether the patient's condition is normal, tense, or in pain. Depending on the person's condition, doctors can issue prescriptions via audio, video, or messaging services, or caregivers can rush to the location in an emergency. For data processing on the server, we extract robust, low-dimensional, discriminative features from voice and video frames. Experimental results show that combining the two modalities classifies the patient's condition more accurately than using either modality alone. |
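To illustrate how two modalities can be combined into a single decision over the three conditions (normal, tense, pain), the following is a minimal sketch of score-level fusion. It is an assumption for illustration only: the abstract does not specify the fusion rule, so the weighted-sum rule, the function name `fuse_scores`, and the weight parameter `w_speech` are hypothetical and are not the authors' method.

```python
from typing import List, Sequence, Tuple

# Hypothetical class order for the three patient conditions named in the abstract.
CLASSES = ("normal", "tense", "pain")


def fuse_scores(
    speech: Sequence[float],
    video: Sequence[float],
    w_speech: float = 0.5,
) -> Tuple[str, List[float]]:
    """Weighted-sum (score-level) fusion of per-class scores from two modalities.

    Hypothetical sketch: the weighting scheme is an illustrative assumption,
    not the fusion method used in the paper.
    """
    if len(speech) != len(CLASSES) or len(video) != len(CLASSES):
        raise ValueError("each modality must provide one score per class")
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    # Combine the two score vectors class by class.
    fused = [w_speech * s + (1.0 - w_speech) * v for s, v in zip(speech, video)]
    # Pick the class with the highest fused score.
    best = max(range(len(CLASSES)), key=lambda i: fused[i])
    return CLASSES[best], fused


if __name__ == "__main__":
    # Speech alone favors "normal", video alone favors "pain".
    label, fused = fuse_scores([0.4, 0.35, 0.25], [0.1, 0.3, 0.6])
    print(label, fused)
```

With equal weights, the example scores fuse to roughly [0.25, 0.325, 0.425], so the combined decision is "pain" even though the speech scores alone would have chosen "normal". This shows one way a second modality can correct a single-modality error; in practice the weight would be tuned on validation data.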