Background: Sensors embedded in smartphones allow passive, momentary quantification of people's states in the context of their daily lives in real time. Such data could help reduce the burden of ecological momentary assessments and increase their utility in clinical assessments. Despite existing research on using passive sensor data to assess participants' moment-to-moment states and activity levels, only limited research has investigated temporally linking sensor assessment and self-reported assessment to further integrate the 2 methodologies.

Objective: We investigated whether sparse movement-related sensor data can be used to train machine learning models that infer individuals' states of work-related rumination, fatigue, mood, arousal, life engagement, and sleep quality. Sensor data were collected only while the participants filled out the questionnaires on their smartphones.

Methods: We trained personalized machine learning models on data from employees (N=158) who participated in a 3-week ecological momentary assessment study.

Results: The results suggested that passive smartphone sensor data paired with personalized machine learning models can be used to infer individuals' self-reported states at later measurement occasions. The mean R² was approximately 0.31 (SD 0.29), and roughly three-quarters of the participants (119/158, 75.3%) had an R² of ≥0.18. Accuracy was only slightly attenuated compared with earlier studies and ranged from 38.41% to 51.38%.

Conclusions: Personalized machine learning models and temporally linked passive sensing data can account for a sizable proportion of the variance in individuals' daily self-reported states. Further research is needed to investigate factors that affect the accuracy and reliability of the inference.