Projects
Public Health
The proliferation of social media – such as Twitter, Facebook, blogs, and Web forums – has created an unprecedented, continuous stream of messages containing the thoughts, opinions, and beliefs of millions of people. Can we transform this raw data into insights about public health? Our recent work has shown promising results in mining online data to monitor disease symptoms and estimate population health, suggesting that this new data source can enhance our understanding of the relationships among health, behavior, personality, and environment.
Publications
- Discovering and Controlling for Latent Confounds in Text Classification Using Adversarial Domain Adaptation, SDM 2019
- Forecasting the presence and intensity of hostility on Instagram using linguistic and social features, ICWSM 2018
- Robust Text Classification under Confounding Shift, JAIR 2018
- Learning from noisy label proportions for classifying online social data, SNAM 2018
- Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions, ICDM 2017
- Co-training for Demographic Classification Using Deep Learning from Label Proportions, ICDM 2017
- Controlling for Unobserved Confounds in Classification Using Correlational Constraints, ICWSM 2017
- Identifying leading indicators of product recalls from online reviews using positive unlabeled learning and domain adaptation, ICWSM 2017
- Robust Text Classification in the Presence of Confounding Bias, AAAI 2016
- Reducing confounding bias in observational studies that use text classification, OSSM 2016
- A demographic and sentiment analysis of e-cigarette messages on Twitter, CHS 2015
- Using Matched Samples to Estimate the Effects of Exercise on Mental Health from Twitter, AAAI 2015
- Reducing Sampling Bias in Social Media Data for County Health Inference, JSM Proceedings
- Estimating County Health Statistics with Twitter, CHI 2014
- Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages, Language Resources and Evaluation, 2013
- Detecting influenza epidemics by analyzing Twitter messages, arXiv:1007.4747v1 2010
- Towards detecting influenza epidemics by analyzing Twitter messages, KDD 2010 Workshop
Crisis Informatics
During disasters such as hurricanes, first responders need situational awareness to make the right decisions in a quickly changing environment. People on the ground often post online messages that provide actionable information, but these messages can be difficult to find among all the noise. Can we monitor social media during a natural disaster or other crisis to inform first responders? Can we discern the most vulnerable populations based on their attitudes before, during, and after the disaster?
Publications
- Tweedr: Mining Twitter to Inform Disaster Response, ISCRAM 2014
- A demographic analysis of online sentiment during Hurricane Irene, HLT/NAACL 2012 Workshop
User Attribute Inference
Using social media to inform public health and disaster relief requires knowledge of user-level attributes, such as location, age, and gender, to produce accurate information. Can we infer such attributes from users' linguistic patterns? If so, what are the privacy implications of this technology?
Publications
- When do Words Matter? Understanding the Impact of Lexical Choice on Audience Perception using Individual Treatment Effect Estimation, AAAI 2019
- Are Words Commensurate with Actions? Quantifying Commitment to a Cause from Online Public Messaging, ICDM 2017
- Using online social networks to measure consumers’ brand perception
- Domain Adaptation for Learning from Label Proportions Using Self-Training, IJCAI 2016
- Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data, JAIR
- Mining brand perceptions from Twitter social networks, Marketing Science
- Training a text classifier with a single word using Twitter Lists and domain adaptation, Social Network Analysis and Mining
- Finding truth in cause-related advertising: A lexical analysis of brands' health, environment, and social justice communications on Twitter, Journal of Values-Based Leadership
- Inferring latent attributes of Twitter users with label regularization, HLT/NAACL 2015
- Predicting the Demographics of Twitter Users from Website Traffic Data, AAAI 2015
- Using county demographics to infer attributes of Twitter users, ACL Joint Workshop on Social Dynamics and Personal Attributes in Social Media
- Inferring the Origin Locations of Tweets with Quantitative Confidence, CSCW 2014
- Too Neurotic, Not too Friendly: Structured Personality Classification on Textual Data, ICWSM 2013 Workshop
Information Extraction
Most of the world’s information is intended to be read by humans, not computers. Information extraction transforms unstructured documents into structured representations, thereby allowing knowledge discovery applications to provide insights from large text collections. We explore statistical approaches to named-entity recognition, coreference resolution, and relation extraction.
Publications
- An entity-based model for coreference resolution, ICDM 2009
- First-Order Probabilistic Models for Coreference Resolution, HLT/NAACL 2007
- Canonicalization of Database Records using Adaptive Similarity Measures, KDD 2007
- Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function, AAAI 2007 Workshop
- Learning field compatibilities to extract database records from unstructured text, EMNLP 2006
- Integrating probabilistic extraction models and data mining to discover relations and patterns in text, HLT/NAACL 2006
- Joint deduplication of multiple record types in relational data, CIKM 2005
- Extracting social networks and contact information from email and the Web, CEAS 2004
- Dependency tree kernels for relation extraction, ACL 2004
- Confidence estimation for information extraction, HLT 2004
Active Learning
Most machine learning methods require costly human annotation efforts for training and validation. Can we train machine learning models more efficiently? We explore several interactive frameworks that help machine learning algorithms learn faster from human annotation, particularly for structured prediction problems.
Publications
- Anytime Active Learning, AAAI 2014
- Towards Anytime Active Learning: Interrupting Experts to Reduce Annotation Costs, KDD 2013 Workshop
- Corrective Feedback and Persistent Learning for Information Extraction, Artificial Intelligence 2006
- Reducing labeling effort for structured prediction tasks, AAAI 2005
- Interactive information extraction with constrained conditional random fields, AAAI 2004
Scalable Machine Learning
Most sophisticated structured prediction algorithms were not designed to run at Web scale. We explore accurate approximations that allow us to use rich data representations while scaling up to millions of variables.
Publications
- SampleRank: Training factor graphs with atomic gradients, ICML 2011
- SampleRank: Learning preferences from atomic gradients, NIPS 2010 Workshop
- Learning and inference in weighted logic with application to natural language processing, PhD Thesis (UMass), 2008
- Sparse Message Passing Algorithms for Weighted Maximum Satisfiability, NESCAI 2007
- Tractable Learning and Inference with High-Order Representations, ICML 2006 Workshop
- Practical Markov logic containing first-order quantifiers with application to identity uncertainty, HLT/NAACL 2006 Workshop
- Learning clusterwise similarity with first-order features, NIPS 2005 Workshop