The proliferation of social media – such as Twitter, Facebook, blogs, and Web forums – has created an unprecedented, continuous stream of messages containing the thoughts, opinions, and beliefs of millions of people. Can we transform this raw data into insights about public health? Our recent work has shown promising results in mining online data to monitor disease symptoms and estimate population health, suggesting that this new data source can enhance our understanding of the relationships among health, behavior, personality, and environment.
- Reducing Sampling Bias in Social Media Data for County Health Inference, JSM Proceedings
- Estimating County Health Statistics with Twitter, CHI 2014
- Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages, Language Resources and Evaluation, 2013
- Detecting influenza epidemics by analyzing Twitter messages, arXiv:1007.4747v1, 2010
- Towards detecting influenza epidemics by analyzing Twitter messages, KDD 2010 Workshop
During disasters such as hurricanes, first responders need situational awareness to make the right decisions in a quickly changing environment. People on the ground often post online messages that provide actionable information, but these messages can be difficult to find amid the noise. Can we monitor social media during a natural disaster or other crisis to inform first responders? Can we discern the most vulnerable populations based on their attitudes before, during, and after the disaster?
- Tweedr: Mining Twitter to Inform Disaster Response, ISCRAM 2014
- A demographic analysis of online sentiment during Hurricane Irene, HLT/NAACL 2012 Workshop
User Attribute Inference
Using social media to inform health and disaster relief requires knowledge of user-level attributes, such as location, age, and gender, in order to produce accurate information. Can we infer such attributes from linguistic patterns of users? If so, what are the privacy implications of this technology?
- Using county demographics to infer attributes of Twitter users, ACL Joint Workshop on Social Dynamics and Personal Attributes in Social Media
- Inferring the Origin Locations of Tweets with Quantitative Confidence, CSCW 2014
- Too Neurotic, Not too Friendly: Structured Personality Classification on Textual Data, ICWSM 2013 Workshop
Most of the world’s information is intended to be read by humans, not computers. Information extraction transforms unstructured documents into structured representations, thereby allowing knowledge discovery applications to provide insights from large text collections. We explore statistical approaches to named-entity recognition, coreference resolution, and relation extraction.
- An entity-based model for coreference resolution, ICDM 2009
- First-Order Probabilistic Models for Coreference Resolution, HLT/NAACL 2007
- Canonicalization of Database Records using Adaptive Similarity Measures, KDD 2007
- Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function, AAAI 2007 Workshop
- Learning field compatibilities to extract database records from unstructured text, EMNLP 2006
- Integrating probabilistic extraction models and data mining to discover relations and patterns in text, HLT/NAACL 2006
- Joint deduplication of multiple record types in relational data, CIKM 2005
- Extracting social networks and contact information from email and the Web, CEAS 2004
- Dependency tree kernels for relation extraction, ACL 2004
- Confidence estimation for information extraction, HLT 2004
Most machine learning methods require costly human annotation efforts for training and validation. Can we train machine learning models more efficiently? We explore several interactive frameworks to improve the learning rate of machine learning algorithms, particularly for structured prediction problems.
- Anytime Active Learning, AAAI 2014
- Towards Anytime Active Learning: Interrupting Experts to Reduce Annotation Costs, KDD 2013 Workshop
- Corrective Feedback and Persistent Learning for Information Extraction, Artificial Intelligence 2006
- Reducing labeling effort for structured prediction tasks, AAAI 2005
- Interactive information extraction with constrained conditional random fields, AAAI 2004
Scalable Machine Learning
Most sophisticated structured prediction algorithms were not designed to run at Web scale. We explore accurate approximations that allow us to use rich data representations while scaling up to millions of variables.
- SampleRank: Training factor graphs with atomic gradients, ICML 2011
- SampleRank: Learning preferences from atomic gradients, NIPS 2010 Workshop
- Learning and inference in weighted logic with application to natural language processing, PhD Thesis (UMass), 2008
- Sparse Message Passing Algorithms for Weighted Maximum Satisfiability, NESCAI 2007
- Tractable Learning and Inference with High-Order Representations, ICML 2006 Workshop
- Practical Markov logic containing first-order quantifiers with application to identity uncertainty, HLT/NAACL 2006 Workshop
- Learning clusterwise similarity with first-order features, NIPS 2005 Workshop