Together We Stand

[ 94 ] The global reach of contributors to the Nepal earthquake response Source: MicroMappers © OpenStreetMap contributors © CartoDB of small task digital engagement in post-disaster situations as a phenomenon that can be harnessed for great social good. AIDR is an award-winning open source software that uses machine learning and natural language processing to analyse big data sets. 3 AIDR filters and classifies various types of data including social media messages, SMS (text messages), imagery, text and images related to natural disasters and humanitar- ian crises. It allows administrators, emergency managers and humanitarians to collect crisis-relevant information such as Twitter or SMS, define categories and train models to automati- cally classify the messages into them, run the created classifiers, and download selected messages relevant to various informa- tion needs. We converted OCHA-defined CODs into ‘classes’ that AIDR uses to categorize incoming messages. Rapid understanding of high-velocity streams of messages that people post on Twitter requires real-time processing capabilities. Often the volume of messages generated during the onset of a major crisis goes beyond human processing capabilities. Moreover, messages on Twitter are brief, infor- mal and contain misspellings and grammatical mistakes. This is one of the reasons that simply doing keyword searches on Twitter data does not produce fully encompassing results. Furthermore, tweets captured via keyword searches may not be relevant since words can have multiple meanings depending on context. Semantic understanding of messages is necessary to develop an effective categorization approach. 4 To overcome these issues, AIDR is designed to ingest and process data in real time, and for that we adopt the crowd- sourced stream processing paradigm. 5 AIDR uses supervised machine learning techniques to train user-defined classifi- ers. For machine training purposes, we use human-tagged messages by employing SBTF volunteers. Given a set of human-tagged messages, we use Random Forest, which is a well-known learning algorithm, to generate new models. Once a model is trained, all subsequent messages arriving in the stream are classified by the machine. In this case, a message will be assigned a category and a machine confidence score. There are two core components of AIDR: the Collector and Tagger. The Collector simply allows a collection to be set up using a set of keywords or geographical regions. Given the keywords or geographical region based queries, AIDR connects to a Twitter streaming application program interface to get access to a live stream of messages posted by the public. The AIDR user defines classifiers in the Tagger module. Classifiers, for example, could include those that refer to ‘needs’, ‘infrastructure damage’ and ‘rumours’. Next, we use supervised machine learning techniques as explained above and employ SBTF volunteers to help tag a handful of tweets to these classifiers. AIDR keeps training new models on receiving human-tagged tweets. After this training, the Tagger component of AIDR automatically applies the topics of interest (the classifiers) to tweets collected in real time using the Collector. Human intelligence is used to learn how to classify the messages, either by manually classifying messages into the plat- form itself or by exporting them seamlessly to a crowdsourcing platform for humanitarian purposes (MicroMappers). For all these events, the platform and community network, MicroMappers was activated in order to tag short text messages as well as to evaluate ground and aerial images. Thus, MicroMappers can also be viewed as a valuable data repository, containing historical data from past events in which it was acti- vated. The platform provides an interface to ‘microtask’ or ‘tag’ data items by giving volunteers the ability to make quick deci- sions about the relevance of content. There is always a balance between data collection and sharing and security and privacy. As T ogether W e S tand

RkJQdWJsaXNoZXIy NzQ1NTk=