Cyclone Amphan made landfall in South Asia on May 20, 2020. It was the costliest cyclone on record in the North Indian Ocean, leaving hundreds of thousands of people homeless, ravaging agricultural land and causing billions of dollars in damage. How were people affected by the storm? What were the responses of individuals, governments, corporations and NGOs? How was it covered by local, national and international media, as opposed to individuals' accounts? Who has created the dominant narratives of Cyclone Amphan, and whose voices go unheard? We aim to use online data -- such as Twitter posts, news headlines and research publications -- to analyze people's experiences of Cyclone Amphan.
Short-term objectives: 1) Investigate the social, political and economic effects of Cyclone Amphan. 2) Test this new methodology of using online data to characterize people's experiences of extreme weather events. It will be a learning experience for us! 3) In the spirit of reciprocity, and in recognition that volunteers are donating their time and effort, we hope to co-author a peer-reviewed publication with them (if they'd like).
Longer-term objectives: 1) Better match the needs of people affected by extreme weather events with context-specific responses (e.g. humanitarian aid, funding, research) through a research tool that enables users to access more voices and perspectives on water-related issues. 2) Humanize the impacts of extreme weather events.
This project has four components: data collection, data analysis, data visualization, and a research tool.
1) Data Collection
Our primary source of data will be the Twitter Premium Search API. We will pull tweets based on key phrases and terms (such as 'Cyclone Amphan') in multiple languages, including English, Hindi, Bengali, and Odia. We are also considering supplementary data such as Media Cloud (covering traditional media), remote sensing data, and GIS data; however, the final choices will depend on the research questions we focus on.
2) Data Analysis
Our data analysis will follow a three-step approach: 1) identify the individuals, corporations, and government institutions in the data and their responses, 2) determine the intent behind each tweet (offering donations, asking for help, etc.), and 3) determine who is most influential and who is not being heard.
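As a starting point for step 2, a naive keyword baseline could be sketched as follows. The intent labels and keyword lists here are hypothetical placeholders, not a vetted taxonomy, and they cover English only; a real version would need terms in each language and would probably be replaced by a trained classifier.

```python
# Hypothetical intent labels and keywords, for illustration only.
INTENT_KEYWORDS = {
    "offering_help": ["donate", "donation", "relief fund", "volunteer"],
    "asking_for_help": ["need help", "please help", "stranded", "no electricity"],
}


def tag_intent(text):
    """Return the sorted list of intent labels whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        label
        for label, keywords in INTENT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )
```

A tweet can receive several labels or none; untagged tweets would still need manual review, so this is best treated as a way to bootstrap labeled examples rather than as a final method.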
Additionally, we will split the timeline into two phases, distinguishing tweets that were speculation (the days leading up to the cyclone) from tweets that described actual experience (the day of landfall and thereafter).
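The two-phase split could be sketched as below. This is a minimal illustration that assumes each tweet carries a timezone-aware timestamp and that comparing calendar dates in UTC is acceptable; the phase names are our own, and splitting on local time in the affected region may turn out to be more appropriate.

```python
from datetime import date, datetime, timezone

# Landfall date as stated in the project description.
LANDFALL_DATE = date(2020, 5, 20)


def phase(created_at: datetime) -> str:
    """Label a tweet by phase; expects a timezone-aware datetime.

    'speculation' = before the landfall date (UTC),
    'experience'  = landfall date and thereafter.
    """
    day = created_at.astimezone(timezone.utc).date()
    return "speculation" if day < LANDFALL_DATE else "experience"
```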
3) Data Visualization
We would like to visualize this data using various techniques. For our NLP analysis, we can create word clouds and map topics over time. We can use network analysis to draw graphs that show the connections between accounts and highlight the most influential users. A geographical analysis would help show where the discourse is concentrated.
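Both word clouds and topic-over-time charts start from term counts, so a first step could look like the sketch below. It assumes tweets have already been reduced to a date string and a list of tokens; the sample data is invented for illustration.

```python
from collections import Counter, defaultdict


def daily_term_counts(tweets):
    """Count tokens per day.

    tweets: iterable of (iso_date_string, list_of_tokens) pairs.
    Returns a mapping from date string to a Counter of tokens.
    """
    counts = defaultdict(Counter)
    for day, tokens in tweets:
        counts[day].update(tokens)
    return counts


# Invented sample data, for illustration only.
sample = [
    ("2020-05-19", ["cyclone", "warning"]),
    ("2020-05-20", ["cyclone", "flood", "flood"]),
]
counts = daily_term_counts(sample)
```

The per-day counters can be passed to a plotting or word-cloud library of our choosing, or summed across days for an overall picture.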
4) Research Tool
We would like to build a web tool that displays our analysis and gives governments, aid organizations, researchers, and policymakers a quick understanding of the human impact of natural disasters.
Our data will consist of tweets pulled from the Twitter Premium Search API, covering May 1, 2020 to July 15, 2020.
We will include key terms in our search such as 'Cyclone Amphan' in multiple languages, including English, Hindi, Bengali, and Odia.
We are also considering supplementary data such as Media Cloud (covering traditional media), remote sensing data, and GIS data; however, the final choices will depend on the research questions we focus on.
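To make the search parameters concrete, the sketch below assembles a request body for the date range and terms described above. It only builds the payload and does not call the API. The non-English spellings are assumed transliterations that native speakers should check, and the field names (query, fromDate, toDate, maxResults) and the date format reflect our understanding of the Premium Search request format, which should be verified against the current API documentation.

```python
from datetime import datetime, timezone

# Illustrative terms only; the final keyword list is still to be decided.
SEARCH_TERMS = [
    "Cyclone Amphan",  # English
    "अम्फान",           # Hindi (assumed spelling)
    "আম্পান",           # Bengali (assumed spelling)
    "ଅମ୍ଫାନ",           # Odia (assumed spelling)
]


def build_query(terms):
    """Join terms with OR, quoting multi-word phrases so they match exactly."""
    parts = [f'"{term}"' if " " in term else term for term in terms]
    return " OR ".join(parts)


def build_request(terms, start, end, max_results=100):
    """Build a request body; field names are assumptions to verify."""
    return {
        "query": build_query(terms),
        "fromDate": start.strftime("%Y%m%d%H%M"),
        "toDate": end.strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }


request = build_request(
    SEARCH_TERMS,
    datetime(2020, 5, 1, tzinfo=timezone.utc),
    datetime(2020, 7, 15, tzinfo=timezone.utc),
)
```

Query length limits differ by API tier, so a longer multilingual keyword list may need to be split across several requests.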
Our analysis will consist primarily of text preprocessing, data exploration, and natural language processing. First, preprocessing matters because tweets are more colloquial than formal sources such as newspaper articles; we must therefore take care to preprocess the data without discarding meaningful information. Second, our data exploration will study how the tweets evolve over time and answer questions such as 'where are the tweets from?', 'what are the major languages?', and 'where is the volume of tweets highest?'. Our goal in this part is to understand the dataset thoroughly. Finally, natural language processing will make up the last part of the analysis. Candidate techniques include named entity recognition, sentiment analysis, word embeddings, topic modeling, and part-of-speech tagging. There will be a strong focus on understanding who is in the dataset and trying to understand the intent behind their tweets. This will likely be the toughest of the three tasks.
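One way to preprocess without discarding meaningful information is to separate hashtags and mentions from ordinary words instead of deleting them. The sketch below is a minimal illustration using only the standard library; the function name and the punctuation set are our own choices. It splits on whitespace rather than a word-character regex because, as far as we know, Python's `\w` does not match the combining vowel signs used in Indic scripts and would break Hindi, Bengali, and Odia words apart.

```python
import re
from typing import Dict, List

URL_RE = re.compile(r"https?://\S+")
PUNCTUATION = ".,!?;:\"'()[]"


def preprocess_tweet(text: str) -> Dict[str, List[str]]:
    """Split a tweet into lowercase tokens, hashtags, and mentions.

    URLs are dropped; hashtags and mentions are kept in separate lists
    so they remain available for later analysis.
    """
    text = URL_RE.sub(" ", text)
    tokens, hashtags, mentions = [], [], []
    for raw in text.split():
        word = raw.strip(PUNCTUATION)
        if not word:
            continue
        if word.startswith("#") and len(word) > 1:
            hashtags.append(word[1:].lower())
        elif word.startswith("@") and len(word) > 1:
            mentions.append(word[1:].lower())
        else:
            tokens.append(word.lower())
    return {"tokens": tokens, "hashtags": hashtags, "mentions": mentions}
```

The extracted mentions can feed the social network analysis, and hashtags often carry location or topic information that would be lost if they were stripped.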
Additional analysis tasks could include social network analysis and spatial analysis. A social network approach is useful for finding out who is influential in the discourse and why. Spatial analysis will be important for understanding in more depth how location shapes the discourse. There is even potential to bring all three techniques (NLP, spatial analysis, and social network analysis) together for a very powerful analysis.
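As a simple first proxy for influence, we could count how many distinct accounts mention each user. The sketch below assumes mention edges have already been extracted as (author, mentioned user) pairs; the account names in the usage example are invented. It measures in-degree only, so it is a rough indicator and not a substitute for fuller centrality measures.

```python
from collections import defaultdict


def mention_in_degree(edges):
    """Count distinct authors mentioning each user.

    edges: iterable of (author, mentioned_user) pairs.
    Self-mentions are ignored and repeat mentions count once.
    """
    sources = defaultdict(set)
    for author, target in edges:
        if author != target:
            sources[target].add(author)
    return {user: len(authors) for user, authors in sources.items()}


def top_users(edges, n=3):
    """Return the n most-mentioned users, ties broken alphabetically."""
    scores = mention_in_degree(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]
```

For example, `top_users([("a", "gov"), ("b", "gov"), ("c", "ngo")], n=2)` ranks the hypothetical account "gov" first. Accounts that post often but never appear as a target are candidates for the "not being heard" group.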
The project will be successful if 1) we reach a more nuanced understanding of the diversity of experiences of and needs following Cyclone Amphan, 2) we investigate how narratives around Cyclone Amphan were constructed, by whom and why, and 3) IWMI develops a better idea of what is and isn't possible with regard to using online data to characterize experiences of extreme weather events.
This project is a proof-of-concept. If it demonstrates the usefulness of leveraging online data to characterize people's experiences of and needs following extreme weather events, IWMI then hopes to develop a research tool that scrapes online water-related discourse, which could be used by donors, governments, aid organizations, research institutes, etc. to incorporate more data inputs into their decision-making processes.