TABLE OF CONTENTS
ABSTRACT ……………………………………………………………………………………………………….. iii
ACKNOWLEDGMENT ………………………………………………………………………………………. iv
LIST OF ILLUSTRATIONS ……………………………………………………………………………….. vii
LIST OF TABLES …………………………………………………………………………………………….. viii
SECTIONS
1. INTRODUCTION ……………………………………………………………………………… 1
1.1. PROBLEM DESCRIPTION ……………………………………………………….. 1
1.2. SOCIAL MEDIA AND HEALTHCARE: AN OVERVIEW ………….. 3
1.3. RESEARCH QUESTION AND MAJOR CONTRIBUTIONS ………… 5
1.4. THESIS ORGANIZATION ………………………………………………………… 7
2. RESEARCH METHODOLOGY …………………………………………………………. 9
2.1. FISHER’S EXACT TEST …………………………………………………………… 9
2.2. NAIVE BAYES CLASSIFIER ………………………………………………….. 10
2.3. RANDOM FOREST ………………………………………………………………… 11
2.4. RESEARCH APPROACH ………………………………………………………… 13
3. TWITTER DATA PROCESSING ……………………………………………………… 14
3.1. COLLECTION OF TWEETS ……………………………………………………. 14
3.2. CLEANING AND PARSING DATA …………………………………………. 17
3.3. CONDUCTING STATISTICAL ANALYSIS …………………………….. 17
4. MACHINE LEARNING TECHNIQUE AND RESULTS …………………….. 21
4.1. NAIVE BAYES CLASSIFIER ………………………………………………….. 21
4.2. RANDOM FOREST METHOD OF CLASSIFICATION ……………… 25
5. CONCLUSION ……………………………………………………………………………….. 27
6. FUTURE WORK …………………………………………………………………………….. 28
APPENDICES
A. JAVA CODE TO COUNT THE WORDS ………………………………………….. 29
B. RAW DATA USED FOR THE FISHER’S EXACT TEST …………………… 34
C. MATLAB CODE USED FOR RANDOM FOREST CLASSIFICATION ……… 37
D. JAVA CODE TO GET USER STATUS …………………………………………….. 41
BIBLIOGRAPHY ………………………………………………………………………………………………. 49
VITA ……………………………………………………………………………………………………………….. 54