A data science approach to studying the phenomena faces many challenges. Viewed from a 30,000-foot macro perspective, the dataverse shows serious fragmentation, with data about the phenomena pooled in many separate places. Drill down into any of these sources and you will find datasets that vary widely in how structured or unstructured they are. Handling this calls for a sizable ETL (extract, transform, load) process. It must efficiently locate and extract the data, then scrub and normalize it. It must also categorize the data, to the extent that a logical corpus can be defined for a subject that is conceptually polyvariant. Finally, it must load the results into a centralized repository.

A further challenge is that some data pools sit beyond public access. These are likely held in commercial and governmental/military-intelligence repositories, some of which are probably classified or proprietary. These non-transparent datasets may also carry the strongest signal for determining the fundamental nature of the phenomena, which is presumably why they are not available to the public.
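As a rough illustration of the extract, scrub/normalize, categorize, and load stages, the Python sketch below runs two toy sources through a minimal pipeline. One source is a structured CSV and the other is free-text notes. The output lands in an in-memory SQLite table standing in for the centralized repository. Everything in it is hypothetical, assumed purely for illustration: the sample records, the field names (`date`, `location`, `description`), the keyword categories and the function names. The keyword match is a deliberately naive stand-in for the much harder problem of categorizing a polyvariant subject.

```python
import csv
import io
import re
import sqlite3
from datetime import datetime

# Hypothetical raw inputs: one structured (CSV) source and one unstructured (free-text) source.
STRUCTURED_CSV = """date,location,description
2019-07-14,Phoenix AZ,  Bright LIGHTS hovering  
2020-01-03,Austin TX,Triangle shape moving silently
"""

UNSTRUCTURED_NOTES = [
    "On 03/15/2021 near Denver CO a disc was observed.",
    "Report without a usable date: orb seen over the lake.",
]

# Hypothetical keyword categories; this naive substring match is only a placeholder.
CATEGORIES = {
    "light": ("light", "orb"),
    "shape": ("triangle", "disc"),
}


def extract_structured(text):
    """Extract: read rows of a CSV source into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_unstructured(notes):
    """Extract: pull an MM/DD/YYYY date out of free text; keep the whole note as the description."""
    records = []
    for note in notes:
        match = re.search(r"\b(\d{2})/(\d{2})/(\d{4})\b", note)
        date = f"{match.group(3)}-{match.group(1)}-{match.group(2)}" if match else None
        records.append({"date": date, "location": None, "description": note})
    return records


def normalize(record):
    """Scrub/normalize: collapse whitespace, lowercase text, validate ISO dates (bad dates become None)."""
    description = " ".join(record["description"].split()).lower()
    date = record.get("date")
    if date:
        try:
            date = datetime.strptime(date, "%Y-%m-%d").date().isoformat()
        except ValueError:
            date = None
    location = record.get("location")
    location = location.strip() if location else None
    return {"date": date, "location": location, "description": description}


def categorize(record):
    """Categorize: attach every category whose keywords appear in the description."""
    tags = [name for name, words in CATEGORIES.items()
            if any(word in record["description"] for word in words)]
    return {**record, "tags": ",".join(sorted(tags))}


def load(records, conn):
    """Load: insert cleaned records into a single central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports (date TEXT, location TEXT, description TEXT, tags TEXT)"
    )
    conn.executemany(
        "INSERT INTO reports VALUES (:date, :location, :description, :tags)", records
    )
    conn.commit()


def run_pipeline(conn):
    """Run every source through normalize -> categorize -> load and return the cleaned records."""
    raw = extract_structured(STRUCTURED_CSV) + extract_unstructured(UNSTRUCTURED_NOTES)
    cleaned = [categorize(normalize(r)) for r in raw]
    load(cleaned, conn)
    return cleaned


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    for row in conn.execute("SELECT * FROM reports ORDER BY date"):
        print(row)
```

Keeping each stage in its own function means a new source only needs its own extractor. Normalization, categorization and loading stay shared. Pointing `sqlite3.connect` at a file path instead of `":memory:"` would persist the toy repository between runs.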