AGATHA has the capacity to analyse large amounts of information and to extract implicit relationships, patterns and participants. It does so through modules dedicated to video and image analysis, audio analysis and multilingual text analysis, supported by crawling and data-mining algorithms for selective, targeted content collection.
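The source does not say how collected items reach the modality-specific modules. The following is a minimal sketch that assumes routing by the item's top-level MIME type, guessed with Python's standard `mimetypes` module. The functions `analyse_text`, `analyse_image`, `analyse_audio`, `analyse_video` and `route` are hypothetical placeholders, not AGATHA's actual modules.

```python
import mimetypes
from typing import Callable, Dict


# Hypothetical placeholder modules; the real analysis modules are not described in the source.
def analyse_text(path: str) -> str:
    return f"text-analysis:{path}"


def analyse_image(path: str) -> str:
    return f"image-analysis:{path}"


def analyse_audio(path: str) -> str:
    return f"audio-analysis:{path}"


def analyse_video(path: str) -> str:
    return f"video-analysis:{path}"


# Map the top-level MIME type ("text", "image", ...) to the module that handles it.
MODULES: Dict[str, Callable[[str], str]] = {
    "text": analyse_text,
    "image": analyse_image,
    "audio": analyse_audio,
    "video": analyse_video,
}


def route(path: str) -> str:
    """Send a collected item to the module for its media type (assumed routing rule)."""
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"unknown content type: {path}")
    major = mime.split("/", 1)[0]
    module = MODULES.get(major)
    if module is None:
        raise ValueError(f"no analysis module for {mime}")
    return module(path)
```

A lookup table keeps the dispatcher open to new modalities: adding a module means adding one entry, without changing `route`.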
The data-collection component, a web crawler, will create copies of the content to be analysed and processed, indexing them by format, source, address, etc. to optimise search. The data obtained by the crawler will be stored in their original form (raw data) in a dedicated database/repository.
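The source does not specify how the raw copies are indexed. The following is a minimal in-memory sketch that assumes each copy is keyed by its address (URL), with secondary indexes by format and by source (taken here as the URL's host name). The names `RawRecord`, `RawRepository`, `store`, `by_url`, `by_format` and `by_source` are hypothetical. The bytes are kept unmodified, matching the "raw data" requirement, and a SHA-256 digest is stored alongside them for later integrity checks.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set
from urllib.parse import urlparse


@dataclass(frozen=True)
class RawRecord:
    url: str      # address the content was fetched from (primary key in this sketch)
    source: str   # host name derived from the URL (assumed meaning of "source")
    fmt: str      # format label, e.g. a MIME type reported by the crawler
    sha256: str   # digest of the raw bytes, kept for integrity checks
    data: bytes   # unmodified copy of the fetched content


class RawRepository:
    """Hypothetical in-memory stand-in for the raw-data repository."""

    def __init__(self) -> None:
        self._records: Dict[str, RawRecord] = {}
        self._by_format: Dict[str, Set[str]] = defaultdict(set)
        self._by_source: Dict[str, Set[str]] = defaultdict(set)

    def store(self, url: str, fmt: str, data: bytes) -> RawRecord:
        old = self._records.get(url)
        if old is not None:
            # Re-crawl of the same address: drop the stale index entries first.
            self._by_format[old.fmt].discard(url)
            self._by_source[old.source].discard(url)
        record = RawRecord(
            url=url,
            source=urlparse(url).netloc,
            fmt=fmt,
            sha256=hashlib.sha256(data).hexdigest(),
            data=data,
        )
        self._records[url] = record
        self._by_format[fmt].add(url)
        self._by_source[record.source].add(url)
        return record

    def by_url(self, url: str) -> Optional[RawRecord]:
        return self._records.get(url)

    def by_format(self, fmt: str) -> List[RawRecord]:
        # .get() avoids creating empty entries in the defaultdict.
        return [self._records[u] for u in sorted(self._by_format.get(fmt, ()))]

    def by_source(self, source: str) -> List[RawRecord]:
        return [self._records[u] for u in sorted(self._by_source.get(source, ()))]
```

In a production system the dictionaries would be replaced by database indexes, but the lookup paths (address, format, source) stay the same.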
The system contains two additional databases: one will store the same information in a standardised form, and the other will store the data resulting from the analysis of the collected content.
One of the challenges concerns the design choices for these databases, which will have to operate in an interrelated and homogeneous manner to guarantee the correspondence between all stored information about the collected content, its traceability, the searches performed on it and the associated metadata.
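One possible way to keep the three stores (raw, standardised, analysis results) interrelated and traceable is to link each layer to the one it was derived from through foreign keys. The following is a minimal sketch using Python's built-in `sqlite3` under that assumption. All table and column names (`raw_content`, `standard_content`, `analysis_result`, `fetched_at`, etc.) and the `open_db`/`trace` helpers are hypothetical. For brevity the three stores are shown as tables in a single database, which is a simplification of the separate databases the text describes.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema: each layer references the layer it was derived from.
SCHEMA = """
CREATE TABLE raw_content (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL,
    fmt        TEXT NOT NULL,
    fetched_at TEXT NOT NULL,
    data       BLOB NOT NULL
);
CREATE TABLE standard_content (
    id     INTEGER PRIMARY KEY,
    raw_id INTEGER NOT NULL REFERENCES raw_content(id),
    lang   TEXT,
    body   TEXT NOT NULL
);
CREATE TABLE analysis_result (
    id          INTEGER PRIMARY KEY,
    standard_id INTEGER NOT NULL REFERENCES standard_content(id),
    module      TEXT NOT NULL,
    label       TEXT NOT NULL,
    value       TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def trace(conn: sqlite3.Connection, result_id: int) -> Optional[Tuple]:
    """Follow an analysis result back through the standardised copy to the raw copy."""
    return conn.execute(
        """
        SELECT a.module, a.label, a.value, s.lang, r.url, r.fetched_at
        FROM analysis_result AS a
        JOIN standard_content AS s ON s.id = a.standard_id
        JOIN raw_content AS r ON r.id = s.raw_id
        WHERE a.id = ?
        """,
        (result_id,),
    ).fetchone()
```

With enforced references, a standardised record cannot point at a raw copy that does not exist, and any analysis result can be traced to the original address and collection time with a single join.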