Introduction
MiMFa Scraper contains everything you normally need to extract, collect, integrate, and present small to large amounts of data from sources across the web and from a variety of data storage formats.
In recent years, new tools have been developed that data scientists and analysts can use to extract, collect, and integrate data. Developers in this field face the challenge of learning new programming languages, frameworks, modules, and so on. This led us to develop a flexible tool that addresses most of these problems and requirements in a single, customizable program. From the data warehousing and management literature we derived three basic categories of activity, and we developed our idea in three corresponding subject areas:
Data Extraction and Collecting
Automatic data extraction from offline and online resources, covering different formats and medium to large volumes of data
- Unstructured, semi-structured and structured texts
- Medium to large volumes of data
- From different formats (XLSX, DOCX, PPTX, PDF, XML, HTML, etc.)
- From different resources
- From local disk files, web pages, etc.
- Through parallel processing algorithms
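As a rough illustration of the extraction step described above, the following Python sketch reads several local files of different formats in parallel and reduces each one to plain text. It is not MiMFa Scraper's own implementation: the names `EXTRACTORS`, `extract_file`, and `extract_all` are hypothetical, only HTML, XML, and plain-text files are handled, and a thread pool from the standard library stands in for whatever parallel processing the tool actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
import xml.etree.ElementTree as ET


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def extract_html(raw: str) -> str:
    collector = _TextCollector()
    collector.feed(raw)
    collector.close()
    return " ".join(collector.parts)


def extract_xml(raw: str) -> str:
    root = ET.fromstring(raw)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


# Hypothetical registry (an assumption for this sketch):
# file extension -> function that turns raw file content into plain text.
EXTRACTORS = {".html": extract_html, ".xml": extract_xml, ".txt": str.strip}


def extract_file(path: Path) -> str:
    """Pick an extractor by file extension and apply it to the file."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"no extractor registered for {path.suffix!r}")
    return extractor(path.read_text(encoding="utf-8"))


def extract_all(paths, max_workers=4):
    """Extract text from many files in parallel; returns {path: text}."""
    paths = [Path(p) for p in paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = pool.map(extract_file, paths)
        return dict(zip(paths, texts))
```

Binary formats such as XLSX, DOCX, PPTX, or PDF would need additional extractors registered in the same way, and web pages would need a download step before `extract_html` is applied.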
Data Integration and Processing
Processing and integration of collected and existing data for future use (data normalization)
- Organization of small to large volumes of data
- Converting heterogeneous data storage formats into a common format or structure
- Structuring data in a tagged (RIS) or column format
- Detailed description of data types and other related information about the data, to support later processing
- General and subject indexing of the collected data
- Applying regular-expression patterns to find the intended data and operate on it
- Applying quick filtering, finding, and searching methods to the data for normalization and other purposes
- Customizing search methods
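The integration steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the patterns `YEAR` and `EMAIL`, the field names, and the functions `normalize`, `to_columns`, `to_ris`, and `search` are all hypothetical, and the RIS output uses only a handful of common tags (`TY`, `TI`, `PY`, `N1`, `ER`).

```python
import csv
import io
import re

# Illustrative patterns (assumptions); a real project would define its own.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

FIELDS = ["title", "year", "email"]


def normalize(raw: str) -> dict:
    """Turn one free-text line into a record with a fixed set of fields."""
    year = YEAR.search(raw)
    email = EMAIL.search(raw)
    # Whatever remains after removing the matches is treated as the title.
    title = EMAIL.sub("", YEAR.sub("", raw))
    title = re.sub(r"\s+", " ", title).strip(" ,;()")
    return {
        "title": title,
        "year": year.group(0) if year else "",
        "email": email.group(0).lower() if email else "",
    }


def to_columns(records) -> str:
    """Serialize records in a column (CSV) format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_ris(record: dict) -> str:
    """Serialize one record in a tagged, RIS-like format; empty fields are skipped."""
    tags = [("TI", record["title"]), ("PY", record["year"]), ("N1", record["email"])]
    lines = ["TY  - GEN"]
    lines += [f"{tag}  - {value}" for tag, value in tags if value]
    lines.append("ER  - ")
    return "\n".join(lines)


def search(records, pattern: str, field: str = "title"):
    """Customizable search: keep records whose field matches the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in records if rx.search(r[field])]
```

For example, `normalize("Web Data Mining (2011), Contact@Example.org")` yields a record with the title `Web Data Mining`, the year `2011`, and the lower-cased e-mail address, which can then be written out with `to_columns` or `to_ris` and filtered with `search(records, "mining")`.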