Introduction

MiMFa Scraper provides the tools you typically need to extract, collect, integrate, and present small to large amounts of data from a variety of web resources and data storage formats.

In recent years, new tools have been developed that data scientists and analysts can use to extract, collect, and integrate data. Developers in this field face the challenge of continually learning and working with new programming languages, frameworks, modules, and so on. This led us to develop a flexible tool that addresses most of these problems and requirements in a single, customizable program. From the data warehousing and management literature we derived three basic categories of activity, and we developed our idea in three corresponding subject areas:

Data Extraction and Collection

Automatic extraction of data from offline and online resources, covering different formats and medium to large data volumes

Unstructured, semi-structured and structured texts
Medium to large data volumes
From different formats (XLSX, DOCX, PPTX, PDF, XML, HTML, etc.)
From different resources
From local disk files, web pages, etc.
Through parallel processing algorithms
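
The points above can be illustrated with a minimal Python sketch. It is not MiMFa Scraper's actual implementation: the function names (extract_html, extract_xml, extract_plain, extract_file, extract_all) and the EXTRACTORS table are hypothetical, and only HTML, XML, and plain text are handled, using the Python standard library. The sketch picks an extractor by file extension and processes several local files in parallel with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
import xml.etree.ElementTree as ET


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document (tags are dropped)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_html(raw):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = _TextCollector()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def extract_xml(raw):
    """Return the text of all XML elements with whitespace collapsed."""
    root = ET.fromstring(raw)
    return " ".join(" ".join(root.itertext()).split())


def extract_plain(raw):
    """Fallback for unknown formats: collapse whitespace only."""
    return " ".join(raw.split())


# Hypothetical dispatch table: file extension -> extractor function.
EXTRACTORS = {".html": extract_html, ".htm": extract_html, ".xml": extract_xml}


def extract_file(path):
    """Read one file and return (file name, extracted text)."""
    path = Path(path)
    raw = path.read_text(encoding="utf-8", errors="replace")
    extractor = EXTRACTORS.get(path.suffix.lower(), extract_plain)
    return path.name, extractor(raw)


def extract_all(paths, max_workers=4):
    """Extract text from many files in parallel, keyed by file name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(extract_file, paths))
```

Calling extract_all on a list of file paths returns a dictionary that maps each file name to its extracted text. Threads are chosen here because reading files is I/O-bound; CPU-heavy formats such as PDF would more likely call for a process pool and a dedicated parsing library.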

Data Integration and Processing

Processing and integrating collected and existing data for future use (data normalization)

Organization of small to large volumes of data
Converting heterogeneous data storage methods to the same format or structure
Structuring data in a tagged (RIS) or columnar format
Detailed description of data types and other related metadata for better future processing
General and subject indexing of the collected data
Applying regular expression patterns to find the intended data and operate on it
Applying quick filtering, finding, and searching methods to the data for normalization and other purposes
Customizing search methods
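
As a rough illustration of the integration steps above, the following Python sketch maps records with different field names onto one column layout, cleans a field with a regular expression, and filters rows with a customizable pattern. It is an assumption-laden example, not the tool's own code: FIELD_ALIASES, COLUMNS, YEAR_PATTERN, normalize, and search are hypothetical names. The two-letter keys TI, AU, and PY are the RIS tags for title, author, and publication year.

```python
import re

# Hypothetical mapping from source-specific field names to common columns.
FIELD_ALIASES = {
    "title": "title", "TI": "title", "name": "title",
    "author": "author", "AU": "author",
    "year": "year", "PY": "year", "date": "year",
}
COLUMNS = ("title", "author", "year")

# Matches a four-digit year between 1800 and 2099.
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def normalize(record):
    """Convert one heterogeneous record to the common column layout."""
    row = dict.fromkeys(COLUMNS, "")
    for key, value in record.items():
        column = FIELD_ALIASES.get(key)
        if column:
            # Collapse runs of whitespace while copying the value.
            row[column] = " ".join(str(value).split())
    # Keep only the year itself; leave the column empty if none is found.
    match = YEAR_PATTERN.search(row["year"])
    row["year"] = match.group(1) if match else ""
    return row


def search(rows, pattern, column="title"):
    """Return the rows whose column matches a case-insensitive regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows if regex.search(row[column])]
```

For example, normalize({"TI": "Web  Scraping Basics", "PY": "2019/05"}) yields a row with the title "Web Scraping Basics", an empty author, and the year "2019", and search(rows, r"scrap") then selects it by title. Passing a different pattern or column name is the simplest form of a customized search.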