On-Line Big Data Processing Using Python Libraries for Multiple Linear Regression in Complex Environment
DOI:
https://doi.org/10.46541/978-86-7233-406-7_228
Keywords:
Python, big data, data processing, multiple linear regression
Abstract
The phenomenon known as Big Data is one of the most significant and least visible consequences of the development of technology and the Internet. The data generated by today's globally connected world is growing at an exponential rate, and it is a real "gold mine" for users who know how to interpret it correctly and make successful decisions based on it. Data analysis and processing is one of the most important components of a big data system, and in this branch of data science the most popular programming language is Python, which provides its users with a large number of actively maintained libraries and development environments. Most importantly for legal entities and individuals, almost all libraries and functions available for this language come with free licenses, open source code, and well-maintained, high-quality technical documentation, which saves each company significant money and time.
This paper examines the possibility of building a multiple regression model in Python from large amounts of data provided by two market retailers, in order to present the model and assess its predictive power. Because the number of variables is large, several models were built and compared. The comparative analysis shows that Python is a good tool that can be used repeatedly to select different variants and evaluate the resulting model. A graphical interface can be built for such a model, making it much more acceptable to end users; the model can be placed on a server on the Internet or on a modern Cloud platform and offered to users on demand; and its results can be embedded in end-user interfaces. Models built in this way (with dynamic data extraction) can be used in BI and machine learning processes.
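As an illustration of the kind of comparison described above, the following is a minimal sketch, not the authors' implementation. It assumes NumPy is available, uses synthetic data in place of the retailers' data (which is not reproduced here), and the predictor names, coefficients, and helper functions (`fit_ols`, `predict`, `r_squared`) are all hypothetical. It fits several multiple linear regression variants by ordinary least squares and compares their predictive power using R^2 on held-out rows.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the retail data (hypothetical values and columns).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))  # e.g. price, promotion, footfall (assumed names)
true_coef = np.array([5.0, 2.0, -1.5, 0.8])  # intercept + three slopes
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.5, size=n)

# Simple hold-out split: first 800 rows for fitting, last 200 for evaluation.
train, test = slice(0, 800), slice(800, n)

# Candidate models differ only in which predictor columns they use.
candidates = {
    "one predictor": [0],
    "two predictors": [0, 1],
    "all predictors": [0, 1, 2],
}

results = {}
for name, cols in candidates.items():
    coef = fit_ols(X[train][:, cols], y[train])
    results[name] = r_squared(y[test], predict(coef, X[test][:, cols]))

for name, score in results.items():
    print(f"{name}: hold-out R^2 = {score:.3f}")
```

In practice the feature matrix would be loaded from the retailers' data rather than generated, and a library such as scikit-learn or statsmodels could replace the hand-written helpers; the loop over `candidates` is the part that corresponds to repeatedly selecting variants and evaluating the resulting model.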