Understanding Big Data Framework Related to a Data Mining Technique
Author
Shahida B
Abstract
Computing and communication power in the cyber-physical world is expanding
rapidly, and managing these activities generates large amounts of data.
Big data poses four primary challenges: volume, variety, velocity, and
veracity. Storage-based data processing systems such as Hadoop address
volume and variety. However, processing such a vast volume of data quickly
and accurately requires a far more complicated process. In this paper, we put
into practice a system that can handle large volumes, varied patterns, and
high-velocity data. To extract valuable information from the data stream, the
system applies correlation analytics and data mining. It must process data in
real time, using an event processing engine such as Esper, which can generate
various events from different queries written in its query language. Storm,
which organizes processing as a topology, captures the real-time data and
performs straightforward filtering of the stream. Two separate algorithms,
Apriori and FP-Growth, are used for correlation and mining. Data centers
around the world now use Apache
Hadoop, making parallel processing accessible to ordinary programmers. As more
data centers support Hadoop, converting existing data mining methods to the
platform becomes essential to get the most out of parallel processing. With the
advent of big data analytics, migrating existing data mining algorithms to
Hadoop has become widespread. In this survey, we examine current migration
efforts and the problems they face. This paper is intended to guide readers in
proposing solutions to these migration difficulties.
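As an illustration of the correlation and mining step mentioned above, the following is a minimal, single-machine sketch of Apriori frequent-itemset mining in Python. It is not the system described in this paper: the function name `apriori`, the `min_support` count threshold, and the toy transactions are assumptions made for the example, and no Hadoop, Storm, or Esper integration is shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    transactions: iterable of item collections (hypothetical input format).
    min_support: minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data, used only to exercise the sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

The sketch rescans all transactions at every level, which is the cost that FP-Growth avoids by building a compact prefix tree instead of generating candidates.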
Keywords
NoSQL database, Hadoop, Apriori Algorithm, Data Mining, FP-Growth, Esper, Big Data, Big Data Analytics
DOI : https://doi.org/10.55248/gengpi.2022.3.9.22