Article Details

Journal: International Journal of Computer Applications
Title: Big Data Trends and Analytics: A Survey
Index Term: Information Sciences
Abstract: Big Data is currently one of the most active areas of research, driven by the rapid pace of technological expansion. Over the past five years, both storage capacity and data volume have grown exponentially. Big Data techniques are expected to reduce huge chunks of data into a manageable form. In this paper, we discuss the concept of Big Data, its characteristics, and its challenges. The main focus is on the data generated in various sectors, on analytics, and on the tools available to manage data.
Keywords: Big data, Hadoop, MapReduce, Data analytics, Big data tools.
No. of Pages: 12
Author Names: Payal Saha, Mohit Mittal, Shreya Gupta, Marwa Sharawi

Publishing Information

Start Page No.: 9