Data Mining and Statistics in Data Science

Author :  

Year-Number: 2019-30
Pages: 960-968

Abstract

Big data has become a concept that is heard very frequently today. It is also defined as a structure in which meaningful and meaningless data exist side by side. As technology advances rapidly, the power of data grows by the day and big data is encountered more and more often. The resulting growth in data size leads to situations in which classical statistical methods may prove inadequate. Data analysis therefore becomes more important every day, because it is how meaningful and useful information is extracted from masses of data. In data analysis, the purpose of statistics is to make sense of the data. Although statistical methods are widely used, it is data mining, which is itself founded on statistical methods, that makes the analysis of big data possible. The widespread use of statistical methods in data mining processes explains why statistics and data mining cannot be treated separately from each other.

Abstract

In parallel with modern technological development, the number of computer systems capable of storing data has grown. Methods that allow large volumes of data to be stored have therefore attracted comparable attention. This study examines two of the most widely used approaches to data analysis, data mining and statistical methods. Its aim is to show the relationship between data mining and statistics. To this end, the data mining process is explored first, and the need to apply statistical methods during that process is then emphasized.


