Research and scholarship
Machine learning is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on tasks such as detecting spam email, recognizing handwritten digits, and recognizing speech.
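As a minimal sketch of learning from labeled examples, the spam-detection task mentioned above can be approached with a multinomial naive Bayes classifier. This is one classic technique among many, not a method described on this page. The class and method names (`NaiveBayesSpamFilter`, `train`, `predict`) are hypothetical, and tokenization is simplified to lowercasing and splitting on whitespace. Each call to `train` updates the word counts, so predictions rest on more evidence as more labeled messages arrive.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()  # number of training messages per label
        self.vocab = set()           # all words seen in training, across labels

    def train(self, text, label):
        # Assumption: whitespace tokenization after lowercasing.
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words, label):
        if self.doc_counts[label] == 0:
            return float("-inf")  # label never seen in training
        total_docs = sum(self.doc_counts.values())
        # Log prior: fraction of training messages carrying this label.
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for w in words:
            # Add-one smoothing keeps unseen words from producing log(0).
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, text):
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda label: self._log_score(words, label))


if __name__ == "__main__":
    clf = NaiveBayesSpamFilter()
    clf.train("win a free prize now", "spam")
    clf.train("claim your free money", "spam")
    clf.train("meeting agenda for monday", "ham")
    clf.train("lunch with the project team", "ham")
    print(clf.predict("free prize money"))        # spam
    print(clf.predict("team meeting on monday"))  # ham
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied; a production filter would also need richer tokenization and far more training data.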
My research is driven primarily by practical, high-impact problems. My current research interests include:
- Cognitive computing, with a focus on natural language and computer vision
- Computational linguistics of resource-poor languages (e.g., Telugu)
- Information retrieval
- Automated question generation
- Data management and NoSQL systems
- Personalization of learning
The goal of personalized learning is to address the diversity of student learning needs. Personalization is best thought of not as a single method but as a spectrum of approaches to teaching and learning, encompassing a range of instructional approaches, learning experiences, and academic-support strategies.
I have published over 110 articles in peer-reviewed journals, book chapters, and conference proceedings. My research has been funded by NSF, NASA, the U.S. Department of Energy, the U.S. Department of the Navy, the U.S. Army Research Office, the Marshall University Foundation, the West Virginia Division of Science and Research, and the Central Appalachian Regional Education and Research Center.
Computational linguistics is the study of computational approaches to investigating linguistic questions. The terms computational linguistics (CL) and natural language processing (NLP) are often used interchangeably. CL/NLP offers powerful new ways to preserve, promote, and celebrate languages and linguistic diversity. NLP is the art of solving engineering problems that require analyzing and generating natural language in both written and spoken forms, and it typically works with large quantities of existing machine-readable data.
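NLP's reliance on large machine-readable corpora can be illustrated with one of the simplest corpus statistics: bigram probabilities estimated by counting. The helper below is a hypothetical sketch (`bigram_probabilities` is not from any library). It assumes lowercasing and whitespace tokenization, marks sentence boundaries with `<s>` and `</s>`, and applies no smoothing, so word pairs never seen in the corpus simply get no entry. Toolkits such as SRILM build smoothed n-gram models of this kind at scale.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Maximum-likelihood estimates of P(word | previous word) from raw sentences."""
    history_counts = Counter()  # how often each token precedes another token
    bigram_counts = Counter()
    for sentence in sentences:
        # Assumption: whitespace tokenization; <s> and </s> mark sentence boundaries.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


if __name__ == "__main__":
    probs = bigram_probabilities(["I like NLP", "I like data", "data drives NLP"])
    print(probs[("i", "like")])    # 1.0
    print(probs[("like", "nlp")])  # 0.5
```

With only three sentences the estimates are crude; the point is that every probability comes directly from counts over existing text, which is why larger corpora yield more reliable models.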
Currently Funded Research Project: Programmers to Professional Software Engineers
The Programmers to Professional Software Engineers (PPSE) project is funded by the National Science Foundation (NSF) under award number 1730568 (1 July 2017 – 30 June 2022, $2,000,000).
The goals of the project are:
- Transform the programming-centric approach to computer science education into a systems-oriented, software-engineering-centric one
- Infuse professional skills development into the entire curriculum
- Dramatically increase retention and graduation rates
- Recruit significantly more students from underrepresented groups
- Personalize teaching and learning in both formal and informal settings
- Work with community colleges and early college high schools in the region to increase the number of transfer students and enhance their success in college
The PPSE project pursues these goals through curricular innovations, personalized learning, and the infusion of professional skills development throughout the curriculum.
Recent Publications
- Iyer, B., Rajurkar, A., & Gudivada, V. (Eds.). (2021). Applied Computer Vision and Image Processing, Proceedings of ICCET-2020, Volume 1. Springer Singapore. https://doi.org/10.1007/978-981-15-4029-5
- Gudivada, V., & Rao, C. R. (Eds.). (2018). Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications (Vol. 40). Elsevier.
- Gudivada, V., Bhulai, S., & Pia di Buono, M. (Eds.). (2017). Proceedings of the Third International Conference on Big Data, Small Data, Linked Data, and Open Data. IARIA/Think Mind.
- Gudivada, V., Raghavan, V., Govindaraju, V., & Rao, C. R. (Eds.). (2016). Cognitive Computing: Theory and Applications (Vol. 35). Elsevier.
- Gudivada, V., Roman, D., Pia di Buono, M., & Monteleone, M. (Eds.). (2016). Proceedings of the Second International Conference on Big Data, Small Data, Linked Data, and Open Data. IARIA/Think Mind.
- Gudivada, V., & Nandigam, J. (2004). Fundamentals of Computer Science Using C#. Self-published, 875 pages.
- Kuo, C. C. J., Chang, S.-F., & Gudivada, V. N. (1997). Multimedia Storage and Archiving Systems II. SPIE Proceedings Series.
- Gudivada, V., & Arbabifard, K. (2018). Open-source Libraries, Application Frameworks, and Workflow Systems for NLP. In V. N. Gudivada & C. R. Rao (Eds.), Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications (pp. 31–50). Elsevier. https://doi.org/10.1016/bs.host.2018.07.007
- Gudivada, V., Rao, D., & Gudivada, A. (2018). Information Retrieval: Concepts, Models, and Systems. In V. N. Gudivada & C. R. Rao (Eds.), Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications (pp. 331–401). Elsevier. https://doi.org/10.1016/bs.host.2018.07.009
- Gudivada, V. N. (2018). Natural Language Core Tasks and Applications. In V. N. Gudivada & C. R. Rao (Eds.), Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications (pp. 403–428). Elsevier. https://doi.org/10.1016/bs.host.2018.07.010
- Gudivada, A., & Rao, D. (2018). Languages and Grammar. In V. N. Gudivada & C. R. Rao (Eds.), Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications (pp. 15–29). Elsevier. https://doi.org/10.1016/bs.host.2018.07.003
- Gudivada, A., Rao, D., & Gudivada, V. (2018). Linguistics: Core Concepts and Principles. In V. N. Gudivada & C. R. Rao (Eds.), Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications (pp. 3–14). Elsevier. https://doi.org/10.1016/bs.host.2018.07.005
- Rao, D., Ding, J., & Gudivada, V. (2018). Supporting Data Analytics in Education: Human and Technical Resources Needed for Collecting, Storing, Analyzing, and Mining Data. In B. H. Khan, J. R. Corbeil, & M. E. Corbeil (Eds.), Responsible Analytics and Data Mining in Education (pp. 141–155). Routledge.
- Gudivada, V., Rao, D., & Ding, J. (2018). Evolution and Facets of Data Analytics for Educational Data Mining and Learning Analytics. In B. H. Khan, J. R. Corbeil, & M. E. Corbeil (Eds.), Responsible Analytics and Data Mining in Education (pp. 16–42). Routledge.
- Gudivada, V., Ramaswamy, S., & Srinivasan, S. (2018). Data Management Issues in Cyber-Physical Systems. In L. Deka & M. Chowdhury (Eds.), Transportation Cyber-Physical Systems (pp. 173–200). Elsevier. https://doi.org/10.1016/B978-0-12-814295-0.00007-1
- Gudivada, V., Apon, A., & Rao, D. (2018). Database Systems for Big Data Storage and Retrieval. In R. Segall & J. Cook (Eds.), Handbook of Research on Big Data Storage and Visualization Techniques (pp. 76–100). IGI Global. https://doi.org/10.4018/978-1-5225-3142-5.ch003
- Gudivada, V. (2017). Data Analytics: Fundamentals. In M. Chowdhury, A. Apon, & K. Dey (Eds.), Data Analytics for Intelligent Transportation Systems (pp. 31–67). Elsevier. https://doi.org/10.1016/B978-0-12-809715-1.00002-X
- Jothilakshmi, S., & Gudivada, V. (2016). Large Scale Data Enabled Evolution of Speech Processing Research and Applications. In V. Gudivada, V. Raghavan, V. Govindaraju, & C. R. Rao (Eds.), Cognitive Computing: Theory and Applications (Vol. 35, pp. 301–340). Elsevier. https://doi.org/10.1016/bs.host.2016.07.005
- Gudivada, V., Irfan, M., Fathi, E., & Rao, D. (2016). Cognitive Analytics: Going Beyond Big Data Analytics and Machine Learning. In V. Gudivada, V. Raghavan, V. Govindaraju, & C. R. Rao (Eds.), Cognitive Computing: Theory and Applications (Vol. 35, pp. 169–205). Elsevier. https://doi.org/10.1016/bs.host.2016.07.010
- Irfan, M., & Gudivada, V. (2016). Cognitive Computing Applications in Education and Learning. In V. Gudivada, V. Raghavan, V. Govindaraju, & C. R. Rao (Eds.), Cognitive Computing: Theory and Applications (Vol. 35, pp. 283–300). Elsevier. https://doi.org/10.1016/bs.host.2016.07.008
- Gudivada, V. (2016). Cognitive Computing: Concepts, Architectures, Systems and Applications. In V. Gudivada, V. Raghavan, V. Govindaraju, & C. R. Rao (Eds.), Cognitive Computing: Theory and Applications (Vol. 35, pp. 3–38). Elsevier. https://doi.org/10.1016/bs.host.2016.07.004
- Rao, D., & Gudivada, V. (2017). Metagenomics of epiphytic bacterial biofilms. In P. S. Murthy, Y. V. Nancharaiah, V. Thiyagarajan, & R. Sekar (Eds.), Biofilms in the Environment: Processes, Application and Control. Narosa Publishing House.
- Gudivada, V., Rao, D., & Raghavan, V. (2015). Big Data Driven Natural Language Processing Research and Applications. In V. Govindaraju, V. Raghavan, & C. R. Rao (Eds.), Big Data Analytics (Vol. 33, pp. 203–238). Elsevier. https://doi.org/10.1016/B978-0-444-63492-4.00009-5
- Gudivada, V., Nandigam, J., & Paris, J. (2015). Programming Paradigms in High Performance Computing. In R. Segall, J. Cook, & Q. Zhang (Eds.), Research and Applications in Global Supercomputing (pp. 311–338). IGI Global. https://doi.org/10.4018/978-1-4666-7461-5.ch013
- Gudivada, V., Pankanti, S., Seetharaman, G., & Zhang, Y. (2019). Cognitive Computing Systems: Their Potential and the Future. IEEE Computer, 52(5), 1–5. https://doi.org/10.1109/MC.2019.2904940
- Ding, J., Li, X., Kang, X., & Gudivada, V. N. (2019). A Case Study of the Augmentation and Evaluation of Training Data for Deep Learning. J. Data and Information Quality, 11(4), 20:1–20:22. https://doi.org/10.1145/3317573
- Gudivada, V., Apon, A., & Ding, J. (2017). Data Quality Considerations for Big Data and Machine Learning: Going Beyond Data Cleaning and Transformations. International Journal on Advances in Software, 10(1), 1–20.
- Ding, J., Hu, X.-H., & Gudivada, V. (2017). A Machine Learning Based Framework for Verification and Validation of Massive Scale Image Data. IEEE Transactions on Big Data, 1–17. https://doi.org/10.1109/TBDATA.2017.2680460
- Gudivada, V. (2017). Cognitive Analytics Driven Personalized Learning. Educational Technology Magazine, Special Issue on Big Data and Data Analytics in E-Learning, 23–30. https://www.jstor.org/stable/44430537
- Gudivada, V., Baeza-Yates, R., & Raghavan, V. (2015). Big Data: Promises and Problems. IEEE Computer, 48(3), 20–23. https://doi.org/10.1109/MC.2015.62
- Gudivada, V., Rao, D., & Paris, J. (2015). Understanding Search-Engine Optimization. IEEE Computer, 48(10), 67–76. https://doi.org/10.1109/MC.2015.297
- Park, B., Rao, D. L., & Gudivada, V. N. (2021). Dangers of Bias in Data-Intensive Information Systems. In P. Deshpande, A. Abraham, B. Iyer, & K. Ma (Eds.), Next Generation Information Processing Systems, Proceedings of ICCET-2020, Volume 2 (pp. 259–271). Springer Singapore. https://doi.org/10.1007/978-981-15-4851-2_28
- Rao, D. L., Pala, V. R., Herndon, N., & Gudivada, V. N. (2021). A Deep Learning Architecture for Corpus Creation for Telugu Language. In B. Iyer, A. Rajurkar, & V. N. Gudivada (Eds.), Applied Computer Vision and Image Processing, Proceedings of ICCET-2020, Volume 1 (pp. 1–16). Springer Singapore. https://doi.org/10.1007/978-981-15-4029-5_1
- Rao, D. L., Smith, E., & Gudivada, V. N. (2021). A Computational Linguistics Approach to Preserving and Promoting Natural Languages. In B. Iyer, A. Rajurkar, & V. N. Gudivada (Eds.), Applied Computer Vision and Image Processing, Proceedings of ICCET-2020, Volume 1 (pp. 353–361). Springer Singapore. https://doi.org/10.1007/978-981-15-4029-5_35
- Andriot, J., Park, B., Francia, P., & Gudivada, V. N. (2021). Sentiment Analysis of Democratic Presidential Primaries Debate Tweets Using Machine Learning Models. In B. Iyer, A. Rajurkar, & V. N. Gudivada (Eds.), Applied Computer Vision and Image Processing, Proceedings of ICCET-2020, Volume 1 (pp. 339–352). Springer Singapore. https://doi.org/10.1007/978-981-15-4029-5_34
- Ding, J., Li, X. C., & Gudivada, V. (2017). Augmentation and Evaluation of Training Data for Deep Learning. The IEEE International Conference on Big Data (Big Data), 2603–2611. https://doi.org/10.1109/BigData.2017.8258220
- Ding, J., Kang, X., Hu, X., & Gudivada, V. (2017). Building a Deep Learning Classifier for Enhancing a Biomedical Big Data Service. The IEEE International Conference on Services Computing (SCC), 140–147. https://doi.org/10.1109/SCC.2017.25
- Gudivada, V., Arbabifard, K., & Rao, D. (2017). Automated Generation of SQL Queries that Feature Specified SQL Concepts. The Proceedings of the 3rd International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA 2017), 9–13.
- Gudivada, V., Rao, D., & Grosky, W. (2016). Data Quality Centric Application Framework for Big Data. The Proceedings of the 2nd International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA 2016), 24–32.
- Rao, D., Gudivada, V., & Raghavan, V. (2015). Data quality issues in big data. The 2015 IEEE International Conference on Big Data (Big Data), 2654–2660.
- Gudivada, V., Jothilakshmi, S., & Rao, D. (2015). Data Management Issues in Big Data Applications. The Proceedings of the First International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA 2015), 16–21.
- Natarajan, V., Jothilakshmi, S., & Gudivada, V. (2015). Scalable Analytics on Traffic Video Data using Hadoop MapReduce. The Proceedings of the First International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA 2015), 11–15.
Resources for Research
- LaTeX: A high-quality typesetting system, available free of charge for all major computing platforms.
- TikZ and PGF: TeX packages for creating sophisticated graphics programmatically. TikZ is built on top of PGF.
- Overleaf: A collaborative, cloud-based LaTeX editor for writing, editing, and publishing scientific documents. Features include real-time collaboration, version control, and hundreds of ready-to-use LaTeX templates.
- MathPix: Extracts equations as LaTeX code from PDFs or handwritten notes.
- Grammarly: An AI-powered writing assistant.
- Analyzing the past to prepare for the future: writing a literature review
- Booth, W., Colomb, G., and Williams, J. (1995). The craft of research. Chicago: University of Chicago Press. Summary by Prof. Andy Finn.
- Towards a Framework of Literature Review Process in Support of Information Systems Research
- Guidelines for Performing Systematic Literature Reviews in Software Engineering – Summary
- Guidelines for Performing Systematic Literature Reviews in Software Engineering – Complete Article
- Empirical Studies of Agile Software Development: A Systematic Review
- Factors associated with success in medical school: systematic review of the literature
- NSF Workshop on the Question Generation Shared Task and Evaluation Challenge
- Automatically Generating Questions in Multiple Variables for Intelligent Tutoring
- Automatic Question Pattern Generation for Ontology-based Question Answering
- Automatic Generation of Multiple Choice Questions From Domain Ontologies
- awesome-nlp: A curated list of resources dedicated to natural language processing (NLP).
- WordNet: A Lexical Database for English.
- SRILM: The SRI Language Modeling Toolkit.
- Apache Lucene: A high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.
- MontyLingua: A free, commonsense-enriched natural language understander for English.
- tm: A framework for developing text mining applications with R.
- GATE: General Architecture for Text Engineering, open-source software capable of solving almost any text processing problem.
- OpenNLP: The Apache OpenNLP library is a machine learning based toolkit for processing natural language text.
- HTK: The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. HTK is used primarily for speech recognition research, although it has also been used for numerous other applications, including research into speech synthesis, character recognition, and DNA sequencing.
- VoiceBox: A speech processing toolkit based on MATLAB.
- Praat: doing phonetics by computer.
- openSMILE: The Munich versatile and fast open-source audio feature extractor.
- Foundations of Statistical Natural Language Processing, a book by Christopher D. Manning and Hinrich Schütze.
- Speech and Language Processing, 2nd Edition, a book by Daniel Jurafsky and James H. Martin.
- Download PostgreSQL from EnterpriseDB: PostgreSQL is a powerful, open source object-relational database system. It has decades of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness.
- Download pgAdmin: An open source tool for PostgreSQL administration and database development.
- PostgreSQL Studio: An open source web interface for PostgreSQL.
- TeamPostgreSQL: An open source, web browser-based tool for PostgreSQL administration and database development.
- SQL Power Architect: An open source tool for conceptual data modeling and profiling.
- PostgreSQL Wiki: This wiki contains user documentation, how-tos, and tips ‘n’ tricks related to PostgreSQL. It also serves as a collaboration area for PostgreSQL contributors.
- Automated test data generation using Datanamic Data Generator MultiDB 2011
- DTM Data Generator
- Database test data generation (PhD Thesis, 2004)
- IPOL Journal (Image Processing On Line): A research journal of image processing and image analysis. Each article describes an algorithm and includes its source code, along with an online demonstration facility and an archive of experiments. The text and source code are peer-reviewed, and the demonstration is controlled. IPOL is an Open Science and Reproducible Research journal.
- Supercamera: More Pixels Than You Know What To Do With.
- OpenCV: An open source C/C++ library with more than 2,500 implemented algorithms for real-time computer vision applications such as object recognition, shape detection, depth estimation, tracking moving objects, extracting 3D models, and overlaying augmented reality.
- ImageMagick: A software suite to create, edit, compose, or convert images.
- Geograph Project: Aims to collect geographically representative photographs and information for every square kilometer of Great Britain and Ireland.
- Professor Alireza Saberi’s Image and Video Processing lectures (YouTube playlist)
- Computer Vision: Algorithms and Applications, a book by Richard Szeliski, Microsoft Research