A MULTI-FEATURE ACCURATE DETECTION (MFAD) APPROACH FOR LARGE LANGUAGE MODEL-GENERATED TEXT

Document Type : Original Article

Authors

1 Computer Science, Faculty of Computer and Information Sciences, Ain Shams University

2 Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

3 Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Abstract

Advanced Large Language Models (LLMs) generate complex text that closely resembles human writing. However, their rapid development raises significant concerns, such as the spread of misinformation and academic cheating. As the responsible use of LLMs becomes increasingly important, detecting LLM-generated content has emerged as a critical challenge. Existing detection methods often rely on single-feature analysis, traditional feature extraction techniques, and conventional classification models. Many also require full access to the underlying model and are sensitive to variations in text length, which limits their effectiveness. This paper proposes a novel Multi-Feature Accurate Detection (MFAD) approach for identifying LLM-generated text that integrates syntactic and statistical attributes with high-level semantic representations. The proposed architecture is evaluated in a case study on the Human ChatGPT Comparison Corpus (HC3). MFAD comprises six phases: text preprocessing, syntactic and statistical feature extraction, text representation, semantic feature extraction, feature concatenation, and text classification. The results show that MFAD effectively distinguishes between human-written and LLM-generated text, achieving a peak confidence score of 98%, which highlights its reliability and strong performance.
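The six MFAD phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The specific statistical features (type-token ratio, average word length, average sentence length, punctuation ratio), the hashed bag-of-words vector that stands in for transformer-based semantic embeddings, the logistic-regression classifier, and all function names (preprocess, stat_features, semantic_features, mfad_features, fit, predict_proba) are assumptions. They are chosen only to show how syntactic/statistical and semantic features can be concatenated into a single vector before classification.

```python
import math
import re
import zlib
from typing import List, Sequence

Vector = List[float]


def preprocess(text: str) -> List[str]:
    """Phase 1 (assumed): lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def stat_features(text: str, tokens: List[str]) -> Vector:
    """Phase 2 (illustrative selection): simple syntactic/statistical cues."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tok = max(len(tokens), 1)
    type_token_ratio = len(set(tokens)) / n_tok
    avg_word_len = sum(len(t) for t in tokens) / n_tok
    avg_sent_len = len(tokens) / max(len(sentences), 1)
    punct_ratio = sum(ch in ",;:!?." for ch in text) / max(len(text), 1)
    return [type_token_ratio, avg_word_len, avg_sent_len, punct_ratio]


def semantic_features(tokens: List[str], dim: int = 16) -> Vector:
    """Phases 3-4 stand-in: a hashed, L2-normalised bag of words.

    A real system would use embeddings from a pretrained encoder here.
    """
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def mfad_features(text: str, dim: int = 16) -> Vector:
    """Phase 5: concatenate statistical and semantic features."""
    tokens = preprocess(text)
    return stat_features(text, tokens) + semantic_features(tokens, dim)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(texts: Sequence[str], labels: Sequence[int],
        lr: float = 0.1, epochs: int = 2000):
    """Phase 6 stand-in: logistic regression on z-scored features,
    trained by batch gradient descent (1 = LLM-generated, 0 = human)."""
    X = [mfad_features(t) for t in texts]
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # Constant columns get std 1.0 so they standardise to zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    Z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    w, b, n = [0.0] * len(means), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, y in zip(Z, labels):
            # Gradient of the cross-entropy loss for one sample.
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - y
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b, means, stds


def predict_proba(model, text: str) -> float:
    """Probability that `text` is LLM-generated under the fitted model."""
    w, b, means, stds = model
    x = mfad_features(text)
    z = [(v - m) / s for v, m, s in zip(x, means, stds)]
    return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)
```

Standardising each column before training is a deliberate choice in this sketch: average sentence length is measured in tokens and would otherwise dominate the ratio-valued features during gradient descent. Replacing semantic_features with a real encoder would not change the concatenation and classification steps.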

Keywords