Open Source

Malicious URL Detection using CNN

2019 (Final Year Project)

Overview

A deep learning-based security tool that detects malicious URLs with a character-level Convolutional Neural Network (CNN). The project addresses the growing threat of web-based attacks by learning malicious patterns directly from the structure of URL strings.

Challenge

Traditional signature-based detection methods struggle with new and evolving attack patterns. The challenge was to create a model that could learn to identify malicious URLs based on their character-level patterns without relying on predefined signatures.

Solution

Implemented a Character-level CNN architecture using Python and Keras that analyzes URL strings at the character level. The model was trained on comprehensive datasets of malicious and benign URLs, learning to identify patterns indicative of XSS attacks, SQL injection attempts, and directory traversal exploits.
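As a minimal sketch of what analyzing a URL "at the character level" can look like, the snippet below maps each character to an integer index and pads to a fixed length so the result can be fed to an embedding layer. The alphabet, the reserved indices, the `MAX_LEN` value, and the `encode_url` helper are illustrative assumptions, not details taken from the project.

```python
import string

# Assumed alphabet: printable ASCII letters, digits and punctuation.
# Index 0 is reserved for padding, index 1 for characters outside the alphabet.
ALPHABET = string.ascii_letters + string.digits + string.punctuation
PAD, UNKNOWN = 0, 1
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 200  # assumed fixed input length


def encode_url(url, max_len=MAX_LEN):
    """Map a URL to a fixed-length list of integer character indices."""
    # Truncate long URLs, look up each character, then right-pad with PAD.
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in url[:max_len]]
    return indices + [PAD] * (max_len - len(indices))


encoded = encode_url("http://example.com/?q=<script>", max_len=40)
print(len(encoded))  # 40
```

Because every character is encoded, including punctuation such as `<`, `'` or `../`, the model can pick up on the symbols that typically appear in injection and traversal payloads without a hand-written tokenizer.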

Technology Stack

Python
Keras
TensorFlow
Deep Learning
CNN
Machine Learning

Key Features

Character-level CNN architecture
Multi-class classification for different attack types
Training on real-world security datasets
URL preprocessing and feature extraction
Model evaluation and performance metrics
Python implementation with Keras/TensorFlow
Modular design for easy extension
Detailed documentation and examples
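The features above can be illustrated with a small Keras model. This is a hedged sketch of a generic character-level CNN for multi-class classification, not the project's actual architecture: the layer sizes, `VOCAB_SIZE`, `MAX_LEN`, the four-class label set, and the `build_char_cnn` helper are all assumptions made for the example.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 96   # assumed: 94 printable characters + padding + unknown
MAX_LEN = 200     # assumed fixed URL length after padding/truncation
NUM_CLASSES = 4   # assumed: benign, XSS, SQL injection, directory traversal


def build_char_cnn(vocab_size=VOCAB_SIZE, max_len=MAX_LEN, num_classes=NUM_CLASSES):
    """Build a small character-level CNN over integer-encoded URLs."""
    inputs = keras.Input(shape=(max_len,), dtype="int32")
    # Learn a dense vector for every character index.
    x = layers.Embedding(vocab_size, 32)(inputs)
    # Convolution filters act as learned detectors for short character n-grams.
    x = layers.Conv1D(128, kernel_size=5, activation="relu")(x)
    # Keep the strongest response of each filter anywhere in the URL.
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dropout(0.5)(x)
    # One probability per class; the classes are mutually exclusive here.
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_char_cnn()
```

Calling `model.summary()` prints the layer stack, and `model.fit` would take an integer array of shape `(num_urls, MAX_LEN)` together with integer class labels.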

Impact & Results

Effective detection of XSS, SQL injection, and directory traversal attacks
Model trained on comprehensive security threat datasets
Character-level analysis providing robustness against obfuscation
Open source contribution to security research
Academic project demonstrating ML application in cybersecurity

Technical Highlights

Character-level Convolutional Neural Network
Deep learning with Keras and TensorFlow
Feature extraction from URL strings
Multi-class classification
Training pipeline with data augmentation
Model optimization and hyperparameter tuning
Evaluation metrics (accuracy, precision, recall)
Python implementation with NumPy and Pandas
Deployment: Open source on GitHub
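The evaluation metrics named above (accuracy, precision, recall) can be computed per attack class as in the sketch below. The helper names and the toy labels are made up for illustration and are not results from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of URLs whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels, invented for this example only.
y_true = ["benign", "xss", "sqli", "xss", "benign", "traversal"]
y_pred = ["benign", "xss", "xss", "benign", "benign", "traversal"]

acc = accuracy(y_true, y_pred)
xss_precision, xss_recall = precision_recall(y_true, y_pred, "xss")
print(xss_precision, xss_recall)  # 0.5 0.5
```

Reporting precision and recall per class matters for this kind of task because benign URLs usually outnumber malicious ones, so accuracy alone can look high even when an attack class is mostly missed.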
