
# 📄 xsukax ReadClean PDF

A privacy-focused, client-side web application that extracts clean, readable content from a webpage and converts it to PDF. Built with pure HTML, CSS, and JavaScript: no backend and no tracking.

**GitHub Repo:** [https://github.com/xsukax/xsukax-ReadClean-PDF](https://github.com/xsukax/xsukax-ReadClean-PDF)

**Demo:** [https://xsukax.github.io/xsukax-ReadClean-PDF](https://xsukax.github.io/xsukax-ReadClean-PDF)

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![GitHub stars](https://img.shields.io/github/stars/xsukax/xsukax-ReadClean-PDF?style=social)](https://github.com/xsukax/xsukax-ReadClean-PDF)
[![GitHub issues](https://img.shields.io/github/issues/xsukax/xsukax-ReadClean-PDF)](https://github.com/xsukax/xsukax-ReadClean-PDF/issues)

## 🎯 Project Overview

**xsukax ReadClean PDF** is a lightweight, browser-based tool designed to transform cluttered web content into clean, distraction-free PDFs optimized for reading and archival. The application strips away advertisements, navigation elements, and other extraneous content while preserving the core article or document structure.

### Primary Purpose

- **Content Extraction**: Intelligently identifies and extracts main content from web pages
- **Distraction Removal**: Eliminates ads, scripts, sidebars, navigation menus, and other non-essential elements
- **PDF Generation**: Leverages native browser print functionality for high-quality PDF output
- **Universal Compatibility**: Works with any website through multiple fetching methods
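The distraction-removal step can be pictured as a selector-driven pass over a parsed document. The following is a minimal sketch, not the project's actual code: the `NOISE_SELECTORS` list and the `stripNoise` name are assumptions made for illustration, and the real tool may use different selectors or extra heuristics.

```javascript
// Hypothetical selector list; the project's real list may differ.
const NOISE_SELECTORS = [
  "script", "style", "iframe", "nav", "aside", "footer", "img", ".ad", ".sidebar",
];

// Removes every element matching a noise selector from `root` (a Document or
// Element) and returns how many elements were removed.
function stripNoise(root, selectors = NOISE_SELECTORS) {
  let removed = 0;
  // querySelectorAll returns a static list, so removing while iterating is safe.
  for (const el of root.querySelectorAll(selectors.join(","))) {
    el.remove();
    removed += 1;
  }
  return removed;
}

// Browser-only usage: DOMParser builds an inert document, so scripts in the
// input are parsed but never executed.
if (typeof DOMParser !== "undefined") {
  const doc = new DOMParser().parseFromString(
    "<article><p>Text</p><script>x()</script><nav>Menu</nav></article>",
    "text/html"
  );
  stripNoise(doc);
  console.log(doc.body.innerHTML);
}
```

Passing the root in as a parameter keeps the function usable on a freshly parsed string (Paste HTML) as well as on a live page (Bookmarklet).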

### Core Functionalities

1. **URL-Based Fetching**: Retrieve content directly from web URLs using CORS proxies
2. **HTML Paste Processing**: Process raw HTML content pasted directly into the application
3. **Bookmarklet Integration**: One-click content extraction from any webpage via browser bookmarklet
4. **Intelligent Content Cleaning**: Automated removal of ads, scripts, images, links, and navigation elements
5. **RTL/LTR Language Support**: Automatic detection and proper rendering of right-to-left and left-to-right text
6. **Responsive Design**: Optimized interface for desktop and mobile devices
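As an illustration of the RTL/LTR detection listed above, one common heuristic is to measure the share of letters that fall in right-to-left Unicode blocks. This is a sketch under assumptions: the `detectDirection` name and the 0.3 threshold are hypothetical, and the application may detect direction differently (for example, from the page's `dir` attribute).

```javascript
// Hebrew, Arabic, Syriac, Thaana and related blocks, plus presentation forms.
const RTL_LETTER = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;

// Returns "rtl" when the fraction of RTL letters exceeds `threshold`,
// otherwise "ltr". Text without any letters defaults to "ltr".
function detectDirection(text, threshold = 0.3) {
  const letters = text.match(/\p{L}/gu) || [];
  if (letters.length === 0) return "ltr";
  const rtlCount = letters.filter((ch) => RTL_LETTER.test(ch)).length;
  return rtlCount / letters.length > threshold ? "rtl" : "ltr";
}

console.log(detectDirection("Hello world"));
console.log(detectDirection("مرحبا بالعالم"));
```

The result can be applied as the `dir` attribute of the output container so that alignment and punctuation follow the detected script.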

## 🔒 Security and Privacy Benefits

**xsukax ReadClean PDF** is architected with privacy and security as foundational principles. Parsing, cleaning, and PDF generation all run in your browser; the only network traffic is the page fetch itself when you use the URL Fetch method.

### Privacy-Centric Architecture

#### Client-Side Processing Only
All HTML parsing, content extraction, and PDF generation occur exclusively in your browser’s JavaScript environment. No data is transmitted to external servers for processing, eliminating concerns about data interception, logging, or unauthorized access.
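To make the client-side PDF path concrete, here is a minimal sketch of how cleaned content could be wrapped in a print-friendly page and handed to the browser's print dialog. The function names (`buildPrintableHtml`, `openPrintDialog`) and the CSS values are assumptions for illustration; the application's own markup and styling are not shown in this document.

```javascript
// Escapes text for safe use inside HTML element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wraps already-cleaned article HTML in a standalone, print-friendly page.
function buildPrintableHtml({ title, bodyHtml, dir = "ltr" }) {
  const safeDir = dir === "rtl" ? "rtl" : "ltr"; // accept only known values
  return `<!DOCTYPE html>
<html dir="${safeDir}">
<head>
<meta charset="utf-8">
<title>${escapeHtml(title)}</title>
<style>
  body { font-family: Georgia, serif; line-height: 1.6; max-width: 42em; margin: 2em auto; }
  @page { margin: 2cm; }
  @media print { body { margin: 0; max-width: none; } }
</style>
</head>
<body>${bodyHtml}</body>
</html>`;
}

// Browser-only: opens the page in a new window and shows the print dialog,
// where the user can pick "Save as PDF". Call it from a click handler so
// popup blockers allow the window.
function openPrintDialog(html) {
  const win = window.open("", "_blank");
  if (!win) throw new Error("Popup blocked");
  win.document.open();
  win.document.write(html);
  win.document.close();
  win.focus();
  win.print();
}
```

Because the PDF is produced by the browser's own print engine, nothing has to be uploaded for conversion.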

#### No Data Collection or Tracking
- **Zero Analytics**: The application contains no analytics scripts, tracking pixels, or telemetry
- **No Cookies**: Does not set or read cookies for user identification or behavior tracking
- **No External Dependencies**: Core functionality operates without loading third-party libraries from CDNs
- **No User Accounts**: Fully functional without registration, login, or profile creation

**Note**: When using the URL Fetch method, the application routes requests through public CORS proxy services to bypass browser same-origin restrictions. These proxies can see the URLs you fetch and relay the raw page response, but the cleaned, extracted content never leaves your browser. For maximum privacy, use the **Bookmarklet** or **Paste HTML** methods, which involve no proxy and no additional network requests.
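The proxy routing described in the note can be sketched as a loop that tries each proxy in turn until one returns the page. Everything named here is an assumption: the `.example` proxy hosts are placeholders rather than the services the application actually uses, and `buildProxyUrls` and `fetchViaProxies` are hypothetical helper names.

```javascript
// Placeholder proxy templates; "{url}" is replaced by the encoded target URL.
const PROXY_TEMPLATES = [
  "https://proxy-one.example/raw?url={url}",
  "https://proxy-two.example/?{url}",
];

// Validates the target URL and expands every proxy template with it.
function buildProxyUrls(targetUrl, templates = PROXY_TEMPLATES) {
  const parsed = new URL(targetUrl); // throws TypeError on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  const encoded = encodeURIComponent(parsed.href);
  return templates.map((template) => template.replace("{url}", encoded));
}

// Tries each proxy in order and returns the first successful response body.
// Rethrows the last error if every proxy fails.
async function fetchViaProxies(targetUrl, templates = PROXY_TEMPLATES, fetchImpl = fetch) {
  let lastError = new Error("No proxies configured");
  for (const proxyUrl of buildProxyUrls(targetUrl, templates)) {
    try {
      const response = await fetchImpl(proxyUrl);
      if (response.ok) return await response.text();
      lastError = new Error(`Proxy responded with HTTP ${response.status}`);
    } catch (err) {
      lastError = err; // network failure: fall through to the next proxy
    }
  }
  throw lastError;
}
```

Only the target URL and the raw response pass through the proxy; the returned HTML string is then cleaned locally.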

### Security Features

#### Content Sanitization
- **Script Removal**: Automatically strips all `