# Single Page Copier CLI Tool
A developer-controlled CLI program for making high-fidelity copies of individual web pages.
## Purpose
This tool focuses on creating the highest quality copies of individual web pages, allowing developers to control exactly which pages to copy without automatic sitemap analysis. Each page is processed with optimal asset handling and styling preservation.
## Features
- **Single Page Focus**: Copy one page at a time with maximum quality
- **Developer Controlled**: No automatic sitemap scanning - you control what gets copied
- **Visibility Cleanup**: Ensures all main content sections are visible in static copies
- **Code Prettification**: Developer-friendly HTML, CSS, and JavaScript output
- **Multiple Processing Modes**: Flexible JavaScript handling (clean, extract, disable, prettify, keep)
- **High Quality Output**: Preserves all styling, assets, and functionality
- **Complete Asset Handling**: Downloads and processes CSS, JavaScript, images, fonts
- **CSS Processing**: Handles nested assets in CSS files (background images, fonts, etc.)
- **Responsive Images**: Processes srcset attributes correctly
- **Inline Styles**: Handles background images in style attributes
- **Favicon Support**: Downloads and processes all icon types
- **Clean Output**: Organized directory structure with proper relative paths
## Installation
The tool itself needs no installation. Install its dependencies, then run the Python script directly:
```bash
# Make sure you have required dependencies
pip install requests beautifulsoup4
```
## Usage
### Basic Usage
```bash
python single_page_copier.py https://www.wealthfront.com/
```
### Recommended: Clean Mode with Visibility Cleanup
```bash
python single_page_copier.py https://www.wealthfront.com/ --js-mode clean --verbose
```
### Specify Output Filename
```bash
python single_page_copier.py https://www.wealthfront.com/cash --output cash.html --js-mode clean
```
### Custom Output Directory
```bash
python single_page_copier.py https://www.wealthfront.com/stock-investing --dir my_copies --js-mode clean
```
### Verbose Mode (See All Processing Details)
```bash
python single_page_copier.py https://www.wealthfront.com/bonds --js-mode clean --verbose
```
### Combined Options
```bash
python single_page_copier.py https://www.wealthfront.com/retirement \
--output retirement.html \
--dir wealthfront_pages \
--js-mode clean \
--verbose
```
## Command Line Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `url` | - | URL of the page to copy (required) | - |
| `--output` | `-o` | Output filename | Auto-generated |
| `--dir` | `-d` | Output directory | `output` |
| `--js-mode` | `-j` | JavaScript processing mode | `keep` |
| `--verbose` | `-v` | Enable detailed logging | False |
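
For illustration, the options in the table could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual source: the `build_parser` helper and the help strings are assumptions, and only the flag names and defaults come from the table.

```python
import argparse

# Mode names taken from the "JavaScript Processing Modes" section.
JS_MODES = ("keep", "disable", "extract", "prettify", "clean")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="single_page_copier.py",
        description="Copy a single web page together with its assets.",
    )
    parser.add_argument("url", help="URL of the page to copy")
    parser.add_argument("-o", "--output", default=None,
                        help="Output filename (auto-generated when omitted)")
    parser.add_argument("-d", "--dir", default="output",
                        help="Output directory")
    parser.add_argument("-j", "--js-mode", choices=JS_MODES, default="keep",
                        help="JavaScript processing mode")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable detailed logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["https://www.wealthfront.com/cash", "-o", "cash.html", "-j", "clean", "-v"]
)
print(args.url, args.output, args.dir, args.js_mode, args.verbose)
```

Restricting `--js-mode` with `choices` makes a mistyped mode fail immediately rather than silently falling back to `keep`.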
### JavaScript Processing Modes
| Mode | Description | Use case |
|------|-------------|----------|
| `keep` | Preserves original JavaScript as-is | Production copies |
| `disable` | Removes all JavaScript functionality | Static-only copies |
| `extract` | Extracts inline CSS and JavaScript to separate files | Organization |
| `prettify` | Beautifies HTML, CSS, and JavaScript | Development |
| `clean` | **Recommended**: Prettifies code + visibility cleanup | Development with visible content |
## Output Structure
```
output/
├── index.html      # Main HTML file
└── assets/         # All downloaded assets
    ├── css/        # Stylesheets
    ├── js/         # JavaScript files
    ├── images/     # Images
    └── fonts/      # Web fonts
```
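
One way a remote asset URL might be mapped into this layout is by file extension. The sketch below is an assumption about how such a mapping could work, not the tool's confirmed behavior: the `ASSET_DIRS` table, the `local_asset_path` helper, and the `other` fallback folder are all hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Extension-to-folder mapping: an assumption based on the documented layout.
ASSET_DIRS = {
    ".css": "css",
    ".js": "js", ".mjs": "js",
    ".png": "images", ".jpg": "images", ".jpeg": "images", ".gif": "images",
    ".svg": "images", ".webp": "images", ".ico": "images",
    ".woff": "fonts", ".woff2": "fonts", ".ttf": "fonts",
    ".otf": "fonts", ".eot": "fonts",
}


def local_asset_path(asset_url: str) -> str:
    """Return a relative path under assets/ for a remote asset URL."""
    # urlparse().path drops the query string and fragment (e.g. "?v=3").
    path = urlparse(asset_url).path
    filename = posixpath.basename(path) or "index"
    ext = posixpath.splitext(filename)[1].lower()
    subdir = ASSET_DIRS.get(ext, "other")  # "other" is a hypothetical fallback
    return posixpath.join("assets", subdir, filename)


print(local_asset_path("https://cdn.example.com/static/main.css?v=3"))
print(local_asset_path("https://cdn.example.com/fonts/Inter.woff2"))
```

A real implementation would also need to handle two assets that share a filename, which this sketch ignores.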
## Quality Features
### Visibility Cleanup (Clean Mode)
- **Removes hiding CSS styles**: Clears `opacity:0` declarations and `translateY()` / `translateZ()` transforms
- **Removes offscreen attributes**: Eliminates `offscreen=""` attributes that hide content
- **Preserves other styling**: Maintains all other CSS properties and visual design
- **Ensures content visibility**: Main sections like "Money works better here" are displayed
- **Automatic application**: Applied automatically in `clean` mode
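
The cleanup described above can be approximated with a few regular expressions. The following is a simplified sketch under stated assumptions, not the tool's implementation: `clean_inline_style` and `strip_offscreen` are hypothetical names, and the set of declarations treated as "hiding" is inferred from the list above.

```python
import re

# Declarations treated as hiding content (assumption based on the list above):
# a fully transparent opacity, or any transform using translateY()/translateZ().
_HIDING = re.compile(
    r"^(?:opacity\s*:\s*0(?:\.0+)?|transform\s*:.*translate[YZ]\(.*)$",
    re.IGNORECASE,
)

# Matches an `offscreen` attribute, with or without a quoted value.
_OFFSCREEN = re.compile(r"""\soffscreen(?:=(?:"[^"]*"|'[^']*'))?(?=[\s/>])""")


def clean_inline_style(style: str) -> str:
    """Drop hiding declarations from a style attribute and keep the rest.

    Splitting on ';' is naive: values that contain semicolons, such as
    data URIs, would need a real CSS parser. A matching transform is
    removed entirely, including any other functions it contains.
    """
    kept = []
    for decl in style.split(";"):
        decl = decl.strip()
        if decl and not _HIDING.match(decl):
            kept.append(decl)
    return "; ".join(kept)


def strip_offscreen(tag_html: str) -> str:
    """Remove the `offscreen` attribute from a single opening tag."""
    return _OFFSCREEN.sub("", tag_html)


print(clean_inline_style("opacity:0; transform: translateY(40px) translateZ(0); color: red"))
print(strip_offscreen('<div class="hero" offscreen="" style="color: red">'))
```

Partial opacities such as `opacity: 0.5` are left alone, since they do not hide content outright.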
### Code Prettification
- **HTML Beautification**: Clean, indented HTML output without developer notes
- **CSS Formatting**: Properly formatted CSS with consistent indentation
- **JavaScript Prettification**: Readable JavaScript with proper spacing and structure
- **Chunk File Processing**: Even minified framework chunks are prettified for development
### CSS Processing
- Downloads all linked stylesheets
- Processes `@import` statements in CSS
- Handles `url()` references in CSS (background images, fonts)
- Maintains proper relative paths
- Beautifies CSS for better readability (clean/prettify modes)
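
To make the nested-asset handling concrete, here is a minimal sketch of how `url()` and `@import` references could be collected from a stylesheet and resolved against the stylesheet's own URL. The `css_asset_urls` helper and the regular expressions are illustrative assumptions; the tool's actual parsing may differ.

```python
import re
from urllib.parse import urljoin

# url(...) with optional quotes; also covers the @import url("...") form.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)
# @import "file.css" written without url().
_IMPORT_RE = re.compile(r"""@import\s+(['"])(.*?)\1""", re.IGNORECASE)


def css_asset_urls(css_text: str, css_url: str) -> list[str]:
    """Return absolute URLs of assets referenced by a stylesheet."""
    refs = [m.group(2) for m in _URL_RE.finditer(css_text)]
    refs += [m.group(2) for m in _IMPORT_RE.finditer(css_text)]
    urls = []
    for ref in refs:
        # Inline data URIs need no download.
        if not ref or ref.startswith("data:"):
            continue
        # Relative references resolve against the CSS file, not the page.
        absolute = urljoin(css_url, ref)
        if absolute not in urls:
            urls.append(absolute)
    return urls


css = """
@import "base.css";
@import url("theme.css");
.hero { background: url(../img/hero.png); }
@font-face { src: url('/fonts/a.woff2') format("woff2"); }
.icon { background: url(data:image/png;base64,AAAA); }
"""
for url in css_asset_urls(css, "https://example.com/static/css/main.css"):
    print(url)
```

Resolving against the stylesheet URL rather than the page URL matters because a path like `../img/hero.png` is relative to where the CSS file lives.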
### Image Handling
- Downloads all `<img>` sources
- Processes `srcset` attributes for responsive images
- Handles background images in inline styles
- Supports all common image formats
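
A `srcset` attribute holds several comma-separated candidates, each a URL followed by an optional width or density descriptor. The sketch below shows one simple way to split it so every candidate can be downloaded. `parse_srcset` is a hypothetical helper, and the comma split is a simplification that would break on URLs that themselves contain commas.

```python
from urllib.parse import urljoin


def parse_srcset(srcset: str, page_url: str) -> list[tuple[str, str]]:
    """Split a srcset value into (absolute_url, descriptor) pairs.

    Naive comma split: URLs containing commas (some CDN transform URLs,
    data URIs) would need the full parsing algorithm from the HTML spec.
    """
    candidates = []
    for part in srcset.split(","):
        tokens = part.split()
        if not tokens:
            continue
        url = urljoin(page_url, tokens[0])
        descriptor = tokens[1] if len(tokens) > 1 else ""
        candidates.append((url, descriptor))
    return candidates


pairs = parse_srcset(
    "img/a-480.jpg 480w, img/a-800.jpg 800w, /img/a.jpg",
    "https://example.com/cash",
)
for url, descriptor in pairs:
    print(url, descriptor)
```

Keeping the descriptor alongside each URL lets the attribute be rebuilt later with local paths in place of the remote ones.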
### JavaScript & Assets
- Downloads all external JavaScript files
- Preserves script loading order
- Downloads favicons and touch icons
- Handles web fonts and other assets
- Prettifies minified code for readability (clean/prettify modes)
### HTML Optimization
- Maintains original HTML structure
- Preserves all meta tags and SEO elements
- Keeps responsive design intact
- Adds generator meta tag for identification
- Formats HTML for better readability (clean/prettify modes)
## Workflow Examples
### Copy Multiple Pages Individually (Recommended)
```bash
# Copy homepage with clean mode for visible content
python single_page_copier.py https://www.wealthfront.com/ -o home.html --js-mode clean -v
# Copy cash page with clean mode
python single_page_copier.py https://www.wealthfront.com/cash -o cash.html --js-mode clean -v
# Copy stock investing page with clean mode
python single_page_copier.py https://www.wealthfront.com/stock-investing -o stocks.html --js-mode clean -v
# Copy bonds page with clean mode
python single_page_copier.py https://www.wealthfront.com/bonds -o bonds.html --js-mode clean -v
```
### Organize by Directories
```bash
# Create separate directories for different sections with clean mode
python single_page_copier.py https://www.wealthfront.com/ --dir homepage --js-mode clean
python single_page_copier.py https://www.wealthfront.com/cash --dir products --js-mode clean
python single_page_copier.py https://www.wealthfront.com/stock-investing --dir products --js-mode clean
```
## Testing Your Copies
After copying pages, test them locally:
```bash
# Navigate to output directory
cd output
# Start local server
python3 -m http.server 8000
# Open in browser
# http://localhost:8000/index.html
```
## Quality Verification
The tool provides detailed feedback:
- Success/failure status
- Output file location
- Number of assets downloaded
- Original URL reference
- Verbose logging of all operations
## Best Practices
1. **Use Clean Mode**: Always use `--js-mode clean` for development-ready copies with visible content
2. **Use Verbose Mode**: Always use `-v` to see what's being processed
3. **Organize Output**: Use different directories for different sites/sections
4. **Test Locally**: Always test copied pages with a local server
5. **Check Content Visibility**: Verify all main content sections are displayed
6. **Check Assets**: Verify all images and styles loaded correctly
7. **Custom Filenames**: Use descriptive output filenames for organization
## Advanced Usage
### Batch Processing Script
Create a simple bash script for multiple pages:
```bash
#!/bin/bash
# copy_wealthfront_pages.sh
python single_page_copier.py https://www.wealthfront.com/ -o home.html -d wealthfront --js-mode clean -v
python single_page_copier.py https://www.wealthfront.com/cash -o cash.html -d wealthfront --js-mode clean -v
python single_page_copier.py https://www.wealthfront.com/stock-investing -o stocks.html -d wealthfront --js-mode clean -v
python single_page_copier.py https://www.wealthfront.com/bonds -o bonds.html -d wealthfront --js-mode clean -v
```
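
The same batch can be driven from Python when you want to collect failures rather than run every line blindly. This is a sketch only: `build_command` and `copy_all` are hypothetical helpers, and it assumes the script exits with a non-zero status on failure, which the documentation does not state. It defaults to a dry run that just prints the commands.

```python
import subprocess
import sys

# Output filename -> page URL, matching the bash script above.
PAGES = {
    "home.html": "https://www.wealthfront.com/",
    "cash.html": "https://www.wealthfront.com/cash",
    "stocks.html": "https://www.wealthfront.com/stock-investing",
    "bonds.html": "https://www.wealthfront.com/bonds",
}


def build_command(url: str, output: str, directory: str = "wealthfront") -> list[str]:
    """Assemble the CLI invocation for one page using the documented flags."""
    return [sys.executable, "single_page_copier.py", url,
            "-o", output, "-d", directory, "--js-mode", "clean", "-v"]


def copy_all(pages: dict[str, str], directory: str = "wealthfront",
             dry_run: bool = True) -> list[str]:
    """Run (or, in a dry run, print) one command per page; return failed URLs."""
    failed = []
    for output, url in pages.items():
        cmd = build_command(url, output, directory)
        if dry_run:
            print(" ".join(cmd))
            continue
        # Assumption: a non-zero exit code signals a failed copy.
        if subprocess.run(cmd).returncode != 0:
            failed.append(url)
    return failed


copy_all(PAGES)  # dry run: prints the four commands without executing them
```

Call `copy_all(PAGES, dry_run=False)` from the directory containing `single_page_copier.py` to actually run the copies.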
### Quality Control
```bash
# Copy with maximum verbosity and check results
python single_page_copier.py https://www.wealthfront.com/pricing \
--output pricing.html \
--dir quality_test \
--js-mode clean \
--verbose > copy_log.txt 2>&1
# Review the log
cat copy_log.txt
```
## Why This Approach?
- **Quality First**: Focus on perfect single-page copies rather than bulk processing
- **Developer Control**: You decide exactly what to copy and when
- **Debugging Friendly**: Verbose mode shows exactly what's happening
- **Flexible**: Works with any website, not just specific architectures
- **Reliable**: Handles edge cases and error conditions gracefully
---
## Troubleshooting
### Missing Content Issues
If main content sections are not visible:
1. **Use `--js-mode clean`**: This includes automatic visibility cleanup
2. **Check for hidden elements**: Look for CSS with `opacity:0` or `transform:translateY()`
3. **Verify JavaScript dependencies**: Some content may require JavaScript to display
### Common Visibility Problems
- **"Money works better here" section missing**: Use `--js-mode clean` to fix
- **Hero sections not displaying**: CSS transforms are hiding content - clean mode fixes this
- **Content appears blank**: Offscreen attributes are hiding elements - clean mode removes them
### Asset Loading Issues
If assets are not loading properly:
1. Check that the output directory contains an `assets/` folder
2. Verify that asset URLs in HTML are relative paths
3. Ensure the web server is serving from the correct directory
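
Step 2 can be automated with the standard library. The sketch below lists `src` and `href` references in a copied page that are not relative paths. `AssetRefCollector` and `non_relative_refs` are hypothetical names, and the set of tags checked is an assumption; note that `<link rel="canonical">` entries will be flagged even though they are legitimately absolute.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class AssetRefCollector(HTMLParser):
    """Collect src/href values from img, script, and link tags."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script") and attr_map.get("src"):
            self.refs.append(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.refs.append(attr_map["href"])


def non_relative_refs(html: str) -> list[str]:
    """Return references that are absolute URLs or root/protocol-relative."""
    collector = AssetRefCollector()
    collector.feed(html)
    return [ref for ref in collector.refs
            if urlparse(ref).scheme or ref.startswith("/")]


sample = (
    '<link rel="stylesheet" href="assets/css/main.css">'
    '<img src="https://cdn.example.com/a.png">'
    '<script src="/js/app.js"></script>'
    '<img src="assets/images/b.png">'
)
print(non_relative_refs(sample))
```

To check a real copy, read the generated file (for example `output/index.html`) and pass its contents to `non_relative_refs`; an empty list suggests every checked reference points into the local copy.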
### Code Readability Issues
If code is minified and hard to read:
1. Use `--js-mode clean` or `--js-mode prettify` for formatted output
2. Check that CSS and JavaScript files are properly beautified
3. Verify HTML is properly indented and formatted
**Perfect for**: Website design analysis, offline development, design inspiration, educational purposes, and creating high-quality website references.