Compare commits

...

18 Commits

Author SHA1 Message Date
antanst
8bbe6efabc Improve error handling and add duplicate snapshot cleanup
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-18 11:56:26 +03:00
antanst
aa2658e61e Fix snapshot overwrite logic to preserve successful responses
- Prevent overwriting snapshots that have valid response codes
- Ensure URL is removed from queue when snapshot update is skipped
- Add last_crawled timestamp tracking for better crawl scheduling
- Remove SkipIdenticalContent flag, simplify content deduplication logic
- Update database schema with last_crawled column and indexes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-18 11:23:56 +03:00
antanst
4e225ee866 Fix infinite recrawl loop with skip-identical-content
Add last_crawled timestamp tracking to fix fetchSnapshotsFromHistory()
infinite loop when SkipIdenticalContent=true. Now tracks actual crawl
attempts separately from content changes via database DEFAULT timestamps.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-17 10:41:17 +03:00
antanst
f9024d15aa Refine content deduplication and improve configuration 2025-06-16 17:09:26 +03:00
antanst
330b596497 Enhance crawler with seed list and SQL utilities
Add seedList module for URL initialization, comprehensive SQL utilities for database analysis, and update project configuration.
2025-06-16 12:29:33 +03:00
51f94c90b2 Update documentation and project configuration
- Add architecture documentation for versioned snapshots
- Update Makefile with improved build commands
- Update dependency versions in go.mod
- Add project notes and development guidelines
- Improve README with new features and instructions
2025-05-22 13:26:11 +03:00
bfaa857fae Update and refactor core functionality
- Update common package utilities
- Refactor network code for better error handling
- Remove deprecated files and functionality
- Enhance blacklist and filtering capabilities
- Improve snapshot handling and processing
2025-05-22 12:47:01 +03:00
5cc82f2c75 Modernize host pool management
- Add context-aware host pool operations
- Implement rate limiting for host connections
- Improve concurrency handling with mutexes
- Add host connection tracking
2025-05-22 12:46:42 +03:00
eca54b2f68 Implement context-aware database operations
- Add context support to database operations
- Implement versioned snapshots for URL history
- Update database queries to support URL timestamps
- Improve transaction handling with context
- Add utility functions for snapshot history
2025-05-22 12:46:36 +03:00
7d27e5a123 Add whitelist functionality
- Implement whitelist package for filtering URLs
- Support pattern matching for allowed URLs
- Add URL validation against whitelist patterns
- Include test cases for whitelist functionality
2025-05-22 12:46:28 +03:00
8a9ca0b2e7 Add robots.txt parsing and matching functionality
- Create separate robotsMatch package for robots.txt handling
- Implement robots.txt parsing with support for different directives
- Add support for both Allow and Disallow patterns
- Include robots.txt matching with efficient pattern matching
- Add test cases for robots matching
2025-05-22 12:46:21 +03:00
5940a117fd Add context-aware network operations
- Implement context-aware versions of network operations
- Add request cancellation support throughout network code
- Use structured logging with context metadata
- Support timeout management with contexts
- Improve error handling with detailed logging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:45:58 +03:00
d1c326f868 Improve error handling with xerrors package
- Replace custom error handling with xerrors package
- Enhance error descriptions for better debugging
- Add text utilities for string processing
- Update error tests to use standard errors package
- Add String() method to GeminiError

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:45:46 +03:00
a55f820f62 Implement structured logging with slog
- Replace zerolog with Go's standard slog package
- Add ColorHandler for terminal color output
- Add context-aware logging system
- Format attributes on the same line as log messages
- Use green color for INFO level logs
- Set up context value extraction helpers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:44:08 +03:00
ad224a328e Change errors to use xerrors package. 2025-05-12 20:37:58 +03:00
a823f5abc3 Fix Makefile. 2025-03-10 16:54:06 +02:00
658c5f5471 Fix linter warnings in gemini/network.go
Remove redundant nil checks before len() operations as len() for nil slices is defined as zero in Go.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-03-10 11:34:29 +02:00
efaedcc6b2 Improvements in error handling & descriptions 2025-02-27 09:20:22 +02:00
73 changed files with 3589 additions and 1673 deletions

14
.gitignore vendored

@@ -1,7 +1,9 @@
 **/.#*
 **/*~
+**/.DS_Store
 /.idea
 /.goroot
+/dist/**
 /blacklist.txt
 /check.sh
 /debug.sh
@@ -13,5 +15,15 @@
 run*.sh
 /main
 /db/migration*/**
-/db/populate/**
+/cmd/populate/**
 /db/sql/**
+**/.claude/settings.local.json
+/crawl.sh
+/crawler.sh
+/get.sh
+/snapshot_history.sh
+/whitelist.txt
+/CLAUDE.md

169
ARCHITECTURE.md Normal file

@@ -0,0 +1,169 @@
# gemini-grc Architectural Notes
## 20250513 - Versioned Snapshots
The crawler now supports saving multiple versions of the same URL over time, similar to the Internet Archive's Wayback Machine. This document outlines the architecture and changes made to support this feature.
### Database Schema Changes
The following changes to the database schema are required:
```sql
-- Remove UNIQUE constraint from url in snapshots table
ALTER TABLE snapshots DROP CONSTRAINT unique_url;
-- Create a composite primary key using url and timestamp
CREATE UNIQUE INDEX idx_url_timestamp ON snapshots (url, timestamp);
-- Add a new index to efficiently find the latest snapshot
CREATE INDEX idx_url_latest ON snapshots (url, timestamp DESC);
```
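With the unique constraint removed, "the latest version" becomes a query-time concept. A query in this spirit (a sketch against the schema above) is served by the `idx_url_latest` index:
```sql
-- Latest snapshot for a single URL; uses idx_url_latest.
SELECT *
FROM snapshots
WHERE url = 'gemini://example.com/'
ORDER BY timestamp DESC
LIMIT 1;
```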
## Error handling
- The `xerrors` library is used for error creation and wrapping (see the sketch below).
- The `Fatal` field is not used; we _always_ panic on fatal errors.
- _All_ internal functions _must_ return `xerrors` errors.
- _All_ external errors are wrapped in `xerrors` errors.
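A minimal sketch of the wrapping rule, reusing the `xerrors.NewError(err, code, userMsg, isFatal)` call shape that appears in this changeset's blacklist loader; the function and file in this example are illustrative, not part of the codebase:
```go
package example

import (
	"fmt"
	"os"

	"git.antanst.com/antanst/xerrors"
)

// readList wraps the external os.ReadFile error into an xerrors
// error before it crosses a package boundary, per the rules above.
func readList(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		// code 0, no user-facing message, fatal=true
		return nil, xerrors.NewError(fmt.Errorf("could not read %s: %w", path, err), 0, "", true)
	}
	return data, nil
}
```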
### Code Changes
1. **Updated SQL Queries**:
- Changed queries to insert new snapshots without conflict handling
- Added queries to retrieve snapshots by timestamp
- Added queries to retrieve all snapshots for a URL
- Added queries to retrieve snapshots in a date range
2. **Context-Aware Database Methods**:
- `SaveSnapshot`: Saves a new snapshot with the current timestamp using a context
- `GetLatestSnapshot`: Retrieves the most recent snapshot for a URL using a context
- `GetSnapshotAtTimestamp`: Retrieves the nearest snapshot at or before a given timestamp using a context
- `GetAllSnapshotsForURL`: Retrieves all snapshots for a URL using a context
- `GetSnapshotsByDateRange`: Retrieves snapshots within a date range using a context
3. **Backward Compatibility**:
- The `OverwriteSnapshot` method has been maintained for backward compatibility
   - It now delegates to `SaveSnapshot`, effectively creating a new version instead of overwriting (sketched below)
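Given that delegation, the compatibility shim is essentially a one-liner; a sketch (the method names come from this document, the receiver type is an assumption):
```go
// OverwriteSnapshot is kept for callers that predate versioned
// snapshots. It no longer overwrites anything: it delegates to
// SaveSnapshot, so every call appends a new version instead.
func (db *DB) OverwriteSnapshot(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error {
	return db.SaveSnapshot(ctx, tx, s)
}
```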
### Utility Scripts
A new utility script `snapshot_history.sh` has been created to demonstrate the versioned snapshot functionality:
- Retrieve the latest snapshot for a URL
- Retrieve a snapshot at a specific point in time
- Retrieve all snapshots for a URL
- Retrieve snapshots within a date range
### Usage Examples
```bash
# Get the latest snapshot
./snapshot_history.sh -u gemini://example.com/
# Get a snapshot from a specific point in time
./snapshot_history.sh -u gemini://example.com/ -t 2023-05-01T12:00:00Z
# Get all snapshots for a URL
./snapshot_history.sh -u gemini://example.com/ -a
# Get snapshots in a date range
./snapshot_history.sh -u gemini://example.com/ -r 2023-01-01T00:00:00Z 2023-12-31T23:59:59Z
```
### API Usage Examples
```go
// Independent usage fragments; error handling is abbreviated.
ctx := context.Background()

// Save a new snapshot
snap, _ := snapshot.SnapshotFromURL("gemini://example.com", true)
tx, _ := Database.NewTx(ctx)
err := Database.SaveSnapshot(ctx, tx, snap)
tx.Commit()

// Get the latest snapshot
tx, _ = Database.NewTx(ctx)
latestSnapshot, err := Database.GetLatestSnapshot(ctx, tx, "gemini://example.com")
tx.Commit()

// Get a snapshot at a specific time
timestamp := time.Date(2023, 5, 1, 12, 0, 0, 0, time.UTC)
tx, _ = Database.NewTx(ctx)
historicalSnapshot, err := Database.GetSnapshotAtTimestamp(ctx, tx, "gemini://example.com", timestamp)
tx.Commit()

// Get all snapshots for a URL
tx, _ = Database.NewTx(ctx)
allSnapshots, err := Database.GetAllSnapshotsForURL(ctx, tx, "gemini://example.com")
tx.Commit()

// Use a timeout context to bound database operations
timeoutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
tx, _ = Database.NewTx(timeoutCtx)
latestSnapshot, err = Database.GetLatestSnapshot(timeoutCtx, tx, "gemini://example.com")
tx.Commit()
```
### Content Deduplication Strategy
The crawler implements a content deduplication strategy that balances storage efficiency against comprehensive historical tracking:
#### `--skip-identical-content` Flag Behavior
**When `--skip-identical-content=true` (default)**:
- All content types are checked for duplicates before storing
- Identical content is skipped entirely to save storage space
- Only changed content results in new snapshots
- Applies to both Gemini and non-Gemini content uniformly
**When `--skip-identical-content=false`**:
- **Gemini content (`text/gemini` MIME type)**: Full historical tracking - every crawl creates a new snapshot regardless of content changes
- **Non-Gemini content**: Still deduplicated - identical content is skipped even when flag is false
- Enables comprehensive version history for Gemini capsules while avoiding unnecessary storage of duplicate static assets
#### Implementation Details
The deduplication logic is implemented in the `shouldSkipIdenticalSnapshot()` function in `common/worker.go`:
1. **Primary Check**: When `--skip-identical-content=true`, all content is checked for duplicates
2. **MIME-Type Specific Check**: When the flag is false, only non-`text/gemini` content is checked for duplicates
3. **Content Comparison**: Uses `IsContentIdentical()` which compares either GemText fields or binary Data fields
4. **Dual Safety Checks**: Content is checked both in the worker layer and database layer for robustness
This approach ensures that Gemini capsules get complete version history when desired, while preventing storage bloat from duplicate images, binaries, and other static content.
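A sketch of that decision tree; the function name, `IsContentIdentical()`, and the flag come from this document, while the exact signature and the `MimeType` field access are assumptions:
```go
// shouldSkipIdenticalSnapshot reports whether a freshly fetched
// snapshot should be dropped because the stored one is identical.
// Sketch only; the real logic lives in common/worker.go.
func shouldSkipIdenticalSnapshot(fresh, stored *snapshot.Snapshot) bool {
	if config.CONFIG.SkipIdenticalContent {
		// Flag on: deduplicate every content type.
		return IsContentIdentical(fresh, stored)
	}
	// Flag off: Gemini content keeps full version history...
	if fresh.MimeType.String == "text/gemini" {
		return false
	}
	// ...while non-Gemini content is still deduplicated.
	return IsContentIdentical(fresh, stored)
}
```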
### Time-based Crawl Frequency Control
The crawler can be configured to skip re-crawling URLs that have been updated recently, using the `--skip-if-updated-days=N` parameter (see the fragment after this list):
* When set to a positive integer N, URLs that have a snapshot newer than N days ago will not be added to the crawl queue, even if they're found as links in other pages.
* This feature helps control crawl frequency, ensuring that resources aren't wasted on frequently checking content that rarely changes.
* Setting `--skip-if-updated-days=0` disables this feature, meaning all discovered URLs will be queued for crawling regardless of when they were last updated.
* Default value is 60 days.
* For example, `--skip-if-updated-days=7` will skip re-crawling any URL that has been crawled within the last week.
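In code, the feature reduces to a cutoff date computed from the flag; this fragment mirrors the pattern in `fetchSnapshotsFromHistory()`, shown later in this diff:
```go
// URLs whose latest crawl attempt predates the cutoff become
// eligible for re-crawling again.
cutoffDate := time.Now().AddDate(0, 0, -config.CONFIG.SkipIfUpdatedDays)
```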
### Worker Pool Architecture
The crawler uses a worker pool with backpressure control (see the condensed sketch after this list):
* **Buffered Channel**: Job queue size equals the number of workers (`NumOfWorkers`)
* **Self-Regulating**: Channel backpressure naturally rate-limits the scheduler
* **Context-Aware**: Each URL gets its own context with timeout (default 120s)
* **Transaction Per Job**: Each worker operates within its own database transaction
* **SafeRollback**: Uses `gemdb.SafeRollback()` for graceful transaction cleanup on errors
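The backpressure is nothing more than a buffered channel sized to the worker count. A self-contained, runnable sketch of the same shape (the constant and the print stand in for `config.CONFIG.NumOfWorkers` and `common.RunWorkerWithTx`; the real wiring is in `cmd/crawler/crawler.go`, shown later in this diff):
```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const numOfWorkers = 4 // stands in for config.CONFIG.NumOfWorkers

	// Queue capacity equals the worker count, so sends block as
	// soon as every worker is busy: that is the backpressure.
	jobs := make(chan string, numOfWorkers)

	var wg sync.WaitGroup
	for id := 0; id < numOfWorkers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for url := range jobs {
				fmt.Printf("worker %d crawling %s\n", id, url)
			}
		}(id)
	}

	// The scheduler parks on this send whenever the pool is
	// saturated, which rate-limits DB polling for free.
	for _, url := range []string{"gemini://a/", "gemini://b/", "gemini://c/"} {
		jobs <- url
	}
	close(jobs)
	wg.Wait()
}
```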
### Database Transaction Patterns
* **Context Separation**: The scheduler uses a long-lived context, while database operations use fresh contexts (condensed in the fragment after this list)
* **Timeout Prevention**: Fresh `dbCtx := context.Background()` prevents scheduler timeouts from affecting DB operations
* **Error Handling**: Distinguishes between context cancellation, fatal errors, and recoverable errors
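Condensed, the separation the bullets describe looks like this fragment (same shape as the scheduler loop later in this diff):
```go
// Long-lived scheduler context: carries logging metadata and is
// the one a shutdown would cancel.
ctx := contextutil.ContextWithComponent(context.Background(), "crawler")

// Fresh context per transaction: database work is not torn down
// by scheduler-level timeouts or cancellation.
dbCtx := context.Background()
tx, err := gemdb.Database.NewTx(dbCtx)
```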
### Future Improvements
1. Add a web interface to browse snapshot history
2. Implement comparison features to highlight changes between snapshots
3. Add metadata to track crawl batches
4. Implement retention policies to manage storage

Makefile

@@ -1,10 +1,10 @@
-SHELL := /bin/env oksh
+SHELL := /bin/sh
 export PATH := $(PATH)

-all: fmt lintfix tidy test clean build
+all: fmt lintfix vet tidy test clean build

 clean:
-	rm -f ./gemini-grc
+	mkdir -p ./dist && rm -rf ./dist/*

 debug:
 	@echo "PATH: $(PATH)"
@@ -30,12 +30,17 @@ fmt:
 lint: fmt
 	golangci-lint run

+vet: fmt
+	go vet ./.../

 # Run linter and fix
 lintfix: fmt
 	golangci-lint run --fix

 build:
-	go build -race -o gemini-grc ./main.go
+	CGO_ENABLED=0 go build -o ./dist/get ./cmd/get/get.go
+	CGO_ENABLED=0 go build -o ./dist/crawl ./cmd/crawl/crawl.go
+	CGO_ENABLED=0 go build -o ./dist/crawler ./cmd/crawler/crawler.go

 show-updates:
 	go list -m -u all

13
NOTES.md Normal file

@@ -0,0 +1,13 @@
# Notes
Avoiding endless loops while crawling
- Make sure we follow robots.txt
- Announce our own agent so people can block us in their robots.txt
- Put a limit on the number of pages per host, and notify when the limit is reached.
- Put a limit on the number of redirects (not needed?)
Heuristics:
- Do _not_ parse links from pages that have '/git/' or '/cgi/' or '/cgi-bin/' in their URLs.
- Have a list of "whitelisted" hosts/urls that we visit in regular intervals.

README.md

@@ -8,6 +8,7 @@ Easily extendable as a "wayback machine" of Gemini.
 - [x] Concurrent downloading with configurable number of workers
 - [x] Connection limit per host
 - [x] URL Blacklist
+- [x] URL Whitelist (overrides blacklist and robots.txt)
 - [x] Follow robots.txt, see gemini://geminiprotocol.net/docs/companion/robots.gmi
 - [x] Configuration via environment variables
 - [x] Storing capsule snapshots in PostgreSQL
@@ -16,6 +17,9 @@ Easily extendable as a "wayback machine" of Gemini.
 - [x] Handle redirects (3X status codes)
 - [x] Crawl Gopher holes

+## Security Note
+This crawler uses `InsecureSkipVerify: true` in its TLS configuration to accept all certificates. This is a common approach for crawlers, but it leaves the application vulnerable to MITM attacks. The trade-off is made so that the self-signed certificates widely used in the Gemini ecosystem can still be crawled.
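For context, that setting amounts to a Go fragment along these lines; only `InsecureSkipVerify: true` is confirmed by this diff, the rest is illustrative:
```go
cfg := &tls.Config{
	// Accept self-signed certificates, the norm in Geminispace.
	// This deliberately trades MITM resistance for crawl coverage.
	InsecureSkipVerify: true, //nolint:gosec
}
conn, err := tls.Dial("tcp", "example.com:1965", cfg)
```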
 ## How to run

 Spin up a PostgreSQL, check `db/sql/initdb.sql` to create the tables and start the crawler.
@@ -30,11 +34,12 @@ Bool can be `true`,`false` or `0`,`1`.
 MaxResponseSize int // Maximum size of response in bytes
 NumOfWorkers int // Number of concurrent workers
 ResponseTimeout int // Timeout for responses in seconds
-WorkerBatchSize int // Batch size for worker processing
 PanicOnUnexpectedError bool // Panic on unexpected errors when visiting a URL
 BlacklistPath string // File that has blacklisted strings of "host:port"
+WhitelistPath string // File with URLs that should always be crawled regardless of blacklist or robots.txt
 DryRun bool // If false, don't write to disk
-PrintWorkerStatus bool // If false, print logs and not worker status table
+SkipIdenticalContent bool // When true, skip storing snapshots with identical content
+SkipIfUpdatedDays int // Skip re-crawling URLs updated within this many days (0 to disable)
 ```

 Example:
@@ -42,8 +47,8 @@ Example:
 ```shell
 LOG_LEVEL=info \
 NUM_OF_WORKERS=10 \
-WORKER_BATCH_SIZE=10 \
 BLACKLIST_PATH="./blacklist.txt" \ # one url per line, can be empty
+WHITELIST_PATH="./whitelist.txt" \ # URLs that override blacklist and robots.txt
 MAX_RESPONSE_SIZE=10485760 \
 RESPONSE_TIMEOUT=10 \
 PANIC_ON_UNEXPECTED_ERROR=true \
@@ -54,6 +59,8 @@ PG_PORT=5434 \
 PG_USER=test \
 PG_PASSWORD=test \
 DRY_RUN=false \
+SKIP_IDENTICAL_CONTENT=false \
+SKIP_IF_UPDATED_DAYS=7 \
 ./gemini-grc
 ```
@@ -65,8 +72,30 @@ go install mvdan.cc/gofumpt@v0.7.0
 go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.63.4
 ```

+## Snapshot History
+The crawler now supports versioned snapshots, storing multiple snapshots of the same URL over time. This allows you to view how content changes over time, similar to the Internet Archive's Wayback Machine.
+### Accessing Snapshot History
+You can access the snapshot history using the included `snapshot_history.sh` script:
+```bash
+# Get the latest snapshot
+./snapshot_history.sh -u gemini://example.com/
+# Get a snapshot from a specific point in time
+./snapshot_history.sh -u gemini://example.com/ -t 2023-05-01T12:00:00Z
+# Get all snapshots for a URL
+./snapshot_history.sh -u gemini://example.com/ -a
+# Get snapshots in a date range
+./snapshot_history.sh -u gemini://example.com/ -r 2023-01-01T00:00:00Z 2023-12-31T23:59:59Z
+```

 ## TODO
-- [ ] Add snapshot history
+- [x] Add snapshot history
 - [ ] Add a web interface
 - [ ] Provide to servers a TLS cert for sites that require it, like Astrobotany
 - [ ] Use pledge/unveil in OpenBSD hosts

38
TODO.md Normal file

@@ -0,0 +1,38 @@
# TODO
## Outstanding Issues
### 1. Ctrl+C Signal Handling Issue
**Problem**: The crawler sometimes doesn't exit properly when Ctrl+C is pressed.
**Root Cause**: The main thread gets stuck in blocking operations before it can check for signals:
- Database operations in the polling loop (`cmd/crawler/crawler.go:239-250`)
- Job queueing when channel is full (`jobs <- url` can block if workers are slow)
- Long-running database transactions
**Location**: `cmd/crawler/crawler.go` - main polling loop starting at line 233
**Solution**: Add signal/context checking to blocking operations (see the sketch after this list):
- Use cancellable context instead of `context.Background()` for database operations
- Make job queueing non-blocking or context-aware
- Add timeout/cancellation to database operations
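A sketch of the context-aware queueing fix; the channel and URL names match the crawler code later in this diff, while the helper itself is hypothetical:
```go
import "context"

// enqueue blocks until a worker frees up, but also returns early
// if the scheduler's context is cancelled (e.g. after SIGINT).
func enqueue(ctx context.Context, jobs chan<- string, url string) error {
	select {
	case jobs <- url:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```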
### 2. fetchSnapshotsFromHistory() Doesn't Work with --skip-identical-content=true
**Problem**: When `--skip-identical-content=true` (default), URLs with unchanged content get continuously re-queued.
**Root Cause**: The function tracks when content last changed, not when URLs were last crawled:
- Identical content → no new snapshot created
- Query finds old snapshot timestamp → re-queues URL
- Creates infinite loop of re-crawling unchanged content
**Location**: `cmd/crawler/crawler.go:388-470` - `fetchSnapshotsFromHistory()` function
**Solution Options**:
1. Add `last_crawled` timestamp to URLs table
2. Create separate `crawl_attempts` table
3. Always create snapshot entries (even for duplicates) but mark them as such
4. Modify logic to work with existing schema constraints
**Current Status**: Function assumes `SkipIdenticalContent=false` per original comment at line 391.
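Option 1 is the direction the newest commits in this range describe ("Add last_crawled timestamp tracking ... via database DEFAULT timestamps"); a sketch of what that schema change could look like, with the table, default, and index names as assumptions:
```sql
-- Track crawl attempts independently of content changes.
ALTER TABLE snapshots
    ADD COLUMN last_crawled TIMESTAMP NOT NULL DEFAULT now();

CREATE INDEX idx_snapshots_last_crawled
    ON snapshots (url, last_crawled DESC);
```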


@@ -1,47 +0,0 @@
package main
import (
"encoding/json"
"fmt"
"os"
"gemini-grc/common/snapshot"
_url "gemini-grc/common/url"
"gemini-grc/config"
"gemini-grc/gemini"
"gemini-grc/gopher"
"gemini-grc/logging"
"github.com/antanst/go_errors"
)
func main() {
config.CONFIG = *config.GetConfig()
err := runApp()
if err != nil {
fmt.Printf("%v\n", err)
logging.LogError("%v", err)
os.Exit(1)
}
}
func runApp() error {
if len(os.Args) != 2 {
return go_errors.NewError(fmt.Errorf("missing URL to visit"))
}
url := os.Args[1]
var s *snapshot.Snapshot
var err error
if _url.IsGeminiUrl(url) {
s, err = gemini.Visit(url)
} else if _url.IsGopherURL(url) {
s, err = gopher.Visit(url)
} else {
return go_errors.NewFatalError(fmt.Errorf("not a Gemini or Gopher URL"))
}
if err != nil {
return err
}
_json, _ := json.MarshalIndent(s, "", " ")
fmt.Printf("%s\n", _json)
return err
}


@@ -1,118 +0,0 @@
package main
import (
"fmt"
"os"
"gemini-grc/common/snapshot"
"gemini-grc/common/url"
main2 "gemini-grc/db"
_ "github.com/jackc/pgx/v5/stdlib" // PGX driver for PostgreSQL
"github.com/jmoiron/sqlx"
)
// Populates the `host` field
func main() {
db := connectToDB()
count := 0
for {
tx := db.MustBegin()
query := `
SELECT * FROM snapshots
ORDER BY id
LIMIT 10000 OFFSET $1
`
var snapshots []snapshot.Snapshot
err := tx.Select(&snapshots, query, count)
if err != nil {
printErrorAndExit(tx, err)
}
if len(snapshots) == 0 {
fmt.Println("Done!")
return
}
for _, s := range snapshots {
count++
escaped := url.EscapeURL(s.URL.String())
normalizedGeminiURL, err := url.ParseURL(escaped, "", true)
if err != nil {
fmt.Println(s.URL.String())
fmt.Println(escaped)
printErrorAndExit(tx, err)
}
normalizedURLString := normalizedGeminiURL.String()
// If URL is already normalized, skip snapshot
if normalizedURLString == s.URL.String() {
// fmt.Printf("[%5d] Skipping %d %s\n", count, s.ID, s.URL.String())
continue
}
// If a snapshot already exists with the normalized
// URL, delete the current snapshot and leave the other.
var ss []snapshot.Snapshot
err = tx.Select(&ss, "SELECT * FROM snapshots WHERE URL=$1", normalizedURLString)
if err != nil {
printErrorAndExit(tx, err)
}
if len(ss) > 0 {
tx.MustExec("DELETE FROM snapshots WHERE id=$1", s.ID)
fmt.Printf("%d Deleting %d %s\n", count, s.ID, s.URL.String())
//err = tx.Commit()
//if err != nil {
// printErrorAndExit(tx, err)
//}
//return
continue
}
// fmt.Printf("%s =>\n%s\n", s.URL.String(), normalizedURLString)
// At this point we just update the snapshot,
// and the normalized URL will be saved.
fmt.Printf("%d Updating %d %s => %s\n", count, s.ID, s.URL.String(), normalizedURLString)
// Saves the snapshot with the normalized URL
tx.MustExec("DELETE FROM snapshots WHERE id=$1", s.ID)
s.URL = *normalizedGeminiURL
err = main2.OverwriteSnapshot(tx, &s)
if err != nil {
printErrorAndExit(tx, err)
}
//err = tx.Commit()
//if err != nil {
// printErrorAndExit(tx, err)
//}
//return
}
err = tx.Commit()
if err != nil {
printErrorAndExit(tx, err)
}
}
}
func printErrorAndExit(tx *sqlx.Tx, err error) {
_ = tx.Rollback()
panic(err)
}
func connectToDB() *sqlx.DB {
connStr := fmt.Sprintf("postgres://%s:%s@%s:%s/%s",
os.Getenv("PG_USER"),
os.Getenv("PG_PASSWORD"),
os.Getenv("PG_HOST"),
os.Getenv("PG_PORT"),
os.Getenv("PG_DATABASE"),
)
// Create a connection pool
db, err := sqlx.Open("pgx", connStr)
if err != nil {
panic(fmt.Sprintf("Unable to connect to database with URL %s: %v\n", connStr, err))
}
db.SetMaxOpenConns(20)
err = db.Ping()
if err != nil {
panic(fmt.Sprintf("Unable to ping database: %v\n", err))
}
fmt.Println("Connected to database")
return db
}

390
cmd/crawler/crawler.go Normal file

@@ -0,0 +1,390 @@
package main
import (
"context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"gemini-grc/common"
"gemini-grc/common/blackList"
"gemini-grc/common/contextlog"
"gemini-grc/common/seedList"
"gemini-grc/common/whiteList"
"gemini-grc/config"
"gemini-grc/contextutil"
gemdb "gemini-grc/db"
"gemini-grc/robotsMatch"
"gemini-grc/util"
"git.antanst.com/antanst/logging"
"github.com/jmoiron/sqlx"
)
var jobs chan string
func main() {
var err error
err = initializeApp()
if err != nil {
handleUnexpectedError(err)
}
err = runApp()
if err != nil {
handleUnexpectedError(err)
}
err = shutdownApp()
if err != nil {
handleUnexpectedError(err)
}
}
func handleUnexpectedError(err error) {
logging.LogError("Unexpected error: %v", err)
_ = shutdownApp()
os.Exit(1)
}
func initializeApp() error {
config.CONFIG = *config.Initialize()
logging.InitSlogger(config.CONFIG.LogLevel)
logging.LogInfo("Starting up. Press Ctrl+C to exit")
common.SignalsChan = make(chan os.Signal, 1)
signal.Notify(common.SignalsChan, syscall.SIGINT, syscall.SIGTERM)
common.FatalErrorsChan = make(chan error)
jobs = make(chan string, config.CONFIG.NumOfWorkers)
var err error
err = blackList.Initialize()
if err != nil {
return err
}
err = whiteList.Initialize()
if err != nil {
return err
}
err = seedList.Initialize()
if err != nil {
return err
}
err = robotsMatch.Initialize()
if err != nil {
return err
}
ctx := context.Background()
err = gemdb.Database.Initialize(ctx)
if err != nil {
return err
}
if config.CONFIG.SeedUrlPath != "" {
err := AddURLsFromFile(ctx, config.CONFIG.SeedUrlPath)
if err != nil {
return err
}
}
return nil
}
func shutdownApp() error {
var err error
err = blackList.Shutdown()
if err != nil {
return err
}
err = whiteList.Shutdown()
if err != nil {
return err
}
err = seedList.Shutdown()
if err != nil {
return err
}
err = robotsMatch.Shutdown()
if err != nil {
return err
}
ctx := context.Background()
err = gemdb.Database.Shutdown(ctx)
if err != nil {
return err
}
return nil
}
func runApp() (err error) {
go spawnWorkers(config.CONFIG.NumOfWorkers)
go runJobScheduler()
for {
select {
case <-common.SignalsChan:
logging.LogWarn("Received SIGINT or SIGTERM signal, exiting")
return nil
case err := <-common.FatalErrorsChan:
return err
}
}
}
func spawnWorkers(total int) {
for id := 0; id < total; id++ {
go func(id int) {
for {
job := <-jobs
common.RunWorkerWithTx(id, job)
}
}(id)
}
}
// Current Logic Flow:
//
// 1. Create transaction
// 2. Get distinct hosts
// 3. If no hosts → fetch snapshots from history (adds URLs to queue)
// 4. Re-query for hosts (should now have some)
// 5. Get URLs from hosts
// 6. Commit transaction
// 7. Queue URLs for workers
func runJobScheduler() {
var tx *sqlx.Tx
var err error
ctx := contextutil.ContextWithComponent(context.Background(), "crawler")
tx, err = gemdb.Database.NewTx(ctx)
if err != nil {
common.FatalErrorsChan <- err
return
}
defer func(tx *sqlx.Tx) {
if tx != nil {
if err := gemdb.SafeRollback(ctx, tx); err != nil {
common.FatalErrorsChan <- err
}
}
}(tx)
// First, check if the URLs table is empty.
var urlCount int
if config.CONFIG.GopherEnable {
err = tx.Get(&urlCount, "SELECT COUNT(*) FROM urls")
} else {
err = tx.Get(&urlCount, "SELECT COUNT(*) FROM urls WHERE url LIKE 'gemini://%'")
}
if err != nil {
common.FatalErrorsChan <- err
return
}
err = tx.Commit()
if err != nil {
common.FatalErrorsChan <- err
return
}
// If no pending URLs, add the ones from the standard crawl set.
tx, err = gemdb.Database.NewTx(ctx)
if err != nil {
common.FatalErrorsChan <- err
return
}
if urlCount == 0 {
logging.LogInfo("URLs table is empty, enqueueing standard crawl set")
err = enqueueSeedURLs(ctx, tx)
if err != nil {
common.FatalErrorsChan <- err
return
}
// Commit this tx here so the loop sees the changes.
err := tx.Commit()
if err != nil {
common.FatalErrorsChan <- err
return
}
} else {
contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "Found %d pending URLs to crawl.", urlCount)
}
// Main job loop.
// We get URLs from the pending URLs table,
// add crawling jobs for those,
// and sleep a bit after each run.
for {
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Polling DB for jobs")
// Use fresh context for DB operations to avoid timeouts/cancellation
// from the long-lived scheduler context affecting database transactions
dbCtx := context.Background()
tx, err = gemdb.Database.NewTx(dbCtx)
if err != nil {
common.FatalErrorsChan <- err
return
}
// Get all distinct hosts from pending URLs
distinctHosts, err := gemdb.Database.GetUrlHosts(dbCtx, tx)
if err != nil {
common.FatalErrorsChan <- err
return
}
// When out of pending URLs, add some random ones.
if len(distinctHosts) == 0 {
// Queue random old URLs from history.
count, err := fetchSnapshotsFromHistory(dbCtx, tx, config.CONFIG.NumOfWorkers*3, config.CONFIG.SkipIfUpdatedDays)
if err != nil {
common.FatalErrorsChan <- err
return
}
if count == 0 {
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "No work, waiting to poll DB...")
time.Sleep(30 * time.Second)
continue
}
distinctHosts, err = gemdb.Database.GetUrlHosts(dbCtx, tx)
if err != nil {
common.FatalErrorsChan <- err
return
}
}
// Get some URLs from each host, up to a limit
urls, err := gemdb.Database.GetRandomUrlsFromHosts(dbCtx, distinctHosts, 10, tx)
if err != nil {
common.FatalErrorsChan <- err
return
}
err = tx.Commit()
if err != nil {
common.FatalErrorsChan <- err
return
}
if len(urls) == 0 {
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "No work, waiting to poll DB...")
time.Sleep(30 * time.Second)
continue
}
contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "Queueing %d distinct hosts -> %d urls to crawl", len(distinctHosts), len(urls))
for _, url := range urls {
jobs <- url
}
}
}
func enqueueSeedURLs(ctx context.Context, tx *sqlx.Tx) error {
// Get seed URLs from seedList module
urls := seedList.GetSeedURLs()
for _, url := range urls {
err := gemdb.Database.InsertURL(ctx, tx, url)
if err != nil {
return err
}
}
return nil
}
func fetchSnapshotsFromHistory(ctx context.Context, tx *sqlx.Tx, num int, age int) (int, error) {
// Select <num> snapshots from snapshots table for recrawling
// Find URLs where the LATEST crawl attempt (via last_crawled) is at least <age> days old
// Uses last_crawled timestamp to track actual crawl attempts regardless of content changes
historyCtx := contextutil.ContextWithComponent(context.Background(), "fetchSnapshotsFromHistory")
contextlog.LogDebugWithContext(historyCtx, logging.GetSlogger(), "Looking for %d URLs whose latest crawl attempt is at least %d days old to recrawl", num, age)
// Calculate the cutoff date
cutoffDate := time.Now().AddDate(0, 0, -age)
// Use the query from db_queries.go to find URLs that need re-crawling
type SnapshotURL struct {
URL string `db:"url"`
Host string `db:"host"`
}
// Execute the query
var snapshotURLs []SnapshotURL
err := tx.Select(&snapshotURLs, gemdb.SQL_FETCH_SNAPSHOTS_FROM_HISTORY, cutoffDate, num)
if err != nil {
return 0, err
}
if len(snapshotURLs) == 0 {
contextlog.LogInfoWithContext(historyCtx, logging.GetSlogger(), "No URLs with old latest crawl attempts found to recrawl")
return 0, nil
}
// For each selected snapshot, add the URL to the urls table
insertCount := 0
for _, snapshot := range snapshotURLs {
err := gemdb.Database.InsertURL(ctx, tx, snapshot.URL)
if err != nil {
logging.LogError("Error inserting URL %s from old snapshot to queue: %v", snapshot.URL, err)
return 0, err
}
insertCount++
}
// Note: The transaction is committed by the caller (runJobScheduler),
// not here. This function is called as part of a larger transaction.
if insertCount > 0 {
contextlog.LogInfoWithContext(historyCtx, logging.GetSlogger(), "Added %d old URLs to recrawl queue", insertCount)
}
return insertCount, nil
}
func AddURLsFromFile(ctx context.Context, filepath string) error {
data, err := os.ReadFile(filepath)
if err != nil {
return err
}
lines := strings.Split(string(data), "\n")
urls := util.Filter(lines, func(url string) bool {
return strings.TrimSpace(url) != ""
})
// Create a context for database operations
tx, err := gemdb.Database.NewTx(ctx)
if err != nil {
return err
}
// Insert all the URLs
for _, url := range urls {
fileCtx := contextutil.ContextWithComponent(context.Background(), "AddURLsFromFile")
contextlog.LogInfoWithContext(fileCtx, logging.GetSlogger(), "Adding %s to queue", url)
err := gemdb.Database.InsertURL(ctx, tx, url)
if err != nil {
return err
}
}
err = tx.Commit()
if err != nil {
return err
}
return nil
}


@@ -7,24 +7,38 @@ import (
 	"strings"

 	"gemini-grc/config"
-	"gemini-grc/logging"
-	"github.com/antanst/go_errors"
+	"git.antanst.com/antanst/logging"
+	"git.antanst.com/antanst/xerrors"
 )

-var Blacklist []regexp.Regexp //nolint:gochecknoglobals
+var blacklist []regexp.Regexp //nolint:gochecknoglobals

-func LoadBlacklist() error {
-	if config.CONFIG.BlacklistPath == "" {
-		return nil
-	}
-	if Blacklist == nil {
-		data, err := os.ReadFile(config.CONFIG.BlacklistPath)
-		if err != nil {
-			Blacklist = []regexp.Regexp{}
-			return go_errors.NewError(fmt.Errorf("could not load Blacklist file: %w", err))
-		}
-		lines := strings.Split(string(data), "\n")
+func Initialize() error {
+	var err error
+	// Initialize blacklist
+	if config.CONFIG.BlacklistPath != "" {
+		if err = loadBlacklist(config.CONFIG.BlacklistPath); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+func loadBlacklist(filePath string) error {
+	if blacklist != nil {
+		return nil
+	}
+	data, err := os.ReadFile(filePath)
+	if err != nil {
+		blacklist = []regexp.Regexp{}
+		return xerrors.NewError(fmt.Errorf("could not load blacklist file: %w", err), 0, "", true)
+	}
+	lines := strings.Split(string(data), "\n")
+	blacklist = []regexp.Regexp{}
 	for _, line := range lines {
 		if line == "" || strings.HasPrefix(line, "#") {
@@ -32,21 +46,25 @@ func LoadBlacklist() error {
 		}
 		regex, err := regexp.Compile(line)
 		if err != nil {
-			return go_errors.NewError(fmt.Errorf("could not compile Blacklist line %s: %w", line, err))
+			return xerrors.NewError(fmt.Errorf("could not compile blacklist line %s: %w", line, err), 0, "", true)
 		}
-		Blacklist = append(Blacklist, *regex)
+		blacklist = append(blacklist, *regex)
 	}
-	if len(lines) > 0 {
-		logging.LogInfo("Loaded %d blacklist entries", len(Blacklist))
-	}
-	}
+	if len(blacklist) > 0 {
+		logging.LogInfo("Loaded %d blacklist entries", len(blacklist))
+	}
 	return nil
 }

+func Shutdown() error {
+	return nil
+}
+
+// IsBlacklisted checks if the URL matches any blacklist pattern
 func IsBlacklisted(u string) bool {
-	for _, v := range Blacklist {
+	for _, v := range blacklist {
 		if v.MatchString(u) {
 			return true
 		}

@@ -3,16 +3,17 @@ package blackList
 import (
 	"os"
 	"regexp"
+	"strings"
 	"testing"

 	"gemini-grc/config"
 )

 func TestIsBlacklisted(t *testing.T) {
-	// Save original blacklist to restore after test
-	originalBlacklist := Blacklist
+	// Save original blacklist and whitelist to restore after test
+	originalBlacklist := blacklist
 	defer func() {
-		Blacklist = originalBlacklist
+		blacklist = originalBlacklist
 	}()

 	tests := []struct {
@@ -24,7 +25,7 @@ func TestIsBlacklisted(t *testing.T) {
 		{
 			name: "empty blacklist",
 			setup: func() {
-				Blacklist = []regexp.Regexp{}
+				blacklist = []regexp.Regexp{}
 			},
 			url: "https://example.com",
 			expected: false,
@@ -33,7 +34,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "exact hostname match",
 			setup: func() {
 				regex, _ := regexp.Compile(`example\.com`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "example.com",
 			expected: true,
@@ -42,7 +43,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "hostname in URL match",
 			setup: func() {
 				regex, _ := regexp.Compile(`example\.com`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://example.com/path",
 			expected: true,
@@ -51,7 +52,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "partial hostname match",
 			setup: func() {
 				regex, _ := regexp.Compile(`example\.com`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://safe-example.com",
 			expected: true,
@@ -60,7 +61,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "full URL match",
 			setup: func() {
 				regex, _ := regexp.Compile(`https://example\.com/bad-path`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://example.com/bad-path",
 			expected: true,
@@ -69,7 +70,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "path match",
 			setup: func() {
 				regex, _ := regexp.Compile("/malicious-path")
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://example.com/malicious-path",
 			expected: true,
@@ -78,7 +79,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "subdomain match with word boundary",
 			setup: func() {
 				regex, _ := regexp.Compile(`bad\.example\.com`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://bad.example.com/path",
 			expected: true,
@@ -89,7 +90,7 @@ func TestIsBlacklisted(t *testing.T) {
 				regex1, _ := regexp.Compile(`badsite\.com`)
 				regex2, _ := regexp.Compile(`malicious\.org`)
 				regex3, _ := regexp.Compile(`example\.com/sensitive`)
-				Blacklist = []regexp.Regexp{*regex1, *regex2, *regex3}
+				blacklist = []regexp.Regexp{*regex1, *regex2, *regex3}
 			},
 			url: "https://example.com/sensitive/data",
 			expected: true,
@@ -100,7 +101,7 @@ func TestIsBlacklisted(t *testing.T) {
 				regex1, _ := regexp.Compile(`badsite\.com`)
 				regex2, _ := regexp.Compile(`malicious\.org`)
 				regex3, _ := regexp.Compile(`example\.com/sensitive`)
-				Blacklist = []regexp.Regexp{*regex1, *regex2, *regex3}
+				blacklist = []regexp.Regexp{*regex1, *regex2, *regex3}
 			},
 			url: "https://example.com/safe/data",
 			expected: false,
@@ -109,7 +110,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "pattern with wildcard",
 			setup: func() {
 				regex, _ := regexp.Compile(`.*\.evil\.com`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://subdomain.evil.com/path",
 			expected: true,
@@ -118,7 +119,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "pattern with special characters",
 			setup: func() {
 				regex, _ := regexp.Compile(`example\.com/path\?id=[0-9]+`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://example.com/path?id=12345",
 			expected: true,
@@ -127,7 +128,7 @@ func TestIsBlacklisted(t *testing.T) {
 			name: "unicode character support",
 			setup: func() {
 				regex, _ := regexp.Compile(`example\.com/[\p{L}]+`)
-				Blacklist = []regexp.Regexp{*regex}
+				blacklist = []regexp.Regexp{*regex}
 			},
 			url: "https://example.com/café",
 			expected: true,
@@ -145,12 +146,88 @@ func TestIsBlacklisted(t *testing.T) {
 	}
 }

-func TestLoadBlacklist(t *testing.T) {
-	// Save original blacklist to restore after test
-	originalBlacklist := Blacklist
+// TestBlacklistLoading tests that the blacklist loading logic works with a mock blacklist file
+func TestBlacklistLoading(t *testing.T) {
+	// Save original blacklist and config
+	originalBlacklist := blacklist
 	originalConfigPath := config.CONFIG.BlacklistPath
 	defer func() {
-		Blacklist = originalBlacklist
+		blacklist = originalBlacklist
 		config.CONFIG.BlacklistPath = originalConfigPath
 	}()
// Create a temporary blacklist file with known patterns
tmpFile, err := os.CreateTemp("", "mock-blacklist-*.txt")
if err != nil {
t.Fatalf("Failed to create temporary file: %v", err)
}
defer os.Remove(tmpFile.Name())
// Write some test patterns to the mock blacklist file
mockBlacklistContent := `# Mock blacklist file for testing
/git/
/.git/
/cgit/
gemini://git\..*$
gemini://.*/git/.*
gopher://.*/git/.*
.*/(commit|blob|tree)/.*
.*/[0-9a-f]{7,40}$
`
if err := os.WriteFile(tmpFile.Name(), []byte(mockBlacklistContent), 0o644); err != nil {
t.Fatalf("Failed to write to temporary file: %v", err)
}
// Configure and load the mock blacklist
blacklist = nil
config.CONFIG.BlacklistPath = tmpFile.Name()
err = Initialize()
if err != nil {
t.Fatalf("Failed to load mock blacklist: %v", err)
}
// Count the number of non-comment, non-empty lines to verify loading
lineCount := 0
for _, line := range strings.Split(mockBlacklistContent, "\n") {
if line != "" && !strings.HasPrefix(line, "#") {
lineCount++
}
}
if len(blacklist) != lineCount {
t.Errorf("Expected %d patterns to be loaded, got %d", lineCount, len(blacklist))
}
// Verify some sample URLs against our known patterns
testURLs := []struct {
url string
expected bool
desc string
}{
{"gemini://example.com/git/repo", true, "git repository"},
{"gemini://git.example.com", true, "git subdomain"},
{"gemini://example.com/cgit/repo", true, "cgit repository"},
{"gemini://example.com/repo/commit/abc123", true, "git commit"},
{"gemini://example.com/123abc7", true, "commit hash at path end"},
{"gopher://example.com/1/git/repo", true, "gopher git repository"},
{"gemini://example.com/normal/page.gmi", false, "normal gemini page"},
{"gemini://example.com/project/123abc", false, "hash not at path end"},
}
for _, tt := range testURLs {
result := IsBlacklisted(tt.url)
if result != tt.expected {
t.Errorf("With mock blacklist, IsBlacklisted(%q) = %v, want %v", tt.url, result, tt.expected)
}
}
}
func TestLoadBlacklist(t *testing.T) {
	// Save original blacklist to restore after test
	originalBlacklist := blacklist
	originalConfigPath := config.CONFIG.BlacklistPath
	defer func() {
		blacklist = originalBlacklist
		config.CONFIG.BlacklistPath = originalConfigPath
	}()
@@ -161,7 +238,7 @@ func TestLoadBlacklist(t *testing.T) {
 	}
 	defer os.Remove(tmpFile.Name())

-	// Test cases for LoadBlacklist
+	// Test cases for Initialize
 	tests := []struct {
 		name string
 		blacklistLines []string
@@ -202,7 +279,7 @@ func TestLoadBlacklist(t *testing.T) {
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			// Reset blacklist
-			Blacklist = nil
+			blacklist = nil

 			// Set config path
 			config.CONFIG.BlacklistPath = tt.configPath
@@ -219,29 +296,186 @@ func TestLoadBlacklist(t *testing.T) {
 			}

 			// Call the function
-			err := LoadBlacklist()
+			err := Initialize()

 			// Check results
 			if (err != nil) != tt.wantErr {
-				t.Errorf("LoadBlacklist() error = %v, wantErr %v", err, tt.wantErr)
+				t.Errorf("Initialize() error = %v, wantErr %v", err, tt.wantErr)
 				return
 			}
-			if !tt.wantErr && len(Blacklist) != tt.expectedLen {
-				t.Errorf("LoadBlacklist() loaded %d entries, want %d", len(Blacklist), tt.expectedLen)
+			if !tt.wantErr && len(blacklist) != tt.expectedLen {
+				t.Errorf("Initialize() loaded %d entries, want %d", len(blacklist), tt.expectedLen)
 			}
 		})
 	}
 }
// TestGitPatterns tests the blacklist patterns specifically for Git repositories
func TestGitPatterns(t *testing.T) {
// Save original blacklist to restore after test
originalBlacklist := blacklist
defer func() {
blacklist = originalBlacklist
}()
// Create patterns similar to those in the blacklist.txt file
patterns := []string{
"/git/",
"/.git/",
"/cgit/",
"/gitweb/",
"/gitea/",
"/scm/",
".*/(commit|blob|tree|tag|diff|blame|log|raw)/.*",
".*/(commits|objects|refs|branches|tags)/.*",
".*/[0-9a-f]{7,40}$",
"gemini://git\\..*$",
"gemini://.*/git/.*",
"gemini://.*\\.git/.*",
"gopher://.*/git/.*",
}
// Compile and set up the patterns
blacklist = []regexp.Regexp{}
for _, pattern := range patterns {
regex, err := regexp.Compile(pattern)
if err != nil {
t.Fatalf("Failed to compile pattern %q: %v", pattern, err)
}
blacklist = append(blacklist, *regex)
}
// Test URLs against git-related patterns
tests := []struct {
url string
expected bool
desc string
}{
// Git paths
{"gemini://example.com/git/", true, "basic git path"},
{"gemini://example.com/.git/", true, "hidden git path"},
{"gemini://example.com/cgit/", true, "cgit path"},
{"gemini://example.com/gitweb/", true, "gitweb path"},
{"gemini://example.com/gitea/", true, "gitea path"},
{"gemini://example.com/scm/", true, "scm path"},
// Git operations
{"gemini://example.com/repo/commit/abc123", true, "commit path"},
{"gemini://example.com/repo/blob/main/README.md", true, "blob path"},
{"gemini://example.com/repo/tree/master", true, "tree path"},
{"gemini://example.com/repo/tag/v1.0", true, "tag path"},
// Git internals
{"gemini://example.com/repo/commits/", true, "commits path"},
{"gemini://example.com/repo/objects/", true, "objects path"},
{"gemini://example.com/repo/refs/heads/main", true, "refs path"},
// Git hashes
{"gemini://example.com/commit/a1b2c3d", true, "short hash"},
{"gemini://example.com/commit/a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0", true, "long hash"},
// Git domains
{"gemini://git.example.com/", true, "git subdomain"},
{"gemini://example.com/git/repo", true, "git directory"},
{"gemini://example.com/project.git/", true, "git extension"},
// Gopher protocol
{"gopher://example.com/1/git/repo", true, "gopher git path"},
// Non-matching URLs
{"gemini://example.com/project/", false, "regular project path"},
{"gemini://example.com/blog/", false, "blog path"},
{"gemini://example.com/git-guide.gmi", false, "hyphenated word with git"},
{"gemini://example.com/digital/", false, "word containing 'git'"},
{"gemini://example.com/ab12cd3", true, "short hex string matches commit hash pattern"},
{"gemini://example.com/ab12cdz", false, "alphanumeric string with non-hex chars won't match commit hash"},
}
for _, tt := range tests {
t.Run(tt.desc, func(t *testing.T) {
result := IsBlacklisted(tt.url)
if result != tt.expected {
t.Errorf("IsBlacklisted(%q) = %v, want %v", tt.url, result, tt.expected)
}
})
}
}
// TestGeminiGopherPatterns tests the blacklist patterns specific to Gemini and Gopher protocols
func TestGeminiGopherPatterns(t *testing.T) {
// Save original blacklist to restore after test
originalBlacklist := blacklist
defer func() {
blacklist = originalBlacklist
}()
// Create patterns for Gemini and Gopher
patterns := []string{
"gemini://badhost\\.com",
"gemini://.*/cgi-bin/",
"gemini://.*/private/",
"gemini://.*\\.evil\\..*",
"gopher://badhost\\.org",
"gopher://.*/I/onlyfans/",
"gopher://.*/[0-9]/(cgi|bin)/",
}
// Compile and set up the patterns
blacklist = []regexp.Regexp{}
for _, pattern := range patterns {
regex, err := regexp.Compile(pattern)
if err != nil {
t.Fatalf("Failed to compile pattern %q: %v", pattern, err)
}
blacklist = append(blacklist, *regex)
}
// Test URLs against Gemini and Gopher patterns
tests := []struct {
url string
expected bool
desc string
}{
// Gemini URLs
{"gemini://badhost.com/", true, "blacklisted gemini host"},
{"gemini://badhost.com/page.gmi", true, "blacklisted gemini host with path"},
{"gemini://example.com/cgi-bin/script.cgi", true, "gemini cgi-bin path"},
{"gemini://example.com/private/docs", true, "gemini private path"},
{"gemini://subdomain.evil.org", true, "gemini evil domain pattern"},
{"gemini://example.com/public/docs", false, "safe gemini path"},
{"gemini://goodhost.com/", false, "safe gemini host"},
// Gopher URLs
{"gopher://badhost.org/1/menu", true, "blacklisted gopher host"},
{"gopher://example.org/I/onlyfans/image", true, "gopher onlyfans path"},
{"gopher://example.org/1/cgi/script", true, "gopher cgi path"},
{"gopher://example.org/1/bin/executable", true, "gopher bin path"},
{"gopher://example.org/0/text", false, "safe gopher text"},
{"gopher://goodhost.org/1/menu", false, "safe gopher host"},
// Protocol distinction
{"https://badhost.com/", false, "blacklisted host but wrong protocol"},
{"http://example.com/cgi-bin/script.cgi", false, "bad path but wrong protocol"},
}
for _, tt := range tests {
t.Run(tt.desc, func(t *testing.T) {
result := IsBlacklisted(tt.url)
if result != tt.expected {
t.Errorf("IsBlacklisted(%q) = %v, want %v", tt.url, result, tt.expected)
			}
		})
	}
}

+// TestIsBlacklistedIntegration tests the integration between LoadBlacklist and IsBlacklisted
 func TestIsBlacklistedIntegration(t *testing.T) {
 	// Save original blacklist to restore after test
-	originalBlacklist := Blacklist
-	originalConfigPath := config.CONFIG.BlacklistPath
+	originalBlacklist := blacklist
+	originalBlacklistPath := config.CONFIG.BlacklistPath
 	defer func() {
-		Blacklist = originalBlacklist
-		config.CONFIG.BlacklistPath = originalConfigPath
+		blacklist = originalBlacklist
+		config.CONFIG.BlacklistPath = originalBlacklistPath
 	}()

 	// Create a temporary blacklist file for testing
@@ -264,12 +498,12 @@ malicious\.org
 	}

 	// Set up the test
-	Blacklist = nil
+	blacklist = nil
 	config.CONFIG.BlacklistPath = tmpFile.Name()

 	// Load the blacklist
-	if err := LoadBlacklist(); err != nil {
-		t.Fatalf("LoadBlacklist() failed: %v", err)
+	if err := Initialize(); err != nil {
+		t.Fatalf("Initialize() failed: %v", err)
 	}

 	// Test URLs against the loaded blacklist


@@ -0,0 +1,112 @@
package contextlog
import (
"context"
"fmt"
"log/slog"
"gemini-grc/contextutil"
)
// SlogEventWithContext adds context information as structured fields to the log event.
func SlogEventWithContext(ctx context.Context, logger *slog.Logger) *slog.Logger {
// Start with the provided logger
if logger == nil {
// If logger isn't initialized, use the default logger
return slog.Default()
}
// Get context values - will be added directly to log records
host := contextutil.GetHostFromContext(ctx)
requestID := contextutil.GetRequestIDFromContext(ctx)
component := contextutil.GetComponentFromContext(ctx)
workerID := contextutil.GetWorkerIDFromContext(ctx)
url := contextutil.GetURLFromContext(ctx)
// Add all context fields to the logger
if host != "" {
logger = logger.With("host", host)
}
if requestID != "" {
logger = logger.With("request_id", requestID)
}
if workerID >= 0 {
logger = logger.With("worker_id", workerID)
}
if component != "" {
logger = logger.With("component", component)
}
if url != "" {
logger = logger.With("url", url)
}
return logger
}
// LogDebugWithContext logs a debug message with context information.
func LogDebugWithContext(ctx context.Context, logger *slog.Logger, format string, args ...interface{}) {
if logger == nil {
return
}
// Create logger with context fields
contextLogger := SlogEventWithContext(ctx, logger)
// Format the message
message := fmt.Sprintf(format, args...)
// Log with context data in the record attributes
contextLogger.Debug(message)
}
// LogInfoWithContext logs an info message with context information.
func LogInfoWithContext(ctx context.Context, logger *slog.Logger, format string, args ...interface{}) {
if logger == nil {
return
}
// Create logger with context fields
contextLogger := SlogEventWithContext(ctx, logger)
// Format the message
message := fmt.Sprintf(format, args...)
// Log with context data in the record attributes
contextLogger.Info(message)
}
// LogWarnWithContext logs a warning message with context information.
func LogWarnWithContext(ctx context.Context, logger *slog.Logger, format string, args ...interface{}) {
if logger == nil {
return
}
// Create logger with context fields
contextLogger := SlogEventWithContext(ctx, logger)
// Format the message
message := fmt.Sprintf(format, args...)
// Log with context data in the record attributes
contextLogger.Warn(message)
}
// LogErrorWithContext logs an error message with context information
func LogErrorWithContext(ctx context.Context, logger *slog.Logger, format string, args ...interface{}) {
if logger == nil {
return
}
// Create logger with context fields
contextLogger := SlogEventWithContext(ctx, logger)
// Format the message
msg := fmt.Sprintf(format, args...)
// Log with context data in the record attributes
contextLogger.Error(msg, slog.String("error", msg))
}


@@ -1,41 +0,0 @@
package errors
import (
"fmt"
"github.com/antanst/go_errors"
)
// HostError is an error encountered while
// visiting a host, and should be recorded
// to the snapshot.
type HostError struct {
Err error
}
func (e *HostError) Error() string {
return e.Err.Error()
}
func (e *HostError) Unwrap() error {
return e.Err
}
func NewHostError(err error) error {
return &HostError{Err: err}
}
func IsHostError(err error) bool {
if err == nil {
return false
}
var asError *HostError
return go_errors.As(err, &asError)
}
// Sentinel errors used for their string message primarily.
// Do not use them by themselves, to be embedded to HostError.
var (
ErrBlacklistMatch = fmt.Errorf("black list match")
ErrRobotsMatch = fmt.Errorf("robots match")
)


@@ -0,0 +1,29 @@
package commonErrors
import (
"errors"
"git.antanst.com/antanst/xerrors"
)
type HostError struct {
xerrors.XError
}
func IsHostError(err error) bool {
var temp *HostError
return errors.As(err, &temp)
}
func NewHostError(err error) error {
xerr := xerrors.XError{
UserMsg: "",
Code: 0,
Err: err,
IsFatal: false,
}
return &HostError{
xerr,
}
}


@@ -0,0 +1,8 @@
package commonErrors
import "fmt"
var (
ErrBlacklistMatch = fmt.Errorf("black list match")
ErrRobotsMatch = fmt.Errorf("robots match")
)


@@ -10,8 +10,15 @@ import (
 type LinkList []url.URL

-func (l *LinkList) Value() (driver.Value, error) {
-	return json.Marshal(l)
+func (l LinkList) Value() (driver.Value, error) {
+	if len(l) == 0 {
+		return nil, nil
+	}
+	data, err := json.Marshal(l)
+	if err != nil {
+		return nil, err
+	}
+	return data, nil
 }

 func (l *LinkList) Scan(value interface{}) error {
@@ -19,7 +26,7 @@ func (l *LinkList) Scan(value interface{}) error {
 		*l = nil
 		return nil
 	}
-	b, ok := value.([]byte) // Type assertion! Converts to []byte
+	b, ok := value.([]byte)
 	if !ok {
 		return fmt.Errorf("failed to scan LinkList: expected []byte, got %T", value)
 	}


@@ -0,0 +1,67 @@
package seedList
import (
"fmt"
"os"
"strings"
"git.antanst.com/antanst/logging"
"git.antanst.com/antanst/xerrors"
)
var seedlist []string //nolint:gochecknoglobals
func Initialize() error {
var err error
// Initialize seedlist from fixed path
if err = loadSeedlist("seed_urls.txt"); err != nil {
return err
}
return nil
}
func loadSeedlist(filePath string) error {
if seedlist != nil {
return nil
}
data, err := os.ReadFile(filePath)
if err != nil {
seedlist = []string{}
return xerrors.NewError(fmt.Errorf("could not load seedlist file: %w", err), 0, "", true)
}
lines := strings.Split(string(data), "\n")
seedlist = []string{}
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
seedlist = append(seedlist, line)
}
if len(seedlist) > 0 {
logging.LogInfo("Loaded %d seed URLs", len(seedlist))
}
return nil
}
func Shutdown() error {
return nil
}
// GetSeedURLs returns the list of seed URLs
func GetSeedURLs() []string {
if seedlist == nil {
return []string{}
}
// Return a copy to prevent external modification
result := make([]string, len(seedlist))
copy(result, seedlist)
return result
}
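A short usage sketch for the new seedList module; the import path is assumed, and Initialize hard-codes seed_urls.txt in the working directory, as shown above:

package main

import (
    "fmt"

    "gemini-grc/seedList" // assumed import path
)

func main() {
    // Reads seed_urls.txt once; blank lines and #-comments are skipped.
    // A missing file is a fatal-flagged error, so callers should stop here.
    if err := seedList.Initialize(); err != nil {
        fmt.Println("seed list unavailable:", err)
        return
    }
    // GetSeedURLs returns a defensive copy; mutating it cannot
    // corrupt the package-level slice.
    for _, u := range seedList.GetSeedURLs() {
        fmt.Println("queueing", u)
    }
}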

View File

@@ -0,0 +1,67 @@
package seedList
import (
"os"
"testing"
)
func TestLoadSeedlist(t *testing.T) {
// Create a temporary test file
content := `# Test seed URLs
gemini://example.com/
gemini://test.com/
# Another comment
gemini://demo.org/`
tmpFile, err := os.CreateTemp("", "seed_urls_test_*.txt")
if err != nil {
t.Fatalf("Failed to create temp file: %v", err)
}
defer os.Remove(tmpFile.Name())
if _, err := tmpFile.WriteString(content); err != nil {
t.Fatalf("Failed to write to temp file: %v", err)
}
tmpFile.Close()
// Reset global variable for test
seedlist = nil
// Test loading
err = loadSeedlist(tmpFile.Name())
if err != nil {
t.Fatalf("Failed to load seedlist: %v", err)
}
// Verify content
expected := []string{
"gemini://example.com/",
"gemini://test.com/",
"gemini://demo.org/",
}
urls := GetSeedURLs()
if len(urls) != len(expected) {
t.Errorf("Expected %d URLs, got %d", len(expected), len(urls))
}
for i, url := range urls {
if url != expected[i] {
t.Errorf("Expected URL %d to be %s, got %s", i, expected[i], url)
}
}
}
func TestGetSeedURLsEmptyList(t *testing.T) {
// Reset global variable
originalSeedlist := seedlist
defer func() { seedlist = originalSeedlist }()
seedlist = nil
urls := GetSeedURLs()
if len(urls) != 0 {
t.Errorf("Expected empty list, got %d URLs", len(urls))
}
}

View File

@@ -1,11 +1,13 @@
 package common
-var (
-	StatusChan chan WorkerStatus
-	// ErrorsChan accepts errors from workers.
+import "os"
+// FatalErrorsChan accepts errors from workers.
 // In case of fatal error, gracefully
 // exits the application.
-	ErrorsChan chan error
+var (
+	FatalErrorsChan chan error
+	SignalsChan     chan os.Signal
 )
 const VERSION string = "0.0.1"

View File

@@ -5,12 +5,12 @@ import (
"gemini-grc/common/linkList" "gemini-grc/common/linkList"
commonUrl "gemini-grc/common/url" commonUrl "gemini-grc/common/url"
"github.com/antanst/go_errors" "git.antanst.com/antanst/xerrors"
"github.com/guregu/null/v5" "github.com/guregu/null/v5"
) )
type Snapshot struct { type Snapshot struct {
ID int `db:"ID" json:"ID,omitempty"` ID int `db:"id" json:"ID,omitempty"`
URL commonUrl.URL `db:"url" json:"url,omitempty"` URL commonUrl.URL `db:"url" json:"url,omitempty"`
Host string `db:"host" json:"host,omitempty"` Host string `db:"host" json:"host,omitempty"`
Timestamp null.Time `db:"timestamp" json:"timestamp,omitempty"` Timestamp null.Time `db:"timestamp" json:"timestamp,omitempty"`
@@ -22,12 +22,13 @@ type Snapshot struct {
Lang null.String `db:"lang" json:"lang,omitempty"` Lang null.String `db:"lang" json:"lang,omitempty"`
ResponseCode null.Int `db:"response_code" json:"code,omitempty"` // Gemini response Status code. ResponseCode null.Int `db:"response_code" json:"code,omitempty"` // Gemini response Status code.
Error null.String `db:"error" json:"error,omitempty"` // On network errors only Error null.String `db:"error" json:"error,omitempty"` // On network errors only
LastCrawled null.Time `db:"last_crawled" json:"last_crawled,omitempty"` // When URL was last processed (regardless of content changes)
} }
func SnapshotFromURL(u string, normalize bool) (*Snapshot, error) { func SnapshotFromURL(u string, normalize bool) (*Snapshot, error) {
url, err := commonUrl.ParseURL(u, "", normalize) url, err := commonUrl.ParseURL(u, "", normalize)
if err != nil { if err != nil {
return nil, go_errors.NewError(err) return nil, xerrors.NewSimpleError(err)
} }
newSnapshot := Snapshot{ newSnapshot := Snapshot{
URL: *url, URL: *url,
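Since the struct now carries LastCrawled, a small sketch of how a snapshot is created and what stays unset until save time. The exact normalization behavior is inferred from the normalize flag, and the import path is taken from the diff's own imports:

package main

import (
    "fmt"

    "gemini-grc/common/snapshot"
)

func main() {
    // normalize=true asks ParseURL for the canonical form of the URL.
    s, err := snapshot.SnapshotFromURL("gemini://example.com/page.gmi", true)
    if err != nil {
        panic(err)
    }
    // LastCrawled is left unset here; per the SaveSnapshot code later in
    // this diff, both Timestamp and LastCrawled are stamped at write time.
    fmt.Println(s.URL.String(), s.LastCrawled.Valid) // ..., false
}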

common/text/text.go Normal file (9 lines)
View File

@@ -0,0 +1,9 @@
package text
import "strings"
// RemoveNullChars removes all null characters from the input string.
func RemoveNullChars(input string) string {
// Replace all null characters with an empty string
return strings.ReplaceAll(input, "\u0000", "")
}
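A tiny demonstration of RemoveNullChars. The import path is assumed, and the likely motivation (PostgreSQL TEXT columns reject NUL bytes) is an inference, not stated in the diff:

package main

import (
    "fmt"

    "gemini-grc/common/text" // assumed import path
)

func main() {
    // "\x00" and "\u0000" are the same rune; both occurrences are removed.
    dirty := "gemini page\x00with embedded\u0000NULs"
    fmt.Printf("%q\n", text.RemoveNullChars(dirty))
    // Output: "gemini pagewith embeddedNULs"
}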

View File

@@ -9,7 +9,7 @@ import (
"strconv" "strconv"
"strings" "strings"
"github.com/antanst/go_errors" "git.antanst.com/antanst/xerrors"
) )
type URL struct { type URL struct {
@@ -29,7 +29,7 @@ func (u *URL) Scan(value interface{}) error {
} }
b, ok := value.(string) b, ok := value.(string)
if !ok { if !ok {
return go_errors.NewFatalError(fmt.Errorf("database scan error: expected string, got %T", value)) return xerrors.NewError(fmt.Errorf("database scan error: expected string, got %T", value), 0, "", true)
} }
parsedURL, err := ParseURL(b, "", false) parsedURL, err := ParseURL(b, "", false)
if err != nil { if err != nil {
@@ -82,7 +82,7 @@ func ParseURL(input string, descr string, normalize bool) (*URL, error) {
} else { } else {
u, err = url.Parse(input) u, err = url.Parse(input)
if err != nil { if err != nil {
return nil, go_errors.NewError(fmt.Errorf("error parsing URL: %w: %s", err, input)) return nil, xerrors.NewError(fmt.Errorf("error parsing URL: %w: %s", err, input), 0, "", false)
} }
} }
protocol := u.Scheme protocol := u.Scheme
@@ -99,7 +99,7 @@ func ParseURL(input string, descr string, normalize bool) (*URL, error) {
} }
port, err := strconv.Atoi(strPort) port, err := strconv.Atoi(strPort)
if err != nil { if err != nil {
return nil, go_errors.NewError(fmt.Errorf("error parsing URL: %w: %s", err, input)) return nil, xerrors.NewError(fmt.Errorf("error parsing URL: %w: %s", err, input), 0, "", false)
} }
full := fmt.Sprintf("%s://%s:%d%s", protocol, hostname, port, urlPath) full := fmt.Sprintf("%s://%s:%d%s", protocol, hostname, port, urlPath)
// full field should also contain query params and url fragments // full field should also contain query params and url fragments
@@ -145,13 +145,13 @@ func NormalizeURL(rawURL string) (*url.URL, error) {
// Parse the URL // Parse the URL
u, err := url.Parse(rawURL) u, err := url.Parse(rawURL)
if err != nil { if err != nil {
return nil, go_errors.NewError(fmt.Errorf("error normalizing URL: %w: %s", err, rawURL)) return nil, xerrors.NewError(fmt.Errorf("error normalizing URL: %w: %s", err, rawURL), 0, "", false)
} }
if u.Scheme == "" { if u.Scheme == "" {
return nil, go_errors.NewError(fmt.Errorf("error normalizing URL: No scheme: %s", rawURL)) return nil, xerrors.NewError(fmt.Errorf("error normalizing URL: No scheme: %s", rawURL), 0, "", false)
} }
if u.Host == "" { if u.Host == "" {
return nil, go_errors.NewError(fmt.Errorf("error normalizing URL: No host: %s", rawURL)) return nil, xerrors.NewError(fmt.Errorf("error normalizing URL: No host: %s", rawURL), 0, "", false)
} }
// Convert scheme to lowercase // Convert scheme to lowercase
@@ -275,7 +275,7 @@ func ExtractRedirectTargetFromHeader(currentURL URL, input string) (*URL, error)
re := regexp.MustCompile(pattern) re := regexp.MustCompile(pattern)
matches := re.FindStringSubmatch(input) matches := re.FindStringSubmatch(input)
if len(matches) < 2 { if len(matches) < 2 {
return nil, go_errors.NewError(fmt.Errorf("error extracting redirect target from string %s", input)) return nil, xerrors.NewError(fmt.Errorf("error extracting redirect target from string %s", input), 0, "", false)
} }
newURL, err := DeriveAbsoluteURL(currentURL, matches[1]) newURL, err := DeriveAbsoluteURL(currentURL, matches[1])
if err != nil { if err != nil {

View File

@@ -117,7 +117,7 @@ func TestURLOperations(t *testing.T) {
 		}
 	})
-	t.Run("NormalizeURL", func(t *testing.T) {
+	t.Run("CheckAndUpdateNormalizedURL", func(t *testing.T) {
 		t.Parallel()
 		tests := []struct {

View File

@@ -0,0 +1,74 @@
package whiteList
import (
"fmt"
"os"
"regexp"
"strings"
"gemini-grc/config"
"git.antanst.com/antanst/logging"
"git.antanst.com/antanst/xerrors"
)
var whitelist []regexp.Regexp //nolint:gochecknoglobals
func Initialize() error {
var err error
// Initialize whitelist
if config.CONFIG.WhitelistPath != "" {
if err = loadWhitelist(config.CONFIG.WhitelistPath); err != nil {
return err
}
}
return nil
}
func loadWhitelist(filePath string) error {
if whitelist != nil {
return nil
}
data, err := os.ReadFile(filePath)
if err != nil {
whitelist = []regexp.Regexp{}
return xerrors.NewError(fmt.Errorf("could not load whitelist file: %w", err), 0, "", true)
}
lines := strings.Split(string(data), "\n")
whitelist = []regexp.Regexp{}
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
regex, err := regexp.Compile(line)
if err != nil {
return xerrors.NewError(fmt.Errorf("could not compile whitelist line %s: %w", line, err), 0, "", true)
}
whitelist = append(whitelist, *regex)
}
if len(whitelist) > 0 {
logging.LogInfo("Loaded %d whitelist entries", len(whitelist))
}
return nil
}
func Shutdown() error {
return nil
}
// IsWhitelisted checks if the URL matches any whitelist pattern
func IsWhitelisted(u string) bool {
for _, v := range whitelist {
if v.MatchString(u) {
return true
}
}
return false
}
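To make the matching semantics concrete, here is a standalone re-creation of the loop in IsWhitelisted; it shows that entries are ordinary Go regexes checked with MatchString, so anchoring decides how broad a rule is:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    patterns := []*regexp.Regexp{
        regexp.MustCompile(`^gemini://example\.com`),
    }
    for _, u := range []string{
        "gemini://example.com/page.gmi", // matches: anchored prefix
        "gemini://mirror.example.com/",  // no match: anchor is at start
    } {
        matched := false
        for _, p := range patterns {
            if p.MatchString(u) {
                matched = true
                break
            }
        }
        fmt.Println(u, matched)
    }
}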

View File

@@ -0,0 +1,87 @@
package whiteList
import (
"os"
"regexp"
"testing"
"gemini-grc/config"
)
func TestIsWhitelisted(t *testing.T) {
// Set up a test whitelist
whitelist = []regexp.Regexp{
*regexp.MustCompile(`^gemini://example\.com`),
*regexp.MustCompile(`^gemini://test\.org/path`),
}
testCases := []struct {
url string
expected bool
}{
{"gemini://example.com", true},
{"gemini://example.com/path", true},
{"gemini://test.org", false},
{"gemini://test.org/path", true},
{"gemini://test.org/path/subpath", true},
{"gemini://other.site", false},
}
for _, tc := range testCases {
result := IsWhitelisted(tc.url)
if result != tc.expected {
t.Errorf("IsWhitelisted(%s) = %v, want %v", tc.url, result, tc.expected)
}
}
}
func TestLoadWhitelist(t *testing.T) {
// Create a temporary whitelist file
content := `# This is a test whitelist
^gemini://example\.com
^gemini://test\.org/path
`
tmpfile, err := os.CreateTemp("", "whitelist")
if err != nil {
t.Fatal(err)
}
defer os.Remove(tmpfile.Name())
if _, err := tmpfile.Write([]byte(content)); err != nil {
t.Fatal(err)
}
if err := tmpfile.Close(); err != nil {
t.Fatal(err)
}
// Reset whitelist
whitelist = nil
// Set up configuration to use the temporary file
oldPath := config.CONFIG.WhitelistPath
config.CONFIG.WhitelistPath = tmpfile.Name()
defer func() {
config.CONFIG.WhitelistPath = oldPath
}()
// Load whitelist from the file
err = loadWhitelist(tmpfile.Name())
if err != nil {
t.Fatalf("loadWhitelist() error = %v", err)
}
// Check if whitelist was loaded correctly
if len(whitelist) != 2 {
t.Errorf("loadWhitelist() loaded %d entries, want 2", len(whitelist))
}
// Test a whitelisted URL
if !IsWhitelisted("gemini://example.com") {
t.Error("IsWhitelisted(\"gemini://example.com\") = false, want true")
}
// Test a URL in a whitelisted path
if !IsWhitelisted("gemini://test.org/path/subpage.gmi") {
t.Error("IsWhitelisted(\"gemini://test.org/path/subpage.gmi\") = false, want true")
}
}

View File

@@ -1,204 +1,190 @@
 package common
 import (
+	"context"
+	"database/sql"
+	"errors"
 	"fmt"
 	"time"
 	"gemini-grc/common/blackList"
-	errors2 "gemini-grc/common/errors"
+	"gemini-grc/common/contextlog"
+	commonErrors "gemini-grc/common/errors"
 	"gemini-grc/common/snapshot"
 	url2 "gemini-grc/common/url"
-	_db "gemini-grc/db"
+	"gemini-grc/common/whiteList"
+	"gemini-grc/config"
+	"gemini-grc/contextutil"
+	gemdb "gemini-grc/db"
 	"gemini-grc/gemini"
 	"gemini-grc/gopher"
 	"gemini-grc/hostPool"
-	"gemini-grc/logging"
-	"github.com/antanst/go_errors"
+	"gemini-grc/robotsMatch"
+	"git.antanst.com/antanst/logging"
+	"git.antanst.com/antanst/xerrors"
 	"github.com/guregu/null/v5"
 	"github.com/jmoiron/sqlx"
 )
-func CrawlOneURL(db *sqlx.DB, url *string) error {
-	parsedURL, err := url2.ParseURL(*url, "", true)
-	if err != nil {
-		return err
-	}
-	if !url2.IsGeminiUrl(parsedURL.String()) && !url2.IsGopherURL(parsedURL.String()) {
-		return go_errors.NewError(fmt.Errorf("error parsing URL: not a Gemini or Gopher URL: %s", parsedURL.String()))
-	}
-	tx, err := db.Beginx()
-	if err != nil {
-		return go_errors.NewFatalError(err)
-	}
-	err = _db.InsertURL(tx, parsedURL.Full)
-	if err != nil {
-		return err
-	}
-	err = workOnUrl(0, tx, parsedURL.Full)
-	if err != nil {
-		return err
-	}
-	err = tx.Commit()
-	if err != nil {
-		//if _db.IsDeadlockError(err) {
-		//	logging.LogError("Deadlock detected. Rolling back")
-		//	time.Sleep(time.Duration(10) * time.Second)
-		//	err := tx.Rollback()
-		//	return go_errors.NewFatalError(err)
-		//}
-		return go_errors.NewFatalError(err)
-	}
-	logging.LogInfo("Done")
-	return nil
-}
-func SpawnWorkers(numOfWorkers int, db *sqlx.DB) {
-	logging.LogInfo("Spawning %d workers", numOfWorkers)
-	go PrintWorkerStatus(numOfWorkers, StatusChan)
-	for i := range numOfWorkers {
-		go func(i int) {
-			UpdateWorkerStatus(i, "Waiting to start")
-			// Jitter to avoid starting everything at the same time
-			time.Sleep(time.Duration(i+2) * time.Second)
-			for {
-				// TODO: Use cancellable context with tx, logger & worker ID.
-				// ctx := context.WithCancel()
-				// ctx = context.WithValue(ctx, common.CtxKeyLogger, &RequestLogger{r: r})
-				RunWorkerWithTx(i, db)
-			}
-		}(i)
-	}
-}
-func RunWorkerWithTx(workerID int, db *sqlx.DB) {
-	defer func() {
-		UpdateWorkerStatus(workerID, "Done")
-	}()
-	tx, err := db.Beginx()
-	if err != nil {
-		ErrorsChan <- err
-		return
-	}
-	err = runWorker(workerID, tx)
-	if err != nil {
-		// TODO: Rollback in this case?
-		ErrorsChan <- err
-		return
-	}
-	logging.LogDebug("[%3d] Committing transaction", workerID)
-	err = tx.Commit()
-	// On deadlock errors, rollback and return, otherwise panic.
-	if err != nil {
-		logging.LogError("[%3d] Failed to commit transaction: %w", workerID, err)
-		if _db.IsDeadlockError(err) {
-			logging.LogError("[%3d] Deadlock detected. Rolling back", workerID)
-			time.Sleep(time.Duration(10) * time.Second)
-			err := tx.Rollback()
-			if err != nil {
-				panic(fmt.Sprintf("[%3d] Failed to roll back transaction: %v", workerID, err))
-			}
-			return
-		}
-		panic(fmt.Sprintf("[%3d] Failed to commit transaction: %v", workerID, err))
-	}
-	logging.LogDebug("[%3d] Worker done!", workerID)
-}
-func runWorker(workerID int, tx *sqlx.Tx) error {
-	var urls []string
-	var err error
-	UpdateWorkerStatus(workerID, "Getting URLs from DB")
-	urls, err = _db.GetRandomUrls(tx)
-	// urls, err = _db.GetRandomUrlsWithBasePath(tx)
-	if err != nil {
-		return err
-	} else if len(urls) == 0 {
-		logging.LogInfo("[%3d] No URLs to visit, sleeping...", workerID)
-		UpdateWorkerStatus(workerID, "No URLs to visit, sleeping...")
-		time.Sleep(1 * time.Minute)
-		return nil
-	}
-	// Start visiting URLs.
-	total := len(urls)
-	for i, u := range urls {
-		logging.LogInfo("[%3d] Starting %d/%d %s", workerID, i+1, total, u)
-		UpdateWorkerStatus(workerID, fmt.Sprintf("Starting %d/%d %s", i+1, total, u))
-		err := workOnUrl(workerID, tx, u)
-		if err != nil {
-			return err
-		}
-		logging.LogDebug("[%3d] Done %d/%d.", workerID, i+1, total)
-		UpdateWorkerStatus(workerID, fmt.Sprintf("Done %d/%d %s", i+1, total, u))
-	}
-	return nil
-}
+func RunWorkerWithTx(workerID int, job string) {
+	// Extract host from URL for the context.
+	parsedURL, err := url2.ParseURL(job, "", true)
+	if err != nil {
+		logging.LogInfo("Failed to parse URL: %s Error: %s", job, err)
+		return
+	}
+	host := parsedURL.Hostname
+	// Create a new worker context
+	baseCtx := context.Background()
+	ctx, cancel := contextutil.NewRequestContext(baseCtx, job, host, workerID)
+	ctx = contextutil.ContextWithComponent(ctx, "worker")
+	defer cancel() // Ensure the context is cancelled when we're done
+	// contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "======================================\n\n")
+	contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Starting worker for URL %s", job)
+	// Create a new db transaction
+	tx, err := gemdb.Database.NewTx(ctx)
+	if err != nil {
+		FatalErrorsChan <- err
+		return
+	}
+	err = runWorker(ctx, tx, []string{job})
+	if err != nil {
+		// Two cases to handle:
+		// - context cancellation/timeout errors (log and ignore)
+		// - fatal errors (log and send to chan)
+		// non-fatal errors should've been handled within
+		// the runWorker() function and not bubble up here.
+		if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) {
+			contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Worker timed out or canceled: %v", err)
+			rollbackErr := gemdb.SafeRollback(ctx, tx)
+			if rollbackErr != nil {
+				FatalErrorsChan <- rollbackErr
+				return
+			}
+			return
+		} else if xerrors.IsFatal(err) {
+			contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Worker failed: %v", err)
+			rollbackErr := gemdb.SafeRollback(ctx, tx)
+			if rollbackErr != nil {
+				FatalErrorsChan <- rollbackErr
+				return
+			}
+			FatalErrorsChan <- err
+			return
+		}
+		contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Worker failed: %v", err)
+		rollbackErr := gemdb.SafeRollback(ctx, tx)
+		if rollbackErr != nil {
+			FatalErrorsChan <- rollbackErr
+			return
+		}
+		return
+	}
+	err = tx.Commit()
+	if err != nil && !errors.Is(err, sql.ErrTxDone) {
+		contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Failed to commit transaction: %v", err)
+		if rollbackErr := gemdb.SafeRollback(ctx, tx); rollbackErr != nil {
+			FatalErrorsChan <- err
+			return
+		}
+	}
+	contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Worker done.")
+}
+func runWorker(ctx context.Context, tx *sqlx.Tx, urls []string) error {
+	for _, u := range urls {
+		err := WorkOnUrl(ctx, tx, u)
+		if err != nil {
+			return err
+		}
+	}
+	return nil
+}
-// workOnUrl visits a URL and stores the result.
+// WorkOnUrl visits a URL and stores the result.
 // unexpected errors are returned.
 // expected errors are stored within the snapshot.
-func workOnUrl(workerID int, tx *sqlx.Tx, url string) (err error) {
-	s, err := snapshot.SnapshotFromURL(url, false)
+func WorkOnUrl(ctx context.Context, tx *sqlx.Tx, url string) (err error) {
+	contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Worker visiting URL %s", url)
+	s, err := snapshot.SnapshotFromURL(url, true)
 	if err != nil {
+		contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Failed to parse URL: %v", err)
 		return err
 	}
+	// We always use the normalized URL
+	if url != s.URL.Full {
+		//err = gemdb.Database.CheckAndUpdateNormalizedURL(ctx, tx, url, s.URL.Full)
+		//if err != nil {
+		//	return err
+		//}
+		//contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Normalized URL: %s → %s", url, s.URL.Full)
+		url = s.URL.Full
+	}
 	isGemini := url2.IsGeminiUrl(s.URL.String())
 	isGopher := url2.IsGopherURL(s.URL.String())
 	if !isGemini && !isGopher {
-		return go_errors.NewError(fmt.Errorf("not a Gopher or Gemini URL: %s", s.URL.String()))
+		return xerrors.NewSimpleError(fmt.Errorf("not a Gopher or Gemini URL: %s", s.URL.String()))
 	}
-	if blackList.IsBlacklisted(s.URL.String()) {
-		logging.LogInfo("[%3d] URL matches blacklist, ignoring", workerID)
-		s.Error = null.StringFrom(errors2.ErrBlacklistMatch.Error())
-		return saveSnapshotAndRemoveURL(tx, s)
-	}
-	if isGemini {
+	if isGopher && !config.CONFIG.GopherEnable {
+		return xerrors.NewSimpleError(fmt.Errorf("gopher disabled, not processing Gopher URL: %s", s.URL.String()))
+	}
+	// Check if URL is whitelisted
+	isUrlWhitelisted := whiteList.IsWhitelisted(s.URL.String())
+	if isUrlWhitelisted {
+		contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "URL matches whitelist, forcing crawl %s", url)
+	}
+	// Only check blacklist if URL is not whitelisted
+	if !isUrlWhitelisted && blackList.IsBlacklisted(s.URL.String()) {
+		contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "URL matches blacklist, ignoring %s", url)
+		s.Error = null.StringFrom(commonErrors.ErrBlacklistMatch.Error())
+		return saveSnapshotAndRemoveURL(ctx, tx, s)
+	}
+	// Only check robots.txt if URL is not whitelisted and is a Gemini URL
+	var robotMatch bool
+	if !isUrlWhitelisted && isGemini {
 		// If URL matches a robots.txt disallow line,
 		// add it as an error and remove url
-		robotMatch, err := gemini.RobotMatch(s.URL.String())
-		if err != nil {
-			// robotMatch returns only network errors!
-			// we stop because we don't want to hit
-			// the server with another request on this case.
-			return err
-		}
-		if robotMatch {
-			logging.LogInfo("[%3d] URL matches robots.txt, ignoring", workerID)
-			s.Error = null.StringFrom(errors2.ErrRobotsMatch.Error())
-			return saveSnapshotAndRemoveURL(tx, s)
-		}
-	}
-	logging.LogDebug("[%3d] Adding to pool %s", workerID, s.URL.String())
-	UpdateWorkerStatus(workerID, fmt.Sprintf("Adding to pool %s", s.URL.String()))
-	hostPool.AddHostToHostPool(s.Host)
-	defer func(s string) {
-		hostPool.RemoveHostFromPool(s)
-	}(s.Host)
-	logging.LogDebug("[%3d] Visiting %s", workerID, s.URL.String())
-	UpdateWorkerStatus(workerID, fmt.Sprintf("Visiting %s", s.URL.String()))
+		robotMatch = robotsMatch.RobotMatch(ctx, s.URL.String())
+		if robotMatch {
+			contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "URL matches robots.txt, skipping")
+			s.Error = null.StringFrom(commonErrors.ErrRobotsMatch.Error())
+			return saveSnapshotAndRemoveURL(ctx, tx, s)
+		}
+	}
+	err = hostPool.AddHostToHostPool(ctx, s.Host)
+	if err != nil {
+		return err
+	}
+	defer func(ctx context.Context, host string) {
+		hostPool.RemoveHostFromPool(ctx, host)
+	}(ctx, s.Host)
+	contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Visiting %s", s.URL.String())
+	// Use context-aware visits for both protocols
 	if isGopher {
-		s, err = gopher.Visit(s.URL.String())
+		s, err = gopher.VisitWithContext(ctx, s.URL.String())
 	} else {
-		s, err = gemini.Visit(s.URL.String())
+		s, err = gemini.Visit(ctx, s.URL.String())
 	}
 	if err != nil {
+		contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "Error visiting URL: %v", err)
 		return err
 	}
@@ -206,40 +192,97 @@ func workOnUrl(workerID int, tx *sqlx.Tx, url string) (err error) {
 	if isGemini &&
 		s.ResponseCode.ValueOrZero() >= 30 &&
 		s.ResponseCode.ValueOrZero() < 40 {
-		err = handleRedirection(workerID, tx, s)
+		err = saveRedirectURL(ctx, tx, s)
 		if err != nil {
-			return fmt.Errorf("error while handling redirection: %s", err)
+			return xerrors.NewSimpleError(fmt.Errorf("error while handling redirection: %s", err))
 		}
 	}
-	// Store links
+	// Check if we should skip a potentially
+	// identical snapshot with one from history
+	isIdentical, err := isContentIdentical(ctx, tx, s)
+	if err != nil {
+		return err
+	}
+	if isIdentical {
+		contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "Content identical to existing snapshot, updating crawl timestamp")
+		// Update the last_crawled timestamp to track that we processed this URL
+		err = gemdb.Database.UpdateLastCrawled(ctx, tx, s.URL.String())
+		if err != nil {
+			return err
+		}
+		return removeURL(ctx, tx, s.URL.String())
+	}
+	// Process and store links since content has changed
 	if len(s.Links.ValueOrZero()) > 0 {
-		logging.LogDebug("[%3d] Found %d links", workerID, len(s.Links.ValueOrZero()))
-		err = storeLinks(tx, s)
+		contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Found %d links", len(s.Links.ValueOrZero()))
+		err = storeLinks(ctx, tx, s)
 		if err != nil {
 			return err
 		}
 	}
-	logging.LogInfo("[%3d] %2d %s", workerID, s.ResponseCode.ValueOrZero(), s.URL.String())
-	return saveSnapshotAndRemoveURL(tx, s)
+	// Save the snapshot and remove the URL from the queue
+	if s.Error.ValueOrZero() != "" {
+		// Only save error if we didn't have any valid
+		// snapshot data from a previous crawl!
+		shouldUpdateSnapshot, err := shouldUpdateSnapshotData(ctx, tx, s)
+		if err != nil {
+			return err
+		}
+		if shouldUpdateSnapshot {
+			contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "%2d %s", s.ResponseCode.ValueOrZero(), s.Error.ValueOrZero())
+			return saveSnapshotAndRemoveURL(ctx, tx, s)
+		} else {
+			contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "%2d %s (but old content exists, not updating)", s.ResponseCode.ValueOrZero(), s.Error.ValueOrZero())
+			return removeURL(ctx, tx, s.URL.String())
+		}
+	} else {
+		contextlog.LogInfoWithContext(ctx, logging.GetSlogger(), "%2d", s.ResponseCode.ValueOrZero())
+		return saveSnapshotAndRemoveURL(ctx, tx, s)
+	}
 }
-func storeLinks(tx *sqlx.Tx, s *snapshot.Snapshot) error {
+func shouldUpdateSnapshotData(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) (bool, error) {
+	prevSnapshot, err := gemdb.Database.GetLatestSnapshot(ctx, tx, s.URL.String())
+	if err != nil {
+		return false, err
+	}
+	if prevSnapshot == nil {
+		return true, nil
+	}
+	if prevSnapshot.ResponseCode.Valid {
+		return false, nil
+	}
+	return true, nil
+}
+func isContentIdentical(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) (bool, error) {
+	// Always check if content is identical to previous snapshot
+	identical, err := gemdb.Database.IsContentIdentical(ctx, tx, s)
+	if err != nil {
+		return false, err
+	}
+	return identical, nil
+}
+// storeLinks checks and stores the snapshot links in the database.
+func storeLinks(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error {
 	if s.Links.Valid { //nolint:nestif
 		for _, link := range s.Links.ValueOrZero() {
 			if shouldPersistURL(&link) {
-				visited, err := haveWeVisitedURL(tx, link.Full)
+				visited, err := haveWeVisitedURL(ctx, tx, link.Full)
 				if err != nil {
 					return err
 				}
 				if !visited {
-					err := _db.InsertURL(tx, link.Full)
+					err := gemdb.Database.InsertURL(ctx, tx, link.Full)
 					if err != nil {
 						return err
 					}
 				} else {
-					logging.LogDebug("Link already persisted: %s", link.Full)
+					contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Link already persisted: %s", link.Full)
 				}
 			}
 		}
@@ -247,74 +290,109 @@ func storeLinks(tx *sqlx.Tx, s *snapshot.Snapshot) error {
 	}
 	return nil
 }
-func saveSnapshotAndRemoveURL(tx *sqlx.Tx, s *snapshot.Snapshot) error {
-	err := _db.OverwriteSnapshot(tx, s)
-	if err != nil {
-		return err
-	}
-	err = _db.DeleteURL(tx, s.URL.String())
-	if err != nil {
-		return err
-	}
-	return nil
-}
-// shouldPersistURL returns true if we
-// should save the URL in the _db.
-// Only gemini:// urls are saved.
+func removeURL(ctx context.Context, tx *sqlx.Tx, url string) error {
+	return gemdb.Database.DeleteURL(ctx, tx, url)
+}
+func saveSnapshotAndRemoveURL(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error {
+	err := gemdb.Database.SaveSnapshot(ctx, tx, s)
+	if err != nil {
+		return err
+	}
+	return gemdb.Database.DeleteURL(ctx, tx, s.URL.String())
+}
+// shouldPersistURL returns true if the given URL is a
+// non-blacklisted Gemini or Gopher URL.
 func shouldPersistURL(u *url2.URL) bool {
-	return url2.IsGeminiUrl(u.String()) || url2.IsGopherURL(u.String())
+	if blackList.IsBlacklisted(u.String()) {
+		return false
+	}
+	if config.CONFIG.GopherEnable && url2.IsGopherURL(u.String()) {
+		return true
+	}
+	return url2.IsGeminiUrl(u.String())
 }
-func haveWeVisitedURL(tx *sqlx.Tx, u string) (bool, error) {
+func haveWeVisitedURL(ctx context.Context, tx *sqlx.Tx, u string) (bool, error) {
 	var result []bool
-	err := tx.Select(&result, `SELECT TRUE FROM urls WHERE url=$1`, u)
+	// Check if the context is cancelled
+	if err := ctx.Err(); err != nil {
+		return false, xerrors.NewSimpleError(err)
+	}
+	// Check the urls table which holds the crawl queue.
+	err := tx.SelectContext(ctx, &result, `SELECT TRUE FROM urls WHERE url=$1`, u)
 	if err != nil {
-		return false, go_errors.NewFatalError(fmt.Errorf("database error: %w", err))
+		return false, xerrors.NewError(fmt.Errorf("database error: %w", err), 0, "", true)
 	}
 	if len(result) > 0 {
-		return result[0], nil
-	}
-	err = tx.Select(&result, `SELECT TRUE FROM snapshots WHERE snapshots.url=$1`, u)
-	if err != nil {
-		return false, go_errors.NewFatalError(fmt.Errorf("database error: %w", err))
-	}
-	if len(result) > 0 {
-		return result[0], nil
+		return false, nil
 	}
+	// If we're skipping URLs based on recent updates, check if this URL has been
+	// crawled within the specified number of days
+	if config.CONFIG.SkipIfUpdatedDays > 0 {
+		var recentSnapshots []bool
+		cutoffDate := time.Now().AddDate(0, 0, -config.CONFIG.SkipIfUpdatedDays)
+		// Check if the context is cancelled
+		if err := ctx.Err(); err != nil {
+			return false, err
+		}
+		err = tx.SelectContext(ctx, &recentSnapshots, `
+			SELECT TRUE FROM snapshots
+			WHERE snapshots.url=$1
+			AND timestamp > $2
+			LIMIT 1`, u, cutoffDate)
+		if err != nil {
+			return false, xerrors.NewError(fmt.Errorf("database error checking recent snapshots: %w", err), 0, "", true)
+		}
+		if len(recentSnapshots) > 0 {
+			return true, nil
+		}
+	}
 	return false, nil
 }
-// handleRedirection saves redirection URL.
-func handleRedirection(workerID int, tx *sqlx.Tx, s *snapshot.Snapshot) error {
-	newURL, err := url2.ExtractRedirectTargetFromHeader(s.URL, s.Error.ValueOrZero())
-	if err != nil {
-		return err
-	}
-	logging.LogDebug("[%3d] Page redirects to %s", workerID, newURL)
-	haveWeVisited, _ := haveWeVisitedURL(tx, newURL.String())
-	if shouldPersistURL(newURL) && !haveWeVisited {
-		err = _db.InsertURL(tx, newURL.Full)
-		if err != nil {
-			return err
-		}
-		logging.LogDebug("[%3d] Saved redirection URL %s", workerID, newURL.String())
-	}
-	return nil
-}
+func saveRedirectURL(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error {
+	newURL, err := url2.ExtractRedirectTargetFromHeader(s.URL, s.Header.ValueOrZero())
+	if err != nil {
+		contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Failed to extract redirect target: %v", err)
+		return err
+	}
+	contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Page redirects to %s", newURL)
+	haveWeVisited, err := haveWeVisitedURL(ctx, tx, newURL.String())
+	if err != nil {
+		return err
+	}
+	if shouldPersistURL(newURL) && !haveWeVisited {
+		err = gemdb.Database.InsertURL(ctx, tx, newURL.Full)
+		if err != nil {
+			return err
+		}
+		contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Saved redirection URL %s", newURL.String())
	}
	return nil
 }
-func GetSnapshotFromURL(tx *sqlx.Tx, url string) ([]snapshot.Snapshot, error) {
-	query := `
-        SELECT *
-        FROM snapshots
-        WHERE url=$1
-        LIMIT 1
-    `
-	var snapshots []snapshot.Snapshot
-	err := tx.Select(&snapshots, query, url)
-	if err != nil {
-		return nil, err
-	}
-	return snapshots, nil
-}
+//func GetSnapshotFromURL(tx *sqlx.Tx, url string) ([]snapshot.Snapshot, error) {
+//	query := `
+//        SELECT *
+//        FROM snapshots
+//        WHERE url=$1
+//        LIMIT 1
+//    `
+//	var snapshots []snapshot.Snapshot
+//	err := tx.Select(&snapshots, query, url)
+//	if err != nil {
+//		return nil, err
+//	}
+//	return snapshots, nil
+//}
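The diff removes SpawnWorkers, so the dispatch side is no longer visible here. Below is a hypothetical driver for the new entry point, with getNextJobs standing in for whatever queue query the real scheduler uses (for example GetRandomUrlsFromHosts); names and structure are illustrative only:

package main

import "gemini-grc/common" // assumed import path for RunWorkerWithTx

// getNextJobs is a stand-in for the real queue query.
func getNextJobs() []string {
    return []string{"gemini://example.com/"}
}

func main() {
    for workerID := 0; workerID < 4; workerID++ {
        go func(id int) {
            for {
                for _, job := range getNextJobs() {
                    // Each call parses the URL, builds its own request
                    // context, transaction and host-pool slot, and
                    // commits or rolls back internally.
                    common.RunWorkerWithTx(id, job)
                }
            }
        }(workerID)
    }
    select {} // real code would instead wait on FatalErrorsChan/SignalsChan
}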

View File

@@ -1,70 +0,0 @@
package common
import (
"fmt"
"strings"
"gemini-grc/config"
)
type WorkerStatus struct {
ID int
Status string
}
func UpdateWorkerStatus(workerID int, status string) {
if !config.GetConfig().PrintWorkerStatus {
return
}
if config.CONFIG.NumOfWorkers > 1 {
StatusChan <- WorkerStatus{
ID: workerID,
Status: status,
}
}
}
func PrintWorkerStatus(totalWorkers int, statusChan chan WorkerStatus) {
if !config.GetConfig().PrintWorkerStatus {
return
}
// Create a slice to store current Status of each worker
statuses := make([]string, totalWorkers)
// Initialize empty statuses
for i := range statuses {
statuses[i] = ""
}
// Initial print
var output strings.Builder
// \033[H moves the cursor to the top left corner of the screen
// (ie, the first column of the first row in the screen).
// \033[J clears the part of the screen from the cursor to the end of the screen.
output.WriteString("\033[H\033[J") // Clear screen and move cursor to top
for i := range statuses {
output.WriteString(fmt.Sprintf("[%2d] \n", i))
}
fmt.Print(output.String())
// Continuously receive Status updates
for update := range statusChan {
if update.ID >= totalWorkers {
continue
}
// Update the Status
statuses[update.ID] = update.Status
// Build the complete output string
output.Reset()
output.WriteString("\033[H\033[J") // Clear screen and move cursor to top
for i, status := range statuses {
output.WriteString(fmt.Sprintf("[%2d] %.100s\n", i, status))
}
// Print the entire Status
fmt.Print(output.String())
}
}

View File

@@ -1,166 +1,90 @@
 package config
 import (
+	"flag"
 	"fmt"
+	"log/slog"
 	"os"
-	"strconv"
-	"github.com/rs/zerolog"
-)
-// Environment variable names.
-const (
-	EnvLogLevel               = "LOG_LEVEL"
-	EnvNumWorkers             = "NUM_OF_WORKERS"
-	EnvWorkerBatchSize        = "WORKER_BATCH_SIZE"
-	EnvMaxResponseSize        = "MAX_RESPONSE_SIZE"
-	EnvResponseTimeout        = "RESPONSE_TIMEOUT"
-	EnvPanicOnUnexpectedError = "PANIC_ON_UNEXPECTED_ERROR"
-	EnvBlacklistPath          = "BLACKLIST_PATH"
-	EnvDryRun                 = "DRY_RUN"
-	EnvPrintWorkerStatus      = "PRINT_WORKER_STATUS"
 )
 // Config holds the application configuration loaded from environment variables.
 type Config struct {
-	LogLevel               zerolog.Level // Logging level (debug, info, warn, error)
-	MaxResponseSize        int           // Maximum size of response in bytes
-	NumOfWorkers           int           // Number of concurrent workers
-	ResponseTimeout        int           // Timeout for responses in seconds
-	WorkerBatchSize        int           // Batch size for worker processing
-	PanicOnUnexpectedError bool          // Panic on unexpected errors when visiting a URL
-	BlacklistPath          string        // File that has blacklisted strings of "host:port"
-	DryRun                 bool          // If false, don't write to disk
-	PrintWorkerStatus      bool          // If false, don't print worker status table
+	PgURL             string
+	LogLevel          slog.Level // Logging level (debug, info, warn, error)
+	MaxResponseSize   int        // Maximum size of response in bytes
+	MaxDbConnections  int        // Maximum number of database connections.
+	NumOfWorkers      int        // Number of concurrent workers
+	ResponseTimeout   int        // Timeout for responses in seconds
+	BlacklistPath     string     // File that has blacklisted strings of "host:port"
+	WhitelistPath     string     // File with URLs that should always be crawled regardless of blacklist
+	DryRun            bool       // If false, don't write to disk
+	GopherEnable      bool       // Enable Gopher crawling
+	SeedUrlPath       string     // Add URLs from file to queue
+	SkipIfUpdatedDays int        // Skip re-crawling URLs updated within this many days (0 to disable)
 }
 var CONFIG Config //nolint:gochecknoglobals
-// parsePositiveInt parses and validates positive integer values.
-func parsePositiveInt(param, value string) (int, error) {
-	val, err := strconv.Atoi(value)
-	if err != nil {
-		return 0, ValidationError{
-			Param:  param,
-			Value:  value,
-			Reason: "must be a valid integer",
-		}
-	}
-	if val <= 0 {
-		return 0, ValidationError{
-			Param:  param,
-			Value:  value,
-			Reason: "must be positive",
-		}
-	}
-	return val, nil
-}
-func parseBool(param, value string) (bool, error) {
-	val, err := strconv.ParseBool(value)
-	if err != nil {
-		return false, ValidationError{
-			Param:  param,
-			Value:  value,
-			Reason: "cannot be converted to boolean",
-		}
-	}
-	return val, nil
-}
-// GetConfig loads and validates configuration from environment variables
-func GetConfig() *Config {
+// Initialize loads and validates configuration from command-line flags
+func Initialize() *Config {
 	config := &Config{}
-	// Map of environment variables to their parsing functions
-	parsers := map[string]func(string) error{
-		EnvLogLevel: func(v string) error {
-			level, err := zerolog.ParseLevel(v)
-			if err != nil {
-				return ValidationError{
-					Param:  EnvLogLevel,
-					Value:  v,
-					Reason: "must be one of: debug, info, warn, error",
-				}
-			}
-			config.LogLevel = level
-			return nil
-		},
-		EnvNumWorkers: func(v string) error {
-			val, err := parsePositiveInt(EnvNumWorkers, v)
-			if err != nil {
-				return err
-			}
-			config.NumOfWorkers = val
-			return nil
-		},
-		EnvWorkerBatchSize: func(v string) error {
-			val, err := parsePositiveInt(EnvWorkerBatchSize, v)
-			if err != nil {
-				return err
-			}
-			config.WorkerBatchSize = val
-			return nil
-		},
-		EnvMaxResponseSize: func(v string) error {
-			val, err := parsePositiveInt(EnvMaxResponseSize, v)
-			if err != nil {
-				return err
-			}
-			config.MaxResponseSize = val
-			return nil
-		},
-		EnvResponseTimeout: func(v string) error {
-			val, err := parsePositiveInt(EnvResponseTimeout, v)
-			if err != nil {
-				return err
-			}
-			config.ResponseTimeout = val
-			return nil
-		},
-		EnvPanicOnUnexpectedError: func(v string) error {
-			val, err := parseBool(EnvPanicOnUnexpectedError, v)
-			if err != nil {
-				return err
-			}
-			config.PanicOnUnexpectedError = val
-			return nil
-		},
-		EnvBlacklistPath: func(v string) error {
-			config.BlacklistPath = v
-			return nil
-		},
-		EnvDryRun: func(v string) error {
-			val, err := parseBool(EnvDryRun, v)
-			if err != nil {
-				return err
-			}
-			config.DryRun = val
-			return nil
-		},
-		EnvPrintWorkerStatus: func(v string) error {
-			val, err := parseBool(EnvPrintWorkerStatus, v)
-			if err != nil {
-				return err
-			}
-			config.PrintWorkerStatus = val
-			return nil
-		},
-	}
-	// Process each environment variable
-	for envVar, parser := range parsers {
-		value, ok := os.LookupEnv(envVar)
-		if !ok {
-			fmt.Fprintf(os.Stderr, "Missing required environment variable: %s\n", envVar)
-			os.Exit(1)
-		}
-		if err := parser(value); err != nil {
-			fmt.Fprintf(os.Stderr, "Configuration error: %v\n", err)
-			os.Exit(1)
-		}
-	}
+	loglevel := flag.String("log-level", "info", "Logging level (debug, info, warn, error)")
+	pgURL := flag.String("pgurl", "", "Postgres URL")
+	dryRun := flag.Bool("dry-run", false, "Dry run mode")
+	gopherEnable := flag.Bool("gopher", false, "Enable crawling of Gopher holes")
+	maxDbConnections := flag.Int("max-db-connections", 100, "Maximum number of database connections")
+	numOfWorkers := flag.Int("workers", 1, "Number of concurrent workers")
+	maxResponseSize := flag.Int("max-response-size", 1024*1024, "Maximum size of response in bytes")
+	responseTimeout := flag.Int("response-timeout", 10, "Timeout for network responses in seconds")
+	blacklistPath := flag.String("blacklist-path", "", "File that has blacklist regexes")
+	skipIfUpdatedDays := flag.Int("skip-if-updated-days", 60, "Skip re-crawling URLs updated within this many days (0 to disable)")
+	whitelistPath := flag.String("whitelist-path", "", "File with URLs that should always be crawled regardless of blacklist")
+	seedUrlPath := flag.String("seed-url-path", "", "File with seed URLs that should be added to the queue immediately")
+	flag.Parse()
+	config.PgURL = *pgURL
+	config.DryRun = *dryRun
+	config.GopherEnable = *gopherEnable
+	config.NumOfWorkers = *numOfWorkers
+	config.MaxResponseSize = *maxResponseSize
+	config.ResponseTimeout = *responseTimeout
+	config.BlacklistPath = *blacklistPath
+	config.WhitelistPath = *whitelistPath
+	config.SeedUrlPath = *seedUrlPath
+	config.MaxDbConnections = *maxDbConnections
+	config.SkipIfUpdatedDays = *skipIfUpdatedDays
+	level, err := ParseSlogLevel(*loglevel)
+	if err != nil {
+		_, _ = fmt.Fprint(os.Stderr, err.Error())
+		os.Exit(-1)
+	}
+	config.LogLevel = level
 	return config
 }
+// ParseSlogLevel converts a string level to slog.Level
+func ParseSlogLevel(levelStr string) (slog.Level, error) {
+	switch levelStr {
+	case "debug":
+		return slog.LevelDebug, nil
+	case "info":
+		return slog.LevelInfo, nil
+	case "warn":
+		return slog.LevelWarn, nil
+	case "error":
+		return slog.LevelError, nil
+	default:
+		return slog.LevelInfo, fmt.Errorf("invalid log level: %s", levelStr)
+	}
+}
+// Convert method for backward compatibility with existing codebase
+// This can be removed once all references to Convert() are updated
+func (c *Config) Convert() *Config {
+	// Just return the config itself as it now directly contains slog.Level
+	return c
+}
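A minimal sketch of wiring the new flag-based configuration into main. Whether Initialize or the caller assigns the CONFIG global is not shown in the diff, so the assignment below is an assumption:

package main

import (
    "fmt"

    "gemini-grc/config"
)

func main() {
    // Parses -pgurl, -workers, -log-level, -gopher, etc.;
    // exits with an error message on an invalid log level.
    cfg := config.Initialize()
    config.CONFIG = *cfg // assumed: callers read the package-level global
    fmt.Printf("workers=%d timeout=%ds gopher=%v\n",
        cfg.NumOfWorkers, cfg.ResponseTimeout, cfg.GopherEnable)
}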

View File

@@ -1,14 +0,0 @@
package config
import "fmt"
// ValidationError represents a config validation error
type ValidationError struct {
Param string
Value string
Reason string
}
func (e ValidationError) Error() string {
return fmt.Sprintf("invalid value '%s' for %s: %s", e.Value, e.Param, e.Reason)
}

contextutil/context.go Normal file (89 lines)
View File

@@ -0,0 +1,89 @@
package contextutil
import (
"context"
"time"
"git.antanst.com/antanst/uid"
)
// ContextKey type for context values
type ContextKey string
// Context keys
const (
CtxKeyURL ContextKey = "url" // Full URL being processed
CtxKeyHost ContextKey = "host" // Host of the URL
CtxKeyRequestID ContextKey = "request_id" // Unique ID for this processing request
CtxKeyWorkerID ContextKey = "worker_id" // Worker ID processing this request
CtxKeyStartTime ContextKey = "start_time" // When processing started
CtxKeyComponent ContextKey = "component" // Component name for logging
)
// NewRequestContext creates a new, cancellable context
// with a timeout and request-scoped metadata values
// (URL, host, request ID, worker ID, start time).
func NewRequestContext(parentCtx context.Context, url string, host string, workerID int) (context.Context, context.CancelFunc) {
ctx, cancel := context.WithTimeout(parentCtx, 120*time.Second)
requestID := uid.UID()
ctx = context.WithValue(ctx, CtxKeyURL, url)
ctx = context.WithValue(ctx, CtxKeyHost, host)
ctx = context.WithValue(ctx, CtxKeyRequestID, requestID)
ctx = context.WithValue(ctx, CtxKeyWorkerID, workerID)
ctx = context.WithValue(ctx, CtxKeyStartTime, time.Now())
return ctx, cancel
}
// Helper functions to get values from context
// GetURLFromContext retrieves the URL from the context
func GetURLFromContext(ctx context.Context) string {
if url, ok := ctx.Value(CtxKeyURL).(string); ok {
return url
}
return ""
}
// GetHostFromContext retrieves the host from the context
func GetHostFromContext(ctx context.Context) string {
if host, ok := ctx.Value(CtxKeyHost).(string); ok {
return host
}
return ""
}
// GetRequestIDFromContext retrieves the request ID from the context
func GetRequestIDFromContext(ctx context.Context) string {
if id, ok := ctx.Value(CtxKeyRequestID).(string); ok {
return id
}
return ""
}
// GetWorkerIDFromContext retrieves the worker ID from the context
func GetWorkerIDFromContext(ctx context.Context) int {
if id, ok := ctx.Value(CtxKeyWorkerID).(int); ok {
return id
}
return -1
}
// GetStartTimeFromContext retrieves the start time from the context
func GetStartTimeFromContext(ctx context.Context) time.Time {
if startTime, ok := ctx.Value(CtxKeyStartTime).(time.Time); ok {
return startTime
}
return time.Time{}
}
// GetComponentFromContext retrieves the component name from the context
func GetComponentFromContext(ctx context.Context) string {
if component, ok := ctx.Value(CtxKeyComponent).(string); ok {
return component
}
return ""
}
// ContextWithComponent adds or updates the component name in the context
func ContextWithComponent(ctx context.Context, component string) context.Context {
return context.WithValue(ctx, CtxKeyComponent, component)
}
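A quick usage sketch for the package above; everything here is taken directly from the functions shown, including the zero-value fallbacks of the getters:

package main

import (
    "context"
    "fmt"

    "gemini-grc/contextutil"
)

func main() {
    ctx, cancel := contextutil.NewRequestContext(
        context.Background(), "gemini://example.com/", "example.com", 3)
    defer cancel() // releases the 120-second timeout's resources

    ctx = contextutil.ContextWithComponent(ctx, "worker")

    fmt.Println(contextutil.GetURLFromContext(ctx))       // gemini://example.com/
    fmt.Println(contextutil.GetWorkerIDFromContext(ctx))  // 3
    fmt.Println(contextutil.GetComponentFromContext(ctx)) // worker
    // Missing keys fall back to zero values (empty string, -1, zero time).
    fmt.Println(contextutil.GetRequestIDFromContext(context.Background())) // ""
}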

db/db.go (568 lines)
View File

@@ -1,87 +1,187 @@
package db package db
import ( import (
"bytes"
"context"
"database/sql"
"encoding/json" "encoding/json"
"errors"
"fmt" "fmt"
"os" "strings"
"strconv" "sync"
"time" "time"
"gemini-grc/common/contextlog"
"gemini-grc/common/snapshot" "gemini-grc/common/snapshot"
commonUrl "gemini-grc/common/url" commonUrl "gemini-grc/common/url"
"gemini-grc/config" "gemini-grc/config"
"gemini-grc/logging" "gemini-grc/contextutil"
"github.com/antanst/go_errors" "git.antanst.com/antanst/logging"
"git.antanst.com/antanst/xerrors"
"github.com/guregu/null/v5"
_ "github.com/jackc/pgx/v5/stdlib" // PGX driver for PostgreSQL _ "github.com/jackc/pgx/v5/stdlib" // PGX driver for PostgreSQL
"github.com/jmoiron/sqlx" "github.com/jmoiron/sqlx"
"github.com/lib/pq" "github.com/lib/pq"
) )
func ConnectToDB() (*sqlx.DB, error) { type DbService interface {
connStr := fmt.Sprintf("postgres://%s:%s@%s:%s/%s", //nolint:nosprintfhostport // Core database methods
os.Getenv("PG_USER"), Initialize(ctx context.Context) error
os.Getenv("PG_PASSWORD"), Shutdown(ctx context.Context) error
os.Getenv("PG_HOST"), NewTx(ctx context.Context) (*sqlx.Tx, error)
os.Getenv("PG_PORT"),
os.Getenv("PG_DATABASE"),
)
// Create a connection pool // URL methods
db, err := sqlx.Open("pgx", connStr) InsertURL(ctx context.Context, tx *sqlx.Tx, url string) error
if err != nil { CheckAndUpdateNormalizedURL(ctx context.Context, tx *sqlx.Tx, url string, normalizedURL string) error
return nil, go_errors.NewFatalError(fmt.Errorf("unable to connect to database with URL %s: %w", connStr, err)) DeleteURL(ctx context.Context, tx *sqlx.Tx, url string) error
} MarkURLsAsBeingProcessed(ctx context.Context, tx *sqlx.Tx, urls []string) error
// TODO move PG_MAX_OPEN_CONNECTIONS to config env variables GetUrlHosts(ctx context.Context, tx *sqlx.Tx) ([]string, error)
maxConnections, err := strconv.Atoi(os.Getenv("PG_MAX_OPEN_CONNECTIONS")) GetRandomUrlsFromHosts(ctx context.Context, hosts []string, limit int, tx *sqlx.Tx) ([]string, error)
if err != nil {
return nil, go_errors.NewFatalError(fmt.Errorf("unable to set DB max connections: %w", err)) // Snapshot methods
} SaveSnapshot(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error
db.SetMaxOpenConns(maxConnections) OverwriteSnapshot(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error
err = db.Ping() UpdateLastCrawled(ctx context.Context, tx *sqlx.Tx, url string) error
if err != nil { GetLatestSnapshot(ctx context.Context, tx *sqlx.Tx, url string) (*snapshot.Snapshot, error)
return nil, go_errors.NewFatalError(fmt.Errorf("unable to ping database: %w", err)) GetSnapshotAtTimestamp(ctx context.Context, tx *sqlx.Tx, url string, timestamp time.Time) (*snapshot.Snapshot, error)
GetAllSnapshotsForURL(ctx context.Context, tx *sqlx.Tx, url string) ([]*snapshot.Snapshot, error)
GetSnapshotsByDateRange(ctx context.Context, tx *sqlx.Tx, url string, startTime, endTime time.Time) ([]*snapshot.Snapshot, error)
IsContentIdentical(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) (bool, error)
} }
logging.LogDebug("Connected to database") type DbServiceImpl struct {
return db, nil db *sqlx.DB
connected bool
mu sync.Mutex
} }
var Database DbServiceImpl
// IsDeadlockError checks if the error is a PostgreSQL deadlock error. // IsDeadlockError checks if the error is a PostgreSQL deadlock error.
func IsDeadlockError(err error) bool { func IsDeadlockError(err error) bool {
err = go_errors.Unwrap(err) err = errors.Unwrap(err)
var pqErr *pq.Error var pqErr *pq.Error
if go_errors.As(err, &pqErr) { if errors.As(err, &pqErr) {
return pqErr.Code == "40P01" // PostgreSQL deadlock error code return pqErr.Code == "40P01" // PostgreSQL deadlock error code
} }
return false return false
} }
func GetRandomUrls(tx *sqlx.Tx) ([]string, error) { // Initialize initializes the database with context
var urls []string func (d *DbServiceImpl) Initialize(ctx context.Context) error {
err := tx.Select(&urls, SQL_SELECT_RANDOM_URLS, config.CONFIG.WorkerBatchSize) // Create a database-specific context
if err != nil { dbCtx := contextutil.ContextWithComponent(ctx, "database")
return nil, go_errors.NewFatalError(err) contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Initializing database connection")
}
return urls, nil d.mu.Lock()
defer d.mu.Unlock()
if d.connected {
return nil
} }
func GetRandomUrlsWithBasePath(tx *sqlx.Tx) ([]string, error) { // Check if the context is cancelled before proceeding
SqlQuery := `SELECT url FROM snapshots WHERE url ~ '^[^:]+://[^/]+/?$' ORDER BY RANDOM() LIMIT $1` if err := ctx.Err(); err != nil {
var urls []string return err
err := tx.Select(&urls, SqlQuery, config.CONFIG.WorkerBatchSize)
if err != nil {
return nil, go_errors.NewFatalError(err)
}
return urls, nil
} }
func InsertURL(tx *sqlx.Tx, url string) error { // Create a connection pool
logging.LogDebug("Inserting URL %s", url) connStr := config.CONFIG.PgURL
query := SQL_INSERT_URL db, err := sqlx.Open("pgx", connStr)
if err != nil {
contextlog.LogErrorWithContext(dbCtx, logging.GetSlogger(), "Unable to connect to database with URL %s: %v", connStr, err)
return xerrors.NewError(fmt.Errorf("unable to connect to database with URL %s: %w", connStr, err), 0, "", true)
}
// Configure connection pool
db.SetMaxOpenConns(config.CONFIG.MaxDbConnections)
db.SetMaxIdleConns(config.CONFIG.MaxDbConnections / 2)
db.SetConnMaxLifetime(time.Minute * 5)
db.SetConnMaxIdleTime(time.Minute * 1)
// Check if the context is cancelled before proceeding with ping
if err := ctx.Err(); err != nil {
return err
}
// Use PingContext for context-aware ping
err = db.PingContext(ctx)
if err != nil {
contextlog.LogErrorWithContext(dbCtx, logging.GetSlogger(), "Unable to ping database: %v", err)
return xerrors.NewError(fmt.Errorf("unable to ping database: %w", err), 0, "", true)
}
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Database connection initialized successfully")
d.db = db
d.connected = true
return nil
}
func (d *DbServiceImpl) Shutdown(ctx context.Context) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Shutting down database connections")
_, err := d.db.Query("UPDATE urls SET being_processed=false")
if err != nil {
contextlog.LogErrorWithContext(dbCtx, logging.GetSlogger(), "Unable to update urls table: %v", err)
}
d.mu.Lock()
defer d.mu.Unlock()
if !d.connected {
return nil
}
if err := ctx.Err(); err != nil {
return err
}
err = d.db.Close()
if err != nil {
contextlog.LogErrorWithContext(dbCtx, logging.GetSlogger(), "Error closing database connection: %v", err)
} else {
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Database connection closed successfully")
d.connected = false
}
return err
}
// NewTx creates a new transaction with context
func (d *DbServiceImpl) NewTx(ctx context.Context) (*sqlx.Tx, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
contextlog.LogErrorWithContext(dbCtx, logging.GetSlogger(), "Context error before creating transaction: %v", err)
return nil, err
}
tx, err := d.db.BeginTxx(ctx, nil)
if err != nil {
contextlog.LogErrorWithContext(dbCtx, logging.GetSlogger(), "Failed to create transaction: %v", err)
return nil, err
}
return tx, nil
}
// InsertURL inserts a URL with context
func (d *DbServiceImpl) InsertURL(ctx context.Context, tx *sqlx.Tx, url string) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Inserting URL %s", url)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
// Context-aware implementation
normalizedURL, err := commonUrl.ParseURL(url, "", true) normalizedURL, err := commonUrl.ParseURL(url, "", true)
if err != nil { if err != nil {
return err return err
} }
a := struct { a := struct {
Url string Url string
Host string Host string
@@ -91,50 +191,378 @@ func InsertURL(tx *sqlx.Tx, url string) error {
Host: normalizedURL.Hostname, Host: normalizedURL.Hostname,
Timestamp: time.Now(), Timestamp: time.Now(),
} }
_, err = tx.NamedExec(query, a)
query := SQL_INSERT_URL
_, err = tx.NamedExecContext(ctx, query, a)
if err != nil { if err != nil {
return go_errors.NewFatalError(fmt.Errorf("cannot insert URL: database error %w URL %s", err, url)) return xerrors.NewError(fmt.Errorf("cannot insert URL: database error %w URL %s", err, url), 0, "", true)
} }
return nil return nil
} }
func DeleteURL(tx *sqlx.Tx, url string) error { // NormalizeURL normalizes a URL with context
logging.LogDebug("Deleting URL %s", url) func (d *DbServiceImpl) CheckAndUpdateNormalizedURL(ctx context.Context, tx *sqlx.Tx, url string, normalizedURL string) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
// Check if URLs are already the same
if url == normalizedURL {
return nil
}
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Updating normalized URL %s -> %s", url, normalizedURL)
query := SQL_UPDATE_URL
a := struct {
Url string `db:"Url"`
NormalizedURL string `db:"NormalizedURL"`
}{
Url: url,
NormalizedURL: normalizedURL,
}
_, err := tx.NamedExecContext(ctx, query, a)
if err != nil {
return xerrors.NewError(fmt.Errorf("cannot update normalized URL: %w URL %s", err, url), 0, "", true)
}
return nil
}
// DeleteURL deletes a URL with context
func (d *DbServiceImpl) DeleteURL(ctx context.Context, tx *sqlx.Tx, url string) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Deleting URL %s", url)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
// Context-aware implementation
query := SQL_DELETE_URL query := SQL_DELETE_URL
_, err := tx.Exec(query, url) _, err := tx.ExecContext(ctx, query, url)
if err != nil { if err != nil {
return go_errors.NewFatalError(fmt.Errorf("cannot delete URL: database error %w URL %s", err, url)) return xerrors.NewError(fmt.Errorf("cannot delete URL: database error %w URL %s", err, url), 0, "", true)
} }
return nil return nil
} }
func OverwriteSnapshot(tx *sqlx.Tx, s *snapshot.Snapshot) (err error) { // MarkURLsAsBeingProcessed marks URLs as being processed with context
func (d *DbServiceImpl) MarkURLsAsBeingProcessed(ctx context.Context, tx *sqlx.Tx, urls []string) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
// Skip if no URLs provided
if len(urls) == 0 {
return nil
}
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Marking %d URLs as being processed", len(urls))
// Context-aware implementation
if len(urls) > 0 {
// Build a query with multiple parameters instead of using pq.Array
placeholders := make([]string, len(urls))
args := make([]interface{}, len(urls))
for i, url := range urls {
placeholders[i] = fmt.Sprintf("$%d", i+1)
args[i] = url
}
query := fmt.Sprintf(SQL_MARK_URLS_BEING_PROCESSED, strings.Join(placeholders, ","))
_, err := tx.ExecContext(ctx, query, args...)
if err != nil {
return xerrors.NewError(fmt.Errorf("cannot mark URLs as being processed: %w", err), 0, "", true)
}
}
return nil
}
// GetUrlHosts gets URL hosts with context
func (d *DbServiceImpl) GetUrlHosts(ctx context.Context, tx *sqlx.Tx) ([]string, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Getting URL hosts")
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
// Context-aware implementation
var hosts []string
var query string
if config.CONFIG.GopherEnable {
query = "SELECT DISTINCT(host) FROM urls WHERE being_processed IS NOT TRUE"
} else {
query = "SELECT DISTINCT(host) FROM urls WHERE url like 'gemini://%' AND being_processed IS NOT TRUE"
}
err := tx.SelectContext(ctx, &hosts, query)
if err != nil {
return nil, xerrors.NewError(err, 0, "", true)
}
return hosts, nil
}
// GetRandomUrlsFromHosts gets random URLs from hosts with context
func (d *DbServiceImpl) GetRandomUrlsFromHosts(ctx context.Context, hosts []string, limit int, tx *sqlx.Tx) ([]string, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Getting random URLs from %d hosts with limit %d", len(hosts), limit)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
// Context-aware implementation
var urls []string
var query string
for _, host := range hosts {
var results []string
if !config.CONFIG.GopherEnable {
query = "SELECT url FROM urls WHERE host=$1 AND url like 'gemini://%' AND being_processed IS NOT TRUE ORDER BY RANDOM() LIMIT $2"
} else {
query = "SELECT url FROM urls WHERE host=$1 AND being_processed IS NOT TRUE ORDER BY RANDOM() LIMIT $2"
}
err := tx.SelectContext(ctx, &results, query, host, limit)
if err != nil {
return nil, xerrors.NewError(err, 0, "", true)
}
urls = append(urls, results...)
}
// Check context cancellation before mark operation
if err := ctx.Err(); err != nil {
return nil, err
}
// Use context-aware method for marking URLs
err := d.MarkURLsAsBeingProcessed(ctx, tx, urls)
if err != nil {
return nil, err
}
return urls, nil
}
// SaveSnapshot saves a snapshot with context
func (d *DbServiceImpl) SaveSnapshot(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Saving snapshot for URL %s", s.URL.String())
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
// Context-aware implementation
if config.CONFIG.DryRun {
marshalled, err := json.MarshalIndent(s, "", " ")
if err != nil {
return xerrors.NewError(fmt.Errorf("JSON serialization error for %v", s), 0, "", true)
}
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Would save snapshot %s", marshalled)
return nil
}
// Check context before expensive operations
if err := ctx.Err(); err != nil {
return err
}
// Always ensure we have current timestamps
currentTime := time.Now()
s.Timestamp = null.TimeFrom(currentTime)
s.LastCrawled = null.TimeFrom(currentTime)
// For PostgreSQL, use the global sqlx.NamedQueryContext function
// The SQL_INSERT_SNAPSHOT already has a RETURNING id clause
query := SQL_INSERT_SNAPSHOT
rows, err := sqlx.NamedQueryContext(ctx, tx, query, s)
if err != nil {
return xerrors.NewError(fmt.Errorf("cannot save snapshot: %w", err), 0, "", true)
}
defer rows.Close()
// Scan the returned ID
if rows.Next() {
err = rows.Scan(&s.ID)
if err != nil {
return xerrors.NewError(fmt.Errorf("cannot save snapshot: error scanning returned ID: %w", err), 0, "", true)
}
}
return nil
}
// OverwriteSnapshot overwrites a snapshot with context (maintained for backward compatibility)
func (d *DbServiceImpl) OverwriteSnapshot(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Overwriting snapshot for URL %s", s.URL.String())
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
// Now simply delegate to SaveSnapshot which is already context-aware
return d.SaveSnapshot(ctx, tx, s)
}
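// Note the semantics: SQL_INSERT_SNAPSHOT carries no ON CONFLICT clause, so
// every SaveSnapshot call appends a new version row with fresh timestamp and
// last_crawled values. "Overwriting" therefore means "writing a newer
// version"; older rows stay queryable via the history functions below.
// Hypothetical sketch, assuming `db` is a *DbServiceImpl:
//
//	s, _ := snapshot.SnapshotFromURL("gemini://example.org/", true)
//	_ = db.SaveSnapshot(ctx, tx, s) // inserts version 1, s.ID set via RETURNING id
//	_ = db.SaveSnapshot(ctx, tx, s) // inserts version 2 as a separate row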
// UpdateLastCrawled updates the last_crawled timestamp for the most recent snapshot of a URL
func (d *DbServiceImpl) UpdateLastCrawled(ctx context.Context, tx *sqlx.Tx, url string) error {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Updating last_crawled timestamp for URL %s", url)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return err
}
// Update the last_crawled timestamp for the most recent snapshot
_, err := tx.ExecContext(ctx, SQL_UPDATE_LAST_CRAWLED, url)
if err != nil {
return xerrors.NewError(fmt.Errorf("cannot update last_crawled for URL %s: %w", url, err), 0, "", true)
}
return nil
}
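// Illustrative effect (per SQL_UPDATE_LAST_CRAWLED defined later in this
// diff): only the row with the newest timestamp for the URL is touched, so
// older versions keep their original last_crawled values.
//
//	_ = db.UpdateLastCrawled(ctx, tx, "gemini://example.org/")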
// GetLatestSnapshot gets the latest snapshot with context
func (d *DbServiceImpl) GetLatestSnapshot(ctx context.Context, tx *sqlx.Tx, url string) (*snapshot.Snapshot, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Getting latest snapshot for URL %s", url)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
// Context-aware implementation
s := &snapshot.Snapshot{}
err := tx.GetContext(ctx, s, SQL_GET_LATEST_SNAPSHOT, url)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, nil
}
return nil, xerrors.NewError(fmt.Errorf("cannot get latest snapshot for URL %s: %w", url, err), 0, "", false)
}
return s, nil
}
// GetSnapshotAtTimestamp gets a snapshot at a specific timestamp with context
func (d *DbServiceImpl) GetSnapshotAtTimestamp(ctx context.Context, tx *sqlx.Tx, url string, timestamp time.Time) (*snapshot.Snapshot, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Getting snapshot for URL %s at timestamp %v", url, timestamp)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
// Context-aware implementation
s := &snapshot.Snapshot{}
err := tx.GetContext(ctx, s, SQL_GET_SNAPSHOT_AT_TIMESTAMP, url, timestamp)
if err != nil {
if errors.Is(err, sql.ErrNoRows) {
return nil, xerrors.NewError(fmt.Errorf("no snapshot found for URL %s at or before %v", url, timestamp), 0, "", false)
}
return nil, xerrors.NewError(fmt.Errorf("cannot get snapshot for URL %s at timestamp %v: %w", url, timestamp, err), 0, "", false)
}
return s, nil
}
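// Illustrative "time travel" lookup (assumed usage, not from this
// changeset): fetch the newest snapshot taken at or before a given moment.
//
//	when := time.Date(2025, 5, 1, 0, 0, 0, 0, time.UTC)
//	snap, err := db.GetSnapshotAtTimestamp(ctx, tx, "gemini://example.org/", when)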
// GetAllSnapshotsForURL gets all snapshots for a URL with context
func (d *DbServiceImpl) GetAllSnapshotsForURL(ctx context.Context, tx *sqlx.Tx, url string) ([]*snapshot.Snapshot, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Getting all snapshots for URL %s", url)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
// Context-aware implementation
snapshots := []*snapshot.Snapshot{}
err := tx.SelectContext(ctx, &snapshots, SQL_GET_ALL_SNAPSHOTS_FOR_URL, url)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("cannot get all snapshots for URL %s: %w", url, err), 0, "", false)
}
return snapshots, nil
}
// GetSnapshotsByDateRange gets snapshots by date range with context
func (d *DbServiceImpl) GetSnapshotsByDateRange(ctx context.Context, tx *sqlx.Tx, url string, startTime, endTime time.Time) ([]*snapshot.Snapshot, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Getting snapshots for URL %s in date range %v to %v", url, startTime, endTime)
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
// Context-aware implementation
snapshots := []*snapshot.Snapshot{}
err := tx.SelectContext(ctx, &snapshots, SQL_GET_SNAPSHOTS_BY_DATE_RANGE, url, startTime, endTime)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("cannot get snapshots for URL %s in date range %v to %v: %w",
url, startTime, endTime, err), 0, "", false)
}
return snapshots, nil
}
// IsContentIdentical checks if content is identical with context
func (d *DbServiceImpl) IsContentIdentical(ctx context.Context, tx *sqlx.Tx, s *snapshot.Snapshot) (bool, error) {
dbCtx := contextutil.ContextWithComponent(ctx, "database")
contextlog.LogDebugWithContext(dbCtx, logging.GetSlogger(), "Checking if content is identical for URL %s", s.URL.String())
// Check if the context is cancelled before proceeding
if err := ctx.Err(); err != nil {
return false, err
}
// Try to get the latest snapshot for this URL
latestSnapshot := &snapshot.Snapshot{}
err := tx.GetContext(ctx, latestSnapshot, SQL_GET_LATEST_SNAPSHOT, s.URL.String())
if err != nil {
// If there's no snapshot yet, it can't be identical
if errors.Is(err, sql.ErrNoRows) {
return false, nil
}
return false, xerrors.NewError(err, 0, "", true)
}
// Check context cancellation before potentially expensive comparison
if err := ctx.Err(); err != nil {
return false, err
}
// Check if the content is identical
if s.GemText.Valid && latestSnapshot.GemText.Valid {
return s.GemText.String == latestSnapshot.GemText.String, nil
} else if s.Data.Valid && latestSnapshot.Data.Valid {
return bytes.Equal(s.Data.V, latestSnapshot.Data.V), nil
}
return false, nil
}
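// Sketch of the dedup flow this check enables (an assumption based on the
// method's contract, not code from this changeset): skip writing a new
// version when the freshly fetched snapshot `s` matches the latest one.
//
//	identical, err := db.IsContentIdentical(ctx, tx, s)
//	if err == nil && !identical {
//		err = db.SaveSnapshot(ctx, tx, s)
//	}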
// SafeRollback attempts to roll back a transaction,
// handling the case if the tx was already finalized.
func SafeRollback(ctx context.Context, tx *sqlx.Tx) error {
rollbackErr := tx.Rollback()
if rollbackErr != nil {
// Check if it's the standard "transaction already finalized" error
if errors.Is(rollbackErr, sql.ErrTxDone) {
contextlog.LogWarnWithContext(ctx, logging.GetSlogger(), "Rollback failed because transaction is already finalized")
return nil
}
// Only return error for other types of rollback failures
contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Failed to rollback transaction: %v", rollbackErr)
return xerrors.NewError(fmt.Errorf("failed to rollback transaction: %w", rollbackErr), 0, "", true)
}
return nil
}
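// Typical defer pattern this helper enables (illustrative; `db` here is a
// hypothetical *sqlx.DB): after a successful Commit, the deferred rollback
// sees sql.ErrTxDone and is silently ignored.
//
//	tx, err := db.BeginTxx(ctx, nil)
//	if err != nil {
//		return err
//	}
//	defer func() { _ = SafeRollback(ctx, tx) }()
//	// ... do work with tx ...
//	return tx.Commit()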

View File

@@ -18,50 +18,158 @@ LIMIT $1
SQL_SELECT_RANDOM_URLS = `
SELECT url
FROM urls u
WHERE u.being_processed IS NOT TRUE
ORDER BY RANDOM()
FOR UPDATE SKIP LOCKED
LIMIT $1
`
SQL_MARK_URLS_BEING_PROCESSED = `UPDATE urls SET being_processed = true WHERE url IN (%s)`
SQL_SELECT_RANDOM_URLS_GEMINI_ONLY = `
SELECT url
FROM urls u
WHERE u.url like 'gemini://%'
AND u.being_processed IS NOT TRUE
ORDER BY RANDOM()
FOR UPDATE SKIP LOCKED
LIMIT $1
`
SQL_SELECT_RANDOM_URLS_GEMINI_ONLY_2 = `
WITH RankedUrls AS (
-- Step 1: Assign a random rank to each URL within its host group
SELECT
url,
host,
ROW_NUMBER() OVER (PARTITION BY host ORDER BY RANDOM()) as rn
FROM
urls
WHERE url like 'gemini://%'
AND being_processed IS NOT TRUE
),
OneUrlPerHost AS (
-- Step 2: Filter to keep only the first-ranked (random) URL per host
SELECT
url,
host
FROM
RankedUrls
WHERE
rn = 1
)
-- Step 3: From the set of one URL per host, randomly select X
SELECT
url
FROM
OneUrlPerHost
ORDER BY
RANDOM()
FOR UPDATE SKIP LOCKED
LIMIT $1
`
// New query - always insert a new snapshot without conflict handling
SQL_INSERT_SNAPSHOT = `
INSERT INTO snapshots (url, host, timestamp, mimetype, data, gemtext, links, lang, response_code, error, header, last_crawled)
VALUES (:url, :host, :timestamp, :mimetype, :data, :gemtext, :links, :lang, :response_code, :error, :header, :last_crawled)
RETURNING id
`
SQL_INSERT_URL = `
INSERT INTO urls (url, host, timestamp)
VALUES (:url, :host, :timestamp)
ON CONFLICT (url) DO NOTHING
`
SQL_UPDATE_URL = `
UPDATE urls
SET url = :NormalizedURL
WHERE url = :Url
AND NOT EXISTS (
SELECT 1 FROM urls WHERE url = :NormalizedURL
)
`
SQL_DELETE_URL = `
DELETE FROM urls WHERE url=$1
`
SQL_GET_LATEST_SNAPSHOT = `
SELECT * FROM snapshots
WHERE url = $1
ORDER BY timestamp DESC
LIMIT 1
`
SQL_GET_SNAPSHOT_AT_TIMESTAMP = `
SELECT * FROM snapshots
WHERE url = $1
AND timestamp <= $2
ORDER BY timestamp DESC
LIMIT 1
`
SQL_GET_ALL_SNAPSHOTS_FOR_URL = `
SELECT * FROM snapshots
WHERE url = $1
ORDER BY timestamp DESC
`
SQL_GET_SNAPSHOTS_BY_DATE_RANGE = `
SELECT * FROM snapshots
WHERE url = $1
AND timestamp BETWEEN $2 AND $3
ORDER BY timestamp DESC
`
// Update last_crawled timestamp for the most recent snapshot of a URL
SQL_UPDATE_LAST_CRAWLED = `
UPDATE snapshots
SET last_crawled = CURRENT_TIMESTAMP
WHERE id = (
SELECT id FROM snapshots
WHERE url = $1
ORDER BY timestamp DESC
LIMIT 1
)
`
// SQL_FETCH_SNAPSHOTS_FROM_HISTORY fetches URLs from snapshots for re-crawling based on the last_crawled timestamp.
// This query finds root domain URLs that haven't been crawled recently and selects
// one URL per host for diversity. Uses CTEs to:
// 1. Find latest crawl attempt per URL (via MAX(last_crawled))
// 2. Filter to URLs with actual content and successful responses (20-29)
// 3. Select URLs where latest crawl is older than cutoff date
// 4. Rank randomly within each host and pick one URL per host
// Parameters: $1 = cutoff_date, $2 = limit
SQL_FETCH_SNAPSHOTS_FROM_HISTORY = `
WITH latest_attempts AS (
SELECT
url,
host,
COALESCE(MAX(last_crawled), '1970-01-01'::timestamp) as latest_attempt
FROM snapshots
WHERE url ~ '^gemini://[^/]+/?$' AND mimetype = 'text/gemini'
GROUP BY url, host
),
root_urls_with_content AS (
SELECT DISTINCT
la.url,
la.host,
la.latest_attempt
FROM latest_attempts la
JOIN snapshots s ON s.url = la.url
WHERE (s.gemtext IS NOT NULL OR s.data IS NOT NULL)
AND s.response_code BETWEEN 20 AND 29
),
eligible_urls AS (
SELECT
url,
host,
latest_attempt
FROM root_urls_with_content
WHERE latest_attempt < $1
),
ranked_urls AS (
SELECT
url,
host,
latest_attempt,
ROW_NUMBER() OVER (PARTITION BY host ORDER BY RANDOM()) as rank
FROM eligible_urls
)
SELECT url, host
FROM ranked_urls
WHERE rank = 1
ORDER BY RANDOM()
LIMIT $2
`
)
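// Illustrative invocation of SQL_FETCH_SNAPSHOTS_FROM_HISTORY (the calling
// fetchSnapshotsFromHistory function is not shown in this changeset):
//
//	cutoff := time.Now().AddDate(0, 0, -30) // roots not crawled for 30 days
//	var rows []struct {
//		URL  string `db:"url"`
//		Host string `db:"host"`
//	}
//	err := tx.SelectContext(ctx, &rows, SQL_FETCH_SNAPSHOTS_FROM_HISTORY, cutoff, 100)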

View File

@@ -1,35 +0,0 @@
DROP TABLE IF EXISTS snapshots;
CREATE TABLE snapshots (
id SERIAL PRIMARY KEY,
url TEXT NOT NULL UNIQUE,
host TEXT NOT NULL,
timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
mimetype TEXT,
data BYTEA,
gemtext TEXT,
links JSONB,
lang TEXT,
response_code INTEGER,
error TEXT
);
CREATE INDEX idx_url ON snapshots (url);
CREATE INDEX idx_timestamp ON snapshots (timestamp);
CREATE INDEX idx_mimetype ON snapshots (mimetype);
CREATE INDEX idx_lang ON snapshots (lang);
CREATE INDEX idx_response_code ON snapshots (response_code);
CREATE INDEX idx_error ON snapshots (error);
CREATE INDEX idx_host ON snapshots (host);
CREATE INDEX unique_uid_url ON snapshots (uid, url);
CREATE INDEX idx_response_code_error_nulls ON snapshots (response_code, error) WHERE response_code IS NULL AND error IS NULL;
CREATE TABLE urls (
id SERIAL PRIMARY KEY,
url TEXT NOT NULL UNIQUE,
host TEXT NOT NULL,
timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_urls_url ON urls (url);
CREATE INDEX idx_urls_timestamp ON urls (timestamp);

View File

@@ -1,16 +0,0 @@
#!/bin/env bash
set -eu
set -o pipefail
# Max response size 10MiB
LOG_LEVEL=debug \
PRINT_WORKER_STATUS=false \
DRY_RUN=false \
NUM_OF_WORKERS=1 \
WORKER_BATCH_SIZE=1 \
BLACKLIST_PATH="$(pwd)/blacklist.txt" \
MAX_RESPONSE_SIZE=10485760 \
RESPONSE_TIMEOUT=10 \
PANIC_ON_UNEXPECTED_ERROR=true \
go run ./bin/gemget/main.go "$@"

View File

@@ -1,9 +1,8 @@
package gemini
import (
"errors"
"fmt"
)
// GeminiError is used to represent
@@ -20,21 +19,33 @@ func (e *GeminiError) Error() string {
return fmt.Sprintf("gemini error: code %d %s", e.Code, e.Msg)
}
func (e *GeminiError) String() string {
return e.Error()
}
// NewGeminiError creates a new GeminiError based on the status code and header.
// Status codes are based on the Gemini protocol specification:
// - 1x: Input required
// - 2x: Success (not handled as errors)
// - 3x: Redirect
// - 4x: Temporary failure
// - 5x: Permanent failure
// - 6x: Client certificate required/rejected
func NewGeminiError(code int, header string) error {
var msg string
switch {
case code >= 10 && code < 20:
msg = fmt.Sprintf("input required: %s", header)
case code >= 30 && code < 40:
msg = fmt.Sprintf("redirect: %s", header)
case code >= 40 && code < 50:
msg = fmt.Sprintf("request failed: %s", header)
case code >= 50 && code < 60:
msg = fmt.Sprintf("server error: %s", header)
case code >= 60 && code < 70:
msg = fmt.Sprintf("TLS error: %s", header)
default:
msg = fmt.Sprintf("unexpected status code %d: %s", code, header)
}
return &GeminiError{
Msg: msg,
@@ -43,10 +54,11 @@ func NewGeminiError(code int, header string) error {
}
}
// IsGeminiError checks if the given error is a GeminiError.
func IsGeminiError(err error) bool {
if err == nil {
return false
}
var asError *GeminiError
return errors.As(err, &asError)
}
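// Example of how these pieces compose (illustrative):
//
//	err := NewGeminiError(51, "51 Not Found")
//	var ge *GeminiError
//	if errors.As(err, &ge) {
//		fmt.Println(ge.Code, ge.Msg) // prints: 51 server error: 51 Not Found
//	}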

View File

@@ -1,38 +1,36 @@
package gemini
import (
"errors"
"fmt"
"testing"
)
func TestErrGemini(t *testing.T) {
t.Parallel()
err := NewGeminiError(50, "50 server error")
if !errors.As(err, new(*GeminiError)) {
t.Errorf("TestErrGemini fail")
}
}
func TestErrGeminiWrapped(t *testing.T) {
t.Parallel()
err := NewGeminiError(50, "50 server error")
errWrapped := fmt.Errorf("%w wrapped", err)
if !errors.As(errWrapped, new(*GeminiError)) {
t.Errorf("TestErrGeminiWrapped fail")
}
}
func TestIsGeminiError(t *testing.T) {
t.Parallel()
err1 := NewGeminiError(50, "50 server error")
if !IsGeminiError(err1) {
t.Errorf("TestGeminiError fail #1")
}
wrappedErr1 := fmt.Errorf("wrapped %w", err1)
if !IsGeminiError(wrappedErr1) {
t.Errorf("TestGeminiError fail #2")
}
}

View File

@@ -1,114 +0,0 @@
package gemini
import (
"fmt"
"net/url"
"os"
"path"
"path/filepath"
"strings"
"gemini-grc/common/snapshot"
"gemini-grc/logging"
)
// sanitizePath encodes invalid filesystem characters using URL encoding.
// Example:
// /example/path/to/page?query=param&another=value
// would become
// example/path/to/page%3Fquery%3Dparam%26another%3Dvalue
func sanitizePath(p string) string {
// Split the path into its components
components := strings.Split(p, "/")
// Encode each component separately
for i, component := range components {
// Decode any existing percent-encoded characters
decodedComponent, err := url.PathUnescape(component)
if err != nil {
decodedComponent = component // Fallback to original if unescape fails
}
// Encode the component to escape invalid filesystem characters
encodedComponent := url.QueryEscape(decodedComponent)
// Replace '+' (from QueryEscape) with '%20' to handle spaces correctly
encodedComponent = strings.ReplaceAll(encodedComponent, "+", "%20")
components[i] = encodedComponent
}
// Rejoin the components into a sanitized path
safe := filepath.Join(components...)
return safe
}
// getFilePath constructs a safe file path from the root path and URL path.
// It URL-encodes invalid filesystem characters to ensure the path is valid.
func calcFilePath(rootPath, urlPath string) (string, error) {
// Normalize the URL path
cleanPath := filepath.Clean(urlPath)
// Safe check to prevent directory traversal
if strings.Contains(cleanPath, "..") {
return "", fmt.Errorf("Invalid URL path: contains directory traversal")
}
// Sanitize the path by encoding invalid characters
safePath := sanitizePath(cleanPath)
// Join the root path and the sanitized URL path
finalPath := filepath.Join(rootPath, safePath)
return finalPath, nil
}
func SaveToFile(rootPath string, s *snapshot.Snapshot, done chan struct{}) {
parentPath := path.Join(rootPath, s.URL.Hostname)
urlPath := s.URL.Path
// If path is empty, add `index.gmi` as the file to save
if urlPath == "" || urlPath == "." {
urlPath = "index.gmi"
}
// If path ends with '/' then add index.gmi for the
// directory to be created.
if strings.HasSuffix(urlPath, "/") {
urlPath = strings.Join([]string{urlPath, "index.gmi"}, "")
}
finalPath, err := calcFilePath(parentPath, urlPath)
if err != nil {
logging.LogError("GeminiError saving %s: %w", s.URL, err)
return
}
// Ensure the directory exists
dir := filepath.Dir(finalPath)
if err := os.MkdirAll(dir, os.ModePerm); err != nil {
logging.LogError("Failed to create directory: %w", err)
return
}
if s.MimeType.Valid && s.MimeType.String == "text/gemini" {
err = os.WriteFile(finalPath, (*s).Data.V, 0o666)
} else {
err = os.WriteFile(finalPath, []byte((*s).GemText.String), 0o666)
}
if err != nil {
logging.LogError("GeminiError saving %s: %w", s.URL.Full, err)
}
close(done)
}
func ReadLines(path string) []string {
data, err := os.ReadFile(path)
if err != nil {
panic(fmt.Sprintf("Failed to read file: %s", err))
}
lines := strings.Split(string(data), "\n")
// remove last line if empty
// (happens when file ends with '\n')
if lines[len(lines)-1] == "" {
lines = lines[:len(lines)-1]
}
return lines
}

View File

@@ -7,9 +7,9 @@ import (
"gemini-grc/common/linkList" "gemini-grc/common/linkList"
url2 "gemini-grc/common/url" url2 "gemini-grc/common/url"
"gemini-grc/logging"
"gemini-grc/util" "gemini-grc/util"
"github.com/antanst/go_errors" "git.antanst.com/antanst/logging"
"git.antanst.com/antanst/xerrors"
) )
func GetPageLinks(currentURL url2.URL, gemtext string) linkList.LinkList { func GetPageLinks(currentURL url2.URL, gemtext string) linkList.LinkList {
@@ -37,14 +37,14 @@ func ParseGeminiLinkLine(linkLine string, currentURL string) (*url2.URL, error)
// Check: currentURL is parseable
baseURL, err := url.Parse(currentURL)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("error parsing link line: %w input '%s'", err, linkLine), 0, "", false)
}
// Extract the actual URL and the description
re := regexp.MustCompile(`^=>[ \t]+(\S+)([ \t]+.*)?`)
matches := re.FindStringSubmatch(linkLine)
if len(matches) == 0 {
return nil, xerrors.NewError(fmt.Errorf("error parsing link line: no regexp match for line %s", linkLine), 0, "", false)
}
originalURLStr := matches[1]
@@ -52,7 +52,7 @@ func ParseGeminiLinkLine(linkLine string, currentURL string) (*url2.URL, error)
// Check: Unescape the URL if escaped
_, err = url.QueryUnescape(originalURLStr)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("error parsing link line: %w input '%s'", err, linkLine), 0, "", false)
}
description := ""
@@ -63,7 +63,7 @@ func ParseGeminiLinkLine(linkLine string, currentURL string) (*url2.URL, error)
// Parse the URL from the link line
parsedURL, err := url.Parse(originalURLStr)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("error parsing link line: %w input '%s'", err, linkLine), 0, "", false)
}
// If link URL is relative, resolve full URL
@@ -80,7 +80,7 @@ func ParseGeminiLinkLine(linkLine string, currentURL string) (*url2.URL, error)
finalURL, err := url2.ParseURL(parsedURL.String(), description, true)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("error parsing link line: %w input '%s'", err, linkLine), 0, "", false)
}
return finalURL, nil

View File

@@ -1,7 +1,9 @@
package gemini
import (
"context"
"crypto/tls"
"errors"
"fmt"
"io"
"net"
@@ -12,155 +14,201 @@ import (
"strings" "strings"
"time" "time"
errors2 "gemini-grc/common/errors" "gemini-grc/common/contextlog"
"gemini-grc/common/snapshot" "gemini-grc/common/snapshot"
_url "gemini-grc/common/url" _url "gemini-grc/common/url"
"gemini-grc/config" "gemini-grc/config"
"gemini-grc/logging" "gemini-grc/contextutil"
"github.com/antanst/go_errors" "git.antanst.com/antanst/logging"
"git.antanst.com/antanst/xerrors"
"github.com/guregu/null/v5" "github.com/guregu/null/v5"
) )
// Visit given URL, using the Gemini protocol. // Visit visits a given URL using the Gemini protocol,
// Mutates given Snapshot with the data. // and returns a populated snapshot. Any relevant errors
// In case of error, we store the error string // when visiting the URL are stored in the snapshot;
// inside snapshot and return the error. // an error is returned only when construction of a
func Visit(url string) (s *snapshot.Snapshot, err error) { // snapshot was not possible (context cancellation errors,
// not a valid URL etc.)
func Visit(ctx context.Context, url string) (s *snapshot.Snapshot, err error) {
geminiCtx := contextutil.ContextWithComponent(ctx, "network")
s, err = snapshot.SnapshotFromURL(url, true) s, err = snapshot.SnapshotFromURL(url, true)
if err != nil { if err != nil {
return nil, err return nil, err
} }
// Check if the context has been canceled
if err := ctx.Err(); err != nil {
return nil, xerrors.NewSimpleError(err)
}
data, err := ConnectAndGetData(geminiCtx, s.URL.String())
if err != nil {
s.Error = null.StringFrom(err.Error())
return s, nil
}
// Check if the context has been canceled
if err := ctx.Err(); err != nil {
return nil, xerrors.NewSimpleError(err)
}
s = UpdateSnapshotWithData(*s, data)
if !s.Error.Valid &&
s.MimeType.Valid &&
s.MimeType.String == "text/gemini" &&
len(s.GemText.ValueOrZero()) > 0 {
links := GetPageLinks(s.URL, s.GemText.String)
if len(links) > 0 {
s.Links = null.ValueFrom(links)
}
}
return s, nil
}
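// Call sketch (illustrative): host and protocol errors are recorded in the
// returned snapshot, so a non-nil error here means no snapshot could be
// built at all.
//
//	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
//	defer cancel()
//	snap, err := Visit(ctx, "gemini://geminiprotocol.net/")
//	if err == nil && snap.Error.Valid {
//		fmt.Println("host-level failure:", snap.Error.String)
//	}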
// ConnectAndGetData opens a connection to a Gemini URL, sends the request,
// and returns the raw response data. It uses the context for cancellation,
// timeout, and logging.
func ConnectAndGetData(ctx context.Context, url string) ([]byte, error) {
parsedURL, err := stdurl.Parse(url)
if err != nil {
return nil, xerrors.NewSimpleError(fmt.Errorf("error parsing URL: %w", err))
}
hostname := parsedURL.Hostname()
port := parsedURL.Port()
if port == "" {
port = "1965"
}
host := fmt.Sprintf("%s:%s", hostname, port)
// Check if the context has been canceled before proceeding
if err := ctx.Err(); err != nil {
return nil, err
}
timeoutDuration := time.Duration(config.CONFIG.ResponseTimeout) * time.Second
// Establish the underlying TCP connection with context-based cancellation
dialer := &net.Dialer{
Timeout: timeoutDuration,
}
conn, err := dialer.DialContext(ctx, "tcp", host)
if err != nil {
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Failed to establish TCP connection: %v", err)
return nil, xerrors.NewSimpleError(err)
}
// Make sure we always close the connection
defer func() {
_ = conn.Close()
}()
err = conn.SetReadDeadline(time.Now().Add(timeoutDuration))
if err != nil {
return nil, xerrors.NewSimpleError(err)
}
err = conn.SetWriteDeadline(time.Now().Add(timeoutDuration))
if err != nil {
return nil, xerrors.NewSimpleError(err)
}
// Check if the context has been canceled before proceeding with TLS handshake
if err := ctx.Err(); err != nil {
return nil, err
}
// Perform the TLS handshake
tlsConfig := &tls.Config{
InsecureSkipVerify: true, //nolint:gosec // Accept all TLS certs, even if insecure.
ServerName: parsedURL.Hostname(), // SNI says we should not include port in hostname
}
tlsConn := tls.Client(conn, tlsConfig)
err = tlsConn.SetReadDeadline(time.Now().Add(timeoutDuration))
if err != nil {
return nil, xerrors.NewSimpleError(err)
}
err = tlsConn.SetWriteDeadline(time.Now().Add(timeoutDuration))
if err != nil {
return nil, xerrors.NewSimpleError(err)
}
// Check if the context is done before attempting handshake
if err := ctx.Err(); err != nil {
return nil, err
}
// Perform TLS handshake with regular method
// (HandshakeContext is only available in Go 1.17+)
err = tlsConn.Handshake()
if err != nil {
return nil, xerrors.NewSimpleError(err)
}
// Check again if the context is done after handshake
if err := ctx.Err(); err != nil {
return nil, xerrors.NewSimpleError(err)
}
// We read `buf`-sized chunks and add data to `data`
buf := make([]byte, 4096)
var data []byte
// Check if the context has been canceled before sending request
if err := ctx.Err(); err != nil {
return nil, xerrors.NewSimpleError(err)
}
// Send Gemini request to trigger server response
// Fix for stupid server bug:
// Some servers return 'Header: 53 No proxying to other hosts or ports!'
// when the port is 1965 and is still specified explicitly in the URL.
url2, _ := _url.ParseURL(url, "", true)
_, err = tlsConn.Write([]byte(fmt.Sprintf("%s\r\n", url2.StringNoDefaultPort())))
if err != nil {
return nil, xerrors.NewSimpleError(err)
}
// Read response bytes in len(buf) byte chunks
for {
// Check if the context has been canceled before each read
if err := ctx.Err(); err != nil {
return nil, xerrors.NewSimpleError(err)
}
n, err := tlsConn.Read(buf)
if n > 0 {
data = append(data, buf[:n]...)
}
if len(data) > config.CONFIG.MaxResponseSize {
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Response too large (max: %d bytes)", config.CONFIG.MaxResponseSize)
return nil, xerrors.NewSimpleError(fmt.Errorf("response too large"))
}
if err != nil {
if errors.Is(err, io.EOF) {
break
}
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Error reading data: %v", err)
return nil, xerrors.NewSimpleError(err)
}
}
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Received %d bytes of data", len(data))
return data, nil
}
// UpdateSnapshotWithData processes the raw data from a Gemini response and populates the Snapshot.
// This function is exported for use by the robotsMatch package.
func UpdateSnapshotWithData(s snapshot.Snapshot, data []byte) *snapshot.Snapshot {
header, body, err := getHeadersAndData(data)
if err != nil {
s.Error = null.StringFrom(err.Error())
return &s
}
code, mimeType, lang := getMimeTypeAndLang(header)
@@ -182,13 +230,14 @@ func processData(s snapshot.Snapshot, data []byte) (*snapshot.Snapshot, error) {
if mimeType == "text/gemini" {
validBody, err := BytesToValidUTF8(body)
if err != nil {
s.Error = null.StringFrom(err.Error())
return &s
}
s.GemText = null.StringFrom(validBody)
} else {
s.Data = null.ValueFrom(body)
}
return &s
}
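// Illustrative round-trip (assumed response shape; the code that stores the
// response code and mimetype sits between the hunks shown here):
//
//	raw := []byte("20 text/gemini\r\n# Hello\n")
//	snap := UpdateSnapshotWithData(snapshot.Snapshot{}, raw)
//	// snap.MimeType.String == "text/gemini"; snap.GemText holds the body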
// Checks for a Gemini header, which is
@@ -198,7 +247,7 @@ func processData(s snapshot.Snapshot, data []byte) (*snapshot.Snapshot, error) {
func getHeadersAndData(data []byte) (string, []byte, error) {
firstLineEnds := slices.Index(data, '\n')
if firstLineEnds == -1 {
return "", nil, xerrors.NewSimpleError(fmt.Errorf("error parsing header"))
}
firstLine := string(data[:firstLineEnds])
rest := data[firstLineEnds+1:]
@@ -216,17 +265,17 @@ func getMimeTypeAndLang(headers string) (int, string, string) {
// - Only capturing the lang value, ignoring charset
re := regexp.MustCompile(`^(\d+)\s+([a-zA-Z0-9/\-+]+)(?:(?:[\s;]+(?:charset=[^;\s]+|lang=([a-zA-Z0-9-]+)))*)\s*$`)
matches := re.FindStringSubmatch(headers)
if len(matches) <= 1 {
// If full format doesn't match, try to match redirect format: "<code> <URL>"
// This handles cases like "31 gemini://example.com"
re := regexp.MustCompile(`^(\d+)\s+(.+)$`)
matches := re.FindStringSubmatch(headers)
if len(matches) <= 1 {
// If redirect format doesn't match, try to match just a status code
// This handles cases like "99"
re := regexp.MustCompile(`^(\d+)\s*$`)
matches := re.FindStringSubmatch(headers)
if len(matches) <= 1 {
return 0, "", ""
}
code, err := strconv.Atoi(matches[1])
@@ -249,7 +298,3 @@ func getMimeTypeAndLang(headers string) (int, string, string) {
lang := matches[3] // Will be empty string if no lang parameter was found
return code, mimeType, lang
}
func isGeminiCapsule(s *snapshot.Snapshot) bool {
return !s.Error.Valid && s.MimeType.Valid && s.MimeType.String == "text/gemini"
}

View File

@@ -135,17 +135,7 @@ func TestProcessData(t *testing.T) {
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
s := snapshot.Snapshot{}
result := UpdateSnapshotWithData(s, test.inputData)
if test.expectedError {
return
@@ -175,192 +165,3 @@ func TestProcessData(t *testing.T) {
})
}
}
//// Mock Gemini server for testing ConnectAndGetData
//func mockGeminiServer(response string, delay time.Duration, closeConnection bool) net.Listener {
// listener, err := net.Listen("tcp", "127.0.0.1:0") // Bind to a random available port
// if err != nil {
// panic(fmt.Sprintf("Failed to create mock server: %v", err))
// }
//
// go func() {
// conn, err := listener.Accept()
// if err != nil {
// if !closeConnection { // Don't panic if we closed the connection on purpose
// panic(fmt.Sprintf("Failed to accept connection: %v", err))
// }
// return
// }
// defer conn.Close()
//
// time.Sleep(delay) // Simulate network latency
//
// _, err = conn.Write([]byte(response))
// if err != nil && !closeConnection {
// panic(fmt.Sprintf("Failed to write response: %v", err))
// }
// }()
//
// return listener
//}
// func TestConnectAndGetData(t *testing.T) {
// config.CONFIG = config.ConfigStruct{
// ResponseTimeout: 5,
// MaxResponseSize: 1024 * 1024,
// }
// tests := []struct {
// name string
// serverResponse string
// serverDelay time.Duration
// expectedData []byte
// expectedError bool
// closeConnection bool
// }{
// {
// name: "Successful response",
// serverResponse: "20 text/gemini\r\n# Hello",
// expectedData: []byte("20 text/gemini\r\n# Hello"),
// expectedError: false,
// },
// {
// name: "Server error",
// serverResponse: "50 Server error\r\n",
// expectedData: []byte("50 Server error\r\n"),
// expectedError: false,
// },
// {
// name: "Timeout",
// serverDelay: 6 * time.Second, // Longer than the timeout
// expectedError: true,
// },
// {
// name: "Server closes connection",
// closeConnection: true,
// expectedError: true,
// },
// }
// for _, test := range tests {
// t.Run(test.name, func(t *testing.T) {
// listener := mockGeminiServer(test.serverResponse, test.serverDelay, test.closeConnection)
// defer func() {
// test.closeConnection = true // Prevent panic in mock server
// listener.Close()
// }()
// addr := listener.Addr().String()
// data, err := ConnectAndGetData(fmt.Sprintf("gemini://%s/", addr))
// if test.expectedError && err == nil {
// t.Errorf("Expected error, got nil")
// }
// if !test.expectedError && err != nil {
// t.Errorf("Unexpected error: %v", err)
// }
// if !slices.Equal(data, test.expectedData) {
// t.Errorf("Expected data '%s', got '%s'", test.expectedData, data)
// }
// })
// }
// }
// func TestVisit(t *testing.T) {
// config.CONFIG = config.ConfigStruct{
// ResponseTimeout: 5,
// MaxResponseSize: 1024 * 1024,
// }
// tests := []struct {
// name string
// serverResponse string
// expectedCode int
// expectedMime string
// expectedError bool
// expectedLinks []string
// }{
// {
// name: "Successful response",
// serverResponse: "20 text/gemini\r\n# Hello\n=> /link1 Link 1\n=> /link2 Link 2",
// expectedCode: 20,
// expectedMime: "text/gemini",
// expectedError: false,
// expectedLinks: []string{"gemini://127.0.0.1:1965/link1", "gemini://127.0.0.1:1965/link2"},
// },
// {
// name: "Server error",
// serverResponse: "50 Server error\r\n",
// expectedCode: 50,
// expectedMime: "Server error",
// expectedError: false,
// expectedLinks: []string{},
// },
// }
// for _, test := range tests {
// t.Run(test.name, func(t *testing.T) {
// listener := mockGeminiServer(test.serverResponse, 0, false)
// defer listener.Close()
// addr := listener.Addr().String()
// snapshot, err := Visit(fmt.Sprintf("gemini://%s/", addr))
// if test.expectedError && err == nil {
// t.Errorf("Expected error, got nil")
// }
// if !test.expectedError && err != nil {
// t.Errorf("Unexpected error: %v", err)
// }
// if snapshot.ResponseCode.ValueOrZero() != int64(test.expectedCode) {
// t.Errorf("Expected code %d, got %d", test.expectedCode, snapshot.ResponseCode.ValueOrZero())
// }
// if snapshot.MimeType.ValueOrZero() != test.expectedMime {
// t.Errorf("Expected mimeType '%s', got '%s'", test.expectedMime, snapshot.MimeType.ValueOrZero())
// }
// if test.expectedLinks != nil {
// links, _ := snapshot.Links.Value()
// if len(links) != len(test.expectedLinks) {
// t.Errorf("Expected %d links, got %d", len(test.expectedLinks), len(links))
// }
// for i, link := range links {
// if link != test.expectedLinks[i] {
// t.Errorf("Expected link '%s', got '%s'", test.expectedLinks[i], link)
// }
// }
// }
// })
// }
// }
func TestVisit_InvalidURL(t *testing.T) {
t.Parallel()
_, err := Visit("invalid-url")
if err == nil {
t.Errorf("Expected error for invalid URL, got nil")
}
}
//func TestVisit_GeminiError(t *testing.T) {
// listener := mockGeminiServer("51 Not Found\r\n", 0, false)
// defer listener.Close()
// addr := listener.Addr().String()
//
// s, err := Visit(fmt.Sprintf("gemini://%s/", addr))
// if err != nil {
// t.Errorf("Unexpected error: %v", err)
// }
//
// expectedError := "51 Not Found"
// if s.Error.ValueOrZero() != expectedError {
// t.Errorf("Expected error in snapshot: %v, got %v", expectedError, s.Error)
// }
//
// expectedCode := 51
// if s.ResponseCode.ValueOrZero() != int64(expectedCode) {
// t.Errorf("Expected code %d, got %d", expectedCode, s.ResponseCode.ValueOrZero())
// }
//}

View File

@@ -7,6 +7,8 @@ import (
"io" "io"
"unicode/utf8" "unicode/utf8"
"gemini-grc/config"
"git.antanst.com/antanst/xerrors"
"golang.org/x/text/encoding/charmap" "golang.org/x/text/encoding/charmap"
"golang.org/x/text/encoding/japanese" "golang.org/x/text/encoding/japanese"
"golang.org/x/text/encoding/korean" "golang.org/x/text/encoding/korean"
@@ -22,11 +24,16 @@ func BytesToValidUTF8(input []byte) (string, error) {
if len(input) == 0 {
return "", nil
}
maxSize := config.CONFIG.MaxResponseSize
if maxSize == 0 {
maxSize = 1024 * 1024 // Default 1MB for tests
}
if len(input) > maxSize {
return "", xerrors.NewError(fmt.Errorf("BytesToValidUTF8: %w: %d bytes (max %d)", ErrInputTooLarge, len(input), maxSize), 0, "", false)
}
// Always remove NULL bytes first (before UTF-8 validity check)
inputNoNull := bytes.ReplaceAll(input, []byte{byte(0)}, []byte{})
if utf8.Valid(inputNoNull) {
return string(inputNoNull), nil
@@ -41,6 +48,8 @@ func BytesToValidUTF8(input []byte) (string, error) {
japanese.EUCJP.NewDecoder(), // Japanese
korean.EUCKR.NewDecoder(), // Korean
}
// First successful conversion wins.
var lastErr error
for _, encoding := range encodings {
@@ -55,5 +64,5 @@ func BytesToValidUTF8(input []byte) (string, error) {
}
}
return "", xerrors.NewError(fmt.Errorf("BytesToValidUTF8: %w (tried %d encodings): %w", ErrUTF8Conversion, len(encodings), lastErr), 0, "", false)
}

View File

@@ -1,95 +0,0 @@
package gemini
import (
"fmt"
"strings"
"sync"
"gemini-grc/common/snapshot"
geminiUrl "gemini-grc/common/url"
"gemini-grc/logging"
)
// RobotsCache is a map of blocked URLs
// key: URL
// value: []string list of disallowed URLs
// If a key has no blocked URLs, an empty
// list is stored for caching.
var RobotsCache sync.Map //nolint:gochecknoglobals
func populateRobotsCache(key string) (entries []string, _err error) {
// We either store an empty list when
// no rules, or a list of disallowed URLs.
// This applies even if we have an error
// finding/downloading robots.txt
defer func() {
RobotsCache.Store(key, entries)
}()
url := fmt.Sprintf("gemini://%s/robots.txt", key)
robotsContent, err := ConnectAndGetData(url)
if err != nil {
return []string{}, err
}
s, err := snapshot.SnapshotFromURL(url, true)
if err != nil {
return []string{}, nil
}
s, err = processData(*s, robotsContent)
if err != nil {
logging.LogDebug("robots.txt error %s", err)
return []string{}, nil
}
if s.ResponseCode.ValueOrZero() != 20 {
logging.LogDebug("robots.txt error code %d, ignoring", s.ResponseCode.ValueOrZero())
return []string{}, nil
}
// Some return text/plain, others text/gemini.
// According to spec, the first is correct,
// however let's be lenient
var data string
switch {
case s.MimeType.ValueOrZero() == "text/plain":
data = string(s.Data.ValueOrZero())
case s.MimeType.ValueOrZero() == "text/gemini":
data = s.GemText.ValueOrZero()
default:
return []string{}, nil
}
entries = ParseRobotsTxt(data, key)
return entries, nil
}
// RobotMatch checks if the snapshot URL matches
// a robots.txt allow rule.
func RobotMatch(u string) (bool, error) {
url, err := geminiUrl.ParseURL(u, "", true)
if err != nil {
return false, err
}
key := strings.ToLower(fmt.Sprintf("%s:%d", url.Hostname, url.Port))
var disallowedURLs []string
cacheEntries, ok := RobotsCache.Load(key)
if !ok {
// First time check, populate robot cache
disallowedURLs, err := populateRobotsCache(key)
if err != nil {
return false, err
}
if len(disallowedURLs) > 0 {
logging.LogDebug("Added to robots.txt cache: %v => %v", key, disallowedURLs)
}
} else {
disallowedURLs, _ = cacheEntries.([]string)
}
return isURLblocked(disallowedURLs, url.Full), nil
}
func isURLblocked(disallowedURLs []string, input string) bool {
for _, url := range disallowedURLs {
if strings.HasPrefix(strings.ToLower(input), url) {
logging.LogDebug("robots.txt match: %s matches %s", input, url)
return true
}
}
return false
}

View File

@@ -1,31 +0,0 @@
package gemini
import (
"fmt"
"strings"
)
// ParseRobotsTxt takes robots.txt content and a host, and
// returns a list of full URLs that shouldn't
// be visited.
// TODO Also take into account the user agent?
// Check gemini://geminiprotocol.net/docs/companion/robots.gmi
func ParseRobotsTxt(content string, host string) []string {
var disallowedPaths []string
for _, line := range strings.Split(content, "\n") {
line = strings.TrimSpace(line)
line = strings.ToLower(line)
if strings.HasPrefix(line, "disallow:") {
parts := strings.SplitN(line, ":", 2)
if len(parts) == 2 {
path := strings.TrimSpace(parts[1])
if path != "" {
// Construct full Gemini URL
disallowedPaths = append(disallowedPaths,
fmt.Sprintf("gemini://%s%s", host, path))
}
}
}
}
return disallowedPaths
}

go.mod
View File

@@ -1,15 +1,15 @@
module gemini-grc
go 1.24.3
require (
git.antanst.com/antanst/logging v0.0.1
git.antanst.com/antanst/uid v0.0.1
git.antanst.com/antanst/xerrors v0.0.2
github.com/guregu/null/v5 v5.0.0
github.com/jackc/pgx/v5 v5.7.2
github.com/jmoiron/sqlx v1.4.0
github.com/lib/pq v1.10.9
github.com/stretchr/testify v1.9.0
golang.org/x/text v0.21.0
)
@@ -20,12 +20,15 @@ require (
github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
github.com/jackc/puddle/v2 v2.2.2 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/mattn/go-colorable v0.1.14 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/rogpeppe/go-internal v1.13.1 // indirect
golang.org/x/crypto v0.32.0 // indirect
golang.org/x/sync v0.10.0 // indirect
golang.org/x/sys v0.29.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
replace git.antanst.com/antanst/xerrors => ../xerrors
replace git.antanst.com/antanst/uid => ../uid
replace git.antanst.com/antanst/logging => ../logging

go.sum
View File

@@ -1,15 +1,11 @@
filippo.io/edwards25519 v1.1.0 h1:FNf4tywRC1HmFuKW5xopWpigGjJKiJSV0Cqo0cJWDaA=
filippo.io/edwards25519 v1.1.0/go.mod h1:BxyFTGdWcka3PhytdK4V28tE5sGfRvvvRV7EaN4VDT4=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/go-sql-driver/mysql v1.8.1 h1:LedoTUt/eveggdHS9qUFC1EFSa8bU2+1pZjSRpvNJ1Y=
github.com/go-sql-driver/mysql v1.8.1/go.mod h1:wEBSXgmK//2ZFJyE+qWnIsVGmvmEKlqwuVSjsCm7DZg=
github.com/guregu/null/v5 v5.0.0 h1:PRxjqyOekS11W+w/7Vfz6jgJE/BCwELWtgvOJzddimw=
github.com/guregu/null/v5 v5.0.0/go.mod h1:SjupzNy+sCPtwQTKWhUCqjhVCO69hpsl2QsZrWHjlwU=
github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM=
@@ -28,25 +24,12 @@ github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw=
github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
github.com/mattn/go-sqlite3 v1.14.22 h1:2gZY6PC6kBnID23Tichd1K+Z0oS6nE/XwU+Vz/5o4kU=
github.com/mattn/go-sqlite3 v1.14.22/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII=
github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
@@ -56,11 +39,6 @@ golang.org/x/crypto v0.32.0 h1:euUpcYgM8WcP71gNpTqQCn6rC2t6ULUPiOzfWaXVVfc=
golang.org/x/crypto v0.32.0/go.mod h1:ZnnJkOaASj8g0AjIduWNlq2NRxL0PlBrbKVyZ6V/Ugc= golang.org/x/crypto v0.32.0/go.mod h1:ZnnJkOaASj8g0AjIduWNlq2NRxL0PlBrbKVyZ6V/Ugc=
golang.org/x/sync v0.10.0 h1:3NQrjDixjgGwUOCaF8w2+VYHv0Ve/vGYSbdkTa98gmQ= golang.org/x/sync v0.10.0 h1:3NQrjDixjgGwUOCaF8w2+VYHv0Ve/vGYSbdkTa98gmQ=
golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.12.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.29.0 h1:TPYlXGxvx1MGTn2GiZDhnjPA9wZzZeGKHHmKhHYvgaU=
golang.org/x/sys v0.29.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo= golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo=
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ= golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=


@@ -1,6 +1,6 @@
package gopher

-import "github.com/antanst/go_errors"
+import "errors"

// GopherError is an error encountered while
// visiting a Gopher host, and is only for
@@ -26,5 +26,5 @@ func IsGopherError(err error) bool {
		return false
	}
	var asError *GopherError
-	return go_errors.As(err, &asError)
+	return errors.As(err, &asError)
}


@@ -1,6 +1,7 @@
package gopher

import (
+	"errors"
	"fmt"
	"io"
	"net"
@@ -8,16 +9,11 @@ import (
	"regexp"
	"strings"
	"time"
-	"unicode/utf8"

-	errors2 "gemini-grc/common/errors"
+	commonErrors "gemini-grc/common/errors"
-	"gemini-grc/common/linkList"
-	"gemini-grc/common/snapshot"
-	_url "gemini-grc/common/url"
	"gemini-grc/config"
-	"gemini-grc/logging"
+	"git.antanst.com/antanst/logging"
-	"github.com/antanst/go_errors"
+	"git.antanst.com/antanst/xerrors"
-	"github.com/guregu/null/v5"
)

// References:
@@ -61,64 +57,10 @@
// The original Gopher protocol only specified types 0-9, `+`, `g`, `I`, and `T`.
// The others were added by various implementations and extensions over time.

-// Error methodology:
-// HostError for DNS/network errors
-// GopherError for network/gopher errors
-// NewError for other errors
-// NewFatalError for other fatal errors
-func Visit(url string) (*snapshot.Snapshot, error) {
-	s, err := snapshot.SnapshotFromURL(url, false)
-	if err != nil {
-		return nil, err
-	}
-	data, err := connectAndGetData(url)
-	if err != nil {
-		logging.LogDebug("Error: %s", err.Error())
-		if IsGopherError(err) || errors2.IsHostError(err) {
-			s.Error = null.StringFrom(err.Error())
-			return s, nil
-		}
-		return nil, err
-	}
-	isValidUTF8 := utf8.ValidString(string(data))
-	if isValidUTF8 {
-		s.GemText = null.StringFrom(removeNullChars(string(data)))
-	} else {
-		s.Data = null.ValueFrom(data)
-	}
-	if !isValidUTF8 {
-		return s, nil
-	}
-	responseError := checkForError(string(data))
-	if responseError != nil {
-		s.Error = null.StringFrom(responseError.Error())
-		return s, nil
-	}
-	links := getGopherPageLinks(string(data))
-	linkURLs := linkList.LinkList(make([]_url.URL, len(links)))
-	for i, link := range links {
-		linkURL, err := _url.ParseURL(link, "", true)
-		if err == nil {
-			linkURLs[i] = *linkURL
-		}
-	}
-	if len(links) != 0 {
-		s.Links = null.ValueFrom(linkURLs)
-	}
-	return s, nil
-}

func connectAndGetData(url string) ([]byte, error) {
	parsedURL, err := stdurl.Parse(url)
	if err != nil {
-		return nil, go_errors.NewError(err)
+		return nil, xerrors.NewError(fmt.Errorf("error parsing URL: %w", err), 0, "", false)
	}
	hostname := parsedURL.Hostname()
@@ -135,7 +77,7 @@ func connectAndGetData(url string) ([]byte, error) {
	logging.LogDebug("Dialing %s", host)
	conn, err := dialer.Dial("tcp", host)
	if err != nil {
-		return nil, errors2.NewHostError(err)
+		return nil, commonErrors.NewHostError(err)
	}
	// Make sure we always close the connection.
	defer func() {
@@ -145,11 +87,11 @@ func connectAndGetData(url string) ([]byte, error) {
	// Set read and write timeouts on the TCP connection.
	err = conn.SetReadDeadline(time.Now().Add(timeoutDuration))
	if err != nil {
-		return nil, errors2.NewHostError(err)
+		return nil, commonErrors.NewHostError(err)
	}
	err = conn.SetWriteDeadline(time.Now().Add(timeoutDuration))
	if err != nil {
-		return nil, errors2.NewHostError(err)
+		return nil, commonErrors.NewHostError(err)
	}
	// We read `buf`-sized chunks and add data to `data`.
@@ -160,7 +102,7 @@ func connectAndGetData(url string) ([]byte, error) {
	payload := constructPayloadFromPath(parsedURL.Path)
	_, err = conn.Write([]byte(fmt.Sprintf("%s\r\n", payload)))
	if err != nil {
-		return nil, errors2.NewHostError(err)
+		return nil, commonErrors.NewHostError(err)
	}
	// Read response bytes in len(buf) byte chunks
	for {
@@ -169,13 +111,13 @@ func connectAndGetData(url string) ([]byte, error) {
			data = append(data, buf[:n]...)
		}
		if err != nil {
-			if go_errors.Is(err, io.EOF) {
+			if errors.Is(err, io.EOF) {
				break
			}
-			return nil, errors2.NewHostError(err)
+			return nil, commonErrors.NewHostError(err)
		}
		if len(data) > config.CONFIG.MaxResponseSize {
-			return nil, errors2.NewHostError(fmt.Errorf("response exceeded max"))
+			return nil, commonErrors.NewHostError(fmt.Errorf("response exceeded max"))
		}
	}
	logging.LogDebug("Got %d bytes", len(data))
@@ -276,8 +218,3 @@ func getGopherPageLinks(content string) []string {
	return links
}

-func removeNullChars(input string) string {
-	// Replace all null characters with an empty string
-	return strings.ReplaceAll(input, "\u0000", "")
-}

gopher/network_context.go Normal file

@@ -0,0 +1,205 @@
package gopher
import (
"context"
"errors"
"fmt"
"io"
"net"
stdurl "net/url"
"time"
"unicode/utf8"
"gemini-grc/common/contextlog"
commonErrors "gemini-grc/common/errors"
"gemini-grc/common/linkList"
"gemini-grc/common/snapshot"
"gemini-grc/common/text"
_url "gemini-grc/common/url"
"gemini-grc/config"
"gemini-grc/contextutil"
"git.antanst.com/antanst/logging"
"git.antanst.com/antanst/xerrors"
"github.com/guregu/null/v5"
)
// VisitWithContext is a context-aware version of Visit that visits
// a given URL using the Gopher protocol. It uses the context for
// cancellation, timeout, and logging.
func VisitWithContext(ctx context.Context, url string) (*snapshot.Snapshot, error) {
// Create a gopher-specific context with the "gopher" component
gopherCtx := contextutil.ContextWithComponent(ctx, "gopher")
if !config.CONFIG.GopherEnable {
contextlog.LogDebugWithContext(gopherCtx, logging.GetSlogger(), "Gopher protocol is disabled")
return nil, nil
}
s, err := snapshot.SnapshotFromURL(url, true)
if err != nil {
contextlog.LogErrorWithContext(gopherCtx, logging.GetSlogger(), "Failed to create snapshot from URL: %v", err)
return nil, err
}
// Check if the context is canceled
if err := ctx.Err(); err != nil {
return nil, err
}
data, err := connectAndGetDataWithContext(gopherCtx, url)
if err != nil {
contextlog.LogDebugWithContext(gopherCtx, logging.GetSlogger(), "Error: %s", err.Error())
if IsGopherError(err) || commonErrors.IsHostError(err) {
s.Error = null.StringFrom(err.Error())
return s, nil
}
return nil, err
}
// Check if the context is canceled
if err := ctx.Err(); err != nil {
return nil, err
}
isValidUTF8 := utf8.ValidString(string(data))
if isValidUTF8 {
s.GemText = null.StringFrom(text.RemoveNullChars(string(data)))
contextlog.LogDebugWithContext(gopherCtx, logging.GetSlogger(), "Response is valid UTF-8 text (%d bytes)", len(data))
} else {
s.Data = null.ValueFrom(data)
contextlog.LogDebugWithContext(gopherCtx, logging.GetSlogger(), "Response is binary data (%d bytes)", len(data))
}
if !isValidUTF8 {
return s, nil
}
responseError := checkForError(string(data))
if responseError != nil {
contextlog.LogErrorWithContext(gopherCtx, logging.GetSlogger(), "Gopher server returned error: %v", responseError)
s.Error = null.StringFrom(responseError.Error())
return s, nil
}
// Extract links from the response
links := getGopherPageLinks(string(data))
linkURLs := linkList.LinkList(make([]_url.URL, len(links)))
for i, link := range links {
linkURL, err := _url.ParseURL(link, "", true)
if err == nil {
linkURLs[i] = *linkURL
}
}
if len(links) != 0 {
s.Links = null.ValueFrom(linkURLs)
contextlog.LogDebugWithContext(gopherCtx, logging.GetSlogger(), "Found %d links in gopher page", len(links))
}
contextlog.LogDebugWithContext(gopherCtx, logging.GetSlogger(), "Successfully visited Gopher URL: %s", url)
return s, nil
}
// connectAndGetDataWithContext is a context-aware version of connectAndGetData
func connectAndGetDataWithContext(ctx context.Context, url string) ([]byte, error) {
parsedURL, err := stdurl.Parse(url)
if err != nil {
return nil, xerrors.NewError(fmt.Errorf("error parsing URL: %w", err), 0, "", false)
}
hostname := parsedURL.Hostname()
port := parsedURL.Port()
if port == "" {
port = "70"
}
host := fmt.Sprintf("%s:%s", hostname, port)
// Use the context's deadline if it has one, otherwise use the config timeout
var timeoutDuration time.Duration
deadline, ok := ctx.Deadline()
if ok {
timeoutDuration = time.Until(deadline)
} else {
timeoutDuration = time.Duration(config.CONFIG.ResponseTimeout) * time.Second
}
// Check if the context is canceled
if err := ctx.Err(); err != nil {
return nil, err
}
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Dialing %s", host)
// Establish the underlying TCP connection with context-based cancellation
dialer := &net.Dialer{
Timeout: timeoutDuration,
}
// Use DialContext to allow cancellation via context
conn, err := dialer.DialContext(ctx, "tcp", host)
if err != nil {
contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Failed to connect: %v", err)
return nil, commonErrors.NewHostError(err)
}
// Make sure we always close the connection
defer func() {
_ = conn.Close()
}()
// Set read and write timeouts on the TCP connection
err = conn.SetReadDeadline(time.Now().Add(timeoutDuration))
if err != nil {
return nil, commonErrors.NewHostError(err)
}
err = conn.SetWriteDeadline(time.Now().Add(timeoutDuration))
if err != nil {
return nil, commonErrors.NewHostError(err)
}
// We read `buf`-sized chunks and add data to `data`
buf := make([]byte, 4096)
var data []byte
// Check if the context is canceled before sending request
if err := ctx.Err(); err != nil {
return nil, err
}
// Send Gopher request to trigger server response
payload := constructPayloadFromPath(parsedURL.Path)
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Sending request with payload: %s", payload)
_, err = conn.Write([]byte(fmt.Sprintf("%s\r\n", payload)))
if err != nil {
contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Failed to send request: %v", err)
return nil, commonErrors.NewHostError(err)
}
// Read response bytes in len(buf) byte chunks
for {
// Check if the context is canceled before each read
if err := ctx.Err(); err != nil {
return nil, err
}
n, err := conn.Read(buf)
if n > 0 {
data = append(data, buf[:n]...)
}
if err != nil {
if errors.Is(err, io.EOF) {
break
}
contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Error reading data: %v", err)
return nil, commonErrors.NewHostError(err)
}
if len(data) > config.CONFIG.MaxResponseSize {
contextlog.LogErrorWithContext(ctx, logging.GetSlogger(), "Response too large (max: %d bytes)", config.CONFIG.MaxResponseSize)
return nil, commonErrors.NewHostError(fmt.Errorf("response exceeded max"))
}
}
contextlog.LogDebugWithContext(ctx, logging.GetSlogger(), "Received %d bytes", len(data))
return data, nil
}
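
As a usage note: a minimal, hypothetical call site for VisitWithContext (assuming config.CONFIG has already been loaded with GopherEnable set, and that the package lives at gemini-grc/gopher). The timeout flows into DialContext, the read/write deadlines, and the per-read ctx.Err() checks above:

```
package main

import (
	"context"
	"log"
	"time"

	"gemini-grc/gopher" // assumed import path
)

func main() {
	// Bound the whole visit to 30 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	s, err := gopher.VisitWithContext(ctx, "gopher://example.org/") // placeholder URL
	if err != nil {
		// Only unexpected errors surface here; host/protocol
		// failures are recorded in the snapshot's Error field.
		log.Fatalf("visit failed: %v", err)
	}
	if s != nil && s.Error.Valid {
		log.Printf("snapshot recorded error: %s", s.Error.String)
	}
}
```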


@@ -288,7 +288,7 @@ func TestConnectAndGetDataTimeout(t *testing.T) {
	// Check if the error is due to timeout
	if err == nil {
		t.Error("Expected an error due to timeout, but got no error")
-	} else if !errors.IsHostError(err) {
+	} else if !commonErrors.IsHostError(err) {
		t.Errorf("Expected a HostError, but got: %v", err)
	} else {
		// Here you might want to check if the specific error message contains 'timeout'


@@ -1,49 +1,75 @@
package hostPool

import (
+	"context"
-	"math/rand"
	"sync"
	"time"

-	"gemini-grc/logging"
+	"gemini-grc/common/contextlog"
+	"gemini-grc/contextutil"
+	"git.antanst.com/antanst/logging"
+	"git.antanst.com/antanst/xerrors"
)

-var hostPool = HostPool{hostnames: make(map[string]struct{})} //nolint:gochecknoglobals
+var hostPool = HostPool{hostnames: make(map[string]struct{})}

type HostPool struct {
	hostnames map[string]struct{}
	lock      sync.RWMutex
}

-//func (p *HostPool) add(key string) {
-//	p.lock.Lock()
-//	defer p.lock.Unlock()
-//	p.hostnames[key] = struct{}{}
-//}
-//
-//func (p *HostPool) has(key string) bool {
-//	p.lock.RLock()
-//	defer p.lock.RUnlock()
-//	_, ok := p.hostnames[key]
-//	return ok
-//}
-func RemoveHostFromPool(key string) {
+// RemoveHostFromPool removes a host from the pool with context awareness
+func RemoveHostFromPool(ctx context.Context, key string) {
+	hostCtx := contextutil.ContextWithComponent(ctx, "hostPool")
	hostPool.lock.Lock()
+	defer hostPool.lock.Unlock()
	delete(hostPool.hostnames, key)
-	hostPool.lock.Unlock()
-	// Add some jitter
-	time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
+	contextlog.LogDebugWithContext(hostCtx, logging.GetSlogger(), "Host %s removed from pool", key)
}

-func AddHostToHostPool(key string) {
+// AddHostToHostPool adds a host to the host pool with context awareness.
+// Blocks until the host is added or the context is canceled.
+func AddHostToHostPool(ctx context.Context, key string) error {
+	// Create a hostPool-specific context
+	hostCtx := contextutil.ContextWithComponent(ctx, "hostPool")
+	// Use a ticker to periodically check if we can add the host
+	ticker := time.NewTicker(500 * time.Millisecond)
+	defer ticker.Stop()
+	// We continuously poll the pool,
+	// and if the host isn't already
+	// there, we add it.
	for {
+		// Check if context is done before attempting to acquire lock
+		select {
+		case <-ctx.Done():
+			return xerrors.NewSimpleError(ctx.Err())
+		default:
+			// Continue with attempt to add host
+		}
		hostPool.lock.Lock()
		_, exists := hostPool.hostnames[key]
		if !exists {
			hostPool.hostnames[key] = struct{}{}
			hostPool.lock.Unlock()
-			return
+			contextlog.LogDebugWithContext(hostCtx, logging.GetSlogger(), "Added host %s to pool", key)
+			return nil
		}
		hostPool.lock.Unlock()
-		time.Sleep(1 * time.Second)
-		logging.LogInfo("Waiting to add %s to pool...", key)
+		// Wait for next tick or context cancellation
+		select {
+		case <-ticker.C:
+			// Try again on next tick
+		case <-ctx.Done():
+			contextlog.LogDebugWithContext(hostCtx, logging.GetSlogger(), "Context canceled while waiting for host %s", key)
+			return xerrors.NewSimpleError(ctx.Err())
+		}
	}
}
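
A sketch of the intended call pattern (the worker code is outside this diff, so the call site and import path below are assumed):

```
package worker

import (
	"context"

	"gemini-grc/hostPool" // assumed import path
)

// crawlHost acquires the per-host slot, does its work, and releases
// the slot so other workers can connect to the same host.
func crawlHost(ctx context.Context, host string) error {
	if err := hostPool.AddHostToHostPool(ctx, host); err != nil {
		return err // context was canceled while waiting for the slot
	}
	defer hostPool.RemoveHostFromPool(ctx, host)
	// ... fetch and snapshot URLs belonging to this host ...
	return nil
}
```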


@@ -1,23 +0,0 @@
package logging
import (
"fmt"
zlog "github.com/rs/zerolog/log"
)
func LogDebug(format string, args ...interface{}) {
zlog.Debug().Msg(fmt.Sprintf(format, args...))
}
func LogInfo(format string, args ...interface{}) {
zlog.Info().Msg(fmt.Sprintf(format, args...))
}
func LogWarn(format string, args ...interface{}) {
zlog.Warn().Msg(fmt.Sprintf(format, args...))
}
func LogError(format string, args ...interface{}) {
zlog.Error().Err(fmt.Errorf(format, args...)).Msg("")
}

main.go

@@ -1,82 +0,0 @@
package main
import (
"fmt"
"os"
"os/signal"
"syscall"
"gemini-grc/common"
"gemini-grc/common/blackList"
"gemini-grc/config"
"gemini-grc/db"
"gemini-grc/logging"
"github.com/antanst/go_errors"
"github.com/jmoiron/sqlx"
"github.com/rs/zerolog"
zlog "github.com/rs/zerolog/log"
)
func main() {
config.CONFIG = *config.GetConfig()
zerolog.TimeFieldFormat = zerolog.TimeFormatUnix
zerolog.SetGlobalLevel(config.CONFIG.LogLevel)
zlog.Logger = zlog.Output(zerolog.ConsoleWriter{Out: os.Stderr, TimeFormat: "[2006-01-02 15:04:05]"})
err := runApp()
if err != nil {
var asErr *go_errors.Error
if go_errors.As(err, &asErr) {
logging.LogError("Unexpected error: %v", err)
_, _ = fmt.Fprintf(os.Stderr, "Unexpected error: %v", err)
} else {
logging.LogError("Unexpected error: %v", err)
}
os.Exit(1)
}
}
func runApp() (err error) {
logging.LogInfo("gemcrawl %s starting up. Press Ctrl+C to exit", common.VERSION)
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
_db, err := db.ConnectToDB()
if err != nil {
return err
}
defer func(db *sqlx.DB) {
_ = db.Close()
}(_db)
err = blackList.LoadBlacklist()
if err != nil {
return err
}
common.StatusChan = make(chan common.WorkerStatus, config.CONFIG.NumOfWorkers)
common.ErrorsChan = make(chan error, config.CONFIG.NumOfWorkers)
// If there's an argument, visit this
// URL only and don't spawn other workers
if len(os.Args) > 1 {
url := os.Args[1]
err = common.CrawlOneURL(_db, &url)
return err
}
go common.SpawnWorkers(config.CONFIG.NumOfWorkers, _db)
for {
select {
case <-signals:
logging.LogWarn("Received SIGINT or SIGTERM signal, exiting")
return nil
case err := <-common.ErrorsChan:
if go_errors.IsFatal(err) {
return err
}
logging.LogError("%s", fmt.Sprintf("%v", err))
}
}
}

misc/sql/README.md Normal file

@@ -0,0 +1,28 @@
# SQL Queries for Snapshot Analysis
This directory contains SQL queries to analyze snapshot data in the gemini-grc database.
## Usage
You can run these queries directly from psql using the `\i` directive:
```
\i misc/sql/snapshots_per_url.sql
```
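
The queries can also be run non-interactively from a shell; for example, assuming the database is named `gemini`:

```
psql -d gemini -f misc/sql/snapshots_per_url.sql
```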
## Available Queries
- **snapshots_per_url.sql** - Basic count of snapshots per URL
- **snapshots_date_range.sql** - Shows snapshot count with date range information for each URL
- **host_snapshot_stats.sql** - Groups snapshots by hosts and shows URLs with multiple snapshots
- **content_changes.sql** - Finds URLs with the most content changes between consecutive snapshots
- **snapshot_distribution.sql** - Shows the distribution of snapshots per URL (how many URLs have 1, 2, 3, etc. snapshots)
- **recent_snapshot_activity.sql** - Shows URLs with most snapshots in the last 7 days
- **storage_efficiency.sql** - Shows potential storage savings from deduplication
- **snapshots_by_timeframe.sql** - Shows snapshot count by timeframe (day, week, month)
## Notes
- These queries are designed to work with PostgreSQL and the gemini-grc database schema
- Some queries may be resource-intensive on large databases
- The results can help optimize storage and understand the effectiveness of the versioned snapshot feature


@@ -0,0 +1,19 @@
WITH snapshot_rankings AS (
SELECT
id,
url,
ROW_NUMBER() OVER (
PARTITION BY url
ORDER BY
CASE WHEN (gemtext IS NOT NULL AND gemtext != '') OR data IS NOT NULL
THEN 0 ELSE 1 END,
timestamp DESC
) as rn
FROM snapshots
)
DELETE FROM snapshots
WHERE id IN (
SELECT id
FROM snapshot_rankings
WHERE rn > 1
);
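
Since this DELETE is irreversible, it may be worth previewing the rows it would remove first; a sketch reusing the same ranking CTE:

```
WITH snapshot_rankings AS (
    SELECT
        id,
        url,
        ROW_NUMBER() OVER (
            PARTITION BY url
            ORDER BY
                CASE WHEN (gemtext IS NOT NULL AND gemtext != '') OR data IS NOT NULL
                THEN 0 ELSE 1 END,
                timestamp DESC
        ) as rn
    FROM snapshots
)
SELECT id, url
FROM snapshot_rankings
WHERE rn > 1
LIMIT 100;
```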


@@ -0,0 +1,26 @@
-- File: content_changes.sql
-- Finds URLs with the most content changes between consecutive snapshots
-- Usage: \i misc/sql/content_changes.sql
WITH snapshot_changes AS (
SELECT
s1.url,
s1.timestamp as prev_timestamp,
s2.timestamp as next_timestamp,
s1.gemtext IS DISTINCT FROM s2.gemtext as gemtext_changed,
s1.data IS DISTINCT FROM s2.data as data_changed
FROM snapshots s1
JOIN snapshots s2 ON s1.url = s2.url AND s1.timestamp < s2.timestamp
WHERE NOT EXISTS (
SELECT 1 FROM snapshots s3
WHERE s3.url = s1.url AND s1.timestamp < s3.timestamp AND s3.timestamp < s2.timestamp
)
)
SELECT
url,
COUNT(*) + 1 as snapshot_count,
SUM(CASE WHEN gemtext_changed OR data_changed THEN 1 ELSE 0 END) as content_changes
FROM snapshot_changes
GROUP BY url
HAVING COUNT(*) + 1 > 1
ORDER BY content_changes DESC, snapshot_count DESC;


@@ -0,0 +1,30 @@
BEGIN;
WITH matching_urls AS (
SELECT url, host
FROM snapshots
WHERE url ~ '^gemini://[^/]+/$'
AND timestamp < (NOW() - INTERVAL '1 week')
ORDER BY random()
LIMIT 500
)
INSERT INTO urls (url, host)
SELECT url, host
FROM matching_urls
ON CONFLICT DO NOTHING;
-- WITH matching_urls AS (
-- SELECT url, host
-- FROM snapshots
-- WHERE url ~ '^gemini://[^/]+/$'
-- AND timestamp < (NOW() - INTERVAL '1 week')
-- ORDER BY random()
-- LIMIT 500
-- )
-- DELETE FROM snapshots
-- WHERE url IN (
-- SELECT url
-- FROM matching_urls
-- );
COMMIT;


@@ -0,0 +1,20 @@
-- File: host_snapshot_stats.sql
-- Groups snapshots by hosts and shows URLs with multiple snapshots
-- Usage: \i misc/sql/host_snapshot_stats.sql
SELECT
host,
COUNT(DISTINCT url) as unique_urls,
SUM(CASE WHEN url_count > 1 THEN 1 ELSE 0 END) as urls_with_multiple_snapshots,
SUM(snapshot_count) as total_snapshots
FROM (
SELECT
host,
url,
COUNT(*) as snapshot_count,
COUNT(*) OVER (PARTITION BY url) as url_count
FROM snapshots
GROUP BY host, url
) subquery
GROUP BY host
ORDER BY total_snapshots DESC;

misc/sql/initdb.sql Normal file

@@ -0,0 +1,46 @@
DROP TABLE IF EXISTS snapshots;
DROP TABLE IF EXISTS urls;
CREATE TABLE urls (
id SERIAL PRIMARY KEY,
url TEXT NOT NULL,
host TEXT NOT NULL,
timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
being_processed BOOLEAN
);
CREATE UNIQUE INDEX urls_url_key ON urls (url);
CREATE INDEX idx_urls_url ON urls (url);
CREATE INDEX idx_urls_timestamp ON urls (timestamp);
CREATE INDEX idx_being_processed ON urls (being_processed);
CREATE TABLE snapshots (
id SERIAL PRIMARY KEY,
url TEXT NOT NULL,
host TEXT NOT NULL,
timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
mimetype TEXT,
data BYTEA,
gemtext TEXT,
links JSONB,
lang TEXT,
response_code INTEGER,
error TEXT,
header TEXT,
last_crawled TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE UNIQUE INDEX idx_url_timestamp ON snapshots (url, timestamp);
CREATE INDEX idx_url ON snapshots (url);
CREATE INDEX idx_timestamp ON snapshots (timestamp);
CREATE INDEX idx_mimetype ON snapshots (mimetype);
CREATE INDEX idx_lang ON snapshots (lang);
CREATE INDEX idx_response_code ON snapshots (response_code);
CREATE INDEX idx_error ON snapshots (error);
CREATE INDEX idx_host ON snapshots (host);
CREATE INDEX idx_response_code_error ON snapshots (response_code, error);
CREATE INDEX idx_response_code_error_nulls ON snapshots (response_code, error) WHERE response_code IS NULL AND error IS NULL;
CREATE INDEX idx_snapshots_unprocessed ON snapshots (host) WHERE response_code IS NULL AND error IS NULL;
CREATE INDEX idx_url_latest ON snapshots (url, timestamp DESC);
CREATE INDEX idx_last_crawled ON snapshots (last_crawled);
CREATE INDEX idx_url_last_crawled ON snapshots (url, last_crawled DESC);
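
The partial indexes near the end target the crawler's "unprocessed work" lookups; a hypothetical query of the shape idx_snapshots_unprocessed is built to serve:

```
SELECT id, url
FROM snapshots
WHERE response_code IS NULL
  AND error IS NULL
  AND host = 'example.com'
LIMIT 10;
```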


@@ -0,0 +1 @@
update urls set being_processed=false where being_processed is true;


@@ -0,0 +1,13 @@
-- File: recent_snapshot_activity.sql
-- Shows URLs with most snapshots in the last 7 days
-- Usage: \i misc/sql/recent_snapshot_activity.sql
SELECT
url,
COUNT(*) as snapshot_count
FROM snapshots
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY url
HAVING COUNT(*) > 1
ORDER BY snapshot_count DESC
LIMIT 20;


@@ -0,0 +1,16 @@
-- File: snapshot_distribution.sql
-- Shows the distribution of snapshots per URL (how many URLs have 1, 2, 3, etc. snapshots)
-- Usage: \i misc/sql/snapshot_distribution.sql
WITH counts AS (
SELECT url, COUNT(*) as snapshot_count
FROM snapshots
GROUP BY url
)
SELECT
snapshot_count,
COUNT(*) as url_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as percentage
FROM counts
GROUP BY snapshot_count
ORDER BY snapshot_count;


@@ -0,0 +1,37 @@
-- File: snapshots_by_timeframe.sql
-- Shows snapshot count by timeframe (day, week, month)
-- Usage: \i misc/sql/snapshots_by_timeframe.sql
WITH daily_snapshots AS (
SELECT
date_trunc('day', timestamp) as day,
COUNT(*) as snapshot_count,
COUNT(DISTINCT url) as unique_urls
FROM snapshots
GROUP BY day
ORDER BY day
),
weekly_snapshots AS (
SELECT
date_trunc('week', timestamp) as week,
COUNT(*) as snapshot_count,
COUNT(DISTINCT url) as unique_urls
FROM snapshots
GROUP BY week
ORDER BY week
),
monthly_snapshots AS (
SELECT
date_trunc('month', timestamp) as month,
COUNT(*) as snapshot_count,
COUNT(DISTINCT url) as unique_urls
FROM snapshots
GROUP BY month
ORDER BY month
)
SELECT 'Daily' as timeframe, * FROM daily_snapshots
UNION ALL
SELECT 'Weekly' as timeframe, * FROM weekly_snapshots
UNION ALL
SELECT 'Monthly' as timeframe, * FROM monthly_snapshots
ORDER BY timeframe, day;


@@ -0,0 +1,14 @@
-- File: snapshots_date_range.sql
-- Shows snapshot count with date range information for each URL
-- Usage: \i misc/sql/snapshots_date_range.sql
SELECT
url,
COUNT(*) as snapshot_count,
MIN(timestamp) as first_snapshot,
MAX(timestamp) as last_snapshot,
MAX(timestamp) - MIN(timestamp) as time_span
FROM snapshots
GROUP BY url
HAVING COUNT(*) > 1
ORDER BY snapshot_count DESC;


@@ -0,0 +1,8 @@
-- File: snapshots_per_url.sql
-- Basic count of snapshots per URL
-- Usage: \i misc/sql/snapshots_per_url.sql
SELECT url, COUNT(*) as snapshot_count
FROM snapshots
GROUP BY url
ORDER BY snapshot_count DESC;


@@ -0,0 +1,20 @@
-- File: storage_efficiency.sql
-- Shows potential storage savings from deduplication
-- Usage: \i misc/sql/storage_efficiency.sql
WITH duplicate_stats AS (
SELECT
url,
COUNT(*) as snapshot_count,
COUNT(DISTINCT gemtext) as unique_gemtexts,
COUNT(DISTINCT data) as unique_datas
FROM snapshots
GROUP BY url
HAVING COUNT(*) > 1
)
SELECT
SUM(snapshot_count) as total_snapshots,
SUM(unique_gemtexts + unique_datas) as unique_contents,
SUM(snapshot_count) - SUM(unique_gemtexts + unique_datas) as duplicate_content_count,
ROUND((SUM(snapshot_count) - SUM(unique_gemtexts + unique_datas)) * 100.0 / SUM(snapshot_count), 2) as duplicate_percentage
FROM duplicate_stats;

robotsMatch/robots.go Normal file

@@ -0,0 +1,73 @@
package robotsMatch
import (
"context"
"fmt"
"strings"
"gemini-grc/common/contextlog"
"gemini-grc/contextutil"
"git.antanst.com/antanst/logging"
)
// ParseRobotsTxt takes robots.txt content and a host, and
// returns a list of full URLs that shouldn't be visited.
// This is the legacy version without context support.
// TODO Also take into account the user agent?
// Check gemini://geminiprotocol.net/docs/companion/robots.gmi
func ParseRobotsTxt(content string, host string) []string {
// Call the context-aware version with a background context
return ParseRobotsTxtWithContext(context.Background(), content, host)
}
// ParseRobotsTxtWithContext takes robots.txt content and a host, and
// returns a list of full URLs that shouldn't be visited.
// This version supports context for logging.
// TODO Also take into account the user agent?
// Check gemini://geminiprotocol.net/docs/companion/robots.gmi
func ParseRobotsTxtWithContext(ctx context.Context, content string, host string) []string {
// Create a context for robots.txt parsing
parseCtx := contextutil.ContextWithComponent(ctx, "robotsMatch.parser")
var disallowedPaths []string
for _, line := range strings.Split(content, "\n") {
line = strings.TrimSpace(line)
line = strings.ToLower(line)
if strings.HasPrefix(line, "disallow:") {
parts := strings.SplitN(line, ":", 2)
if len(parts) == 2 {
path := strings.TrimSpace(parts[1])
if path != "" {
// Construct full Gemini URL
var fullURL string
// Handle if the path is already a full URL
if strings.HasPrefix(path, "gemini://") {
// Extract just the path from the full URL
urlParts := strings.SplitN(path, "/", 4)
if len(urlParts) >= 4 {
// Get the path part (everything after the domain)
pathPart := "/" + urlParts[3]
fullURL = fmt.Sprintf("gemini://%s%s", host, pathPart)
} else {
// If it's just a domain without a path, skip it or use root path
fullURL = fmt.Sprintf("gemini://%s/", host)
}
} else {
// It's a relative path, just add it to the host
if !strings.HasPrefix(path, "/") {
path = "/" + path
}
fullURL = fmt.Sprintf("gemini://%s%s", host, path)
}
disallowedPaths = append(disallowedPaths, fullURL)
// Add additional logging to debug robots.txt parsing
contextlog.LogDebugWithContext(parseCtx, logging.GetSlogger(), "Added robots.txt disallow rule: %s from original: %s", fullURL, path)
}
}
}
}
return disallowedPaths
}
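
A small sketch of the expected mapping, with hypothetical robots.txt input (both relative paths and full gemini:// URLs are normalized against the given host):

```
package main

import (
	"fmt"

	"gemini-grc/robotsMatch" // assumed import path
)

func main() {
	content := "User-agent: *\nDisallow: /cgi-bin\nDisallow: gemini://example.org/private\n"
	rules := robotsMatch.ParseRobotsTxt(content, "example.org")
	fmt.Println(rules)
	// Expected: [gemini://example.org/cgi-bin gemini://example.org/private]
}
```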

robotsMatch/robotsMatch.go Normal file

@@ -0,0 +1,161 @@
package robotsMatch
import (
"context"
"errors"
"fmt"
"strings"
"sync"
"gemini-grc/common/contextlog"
"gemini-grc/common/snapshot"
geminiUrl "gemini-grc/common/url"
"gemini-grc/config"
"gemini-grc/contextutil"
"gemini-grc/gemini"
"git.antanst.com/antanst/logging"
)
// RobotsCache caches robots.txt rules per host.
// key: "hostname:port" string
// value: []string of disallowed URL prefixes
// If a host has no blocked URLs, an empty
// list is stored for caching.
var RobotsCache sync.Map //nolint:gochecknoglobals
func populateRobotsCache(ctx context.Context, key string) (entries []string, _err error) {
// Create a context for robots cache population
cacheCtx := contextutil.ContextWithComponent(ctx, "robotsCache")
// We either store an empty list when
// no rules, or a list of disallowed URLs.
// This applies even if we have an error
// finding/downloading robots.txt
defer func() {
RobotsCache.Store(key, entries)
}()
url := fmt.Sprintf("gemini://%s/robots.txt", key)
contextlog.LogDebugWithContext(cacheCtx, logging.GetSlogger(), "Fetching robots.txt from %s", url)
// Use the context-aware version to honor timeout and cancellation
robotsContent, err := gemini.ConnectAndGetData(cacheCtx, url)
if err != nil {
// Check for context timeout or cancellation specifically
if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) {
contextlog.LogDebugWithContext(cacheCtx, logging.GetSlogger(), "Timeout or cancellation while fetching robots.txt: %v", err)
// Don't cache the result on timeout, to allow retrying later
return []string{}, err
}
// For other errors, we store an empty list for this host
// to avoid continually hitting it
contextlog.LogDebugWithContext(cacheCtx, logging.GetSlogger(), "Failed to get robots.txt: %v", err)
RobotsCache.Store(key, []string{})
return []string{}, err
}
s, err := snapshot.SnapshotFromURL(url, true)
if err != nil {
contextlog.LogDebugWithContext(cacheCtx, logging.GetSlogger(), "Failed to create snapshot from URL: %v", err)
return []string{}, nil
}
s = gemini.UpdateSnapshotWithData(*s, robotsContent)
if s.ResponseCode.ValueOrZero() != 20 {
contextlog.LogDebugWithContext(cacheCtx, logging.GetSlogger(), "robots.txt error code %d, ignoring", s.ResponseCode.ValueOrZero())
return []string{}, nil
}
// Some return text/plain, others text/gemini.
// According to spec, the first is correct,
// however let's be lenient
var data string
switch {
case s.MimeType.ValueOrZero() == "text/plain":
data = string(s.Data.ValueOrZero())
case s.MimeType.ValueOrZero() == "text/gemini":
data = s.GemText.ValueOrZero()
default:
contextlog.LogDebugWithContext(cacheCtx, logging.GetSlogger(), "Unsupported mime type: %s", s.MimeType.ValueOrZero())
return []string{}, nil
}
entries = ParseRobotsTxtWithContext(ctx, data, key)
return entries, nil
}
// RobotMatch reports whether the given URL matches
// a robots.txt disallow rule, i.e. whether it should be skipped.
func RobotMatch(ctx context.Context, u string) bool {
// Create a context for robots operations
robotsCtx := contextutil.ContextWithComponent(ctx, "robotsMatch")
// TODO Missing Gopher functionality
if config.CONFIG.GopherEnable {
return false
}
url, err := geminiUrl.ParseURL(u, "", true)
if err != nil {
return false
}
key := strings.ToLower(fmt.Sprintf("%s:%d", url.Hostname, url.Port))
contextlog.LogDebugWithContext(robotsCtx, logging.GetSlogger(), "Checking robots.txt for URL: %s with host key: %s", u, key)
var disallowedURLs []string
cacheEntries, ok := RobotsCache.Load(key)
if !ok {
// First time check, populate robot cache
contextlog.LogDebugWithContext(robotsCtx, logging.GetSlogger(), "No robots.txt cache for %s, fetching...", key)
var fetchErr error
disallowedURLs, fetchErr = populateRobotsCache(ctx, key)
if fetchErr != nil {
return false
}
if len(disallowedURLs) > 0 {
contextlog.LogDebugWithContext(robotsCtx, logging.GetSlogger(), "Added to robots.txt cache: %v => %v", key, disallowedURLs)
} else {
contextlog.LogDebugWithContext(robotsCtx, logging.GetSlogger(), "No disallowed paths found in robots.txt for %s", key)
}
} else {
var ok bool
disallowedURLs, ok = cacheEntries.([]string)
if !ok {
contextlog.LogErrorWithContext(robotsCtx, logging.GetSlogger(), "Invalid type in robots.txt cache for %s", key)
disallowedURLs = []string{} // Use empty list as fallback
}
contextlog.LogDebugWithContext(robotsCtx, logging.GetSlogger(), "Found %d disallowed paths in robots.txt cache for %s", len(disallowedURLs), key)
}
return isURLblocked(ctx, disallowedURLs, url.Full)
}
// Initialize initializes the robots.txt match package
func Initialize() error {
logging.LogDebug("Initializing robotsMatch package")
return nil
}
// Shutdown cleans up the robots.txt match package
func Shutdown() error {
logging.LogDebug("Shutting down robotsMatch package")
return nil
}
func isURLblocked(ctx context.Context, disallowedURLs []string, input string) bool {
// Create a context for URL blocking checks
blockCtx := contextutil.ContextWithComponent(ctx, "robotsMatch.isURLblocked")
inputLower := strings.ToLower(input)
for _, url := range disallowedURLs {
urlLower := strings.ToLower(url)
if strings.HasPrefix(inputLower, urlLower) {
contextlog.LogDebugWithContext(blockCtx, logging.GetSlogger(), "MATCH! robots.txt rule: %s blocks URL: %s", url, input)
return true
}
}
contextlog.LogDebugWithContext(blockCtx, logging.GetSlogger(), "No robots.txt rules matched URL: %s", input)
return false
}
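
For completeness, a hypothetical call site: the first RobotMatch call for a host fetches and caches its robots.txt via populateRobotsCache, and later calls for the same host are served from RobotsCache:

```
package main

import (
	"context"
	"fmt"

	"gemini-grc/robotsMatch" // assumed import path
)

func main() {
	ctx := context.Background()
	if robotsMatch.RobotMatch(ctx, "gemini://example.org/private/page.gmi") {
		fmt.Println("URL is disallowed by robots.txt; skipping")
	}
}
```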


@@ -0,0 +1,40 @@
package robotsMatch
import (
"context"
"sync"
"testing"
"gemini-grc/config"
)
func TestInitializeShutdown(t *testing.T) {
err := Initialize()
if err != nil {
t.Errorf("Initialize() failed: %v", err)
}
err = Shutdown()
if err != nil {
t.Errorf("Shutdown() failed: %v", err)
}
}
func TestRobotMatch_EmptyCache(t *testing.T) {
// This test doesn't actually connect to gemini URLs due to the complexity
// of mocking the gemini client, but tests the caching behavior when no
// robots.txt is found (empty cache case)
config.CONFIG.ResponseTimeout = 5
// Clear the cache before testing
RobotsCache = sync.Map{}
// For empty cache or DNS errors, RobotMatch should return false (allow the URL) without an error
ctx := context.Background()
blocked := RobotMatch(ctx, "gemini://nonexistent.example.com/")
// The URL should be allowed (not blocked) when robots.txt can't be fetched
if blocked {
t.Errorf("Expected URL to be allowed when robots.txt can't be fetched")
}
}


@@ -1,6 +1,7 @@
-package gemini
+package robotsMatch

import (
+	"context"
	"reflect"
	"testing"
)
@@ -44,12 +45,13 @@ func TestIsURLblocked(t *testing.T) {
		"gemini://example.com/cgi-bin/wp.cgi/media",
		"gemini://example.com/admin/",
	}
+	ctx := context.Background()
	url := "gemini://example.com/admin/index.html"
-	if !isURLblocked(disallowedURLs, url) {
+	if !isURLblocked(ctx, disallowedURLs, url) {
		t.Errorf("Expected %s to be blocked", url)
	}
	url = "gemini://example1.com/admin/index.html"
-	if isURLblocked(disallowedURLs, url) {
+	if isURLblocked(ctx, disallowedURLs, url) {
		t.Errorf("expected %s to not be blocked", url)
	}
}

seed_urls.txt Normal file

@@ -0,0 +1,10 @@
gemini://geminiprotocol.net/
gemini://warmedal.se/~antenna/
gemini://skyjake.fi/~Cosmos/
gemini://gemini.circumlunar.space/capcom/
gemini://auragem.letz.dev/
gemini://gemplex.space/
gemini://kennedy.gemi.dev/
gemini://tlgs.one/
gemini://yesterday.gemlog.org/
gemini://gemini.cyberbot.space/feed.gmi

test.txt Normal file

@@ -0,0 +1,22 @@
# Test redirect full url:
gemini://gemini.circumlunar.space
# Test blacklist:
gemi.dev
# Test robots disallow:
gemini://tlgs.one/search?aa
# Test TLS cert required:
gemini://astrobotany.mozz.us/app/plant
// 31 redirect
gemini://gemini.circumlunar.space
// body with null byte
gemini://kennedy.gemi.dev/archive/cached?url=gemini://spam.works/mirrors/textfiles/fun/consult.how&t=638427244900000000&raw=False
// has invalid url
gemini://tlgs.one/known-hosts
// Needs SNI TLS info (our bug)
gemini://hanzbrix.pollux.casa/gemlog/20241002.gmi


@@ -1,14 +0,0 @@
package uid
import (
nanoid "github.com/matoous/go-nanoid/v2"
)
func UID() string {
// No 'o','O' and 'l'
id, err := nanoid.Generate("abcdefghijkmnpqrstuvwxyzABCDEFGHIJKLMNPQRSTUVWXYZ0123456789", 20)
if err != nil {
panic(err)
}
return id
}


@@ -6,14 +6,8 @@ import (
	"fmt"
	"math/big"
	"regexp"
-	"runtime/debug"
)

-func PrintStackAndPanic(err error) {
-	fmt.Printf("PANIC Error %s Stack trace:\n%s", err, debug.Stack())
-	panic("PANIC")
-}

// SecureRandomInt returns a cryptographically secure random integer in the range [0,max).
// Panics if max <= 0 or if there's an error reading from the system's secure
// random number generator.
@@ -24,14 +18,14 @@ func SecureRandomInt(max int) int {
	// Generate random number
	n, err := rand.Int(rand.Reader, maxBig)
	if err != nil {
-		PrintStackAndPanic(fmt.Errorf("could not generate a random integer between 0 and %d", max))
+		panic(fmt.Errorf("could not generate a random integer between 0 and %d", max))
	}
	// Convert back to int
	return int(n.Int64())
}

-func PrettyJson(data string) string {
+func PrettifyJson(data string) string {
	marshalled, _ := json.MarshalIndent(data, "", " ")
	return fmt.Sprintf("%s\n", marshalled)
}
@@ -42,3 +36,27 @@ func GetLinesMatchingRegex(input string, pattern string) []string {
	matches := re.FindAllString(input, -1)
	return matches
}
// Filter applies a predicate function to each element in a slice and returns a new slice
// containing only the elements for which the predicate returns true.
// Type parameter T allows this function to work with slices of any type.
func Filter[T any](slice []T, f func(T) bool) []T {
filtered := make([]T, 0)
for _, v := range slice {
if f(v) {
filtered = append(filtered, v)
}
}
return filtered
}
// Map applies a function to each element in a slice and returns a new slice
// containing the results.
// Type parameters T and R allow this function to work with different input and output types.
func Map[T any, R any](slice []T, f func(T) R) []R {
result := make([]R, len(slice))
for i, v := range slice {
result[i] = f(v)
}
return result
}
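
A brief usage sketch of the two new generic helpers (an Example-style function in the same package, assuming fmt is imported):

```
func ExampleFilterMap() {
	evens := Filter([]int{1, 2, 3, 4, 5}, func(n int) bool { return n%2 == 0 })
	squares := Map(evens, func(n int) int { return n * n })
	fmt.Println(evens, squares)
	// Output: [2 4] [4 16]
}
```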