12 Commits

Author SHA1 Message Date
antanst
acbac15c20 Improve crawler performance and worker coordination
- Add WaitGroup synchronization for workers to prevent overlapping scheduler runs
- Increase history fetch multiplier and sleep intervals for better resource usage
- Simplify error handling and logging in worker processing
- Update SQL query to exclude error snapshots from history selection
- Fix worker ID variable reference in spawning loop
- Streamline snapshot update logic and error reporting
2025-06-29 22:38:38 +03:00
antanst
55bb0d96d0 Update last_crawled timestamp when skipping duplicate content and improve error handling 2025-06-29 22:38:38 +03:00
antanst
349968d019 Improve error handling and add duplicate snapshot cleanup 2025-06-29 22:38:38 +03:00
antanst
2357135d5a Fix snapshot overwrite logic to preserve successful responses
- Prevent overwriting snapshots that have valid response codes
- Ensure URL is removed from queue when snapshot update is skipped
- Add last_crawled timestamp tracking for better crawl scheduling
- Remove SkipIdenticalContent flag, simplify content deduplication logic
- Update database schema with last_crawled column and indexes
2025-06-29 22:38:38 +03:00
antanst
98d3ed6707 Fix infinite recrawl loop with skip-identical-content
Add last_crawled timestamp tracking to fix fetchSnapshotsFromHistory()
infinite loop when SkipIdenticalContent=true. Now tracks actual crawl
attempts separately from content changes via database DEFAULT timestamps.
2025-06-29 22:38:38 +03:00
antanst
8588414b14 Enhance crawler with seed list and SQL utilities
Add seedList module for URL initialization, comprehensive SQL utilities for database analysis, and update project configuration.
2025-06-29 22:38:38 +03:00
c54c093a10 Implement context-aware database operations
- Add context support to database operations
- Implement versioned snapshots for URL history
- Update database queries to support URL timestamps
- Improve transaction handling with context
- Add utility functions for snapshot history
2025-06-29 22:38:38 +03:00
4ef3f70f1f Implement structured logging with slog
- Replace zerolog with Go's standard slog package
- Add ColorHandler for terminal color output
- Add context-aware logging system
- Format attributes on the same line as log messages
- Use green color for INFO level logs
- Set up context value extraction helpers
2025-06-29 22:38:38 +03:00
5b84960c5a Use go_errors library everywhere. 2025-02-26 13:31:46 +02:00
ca008b0796 Reorganize code for more granular imports 2025-02-26 10:34:46 +02:00
ccb8f6838e Update DB init instructions & README 2025-01-04 15:39:21 +02:00
4e6fad873b Break up common functions and small refactor. 2025-01-04 15:31:26 +02:00