Commit Graph

18 Commits

Author SHA1 Message Date
antanst
aa2658e61e Fix snapshot overwrite logic to preserve successful responses
- Prevent overwriting snapshots that have valid response codes
- Ensure URL is removed from queue when snapshot update is skipped
- Add last_crawled timestamp tracking for better crawl scheduling
- Remove SkipIdenticalContent flag, simplify content deduplication logic
- Update database schema with last_crawled column and indexes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-18 11:23:56 +03:00
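
The overwrite rule described in commit aa2658e61e can be sketched roughly as below. This is a minimal illustration, not the repository's code: the `Snapshot` type, the `shouldOverwrite` and `process` helpers, the in-memory map standing in for the database, and the convention that a response code of 0 means "no valid response" are all assumptions made for the example.

```go
package main

import "fmt"

// Snapshot is a hypothetical, simplified record of one crawl result.
type Snapshot struct {
	URL          string
	ResponseCode int // assumption: 0 means no valid response was recorded
}

// shouldOverwrite reports whether a fresh snapshot may replace the stored one.
// A stored snapshot with a valid response code is kept when the fresh one failed.
func shouldOverwrite(existing *Snapshot, fresh Snapshot) bool {
	if existing == nil {
		return true
	}
	if existing.ResponseCode != 0 && fresh.ResponseCode == 0 {
		return false
	}
	return true
}

// process saves the fresh snapshot when allowed and always removes the URL
// from the queue, including when the update is skipped.
func process(store map[string]Snapshot, queue map[string]bool, fresh Snapshot) bool {
	var existing *Snapshot
	if s, ok := store[fresh.URL]; ok {
		existing = &s
	}
	saved := shouldOverwrite(existing, fresh)
	if saved {
		store[fresh.URL] = fresh
	}
	delete(queue, fresh.URL)
	return saved
}

func main() {
	store := map[string]Snapshot{"u1": {URL: "u1", ResponseCode: 20}}
	queue := map[string]bool{"u1": true}
	saved := process(store, queue, Snapshot{URL: "u1", ResponseCode: 0})
	fmt.Println("saved:", saved, "still queued:", queue["u1"])
}
```

The point of the sketch is that dequeuing happens on both paths, so a skipped update does not leave the URL waiting in the queue.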
antanst
4e225ee866 Fix infinite recrawl loop with skip-identical-content
Add last_crawled timestamp tracking to fix fetchSnapshotsFromHistory()
infinite loop when SkipIdenticalContent=true. Now tracks actual crawl
attempts separately from content changes via database DEFAULT timestamps.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-17 10:41:17 +03:00
antanst
f9024d15aa Refine content deduplication and improve configuration 2025-06-16 17:09:26 +03:00
antanst
330b596497 Enhance crawler with seed list and SQL utilities
Add seedList module for URL initialization, comprehensive SQL utilities for database analysis, and update project configuration.
2025-06-16 12:29:33 +03:00
bfaa857fae Update and refactor core functionality
- Update common package utilities
- Refactor network code for better error handling
- Remove deprecated files and functionality
- Enhance blacklist and filtering capabilities
- Improve snapshot handling and processing
2025-05-22 12:47:01 +03:00
7d27e5a123 Add whitelist functionality
- Implement whitelist package for filtering URLs
- Support pattern matching for allowed URLs
- Add URL validation against whitelist patterns
- Include test cases for whitelist functionality
2025-05-22 12:46:28 +03:00
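
As an illustration of the URL validation that commit 7d27e5a123 describes, here is a minimal sketch of a pattern-based whitelist. It assumes the patterns are regular expressions, which the commit message does not state. The `Whitelist` type, `NewWhitelist`, `Allows`, and the example URLs are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Whitelist holds compiled patterns; a URL is allowed if any pattern matches.
type Whitelist struct {
	patterns []*regexp.Regexp
}

// NewWhitelist compiles each expression and fails on the first invalid one.
func NewWhitelist(exprs []string) (*Whitelist, error) {
	w := &Whitelist{}
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("invalid whitelist pattern %q: %w", e, err)
		}
		w.patterns = append(w.patterns, re)
	}
	return w, nil
}

// Allows reports whether url matches at least one whitelist pattern.
// In this sketch an empty whitelist allows nothing.
func (w *Whitelist) Allows(url string) bool {
	for _, re := range w.patterns {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	w, err := NewWhitelist([]string{`^https://example\.org/`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(w.Allows("https://example.org/index.html"))
	fmt.Println(w.Allows("https://other.test/"))
}
```

Compiling the patterns once at construction time keeps per-URL checks cheap and surfaces malformed patterns at startup rather than in the middle of a crawl.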
a55f820f62 Implement structured logging with slog
- Replace zerolog with Go's standard slog package
- Add ColorHandler for terminal color output
- Add context-aware logging system
- Format attributes on the same line as log messages
- Use green color for INFO level logs
- Set up context value extraction helpers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:44:08 +03:00
ad224a328e Change errors to use xerrors package. 2025-05-12 20:37:58 +03:00
efaedcc6b2 Improvements in error handling & descriptions 2025-02-27 09:20:22 +02:00
9dc008cb0f Use go_errors library everywhere. 2025-02-26 13:31:46 +02:00
4bceb75695 Reorganize code for more granular imports 2025-02-26 10:34:46 +02:00
a9983f3531 Reorganize errors 2025-02-26 10:32:38 +02:00
5cf720103f Improve blacklist to use regex matching 2025-02-26 10:32:01 +02:00
b30b7274ec Simplify duplicate code 2025-01-16 22:37:39 +02:00
63adf73ef9 Proper package in tests 2025-01-16 10:04:02 +02:00
b3387ce7ad Add DB scan error 2025-01-16 10:04:02 +02:00
64f98bb37c Add mode that prints multiple worker status in console 2025-01-16 10:04:02 +02:00
4e6fad873b Break up common functions and small refactor. 2025-01-04 15:31:26 +02:00