antanst
349968d019
Improve error handling and add duplicate snapshot cleanup
2025-06-29 22:38:38 +03:00
antanst
2357135d5a
Fix snapshot overwrite logic to preserve successful responses
- Prevent overwriting snapshots that have valid response codes
- Ensure URL is removed from queue when snapshot update is skipped
- Add last_crawled timestamp tracking for better crawl scheduling
- Remove SkipIdenticalContent flag, simplify content deduplication logic
- Update database schema with last_crawled column and indexes
2025-06-29 22:38:38 +03:00
antanst
98d3ed6707
Fix infinite recrawl loop with skip-identical-content
Add last_crawled timestamp tracking to fix fetchSnapshotsFromHistory()
infinite loop when SkipIdenticalContent=true. Now tracks actual crawl
attempts separately from content changes via database DEFAULT timestamps.
2025-06-29 22:38:38 +03:00
antanst
8b498a2603
Refine content deduplication and improve configuration
2025-06-29 22:38:38 +03:00
antanst
8588414b14
Enhance crawler with seed list and SQL utilities
Add seedList module for URL initialization, comprehensive SQL utilities for database analysis, and update project configuration.
2025-06-29 22:38:38 +03:00
a8173544e7
Update and refactor core functionality
- Update common package utilities
- Refactor network code for better error handling
- Remove deprecated files and functionality
- Enhance blacklist and filtering capabilities
- Improve snapshot handling and processing
2025-06-29 22:38:38 +03:00
57f5c0e865
Add whitelist functionality
- Implement whitelist package for filtering URLs
- Support pattern matching for allowed URLs
- Add URL validation against whitelist patterns
- Include test cases for whitelist functionality
2025-06-29 22:38:38 +03:00
4ef3f70f1f
Implement structured logging with slog
- Replace zerolog with Go's standard slog package
- Add ColorHandler for terminal color output
- Add context-aware logging system
- Format attributes on the same line as log messages
- Use green color for INFO level logs
- Set up context value extraction helpers
2025-06-29 22:38:38 +03:00
b8ea6fab4a
Change errors to use xerrors package.
2025-06-29 22:38:38 +03:00
701a5df44f
Improvements in error handling & descriptions
2025-02-27 09:20:22 +02:00
5b84960c5a
Use go_errors library everywhere.
2025-02-26 13:31:46 +02:00
ca008b0796
Reorganize code for more granular imports
2025-02-26 10:34:46 +02:00
8350e106d6
Reorganize errors
2025-02-26 10:32:38 +02:00
9c7502b2a8
Improve blacklist to use regex matching
2025-02-26 10:32:01 +02:00
43b207c9ab
Simplify duplicate code
2025-01-16 22:37:39 +02:00
285f2955e7
Proper package in tests
2025-01-16 10:04:02 +02:00
998b0e74ec
Add DB scan error
2025-01-16 10:04:02 +02:00
03e1849191
Add mode that prints multiple worker status in console
2025-01-16 10:04:02 +02:00
4e6fad873b
Break up common functions and small refactor.
2025-01-04 15:31:26 +02:00