🚀 MCP Webpage Timestamps
A powerful Model Context Protocol (MCP) server for extracting webpage creation, modification, and publication timestamps. Ideal for web scraping and temporal analysis of web content.
🚀 Quick Start
This tool is a Model Context Protocol (MCP) server that extracts creation, modification, and publication timestamps from webpages. It is intended for web scraping and temporal analysis of web content.
✨ Features
- Comprehensive Timestamp Extraction: Extract creation, modification, and publication timestamps from webpages.
- Multiple Data Sources: Support HTML meta tags, HTTP headers, JSON-LD, microdata, OpenGraph, Twitter cards, and heuristic analysis.
- Confidence Scoring: Provide confidence levels (high/medium/low) for extracted timestamps.
- Batch Processing: Extract timestamps from multiple URLs simultaneously.
- Configurable: Customize timeout, user agent, redirect handling, and heuristic options.
- Production Ready: Offer robust error handling, comprehensive logging, and TypeScript support.
📦 Installation
Quick Install
npm install -g mcp-webpage-timestamps
Usage with npx
npx mcp-webpage-timestamps
Installing via Smithery
To install mcp-webpage-timestamps for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @Fabien-desablens/mcp-webpage-timestamps --client claude
Prerequisites
- Node.js 18.0.0 or higher
- npm or yarn
Development Install
git clone https://github.com/Fabien-desablens/mcp-webpage-timestamps.git
cd mcp-webpage-timestamps
npm install
npm run build
💻 Usage Examples
Basic Usage
import { TimestampExtractor } from './src/extractor.js';
const extractor = new TimestampExtractor();
const result = await extractor.extractTimestamps('https://example.com/article');
console.log('Published:', result.publishedAt);
console.log('Modified:', result.modifiedAt);
console.log('Confidence:', result.confidence);
console.log('Sources:', result.sources.length);
Advanced Usage
const extractor = new TimestampExtractor({
timeout: 15000,
userAgent: 'MyBot/1.0',
enableHeuristics: false,
maxRedirects: 3
});
const result = await extractor.extractTimestamps('https://example.com');
Batch Processing
const urls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
const results = await Promise.all(
urls.map(url => extractor.extractTimestamps(url))
);
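If `extractTimestamps` rejects for any single URL, the `Promise.all` call above rejects for the whole batch. The following is a minimal sketch of one way to keep per-URL failures isolated. The helper name `extractEach` and the `BatchOutcome` type are hypothetical and not part of this package; the extraction function is passed in as a parameter so the sketch makes no assumptions about the extractor's internals.

```typescript
// Hypothetical helper (not part of mcp-webpage-timestamps): runs an
// extraction function over many URLs and records failures per URL
// instead of rejecting the whole batch.
type BatchOutcome<T> =
  | { url: string; ok: true; value: T }
  | { url: string; ok: false; error: string };

async function extractEach<T>(
  urls: string[],
  extract: (url: string) => Promise<T>,
): Promise<BatchOutcome<T>[]> {
  return Promise.all(
    urls.map(async (url): Promise<BatchOutcome<T>> => {
      try {
        // Each URL is awaited inside its own try/catch, so one failure
        // cannot reject the outer Promise.all.
        return { url, ok: true, value: await extract(url) };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        return { url, ok: false, error };
      }
    }),
  );
}
```

With the extractor from the examples above, this could be called as `extractEach(urls, url => extractor.extractTimestamps(url))`. Results come back in the same order as the input URLs.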
📚 Documentation
As MCP Server
The server can be used with any MCP-compatible client. Here's how to configure it:
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
"mcpServers": {
"webpage-timestamps": {
"command": "npx",
"args": ["mcp-webpage-timestamps"],
"env": {}
}
}
}
Cline Configuration
Add to your MCP settings:
{
"mcpServers": {
"webpage-timestamps": {
"command": "npx",
"args": ["mcp-webpage-timestamps"]
}
}
}
Direct Usage
npm start
npm run dev
📄 API Documentation
Tools
extract_timestamps
Extract timestamps from a single webpage.
Parameters:
url (string, required): The URL of the webpage to extract timestamps from.
config (object, optional): Configuration options.
Configuration Options:
timeout (number): Request timeout in milliseconds (default: 10000).
userAgent (string): User agent string for requests.
followRedirects (boolean): Whether to follow HTTP redirects (default: true).
maxRedirects (number): Maximum number of redirects to follow (default: 5).
enableHeuristics (boolean): Enable heuristic timestamp detection (default: true).
Example:
{
"name": "extract_timestamps",
"arguments": {
"url": "https://example.com/article",
"config": {
"timeout": 15000,
"enableHeuristics": true
}
}
}
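The configuration options and defaults listed above can be sketched as a TypeScript type with a merge helper. This is an illustration only: `ExtractorConfig`, `DOCUMENTED_DEFAULTS`, and `resolveConfig` are hypothetical names, and the server's actual merging logic may differ. No default is documented for `userAgent`, so it is left unset here.

```typescript
// Shape of the optional `config` object, as documented above.
interface ExtractorConfig {
  timeout?: number;           // milliseconds
  userAgent?: string;
  followRedirects?: boolean;
  maxRedirects?: number;
  enableHeuristics?: boolean;
}

// Every option except userAgent has a documented default.
type ResolvedConfig = Required<Omit<ExtractorConfig, 'userAgent'>> &
  Pick<ExtractorConfig, 'userAgent'>;

// Defaults taken from the option list above.
const DOCUMENTED_DEFAULTS: ResolvedConfig = {
  timeout: 10000,
  followRedirects: true,
  maxRedirects: 5,
  enableHeuristics: true,
};

// Hypothetical helper: caller-supplied values override the defaults.
function resolveConfig(overrides: ExtractorConfig = {}): ResolvedConfig {
  return { ...DOCUMENTED_DEFAULTS, ...overrides };
}
```

For the example call above, `resolveConfig({ timeout: 15000, enableHeuristics: true })` would keep `maxRedirects` at 5 and `followRedirects` at `true`.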
batch_extract_timestamps
Extract timestamps from multiple webpages in batch.
Parameters:
urls (array of strings, required): Array of URLs to extract timestamps from.
config (object, optional): Same configuration options as extract_timestamps.
Example:
{
"name": "batch_extract_timestamps",
"arguments": {
"urls": [
"https://example.com/article1",
"https://example.com/article2",
"https://example.com/article3"
],
"config": {
"timeout": 10000
}
}
}
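When the URL list comes from user input or a crawl, it may help to clean it before sending the batch request. The sketch below builds the `batch_extract_timestamps` payload shown above after dropping duplicates and anything that is not an http(s) URL. The names `isHttpUrl` and `buildBatchCall` are hypothetical; whether the server performs its own URL validation is not stated here.

```typescript
// Payload shape for the batch tool, mirroring the JSON example above.
interface BatchToolCall {
  name: 'batch_extract_timestamps';
  arguments: { urls: string[]; config?: { timeout?: number } };
}

// True only for syntactically valid http: or https: URLs.
function isHttpUrl(candidate: string): boolean {
  try {
    const protocol = new URL(candidate).protocol;
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // new URL() throws on unparseable input
  }
}

// Hypothetical helper: filters and de-duplicates URLs, then builds the call.
function buildBatchCall(urls: string[], timeout?: number): BatchToolCall {
  const unique = Array.from(new Set(urls.filter(isHttpUrl)));
  return {
    name: 'batch_extract_timestamps',
    arguments: {
      urls: unique,
      // Only include `config` when a timeout was actually supplied.
      ...(timeout !== undefined ? { config: { timeout } } : {}),
    },
  };
}
```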
Response Format
Both tools return a JSON object with the following structure:
{
url: string;
createdAt?: Date;
modifiedAt?: Date;
publishedAt?: Date;
sources: TimestampSource[];
confidence: 'high' | 'medium' | 'low';
errors?: string[];
}
TimestampSource:
{
type: 'html-meta' | 'http-header' | 'json-ld' | 'microdata' | 'opengraph' | 'twitter' | 'heuristic';
field: string;
value: string;
confidence: 'high' | 'medium' | 'low';
}
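As an illustration of how a client might consume the `sources` array, the sketch below picks the entry with the highest confidence level. The `bestSource` helper and the numeric ranking are assumptions made for this example; the server's own rule for deriving the top-level `confidence` field is not documented here.

```typescript
type Confidence = 'high' | 'medium' | 'low';

// Mirrors the TimestampSource structure described above.
interface TimestampSource {
  type:
    | 'html-meta' | 'http-header' | 'json-ld' | 'microdata'
    | 'opengraph' | 'twitter' | 'heuristic';
  field: string;
  value: string;
  confidence: Confidence;
}

// Assumed ordering: high > medium > low.
const CONFIDENCE_RANK: Record<Confidence, number> = { high: 3, medium: 2, low: 1 };

// Hypothetical helper: returns the highest-confidence source, preferring
// the earliest entry on ties, or undefined for an empty list.
function bestSource(sources: TimestampSource[]): TimestampSource | undefined {
  let best: TimestampSource | undefined;
  for (const source of sources) {
    if (!best || CONFIDENCE_RANK[source.confidence] > CONFIDENCE_RANK[best.confidence]) {
      best = source;
    }
  }
  return best;
}
```

Note that if the result travels over JSON, the `Date` fields in the structure above would arrive as strings and need to be converted with `new Date(value)` on the client side.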
🔧 Technical Details
Supported Timestamp Sources
| Source | Fields |
|---|---|
| HTML Meta Tags | article:published_time, article:modified_time, date, pubdate, publishdate, last-modified, dc.date.created, dc.date.modified, dcterms.created, dcterms.modified |
| HTTP Headers | Last-Modified, Date |
| JSON-LD Structured Data | datePublished, dateModified, dateCreated |
| Microdata | datePublished, dateModified |
| OpenGraph | og:article:published_time, og:article:modified_time, og:updated_time |
| Twitter Cards | twitter:data1 (when containing date information) |
| Heuristic Analysis | Time elements with datetime attributes, common date patterns in text, date-related CSS classes |
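To make one of these sources concrete, here is a rough sketch of reading the three JSON-LD fields listed above from an HTML string. It is not the extractor's actual implementation: `extractJsonLdDates` is a hypothetical function, it uses a regular expression rather than a real HTML parser, and it only inspects top-level objects (not nested `@graph` entries).

```typescript
const JSON_LD_FIELDS = ['datePublished', 'dateModified', 'dateCreated'] as const;
type JsonLdField = (typeof JSON_LD_FIELDS)[number];

// Hypothetical sketch: collects the first value seen for each date field
// across all <script type="application/ld+json"> blocks in the page.
function extractJsonLdDates(html: string): Partial<Record<JsonLdField, string>> {
  const found: Partial<Record<JsonLdField, string>> = {};
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

  let match: RegExpExecArray | null;
  while ((match = scriptPattern.exec(html)) !== null) {
    let data: unknown;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip blocks that are not valid JSON
    }
    // A block may hold a single object or an array of objects.
    const nodes: unknown[] = Array.isArray(data) ? data : [data];
    for (const node of nodes) {
      if (typeof node !== 'object' || node === null) continue;
      const record = node as Record<string, unknown>;
      for (const field of JSON_LD_FIELDS) {
        const value = record[field];
        if (typeof value === 'string' && found[field] === undefined) {
          found[field] = value;
        }
      }
    }
  }
  return found;
}
```

The returned strings are left unparsed; a caller would still need to validate them as dates before trusting them.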
Development
Scripts
npm run dev
npm run build
npm test
npm run test:watch
npm run lint
npm run lint:fix
npm run format
Testing
The project includes comprehensive tests:
npm test
npm test -- --coverage
npm test -- extractor.test.ts
Code Quality
- TypeScript: Full TypeScript support with strict type checking.
- ESLint: Code linting with recommended rules.
- Prettier: Code formatting.
- Jest: Unit and integration testing.
- 95%+ Test Coverage: Comprehensive test suite.
📄 License
MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a detailed history of changes.