Sync internals
How each adapter polls, how the auto-router classifies, how watermarks prevent replay, and how the lost-thread detector + signature parser fit on top.
For the user-facing version see Connecting your accounts.
The pipeline
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Outlook  │ │  Slack   │ │ Telegram │ │  Notion  │
│ adapter  │ │ adapter  │ │ adapter  │ │ adapter  │
└────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘
     │            │            │            │
     ▼            ▼            ▼            ▼
            ┌─────────────────────────┐
            │       auto-router       │  domain match, blocklist, skip rules,
            │  (sync/auto-router.js)  │  internal-only filter
            └────────────┬────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
       Known → log          Unknown → sync_inbox
                                (human gate)
                         │
                         ▼
                 interactions table
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
    signature-parser        lost-thread-detector
   (people enrichment)        (stale outbound)
The flow runs in sync-worker.js, a separate Node process from the API server, managed by systemd as laurelin-sync.service. The one exception is the Slack events webhook, which is mounted on the API server.
sync-worker.js
Single-process scheduler. Each adapter runs on its own interval, and runs of the same adapter execute in series within that adapter's loop, so a slow poll in one adapter (Outlook, say) never blocks the others. The worker reads and writes the same SQLite DB as the API server; SQLite's WAL mode handles the concurrent access.
The worker is intentionally simple:
setInterval(pollOutlook, 60_000); // every minute
setInterval(pollTelegram, 60_000);
setInterval(reconcileNotion, 5 * 60_000);
// Slack is event-driven via Express webhook, not polled here.
setInterval(runSignatureParser, 5 * 60_000);
setInterval(detectLostThreads, 10 * 60_000);
Intervals are illustrative; real values are in the source.
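One caveat with plain setInterval: it does not wait for an async callback to settle, so a poll that takes longer than its interval can overlap with the next tick. A minimal sketch of one way to keep runs of the same adapter in series, assuming the poll functions return promises. The serial helper is hypothetical and is not taken from sync-worker.js.

```javascript
// Hypothetical helper: wraps an async task so that a tick is skipped
// while the previous run of the same task is still in flight.
function serial(task) {
  let running = false;
  return async function tick() {
    if (running) return false; // previous run not finished; skip this tick
    running = true;
    try {
      await task();
      return true;
    } catch (err) {
      console.error("tick failed:", err); // log and keep the loop alive
      return false;
    } finally {
      running = false;
    }
  };
}

// Usage, with the worker's own poll function:
// setInterval(serial(pollOutlook), 60_000);
```

Because each adapter gets its own wrapper, a stalled Outlook poll only skips Outlook ticks; the other intervals keep firing.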
Outlook adapter (sync/outlook-adapter.js)
OAuth
sync/outlook-oauth.js. Azure AD app registration (admin-configured once via env / oauth_tokens table). Per-user OAuth via Microsoft authorization-code flow with scopes Mail.Read and Calendars.Read.
Refresh tokens stored in oauth_tokens (encrypted at rest in the table; KMS is on the maybe-later list). Refresh happens automatically when the access token is within 5 minutes of expiry. A failed refresh transitions the user to a "reconnect required" state, surfaced on the Sync tab.
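The refresh rule can be sketched as a small predicate. This is an illustration only: the function name and the epoch-millisecond representation of the expiry are assumptions, and the real oauth_tokens columns may differ.

```javascript
const REFRESH_WINDOW_MS = 5 * 60_000; // "within 5 minutes of expiry"

// expiresAtMs and nowMs are epoch milliseconds (assumed representation).
// Returns true when the access token should be refreshed before use.
function needsRefresh(expiresAtMs, nowMs = Date.now()) {
  return expiresAtMs - nowMs <= REFRESH_WINDOW_MS;
}
```

An already expired token also satisfies the predicate, so the same check covers both the proactive and the overdue case.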
Polling
Per connected user:
- Read the user's watermark: sync_watermarks WHERE user_id = ? AND source = 'outlook' AND channel = 'mail'.
- Call GET /me/messages?$filter=receivedDateTime ge {watermark}&$top=50&$orderby=receivedDateTime asc.
- For each message:
  - Drop if it carries any of the headers List-Unsubscribe, Auto-Submitted, or Precedence: bulk, or matches known automated-sender heuristics.
  - Drop if every participant address ends in @valinordigital.com.
  - Build the normalized interaction shape: date, subject, summary (≤255 char bodyPreview), direction (outbound if the sender is the connected user's mailbox owner, inbound otherwise), source_id (Outlook internet message ID), source_thread_id (conversation ID).
  - Hand to the auto-router.
- Advance the watermark to the max receivedDateTime of the batch.
- Repeat until the page has fewer than 50 messages.
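The per-page steps above can be sketched as pure functions over one page of Graph results. This is a sketch under assumptions, not the adapter's code: the function names are hypothetical, the automated-sender heuristics are left out, and it assumes the request asked for internetMessageHeaders (Graph returns that property only when it is selected explicitly). The property names themselves (receivedDateTime, bodyPreview, internetMessageId, conversationId, from, toRecipients, ccRecipients) are the Graph message fields.

```javascript
const INTERNAL_SUFFIX = "@valinordigital.com";
const PAGE_SIZE = 50;

const addr = (r) => r.emailAddress.address.toLowerCase();

function participants(msg) {
  return [msg.from, ...(msg.toRecipients || []), ...(msg.ccRecipients || [])].map(addr);
}

// Presence check on the three headers named in the list above.
function isAutomated(msg) {
  return (msg.internetMessageHeaders || []).some(({ name, value }) => {
    const n = name.toLowerCase();
    if (n === "list-unsubscribe" || n === "auto-submitted") return true;
    return n === "precedence" && value.trim().toLowerCase() === "bulk";
  });
}

function normalize(msg, ownerAddress) {
  return {
    date: msg.receivedDateTime,
    subject: msg.subject,
    summary: (msg.bodyPreview || "").slice(0, 255),
    direction: addr(msg.from) === ownerAddress.toLowerCase() ? "outbound" : "inbound",
    source_id: msg.internetMessageId,
    source_thread_id: msg.conversationId,
  };
}

// Returns the interactions to hand to the auto-router, the advanced
// watermark, and whether another page should be fetched.
function processPage(messages, ownerAddress, watermark) {
  const interactions = [];
  let next = watermark;
  for (const msg of messages) {
    // UTC ISO 8601 strings in one format compare correctly as strings.
    if (msg.receivedDateTime > next) next = msg.receivedDateTime;
    if (isAutomated(msg)) continue;
    if (participants(msg).every((a) => a.endsWith(INTERNAL_SUFFIX))) continue;
    interactions.push(normalize(msg, ownerAddress));
  }
  return { interactions, watermark: next, hasMore: messages.length === PAGE_SIZE };
}
```

Note that the watermark advances past dropped messages too, so a run of newsletters does not pin the poll to the same page.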
The same loop ingests calendar events (GET /me/events), producing interactions with type = 'meeting' and resolved attendees.
Why bodyPreview
We don't store full message bodies because (a) privacy posture is tighter, (b) the summary is enough to identify what the interaction was about for retrospective search, (c) storage cost on SQLite stays small.
Slack adapter (sync/slack-adapter.js)
OAuth & app install
sync/slack-oauth.js. Workspace-level app install + per-user OAuth with scopes:
- channels:history (public channel history)
- groups:history (private channel history)
- im:history (DMs)
- mpim:history (group DMs)
- users:read (resolving sender IDs to email/name)
Events webhook
sync/slack-adapter.js exposes a webhook handler (mounted on the API server, not the sync worker) that receives Slack message events. On every event received:
- Look up the channel in slack_channel_map. If unmapped, surface to the Sync tab.
- Resolve the user via slack_user_id → people row.
- Build the interaction (source: slack, type: note, since Slack doesn't fit cleanly into email/call/meeting).
- Hand to the auto-router.
Threading is preserved by storing thread_ts in source_thread_id. A reply within a thread gets the same source_thread_id as the parent.
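A minimal sketch of that mapping, assuming the handler receives the inner event object of a Slack message callback (with its ts, thread_ts, channel, and text fields). The function name, the channel:ts form of source_id, and the 255-character summary cut are assumptions made for illustration.

```javascript
// event: the inner "event" object of a Slack Events API message callback.
function slackEventToInteraction(event) {
  return {
    source: "slack",
    type: "note",
    // Slack timestamps are strings of epoch seconds with a fractional part.
    date: new Date(Number(event.ts) * 1000).toISOString(),
    summary: (event.text || "").slice(0, 255),
    source_id: `${event.channel}:${event.ts}`, // assumed format
    // Replies carry thread_ts (the parent's ts). A top-level message falls
    // back to its own ts, so parent and replies share one thread ID.
    source_thread_id: event.thread_ts || event.ts,
  };
}
```

The fallback to ts matters because a parent message has no thread_ts at the moment it is first posted.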
Backfill
sync/slack-api.js exposes a backfill(channelId, lookbackHours = 48) function that calls conversations.history. Triggered from the Sync UI's "Backfill" button per channel.
Channel mapping
The slack_channel_map table stores slack_channel_id → company_id. Mappings are sticky — once you've mapped C0123ABC to a company, every future event in that channel routes there without asking.
Telegram adapter (sync/telegram-adapter.js, sync/telegram-bot.js)
Bot setup
Admin creates a bot via @BotFather, stores the token in env. The Valinor bot runs in Business mode — Telegram's feature for letting a bot read chats from a connected personal account.
telegram-bot.js is intentionally lopsided: it imports the Telegram bot library but exports zero send-side methods. The only methods on the helper are getUpdates, pollChats, resolveUser. Audit-checkable: grep the file for sendMessage and you find nothing.
Pairing
To bind a team member's Telegram account to their Laurelin person row:
- Team member clicks "Generate pairing code" in the Sync tab. Backend generates a code with HMAC over (person_id, expires_at), expires in 10 minutes.
- Team member opens Telegram, finds the bot, sends /pair <code>.
- Bot verifies HMAC + expiry, writes telegram_user_id onto the team member's people row, and creates a row in telegram_connections.
Pairing codes are single-use. A leaked code is useless after 10 minutes and useless to anyone who isn't logged in as the bot, which is just us.
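A sketch of how such a code could be generated and verified with Node's built-in crypto module. The code layout (person_id.expires_at.mac), the SHA-256 choice, and the function names are assumptions; the source only states that the HMAC covers (person_id, expires_at). Single-use enforcement needs server-side state and is not shown.

```javascript
const crypto = require("node:crypto");

const TTL_MS = 10 * 60_000; // codes expire after 10 minutes

function sign(secret, personId, expiresAt) {
  return crypto.createHmac("sha256", secret).update(`${personId}.${expiresAt}`).digest("hex");
}

// Assumes personId is an integer ID (no "." inside it).
function generatePairingCode(secret, personId, now = Date.now()) {
  const expiresAt = now + TTL_MS;
  return `${personId}.${expiresAt}.${sign(secret, personId, expiresAt)}`;
}

// Returns the person_id (as a string) on success, or null if the code is
// malformed, forged, or expired.
function verifyPairingCode(secret, code, now = Date.now()) {
  const parts = code.split(".");
  if (parts.length !== 3) return null;
  const [personId, expiresAtRaw, mac] = parts;
  const expiresAt = Number(expiresAtRaw);
  if (!Number.isSafeInteger(expiresAt)) return null;
  const expected = Buffer.from(sign(secret, personId, expiresAt), "hex");
  const given = Buffer.from(mac, "hex");
  if (given.length !== expected.length) return null;
  // Constant-time comparison avoids leaking how many bytes matched.
  if (!crypto.timingSafeEqual(given, expected)) return null;
  return now <= expiresAt ? personId : null;
}
```

Checking the MAC before the expiry means a forged code and an expired code are indistinguishable to the sender, which is usually what you want from a pairing flow.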
Chat scoping
Each Telegram chat starts in off state — even after the bot is connected to your account, no messages get logged until you flip the toggle. State stored in telegram_chat_scope (chat_id, user_id, enabled). For chats shared with other team members, telegram_shared_chat_overrides lets one team member force-on or force-off the chat for everyone (with one canonical override per chat).
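One way to read the scoping rules as code, assuming a shared override (when present) wins over the per-user toggle. The function name and the mode field with values force_on / force_off are hypothetical; the actual columns of telegram_shared_chat_overrides are not given here.

```javascript
// scope: the telegram_chat_scope row for (chat_id, user_id), or undefined.
// override: the chat's row in telegram_shared_chat_overrides, or undefined.
function isChatEnabled(scope, override) {
  if (override) return override.mode === "force_on"; // applies to everyone
  return Boolean(scope && scope.enabled); // no row means the chat is still off
}
```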
Polling
getUpdates long-poll. Per polled message:
- Resolve sender via telegram_user_id lookup.
- Check the per-user-per-chat scope. Drop if off.
- Build the interaction (source: telegram, type: telegram).
- Hand to the auto-router.
Edited messages are ignored. Future work: reconcile edits by message_id + content hash.
Notion adapter (laurelin/notion-pipeline-sync.js, laurelin/notion-sync.js)
Different shape from the others — Notion is a system of record we synchronize with, not a passive event source.
Two scripts:
- notion-sync.js: companies. Reads from a Notion companies database, reconciles into Laurelin's companies.
- notion-pipeline-sync.js: pipeline state. Pulls Notion pipeline records into Laurelin projects + companies.
Both use an admin-configured Notion integration token (workspace-level). No per-user OAuth.
The reconciler runs on a schedule (default 5 minutes) and on demand via POST /api/laurelin/sync/notion/pipeline. Strategy is:
- List Notion records updated since last watermark.
- For each record, look up the corresponding Laurelin row by notion_id (stored in source_metadata JSON).
- Diff. Apply non-destructive merge: Laurelin fields take precedence if they were edited more recently than Notion, otherwise pull from Notion.
- Write back changes (Laurelin → Notion is gated; off by default).
The reconcile endpoint returns a JSON diff so the UI can show "5 records updated, 2 conflicts."
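The merge rule can be sketched as follows. This assumes each side exposes a per-field value and edit timestamp, which is a simplification made for illustration (the reconciler may well compare record-level timestamps instead); mergeRecord and the field shapes are hypothetical.

```javascript
// laurelin / notion: { fieldName: { value, editedAt } }, editedAt in epoch ms.
// Returns the merged values plus the fields that were pulled from Notion.
function mergeRecord(laurelin, notion) {
  const merged = {};
  const updated = [];
  const fields = new Set([...Object.keys(laurelin), ...Object.keys(notion)]);
  for (const field of fields) {
    const l = laurelin[field];
    const n = notion[field];
    if (!n) {
      merged[field] = l.value; // Notion has nothing for this field: keep ours
    } else if (l && l.editedAt > n.editedAt) {
      merged[field] = l.value; // Laurelin was edited more recently
    } else {
      merged[field] = n.value; // otherwise pull from Notion
      if (!l || l.value !== n.value) updated.push(field);
    }
  }
  return { merged, updated };
}
```

The merge never deletes a Laurelin field that Notion lacks, which is the non-destructive part.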
Auto-router (sync/auto-router.js)
Classification cascade. Every normalized interaction passes through:
- Internal-only check. If every participant (sender + recipients) has an @valinordigital.com address, drop. Don't log internal-team email as external interactions.
- Domain blocklist. Check domain_blocklist for the sender's domain. If matched, drop. Used for gmail.com, outlook.com, and similar domains that would generate junk companies if auto-routed.
- Skip rules. Check sync_skip_rules for sender, domain, subject_pattern, source_id_prefix, contact_skip. Matches → drop.
- Do-not-track. Check email_do_not_track for the sender or any recipient. Match → drop.
- Known person. Look up the sender by email / slack_user_id / telegram_user_id / telegram_handle. If found, log directly with that person as a participant. Done.
- Known company by domain. Look up the sender's domain in any company's email_domains JSON array. If found, log directly with the company; auto-create the person and the affiliation.
- Unknown. Write a row to sync_inbox with status = pending. Suggest a company match by token overlap on the sender's domain root (so [email protected] suggests "Bridge" even if Bridge's email_domains doesn't include bridge.xyz yet).
The cascade is documented at the top of the file. Order matters — earlier rules short-circuit later ones.
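To make the short-circuit order concrete, here is a sketch of the cascade for the email case, with in-memory stand-ins for the tables. Everything about the ctx shape is an assumption made for illustration, and the Slack/Telegram identity lookups and the token-overlap suggestion are omitted.

```javascript
const INTERNAL_DOMAIN = "valinordigital.com";
const domainOf = (email) => email.split("@")[1].toLowerCase();

// ctx (hypothetical stand-ins for the real tables):
//   blocklist: Set of domains, skip: (interaction) => boolean,
//   doNotTrack: Set of emails, people: Map email -> person id,
//   companyDomains: Map domain -> company id
function route(interaction, ctx) {
  const { sender, recipients } = interaction;
  const everyone = [sender, ...recipients];
  if (everyone.every((a) => domainOf(a) === INTERNAL_DOMAIN)) return { action: "drop", rule: "internal" };
  if (ctx.blocklist.has(domainOf(sender))) return { action: "drop", rule: "blocklist" };
  if (ctx.skip(interaction)) return { action: "drop", rule: "skip" };
  if (everyone.some((a) => ctx.doNotTrack.has(a))) return { action: "drop", rule: "do_not_track" };
  if (ctx.people.has(sender)) return { action: "log", personId: ctx.people.get(sender) };
  const companyId = ctx.companyDomains.get(domainOf(sender));
  if (companyId !== undefined) return { action: "log", companyId, createPerson: true };
  return { action: "inbox", status: "pending" };
}
```

Each check returns as soon as it matches, so a do-not-track recipient drops the interaction even when the sender is a known person.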
Signature parser (sync/signature-parser.js)
Deterministic regex pass over interactions.summary (the 255-char preview). Extracts:
- Phone numbers (multiple formats, US + international)
- Title / role (from common signature patterns: "VP of X", "Director, X")
- LinkedIn URL
- Telegram handle (@username patterns)
Findings are written to the matching people row but only into empty fields. The parser never overwrites existing data.
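A sketch of the extract-then-fill-empty behavior. The regular expressions are deliberately loose illustrations, not the parser's actual patterns, and title/role extraction is left out; the field names on the person object are assumptions.

```javascript
// Illustrative patterns only; the real parser's patterns are in the source file.
const PATTERNS = {
  linkedin_url: /https?:\/\/(?:www\.)?linkedin\.com\/in\/[\w%-]+/i,
  telegram_handle: /(?:^|\s)@([A-Za-z][A-Za-z0-9_]{4,31})\b/,
  phone: /\+?\d[\d\s().-]{7,}\d/,
};

function extract(summary) {
  const found = {};
  for (const [field, re] of Object.entries(PATTERNS)) {
    const m = summary.match(re);
    if (m) found[field] = (m[1] || m[0]).trim(); // prefer the capture group
  }
  return found;
}

// Writes findings only into fields that are currently empty.
function fillEmpty(person, found) {
  const out = { ...person };
  for (const [field, value] of Object.entries(found)) {
    if (out[field] == null || out[field] === "") out[field] = value;
  }
  return out;
}
```

Keeping extraction and the write rule in separate functions makes the "never overwrite" guarantee easy to test on its own.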
Gating
- signature_parsed_at is set once a person has been parsed.
- signature_parse_attempts increments on each pass.
- Capped at LAURELIN_SIGNATURE_MAX_ATTEMPTS (default 3): if a person's signature consistently yields nothing useful, we stop trying.
The parser runs as a separate worker tick, processing people with signature_parsed_at IS NULL AND signature_parse_attempts < cap.
Lost-thread detector (sync/lost-thread-detector.js)
Scans interactions for outbound emails that haven't received an inbound reply within a threshold.
Candidate generation
- For each external company with interactions.direction = 'outbound' as the latest message in any thread (grouped by source_thread_id):
- Compute days_stale from the last message.
- Classify urgency:
  - emergency if the company is high importance + active/core stage, or a project linked to this thread has an upcoming key_dates row.
  - normal otherwise.
- Generate a content_hash over (sender, recipient, subject, snippet) for dedup.
- Insert into lost_thread_candidates if no row exists for this content_hash with status = pending.
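The three computations in those steps can be sketched as small pure functions. SHA-256 and the JSON encoding of the hashed tuple are assumptions (the source does not name a hash), as are the function names and the shape of the company object.

```javascript
const crypto = require("node:crypto");

const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days since the last message in the thread.
function daysStale(lastMessageAt, now = Date.now()) {
  return Math.floor((now - new Date(lastMessageAt).getTime()) / DAY_MS);
}

// company: { importance, stage }; hasUpcomingKeyDate is computed by the
// caller from key_dates rows on projects linked to the thread.
function classifyUrgency(company, hasUpcomingKeyDate) {
  const hot = company.importance === "high" && ["active", "core"].includes(company.stage);
  return hot || hasUpcomingKeyDate ? "emergency" : "normal";
}

function contentHash({ sender, recipient, subject, snippet }) {
  // Serializing as a JSON array keeps field boundaries unambiguous.
  const payload = JSON.stringify([sender, recipient, subject, snippet]);
  return crypto.createHash("sha256").update(payload).digest("hex");
}
```

Hashing a delimited tuple rather than a plain concatenation avoids two different messages colliding just because text moved from one field to the next.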
State machine
Per candidate row:
- status = pending → user hasn't acted yet.
- status = resolved (with resolved_at, optional resolved_interaction_id) → user marked it handled.
- status = dismissed (with dismissed_scope = 'once' or 'forever') → user dismissed.
When the user requests a draft (POST /api/laurelin/lost-threads/:id/draft):
- draft_requested_at set.
- A worker tick picks it up, calls Claude via sync/claude-api.js with the thread context + user's voice setting + chosen intent.
- On success: draft_completed_at, draft_body, draft_body_preview populated.
Drafts are never sent. The user copies into Outlook.
Watermarks (sync_watermarks)
(user_id, source, channel) → last_sync_at + last_source_id. Every adapter reads its watermark at the start of a poll cycle and advances it at the end. Idempotent — re-reading the same range produces no duplicates because dedup happens via source_id uniqueness on interactions.
A worker restart never replays history. If the SQLite DB is restored from backup, watermarks rewind with it (which is fine — re-running a small replay produces no duplicates).
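The idempotency argument can be illustrated with an in-memory stand-in for the interactions table. The class and function names are hypothetical; the Map plays the role of the uniqueness constraint on source_id, and the real adapters write to SQLite instead.

```javascript
// Stand-in for the interactions table with a unique source_id.
class InteractionStore {
  constructor() {
    this.bySourceId = new Map();
  }
  // Mirrors an insert that is ignored on a uniqueness conflict.
  insert(interaction) {
    if (this.bySourceId.has(interaction.source_id)) return false;
    this.bySourceId.set(interaction.source_id, interaction);
    return true;
  }
}

// Ingests one batch and returns how many rows were new plus the
// advanced watermark (last_sync_at + last_source_id).
function ingest(store, watermark, batch) {
  let inserted = 0;
  let next = watermark;
  for (const item of batch) {
    if (store.insert(item)) inserted += 1;
    if (item.date > next.last_sync_at) {
      next = { last_sync_at: item.date, last_source_id: item.source_id };
    }
  }
  return { inserted, watermark: next };
}
```

Replaying a batch from a rewound watermark inserts nothing and lands on the same watermark, which is why a restore from backup is harmless.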
Skip rules (sync_skip_rules)
Five rule types:

| rule_type | Pattern |
|---|---|
| sender | Exact email address, e.g. [email protected] |
| domain | Exact domain, e.g. mailchimp.com |
| subject_pattern | Case-insensitive substring, e.g. unsubscribe |
| source_id_prefix | Prefix on the source's message ID; used to skip a known mailing list ID range |
| contact_skip | "Don't log interactions involving this specific person" |
Source can be outlook, slack, telegram, or all. Created by team members from the Sync tab; admin can audit and bulk-delete.
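A sketch of how a single rule might be evaluated against a normalized interaction. The rule and interaction shapes are assumptions, and contact_skip is matched by email address here purely for illustration (the real rule may reference a person ID).

```javascript
// rule: { rule_type, pattern, source }
// interaction: { source, sender, subject, source_id, participants }
function matchesRule(rule, interaction) {
  if (rule.source !== "all" && rule.source !== interaction.source) return false;
  const pattern = rule.pattern.toLowerCase();
  switch (rule.rule_type) {
    case "sender":
      return interaction.sender.toLowerCase() === pattern;
    case "domain":
      return interaction.sender.toLowerCase().split("@")[1] === pattern;
    case "subject_pattern":
      return (interaction.subject || "").toLowerCase().includes(pattern);
    case "source_id_prefix":
      return interaction.source_id.startsWith(rule.pattern); // IDs stay case-sensitive
    case "contact_skip":
      return interaction.participants.some((p) => p.toLowerCase() === pattern);
    default:
      return false; // unknown rule types never match
  }
}
```

With this shape, the auto-router's skip step reduces to checking whether any stored rule matches the interaction.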
Admin setup (one-time per integration)
Outlook
- Azure AD app registration in the tenant.
- Redirect URI set to the OAuth callback URL.
- API permissions: Mail.Read, Calendars.Read (delegated).
- Client ID, client secret, tenant ID written to env: OUTLOOK_CLIENT_ID, OUTLOOK_CLIENT_SECRET, OUTLOOK_TENANT_ID.
- Set the config via PUT /api/laurelin/sync/outlook/config or the admin UI.
Slack
- Slack app in api.slack.com. Add OAuth scopes listed above.
- Configure the Events API webhook URL (must be publicly reachable; currently https://laurelin.valinorinfo.com/api/laurelin/sync/slack/events).
- Client ID, client secret, signing secret in env.
Telegram
- /newbot to @BotFather to create a bot, get a token.
- Enable inline mode + Business mode in @BotFather settings.
- Token in env: TELEGRAM_BOT_TOKEN.
Notion
- Create an internal integration at notion.so/my-integrations.
- Grant it read access to the relevant databases.
- Token in env: NOTION_API_KEY.
- Database IDs configured in the Notion sync settings.
When sync goes wrong
- Watermark stuck: last_sync_at is not advancing. Check journalctl -u laurelin-sync -f. Common causes: token expiry (Outlook refresh failed), Slack scope revoked, Telegram bot ejected from Business chat.
- Duplicate interactions: usually means source_id uniqueness was bypassed. Check the adapter's normalization and ensure every message has a stable source_id.
- Sync Inbox flooded: email_domains is not maintained on companies. Approving from the inbox auto-learns new domains, but if you're seeing many items from one company, add its full list of domains to the company record.
- Auto-router routes wrong company: the token-overlap suggestion is just a hint; the user should reject it. If a domain consistently mis-routes (e.g., a shared domain like consensys.net that has multiple sub-companies), use the notes field on the company to flag it and route manually.