For Engineering

Sync internals

How each adapter polls, how the auto-router classifies, how watermarks prevent replay, and how the lost-thread detector + signature parser fit on top.

For the user-facing version, see Connecting your accounts.

The pipeline

┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│ Outlook  │  │  Slack   │  │ Telegram │  │  Notion  │
│ adapter  │  │ adapter  │  │ adapter  │  │ adapter  │
└────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │              │
     ▼             ▼             ▼              ▼
            ┌─────────────────────────┐
            │      auto-router        │  domain match, blocklist, skip rules,
            │   (sync/auto-router.js) │  internal-only filter
            └────────────┬────────────┘
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
       Known → log              Unknown → sync_inbox
                                          (human gate)
                         │
                         ▼
                  interactions table
                         │
            ┌────────────┴────────────┐
            ▼                         ▼
     signature-parser           lost-thread-detector
     (people enrichment)        (stale outbound)

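The polling adapters and background jobs run in sync-worker.js, a Node process separate from the API server, deployed as laurelin-sync.service (systemd). The Slack events webhook is the exception: it is mounted on the API server.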

sync-worker.js

Single-process scheduler. It runs each adapter on its own interval, in series within that adapter's own loop, so a slow Outlook poll never blocks Telegram or Notion, and vice versa. It reads and writes the same SQLite DB as the API server; SQLite's WAL mode handles the concurrent access.

The worker is intentionally simple:

```javascript
setInterval(pollOutlook, 60_000);            // every minute
setInterval(pollTelegram, 60_000);           // every minute
setInterval(reconcileNotion, 5 * 60_000);    // every 5 minutes
// Slack is event-driven via an Express webhook, not polled here.
setInterval(runSignatureParser, 5 * 60_000); // every 5 minutes
setInterval(detectLostThreads, 10 * 60_000); // every 10 minutes
```

Intervals are illustrative; real values are in the source.
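setInterval fires on a fixed clock whether or not the previous run has finished, so a poll that takes longer than its interval can overlap itself. One way to get the "in series within its own loop" behavior is to re-arm a timer only after each run settles. The helper below (runEvery) is a hypothetical sketch, not code from sync-worker.js.

```javascript
// Hypothetical helper, not taken from sync-worker.js. It runs `job`, waits
// `ms` after the run settles, then runs it again. Unlike setInterval, a run
// that takes longer than `ms` can never overlap the next run of the same job.
function runEvery(ms, job, onError = (err) => console.error(err)) {
  let timer = null;
  let stopped = false;

  const tick = async () => {
    try {
      await job();
    } catch (err) {
      onError(err); // a failed poll is reported; the loop keeps going
    }
    if (!stopped) timer = setTimeout(tick, ms);
  };

  tick(); // the first run starts immediately
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Usage, mirroring the intervals above:
// runEvery(60_000, pollOutlook);
// runEvery(60_000, pollTelegram);
// runEvery(5 * 60_000, reconcileNotion);
```

Each job still gets its own independent loop, so one slow adapter only delays its own next run.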

Outlook adapter (sync/outlook-adapter.js)

OAuth

Lives in sync/outlook-oauth.js. An admin configures the Azure AD app registration once (via env / the oauth_tokens table). Each user then connects through Microsoft's OAuth authorization-code flow with the scopes Mail.Read and Calendars.Read.

Refresh tokens are stored in oauth_tokens, encrypted at rest in the table (KMS is on the maybe-later list). The access token is refreshed automatically once it is within 5 minutes of expiry. A failed refresh moves the user into a "reconnect required" state, which is surfaced on the Sync tab.
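The refresh rule can be sketched as a single function. The token shape, the status strings, and the injected `refresh` callback (standing in for the Microsoft token-endpoint call) are all assumptions for illustration.

```javascript
// Refresh once the access token is within 5 minutes of expiry.
const REFRESH_WINDOW_MS = 5 * 60_000;

// Hypothetical token shape: { accessToken, refreshToken, expiresAt (epoch ms), status }.
// `refresh` is a hypothetical stand-in for the token-endpoint call; it is assumed
// to resolve to { accessToken, refreshToken?, expiresAt } or reject on failure.
async function ensureFreshToken(token, refresh, nowMs = Date.now()) {
  if (token.expiresAt - nowMs > REFRESH_WINDOW_MS) return token; // still comfortably valid
  try {
    const next = await refresh(token.refreshToken);
    // Keep the old refresh token if the response did not include a new one.
    return { ...token, ...next, status: "connected" };
  } catch {
    // Surfaced on the Sync tab as "reconnect required".
    return { ...token, status: "reconnect_required" };
  }
}
```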

Polling

Per connected user:

  1. Read the user's watermark: sync_watermarks WHERE user_id = ? AND source = 'outlook' AND channel = 'mail'.
  2. Call GET /me/messages?$filter=receivedDateTime ge {watermark}&$top=50&$orderby=receivedDateTime asc.
  3. For each message:
    • Drop it if it carries a List-Unsubscribe, Auto-Submitted, or Precedence: bulk header, or matches known automated-sender heuristics.
    • Drop if every participant address ends in @valinordigital.com.
    • Build the normalized interaction shape: date, subject, summary (bodyPreview, ≤255 chars), direction (outbound if the sender is the owner of the connected mailbox, inbound otherwise), source_id (Outlook internet message ID), and source_thread_id (conversation ID).
    • Hand to the auto-router.
  4. Advance the watermark to the max receivedDateTime of the batch.
  5. Repeat until the page has fewer than 50 messages.
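Steps 2 to 5 can be sketched as a paging loop driven by the watermark. `fetchPage` and `handleMessage` are hypothetical callbacks (an authenticated Graph GET and step 3, respectively); the query parameters are the ones from step 2.

```javascript
const GRAPH_BASE = "https://graph.microsoft.com/v1.0";
const PAGE_SIZE = 50;

// Step 2: one page, oldest first, starting at the watermark.
function buildMessagesUrl(watermarkIso) {
  const filter = encodeURIComponent(`receivedDateTime ge ${watermarkIso}`);
  const orderby = encodeURIComponent("receivedDateTime asc");
  return `${GRAPH_BASE}/me/messages?$filter=${filter}&$top=${PAGE_SIZE}&$orderby=${orderby}`;
}

// Steps 2-5 for one mailbox; resolves to the new watermark.
// Because the filter is `ge`, the boundary message is fetched again on the
// next page; source_id uniqueness on interactions absorbs the repeat.
// Caveat: if a full page shares one receivedDateTime the watermark cannot
// advance, so a production loop would need a tiebreaker (e.g. last_source_id).
async function pollMailbox(watermarkIso, fetchPage, handleMessage) {
  let watermark = watermarkIso;
  for (;;) {
    const page = await fetchPage(buildMessagesUrl(watermark));
    for (const msg of page) await handleMessage(msg);
    // Ascending order, so the last message holds the max receivedDateTime.
    if (page.length > 0) watermark = page[page.length - 1].receivedDateTime;
    if (page.length < PAGE_SIZE) return watermark;
  }
}
```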

The same loop ingests calendar events (GET /me/events), producing interactions with type = 'meeting' and resolved attendees.
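The per-message filtering and normalization in step 3 (mail only; calendar events are not covered) might look like this. It assumes Graph message property names (from, toRecipients, ccRecipients, bodyPreview, internetMessageId, conversationId, internetMessageHeaders) and that the request selected internetMessageHeaders. The "known automated-sender heuristics" are not modeled, `type: "email"` is an assumed value, and treating `Auto-Submitted: no` as human-sent (per RFC 3834) is this sketch's choice, not necessarily the adapter's.

```javascript
const INTERNAL_SUFFIX = "@valinordigital.com";

// Lower-cased header name -> value, from Graph's internetMessageHeaders.
function headerMap(msg) {
  const map = new Map();
  for (const h of msg.internetMessageHeaders ?? []) map.set(h.name.toLowerCase(), String(h.value).trim());
  return map;
}

function isAutomated(msg) {
  const headers = headerMap(msg);
  if (headers.has("list-unsubscribe")) return true;
  const autoSubmitted = (headers.get("auto-submitted") ?? "").toLowerCase();
  if (autoSubmitted && autoSubmitted !== "no") return true; // RFC 3834: "no" means human-sent
  return (headers.get("precedence") ?? "").toLowerCase() === "bulk";
}

// Returns the normalized interaction, or null if the message is dropped.
function normalizeMessage(msg, mailboxOwner) {
  if (isAutomated(msg)) return null;
  const sender = msg.from.emailAddress.address.toLowerCase();
  const participants = [msg.from, ...(msg.toRecipients ?? []), ...(msg.ccRecipients ?? [])]
    .map((r) => r.emailAddress.address.toLowerCase());
  if (participants.every((a) => a.endsWith(INTERNAL_SUFFIX))) return null; // internal-only
  return {
    source: "outlook",
    type: "email", // assumed type value for mail
    date: msg.receivedDateTime,
    subject: msg.subject,
    summary: (msg.bodyPreview ?? "").slice(0, 255),
    direction: sender === mailboxOwner.toLowerCase() ? "outbound" : "inbound",
    source_id: msg.internetMessageId,
    source_thread_id: msg.conversationId,
  };
}
```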

Why bodyPreview

We don't store full message bodies because (a) it keeps our privacy posture tighter, (b) the summary is enough to identify what an interaction was about in retrospective search, and (c) it keeps SQLite storage small.

Slack adapter (sync/slack-adapter.js)

OAuth & app install

sync/slack-oauth.js. Workspace-level app install + per-user OAuth with scopes:

Events webhook

sync/slack-adapter.js exposes a webhook handler (mounted on the API server, not the sync worker) that subscribes to Slack message events. On every fired event:

  1. Look up the channel in slack_channel_map. If unmapped, surface to the Sync tab.
  2. Resolve the user via slack_user_id → people row.
  3. Build the interaction (source: slack, type: note — Slack doesn't fit cleanly into email/call/meeting).
  4. Hand to the auto-router.

Threading is preserved by storing thread_ts in source_thread_id. A reply within a thread gets the same source_thread_id as the parent.
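Steps 1 to 3 plus the threading rule can be sketched as a pure function. The Maps stand in for slack_channel_map and the slack_user_id lookup; the source_id format and the 255-char summary cap are assumptions. A top-level message has no thread_ts, so falling back to its own ts gives the parent and its replies the same source_thread_id.

```javascript
// Hypothetical lookups: channelMap (slack_channel_id -> company_id) and
// slackUserToPerson (slack_user_id -> people id).
function slackEventToInteraction(event, channelMap, slackUserToPerson) {
  const companyId = channelMap.get(event.channel);
  if (companyId === undefined) {
    return { status: "unmapped", channel: event.channel }; // surfaced on the Sync tab
  }
  return {
    status: "ok",
    interaction: {
      source: "slack",
      type: "note",
      company_id: companyId,
      person_id: slackUserToPerson.get(event.user) ?? null,
      date: new Date(Number(event.ts) * 1000).toISOString(), // ts is epoch seconds
      summary: (event.text ?? "").slice(0, 255), // assumed to match the summary column
      source_id: `${event.channel}:${event.ts}`, // assumed format; ts is unique per channel
      source_thread_id: event.thread_ts ?? event.ts, // replies share the parent's ts
    },
  };
}
```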

Backfill

sync/slack-api.js exposes a backfill(channelId, lookbackHours = 48) function that calls conversations.history. It is triggered per channel from the Backfill button in the Sync UI.
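A sketch of the backfill loop, assuming a hypothetical `callApi(method, params)` wrapper around the Slack Web API. The parameters (channel, oldest, limit, cursor) and response fields (ok, messages, has_more, response_metadata.next_cursor) are those of conversations.history; the function name and argument order are illustrative, not the real backfill signature.

```javascript
// Pages through conversations.history for the last `lookbackHours`.
async function backfillMessages(callApi, channelId, lookbackHours = 48, nowMs = Date.now()) {
  // Slack timestamps are epoch seconds; `oldest` bounds the window.
  const oldest = String(Math.floor(nowMs / 1000) - lookbackHours * 3600);
  const messages = [];
  let cursor; // undefined on the first request
  do {
    const res = await callApi("conversations.history", { channel: channelId, oldest, limit: 200, cursor });
    if (!res.ok) throw new Error(`conversations.history failed: ${res.error}`);
    messages.push(...res.messages);
    // An empty next_cursor also ends the loop.
    cursor = res.has_more ? res.response_metadata?.next_cursor : undefined;
  } while (cursor);
  return messages;
}
```

Each returned message would then go through the same path as a live event.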

Channel mapping

The slack_channel_map table stores slack_channel_id → company_id. Mappings are sticky — once you've mapped C0123ABC to a company, every future event in that channel routes there without asking.

Telegram adapter (sync/telegram-adapter.js, sync/telegram-bot.js)

Bot setup

The admin creates a bot via @BotFather and stores its token in env. The Valinor bot runs in Business mode, Telegram's feature that lets a bot read chats from a connected personal account.

telegram-bot.js is intentionally lopsided: it imports the Telegram bot library but exports no send-side methods. The only methods on the helper are getUpdates, pollChats, and resolveUser. This is audit-checkable: grep the file for sendMessage and you find nothing.
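The grep audit could be automated as a hypothetical CI check. The pattern list (send*, forwardMessage(s), copyMessage(s), editMessage*) covers common Bot API write methods but is this sketch's choice, not an exhaustive list.

```javascript
// Flags Bot API write calls such as sendMessage, sendPhoto, forwardMessage,
// copyMessage, and editMessageText anywhere in the given source text.
const SEND_SIDE = /\b(send[A-Z]\w*|forwardMessages?|copyMessages?|editMessage\w*)\b/g;

function findSendSideCalls(sourceText) {
  return [...sourceText.matchAll(SEND_SIDE)].map((m) => m[1]);
}

// Usage:
// const fs = require("node:fs");
// const hits = findSendSideCalls(fs.readFileSync("sync/telegram-bot.js", "utf8"));
// if (hits.length > 0) throw new Error(`send-side calls found: ${hits.join(", ")}`);
```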

Pairing

To bind a team member's Telegram account to their Laurelin person row:

  1. The team member clicks "Generate pairing code" in the Sync tab. The backend generates a code using an HMAC over (person_id, expires_at); the code expires in 10 minutes.
  2. Team member opens Telegram, finds the bot, sends /pair <code>.
  3. Bot verifies HMAC + expiry, writes telegram_user_id onto the team member's people row, and creates a row in telegram_connections.
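The pairing flow above can be sketched with node:crypto. The code format (`personId.expiresAt.mac`) and the in-memory `usedCodes` set (a stand-in for persisted single-use state) are assumptions; real codes are likely shorter so they are easy to type into Telegram.

```javascript
const crypto = require("node:crypto");

const PAIRING_TTL_MS = 10 * 60_000; // codes expire after 10 minutes

function sign(secret, payload) {
  return crypto.createHmac("sha256", secret).update(payload).digest("base64url");
}

// Step 1: HMAC over (person_id, expires_at).
function makePairingCode(secret, personId, nowMs = Date.now()) {
  const payload = `${personId}.${nowMs + PAIRING_TTL_MS}`;
  return `${payload}.${sign(secret, payload)}`;
}

// Step 3: returns the person_id on success, or null.
function verifyPairingCode(secret, code, usedCodes, nowMs = Date.now()) {
  const parts = code.split(".");
  if (parts.length !== 3) return null;
  const [personId, expiresAtStr, mac] = parts;
  const given = Buffer.from(mac);
  const expected = Buffer.from(sign(secret, `${personId}.${expiresAtStr}`));
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  const expiresAt = Number(expiresAtStr);
  if (!Number.isFinite(expiresAt) || nowMs > expiresAt) return null; // expired
  if (usedCodes.has(code)) return null; // single-use
  usedCodes.add(code);
  return personId;
}
```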

Pairing codes are single-use and expire after 10 minutes, so a leaked code stops working as soon as it is redeemed or expires. Only the backend holds the HMAC key, so nobody else can mint a valid code.

Chat scoping

Each Telegram chat starts in the off state: even after the bot is connected to your account, no messages are logged until you flip the toggle. State is stored in telegram_chat_scope (chat_id, user_id, enabled). For chats shared with other team members, telegram_shared_chat_overrides lets one team member force the chat on or off for everyone (one canonical override per chat).
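A sketch of the scope check, assuming a shared-chat override takes precedence over individual toggles. The Maps are hypothetical in-memory stand-ins for the two tables.

```javascript
// scopes:    Map of `${chatId}:${userId}` -> enabled (telegram_chat_scope)
// overrides: Map of chatId -> "on" | "off" (telegram_shared_chat_overrides)
function isChatEnabled(chatId, userId, scopes, overrides) {
  const override = overrides.get(chatId);
  if (override === "on") return true;   // forced on for everyone in the chat
  if (override === "off") return false; // forced off for everyone in the chat
  return scopes.get(`${chatId}:${userId}`) === true; // default: off until toggled
}
```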

Polling

The bot long-polls getUpdates. For each polled message:

  1. Resolve sender via telegram_user_id lookup.
  2. Check the per-user-per-chat scope. Drop if off.
  3. Build the interaction (source: telegram, type: telegram).
  4. Hand to the auto-router.

Edited messages are ignored. Future work: reconcile edits by message_id + content hash.
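The getUpdates bookkeeping can be sketched as follows. Per the Bot API, an update is confirmed once getUpdates is called with an offset greater than its update_id, and `timeout` enables long polling. This sketch assumes Business-mode chats arrive as business_message updates; edited_message and edited_business_message are skipped, matching the rule above. The helper names are hypothetical.

```javascript
// Handles one batch of updates and returns the offset for the next call.
function processUpdates(updates, currentOffset, handleMessage) {
  let offset = currentOffset;
  for (const update of updates) {
    offset = Math.max(offset, update.update_id + 1); // acknowledged on the next call
    const msg = update.message ?? update.business_message;
    if (!msg) continue; // edits and other update types are ignored
    handleMessage(msg);
  }
  return offset;
}

function getUpdatesUrl(token, offset, timeoutSec = 30) {
  return `https://api.telegram.org/bot${token}/getUpdates?offset=${offset}&timeout=${timeoutSec}`;
}
```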

Notion adapter (laurelin/notion-pipeline-sync.js, laurelin/notion-sync.js)

Different shape from the others — Notion is a system of record we synchronize with, not a passive event source.

Two scripts: laurelin/notion-pipeline-sync.js and laurelin/notion-sync.js.

Both use an admin-configured Notion integration token (workspace-level). No per-user OAuth.

The reconciler runs on a schedule (every 5 minutes by default) and on demand via POST /api/laurelin/sync/notion/pipeline. The strategy is:

  1. List Notion records updated since last watermark.
  2. For each record, look up the corresponding Laurelin row by notion_id (stored in source_metadata JSON).
  3. Diff, then apply a non-destructive merge: Laurelin fields take precedence if they were edited more recently than Notion; otherwise, pull from Notion.
  4. Write back changes (Laurelin → Notion is gated; off by default).

The reconcile endpoint returns a JSON diff so the UI can show "5 records updated, 2 conflicts."
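The merge rule in step 3 can be sketched at record level. This assumes one timestamp per side (Notion pages expose last_edited_time), that "non-destructive" means an empty Notion value never clears a Laurelin field, and that `fields` holds already-flattened primitive values. All shapes are hypothetical.

```javascript
const isBlank = (v) => v === null || v === undefined || v === "";

// laurelin = { updatedAt, fields }, notion = { last_edited_time, fields }.
// Returns only the fields to write into Laurelin.
function mergeFromNotion(laurelin, notion) {
  const laurelinNewer = Date.parse(laurelin.updatedAt) > Date.parse(notion.last_edited_time);
  if (laurelinNewer) return {}; // Laurelin was edited more recently: keep it as-is
  const changes = {};
  for (const [field, value] of Object.entries(notion.fields)) {
    if (isBlank(value)) continue; // non-destructive: never blank out a Laurelin field
    if (laurelin.fields[field] !== value) changes[field] = value;
  }
  return changes;
}
```

Counting non-empty results across records would give the "records updated" number in the endpoint's summary.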

Auto-router (sync/auto-router.js)

Classification cascade. Every normalized interaction passes through:

  1. Internal-only check. If every participant (sender and recipients) has an @valinordigital.com address, drop. Internal-team email is not logged as an external interaction.
  2. Domain blocklist. Check domain_blocklist for the sender's domain. If matched, drop. Used for gmail.com, outlook.com, and similar — domains that would generate junk companies if auto-routed.
  3. Skip rules. Check sync_skip_rules for sender, domain, subject_pattern, source_id_prefix, contact_skip. Matches → drop.
  4. Do-not-track. Check email_do_not_track for the sender or any recipient. Match → drop.
  5. Known person. Look up the sender by email/slack_user_id/telegram_user_id/telegram_handle. If found, log directly with that person as a participant. Done.
  6. Known company by domain. Look up the sender's domain in any company's email_domains JSON array. If found, log directly with the company; auto-create the person and the affiliation.
  7. Unknown. Write a row to sync_inbox with status = pending. Suggest a company match by token overlap on the sender's domain root (so a sender at bridge.xyz suggests "Bridge" even if Bridge's email_domains doesn't include bridge.xyz yet).

The cascade is documented at the top of the file. Order matters — earlier rules short-circuit later ones.
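The cascade can be sketched as an ordered series of early returns. `ctx` is a hypothetical bundle of lookups loaded from the tables named above. Only email identities are modeled (the Slack and Telegram lookups would slot into step 5), and "domain root" is simplified to the label before the TLD.

```javascript
const INTERNAL_DOMAIN = "valinordigital.com";

const domainOf = (address) => address.slice(address.lastIndexOf("@") + 1).toLowerCase();

// Step 7 suggestion: "bridge.xyz" -> root "bridge"; multi-part suffixes such
// as .co.uk are not handled by this simplification.
function suggestCompany(domain, companies) {
  const labels = domain.split(".");
  const root = labels.length > 1 ? labels[labels.length - 2] : labels[0];
  const rootTokens = new Set(root.split(/[^a-z0-9]+/).filter(Boolean));
  let best = null;
  let bestScore = 0;
  for (const company of companies) {
    const tokens = company.name.toLowerCase().split(/[^a-z0-9]+/);
    const score = tokens.filter((t) => rootTokens.has(t)).length;
    if (score > bestScore) {
      best = company;
      bestScore = score;
    }
  }
  return best;
}

// Earlier rules short-circuit later ones.
function route(interaction, ctx) {
  const from = interaction.sender.toLowerCase();
  const everyone = [from, ...interaction.recipients.map((r) => r.toLowerCase())];
  const senderDomain = domainOf(from);

  if (everyone.every((a) => domainOf(a) === INTERNAL_DOMAIN)) return { action: "drop", reason: "internal" };
  if (ctx.domainBlocklist.has(senderDomain)) return { action: "drop", reason: "blocklist" };
  if (ctx.isSkipped(interaction)) return { action: "drop", reason: "skip_rule" };
  if (everyone.some((a) => ctx.doNotTrack.has(a))) return { action: "drop", reason: "do_not_track" };

  const personId = ctx.personIdByEmail.get(from);
  if (personId !== undefined) return { action: "log", personId };
  const companyId = ctx.companyIdByDomain.get(senderDomain);
  if (companyId !== undefined) return { action: "log", companyId, createPerson: true };
  return { action: "inbox", status: "pending", suggestion: suggestCompany(senderDomain, ctx.companies) };
}
```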

Signature parser (sync/signature-parser.js)

Deterministic regex pass over interactions.summary (the 255-char preview). Extracts:

Findings are written to the matching people row but only into empty fields. The parser never overwrites existing data.
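The fill-only-empty rule can be sketched as a small merge. The field names in the test are placeholders, since the list of extracted fields is not reproduced here.

```javascript
// Returns only the updates to apply: findings land in empty fields, and
// existing values are never overwritten. Whitespace-only counts as empty.
function fillEmptyFields(person, findings) {
  const updates = {};
  for (const [field, value] of Object.entries(findings)) {
    const current = person[field];
    const isEmpty = current === null || current === undefined || String(current).trim() === "";
    if (isEmpty && value) updates[field] = value;
  }
  return updates;
}
```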

Gating

The parser runs as a separate worker tick, processing people with signature_parsed_at IS NULL AND signature_parse_attempts < cap.

Lost-thread detector (sync/lost-thread-detector.js)

Scans interactions for outbound emails that haven't received an inbound reply within a threshold.

Candidate generation

  1. For each external company, take every thread (grouped by source_thread_id) whose latest message is outbound (interactions.direction = 'outbound').
  2. Compute days_stale from the last message.
  3. Classify urgency:
    • emergency if the company has high importance and is in the active or core stage, or if a project linked to this thread has an upcoming key_dates row.
    • normal otherwise.
  4. Generate a content_hash over (sender, recipient, subject, snippet) for dedup.
  5. Insert into lost_thread_candidates if no row exists for this content_hash with status = pending.
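Steps 1 to 4 can be sketched in memory. The threshold is a parameter because its value isn't given here, per-company scoping is omitted, and the company field names and values in classifyUrgency are assumptions rather than the real schema.

```javascript
const crypto = require("node:crypto");

const DAY_MS = 24 * 60 * 60 * 1000;

// Steps 1-2: latest message per thread; keep outbound ones at least `thresholdDays` old.
function findStaleThreads(interactions, thresholdDays, nowMs = Date.now()) {
  const latest = new Map(); // source_thread_id -> most recent interaction
  for (const it of interactions) {
    const prev = latest.get(it.source_thread_id);
    if (!prev || Date.parse(it.date) > Date.parse(prev.date)) latest.set(it.source_thread_id, it);
  }
  const stale = [];
  for (const it of latest.values()) {
    if (it.direction !== "outbound") continue;
    const daysStale = Math.floor((nowMs - Date.parse(it.date)) / DAY_MS);
    if (daysStale >= thresholdDays) stale.push({ ...it, days_stale: daysStale });
  }
  return stale;
}

// Step 3.
function classifyUrgency(company, hasUpcomingKeyDate) {
  const hotStage = company.stage === "active" || company.stage === "core";
  return ((company.importance === "high" && hotStage) || hasUpcomingKeyDate) ? "emergency" : "normal";
}

// Step 4: hashing a JSON array keeps field boundaries unambiguous
// ("ab" + "c" and "a" + "bc" hash differently).
function contentHash({ sender, recipient, subject, snippet }) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify([sender, recipient, subject, snippet]))
    .digest("hex");
}
```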

State machine

Per candidate row:

When the user requests a draft (POST /api/laurelin/lost-threads/:id/draft):

Drafts are never sent. The user copies into Outlook.

Watermarks (sync_watermarks)

(user_id, source, channel) → last_sync_at + last_source_id. Every adapter reads its watermark at the start of a poll cycle and advances it at the end. Idempotent — re-reading the same range produces no duplicates because dedup happens via source_id uniqueness on interactions.

A worker restart never replays history. If the SQLite DB is restored from backup, watermarks rewind with it (which is fine — re-running a small replay produces no duplicates).
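Why a rewound watermark is harmless can be shown with a hypothetical in-memory stand-in for sync_watermarks and the unique source_id constraint. In SQLite, the analogue would be a UNIQUE index on source_id combined with INSERT OR IGNORE.

```javascript
class SyncStore {
  constructor() {
    this.watermarks = new Map();   // "user|source|channel" -> { last_sync_at, last_source_id }
    this.interactions = new Map(); // source_id -> row (the uniqueness constraint)
  }

  key(userId, source, channel) {
    return `${userId}|${source}|${channel}`;
  }

  readWatermark(userId, source, channel) {
    return this.watermarks.get(this.key(userId, source, channel)) ?? null;
  }

  // Insert-or-ignore on source_id: replaying a range never duplicates rows.
  ingest(rows) {
    let inserted = 0;
    for (const row of rows) {
      if (this.interactions.has(row.source_id)) continue;
      this.interactions.set(row.source_id, row);
      inserted += 1;
    }
    return inserted;
  }

  // Called at the end of a poll cycle with the newest row of the batch.
  advanceWatermark(userId, source, channel, lastRow) {
    this.watermarks.set(this.key(userId, source, channel), {
      last_sync_at: lastRow.date,
      last_source_id: lastRow.source_id,
    });
  }
}
```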

Skip rules (sync_skip_rules)

Five rule types:

rule_type          Pattern
sender             Exact email address of a single sender
domain             Exact domain, e.g. mailchimp.com
subject_pattern    Case-insensitive substring, e.g. unsubscribe
source_id_prefix   Prefix on the source's message ID, used to skip a known mailing-list ID range
contact_skip       "Don't log interactions involving this specific person"

Source can be outlook, slack, telegram, or all. Created by team members from the Sync tab; admin can audit and bulk-delete.
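A sketch of rule matching. The rule shape mirrors the table; the interaction fields (including `personIds` for contact_skip) are hypothetical, and case-insensitive sender/domain comparison is this sketch's assumption.

```javascript
// rule = { rule_type, pattern, source }, where source is
// "outlook" | "slack" | "telegram" | "all".
function ruleMatches(rule, it) {
  if (rule.source !== "all" && rule.source !== it.source) return false;
  const p = String(rule.pattern).toLowerCase();
  switch (rule.rule_type) {
    case "sender":
      return (it.sender ?? "").toLowerCase() === p;
    case "domain":
      return (it.sender ?? "").toLowerCase().split("@")[1] === p; // exact, no subdomains
    case "subject_pattern":
      return (it.subject ?? "").toLowerCase().includes(p);
    case "source_id_prefix":
      return (it.source_id ?? "").startsWith(String(rule.pattern)); // IDs are opaque: case-sensitive
    case "contact_skip":
      return (it.personIds ?? []).map(String).includes(String(rule.pattern));
    default:
      return false; // unknown rule types never match
  }
}

const isSkipped = (it, rules) => rules.some((rule) => ruleMatches(rule, it));
```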

Admin setup (one-time per integration)

Outlook

  1. Register an Azure AD app in the tenant.
  2. Redirect URI set to the OAuth callback URL.
  3. API permissions: Mail.Read, Calendars.Read (delegated).
  4. Client ID, client secret, tenant ID written to env: OUTLOOK_CLIENT_ID, OUTLOOK_CLIENT_SECRET, OUTLOOK_TENANT_ID.
  5. Set the config via PUT /api/laurelin/sync/outlook/config or the admin UI.

Slack

  1. Create a Slack app at api.slack.com and add the OAuth scopes listed above.
  2. Configure the Events API webhook URL (must be publicly reachable — currently https://laurelin.valinorinfo.com/api/laurelin/sync/slack/events).
  3. Client ID, client secret, signing secret in env.

Telegram

  1. Send /newbot to @BotFather to create a bot and get its token.
  2. Enable inline mode + Business mode in @BotFather settings.
  3. Token in env: TELEGRAM_BOT_TOKEN.

Notion

  1. Create an internal integration at notion.so/my-integrations.
  2. Grant it read access to the relevant databases.
  3. Token in env: NOTION_API_KEY.
  4. Database IDs configured in the Notion sync settings.
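A startup check can catch missing credentials before the first poll. This is a hypothetical sketch covering only the env vars named in the steps above; Slack is left out because its variable names are not given here.

```javascript
const REQUIRED_ENV = {
  outlook: ["OUTLOOK_CLIENT_ID", "OUTLOOK_CLIENT_SECRET", "OUTLOOK_TENANT_ID"],
  telegram: ["TELEGRAM_BOT_TOKEN"],
  notion: ["NOTION_API_KEY"],
};

// Returns { integration: [missing var names] } for every integration with gaps.
function missingEnv(env = process.env) {
  const missing = {};
  for (const [integration, names] of Object.entries(REQUIRED_ENV)) {
    const gaps = names.filter((name) => !env[name] || env[name].trim() === "");
    if (gaps.length > 0) missing[integration] = gaps;
  }
  return missing;
}
```

A worker could log the result at boot and skip any integration that appears in it.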

When sync goes wrong