Operations

The runbook. SSH, services, deploys, backups, logs, the things you need when something breaks.

For the system-level picture see Architecture.

SSH access

# From the local network (office, on the same LAN as ValinorPC):
ssh [email protected]

SSH keys only — no password auth. For remote SSH (outside the office network), we use Cloudflare's cloudflared access ssh tunnel. Ask Connor for the exact host alias and to add you to the Access policy; setup is a one-time ~/.ssh/config entry per laptop.
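As a rough sketch of what that one-time entry could look like, the helper below prints an ~/.ssh/config stanza that routes SSH through cloudflared with ProxyCommand. The alias ssh.example.com is a placeholder (the real one comes from Connor), and the valinor login is an assumption based on the authorized_keys path used in this runbook.

```shell
# Print an ~/.ssh/config stanza that tunnels SSH through cloudflared.
# $1 = host alias (hypothetical here; ask Connor for the real one).
ssh_stanza() {
  printf 'Host %s\n' "$1"
  printf '  User valinor\n'   # assumed login, not confirmed by the runbook
  # %%h prints a literal %h, which ssh expands to the Host value at connect time.
  printf '  ProxyCommand cloudflared access ssh --hostname %%h\n'
}

# Usage (alias is a placeholder):
#   ssh_stanza ssh.example.com >> ~/.ssh/config
```

After that, ssh ssh.example.com should open a browser sign-in the first time, provided cloudflared is installed on the laptop and your email is in the Access policy.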

Add a new admin: append their public key to ~valinor/.ssh/authorized_keys. Remove via the same file.
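A minimal sketch of the append step, written so that running it twice does not duplicate the key. The function name and the alice.pub filename are illustrative only.

```shell
# Append a public key to an authorized_keys file unless it is already there.
# $1 = path to the .pub file, $2 = path to authorized_keys.
add_admin_key() {
  key=$(cat "$1") || return 1
  touch "$2" && chmod 600 "$2"
  # -x matches whole lines, -F treats the key as a fixed string, not a regex.
  if grep -qxF -- "$key" "$2"; then
    echo "already present"
  else
    printf '%s\n' "$key" >> "$2"
    echo "added"
  fi
}

# Usage, run as the valinor user:
#   add_admin_key /tmp/alice.pub ~/.ssh/authorized_keys
```

Removing an admin is the reverse: delete their line from the same file.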

Services

Two units to know

# API + Laurelin frontend
sudo systemctl status valinor-intra
sudo systemctl restart valinor-intra
journalctl -u valinor-intra -n 100 -f

# Sync worker (Outlook/Telegram/Notion poll + signature parser + lost-thread detector)
sudo systemctl status laurelin-sync
sudo systemctl restart laurelin-sync
journalctl -u laurelin-sync -n 100 -f

deploy.sh restarts these automatically when relevant files change. Manual restart is only needed for diagnosis or after editing a unit file.

Nginx

sudo systemctl status nginx
sudo systemctl reload nginx       # picks up config changes
sudo nginx -t                     # test config before reload

Config: /etc/nginx/sites-available/team-site, symlinked to sites-enabled/. Default site is removed.

Cloudflare Tunnel

The tunnel runs as cloudflared.service and is the only path from the public internet to ValinorPC. Both intra.valinorinfo.com and laurelin.valinorinfo.com route through it. If cloudflared is down, both URLs go dark.

sudo systemctl status cloudflared
sudo systemctl restart cloudflared
journalctl -u cloudflared -n 100 -f

Config at /etc/cloudflared/config.yml. Ingress rules map hostnames to local services:

ingress:
  - hostname: intra.valinorinfo.com
    service: http://localhost:80      # Nginx (proxies /api/* to localhost:3000)
  - hostname: laurelin.valinorinfo.com
    service: http://localhost:80      # same Nginx; CRM lives at /apps/laurelin.html
  - service: http_status:404

Access policies (who can sign in to which hostname) live in the Cloudflare dashboard → Zero Trust → Access → Applications. Both applications currently allow any @valinordigital.com email.

Deploy

How it triggers

A systemd timer, valinor-deploy.timer, runs deploy.sh every minute (unit files live in deploy/). Each run takes a single-instance lock, fetches origin/main, and exits in about a second if nothing changed. When there are new commits it does the full build-and-copy, then restarts services whose files changed.

systemctl status valinor-deploy.timer       # is the schedule live?
systemctl list-timers valinor-deploy.timer  # when does it fire next?

deploy.sh is self-healing: it aborts a stuck rebase from a crashed run, drops conflicted auto-sync data commits (they regenerate from the live server files), and re-checks service staleness on every tick even when no deploy is needed. npm ci only runs when package-lock.json actually changed.
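To make the tick behaviour concrete, here is a simplified sketch of the lock-fetch-compare loop. It is not the real deploy.sh: the lock path, the function names, and the use of flock are assumptions, and the build-and-copy step is left out.

```shell
# Decide what one tick should do from the local and remote SHAs.
# $1 = local SHA, $2 = remote SHA, $3 = "force" to mimic --force.
tick_action() {
  if [ "$1" != "$2" ] || [ "${3:-}" = "force" ]; then
    echo deploy
  else
    echo noop
  fi
}

# One tick: take a non-blocking lock, fetch, then report the action.
# $1 = repo path, $2 = optional "force". LOCK is a hypothetical lock path.
run_tick() {
  repo=${1:-"$HOME/valinor-intra"}
  lock=${LOCK:-/tmp/valinor-deploy.lock}
  (
    # flock -n fails at once if another run holds the lock, so overlapping runs skip.
    flock -n 9 || { echo "another run is active, skipping"; exit 0; }
    git -C "$repo" fetch --quiet origin main || exit 1
    tick_action "$(git -C "$repo" rev-parse HEAD)" \
                "$(git -C "$repo" rev-parse origin/main)" "${2:-}"
  ) 9>"$lock"
}
```

Keeping the decision in a pure function makes the no-op path cheap to reason about: equal SHAs and no force means the tick ends right after the fetch.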

Tailing a deploy

journalctl -u valinor-deploy -n 100 -f

Quiet ticks log a one-line heartbeat (up to date at <commit>); a real deploy logs each step.

Deploy status on the homepage

Every run (success, no-op, or failure) writes /var/www/team-site/deploy-status.json: last run result, last shipped commit, duration, error message if any, and the last 50 deploys. The intranet homepage renders it as a status strip above the footer — green with the last shipped commit when healthy, red with the error when a run fails, and red with a stale-heartbeat warning if no tick has landed for 5+ minutes (timer dead). Clicking the strip opens /deploy.html with the full history table and a Force rebuild button. Raw feed: https://intra.valinorinfo.com/deploy-status.json.
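The stale-heartbeat rule can also be checked from a shell. The sketch below assumes the file's modification time tracks the last tick (every run rewrites the file), so it does not depend on any field names inside the JSON, which this runbook does not list.

```shell
# Succeed (exit 0) when the last tick is at least the threshold old.
# $1 = now, $2 = last write time (both epoch seconds), $3 = threshold in seconds.
heartbeat_stale() {
  [ $(( $1 - $2 )) -ge "${3:-300}" ]
}

# Usage on the host (stat -c %Y is GNU stat):
#   f=/var/www/team-site/deploy-status.json
#   heartbeat_stale "$(date +%s)" "$(stat -c %Y "$f")" && echo "timer looks dead"
```

The default of 300 seconds mirrors the 5-minute threshold the homepage strip uses.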

Deploy from the website

The Force rebuild button on /deploy.html (for stale sites and deploy retries — normal pushes deploy on their own) does POST /api/deploy, which drops data/deploy-requested.flag; the timer's next tick (within 60s) treats it as deploy.sh --force. The API never runs deploy.sh directly because a deploy can restart valinor-intra itself, which would kill a child deploy mid-run.

The API also attempts sudo -n systemctl start --no-block valinor-deploy.service so the deploy starts immediately instead of waiting for the tick. That needs one extra sudoers line on the host:

# /etc/sudoers.d/valinor-deploy (visudo -f), in addition to the restart lines:
valinor ALL=(root) NOPASSWD: /usr/bin/systemctl start --no-block valinor-deploy.service

Without it the button still works, just on the next timer tick.
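The flag-file handoff between the API and the timer can be sketched as follows. Only the flag filename comes from this runbook. The function name is hypothetical, and whether the real script removes the flag before or after the build is an assumption.

```shell
# If the flag exists, consume it and print "--force"; otherwise print nothing.
# $1 = the data directory that holds deploy-requested.flag.
consume_force_flag() {
  flag="$1/deploy-requested.flag"
  if [ -e "$flag" ]; then
    rm -f "$flag"   # consume it so the next tick is a normal run again
    echo "--force"
  fi
}

# Usage inside a tick (illustrative):
#   mode=$(consume_force_flag ~/valinor-intra/data)
#   ~/valinor-intra/deploy.sh $mode
```

Because the API only writes a file, a deploy that restarts valinor-intra cannot take its own parent process down with it.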

Forcing a deploy

~/valinor-intra/deploy.sh            # normal run (no-ops if nothing changed)
~/valinor-intra/deploy.sh --force    # full rebuild even with no new commits

Run as the valinor user. Safe to run while the timer is active — the lock makes overlapping runs skip instead of colliding. Idempotent.

One-time migration from the old cron (done once per host)

The deploy used to be a 5-minute cron line. To switch a host to the timer:

cd ~/valinor-intra && git pull origin main
sudo cp deploy/valinor-deploy.service deploy/valinor-deploy.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now valinor-deploy.timer
crontab -e   # delete the old "*/5 * * * * ... deploy.sh >> ~/deploy.log" line

If the node version on the host changes, update the PATH line in deploy/valinor-deploy.service (it pins the nvm bin dir so the timer never depends on shell profile state).

Rolling back

There's no built-in rollback. Revert the bad commit on main and push; the timer redeploys within a minute:

git log --oneline -10
git revert <bad-sha>
git push origin main

This works from any checkout (your laptop included). Do NOT check out an old SHA on the host — deploy.sh refuses to deploy from anything but main, and a detached HEAD just stalls deploys until someone checks main out again.

Services restart automatically if the revert touches backend files.

Rolling back the database is a different problem: see Backups.

Backups

What's backed up

The SQLite DB, and only the SQLite DB. Everything else is in git.

/var/www/team-site/backups/laurelin/
├── snapshot-YYYYMMDD-HHMMSS.sqlite    ← routine (rotated)
├── pre-deploy-YYYYMMDD-HHMMSS.sqlite  ← pre-restart (kept 30 days)
└── pre-migration-<label>.sqlite       ← labeled, kept indefinitely
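The 30-day retention on pre-deploy snapshots could be enforced with a find one-liner like the sketch below. This is an assumption about how pruning might work, not a copy of the real rotation logic. Labeled pre-migration files are deliberately left alone.

```shell
# Delete pre-deploy snapshots whose age in whole days exceeds 30.
# $1 = backups directory. Prints each file it removes.
prune_pre_deploy() {
  find "$1" -maxdepth 1 -type f -name 'pre-deploy-*.sqlite' -mtime +30 -print -delete
}

# Usage:
#   prune_pre_deploy /var/www/team-site/backups/laurelin
```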

Schedule

Taking a manual snapshot

# Unlabeled (eligible for rotation)
bash ~/valinor-intra/scripts/backup-laurelin.sh

# Labeled (kept indefinitely)
bash ~/valinor-intra/scripts/backup-laurelin.sh pre-migration-add-tags

The script uses SQLite's online .backup command, which is safe under concurrent writes. Do not use cp on the .sqlite file — under WAL mode this can produce a corrupted snapshot.
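For orientation, the core of such a script might look like the sketch below. The function names are hypothetical and the real backup-laurelin.sh may differ. The one part taken from this runbook is the sqlite3 shell's .backup dot-command.

```shell
# Build the snapshot filename: "<label>.sqlite" when a label is given,
# otherwise "snapshot-YYYYMMDD-HHMMSS.sqlite".
snapshot_name() {
  if [ -n "${1:-}" ]; then
    printf '%s.sqlite\n' "$1"
  else
    printf 'snapshot-%s.sqlite\n' "$(date +%Y%m%d-%H%M%S)"
  fi
}

# Take an online backup and print the path of the new snapshot.
# $1 = live DB, $2 = backups directory, $3 = optional label.
snapshot_db() {
  dest="$2/$(snapshot_name "${3:-}")"
  sqlite3 "$1" ".backup '$dest'" && echo "$dest"
}

# Usage:
#   snapshot_db /var/www/team-site/data/laurelin.sqlite \
#               /var/www/team-site/backups/laurelin pre-migration-add-tags
```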

Restoring

bash ~/valinor-intra/scripts/restore-laurelin.sh /path/to/snapshot.sqlite

Stops valinor-intra and laurelin-sync, replaces the live DB with the snapshot, restarts both. Allow ~30 seconds of downtime.

The restore script also handles the -wal and -shm companion files correctly.
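The stop, swap, restart sequence can be pictured roughly as below. This is a sketch, not the contents of restore-laurelin.sh. In particular, deleting the -wal and -shm files is an assumption about what handling them correctly means.

```shell
# Swap a snapshot in for the live DB file. Only safe while both services are stopped.
# $1 = snapshot, $2 = live DB path.
swap_db() {
  [ -f "$1" ] || { echo "no such snapshot: $1" >&2; return 1; }
  # Leftover WAL/SHM files belong to the old DB, not to the snapshot.
  rm -f "$2-wal" "$2-shm"
  # cp is fine here: the snapshot is a finished file and nothing is writing to it.
  cp -- "$1" "$2"
}

# Full restore: stop both services, swap, start them again.
restore_db() {
  sudo systemctl stop valinor-intra laurelin-sync || return 1
  swap_db "$1" "$2"; rc=$?
  sudo systemctl start valinor-intra laurelin-sync   # start even if the swap failed
  return "$rc"
}
```

Starting the services even after a failed swap keeps the old database online instead of leaving the site down.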

Off-host backup

Snapshots live on ValinorPC. If the machine dies, snapshots die with it. The current mitigation: routinely scp the most recent snapshot to a secondary location (Connor's NAS, S3 bucket — TBD as of 2026-05).

This is on the maybe-later list as "set up automated rsync of the backups directory to off-host storage." Until then: when in doubt, scp a snapshot to your own laptop.
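Until the automated copy exists, a small helper can pick the newest routine snapshot to copy by hand. The function name and the scp destination are placeholders.

```shell
# Print the newest routine snapshot in a directory, or fail if there is none.
# Names embed YYYYMMDD-HHMMSS, so sorted order is also chronological order.
latest_snapshot() {
  newest=
  for f in "$1"/snapshot-*.sqlite; do
    [ -e "$f" ] && newest=$f   # globs expand in sorted order, so the last match wins
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage on ValinorPC (destination is a placeholder):
#   scp "$(latest_snapshot /var/www/team-site/backups/laurelin)" you@your-laptop:
```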

Logs

Where to look when:

Symptom                           Log
API 500s, slow responses          journalctl -u valinor-intra -f
Sync stuck, no new interactions   journalctl -u laurelin-sync -f
Deploy not picking up changes     journalctl -u valinor-deploy -f
Nginx serving wrong files         /var/log/nginx/access.log, /var/log/nginx/error.log
Site unreachable                  sudo systemctl status cloudflared nginx valinor-intra
Cloudflare Tunnel failures        journalctl -u cloudflared -f
Cloudflare Access auth issues     Cloudflare dashboard → Zero Trust → Access → Logs

# Past 10 minutes of API logs
journalctl -u valinor-intra --since "10 minutes ago"

# Errors only
journalctl -u valinor-intra -p err -n 100

Troubleshooting

Site not loading

  1. sudo systemctl status cloudflared — tunnel up? If not, restart it.
  2. sudo systemctl status nginx — running?
  3. From the host: curl -I http://localhost (Nginx) and curl -I http://localhost:3000/api/laurelin/companies (Node) — both respond?
  4. From your laptop: do intra.valinorinfo.com and laurelin.valinorinfo.com both fail, or just one? If just one, check the ingress rule for that hostname in /etc/cloudflared/config.yml.
  5. "Access denied" page rather than the site? Cloudflare Access is rejecting your sign-in — check the Access policy in the Cloudflare dashboard.
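The host-side checks (steps 1 to 3) can be bundled into one pass. The sketch below is a convenience wrapper with hypothetical function names. It only reports whether each command succeeded and changes nothing.

```shell
# Run a command quietly and print ok/FAIL next to a label.
check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    printf 'ok    %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
  fi
}

# Steps 1-3 from the list above, run on ValinorPC.
triage() {
  check "cloudflared running" systemctl is-active --quiet cloudflared
  check "nginx running"       systemctl is-active --quiet nginx
  # No -f on curl: any HTTP response counts as "responding", including 401 or 404.
  check "Nginx answers on :80"   curl -sSI --max-time 5 http://localhost
  check "Node answers on :3000"  curl -sSI --max-time 5 http://localhost:3000/api/laurelin/companies
}
```

Running triage prints four lines. The first FAIL is usually the place to start digging.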

Laurelin slow or unresponsive

  1. sudo systemctl status valinor-intra — running? recently restarted?
  2. journalctl -u valinor-intra -n 200 — errors? long requests?
  3. DB locked? Rare with WAL, but possible during a hot snapshot. Wait 60 seconds and retry.
  4. df -h /var/www/team-site — disk full? SQLite WAL can grow if a long-running transaction blocks checkpoints.

Sync inbox not populating

  1. sudo systemctl status laurelin-sync — running?
  2. journalctl -u laurelin-sync -n 200 — token errors? rate limits?
  3. Check sync_watermarks table — is last_sync_at advancing? sqlite3 /var/www/team-site/data/laurelin.sqlite "SELECT * FROM sync_watermarks ORDER BY last_sync_at DESC LIMIT 10"
  4. For Outlook: token refresh failure is the usual culprit. The user needs to reconnect.
  5. For Slack: the Events webhook URL must be reachable. Test by sending a message to a known-mapped channel.

Deploy not happening

  1. journalctl -u valinor-deploy -n 100: what does the last attempt say?
  2. systemctl list-timers valinor-deploy.timer: is the timer scheduled? (On a host that was never migrated, crontab -l shows the old cron entry instead.)
  3. git -C ~/valinor-intra status: clean working tree? Local changes can make the deploy's pull fail.
  4. git -C ~/valinor-intra fetch origin main && git -C ~/valinor-intra log HEAD..origin/main: are there new commits to pull?

Backend file newer than service, but no restart

  1. ~/valinor-intra/deploy.sh 2>&1 | tail -20 — manual run with output.
  2. Check that sudo -n systemctl restart valinor-intra runs without prompting (note that this really restarts the service). If it fails with "a password is required", the NOPASSWD sudoers entry is missing: restore /etc/sudoers.d/valinor-deploy.

Cloudflare Access denying valid users

  1. Cloudflare dashboard → Zero Trust → Access → Applications → laurelin.valinorinfo.com → Policies. Check the email rule includes @valinordigital.com.
  2. Check the user's email is the one they're authenticating with (some team members have personal Google accounts that intercept).
  3. As a last resort, add their specific email to the bypass list (don't leave it there permanently).

"I broke something with my last commit and want to undo"

  1. git -C ~/valinor-intra log --oneline -5 to find the bad SHA.
  2. From the dev machine, push a revert: git revert <sha> && git push origin main.
  3. The next timer tick (within a minute) deploys the revert. Or run ~/valinor-intra/deploy.sh on ValinorPC for an immediate deploy.
  4. If the bad commit included a schema change that ran: restore the DB from the pre-deploy-*.sqlite snapshot taken just before the bad deploy.

Admin sync setup

One-time configuration for new integrations.

Outlook

  1. Azure portal → App registrations → New registration.
  2. Name: Valinor Laurelin Outlook Sync. Single tenant.
  3. Redirect URI: https://laurelin.valinorinfo.com/api/laurelin/sync/outlook/callback.
  4. Certificates & secrets → New client secret. Copy it.
  5. API permissions → Add → Microsoft Graph → Delegated → Mail.Read, Calendars.Read, offline_access.
  6. Grant admin consent.
  7. On ValinorPC:
    sudo systemctl edit valinor-intra
    # Add:
    # [Service]
    # Environment="OUTLOOK_CLIENT_ID=..."
    # Environment="OUTLOOK_CLIENT_SECRET=..."
    # Environment="OUTLOOK_TENANT_ID=..."
    sudo systemctl restart valinor-intra
    
  8. Set the config in Laurelin via the Sync tab admin panel, or PUT /api/laurelin/sync/outlook/config.

Slack

  1. api.slack.com → Create New App → From scratch.
  2. Name: Valinor Laurelin. Workspace: Valinor.
  3. OAuth & Permissions → scopes:
    • User Token Scopes: channels:history, groups:history, im:history, mpim:history, users:read
  4. Event Subscriptions → enable → Request URL: https://laurelin.valinorinfo.com/api/laurelin/sync/slack/events → subscribe to message.channels, message.groups, message.im, message.mpim.
  5. Install to workspace.
  6. Credentials → copy Client ID, Client Secret, Signing Secret.
  7. On ValinorPC: env vars SLACK_CLIENT_ID, SLACK_CLIENT_SECRET, SLACK_SIGNING_SECRET.
  8. Restart valinor-intra.

Telegram

  1. Telegram → @BotFather → /newbot → follow prompts.
  2. Save the bot token.
  3. /setinline, /setjoingroups, /setprivacy per Business mode requirements.
  4. On ValinorPC: env var TELEGRAM_BOT_TOKEN.
  5. Restart laurelin-sync.

Notion

  1. notion.so/my-integrations → New integration → "Valinor Laurelin Sync".
  2. Workspace: select the Valinor workspace.
  3. Capabilities: read content. (Write only if you want bidirectional sync.)
  4. Copy the internal integration token.
  5. In Notion, share the relevant databases with the integration (Share → Add connections).
  6. On ValinorPC: env var NOTION_API_KEY.
  7. Configure the Notion database IDs via the Laurelin Sync tab settings or PUT /api/laurelin/sync/notion/config.

Common knobs

# Force re-render of all docs
node ~/valinor-intra/scripts/render-docs.js /var/www/team-site/docs

# Regenerate the schema + API reference manually
node ~/valinor-intra/scripts/generate-schema-docs.js
node ~/valinor-intra/scripts/generate-api-docs.js

# Tail every relevant log at once
journalctl -u valinor-intra -u laurelin-sync -f

# Quick DB shape check
sqlite3 /var/www/team-site/data/laurelin.sqlite "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"

# Recent sync activity
sqlite3 /var/www/team-site/data/laurelin.sqlite "SELECT source, channel, last_sync_at FROM sync_watermarks ORDER BY last_sync_at DESC"

# Recent interactions (sanity check on sync)
sqlite3 /var/www/team-site/data/laurelin.sqlite "SELECT date, source, type, subject FROM interactions ORDER BY date DESC LIMIT 20"

The "never do this" list