Incident Management

Logship incidents are the shared operating record for alert response, human coordination, ownership, and audit history.

If you only need the short version, here it is: an incident starts in OPEN, moves through active response states like IN_PROGRESS or MITIGATED, and eventually lands in RESOLVED. Every meaningful update is written to history, comments notify the current stakeholders, and the owner/assignee model lets you route work to either a person or a team.

What an incident contains

Each incident stores the core fields below:

Field                   Meaning
incidentId              Stable incident identifier
title                   Required short summary
description             Longer context, impact, or remediation notes
state                   Current lifecycle state
severity                Priority from SEV1 to SEV4
ownerType + ownerId     The accountable owner, either a user or a group
assigneeUserId          Optional individual directly assigned to work the incident
createdByUserId         User who created the incident
createdAt / updatedAt   Lifecycle timestamps
resolvedAt              Timestamp set when the incident is resolved
resolutionNote          Required when resolving
reopenReason            Required when reopening a resolved incident

Lifecycle states

Logship defines four incident states. Three are active; RESOLVED is the closed state:

State         Meaning
OPEN          A new incident that needs triage or a responder
IN_PROGRESS   Active investigation or mitigation is underway
MITIGATED     The immediate impact is contained, but the incident is not fully closed
RESOLVED      The issue is considered fixed and closed out

State transition rules

The backend enforces the allowed transitions. These are the valid moves:

From          Allowed transitions
OPEN          IN_PROGRESS, MITIGATED, RESOLVED
IN_PROGRESS   MITIGATED, RESOLVED
MITIGATED     IN_PROGRESS, RESOLVED
RESOLVED      OPEN

Re-saving the same state is also allowed.

Anything outside that matrix is rejected with a bad request response.
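The matrix above is small enough to express as a lookup table. The following is a minimal sketch rather than the backend's actual code; the names `ALLOWED_TRANSITIONS` and `canTransition` are illustrative and do not come from the Logship source.

```typescript
// Illustrative sketch of the transition matrix; names are hypothetical.
type IncidentState = "OPEN" | "IN_PROGRESS" | "MITIGATED" | "RESOLVED";

const ALLOWED_TRANSITIONS: Record<IncidentState, readonly IncidentState[]> = {
  OPEN: ["IN_PROGRESS", "MITIGATED", "RESOLVED"],
  IN_PROGRESS: ["MITIGATED", "RESOLVED"],
  MITIGATED: ["IN_PROGRESS", "RESOLVED"],
  RESOLVED: ["OPEN"],
};

function canTransition(from: IncidentState, to: IncidentState): boolean {
  // Re-saving the same state is always allowed.
  if (from === to) return true;
  return ALLOWED_TRANSITIONS[from].includes(to);
}
```

A client could run a check like this before submitting a change, but the server response remains the source of truth.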

Notes on resolution and reopening

Two lifecycle changes have stricter rules than the others:

Transition              Requirement
Any state -> RESOLVED   resolutionNote is required
RESOLVED -> OPEN        reopenReason is required

This is enforced in both places that matter:

  • The frontend prompts for the required note in the status-change flow.
  • The backend validates it again before saving.

That means users get a guided UX, but the server is still authoritative.
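As a rough sketch of the note check, the function below returns an error message when a required note is missing. It assumes that whitespace-only notes count as missing and that re-saving an already RESOLVED incident does not demand a fresh note; neither detail is confirmed by this document. The function and type names are hypothetical.

```typescript
// Illustrative sketch; StateChange and checkRequiredNotes are hypothetical names.
type IncidentState = "OPEN" | "IN_PROGRESS" | "MITIGATED" | "RESOLVED";

interface StateChange {
  from: IncidentState;
  to: IncidentState;
  resolutionNote?: string | null;
  reopenReason?: string | null;
}

function isBlank(value: string | null | undefined): boolean {
  // Assumption: whitespace-only text is treated the same as no text.
  return value == null || value.trim().length === 0;
}

// Returns an error message, or null when the change satisfies the note rules.
function checkRequiredNotes(change: StateChange): string | null {
  if (change.to === "RESOLVED" && change.from !== "RESOLVED" && isBlank(change.resolutionNote)) {
    return "Resolution note is required when resolving an incident.";
  }
  if (change.from === "RESOLVED" && change.to === "OPEN" && isBlank(change.reopenReason)) {
    return "Reopen reason is required when reopening a resolved incident.";
  }
  return null;
}
```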

Severity model

Incidents use four severity levels:

Severity   Typical meaning
SEV1       Critical, urgent, highest priority
SEV2       High severity
SEV3       Medium severity and the default
SEV4       Low severity

Severity can be set at creation time or adjusted later. In the current UI, severity changes use explicit increase and decrease actions instead of a freeform dropdown.
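The increase and decrease actions can be pictured as a single step along an ordered list. This sketch assumes that "increase" means more severe (toward SEV1) and that the actions stop at SEV1 and SEV4 rather than wrapping around; the document does not state either detail, and `stepSeverity` is a hypothetical name.

```typescript
// Illustrative sketch; stepSeverity is a hypothetical helper.
type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

// Ordered from most to least severe.
const SEVERITY_ORDER: readonly Severity[] = ["SEV1", "SEV2", "SEV3", "SEV4"];

function stepSeverity(current: Severity, direction: "increase" | "decrease"): Severity {
  const index = SEVERITY_ORDER.indexOf(current);
  // Assumption: "increase" moves toward SEV1 and "decrease" toward SEV4.
  const next = direction === "increase" ? index - 1 : index + 1;
  // Clamp so the result never leaves the SEV1..SEV4 range.
  const clamped = Math.min(Math.max(next, 0), SEVERITY_ORDER.length - 1);
  return SEVERITY_ORDER[clamped];
}
```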

Ownership and assignment

Logship intentionally separates ownership from assignment.

Owner

Every incident must have an owner. The owner can be:

  • A user
  • A group

This is represented as:

  • ownerType: USER or GROUP
  • ownerId: the ID of that user or group

Assignee

An incident can also have an optional assigneeUserId.

This is useful when a team owns the incident overall, but one person is directly driving the work. It also works for user-owned incidents if you want accountability and execution to be represented separately.

Validation rules

The backend validates ownership before saving:

  • Group owners must exist in the account.
  • User owners must be members of the account.
  • Assignees must be members of the account.

If any of those checks fail, the save is rejected.
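The three checks can be sketched against an in-memory view of the account. Everything here is illustrative: `AccountDirectory`, `validateOwnership`, and the error codes are hypothetical, and the real backend presumably looks these records up in its own store.

```typescript
// Illustrative sketch; all names and error codes are hypothetical.
type OwnerType = "USER" | "GROUP";

interface AccountDirectory {
  groupIds: ReadonlySet<string>;  // groups that exist in the account
  memberIds: ReadonlySet<string>; // users who are members of the account
}

interface OwnershipInput {
  ownerType?: OwnerType;
  ownerId?: string;
  assigneeUserId?: string | null;
}

type OwnershipError =
  | "OWNER_REQUIRED"
  | "GROUP_OWNER_NOT_FOUND"
  | "USER_OWNER_NOT_MEMBER"
  | "ASSIGNEE_NOT_MEMBER";

function validateOwnership(input: OwnershipInput, account: AccountDirectory): OwnershipError[] {
  const errors: OwnershipError[] = [];
  if (!input.ownerType || !input.ownerId) {
    errors.push("OWNER_REQUIRED");
  } else if (input.ownerType === "GROUP" && !account.groupIds.has(input.ownerId)) {
    errors.push("GROUP_OWNER_NOT_FOUND");
  } else if (input.ownerType === "USER" && !account.memberIds.has(input.ownerId)) {
    errors.push("USER_OWNER_NOT_MEMBER");
  }
  // The assignee is optional, so only check it when one is supplied.
  if (input.assigneeUserId && !account.memberIds.has(input.assigneeUserId)) {
    errors.push("ASSIGNEE_NOT_MEMBER");
  }
  return errors;
}
```

An empty result means the save may proceed; any entry means it would be rejected.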

Practical assignment patterns

These are all valid:

Pattern                         Example
Group-owned, no assignee        The on-call group owns the incident collectively
Group-owned, user assigned      The platform team owns it, one responder drives it
User-owned, no assignee         A single person owns and handles it
User-owned, different assignee  One person remains accountable while another executes

Notifications

Notifications are sent to the people currently responsible for the incident.

Who gets notified

When Logship builds the recipient list, it includes:

  • The assignee, if there is one
  • Every member of the owner group, if the owner is a group
  • The owner user, if the owner is a user

Recipients are de-duplicated before delivery.
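The recipient rules above can be sketched with a `Set` to handle de-duplication. This is an illustration, not the notification service's code; `buildRecipients` and the `groupMembers` lookup are hypothetical.

```typescript
// Illustrative sketch; buildRecipients and groupMembers are hypothetical.
type OwnerType = "USER" | "GROUP";

interface IncidentRouting {
  ownerType: OwnerType;
  ownerId: string;
  assigneeUserId?: string | null;
}

// groupMembers maps a group ID to the user IDs of its members.
function buildRecipients(
  incident: IncidentRouting,
  groupMembers: ReadonlyMap<string, readonly string[]>,
): string[] {
  const recipients = new Set<string>();
  if (incident.assigneeUserId) recipients.add(incident.assigneeUserId);
  if (incident.ownerType === "GROUP") {
    for (const userId of groupMembers.get(incident.ownerId) ?? []) {
      recipients.add(userId);
    }
  } else {
    recipients.add(incident.ownerId);
  }
  // A Set ignores repeated IDs, so an assignee who is also in the owner
  // group is only notified once.
  return Array.from(recipients);
}
```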

When notifications are sent

Notifications are published for these event types:

Event type          When it happens
INCIDENT_CREATED    A new incident is created
INCIDENT_UPDATED    A non-comment update changes the incident
INCIDENT_RESOLVED   The state changes to RESOLVED
INCIDENT_REOPENED   A resolved incident is reopened back to OPEN
COMMENT_ADDED       A new comment is added

Two alert-related history events exist as well:

Event type                 Meaning
ALERT_TRIGGERED_INCIDENT   An alert created the incident flow
ALERT_LINKED_INCIDENT      An alert was linked to an existing incident

In the alert flow, the standard INCIDENT_CREATED notification is sent only when the flow actually creates a new incident. Linking an alert to an incident that already exists does not send it.

Notification content

Incident notifications are formatted as markdown and include:

  • A heading that explains the action
  • Incident ID
  • Title
  • State
  • Severity
  • Owner
  • Assignee
  • Updated timestamp
  • Description
  • Optional additional details, such as comment text

For comment notifications, the comment body is included as its own markdown section.

Important delivery behavior

Notification delivery is best effort.

If the messenger service is unavailable, the incident update still succeeds. Logship logs a warning, but it does not roll back the incident save just because notification publishing failed.
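The best-effort behavior amounts to saving first and then wrapping the publish step in its own error handling. The sketch below is synchronous for brevity and takes the save, publish, and warn steps as injected functions; the real service is likely asynchronous, and `saveAndNotify` is a hypothetical name.

```typescript
// Illustrative sketch; saveAndNotify and its parameters are hypothetical.
interface Incident {
  incidentId: string;
  title: string;
}

function saveAndNotify(
  incident: Incident,
  save: (incident: Incident) => void,
  publish: (incident: Incident) => void,
  warn: (message: string) => void,
): void {
  // A failed save still propagates to the caller as usual.
  save(incident);
  try {
    publish(incident);
  } catch (error) {
    // Publishing failed: log a warning and keep the saved incident.
    const reason = error instanceof Error ? error.message : String(error);
    warn(`Notification publish failed for incident ${incident.incidentId}: ${reason}`);
  }
}
```

The key design point is the ordering: the save is committed before the publish is attempted, so there is nothing to roll back when the messenger is down.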

Comments

Comments are the running human conversation attached to an incident.

Each comment stores:

  • commentId
  • authorUserId
  • body
  • createdAt

Comment rules

  • Comment body is required.
  • Empty or whitespace-only comments are rejected.
  • Adding a comment writes both the comment itself and a matching history event.
  • Adding a comment also sends a COMMENT_ADDED notification to the current stakeholders.
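A minimal in-memory sketch of the first three rules follows. It rejects blank bodies and writes the comment together with a matching COMMENT_ADDED history event. The `IncidentLog` shape, the `addComment` name, and the error text are assumptions for illustration; notification publishing is left out.

```typescript
// Illustrative sketch; IncidentLog and addComment are hypothetical.
interface IncidentComment {
  commentId: string;
  authorUserId: string;
  body: string;
  createdAt: string;
}

interface HistoryEntry {
  eventType: string;
  actorId: string;
  timestamp: string;
}

interface IncidentLog {
  comments: IncidentComment[];
  history: HistoryEntry[];
}

function addComment(log: IncidentLog, comment: IncidentComment): void {
  // Empty and whitespace-only bodies are rejected before anything is written.
  if (comment.body.trim().length === 0) {
    throw new Error("Comment body is required.");
  }
  log.comments.push(comment);
  // Every stored comment gets a matching history event.
  log.history.push({
    eventType: "COMMENT_ADDED",
    actorId: comment.authorUserId,
    timestamp: comment.createdAt,
  });
}
```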

History and audit trail

Every incident maintains a history stream so responders can review what changed and when.

Each history event includes:

Field          Meaning
eventId        Unique history event identifier
eventType      What kind of event occurred
actorType      Usually USER, sometimes SYSTEM
actorId        The user or system actor associated with the event
timestamp      Exact time of the event
beforeJson     Previous state value when applicable
afterJson      New state value when applicable
metadataJson   Structured change details

History event types you will see

Event type                 What it represents
INCIDENT_CREATED           Incident creation
INCIDENT_UPDATED           A save that changed fields but did not resolve or reopen
INCIDENT_RESOLVED          Resolution event
INCIDENT_REOPENED          Reopen event
COMMENT_ADDED              New comment
ALERT_TRIGGERED_INCIDENT   Incident originated from alert automation
ALERT_LINKED_INCIDENT      Alert was linked to an existing incident

Metadata format

The history metadata is stored as a compact semicolon-delimited string. A typical example looks like this:

owner=INCIDENT_OWNER_TYPE_GROUP:8b4e...;changes=State:OPEN->IN_PROGRESS|Severity:SEV3->SEV2|Assignee:00000000-0000-0000-0000-000000000000->1f5f...;reopenReason=Alert firing again after temporary recovery

The changes= segment can include:

  • State:OLD->NEW
  • Severity:OLD->NEW
  • Owner:OLDTYPE:OLDID->NEWTYPE:NEWID
  • Assignee:OLDID->NEWID
  • Title:updated
  • Description:updated

If the incident was reopened, reopenReason=... is also appended.
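If you need to read this string in tooling, a small parser is enough. The sketch below is based only on the format described here and assumes that values never contain `;` or `|`. A free-text reopen reason could break that assumption, so treat this as a starting point. `parseHistoryMetadata` and the result types are hypothetical names.

```typescript
// Illustrative sketch; parser and type names are hypothetical.
interface FieldChange {
  field: string;
  from?: string;
  to?: string;
  note?: string; // used for entries such as "Title:updated"
}

interface ParsedMetadata {
  owner?: string;
  reopenReason?: string;
  changes: FieldChange[];
}

function parseChange(entry: string): FieldChange {
  // Split on the first colon only: owner values contain colons of their own.
  const colon = entry.indexOf(":");
  if (colon < 0) return { field: entry };
  const field = entry.slice(0, colon);
  const detail = entry.slice(colon + 1);
  const arrow = detail.indexOf("->");
  if (arrow < 0) return { field, note: detail };
  return { field, from: detail.slice(0, arrow), to: detail.slice(arrow + 2) };
}

function parseHistoryMetadata(metadata: string): ParsedMetadata {
  const parsed: ParsedMetadata = { changes: [] };
  for (const segment of metadata.split(";")) {
    const eq = segment.indexOf("=");
    if (eq < 0) continue; // skip segments without a key
    const key = segment.slice(0, eq).trim();
    const value = segment.slice(eq + 1);
    if (key === "owner") {
      parsed.owner = value;
    } else if (key === "reopenReason") {
      parsed.reopenReason = value;
    } else if (key === "changes") {
      parsed.changes = value
        .split("|")
        .filter((entry) => entry.length > 0)
        .map((entry) => parseChange(entry));
    }
  }
  return parsed;
}
```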

UI timeline behavior

The current incident details page turns history into a chronological lifecycle timeline instead of a flat log.

In the UI, responders see:

  • The exact timestamp for each event
  • A T+ elapsed offset from incident creation
  • Stronger emphasis for state changes and severity changes
  • Secondary formatting for less critical changes like title or description edits

This makes post-incident review much easier without losing the original timestamps.
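The T+ offset is simply the event timestamp minus the incident's createdAt. Here is one way to compute and format it. The `HH:MM:SS` layout is an assumption, since this document does not specify how the UI renders the offset, and `formatElapsedOffset` is a hypothetical name.

```typescript
// Illustrative sketch; the T+HH:MM:SS layout is an assumption.
function formatElapsedOffset(createdAt: string, eventTimestamp: string): string {
  const elapsedMs = Date.parse(eventTimestamp) - Date.parse(createdAt);
  // Guard against clock skew producing a negative offset.
  const totalSeconds = Math.max(0, Math.floor(elapsedMs / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `T+${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}
```

For an incident created at 2026-03-21T11:00:00Z, an event at 2026-03-21T11:14:00Z would format as `T+00:14:00`.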

Alert integration

Logship can create or link incidents from alerts.

Creating an incident from an alert

The alert-driven creation flow does a few important things together:

  1. It creates the incident if one does not already exist for that alert.
  2. It creates an alert-to-incident link.
  3. It writes the alert runtime state as ACTIVE.
  4. It records a history event describing the alert linkage.
  5. If a new incident was created, it sends the normal incident-created notification.

Deterministic incident IDs

If the caller does not supply an incidentId, the backend builds one deterministically from the account ID and alert ID. That helps prevent duplicate incident creation for the same alert flow.
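This document does not describe how the backend derives the ID, so the following only illustrates the general idea: hash the two inputs and format the result, so the same account and alert always yield the same ID. The SHA-256 choice, the separator, and the UUID-style grouping are all assumptions, and the output is not a standards-compliant UUID.

```typescript
// Illustrative sketch only; the real derivation is not documented here.
import { createHash } from "node:crypto";

function deterministicIncidentId(accountId: string, alertId: string): string {
  // Same inputs always produce the same digest, and therefore the same ID.
  const hex = createHash("sha256").update(`${accountId}:${alertId}`).digest("hex");
  // Format the first 32 hex characters in an 8-4-4-4-12 grouping.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}
```

Because a retried request computes the same ID, it targets the same incident instead of creating a second one.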

Alert links carry a linkSource value:

  • Default alert-created links use alert-triggered
  • Manual links default to manual

Permissions

At the product level, the incident system uses these account permissions:

Permission                 Intended use
Logship.Incident.Viewer    Read incident data
Logship.Incident.Editor    Create and edit incidents
Logship.Incident.Manager   Higher-trust incident operations

Global and account admins also participate through broader admin permissions.

Read behavior

Read operations explicitly allow:

  • Global admin
  • Account admin
  • Incident viewer
  • Incident editor
  • Incident manager

Write behavior

The UI only enables editing actions for users with incident editor or incident manager access. Database-side permission checks also gate write operations, so authenticated access alone is not enough to persist changes.
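The read and UI-edit rules can be sketched as two allow-lists. The three `Logship.Incident.*` names come from the table above; `GlobalAdmin` and `AccountAdmin` are placeholder names, because this document does not give the admin permission identifiers. The sketch only covers what the UI enables and says nothing about how admins are treated for writes.

```typescript
// Illustrative sketch; the admin permission names are placeholders.
type Permission =
  | "GlobalAdmin"  // placeholder name
  | "AccountAdmin" // placeholder name
  | "Logship.Incident.Viewer"
  | "Logship.Incident.Editor"
  | "Logship.Incident.Manager";

const READ_PERMISSIONS: readonly Permission[] = [
  "GlobalAdmin",
  "AccountAdmin",
  "Logship.Incident.Viewer",
  "Logship.Incident.Editor",
  "Logship.Incident.Manager",
];

const UI_EDIT_PERMISSIONS: readonly Permission[] = [
  "Logship.Incident.Editor",
  "Logship.Incident.Manager",
];

function hasAny(granted: readonly Permission[], allowed: readonly Permission[]): boolean {
  return granted.some((permission) => allowed.includes(permission));
}

function canRead(granted: readonly Permission[]): boolean {
  return hasAny(granted, READ_PERMISSIONS);
}

function canEditInUi(granted: readonly Permission[]): boolean {
  return hasAny(granted, UI_EDIT_PERMISSIONS);
}
```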

API surface

These are the main incident-related endpoints:

Incidents

GET    /accounts/{accountId}/incidents
GET    /accounts/{accountId}/incidents/{incidentId}
PUT    /accounts/{accountId}/incidents/{incidentId}
DELETE /accounts/{accountId}/incidents/{incidentId}

Comments

GET  /accounts/{accountId}/incidents/{incidentId}/comments
POST /accounts/{accountId}/incidents/{incidentId}/comments

History

GET /accounts/{accountId}/incidents/{incidentId}/history

Alert integration

POST   /accounts/{accountId}/incidents/from-alert/{alertRecordId}
GET    /accounts/{accountId}/alert-runtime-states
PUT    /accounts/{accountId}/alert-runtime-states/{alertRecordId}
GET    /accounts/{accountId}/alert-incident-links
PUT    /accounts/{accountId}/alert-incident-links/{alertRecordId}/incidents/{incidentId}
DELETE /accounts/{accountId}/alert-incident-links/{alertRecordId}/incidents/{incidentId}

Example incident payload

{
  "accountId": "5e7bc3d8-6452-4cbf-860d-657a2c0e8b22",
  "incidentId": "7da5368d-3a3f-499b-84b6-01df8c8d0df2",
  "title": "Checkout API latency spike",
  "description": "Elevated p99 latency on checkout requests after deploy.",
  "state": "IN_PROGRESS",
  "severity": "SEV2",
  "ownerType": "GROUP",
  "ownerId": "fe5d0fd6-4b0f-4dd8-bb20-89be0a9194d2",
  "assigneeUserId": "fe6f1d13-ff6d-4c99-8cb3-514b9390e40d",
  "createdByUserId": "d2fdf216-32a7-42c1-97f4-8b4f8daa95f7",
  "createdAt": "2026-03-21T11:00:00Z",
  "updatedAt": "2026-03-21T11:14:00Z",
  "resolvedAt": null,
  "resolutionNote": null,
  "reopenReason": null
}
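For client code, the payload above maps naturally onto a typed interface. The sketch below also builds the request for resolving an incident through the PUT endpoint. It assumes the endpoint accepts the full incident object and that the server sets resolvedAt and updatedAt itself; neither is confirmed here. `buildResolveRequest` is a hypothetical helper and performs no network call.

```typescript
// Illustrative sketch; buildResolveRequest is a hypothetical helper.
interface IncidentPayload {
  accountId: string;
  incidentId: string;
  title: string;
  description: string;
  state: "OPEN" | "IN_PROGRESS" | "MITIGATED" | "RESOLVED";
  severity: "SEV1" | "SEV2" | "SEV3" | "SEV4";
  ownerType: "USER" | "GROUP";
  ownerId: string;
  assigneeUserId: string | null;
  createdByUserId: string;
  createdAt: string;
  updatedAt: string;
  resolvedAt: string | null;
  resolutionNote: string | null;
  reopenReason: string | null;
}

interface PutRequest {
  method: "PUT";
  path: string;
  body: IncidentPayload;
}

function buildResolveRequest(incident: IncidentPayload, resolutionNote: string): PutRequest {
  return {
    method: "PUT",
    path: `/accounts/${incident.accountId}/incidents/${incident.incidentId}`,
    // Copy the incident rather than mutating the caller's object.
    body: { ...incident, state: "RESOLVED", resolutionNote },
  };
}
```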

Common operating patterns

New human-created incident

  1. Create the incident in OPEN.
  2. Set a meaningful owner immediately.
  3. Add an assignee if one person is driving the response.
  4. Move to IN_PROGRESS once active work starts.
  5. Use comments for ongoing updates.
  6. Move to MITIGATED once impact is contained.
  7. Move to RESOLVED with a clear resolution note.

Reopening an incident

Use reopen only when the incident is truly back in play. When you reopen:

  • The state returns to OPEN
  • A reopen reason is required
  • A dedicated history event is written
  • Stakeholders are notified that the incident reopened

Group ownership with direct execution

One of the best-supported patterns is:

  • Owner = team group
  • Assignee = primary responder

That gives you both team accountability and clear execution ownership.

Troubleshooting and gotchas

"Resolution note is required when resolving an incident."

You tried to resolve without resolutionNote. Add the note and save again.

"Reopen reason is required when reopening a resolved incident."

You tried to move from RESOLVED back to OPEN without reopenReason.

"Invalid incident state transition"

The requested transition is outside the allowed matrix. Check the lifecycle table above.

"Incident owner is required."

Every incident must have a valid owner before it can be saved.

"Incident group owner ... does not exist."

The selected group owner is missing or invalid for the account.

"Incident user owner ... is not a member of this account."

The selected owner user is not a valid account member.

"Incident assignee ... is not a member of this account."

The assignee must belong to the same account.

Notifications did not arrive

Incident saves do not fail just because notification delivery failed. If the incident update succeeded but no notification arrived, check messenger availability and the service logs for the publish warning.

Implementation references

If you need to trace the behavior in code, these are the most important files:

  • src\Logship\Service\Services\IncidentManager\IncidentManagerWebController.cs
  • src\Logship\Service\Services\IncidentManager\IncidentAssignmentNotificationService.cs
  • src\Logship\Service\Services\IncidentManager\Models\IncidentApiModels.cs
  • src\Logship\Host\ConsoleHost\Apis\Backend\Incidents.cs
  • src\Logship\App\fe-react\ClientApp\src\routes\incidents\edit\incidents-edit.tsx
  • src\Logship\App\fe-react\ClientApp\src\services\incidents\incidentservice.ts
  • src\Logship\Service\Services\Accounts\Permissioning\AuthPermissions.cs

If you are changing lifecycle rules, notification semantics, or history metadata, start with the controller and notification service first. Those two files define most of the behavior users actually experience.