System Engineering Quick Reference

1) (Almost) Universal Troubleshooting Method

OSI Model: Physical → Data Link → Network → Transport → Session → Presentation → Application (layers 1-7).

Troubleshooting Checklist

Scope + safety: impact, severity, recent change.
Define symptom: down vs slow; exact error.
Reproduce: local + internal + external vantage points.
Quick triage: CPU/mem/disk/network.
Dependencies: DNS/AD/DB/storage/certs/other services.
Service/process: is it running & listening on port?
Logs: journal/Event Viewer/app logs.
Security/config: firewall, SELinux, permissions, certs.
Fix low risk first: restart/rollback/canary patch.
Verify + monitor + document: multi-vantage validation.

Validate → Contain → Fix → Verify → Document

2) RHEL Core Commands

Services (systemd)

Manage and troubleshoot system services (daemons).

systemctl status httpd        # Check service status and recent logs
systemctl start|stop|restart httpd   # Control service runtime state
systemctl enable|disable httpd       # Control auto-start at boot
journalctl -u httpd -b        # View logs for httpd since last boot
journalctl -xe                # View recent system errors

Packages

Install, update, remove, and verify software packages.

dnf install httpd             # Install package
dnf update                    # Update all packages
dnf remove httpd              # Remove package
dnf list installed            # List installed packages
rpm -qa | grep httpd          # Query installed RPMs

Network

Check IP configuration, listening ports, and connectivity.

ip addr                       # Show IP addresses
ss -tulnp                     # Show listening ports + processes
ping host                     # Test connectivity
curl -I http://site           # Test HTTP response headers

Firewall / SELinux

Security enforcement and traffic filtering.

firewall-cmd --list-all       # Show firewall rules
firewall-cmd --add-service=http --permanent
firewall-cmd --add-service=https --permanent
                               # Open HTTP/HTTPS ports
firewall-cmd --reload         # Apply firewall changes

getenforce                    # Check SELinux mode (Enforcing/Permissive/Disabled)

Disk / Files

Check storage usage, mounts, and permissions.

df -h                         # Disk usage (human readable)
df -i                         # Inode usage (file count limit)
du -sh /path                  # Directory size
lsblk                         # Block devices and disks
mount                         # Mounted filesystems
chmod 755 file                # Change permissions
chown user:group file         # Change ownership

⚠ If disk space looks fine but files cannot be created, check df -i — you may be out of inodes.
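
The space and inode checks above can be scripted. This is a minimal sketch, assuming POSIX-format output from df -P or df -iP (use% in column 5, mount point in column 6, no spaces in mount points). The function name over_threshold and the default limit of 90 are illustrative choices, not part of any standard tool.

```shell
#!/usr/bin/env bash
# over_threshold: read `df -P` or `df -iP` output on stdin and print the
# mount points whose Use% (column 5) is at or above a limit.
# The name and the default limit (90) are illustrative assumptions.
over_threshold() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    NR > 1 {                      # skip the header line
      pct = $5
      sub(/%/, "", pct)           # "97%" -> "97"
      # ignore rows where the percentage is "-" (no inode data)
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit)
        print $6, pct "%"
    }'
}

# Usage:
#   df -P  | over_threshold 90    # filesystems low on space
#   df -iP | over_threshold 90    # filesystems low on inodes
```

Running both forms side by side makes the "space looks fine but inodes are gone" case visible in one pass.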

Performance & Troubleshooting

Quick performance and slowdown analysis tools.

top                           # Live CPU/memory usage per process
htop                          # Enhanced interactive top (if installed)
free -h                       # Memory usage (RAM + swap)
vmstat 1                      # CPU, memory, IO every 1 sec
iostat -xz 1                  # Disk IO stats (requires sysstat)
uptime                        # Load average
ps aux --sort=-%cpu           # Top CPU consumers
ps aux --sort=-%mem           # Top memory consumers

Slow server checklist:
• High load? → uptime
• CPU maxed? → top
• Memory exhausted? → free -h
• Disk bottleneck? → iostat -xz 1
• Too many files? → df -i
• Swap usage high? → vmstat

Remember: df = disk space.
free -h = memory.
load average ≠ CPU %; on Linux it counts runnable processes plus those in uninterruptible (usually I/O) wait.
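
One way to make a load average comparable across servers is to divide it by the CPU count. The sketch below assumes that framing; load_per_core is a hypothetical helper name, and "around 1.00 per core means saturated" is a rule of thumb, not a hard limit.

```shell
#!/usr/bin/env bash
# load_per_core: divide a load average by the number of CPUs.
# $1 = load average (e.g. the first field of /proc/loadavg)
# $2 = CPU count (e.g. the output of nproc)
load_per_core() {
  local load="$1" cpus="$2"
  awk -v l="$load" -v c="$cpus" 'BEGIN { printf "%.2f\n", l / c }'
}

# Live usage on Linux:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A result well above 1.00 with low CPU usage in top usually points back at I/O wait rather than compute.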

3) Linux Disk Expansion

First questions to ask:

Is it under Logical Volume Manager (LVM)? What filesystem (XFS vs ext4)? Is it a VM disk expansion or a new disk being added?

A) If an existing VM disk was expanded (same disk grew)

# confirm OS sees new size
lsblk
df -h

# If partition needs resize (common in VMs):
# (Depending on tooling; may use growpart if installed)
# growpart /dev/sda 2

# If LVM is used:
pvs; vgs; lvs
pvresize /dev/sda2               # (example PV partition)
lvextend -l +100%FREE /dev/vg0/lv_data
# Filesystem grow (choose one)
xfs_growfs /mountpoint           # XFS
resize2fs /dev/vg0/lv_data       # ext4

B) If adding a NEW disk and using LVM

lsblk
# create PV
pvcreate /dev/sdb
# add PV to VG
vgextend vg0 /dev/sdb
# extend LV
lvextend -l +100%FREE /dev/vg0/lv_data
# grow FS
xfs_growfs /mountpoint     # XFS
# OR
resize2fs /dev/vg0/lv_data # ext4

Note: Take a backup / snapshot where policy allows, then do the lowest-risk change first (cleanup space if possible).
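
The XFS-versus-ext4 choice in both procedures can be captured in a small helper. This is a sketch only: grow_fs_cmd is a hypothetical name, and it prints the command instead of running it so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# grow_fs_cmd: print (not run) the filesystem grow command for a given type.
# $1 = filesystem type, $2 = mount point, $3 = LV device path
grow_fs_cmd() {
  local fstype="$1" mountpoint="$2" device="$3"
  case "$fstype" in
    xfs)            echo "xfs_growfs $mountpoint" ;;  # XFS is grown via its mount point
    ext2|ext3|ext4) echo "resize2fs $device" ;;       # ext* is grown via the device
    *)              echo "unsupported filesystem: $fstype" >&2; return 1 ;;
  esac
}

# Usage (findmnt reports the filesystem type of a mount point):
#   grow_fs_cmd "$(findmnt -no FSTYPE /mountpoint)" /mountpoint /dev/vg0/lv_data
```

As an alternative, lvextend accepts -r (--resizefs) to grow the filesystem in the same step, for example lvextend -r -l +100%FREE /dev/vg0/lv_data.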

4) Puppet Reference

Architecture & Concepts

Puppet Server compiles catalog (desired state).
Agent applies catalog on interval (commonly ~30 min) or manual run.
Idempotent: safe to re-run; fixes drift only if needed.
Drift: manual changes get reverted to policy.

Common Tasks Puppet Manages

packages, services, files, users, cron, ssh, firewall

Basic Manifest Patterns

# Ensure Apache installed and running
package { 'httpd':
  ensure => installed,
}

service { 'httpd':
  ensure => running,
  enable => true,
}

Agent Commands

puppet agent --test
systemctl status puppet
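
When a manual run is scripted, the exit code matters: puppet agent --test turns on detailed exit codes, so a non-zero code does not always mean failure. The helper below is a sketch of how those codes could be labeled; puppet_rc_meaning is a hypothetical name.

```shell
#!/usr/bin/env bash
# puppet_rc_meaning: translate the exit code of `puppet agent --test`
# (which enables --detailed-exitcodes) into a short label.
puppet_rc_meaning() {
  case "$1" in
    0) echo "no changes" ;;
    1) echo "run failed or another run in progress" ;;
    2) echo "changes applied" ;;
    4) echo "resource failures" ;;
    6) echo "changes applied with failures" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage (on a node with the agent installed):
#   rc=0; puppet agent --test || rc=$?
#   puppet_rc_meaning "$rc"
```

Capturing the code with || rc=$? matters in scripts that use set -e, since exit code 2 (changes applied) would otherwise abort the script.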

Talking Points (High Score)

Role-based node classification (web/app/db).
Central change = fleet-wide consistency.
Reports show compliance + failed runs.

5) PowerShell Quick Reference

How to think (works for every script):

1) Define input (server names? service name?) → 2) Get data → 3) Filter → 4) Act (restart/copy/disable) → 5) Log + handle errors
Use Get-Help and Get-Command to confirm syntax.

Help & Discovery (first stop when you’re stuck)

Find the right cmdlet and confirm parameters/examples.

Get-Command *service*               # Search commands by name
Get-Help Restart-Service -Full      # Full help + examples
Get-Help Invoke-Command -Examples   # Fast examples
Get-Service | Get-Member             # Inspect object properties/methods (needs piped input)

PowerShell is object-based (not plain text). Pipe passes objects, so you can filter on real properties.

Core Cmdlets (with what they’re for)

# Services (check/control Windows services)
Get-Service                          # List services
Get-Service -Name Spooler            # One service by name
Restart-Service -Name Spooler        # Restart service
Start-Service -Name Spooler          # Start service
Stop-Service -Name Spooler           # Stop service

# Processes (CPU/memory consumers, hung tasks)
Get-Process                          # List processes
Stop-Process -Id 1234 -Force         # Kill by PID (force)

# Files & content (basic file operations)
Get-ChildItem                        # List files/dirs (like dir/ls)
Get-Content .\file.txt               # Read file
Set-Content .\file.txt "text"        # Overwrite file
Add-Content .\file.txt "more text"   # Append to file

# Events (Windows event logs)
Get-WinEvent -LogName System -MaxEvents 50  # Recent events in a log

# Connectivity (ICMP ping)
Test-Connection server1 -Count 2     # Ping test

Common “Admin Reality” Cmdlets

These come up constantly in enterprise troubleshooting.

# System info / performance quick checks
Get-ComputerInfo                      # High-level OS/system info
Get-CimInstance Win32_OperatingSystem # Memory, OS details
Get-CimInstance Win32_LogicalDisk     # Disk free space
Get-Counter '\Processor(_Total)\% Processor Time' -SampleInterval 1 -MaxSamples 5

# Networking quick checks
Get-NetIPConfiguration                # IP/DNS/gateway
Test-NetConnection server1 -Port 445  # Test TCP port (SMB example)
Get-DnsClientServerAddress            # DNS servers configured

# Windows updates (varies by org tooling, but common conceptually)
Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 10  # Most recently installed updates/hotfixes

Pipeline patterns (this is where PowerShell “clicks”)

Think: Get → Where (filter) → Select (shape output) → Sort / Group → Do something

# Filter objects (Where-Object) by a property
Get-Service | Where-Object Status -eq 'Stopped'

# Select only fields you want
Get-Service -Name Spooler | Select-Object Name, Status, StartType

# Sort and take top items
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10 Name, Id, CPU

# Export results (reporting)
Get-Service | Select-Object Name, Status | Export-Csv .\services.csv -NoTypeInformation

Variables, arrays, and loops (simple patterns you can reuse)

# Variable
$svcName = 'Spooler'

# Array
$servers = @('srv1','srv2')

# Foreach loop
foreach ($s in $servers) {
  "$s - checking $svcName"
}

Remote basics (WinRM / PowerShell Remoting)

Run commands on remote servers without logging in interactively.

# Run one command remotely
Invoke-Command -ComputerName server1 -ScriptBlock { Get-Service Spooler }

# Use local variables inside ScriptBlock with $using:
$svcName = 'Spooler'
Invoke-Command -ComputerName server1 -ScriptBlock { Get-Service -Name $using:svcName }

Common “Performance Exam” scenario: Check a service across servers and fix it

Pattern: check → if wrong → remediate → report.

$servers = @('srv1','srv2')
$svcName = 'Spooler'

$results = foreach ($s in $servers) {
  try {
    $svc = Invoke-Command -ComputerName $s -ErrorAction Stop -ScriptBlock {
      Get-Service -Name $using:svcName
    }

    if ($svc.Status -ne 'Running') {
      Invoke-Command -ComputerName $s -ErrorAction Stop -ScriptBlock {
        Restart-Service -Name $using:svcName -ErrorAction Stop
      }
      $action = 'Restarted'
    } else {
      $action = 'NoChange'
    }

    [pscustomobject]@{
      Server = $s
      Service = $svcName
      Status  = $svc.Status
      Action  = $action
    }
  }
  catch {
    [pscustomobject]@{
      Server = $s
      Service = $svcName
      Status  = 'Unknown'
      Action  = "Error: $($_.Exception.Message)"
    }
  }
}

$results | Format-Table -AutoSize

Notes:
• try/catch prevents one dead server from killing the whole script.
• [pscustomobject] makes clean, exportable output.
• Add | Export-Csv to save results.

Mini “cheat sheet” of the most useful operators

-eq  equal            -ne  not equal
-gt  greater than     -lt  less than
-like wildcard match  -match regex match
-and logical AND      -or  logical OR
-not negation

Safe scripting habits (small things that prevent big mistakes)

# Preview changes when supported
Restart-Service -Name Spooler -WhatIf

# Fail fast inside try/catch
Get-Service -Name DoesNotExist -ErrorAction Stop

6) Windows Troubleshooting – Systems Engineer Notes

Core OS Checks

Event Viewer – System, Application, Security logs.
Services – status, dependencies, startup type, service accounts.
DNS – A/CNAME/SRV records; verify forward & reverse lookup.
Permissions – NTFS vs Share permissions (Effective Access).
Patching – Recent updates, KB correlation, rollback plan.

Performance & Resource Analysis

CPU – sustained >80%? Check process via Task Manager / Get-Process.
Memory – paging, commit charge, memory leaks.
Disk – latency (>20ms concern), queue depth, IOPS.
Network – packet loss, NIC errors, port saturation.
PerfMon – collect counters for trend analysis.

Active Directory / Authentication Flow

Check domain controller connectivity.
Verify time synchronization (Kerberos sensitive).
Validate DNS SRV records for DC discovery.
Review account lockouts & replication health.
Use dcdiag and repadmin for DC health.

Group Policy Troubleshooting

Understand LSDOU processing order.
Check Resultant Set of Policy (rsop.msc / gpresult).
Validate SYSVOL replication.

Networking Stack

ipconfig /all (DNS, gateway, DHCP).
Test-Connection / tracert for reachability.
netstat -ano (port conflicts, listening services).
Firewall rules & Windows Defender status.

High Availability Awareness

Cluster service status.
Quorum configuration.
Failover event logs.
Storage presentation (LUN visibility, MPIO).

PowerShell (Enterprise-Level Useful Cmdlets)


Get-Service
Get-WinEvent
Test-Connection
Get-Process
Get-Counter
Get-ADUser
dcdiag
repadmin /replsummary
gpresult /r

7) Systems Thinking

Reflect on:

"What's connected?" - "What causes what?" - "Where are the feedback loops?" - "What happens next if I feed this?"

A) Deploy & Manage 50 Servers (RHEL + Puppet)

Standardize build (gold image/template) + register to repos/subscription.
Install Puppet agent on all nodes; classify by role (web/app/db).
Manifests enforce packages, services, configs, firewall/SELinux baseline.
Central logging + monitoring; alerting for services, disk, CPU/mem.
Patch lifecycle: staged/canary → rollout; track compliance.
Backups/DR integration + runbooks; verify restores.
Document and manage lifecycle (build → operate → retire).

B) Critical Vulnerability on Public-Facing Servers

Validate: CVE details, exploitability, affected versions.
Contain: WAF/firewall restrictions, disable vulnerable service if needed.
Patch: canary first, then rollout (automation where applicable).
Verify: rescan, version check, regression test.
Monitor & document: logs, lessons learned, update runbooks.

C) SAN / Storage Latency Spike (keywords)

latency (ms), IOPS, throughput, queue depth, controller utilization, multipath, snapshot/replication load, fabric congestion.

8) Bash Scripting

Rinse and repeat:

1) Define input (hostnames? files? service?) → 2) Get data → 3) Filter (grep/awk/sed) → 4) Act (restart/copy/kill) → 5) Exit codes + logging
Bash is text + exit codes: 0 = success, non-zero = failure.

Help & Discovery (when you forget syntax)

Find usage, examples, and what flags mean.

man systemctl         # Manual page
command --help        # Quick help
type -a ls            # What command you’re actually running (alias/binary)
which curl            # Where binary lives
echo $?               # Exit code of last command (0 = OK)
set -x                # Debug: print commands as they execute
set +x                # Stop debug

Core commands you’ll script constantly

# Files / navigation
pwd                   # Current directory
ls -lah               # List (human, all, long)
cd /path
cp -av src dst        # Copy with attributes + verbose
mv -v old new
rm -i file            # Interactive delete
mkdir -p /a/b/c       # Create nested dirs

# Viewing files
cat file              # Dump file
less file             # Page through file
head -n 50 file
tail -n 50 file
tail -f /var/log/messages   # Follow logs live

# Search / filter
grep -R "text" /path  # Search recursively
grep -i "error" file  # Case-insensitive
awk '{print $1,$3}' file
cut -d: -f1 /etc/passwd
sort file | uniq -c

# Permissions
chmod 755 file
chown user:group file

Variables, quoting, and command substitution (most common mistakes)

name="httpd"
echo "$name"          # Use double-quotes to preserve spaces
echo '$name'          # Single quotes do NOT expand variables

now="$(date +%F_%H%M%S)"  # Command substitution
echo "$now"

path="/var/log/messages"
echo "$path"

Rule of thumb: always quote variables like "$var" unless you explicitly want word-splitting.
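
A short demonstration of why the quoting rule matters, assuming the default IFS. The helper count_args and the sample filename are made up for illustration.

```shell
#!/usr/bin/env bash
# count_args: print how many arguments it received.
count_args() { echo "$#"; }

file="my report.txt"   # hypothetical filename containing a space

count_args $file       # unquoted: the shell splits on the space, prints 2
count_args "$file"     # quoted: stays one argument, prints 1
```

The same splitting is what turns rm $file into an attempt to delete two different paths.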

Conditionals (if/elif/else) and file tests

# Common file tests:
# -f file exists and is a regular file
# -d directory exists
# -r readable, -w writable, -x executable
# -z string is empty

file="/etc/ssh/sshd_config"

if [[ -f "$file" ]]; then
  echo "Found $file"
else
  echo "Missing $file"
fi

Loops (for / while) you can reuse

# For over a list
servers=("srv1" "srv2" "srv3")
for s in "${servers[@]}"; do
  echo "Checking $s"
done

# While read lines from a file
while IFS= read -r line; do
  echo "Line: $line"
done < servers.txt
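
A common pitfall with the while-read pattern is that a command inside the loop that reads stdin (ssh is the usual culprit) swallows the rest of the file, so only the first host gets processed. The sketch below redirects stdin for each call; for_each_host is a hypothetical helper name, and it also skips blank lines and # comments.

```shell
#!/usr/bin/env bash
# for_each_host: run a command once per host listed in a file.
# $1 = hosts file; remaining arguments = command (the host is appended).
for_each_host() {
  local file="$1" host
  shift
  # "|| [ -n "$host" ]" also handles a last line with no trailing newline
  while IFS= read -r host || [ -n "$host" ]; do
    case "$host" in ''|'#'*) continue ;; esac   # skip blanks and comments
    "$@" "$host" < /dev/null                    # keep the command away from the loop's stdin
  done < "$file"
}

# Usage:
#   for_each_host servers.txt ssh -o BatchMode=yes -o ConnectTimeout=5
```

With ssh specifically, the -n option has the same effect as the /dev/null redirect.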

Functions + safe mode (professional baseline)

This prevents silent failures and makes scripts predictable.

#!/usr/bin/env bash
set -euo pipefail

log() { echo "[$(date +%F\ %T)] $*"; }

log "Starting script"

set -e: stop on error • set -u: error on unset vars • set -o pipefail: a pipeline fails if any command in it fails
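
The pipefail part is the least obvious of the three, so here is a small demonstration, assuming bash. The helper name pipeline_status is illustrative; it runs false | true in a subshell so the option does not leak into the calling shell.

```shell
#!/usr/bin/env bash
# pipeline_status: print the exit status of `false | true`.
# $1 = "on" to enable pipefail, anything else to leave it off.
pipeline_status() {
  (
    if [ "$1" = on ]; then set -o pipefail; else set +o pipefail; fi
    rc=0
    false | true || rc=$?   # capture the status without tripping set -e
    echo "$rc"
  )
}

pipeline_status off   # prints 0: only the last command's status counts
pipeline_status on    # prints 1: the failing first command is reported
```

Without pipefail, a pattern like some_command | tee log.txt reports success even when some_command failed.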

Common pipeline patterns (grep/awk/sed)

# Find top CPU processes
ps aux --sort=-%cpu | head -n 10

# Find top memory processes
ps aux --sort=-%mem | head -n 10

# Show listening ports
ss -tulnp

# Count failed SSH logins (example paths vary)
grep -i "failed password" /var/log/secure | wc -l

# Extract column examples
df -h | awk '{print $1,$5,$6}'

Remote basics (SSH non-interactive)

Run a command on a remote host without logging in manually.

ssh user@server1 "hostname; uptime; systemctl is-active httpd"

Performance / “server is slow” triage (Linux)

Fast checks to decide if it’s CPU, RAM, disk, or network.

uptime                 # Load average (runnable + waiting)
top                    # CPU/mem live per process
free -h                # RAM + swap usage
vmstat 1               # CPU run queue, swapping, IO
iostat -xz 1           # Disk latency/utilization (sysstat)
df -h                  # Disk space
df -i                  # Inodes (file-count exhaustion)
ss -s                  # Socket summary
dmesg -T | tail -n 50  # Kernel messages (hardware/IO errors)

Quick reads:
• High load with low CPU → often IO wait (disk/network).
• High swap usage → memory pressure; look for top RSS processes.
• Full inodes (df -i) → “No space left on device” even when disk isn’t full.

Practical scenario script: check a service on many servers and fix it

Pattern: check → remediate → report.

#!/usr/bin/env bash
set -euo pipefail

servers=("srv1" "srv2" "srv3")
svc="httpd"

for s in "${servers[@]}"; do
  echo "=== $s ==="
  if ssh "$s" "systemctl is-active --quiet $svc"; then
    echo "$svc is running"
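  else
    echo "$svc is NOT running - restarting..."
    # "|| echo" keeps set -e from aborting the loop when one host fails
    ssh "$s" "sudo systemctl restart $svc && systemctl is-active $svc" \
      || echo "restart FAILED on $s" >&2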
  fi
done
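
A variation on the same pattern that emits one report line per host, similar in spirit to the PowerShell version earlier. This is a sketch under assumptions: the function names remote and check_service and the CSV-style output are illustrative, and key-based SSH plus passwordless sudo are assumed on the targets.

```shell
#!/usr/bin/env bash
# remote: run a command on a host over SSH. BatchMode avoids password
# prompts and ConnectTimeout keeps one dead host from hanging the loop.
remote() { ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" "$2"; }

# check_service: print "host,service,action" for one host.
# $1 = host, $2 = service name
check_service() {
  local host="$1" svc="$2"
  if remote "$host" "systemctl is-active --quiet $svc"; then
    echo "$host,$svc,NoChange"
  elif remote "$host" "sudo systemctl restart $svc"; then
    echo "$host,$svc,Restarted"
  else
    # restart failed, or the host was unreachable
    echo "$host,$svc,Error"
  fi
}

# Usage:
#   for s in srv1 srv2 srv3; do check_service "$s" httpd; done > report.csv
```

Keeping the SSH call in its own function makes the logic easy to exercise with a stub before pointing it at real servers.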

Text processing “cheat sheet” (most useful one-liners)

grep -i "error" file              # Find lines matching text
grep -v "pattern" file            # Exclude lines
awk '{print $1}' file             # Print column 1
awk -F: '{print $1,$3}' /etc/passwd # Use delimiter :
sed 's/old/new/g' file            # Replace text
wc -l file                        # Count lines

Safe scripting habits (avoid self-inflicted outages)

# Prompt for confirmation before each delete
rm -i /path/file

# Use a dry-run pattern (manual, but effective)
echo "Would run: systemctl restart httpd"

# Guardrails for critical variables
: "${TARGET:?TARGET is required}"   # Fails if TARGET is empty/unset
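
The manual dry-run idea can be wrapped in a reusable function. A minimal sketch, assuming bash: the run helper and the DRY_RUN variable are illustrative names, and defaulting to dry-run is a deliberate (but optional) safety choice.

```shell
#!/usr/bin/env bash
# run: execute a command, or only print it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "Would run: $*"
  else
    "$@"
  fi
}

# Usage:
#   DRY_RUN=1 run systemctl restart httpd   # prints the command only
#   DRY_RUN=0 run systemctl restart httpd   # actually restarts the service
```

Prefixing every state-changing command in a script with run gives a full preview of a change before it is executed for real.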

9) Scenario Playbooks (Open-Book Speed Runs)

Exam-safe structure (write this every time):

Scope/Impact → Recent change → Quick triage (CPU/Mem/Disk/Net) → Service status → Logs → Firewall/SELinux/Perms → Fix lowest-risk → Verify from 2+ vantage points → Document

A) Apache site not loading after config change (RHEL)

# 1) Is Apache running?
systemctl status httpd

# 2) Is it listening?
ss -tulnp | grep -E ':(80|443)[[:space:]]'   # anchored so :8080 or :4430 do not match

# 3) Syntax/config test (fast root cause)
apachectl -t
# common variants:
# httpd -t

# 4) Logs (most important)
journalctl -u httpd -b
# also check:
# /var/log/httpd/error_log  (common on RHEL)

# 5) Firewall + SELinux
firewall-cmd --list-all
getenforce

# 6) Rollback / revert last change if needed
# (restore known-good config, then restart)
systemctl restart httpd

B) Service fails after patching

Check service: systemctl status <svc>
Logs: journalctl -u <svc> -b + app logs
Dependencies: DB up? DNS? certs? ports? permissions?
Config drift: compare with known-good config; use Puppet to enforce baseline if applicable.
Rollback plan: snapshot/restore or downgrade package (only if policy allows).
Verify: service active + port listening + functional check (curl).

C) “Server is slow” (Linux performance triage in 90 seconds)

uptime            # load average (runnable + waiting)
top               # CPU/mem per process
free -h           # RAM + swap pressure
df -h             # disk space
df -i             # inode exhaustion ("No space left" even when disk isn't full)
ss -s             # socket summary (connection spikes)
dmesg -T | tail -n 30   # kernel / IO errors

D) Disk full vs Inodes full vs Memory pressure (common confusion)

Disk full

df -h
du -sh /* | sort -h
# cleanup logs/temp, then consider expansion

Inodes full

df -i
# too many small files → cleanup/rotate

Memory pressure

free -h
top
# high swap usage → find top RSS processes

Disk expansion “decision tree”

Cleanup possible? do that first
VM disk grew or new disk added?
LVM or not?
XFS or ext4?

E) Public can’t reach service but internal can

Local: service running + listening (systemctl/ss)
Host firewall: firewalld open for 80/443
SELinux: enforcing blocks? check audit/journal
DNS: public record correct?
Perimeter: NAT/LB/WAF rules, upstream firewall

F) Puppet: drift + compliance (what to write if asked)

Idempotency: Puppet enforces desired state; reverts unauthorized changes.
Detect: Puppet reports/failed runs + last run status; investigate who/why.
Correct: fix manifests if change is legitimate; otherwise allow Puppet to remediate drift.
Force run: puppet agent --test

G) Mini Glossary (don’t freeze on acronyms)

LVM: Logical Volume Manager (flexible disk management)
PV: Physical Volume (disk/partition initialized for LVM)
VG: Volume Group (pool of storage made from PVs)
LV: Logical Volume (the “volume” you mount; like a flexible partition)
XFS grow: xfs_growfs /mountpoint (grow online)
ext4 grow: resize2fs /dev/vg/lv
Idempotent: safe to run repeatedly; only changes what’s out of compliance
Drift: config changes made outside automation; Puppet fixes it

H) “If I don’t know” fallback (still score points)

Write this:

“I’m not fully certain of the exact command syntax, but my approach is: confirm scope and recent changes, check service status and logs, validate firewall/SELinux/permissions, apply lowest-risk remediation, then verify and document. If open-book is allowed, I’d confirm the specific command with man/Get-Help.”

10) Cloud & Hybrid (Azure-Focused + AWS Equivalents)

Cloud Design Mindset:

Identity first, network segmentation, least privilege, high availability, logging, backup/DR, and cost awareness. Always mention MFA, RBAC, monitoring, and encryption.

A) Service Models – Know the Differences

IaaS: Virtual Machines (Azure VM) (AWS: EC2) – full OS control, patching required.
PaaS: Managed app/database platforms (Azure App Service, Azure SQL) (AWS: Elastic Beanstalk, RDS).
SaaS: Hosted applications (Microsoft 365) (AWS: WorkSpaces / third-party SaaS).

B) Identity & Access Management

Microsoft Entra ID (formerly Azure AD) (AWS: IAM + IAM Identity Center, formerly AWS SSO).
Conditional Access policies (AWS: IAM policies + MFA enforcement).
Role-Based Access Control (RBAC) (AWS: IAM Roles & Policies).
Privileged Identity Management (PIM) (AWS: IAM role assumption + temporary credentials).
Azure AD Connect (Hybrid identity sync) (AWS: AD Connector / AWS Managed Microsoft AD).

C) Hybrid Connectivity

Azure VPN Gateway (AWS: Site-to-Site VPN).
Azure ExpressRoute (AWS: Direct Connect).
Virtual Networks (VNet) (AWS: VPC).
Subnets + NSGs (AWS: Subnets + Security Groups).
Route tables (AWS: Route Tables).

D) Compute (High Availability Focus)

Azure Virtual Machines (AWS: EC2).
Availability Sets (AWS: spread placement groups).
Availability Zones (AWS: Availability Zones).
VM Scale Sets (auto scale) (AWS: Auto Scaling Groups).
Azure Load Balancer / Application Gateway (AWS: ELB / ALB).

E) Storage

Azure Managed Disks (AWS: EBS).
Azure Files (SMB shares) (AWS: EFS / FSx).
Azure Blob Storage (AWS: S3).
Storage replication: LRS / ZRS / GRS (AWS, roughly: S3 One Zone-IA / S3 Standard (multi-AZ) / Cross-Region Replication).

F) Backup & Disaster Recovery

Azure Backup (AWS: AWS Backup).
Azure Site Recovery (replication) (AWS: Elastic Disaster Recovery).
Recovery Services Vault (AWS: Backup Vault).
Define RPO and RTO.
Test restores regularly.

G) Security & Monitoring

Network Security Groups (NSG) (AWS: Security Groups).
Azure Firewall (AWS: AWS Network Firewall).
Web Application Firewall (WAF) (AWS: AWS WAF).
Defender for Cloud (AWS: Security Hub / GuardDuty).
Azure Monitor + Log Analytics (AWS: CloudWatch + CloudTrail).
Microsoft Sentinel (SIEM) (AWS: Security Hub + OpenSearch / third-party SIEM).

H) Hybrid Architecture Answer Template

If asked: “Design a hybrid Azure solution”

1) Define requirements (availability, compliance, RPO/RTO).
2) Integrate identity (Azure AD Connect / hybrid sync).
3) Secure connectivity (VPN or ExpressRoute).
4) Deploy compute in Availability Zones or use PaaS where appropriate.
5) Segment network (subnets + NSGs).
6) Implement backup & replication.
7) Enable monitoring + security baseline.
8) Document lifecycle and cost management.

11) Active Directory (Enterprise Quick Reference)

High-Scoring Mindset:

Identity, authentication, least privilege, replication health, and GPO enforcement are core to AD stability.

A) Core Components

Domain Controller (DC) – Authentication + directory services.
Forest – Security boundary.
Domain – Logical grouping of objects.
Organizational Units (OUs) – Used for delegation + GPO targeting.
Global Catalog – Partial attribute store for forest-wide searches.

B) FSMO Roles (Know These)

Schema Master
Domain Naming Master
RID Master
PDC Emulator
Infrastructure Master

If authentication issues involve lockouts, recent password changes, or time skew → check the PDC Emulator first.

C) Authentication Flow

Client contacts DC.
Kerberos ticket issued.
Access granted based on group membership + ACLs.

D) Troubleshooting AD Authentication

Check DC availability.
Verify DNS resolution.
Check time sync (Kerberos rejects tickets when clocks differ by more than 5 minutes by default).
Review Event Viewer on DC.
Check replication status.

E) Replication Health

repadmin /replsummary
repadmin /showrepl
dcdiag

F) Group Policy (GPO)

Order: Local → Site → Domain → OU.
Use gpresult /r to verify applied policies.
Block inheritance carefully.

G) Security Best Practices

Least privilege.
Separate admin accounts.
MFA for privileged roles.
Tiered administration model.
Audit logging enabled.

H) If Asked: “Users Can’t Log In”

1) Verify DC reachable.
2) Confirm DNS resolution.
3) Check time sync.
4) Review Event Viewer logs.
5) Validate replication health.
6) Verify account not locked/expired.

12) DNS & Certificates (Records + TLS Quick Reference)

High-Scoring Mindset:

If authentication or web services fail, DNS is often the root cause. Certificates are common causes of production outages.

A) Common DNS Record Types

A Record – Maps hostname to IPv4 address.
AAAA Record – IPv6 equivalent.
CNAME – Alias to another hostname.
MX – Mail routing record.
SRV – Service locator (critical for AD).
TXT – SPF/DKIM/verification records.
PTR – Reverse lookup.

B) DNS Troubleshooting

nslookup hostname
Resolve-DnsName hostname   # PowerShell
dig hostname               # Linux (if available)

Verify record exists.
Check TTL.
Flush cache if needed.
Ensure correct DNS server configured.

C) AD & DNS Relationship

AD heavily relies on SRV records.
If DNS fails → authentication fails.
DCs must register properly in DNS.

D) Certificates (TLS / SSL)

Ensure certificate matches hostname (CN or SAN).
Check expiration date.
Verify certificate chain (intermediate CA).
Ensure private key present.

E) Common Certificate Issues

Expired certificate.
Wrong hostname.
Missing intermediate CA.
Service not bound to correct certificate.

F) Quick Certificate Checks

# Windows (PowerShell)
Get-ChildItem Cert:\LocalMachine\My

# Linux
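openssl s_client -connect site:443 -servername site < /dev/null   # -servername sends SNI; shows the chain the server presents
openssl s_client -connect site:443 -servername site < /dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates   # Subject, issuer, validity dates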
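
To turn the expiration check into a number that can be alerted on, the end date can be converted to days remaining. This is a sketch that assumes GNU date (the -d option; BSD/macOS date differs). days_left is a hypothetical helper name, and "site" is the same placeholder host used above.

```shell
#!/usr/bin/env bash
# days_left: whole days between "now" and a certificate end date.
# $1 = date string as printed by `openssl x509 -noout -enddate`,
#      without the leading "notAfter="
# $2 = optional "now" in epoch seconds (defaults to the current time)
days_left() {
  local end_epoch now_epoch
  end_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch="${2:-$(date -u +%s)}"
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Usage against a live endpoint:
#   end=$(openssl s_client -connect site:443 -servername site < /dev/null 2>/dev/null \
#           | openssl x509 -noout -enddate | cut -d= -f2)
#   days_left "$end"
```

A negative result means the certificate has already expired.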

G) If Website Shows Security Warning

1) Check certificate expiration.
2) Verify CN/SAN matches URL.
3) Validate certificate chain.
4) Confirm service binding to correct cert.
5) Restart service after fix.