Site Data Dictionary

CloudCore Networks publishes a small, deliberately structured data portfolio so that analytics, audit and strategy units can work with realistic, internally consistent Australian business data. Every dataset here is synthetic and fictional — it describes the fictional CloudCore company only — but the relationships inside the data are designed: there is real signal to find, and none of it is a single-column trick.

Important: Reproducibility

All seven files are generated by a single seeded Python script (scripts/generate_site_data.py, SEED = 20260629). Re-running the script produces byte-identical CSVs. Currency throughout is AUD. Region codes are Australian states/territories (WA, NSW, VIC, QLD, SA). Named products are DataVault, Analytics Pro and CloudSync. Dates are ISO YYYY-MM-DD inside CSVs.
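
The byte-identical claim can be checked by hashing the CSVs before and after a regeneration run. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `checksum_manifest` are illustrative and are not part of the generator script.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so the whole file never has to sit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(data_dir):
    """Map each CSV file name in data_dir to its SHA-256 digest."""
    return {
        path.name: sha256_of_file(path)
        for path in sorted(Path(data_dir).glob("*.csv"))
    }
```

Build one manifest, re-run the generator, build a second manifest, and compare the two dictionaries with `==`; any difference means the output is no longer byte-identical.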

Portfolio overview

| File | Rows | Domain | Primary courses |
| --- | --- | --- | --- |
| cloudcore_customers.csv | 400 | Customer master & churn | ISYS6014, ISYS6020, ISYS6018 |
| cloudcore_sales.csv | 180 | Regional product sales 2024–2026 | ISYS6018, ISYS6020 |
| cloudcore_support_tickets.csv | 400 | Operational ticket log | ISYS6014, ISYS6018 |
| cloudcore_reviews.csv | 250 | Customer reviews (NLP) | ISYS6014, ISYS6020 |
| budget_2026.csv | 12 | Quarterly budget (breach-era) | ISYS6018, ISYS6020 |
| cost_analysis_2026.csv | 13 | Itemised cost analysis | ISYS6018, ISYS6020 |
| financial_forecast_2026.csv | 12 | Monthly forecast with breach impact | ISYS6018, ISYS6020 |

Note: Canon anchors in the data
  • Company: CloudCore Networks Pty Ltd, founded 2010. HQ Perth (11 Newcastle St) + primary DC Malaga WA (4 Millrose Dr) + Sydney DR site (100 Harris St, Pyrmont NSW 2009).
  • Scale: ~500 clients, ~47 staff. (The staff-count figure is deliberately inconsistent with the ~250 that appears in other CloudCore artefacts — do not “correct” it.)
  • The breach: detected 12 September 2025; public coverage 13–18 September 2025; ~250,000 customer records affected.
  • “Now” in the dataset’s timeline is mid-2026.

1. cloudcore_customers.csv

The customer master table. Each row is one CloudCore client with their commercial profile and a binary churn flag for the current period. Overall churn sits at ~22%.

| Column | Type | Description |
| --- | --- | --- |
| customer_id | string (PK) | CC0001–CC0400. Primary key. |
| company_name | string | Fabricated Australian-sounding company name. |
| industry | string (15 cats) | e.g. Healthcare, Finance & Insurance, Mining & Resources. |
| region | string (5 cats) | WA / NSW / VIC / QLD / SA. |
| tenure_months | integer | Months as a customer, 1–96. |
| products | string | One or more of DataVault / Analytics Pro / CloudSync, joined by + (e.g. DataVault + CloudSync). |
| monthly_recurring_revenue | float (AUD) | MRR, ~$450–$60,000. Scales with tier and product count. |
| support_tier | string (4 cats) | Basic / Standard / Premium / Enterprise. |
| support_tickets_6m | integer | Support tickets in the trailing 6 months, 0–18. |
| satisfaction_score | float | 1.0–5.0 (one decimal). |
| churn_flag | integer (0/1) | 1 = churned this period. Base rate ~22%. |

Tip: Designed signal (churn is a combination, not a lookup)

churn_flag is a noisy logistic blend of several features. It correlates with region (WA highest — the breach zone), support tier (Basic churns most, Enterprise/Premium least), tenure (longer customers stay), ticket volume (more tickets → more churn) and one weak product. No single column separates churners from non-churners — a model that combines features will clearly beat any one-variable rule. Good targets: logistic regression, decision trees, feature-importance analysis, segment-level churn tables.
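
A segment-level churn table is the simplest of the targets listed above. The sketch below computes the churn rate per value of any grouping column using the standard `csv` module; the helper name `churn_rate_by` is illustrative, and the sample rows are invented for demonstration rather than taken from the real file.

```python
import csv
import io
from collections import defaultdict


def churn_rate_by(rows, column):
    """Return {segment: churn rate} for the given grouping column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        churned[key] += int(row["churn_flag"])
    return {key: churned[key] / totals[key] for key in totals}


# Invented sample rows, for illustration only (not real CloudCore data).
SAMPLE = """customer_id,region,support_tier,churn_flag
CC0001,WA,Basic,1
CC0002,WA,Premium,0
CC0003,NSW,Basic,0
CC0004,NSW,Enterprise,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_region = churn_rate_by(rows, "region")
by_tier = churn_rate_by(rows, "support_tier")
```

On the real file, replace the `io.StringIO(SAMPLE)` source with an open handle on cloudcore_customers.csv. Comparing the tables for several columns side by side shows why no single column is enough on its own.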

Note: Exercises this serves
  • ISYS6014 — customer segmentation, churn prediction, feature engineering.
  • ISYS6020 — retention business case: what is a customer worth, and what does a 1-point churn reduction save in MRR?
  • ISYS6018 — breach-impact lens: is the breach-zone region (WA) over-represented in churners?

2. cloudcore_sales.csv

Quarterly product revenue by region for 2024, 2025 and 2026 (12 quarters × 5 regions × 3 products = 180 rows).

| Column | Type | Description |
| --- | --- | --- |
| region | string | WA / NSW / VIC / QLD / SA. |
| product | string | DataVault / Analytics Pro / CloudSync. |
| quarter | string | Q1–Q4. |
| year | integer | 2024, 2025 or 2026. |
| revenue_aud | float (AUD) | Recognised revenue for that region×product×quarter. |
| units_sold | integer | Approximate unit volume (revenue ÷ list unit price). |
| sales_rep | string | Named account manager for the region. |
| customer_segment | string | Enterprise / Mid-Market / SMB. |

Tip: Designed signal (the breach dip)

Revenue grows steadily through 2024 and into 2025, then shows a clear V-shaped dip after the September 2025 breach: a sharp fall in 2025 Q4 (the first full quarter after the breach, which was detected late in 2025 Q3), continued depression in 2026 Q1, and recovery from 2026 Q2 onwards. The dip is deeper in WA (the breach region) than in other states. Plot total revenue_aud over year + quarter (or by region) and the shape is unmistakable. Useful for time-series decomposition, anomaly detection, and “what did the breach cost in revenue?” quantification.
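
Before plotting, the series has to be totalled per quarter and put in chronological order. The following is a minimal sketch of that aggregation, assuming the quarter column holds labels of the form Q1 to Q4; the helper name `revenue_by_quarter` and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def revenue_by_quarter(rows):
    """Sum revenue_aud per (year, quarter), returned in chronological order."""
    totals = defaultdict(float)
    for row in rows:
        # Assumes labels such as "Q3"; keep the number so sorting is chronological.
        key = (int(row["year"]), int(row["quarter"].lstrip("Q")))
        totals[key] += float(row["revenue_aud"])
    return [(f"{year} Q{q}", totals[(year, q)]) for year, q in sorted(totals)]


# Invented sample rows, for illustration only (not real CloudCore data).
SAMPLE = """region,product,quarter,year,revenue_aud
WA,DataVault,Q3,2025,100.0
NSW,DataVault,Q3,2025,120.0
WA,DataVault,Q4,2025,60.0
NSW,DataVault,Q4,2025,110.0
WA,DataVault,Q1,2026,65.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
series = revenue_by_quarter(rows)
```

Filtering `rows` to a single region before calling the function gives the per-region series needed to compare WA against the other states.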

Note: Exercises this serves
  • ISYS6018 — quantify the breach’s revenue impact (the dip area ≈ lost sales); tie it to the financial files.
  • ISYS6020 — build the recovery case: forecast the rebound, size the addressable revenue at risk.

3. cloudcore_support_tickets.csv

An operational ticket log spanning January 2025 to mid-2026. Each ticket is linked to a customer via customer_id.

| Column | Type | Description |
| --- | --- | --- |
| ticket_id | string (PK) | TK0001…. |
| customer_id | string (FK) | Foreign key → cloudcore_customers.customer_id. |
| date | date (ISO) | Ticket date, 2025-01-01 to 2026-06-29. |
| product | string | Product the ticket concerns. |
| region | string | Customer’s region. |
| channel | string | Email / Phone / Portal / Chat. |
| priority | string | Low / Medium / High / Critical. |
| resolution_hours | float | Hours to resolution. |
| satisfaction_rating | integer | 1–5 (post-resolution CSAT). |
| ticket_text | string | Short free-text note; tone tracks the rating. Built for NLP. |

Tip: Designed signal (post-breach surge + tone)

Two engineered patterns: (1) ticket volume spikes after 12 Sept 2025, peaking in late 2025, with a tilt toward High/Critical priority in WA; and (2) the free-text tone follows the satisfaction rating — low-rated tickets read as frustrated, high-rated as appreciative, with a subset of post-breach WA tickets explicitly referencing the security incident. Use for volume trend analysis, SLA/resolution-time study, and text classification / sentiment labelling where the rating is the ground-truth label.
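
The volume pattern can be checked with two small aggregations: tickets per calendar month, and the share of High/Critical tickets before versus after the detection date. This is a sketch under the assumption that `date` is an ISO YYYY-MM-DD string as documented; the helper names and sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter
from datetime import date

# Detection date as stated in the canon anchors.
BREACH_DETECTED = date(2025, 9, 12)


def tickets_per_month(rows):
    """Count tickets per YYYY-MM, in chronological order."""
    counts = Counter(row["date"][:7] for row in rows)
    return dict(sorted(counts.items()))


def high_priority_share(rows, after_breach):
    """Share of High/Critical tickets before (False) or on/after (True) the breach date."""
    selected = [
        row for row in rows
        if (date.fromisoformat(row["date"]) >= BREACH_DETECTED) == after_breach
    ]
    if not selected:
        return None
    urgent = sum(row["priority"] in ("High", "Critical") for row in selected)
    return urgent / len(selected)


# Invented sample rows, for illustration only (not real CloudCore data).
SAMPLE = """ticket_id,date,priority
TK0001,2025-08-03,Low
TK0002,2025-08-20,High
TK0003,2025-09-15,Critical
TK0004,2025-10-02,High
TK0005,2025-10-09,Medium
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
monthly = tickets_per_month(rows)
share_before = high_priority_share(rows, after_breach=False)
share_after = high_priority_share(rows, after_breach=True)
```

Restricting `rows` to the WA region first would isolate the regional tilt described above.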

Note: Exercises this serves
  • ISYS6014 — text classification & sentiment analysis on ticket_text (rating as label); topic modelling.
  • ISYS6018 — operational audit: did resolution times / priority handling degrade post-breach? Evidence of an overwhelmed support function.

4. cloudcore_reviews.csv

A customer-review corpus of 250 reviews (at most one review per customer, so not every one of the 400 customers appears) for sentiment and product analysis.

| Column | Type | Description |
| --- | --- | --- |
| review_id | string (PK) | RV0001…. |
| customer_id | string (FK) | Foreign key → cloudcore_customers.customer_id. |
| product | string | Product reviewed. |
| region | string | Reviewer’s region. |
| rating | integer | 1–5 star rating. |
| review_text | string | Free-text review; sentiment tracks the rating. |

Tip: Designed signal (a weak product and a breach region)

Review sentiment is lower for one product (CloudSync) and for the WA region (the breach zone), while DataVault and the eastern states skew positive. The rating column is a clean ordinal label for supervised sentiment work; the review_text is the input. Good for lexicon-based vs. learned sentiment comparison, star-rating regression, and product/region cohort analysis.
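
A product/region cohort comparison can start from mean star rating per group. The following is a minimal sketch; the helper name `mean_rating_by` and the sample rows are invented for illustration and do not reflect the real distribution.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_rating_by(rows, column):
    """Return {group: mean rating} for the given grouping column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(float(row["rating"]))
    return {key: mean(values) for key, values in groups.items()}


# Invented sample rows, for illustration only (not real CloudCore data).
SAMPLE = """review_id,product,region,rating
RV0001,CloudSync,WA,2
RV0002,CloudSync,NSW,3
RV0003,DataVault,VIC,5
RV0004,DataVault,WA,4
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_product = mean_rating_by(rows, "product")
by_region = mean_rating_by(rows, "region")
```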

Note: Exercises this serves
  • ISYS6014 — sentiment analysis, rating prediction, NLP preprocessing pipeline.
  • ISYS6020 — product-strategy reading: which product is the sentiment liability, and is it worth fixing or retiring?

5. Financial files (breach-impact & business case)

These three files together tell the financial story of the breach: lost revenue, remediation spend, and a tripled cyber-insurance premium. All figures AUD.

budget_2026.csv

Quarterly operating budget for FY2026 by category.

| Column | Type | Description |
| --- | --- | --- |
| Category | string | Budget line (e.g. Salaries & Wages, Insurance). |
| Q1_2026_AUD to Q4_2026_AUD | float (AUD) | Quarterly budget per category. |
| Annual_2026_AUD | float (AUD) | Full-year total (sum of quarters). |

Tip: Designed signal (breach-era budget)

Look for lines that did not exist (or were far smaller) before the breach: Insurance is roughly tripled versus a normal baseline, and three new breach-driven lines appear — Incident Response & Remediation (front-loaded into Q1–Q2), Legal & Compliance, and Credit-Monitoring & Notifications. Compare against pre-breach norms to size the financial shock.
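
Since the annual column is documented as the sum of the quarters, a quick consistency check is a reasonable first audit step. The sketch below assumes the two middle columns are named Q2_2026_AUD and Q3_2026_AUD by analogy with the documented Q1 and Q4 names; the helper name `annual_mismatches` and the sample rows (including the deliberately wrong total) are invented for illustration.

```python
import csv
import io

# Q2/Q3 names are assumed from the Q1_2026_AUD ... Q4_2026_AUD pattern.
QUARTER_COLUMNS = ["Q1_2026_AUD", "Q2_2026_AUD", "Q3_2026_AUD", "Q4_2026_AUD"]


def annual_mismatches(rows, tolerance=0.01):
    """Return categories whose annual figure differs from the sum of the quarters."""
    mismatched = []
    for row in rows:
        quarter_sum = sum(float(row[col]) for col in QUARTER_COLUMNS)
        if abs(quarter_sum - float(row["Annual_2026_AUD"])) > tolerance:
            mismatched.append(row["Category"])
    return mismatched


# Invented sample rows; the second total is wrong on purpose.
SAMPLE = """Category,Q1_2026_AUD,Q2_2026_AUD,Q3_2026_AUD,Q4_2026_AUD,Annual_2026_AUD
Insurance,10.0,10.0,10.0,10.0,40.0
Legal & Compliance,5.0,4.0,3.0,2.0,15.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = annual_mismatches(rows)
```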

cost_analysis_2026.csv

An itemised cost view — useful for cost/benefit and capex/opex splits.

| Column | Type | Description |
| --- | --- | --- |
| Category | string | Cost grouping (Personnel, Infrastructure, Security, …). |
| Item | string | Specific cost item. |
| Total_Cost_AUD | float | Annual cost of the item. |
| Cost_Type | string | Fixed / Variable. |
| Quantity | integer | Unit count. |
| Unit_Cost_AUD | float | Cost per unit (Total_Cost_AUD ÷ Quantity). |
| Accounting | string | Opex / Capex. |
| Notes | string | Context (incl. canon references, e.g. “~47 FTE”). |

Tip: Designed signal (breach remediation is itemised)

Breach-remediation costs are explicit and tagged: digital forensics, credit monitoring for ~250k records, the post-breach infrastructure rebuild (Capex), regulator & legal response, and the tripled cyber-liability premium. The Sydney DR site and Malaga primary DC are both costed. Students can total the “cost of the breach” and split it into one-off vs. recurring.
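
Totalling costs by a tag column, and confirming that unit cost equals total divided by quantity, can be sketched as follows. The helper names `total_by` and `unit_cost_mismatches` are illustrative, and the sample items are invented placeholders, not rows from the real file.

```python
import csv
import io
from collections import defaultdict


def total_by(rows, column):
    """Sum Total_Cost_AUD per value of the given column (e.g. Accounting)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += float(row["Total_Cost_AUD"])
    return dict(totals)


def unit_cost_mismatches(rows, tolerance=0.01):
    """Return items where Unit_Cost_AUD is not Total_Cost_AUD / Quantity."""
    mismatched = []
    for row in rows:
        quantity = int(row["Quantity"])
        if quantity == 0:
            continue  # Division is undefined; skip rather than guess.
        expected = float(row["Total_Cost_AUD"]) / quantity
        if abs(expected - float(row["Unit_Cost_AUD"])) > tolerance:
            mismatched.append(row["Item"])
    return mismatched


# Invented sample rows; the third unit cost is wrong on purpose.
SAMPLE = """Category,Item,Total_Cost_AUD,Quantity,Unit_Cost_AUD,Accounting
Security,Forensics (example),100.0,1,100.0,Opex
Infrastructure,Rebuild (example),300.0,3,100.0,Capex
Security,Monitoring (example),50.0,2,20.0,Opex
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_accounting = total_by(rows, "Accounting")
bad_unit_costs = unit_cost_mismatches(rows)
```

The same `total_by` call with `"Cost_Type"` or `"Category"` gives the other splits; the one-off versus recurring split needs a judgment per item, since the file does not carry that tag as a column.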

financial_forecast_2026.csv

A monthly 2026 forecast that makes the breach’s residual impact and recovery visible.

| Column | Type | Description |
| --- | --- | --- |
| Month | string | YYYY-MM, Jan–Dec 2026. |
| Projected_Revenue_AUD | float | Forecast revenue (depressed early, recovering). |
| Projected_Expenses_AUD | float | Forecast expenses (incl. remediation + premium). |
| Net_Income_AUD | float | Revenue − expenses (negative early in the year). |
| Breach_Impact_Revenue_AUD | float | Estimated revenue lost to churn that month. |
| Remediation_Spend_AUD | float | One-off remediation spend (tapers through the year). |
| Insurance_Premium_AUD | float | Monthly cyber-insurance premium (tripled, persistent). |

Tip: Designed signal (the recovery curve)

The forecast opens with net losses in early 2026 (residual churn + front-loaded remediation + the tripled insurance premium), crosses into profit as the year progresses, and ends well into the black as revenue recovers and remediation spend tapers. The Breach_Impact_Revenue_AUD column lets you isolate churn-driven revenue loss from the underlying trend — ideal for a “cost of the breach” / ROI-on-remediation business case.
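
Two quick checks follow from the column definitions: that net income equals revenue minus expenses, and which month first shows a profit. This is a sketch assuming the rows are already in chronological order; the helper names and sample figures are invented for illustration.

```python
import csv
import io


def net_income_mismatches(rows, tolerance=0.01):
    """Return months where Net_Income_AUD is not revenue minus expenses."""
    mismatched = []
    for row in rows:
        expected = float(row["Projected_Revenue_AUD"]) - float(row["Projected_Expenses_AUD"])
        if abs(expected - float(row["Net_Income_AUD"])) > tolerance:
            mismatched.append(row["Month"])
    return mismatched


def first_profitable_month(rows):
    """Return the first month with positive net income, or None if there is none."""
    for row in rows:
        if float(row["Net_Income_AUD"]) > 0:
            return row["Month"]
    return None


# Invented sample rows, for illustration only (not real CloudCore data).
SAMPLE = """Month,Projected_Revenue_AUD,Projected_Expenses_AUD,Net_Income_AUD
2026-01,100.0,130.0,-30.0
2026-02,110.0,120.0,-10.0
2026-03,125.0,115.0,10.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
inconsistent = net_income_mismatches(rows)
crossover = first_profitable_month(rows)
```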

Note: Exercises the financial files serve
  • ISYS6018 — the audit “so what?”: translate the security failure into auditable AUD impact (lost revenue, remediation, tripled insurance) and evaluate control investment versus loss avoided.
  • ISYS6020 — the business case: cost/benefit, payback period, capex/opex structure, and a go/no-go on further security investment.

Working with the data

Warning: A note on the “messy truth”

CloudCore is a teaching scenario, and some inconsistencies across the site are deliberate — for example the public claim of ISO 27001 certification versus the internal reality that certification is only being pursued, and the staff-count variance (~47 vs ~250). Treat the data as a source to interrogate, not a tidy answer key: cross-check figures against the policies, articles and interviews elsewhere in the documentation.

All files live in the /data/ directory and are plain UTF-8 CSVs — loadable directly with pandas.read_csv(), R’s read.csv(), or any spreadsheet tool. The generator script (scripts/generate_site_data.py) is the single source of truth for every value in this portfolio.
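
Because the ticket and review files reference customers by customer_id, a referential-integrity check is a sensible first step after loading. The following is a minimal standard-library sketch; the helper names `load_rows` and `orphan_ids` are illustrative, and the sample rows (including the unmatched ID) are invented.

```python
import csv
import io


def load_rows(path):
    """Load a UTF-8 CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def orphan_ids(child_rows, parent_rows, key="customer_id"):
    """Return key values present in child_rows but missing from parent_rows."""
    parents = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parents)


# Invented sample rows; CC0999 has no matching customer on purpose.
CUSTOMERS = "customer_id,region\nCC0001,WA\nCC0002,NSW\n"
TICKETS = "ticket_id,customer_id\nTK0001,CC0001\nTK0002,CC0999\n"

customers = list(csv.DictReader(io.StringIO(CUSTOMERS)))
tickets = list(csv.DictReader(io.StringIO(TICKETS)))
orphans = orphan_ids(tickets, customers)
```

On the real files, `load_rows` would be called with the paths to cloudcore_customers.csv and cloudcore_support_tickets.csv in the data directory; an empty result means every ticket maps to a known customer.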