Site Data Dictionary
CloudCore Networks publishes a small, deliberately structured data portfolio so that analytics, audit and strategy units can work with realistic, internally consistent Australian business data. Every dataset here is synthetic and fictional — it describes the fictional CloudCore company only — but the relationships inside the data are designed: there is real signal to find, and none of it is a single-column trick.
All seven files are generated by a single seeded Python script (scripts/generate_site_data.py, SEED = 20260629). Re-running the script produces byte-identical CSVs. Currency throughout is AUD. Region codes are Australian states/territories (WA, NSW, VIC, QLD, SA). Named products are DataVault, Analytics Pro and CloudSync. Dates are ISO YYYY-MM-DD inside CSVs.
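The reproducibility claim above can be illustrated with a minimal sketch. This is not the real scripts/generate_site_data.py: the function `generate_csv`, its columns and its row count are hypothetical stand-ins. It only shows why a private seeded generator plus fixed line endings tends to give byte-identical output on every run.

```python
import csv
import hashlib
import io
import random

SEED = 20260629  # seed value quoted in the text above


def generate_csv(seed: int, n_rows: int = 5) -> bytes:
    """Hypothetical stand-in for one generator step: returns CSV bytes."""
    rng = random.Random(seed)  # private RNG, so no global random state leaks in
    regions = ["WA", "NSW", "VIC", "QLD", "SA"]
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")  # same line endings on every OS
    writer.writerow(["customer_id", "region", "tenure_months"])
    for i in range(1, n_rows + 1):
        writer.writerow([f"CC{i:04d}", rng.choice(regions), rng.randint(1, 96)])
    return buf.getvalue().encode("utf-8")


first = generate_csv(SEED)
second = generate_csv(SEED)
same_digest = hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest()
print("byte-identical:", same_digest)
```

Comparing SHA-256 digests of two runs is a cheap way to confirm that a regenerated file still matches the published one.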
Portfolio overview
| File | Rows | Domain | Primary courses |
|---|---|---|---|
| cloudcore_customers.csv | 400 | Customer master & churn | ISYS6014, ISYS6020, ISYS6018 |
| cloudcore_sales.csv | 180 | Regional product sales 2024–2026 | ISYS6018, ISYS6020 |
| cloudcore_support_tickets.csv | 400 | Operational ticket log | ISYS6014, ISYS6018 |
| cloudcore_reviews.csv | 250 | Customer reviews (NLP) | ISYS6014, ISYS6020 |
| budget_2026.csv | 12 | Quarterly budget (breach-era) | ISYS6018, ISYS6020 |
| cost_analysis_2026.csv | 13 | Itemised cost analysis | ISYS6018, ISYS6020 |
| financial_forecast_2026.csv | 12 | Monthly forecast w/ breach impact | ISYS6018, ISYS6020 |
- Company: CloudCore Networks Pty Ltd, founded 2010. HQ Perth (11 Newcastle St) + primary DC Malaga WA (4 Millrose Dr) + Sydney DR site (100 Harris St, Pyrmont NSW 2009).
- Scale: ~500 clients, ~47 staff. (The staff-count figure is deliberately inconsistent with the ~250 that appears in other CloudCore artefacts — do not “correct” it.)
- The breach: detected 12 September 2025; public coverage 13–18 September 2025; ~250,000 customer records affected.
- “Now” in the dataset’s timeline is mid-2026.
1. cloudcore_customers.csv
The customer master table. Each row is one CloudCore client with their commercial profile and a binary churn flag for the current period. Overall churn sits at ~22%.
| Column | Type | Description |
|---|---|---|
| customer_id | string (PK) | CC0001–CC0400. Primary key. |
| company_name | string | Fabricated Australian-sounding company name. |
| industry | string (15 cats) | e.g. Healthcare, Finance & Insurance, Mining & Resources. |
| region | string (5 cats) | WA / NSW / VIC / QLD / SA. |
| tenure_months | integer | Months as a customer, 1–96. |
| products | string | One or more of DataVault / Analytics Pro / CloudSync, joined by + (e.g. DataVault + CloudSync). |
| monthly_recurring_revenue | float (AUD) | MRR, ~$450–$60,000. Scales with tier and product count. |
| support_tier | string (4 cats) | Basic / Standard / Premium / Enterprise. |
| support_tickets_6m | integer | Support tickets in the trailing 6 months, 0–18. |
| satisfaction_score | float | 1.0–5.0 (one decimal). |
| churn_flag | integer (0/1) | 1 = churned this period. Base rate ~22%. |
churn_flag is a noisy logistic blend of several features. It correlates with region (WA highest — the breach zone), support tier (Basic churns most, Enterprise/Premium least), tenure (longer customers stay), ticket volume (more tickets → more churn) and one weak product. No single column separates churners from non-churners — a model that combines features will clearly beat any one-variable rule. Good targets: logistic regression, decision trees, feature-importance analysis, segment-level churn tables.
- ISYS6014 — customer segmentation, churn prediction, feature engineering.
- ISYS6020 — retention business case: what is a customer worth, and what does a 1-point churn reduction save in MRR?
- ISYS6018 — breach-impact lens: is the breach-zone region (WA) over-represented in churners?
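The segment-level churn tables mentioned above can be sketched with the standard library alone. The rows below are invented for illustration and are not taken from cloudcore_customers.csv. The helper names `churn_rate_by` and `split_products` are hypothetical. The sketch also shows one way to handle the multi-valued products column, by counting a bundle once per product it contains.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; these values are invented, not taken from the real file.
SAMPLE = """customer_id,region,products,support_tier,churn_flag
CC0001,WA,DataVault + CloudSync,Basic,1
CC0002,WA,CloudSync,Basic,1
CC0003,NSW,DataVault,Enterprise,0
CC0004,NSW,DataVault + Analytics Pro,Premium,0
CC0005,WA,Analytics Pro,Standard,0
CC0006,VIC,CloudSync,Basic,1
"""


def churn_rate_by(rows, key):
    """Return {segment: churn rate}; key is a column name or a row -> list function."""
    totals, churned = defaultdict(int), defaultdict(int)
    for row in rows:
        segments = key(row) if callable(key) else [row[key]]
        for seg in segments:
            totals[seg] += 1
            churned[seg] += int(row["churn_flag"])
    return {seg: churned[seg] / totals[seg] for seg in totals}


def split_products(row):
    """Split the products cell on '+' so a bundle counts once per product."""
    return [p.strip() for p in row["products"].split("+")]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_region = churn_rate_by(rows, "region")
by_product = churn_rate_by(rows, split_products)
print(by_region)
print(by_product)
```

A table like this is only a first look: because churn_flag is a blend of several features, single-segment rates will show tilts rather than clean separation.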
2. cloudcore_sales.csv
Quarterly product revenue by region for 2024, 2025 and 2026 (12 quarters × 5 regions × 3 products = 180 rows).
| Column | Type | Description |
|---|---|---|
| region | string | WA / NSW / VIC / QLD / SA. |
| product | string | DataVault / Analytics Pro / CloudSync. |
| quarter | string | Q1–Q4. |
| year | integer | 2024, 2025 or 2026. |
| revenue_aud | float (AUD) | Recognised revenue for that region×product×quarter. |
| units_sold | integer | Approximate unit volume (revenue ÷ list unit price). |
| sales_rep | string | Named account manager for the region. |
| customer_segment | string | Enterprise / Mid-Market / SMB. |
Revenue grows steadily through 2024 and into 2025, then shows a clear V-shaped dip after the September 2025 breach: a sharp fall in 2025 Q4 (the first full quarter after the breach, which was detected late in Q3), continued depression in 2026 Q1, and recovery from 2026 Q2 onwards. The dip is deeper in WA (the breach region) than in other states. Plot total revenue_aud over year + quarter (or by region) and the shape is unmistakable. Useful for time-series decomposition, anomaly detection, and “what did the breach cost in revenue?” quantification.
- ISYS6018 — quantify the breach’s revenue impact (the dip area ≈ lost sales); tie it to the financial files.
- ISYS6020 — build the recovery case: forecast the rebound, size the addressable revenue at risk.
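One rough way to turn the "dip area ≈ lost sales" idea into a number is sketched below. The figures are invented, and the flat baseline (the last pre-dip quarter held constant) is an assumption made for simplicity rather than the dataset's own method. A trend-based counterfactual would give a larger estimate. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented totals for illustration; the real file has 180 region x product rows.
SAMPLE = """region,product,quarter,year,revenue_aud
WA,DataVault,Q2,2025,100000
NSW,DataVault,Q2,2025,120000
WA,DataVault,Q3,2025,104000
NSW,DataVault,Q3,2025,124000
WA,DataVault,Q4,2025,60000
NSW,DataVault,Q4,2025,105000
WA,DataVault,Q1,2026,70000
NSW,DataVault,Q1,2026,110000
WA,DataVault,Q2,2026,98000
NSW,DataVault,Q2,2026,123000
"""


def revenue_by_period(rows):
    """Sum revenue_aud per (year, quarter number), returned in time order."""
    totals = defaultdict(float)
    for row in rows:
        period = (int(row["year"]), int(row["quarter"][1]))  # "Q4" -> 4
        totals[period] += float(row["revenue_aud"])
    return dict(sorted(totals.items()))


def shortfall_vs_baseline(totals, baseline_period, dip_periods):
    """Crude dip area: baseline revenue minus actual, summed over the dip periods.

    Assumes a flat counterfactual (no growth), which understates the loss
    if revenue would otherwise have kept rising.
    """
    baseline = totals[baseline_period]
    return sum(baseline - totals[p] for p in dip_periods)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = revenue_by_period(rows)
lost = shortfall_vs_baseline(totals, (2025, 3), [(2025, 4), (2026, 1)])
print(totals)
print("estimated shortfall (AUD):", lost)
```

Filtering `rows` to a single region before aggregating gives the per-region version, which is where the deeper WA dip should show up.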
3. cloudcore_support_tickets.csv
An operational ticket log spanning January 2025 to mid-2026. Each ticket is linked to a customer via customer_id.
| Column | Type | Description |
|---|---|---|
| ticket_id | string (PK) | TK0001…. |
| customer_id | string (FK) | Foreign key → cloudcore_customers.customer_id. |
| date | date (ISO) | Ticket date, 2025-01-01 … 2026-06-29. |
| product | string | Product the ticket concerns. |
| region | string | Customer’s region. |
| channel | string | Email / Phone / Portal / Chat. |
| priority | string | Low / Medium / High / Critical. |
| resolution_hours | float | Hours to resolution. |
| satisfaction_rating | integer | 1–5 (post-resolution CSAT). |
| ticket_text | string | Short free-text note; tone tracks the rating (built for NLP). |
Two engineered patterns: (1) ticket volume spikes after 12 Sept 2025, peaking in late 2025, with a tilt toward High/Critical priority in WA; and (2) the free-text tone follows the satisfaction rating — low-rated tickets read as frustrated, high-rated as appreciative, with a subset of post-breach WA tickets explicitly referencing the security incident. Use for volume trend analysis, SLA/resolution-time study, and text classification / sentiment labelling where the rating is the ground-truth label.
- ISYS6014 — text classification & sentiment analysis on ticket_text (rating as label); topic modelling.
- ISYS6018 — operational audit: did resolution times / priority handling degrade post-breach? Evidence of an overwhelmed support function.
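For the volume-trend and resolution-time questions, a minimal sketch might look like the following. The tickets are invented, and splitting at the detection date (12 September 2025) is one reasonable choice of cut-off, not something the dataset prescribes.

```python
import csv
import io
from collections import Counter
from datetime import date
from statistics import mean

BREACH_DETECTED = date(2025, 9, 12)  # detection date given in the overview

# Invented tickets for illustration only.
SAMPLE = """ticket_id,date,region,priority,resolution_hours
TK0001,2025-08-03,WA,Low,4.0
TK0002,2025-08-19,NSW,Medium,6.0
TK0003,2025-09-14,WA,Critical,20.0
TK0004,2025-09-20,WA,High,16.0
TK0005,2025-10-02,WA,Critical,30.0
TK0006,2025-10-09,VIC,High,14.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    row["date"] = date.fromisoformat(row["date"])  # dates are ISO YYYY-MM-DD
    row["resolution_hours"] = float(row["resolution_hours"])

# Tickets per calendar month, keyed "YYYY-MM".
monthly_volume = Counter(row["date"].strftime("%Y-%m") for row in rows)

# Mean resolution time before vs. on or after the detection date.
pre = [r["resolution_hours"] for r in rows if r["date"] < BREACH_DETECTED]
post = [r["resolution_hours"] for r in rows if r["date"] >= BREACH_DETECTED]
pre_mean, post_mean = mean(pre), mean(post)

print(dict(monthly_volume))
print(f"mean resolution hours: pre={pre_mean:.1f}, post={post_mean:.1f}")
```

Repeating the pre/post comparison within each priority level helps separate "tickets got harder" from "support got slower".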
4. cloudcore_reviews.csv
A customer-review corpus (one review per customer) for sentiment and product analysis.
| Column | Type | Description |
|---|---|---|
| review_id | string (PK) | RV0001…. |
| customer_id | string (FK) | Foreign key → cloudcore_customers.customer_id. |
| product | string | Product reviewed. |
| region | string | Reviewer’s region. |
| rating | integer | 1–5 star rating. |
| review_text | string | Free-text review; sentiment tracks the rating. |
Review sentiment is lower for one product (CloudSync) and for the WA region (the breach zone), while DataVault and the eastern states skew positive. The rating column is a clean ordinal label for supervised sentiment work; the review_text is the input. Good for lexicon-based vs. learned sentiment comparison, star-rating regression, and product/region cohort analysis.
- ISYS6014 — sentiment analysis, rating prediction, NLP preprocessing pipeline.
- ISYS6020 — product-strategy reading: which product is the sentiment liability, and is it worth fixing or retiring?
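As a baseline for the lexicon-versus-learned comparison, a lexicon scorer can be as small as the sketch below. The word lists and the reviews are invented for illustration. A real exercise would swap in an established sentiment lexicon and the actual review_text column. The agreement rule (ratings 4–5 positive, 1–2 negative) is an assumption.

```python
import re

# Tiny hand-made lexicon; a hypothetical stand-in for a real sentiment lexicon.
POSITIVE = {"great", "reliable", "fast", "helpful", "excellent"}
NEGATIVE = {"slow", "outage", "breach", "frustrating", "unreliable"}

# Invented reviews for illustration; not rows from cloudcore_reviews.csv.
REVIEWS = [
    ("DataVault", 5, "Excellent and reliable, support was helpful."),
    ("CloudSync", 1, "Slow sync and a frustrating outage after the breach."),
    ("CloudSync", 2, "Unreliable, though the team was helpful."),
    ("Analytics Pro", 4, "Fast dashboards, great reporting."),
]


def lexicon_score(text: str) -> int:
    """Count positive minus negative lexicon hits in the lower-cased text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def agrees_with_rating(score: int, rating: int) -> bool:
    """Ratings 4-5 count as positive, 1-2 as negative.

    A zero score or a 3-star rating never counts as agreement.
    """
    return (score > 0 and rating >= 4) or (score < 0 and rating <= 2)


scores = [lexicon_score(text) for _, _, text in REVIEWS]
accuracy = sum(
    agrees_with_rating(s, rating) for s, (_, rating, _) in zip(scores, REVIEWS)
) / len(REVIEWS)
print(scores, accuracy)
```

The third review shows the typical weakness of this approach: mixed wording cancels out to a neutral score, which is exactly the kind of case a learned model is expected to handle better.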
5. Financial files (breach-impact & business case)
These three files together tell the financial story of the breach: lost revenue, remediation spend, and a tripled cyber-insurance premium. All figures AUD.
budget_2026.csv
Quarterly operating budget for FY2026 by category.
| Column | Type | Description |
|---|---|---|
| Category | string | Budget line (e.g. Salaries & Wages, Insurance). |
| Q1_2026_AUD … Q4_2026_AUD | float (AUD) | Quarterly budget per category. |
| Annual_2026_AUD | float (AUD) | Full-year total (sum of quarters). |
Look for lines that did not exist (or were far smaller) before the breach: Insurance is roughly tripled versus a normal baseline, and three new breach-driven lines appear — Incident Response & Remediation (front-loaded into Q1–Q2), Legal & Compliance, and Credit-Monitoring & Notifications. Compare against pre-breach norms to size the financial shock.
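Two quick checks follow from the column definitions: the annual figure should equal the sum of the four quarters, and a front-loaded line should carry most of its budget in Q1–Q2. The sketch below uses invented figures, and the helper names and tolerance are hypothetical choices.

```python
import csv
import io

# Invented figures for illustration; column names follow the table above.
SAMPLE = """Category,Q1_2026_AUD,Q2_2026_AUD,Q3_2026_AUD,Q4_2026_AUD,Annual_2026_AUD
Salaries & Wages,900000,900000,900000,900000,3600000
Insurance,90000,90000,90000,90000,360000
Incident Response & Remediation,200000,150000,50000,25000,425000
"""

QUARTER_COLS = ["Q1_2026_AUD", "Q2_2026_AUD", "Q3_2026_AUD", "Q4_2026_AUD"]


def annual_mismatches(rows, tolerance=0.01):
    """Return categories whose Annual_2026_AUD differs from the quarter sum."""
    bad = []
    for row in rows:
        quarter_sum = sum(float(row[c]) for c in QUARTER_COLS)
        if abs(quarter_sum - float(row["Annual_2026_AUD"])) > tolerance:
            bad.append(row["Category"])
    return bad


def first_half_share(row):
    """Fraction of a line's annual budget that falls in Q1 + Q2."""
    h1 = float(row["Q1_2026_AUD"]) + float(row["Q2_2026_AUD"])
    return h1 / float(row["Annual_2026_AUD"])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
mismatches = annual_mismatches(rows)
remediation = next(r for r in rows if r["Category"].startswith("Incident Response"))
print("mismatched lines:", mismatches)
print(f"remediation share in H1: {first_half_share(remediation):.0%}")
```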
cost_analysis_2026.csv
An itemised cost view — useful for cost/benefit and capex/opex splits.
| Column | Type | Description |
|---|---|---|
| Category | string | Cost grouping (Personnel, Infrastructure, Security, …). |
| Item | string | Specific cost item. |
| Total_Cost_AUD | float | Annual cost of the item. |
| Cost_Type | string | Fixed / Variable. |
| Quantity | integer | Unit count. |
| Unit_Cost_AUD | float | Cost per unit (Total ÷ Quantity). |
| Accounting | string | Opex / Capex. |
| Notes | string | Context (incl. canon references, e.g. “~47 FTE”). |
Breach-remediation costs are explicit and tagged: digital forensics, credit monitoring for ~250k records, the post-breach infrastructure rebuild (Capex), regulator & legal response, and the tripled cyber-liability premium. The Sydney DR site and Malaga primary DC are both costed. Students can total the “cost of the breach” and split it into one-off vs. recurring.
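The Opex/Capex split and the Unit_Cost_AUD definition (Total ÷ Quantity) can both be checked in a few lines. The items and amounts below are invented for illustration. Only the column names come from the table above.

```python
import csv
import io
from collections import defaultdict

# Invented items for illustration; column names follow the table above.
SAMPLE = """Category,Item,Total_Cost_AUD,Cost_Type,Quantity,Unit_Cost_AUD,Accounting
Security,Digital forensics,120000,Variable,1,120000,Opex
Security,Credit monitoring,500000,Variable,250000,2,Opex
Infrastructure,Infrastructure rebuild,800000,Fixed,1,800000,Capex
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Total cost per accounting treatment (Opex vs. Capex).
by_accounting = defaultdict(float)
for row in rows:
    by_accounting[row["Accounting"]] += float(row["Total_Cost_AUD"])

# Items where Unit_Cost_AUD does not equal Total_Cost_AUD / Quantity.
inconsistent = [
    row["Item"]
    for row in rows
    if abs(float(row["Total_Cost_AUD"]) / int(row["Quantity"])
           - float(row["Unit_Cost_AUD"])) > 0.01
]

print(dict(by_accounting))
print("inconsistent unit costs:", inconsistent)
```

The same grouping pattern, keyed on Cost_Type instead of Accounting, gives the Fixed/Variable view.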
financial_forecast_2026.csv
A monthly 2026 forecast that makes the breach’s residual impact and recovery visible.
| Column | Type | Description |
|---|---|---|
| Month | string | YYYY-MM, Jan–Dec 2026. |
| Projected_Revenue_AUD | float | Forecast revenue (depressed early, recovering). |
| Projected_Expenses_AUD | float | Forecast expenses (incl. remediation + premium). |
| Net_Income_AUD | float | Revenue − expenses (negative early in the year). |
| Breach_Impact_Revenue_AUD | float | Estimated revenue lost to churn that month. |
| Remediation_Spend_AUD | float | One-off remediation spend (tapers through the year). |
| Insurance_Premium_AUD | float | Monthly cyber-insurance premium (tripled, persistent). |
The forecast opens with net losses in early 2026 (residual churn + front-loaded remediation + the tripled insurance premium), crosses into profit as the year progresses, and ends well into the black as revenue recovers and remediation spend tapers. The Breach_Impact_Revenue_AUD column lets you isolate churn-driven revenue loss from the underlying trend — ideal for a “cost of the breach” / ROI-on-remediation business case.
- ISYS6018 — the audit “so what?”: translate the security failure into auditable AUD impact (lost revenue, remediation, tripled insurance) and evaluate control investment versus loss avoided.
- ISYS6020 — the business case: cost/benefit, payback period, capex/opex structure, and a go/no-go on further security investment.
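For the payback-period question, a sketch along these lines locates the first profitable month and the month in which cumulative net income recovers the early losses. The monthly figures are invented. Treating "payback" as cumulative net income reaching zero is an assumption, and a business case may define it differently.

```python
import csv
import io
from itertools import accumulate

# Invented monthly figures for illustration; column names follow the table above.
SAMPLE = """Month,Projected_Revenue_AUD,Projected_Expenses_AUD,Net_Income_AUD
2026-01,800000,950000,-150000
2026-02,850000,930000,-80000
2026-03,920000,900000,20000
2026-04,980000,880000,100000
2026-05,1050000,870000,180000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
months = [r["Month"] for r in rows]
net = [float(r["Net_Income_AUD"]) for r in rows]

# Check the stated identity: net income = revenue - expenses.
identity_holds = all(
    abs(float(r["Projected_Revenue_AUD"]) - float(r["Projected_Expenses_AUD"]) - n) < 0.01
    for r, n in zip(rows, net)
)

# First month with positive net income (None if there is no such month).
first_profit_month = next((m for m, n in zip(months, net) if n > 0), None)

# First month where cumulative net income is non-negative (early losses recovered).
cumulative = list(accumulate(net))
payback_month = next((m for m, c in zip(months, cumulative) if c >= 0), None)

print(identity_holds, first_profit_month, payback_month)
```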
Working with the data
CloudCore is a teaching scenario, and some inconsistencies across the site are deliberate — for example the public claim of ISO 27001 certification versus the internal reality that certification is only being pursued, and the staff-count variance (~47 vs ~250). Treat the data as a source to interrogate, not a tidy answer key: cross-check figures against the policies, articles and interviews elsewhere in the documentation.
All files live in the /data/ directory and are plain UTF-8 CSVs — loadable directly with pandas.read_csv(), R’s read.csv(), or any spreadsheet tool. The generator script (scripts/generate_site_data.py) is the single source of truth for every value in this portfolio.
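Cross-checking can start with the foreign keys. The sketch below uses only the standard library and invented rows. The helpers `read_rows` and `orphaned_keys` are hypothetical names, and pandas would do the same job with a merge.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV from any text file object into a list of dicts."""
    return list(csv.DictReader(handle))


def orphaned_keys(child_rows, parent_rows, key="customer_id"):
    """Return child key values that have no matching row in the parent table."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


# Invented rows standing in for the customers and tickets files.
customers = read_rows(io.StringIO("customer_id,region\nCC0001,WA\nCC0002,NSW\n"))
tickets = read_rows(io.StringIO(
    "ticket_id,customer_id\nTK0001,CC0001\nTK0002,CC0002\nTK0003,CC0999\n"
))

orphans = orphaned_keys(tickets, customers)
print("tickets with unknown customer_id:", orphans)
```

Against the real files, the same helpers would take handles opened with `open(path, newline="", encoding="utf-8")`, where `path` points at a CSV in your local copy of the /data/ directory.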