This document reproduces every detail from the Data Infrastructure Audit Results with inline provenance tags. Each fact is marked as one of:
- SOURCED — directly from a file on the Cloudcore website, with the file path noted
- INFERRED — a reasonable conclusion drawn from sourced material, but not explicitly stated
- INVENTED — created for the brief; plausible and non-contradictory, but not in the repo
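
The three-tag scheme above can be modelled as a small data structure, which makes it easy to tally how much of the brief is grounded in the repo. The sketch below is illustrative only: the names `Provenance`, `TaggedFact`, and `tally` are hypothetical, and the three sample rows are copied from tables later in this document.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    SOURCED = "SOURCED"    # directly from a file, path noted
    INFERRED = "INFERRED"  # reasonable conclusion, not explicitly stated
    INVENTED = "INVENTED"  # created for the brief, not in the repo


@dataclass(frozen=True)
class TaggedFact:
    claim: str
    tag: Provenance
    note: str  # file path for SOURCED facts, rationale otherwise


# Sample rows only; a full tally would load every row in the document.
facts = [
    TaggedFact("System: Prometheus and Grafana", Provenance.SOURCED,
               "mark_gonzalez_cto.md"),
    TaggedFact("Volume: ~2.1M data points/day", Provenance.INVENTED,
               "Plausible for ~2,500 VMs with standard exporter intervals"),
    TaggedFact("No formal ETL platform", Provenance.INFERRED,
               "No ETL tool appears in repo"),
]


def tally(rows):
    """Count how many facts carry each provenance tag."""
    return Counter(row.tag for row in rows)


counts = tally(facts)
print({tag.name: counts[tag] for tag in Provenance})
```

Using an enum rather than free-text strings means a mistyped tag fails at load time instead of being silently counted as a fourth category.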
Data Inventory
1. Infrastructure Telemetry

| Fact | Provenance | Source / note |
| --- | --- | --- |
| System: Prometheus and Grafana | SOURCED | mark_gonzalez_cto.md |
| Volume: ~2.1M data points/day | INVENTED | Plausible for ~2,500 VMs with standard exporter intervals |
| ~2,500 VMs, ~200 servers | SOURCED | david_wilson_cloud_infrastructure_architect.md, raj_patel_it_manager.md |
| Quality score: 4.5/5 | INVENTED | Design principle: infrastructure data should be high quality |
| Data owner: Martin Nguyen | SOURCED | cloud_service_operations_manager_martin_nugyen.md |
| Sensitivity: INTERNAL | INVENTED | Classification levels sourced from draft policy; assignment to this source is fabricated |
| Retention: 90 days full, 1-year downsampled | INVENTED | Standard Prometheus retention pattern; not specified in repo |
| 98.5% completeness | INVENTED | Specific percentage fabricated |
| 1.5% gaps during maintenance | INVENTED | Specific percentage fabricated |
| Metric naming inconsistency (cpu_usage_percent vs node_cpu_utilisation) | INVENTED | Specific example fabricated; naming drift is plausible |
| 8% metrics lack client attribution labels | INVENTED | Specific percentage fabricated |

2. Security Event Logs

| Fact | Provenance | Source / note |
| --- | --- | --- |
| System: Splunk SIEM | SOURCED | mark_gonzalez_cto.md |
| Aggregating CrowdStrike, Palo Alto, Auth0, VPN, app logs | SOURCED | Systems confirmed across multiple backstories; aggregation is standard SIEM function |
| Volume: ~12 GB/day | INVENTED | Plausible for this scale of Splunk deployment |
| 500-800 daily alerts | SOURCED | docs/policies/ (incident response implementation notes) |
| Quality score: 4.0/5 | INVENTED | Machine-generated but human classification elements reduce score |
| Data owner: Sophia Martines | SOURCED | sophia_martines_ciso.md |
| Sensitivity: CONFIDENTIAL | INVENTED | Classification assignment fabricated |
| Retention: 12 months online | INVENTED | Not specified |
| Retention: 7 years archived | SOURCED | docs/policies/data_management.qmd (audit trail retention) |
| Breach provides documented dataset for model training | SOURCED | docs/logs/ contains full timestamped entries |
| Alert classification accuracy 72% | INVENTED | Specific percentage fabricated |
| 28% miscategorised or lack context | INVENTED | Complement of above |
| Pre-2021 sources use non-standard timestamps | INVENTED | Plausible for legacy systems |
| False positive ratio 6:1 | INVENTED | Specific ratio fabricated |

3. Support Ticket Data

| Fact | Provenance | Source / note |
| --- | --- | --- |
| System: internal ticketing system | SOURCED | customer_support_lead_samantha_wong.md |
| Volume: ~45,000 historical | INVENTED | 500+ clients, ~1,200/month over 3+ years |
| ~1,200 new tickets/month | INVENTED | Plausible for 500+ clients |
| Quality score: 3.0/5 | INVENTED | Semi-structured human-entered data |
| Data owner: Samantha Wong | SOURCED | customer_support_lead_samantha_wong.md |
| Reports to Sarah Thompson | SOURCED | sarah_thompson_coo.md |
| Sensitivity: INTERNAL (some CONFIDENTIAL) | INVENTED | Classification fabricated |
| Retention: indefinite, no formal policy | INVENTED | No retention policy documented for tickets |
| Category and priority fields 98% complete | INVENTED | Specific percentage fabricated |
| Resolution times tracked | SOURCED | sarah_thompson_coo.md (4.2hr average resolution) |
| 18% miscategorised priority | INVENTED | Specific percentage fabricated |
| 12% descriptions under 20 words | INVENTED | Specific percentage fabricated |
| Customer identifier inconsistency (22% account numbers, 31% emails) | INVENTED | All percentages fabricated |
| No structured root cause field | INVENTED | Plausible gap |
| Pre-mid-2022 tickets lack satisfaction scores | INVENTED | Fabricated historical gap |

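
The ~45,000 historical ticket figure is justified above by ~1,200 tickets/month over 3+ years. A quick consistency check follows, treating "3+ years" as at least 36 months (an assumption; both inputs are themselves INVENTED, so this only shows the two invented figures agree with each other).

```python
# Both inputs are INVENTED figures from the Support Ticket Data rows.
monthly_tickets = 1_200
stated_historical = 45_000

# Assumption: "3+ years" means at least 36 months of history.
min_months = 36

lower_bound = monthly_tickets * min_months
months_implied = stated_historical / monthly_tickets

print(f"Tickets after {min_months} months: {lower_bound:,}")
print(f"Months implied by {stated_historical:,} tickets: {months_implied}")
```

At the stated monthly rate, 45,000 tickets corresponds to 37.5 months of history, which is consistent with "3+ years".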
4. Customer Records (CRM)

| Fact | Provenance | Source / note |
| --- | --- | --- |
| System: HubSpot CRM | SOURCED | tom_bradley_marketing_manager.md |
| Volume: ~85,000 contacts, ~4,200 companies | INVENTED | Plausible for 500+ active clients plus historical data |
| 500+ active client accounts | SOURCED | marcell_ziemann_ceo.md |
| Quality score: 2.0/5 | INVENTED | Design principle: customer-facing data should be messy |
| Data owner: Lisa Chen (marketing), Sales (pipeline) | SOURCED | lisa_chen_cmo.md, tom_bradley_marketing_manager.md |
| Sensitivity: CONFIDENTIAL | INVENTED | Classification fabricated |
| Retention: indefinite, no hygiene schedule | INVENTED | No data hygiene process documented |
| 2022 CRM migration introduced quality problems | INVENTED | Problems exist (sourced); migration as specific cause is fabricated (see Brief 3 companion) |
| 15% confirmed duplicates | INVENTED | Repo confirms duplicates exist; percentage fabricated |
| 8% probable duplicates | INVENTED | Fabricated |
| 26% incomplete industry classification | INVENTED | Fabricated |
| 34% missing job title/role | INVENTED | Fabricated |
| 11.2% email bounce rate | INVENTED | Fabricated |
| 41% missing lead source attribution pre-migration | INVENTED | Fabricated |
| Sales staff use personal spreadsheets | INVENTED | Plausible; not confirmed |
| No integration with billing or support | SOURCED | jamal_al_sayed_data_analyst.md (data silos confirmed) |

5. Billing and Financial Data

| Fact | Provenance | Source / note |
| --- | --- | --- |
| System: internal billing and invoicing | SOURCED | aisha_rahman_cfo.md |
| Volume: ~6,000 invoices/year | INVENTED | 500+ clients, monthly billing |
| Quality score: 3.5/5 | INVENTED | Audit requirements enforce discipline; manual processes introduce errors |
| Data owner: Aisha Rahman | SOURCED | aisha_rahman_cfo.md |
| Sensitivity: RESTRICTED | INVENTED | Contains payment info; RESTRICTED is appropriate per draft policy |
| Retention: 7 years | SOURCED | docs/policies/data_management.qmd |
| 6% monthly billing discrepancies | INVENTED | Fabricated |
| Product categorisation changed twice | INVENTED | Products are real (sourced from sales CSV); naming changes fabricated |
| Products: CloudSync, DataVault, SecureLink, Analytics Pro | SOURCED | data/cloudcore-sales-data.csv |
| Contract terms stored as PDFs | INVENTED | Plausible; not described |
| Revenue attribution by sector relies on manual spreadsheet | INVENTED | Consistent with data silos; not explicitly described |

7. HR and Access Control Data

| Fact | Provenance | Source / note |
| --- | --- | --- |
| Systems: Auth0, Active Directory, HR system | SOURCED | docs/policies/access_control.qmd, karen_lee_hr_manager.md |
| 47 active employees | SOURCED | marcell_ziemann_ceo.md |
| ~120 historical records | INVENTED | Plausible given 15% turnover over company history |
| Quality score: 3.0/5 | INVENTED | HR data accurate; access data has drift |
| Data owners: Karen Lee, Raj Patel (shared) | SOURCED | karen_lee_hr_manager.md, raj_patel_it_manager.md |
| Sensitivity: RESTRICTED | INVENTED | Contains employee PII; RESTRICTED appropriate |
| Employee retention: 7 years post-departure | SOURCED | docs/policies/data_management.qmd |
| Access logs: 12 months | INVENTED | Not specified |
| ~40% over-provisioned | SOURCED | karen_lee_hr_manager.md |
| RBAC definitions incomplete | SOURCED | Same file |
| Manual onboarding/offboarding coordination | SOURCED | Same file |
| Termination: policy 24hrs vs HR 2hrs | SOURCED | docs/policies/access_control.qmd vs HR process |
| Auth0 policies reference Okta | SOURCED | docs/policies/access_control.qmd |

Data Quality Summary Table

| Data source | Quality score | Rating | Score provenance | Rating provenance |
| --- | --- | --- | --- | --- |
| Infrastructure telemetry | 4.5 | High | INVENTED | INVENTED |
| Security event logs | 4.0 | Medium-High | INVENTED | INVENTED |
| Support tickets | 3.0 | Medium | INVENTED | INVENTED |
| Billing/financial | 3.5 | Medium | INVENTED | INVENTED |
| HR/access | 3.0 | Low | INVENTED | INVENTED |
| Sales performance | 2.5 | Low | INVENTED | INVENTED |
| CRM | 2.0 | Low | INVENTED | INVENTED |

| Fact | Provenance | Source / note |
| --- | --- | --- |
| Infrastructure/operational data is clean; customer-facing data is messy | INVENTED (as a stated finding) | This is the design principle from the handoff document, presented as an audit finding |
| Reflects mature engineering vs organisational challenges | INFERRED | Consistent with backstory descriptions |

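
The "clean infrastructure, messy customer-facing data" design principle can be checked against the invented scores in the summary table. The sketch below uses the scores exactly as listed; the grouping of sources into `operational` and `customer_facing` is an assumption made here for illustration, and billing and HR/access are left out because the document does not assign them to either side.

```python
from statistics import mean

# Quality scores copied from the summary table (all INVENTED).
scores = {
    "Infrastructure telemetry": 4.5,
    "Security event logs": 4.0,
    "Support tickets": 3.0,
    "Billing/financial": 3.5,
    "HR/access": 3.0,
    "Sales performance": 2.5,
    "CRM": 2.0,
}

# Assumed grouping, not taken from the source document.
operational = ["Infrastructure telemetry", "Security event logs"]
customer_facing = ["Support tickets", "Sales performance", "CRM"]

operational_mean = mean(scores[name] for name in operational)
customer_mean = mean(scores[name] for name in customer_facing)

print(f"Operational mean: {operational_mean}")
print(f"Customer-facing mean: {customer_mean}")
```

Under this grouping the operational sources average 4.25 and the customer-facing sources average 2.5, so the invented scores do encode the intended contrast.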
Data Value Pyramid Assessment

| Level / claim | Status | Provenance | Source / note |
| --- | --- | --- | --- |
| Descriptive | Partially achieved | INFERRED | Power BI dashboards confirmed (jamal_al_sayed_data_analyst.md); weekly/monthly reporting inferred |
| Diagnostic | Minimal | INFERRED | Manual RCA for incidents described; no automated correlation |
| Predictive | Not attempted | SOURCED | CTO confirms no predictive models (mark_gonzalez_cto.md) |
| Prescriptive | Not attempted | INFERRED | No automated decision support mentioned anywhere in repo |
| Cross-system data needed for prediction | | SOURCED | jamal_al_sayed_data_analyst.md (data silos prevent cross-system analysis) |
| Overall framing as “Level 1 with pockets of Level 2” | | INVENTED | Editorial characterisation |

Integration Architecture Assessment
Current Approach

| Fact | Provenance | Source / note |
| --- | --- | --- |
| “Point-to-point with manual bridges” characterisation | INVENTED | Describes the pattern found; not a term used in the repo |
| No integration middleware, ESB, or API gateway | INFERRED | None of these appear anywhere in the repo |

Integration Types Table

| Fact | Provenance | Source / note |
| --- | --- | --- |
| Automated point-to-point: Prometheus to Grafana | SOURCED | Both systems confirmed |
| Automated: CrowdStrike to Splunk | SOURCED | Both systems confirmed |
| Automated: GitHub Actions to ArgoCD | SOURCED | michael_thompson_lead_software_developer.md |
| Batch: usage to billing (daily) | INVENTED | Plausible; no details exist |
| Batch: support metrics to Power BI (weekly) | INVENTED | Manual Power BI imports confirmed; “weekly” cadence fabricated |
| Manual: CRM to financial reporting | INVENTED | No integration confirmed (sourced); “manual” process fabricated |
| Manual: support data to customer health | SOURCED | jamal_al_sayed_data_analyst.md (manual correlation described) |
| Manual: sales data consolidation | INVENTED | Dual systems inferred; “consolidation” process fabricated |
| API: HubSpot lead capture from website | INFERRED | Website lead forms exist; HubSpot integration standard |
| API: Auth0 SSO | SOURCED | docs/policies/access_control.qmd |

Existing ETL Processes

| Fact | Provenance | Source / note |
| --- | --- | --- |
| No formal ETL platform | INFERRED | No ETL tool appears in repo |
| Scheduled Python scripts for billing aggregation | INVENTED | Python confirmed as dev stack; ETL use assumed |
| Manual CSV exports to Power BI | SOURCED | jamal_al_sayed_data_analyst.md (manual imports confirmed) |
| Splunk log collection as only formal ETL | INFERRED | Splunk normalisation is standard; “only formal” is editorial |
| Prometheus federation | INVENTED | Standard capability; assumed in use |
| Processes are fragile, undocumented, maintained by individuals | INVENTED | Plausible for 2-person team; not stated |
| Loss of Jamal or junior would create knowledge gaps | INVENTED | Strongly implied by 2-person team size |

Missing Capabilities Table

| Missing capability | Gap provenance | Impact provenance |
| --- | --- | --- |
| Data warehouse | SOURCED gap | INVENTED impact description |
| Master data management | INFERRED gap (inconsistent definitions confirmed) | INVENTED impact description |
| Real-time pipelines | SOURCED gap | INVENTED impact description |
| API gateway | INFERRED gap (not present in repo) | INVENTED impact description |
| Data catalogue | INFERRED gap (tribal knowledge described) | INVENTED impact description |

Compliance and Data Handling
Compliance Posture Table

| Item | Status | Provenance | Source / note |
| --- | --- | --- | --- |
| ISO 27001 | Certified | SOURCED | sophia_martines_ciso.md, cloudcore_company_overview.md |
| ISO controls A.12.1.2, A.14.2.2 | Referenced | SOURCED | docs/policies/change_management.qmd |
| SOC 2 Type II | Compliant | SOURCED | sophia_martines_ciso.md, cloudcore_company_overview.md |
| AI systems must meet SOC 2 criteria | Requirement | INFERRED | Standard SOC 2 scope extension |
| Australian Privacy Act | Compliant | SOURCED | cloudcore_company_overview.md |
| APP 6 and APP 11 most relevant | Mapping | INVENTED | Accurate representation of APPs; application to Cloudcore’s AI plans is analysis |
| NDB scheme | Compliant | SOURCED | security_compliance_officer_samuel_torres.md |
| GDPR | Compliant (EU data) | SOURCED | emily_chen_head_of_compliance.md |
| Article 22 automated decision-making | Reference | INVENTED | Accurate GDPR provision; application to Cloudcore is analysis |
| DPIA required for new initiatives | Reference | SOURCED | DPO interview mentions annual DPIAs |
| HIPAA | Partial (in progress) | SOURCED | emily_chen_head_of_compliance.md |
| BAA coverage needed for AI | Requirement | INFERRED | Standard HIPAA requirement for data processing |

Healthcare Client Contract Requirements

| Fact | Provenance | Source / note |
| --- | --- | --- |
| Healthcare is ~25% of revenue | INVENTED | See Brief 2 companion |
| All contract terms listed | INVENTED | Plausible for Australian healthcare cloud contracts; no contract terms in repo |
| Data residency in Australian DCs | INVENTED | Common requirement |
| Logged and auditable access | INVENTED | Common requirement |
| Prior notification for new systems | INVENTED | Plausible clause |
| Annual security assessments | INVENTED | Common requirement |
| 24-hour breach notification (stricter than NDB 72hr) | INVENTED | Plausible contractual tightening |
| AI implication about client notification | INVENTED | Logical consequence of invented contract terms |

Finance Client Contract Requirements

| Fact | Provenance | Source / note |
| --- | --- | --- |
| Finance is ~20% of revenue | INVENTED | See Brief 2 companion |
| All contract terms listed | INVENTED | Plausible for Australian finance cloud contracts |
| Third-party access requires approval | INVENTED | Common finance sector requirement |
| AI implication about external platforms | INVENTED | Logical consequence of invented terms |

Australian Privacy Act (APP Mapping)

| Fact | Provenance | Source / note |
| --- | --- | --- |
| Privacy Act and APPs referenced in backstories | SOURCED | security_compliance_officer_samuel_torres.md, emily_chen_head_of_compliance.md |
| APP 1 (open management) mapped to AI | INVENTED | Accurate APP; application is analysis |
| APP 3 (collection) mapped to training data | INVENTED | Same |
| APP 6 (use and disclosure) mapped to purpose limitation | INVENTED | Same |
| APP 11 (security) mapped to model protection | INVENTED | Same |
| NDB 72-hour reporting for AI systems | SOURCED (NDB scheme) / INFERRED (AI extension) | NDB is confirmed; extension to AI is logical |

AI-Specific Data Impact Assessment Gap

| Fact | Provenance | Source / note |
| --- | --- | --- |
| No AI governance framework | SOURCED | mark_gonzalez_cto.md, sophia_martines_ciso.md |
| No AI-specific security review process | SOURCED | sophia_martines_ciso.md |
| Existing DPIA process doesn’t address AI | INFERRED | DPIAs exist; AI-specific concerns not mentioned |
| List of unaddressed AI concerns (bias, explainability, etc.) | INVENTED | Comprehensive but fabricated list of standard AI governance concerns |
| Data classification policy still in draft | SOURCED | docs/policies/data_classification.qmd (POL-DATA-001 v1.2 DRAFT) |

Infrastructure Cost Benchmarks
Every cost figure in this section except the ML engineer salary was INVENTED, based on typical Australian market rates. The salary is the only SOURCED figure.

| Item | Cost range | Provenance | Source / note |
| --- | --- | --- | --- |
| Cloud data warehouse | $36-72K | INVENTED | Snowflake/BigQuery at moderate scale |
| ETL platform | $18-36K | INVENTED | Fivetran/dbt Cloud mid-tier |
| Data catalogue | $12-24K | INVENTED | Commercial tooling |
| MDM | $24-48K | INVENTED | Implementation-heavy |
| ML platform | $36-96K | INVENTED | SageMaker/Azure ML, highly variable |
| ML engineer salary | $180-250K | SOURCED | karen_lee_hr_manager.md |
| AI contractor | $2-3.5K/day | INVENTED | Australian specialist rates |
| MLOps tooling | $6-18K | INVENTED | MLflow as free alternative noted |
| API gateway | $6-24K | INVENTED | AWS available through partnership |
| Event streaming | $18-48K | INVENTED | Managed Kafka at modest scale |
| Integration middleware | $24-60K | INVENTED | Significant implementation cost |

Cost Context Analysis

| Fact | Provenance | Source / note |
| --- | --- | --- |
| $250K budget reference | INVENTED | See Brief 2 companion |
| Data warehouse + ETL consumes $54-108K | INVENTED | Arithmetic from invented figures |
| Single ML engineer consumes most/all budget | INFERRED | Sourced salary ($180-250K) vs invented budget ($250K) |
| Existing partnerships reduce platform costs | INFERRED | AWS/Azure partnerships confirmed; managed AI access is standard |
| Existing tools have AI features not being used | INFERRED | Splunk, CrowdStrike, HubSpot all have documented AI capabilities; Cloudcore’s use of these features is not mentioned |

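
The arithmetic behind the first three rows of this table can be reproduced from the benchmark ranges. The sketch below works in thousands of dollars and uses only figures stated in this document; the variable names are illustrative.

```python
# (low, high) ranges in thousands of dollars, from the cost benchmarks.
warehouse = (36, 72)    # INVENTED
etl = (18, 36)          # INVENTED
ml_salary = (180, 250)  # SOURCED (karen_lee_hr_manager.md)
budget = 250            # INVENTED

# Data warehouse + ETL: add the low ends and the high ends separately.
combined = (warehouse[0] + etl[0], warehouse[1] + etl[1])

# Share of the budget taken by one ML engineer, as whole percentages.
salary_share = tuple(round(100 * s / budget) for s in ml_salary)

# Budget left over after warehouse + ETL (worst case, best case).
remaining = (budget - combined[1], budget - combined[0])

print(f"Warehouse + ETL: ${combined[0]}-{combined[1]}K")
print(f"ML engineer share of budget: {salary_share[0]}-{salary_share[1]}%")
print(f"Left after warehouse + ETL: ${remaining[0]}-{remaining[1]}K")
```

The combined range matches the $54-108K stated in the table, and a single ML engineer takes 72-100% of the budget, which is the basis for the "most/all" wording.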
Cross-References

| Reference | Provenance | Note |
| --- | --- | --- |
| Policies at cloudcore.eduserver.au/docs/policies/ | SOURCED | real pages |
| Logs at cloudcore.eduserver.au/docs/logs/ | SOURCED | real pages |
| Risk frameworks at cloudcore.eduserver.au/docs/support/risk_assessment_frameworks | SOURCED | real page |

This reference document is for instructor use. It combines sourced facts and invented details into a single annotated view of the Data Infrastructure Audit Results.