Week 1 T&L Activities:

3.1 Data, information and knowledge

3.1.1 The differences and relationships between data, information and knowledge.

1. Data

  • What it is:
    Data is the raw facts and figures. On its own, it doesn’t have much meaning until it’s organised or processed.

  • Example (Digital Support & Security):
    Imagine a server log that records every login attempt. A single line might look like this:
    2025-09-01 09:45:12 | User: JamesF | Login: Failed
    On its own, that’s just one piece of data.


2. Information

  • What it is:
    When data is processed, organised, or put into context so it makes sense, it becomes information. Information answers questions like “who?”, “what?”, “where?”, “when?”.

  • Example (Digital Support & Security):
    If you take the server log data and count how many failed login attempts happened in the last 24 hours, you might discover:
    “There were 45 failed login attempts from 5 different IP addresses on the college’s network.”
    This is information because it’s structured and tells you something meaningful.
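To make that step from data to information concrete, here is a minimal Python sketch that counts failed attempts from log lines in the format shown above (the log contents are invented for illustration):

  # Raw log lines (data) in the format shown earlier.
  raw_log = [
      "2025-09-01 09:45:12 | User: JamesF | Login: Failed",
      "2025-09-01 09:45:30 | User: JamesF | Login: Failed",
      "2025-09-01 09:46:02 | User: PriyaK | Login: Success",
  ]

  # Processing the raw lines (data) into a count (information).
  failed = [line for line in raw_log if line.endswith("Failed")]
  print(f"{len(failed)} failed login attempts in this log")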


3. Knowledge

  • What it is:
    Knowledge is when you analyse and interpret the information to make decisions or take action. It answers “why?” and “how?”.

  • Example (Digital Support & Security):
    From the information about 45 failed login attempts from 5 IPs, you recognise a possible brute-force attack on student accounts. You know this because of your training in cybersecurity, which tells you that multiple failed logins from a small set of IP addresses is a common threat indicator.
    Using this knowledge, you might:

    • Block those IP addresses in the firewall.

    • Alert the IT security team.

    • Review authentication logs for suspicious activity.

 

3.1.2 The sources for generating data:

Human (Surveys, Forms)

Humans generate data whenever they give information directly – for example, filling in a form, survey, or feedback questionnaire. This is usually self-reported data (what a person chooses to share).

(Digital Support & Security):
A college IT support team might send students a survey about how secure they feel when using online learning platforms. The answers (Yes/No, ratings out of 5, written comments) are data that can be collected and analysed.

 


Artificial Intelligence (AI) / Machine Learning – Dangers of Feedback Loops

AI and machine learning systems create data as they learn from user behaviour. A feedback loop happens when the AI uses its own output as new input, which can lead to bias or errors being reinforced.

(Digital Support & Security):
A cybersecurity monitoring tool that uses machine learning to detect suspicious logins could wrongly flag normal student behaviour (like logging in late at night) as a threat. If those false alarms are fed back into the system as “evidence,” it may become overly strict and block real students from logging in.

 


Sensors (Temperature, Accelerometer, Vibration, Sound, Light, Pressure)

Sensors collect data from the environment. They measure physical things like heat, movement, sound, or light.

(Digital Support & Security):
In a server room at college, temperature sensors monitor if equipment is overheating. If the temperature goes above a safe level, the system can trigger an alert to the IT support team before the servers shut down.


Internet of Things (IoT) – Smart Objects

IoT devices are everyday objects connected to the internet (e.g., smart lights, thermostats, security cameras). They collect and send data automatically.

(Digital Support & Security):
A college might use smart security cameras that detect movement and send alerts to the IT team’s dashboard. This data helps keep the campus safe, but IT staff must also secure the devices to stop hackers from gaining access.


Transactions (Customer Data, Membership, Timing, Basket)


Every time someone buys something, signs up for a service, or logs into a system, data is generated. Transactions create a digital footprint.

(Digital Support & Security):
When a student pays for a college esports event online, the system records:

  • The student’s name

  • Payment method

  • Date & time

  • Items purchased (e.g., entry ticket + team jersey)
    This transaction data must be stored securely to comply with data protection laws (like GDPR) and to prevent cybercriminals from stealing card details.

 

 

"Data Detectives"
Scenario:

You’re part of a college IT support team. The college wants to improve security and gather data from different sources. Below are some situations.

Task (10 mins):
For each scenario, identify:
What is the raw data?
What information can you get from that data?
What knowledge or decisions could the IT team make?

Scenarios:

1 - A survey sent to 200 students asks: “Do you feel safe using the college Wi-Fi?” 120 say Yes, 80 say No.
2 - A machine learning tool notices that a student logs into the network at 2 a.m. every night and flags it as unusual.
3 - A server room temperature sensor records 35°C at 3:00 p.m. (normal temperature should be under 25°C).
4 - The college installs smart locks on computer labs that record every time someone enters or leaves.
5 - The esports society’s online shop records that most students buy merchandise around payday (the 28th of each month).

Extension (5 mins):
Find an example of an IoT device that could be used in a school or esports setting.
Describe what data it collects, what information it provides, and what knowledge the IT team could gain from it.

 

3.1.3 Ethical data practices and the metrics to determine the value of data:


Before we dive into the metrics, remember:
Ethical data practice means collecting, storing, and using data responsibly.
This includes:

  • Getting permission (consent) from users.

  • Protecting data from cyberattacks.

  • Not misusing personal information.

  • Following laws like GDPR (General Data Protection Regulation) in the UK/EU.

Now, let’s explore the metrics used to decide how valuable data is.


Quantity


Quantity refers to the amount of data collected. More data can help identify patterns more accurately.

(Digital Support & Security):
A college IT team collects data from 10 login attempts vs 10,000 login attempts. The larger dataset is more valuable because it shows broader patterns (e.g., which times of day attacks are most common).


Don’t collect more data than necessary – only gather what’s useful (“data minimisation” under GDPR).


Timeframe

Timeframe is about when the data was collected and how long it remains relevant. Recent data is often more valuable than old data.

(Digital Support & Security):
A log of failed Wi-Fi logins from yesterday is more useful for spotting a live cyberattack than logs from 2019.

Don’t keep data longer than necessary. For example, student support tickets might be deleted after a year once resolved.


Source

The value of data depends on where it comes from and how trustworthy the source is.

(Digital Support & Security):

Login data from the college’s own servers = reliable source.

A random spreadsheet emailed by an unknown user = unreliable (could be fake or manipulated).

Always check sources and avoid using stolen or illegally obtained data.


Veracity

Veracity means the accuracy and truthfulness of data. Data full of errors or lies is less valuable.

(Digital Support & Security):
If students fill in a survey about cyber safety and many joke by giving fake answers (“My password is 123456”), the veracity of the data is low, so the results can’t be trusted.

Organisations should clean and validate data, and not mislead people by presenting false or incomplete results.

 

3.1.4 How organisations use data and information:

Analysis to Identify Patterns

Organisations look at large sets of data to find trends, behaviours, or repeated issues. Patterns help predict future events and improve decision-making.

The IT support team analyses helpdesk tickets and notices that every Monday morning, many students report Wi-Fi login problems. The pattern suggests that systems might need restarting after the weekend.

Google analyses search trends (e.g., millions of people suddenly searching for the same issue). This helps them detect outbreaks of cyberattacks or bugs spreading online.


System Performance Analysis (Load, Outage, Throughput, Status)


Organisations monitor how well their systems are running:

  • Load – how much demand is placed on the system (e.g., number of users).

  • Outage – when systems go down or stop working.

  • Throughput – how much data or traffic can pass through the system.

  • Status – current health of servers, networks, or applications.


An esports tournament hosted at a college requires fast servers. The IT team monitors server load and bandwidth usage during live matches. If the system slows down, they can add more resources to avoid crashes.

Amazon Web Services (AWS) constantly monitors its cloud servers. If a data centre goes down, traffic is automatically re-routed to another server to prevent downtime for customers.
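As a simple illustration of how load and status data might be turned into an alert, here is a minimal Python sketch; the metric names and thresholds are invented for this example, not taken from any specific monitoring product:

  # Hypothetical snapshot of server metrics (a real tool collects these automatically).
  metrics = {"status": "up", "load_percent": 92, "throughput_mbps": 310}

  MAX_LOAD = 85  # assumed acceptable load for this sketch

  if metrics["status"] != "up":
      print("ALERT: outage detected - re-route traffic")
  elif metrics["load_percent"] > MAX_LOAD:
      print("WARNING: high load - consider adding resources")
  else:
      print("OK: system healthy")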


User Monitoring (Login/Logout, Resources Accessed)

Organisations track user activity to ensure systems are being used correctly and securely.

A college IT team monitors who logs into the Virtual Learning Environment (VLE). If a student logs in from two countries within the same hour, it may indicate a hacked account.

Microsoft 365 monitors user logins across the world. If an account logs in from London and then five minutes later from New York, it may block the login and alert security teams.


Targeted Marketing (Discounts, Upselling)

Organisations use data about customer behaviour to send personalised offers, suggest upgrades, or advertise products people are likely to buy.

A college esports society collects data on what students buy in the online shop. If a student buys a gaming jersey, they might get an email offering a discount on a matching mousepad.

Steam (Valve) analyses what games you play and recommends new titles you’re likely to enjoy. They also send personalised sale notifications to encourage more purchases.


Threat/Opportunity Assessment (Competitors, Security, Compliance)

Organisations analyse data to spot risks (threats) or advantages (opportunities). This can relate to cybersecurity, business competition, or legal compliance.

The IT security team compares data about phishing attempts with government alerts from the NCSC (National Cyber Security Centre). If a new type of phishing attack is targeting colleges, they can prepare staff with updated training – turning a threat into an opportunity to strengthen security.


NCSC (UK) collects data on cyber incidents across the country. They publish reports on new cyber threats, which organisations use to improve security and stay compliant with regulations like GDPR.

 

"Data in Action"

Scenario:

You are working in the IT support and security team for a college esports club. You have access to the following datasets:

1 - Login records: Show that some students are logging in at 3 a.m. from outside the UK.
2 - Server stats: During last Friday’s tournament, the main game server slowed down when 200 players connected at once.
3 - Shop sales: Jerseys sell out every time there’s a big tournament, but headsets don’t sell as well.
4 - Competitor data: Another nearby college just announced a new gaming lab with high-spec PCs.

Task:
1 - Analysis to Identify Patterns:

Which dataset shows a repeated trend?
What pattern do you see?

2 - System Performance:
Which dataset shows a system issue?
What actions should IT take to prevent it happening again?

3 - User Monitoring:
What do the login records tell you?
What security risks do they suggest?

4 - Targeted Marketing:
How could the esports club use the shop sales data to increase revenue?

5 - Threat/Opportunity Assessment:
How should the club respond to the competitor’s new gaming lab?

Extension:
Research how a company like Netflix or Amazon uses data to recommend products or detect suspicious activity.
Share your findings with the group.

 

3.1.5 Understand the interrelationships between data, information and the way it is generated, and make judgements about the suitability of data, information and the way it is generated in digital support and security.

What this means

  • Data = raw facts or figures (numbers, logs, text, clicks, etc.) without context.

  • Information = processed, organised, and meaningful data that helps people make decisions.

  • Way it is generated = how the data is collected (e.g. login records, surveys, sensors, monitoring tools).

These three parts are linked together:

  1. The way data is generated determines the type and quality of the data you get.

  2. That raw data needs to be processed and organised.

  3. Once processed, the data becomes information that can be used to make decisions.

If the data is incomplete, biased, or collected in the wrong way, the information may not be suitable for decision-making.


"A College Cybersecurity Incident Response"

Scenario:
A UK college notices that some students’ accounts have been logging in at unusual times. The IT security team collects data from three different sources:

1 - Login/Logout Records (system generated data)
2 - Firewall Logs (network traffic data, showing unusual connections from overseas IPs)
3 - Incident Reports (manually generated by staff when they notice suspicious behaviour)

How the interrelationships work:

  • Data:

    • Login records show timestamps, usernames, and IP addresses.

    • Firewall logs capture packet traffic and potential intrusion attempts.

    • Staff reports note suspicious emails and students complaining about locked accounts.

  • Information (processed data):

    • Combining the login timestamps with IP addresses shows multiple students logging in from a single overseas location at odd hours.

    • Staff reports confirm phishing emails were sent to many accounts the day before.

  • Suitability of Data:

    • Login data: Useful and reliable, but could be misleading if students use VPNs.

    • Firewall logs: Provide technical detail, but require expertise to interpret.

    • Staff reports: Subjective, but add valuable context about user behaviour.

  • Judgement:
    The most suitable data in this case is the combination of automated system logs (objective, timestamped evidence) and user-reported incidents (human context). Relying on only one source could lead to misinterpretation (e.g. mistaking a VPN for a hacker).


Real-World Industry Example

NHS Digital (UK Health Service) collects data from hospital IT systems about cyber incidents.

In 2017’s WannaCry ransomware attack, logs showed unusual traffic patterns while staff reported being locked out of systems.

By combining both machine data (network logs, malware signatures) and human-reported issues, NHS Digital was able to coordinate with cybersecurity agencies to restore services and improve future protections.

This demonstrates how data, information, and generation methods must work together to make correct security decisions.


"Data to Information Detective"

1 - Work in pairs or small groups.
2 - Read the case study above about the college cybersecurity incident.
3 - Answer the following questions together (10 minutes):

Data: List two types of raw data the IT team collected. Why is each useful?
Information: How did the IT team turn the raw data into useful information?
Suitability: Which source of data (login logs, firewall logs, or staff reports) do you think is most reliable for making security decisions? Why?
Judgement: If you were the IT manager, what actions would you take based on the information gathered? (E.g., resetting passwords, training, blocking IP addresses.)

Extension:
(Optional challenge if time allows, 5 minutes):

Think of a real organisation (like a bank, online shop, or gaming company).

What kind of data do they collect?
How do they turn it into information?
What threats or opportunities might this create?

Output:
Each group should share one key insight with the class about why it’s important to think about both the data itself and how it’s generated when making digital support or security decisions.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 2 T&L Activities:

3.2 Methods of transforming data

3.2.1 Methods of transforming data:

When organisations collect data, it is often raw and not immediately useful. To make it valuable, it must be transformed. The main methods are:

  • Manipulating

  • Analysing

  • Processing

Manipulating Data


Changing or reorganising data to make it more understandable or useful. This might include filtering, sorting, or combining data from different sources.

A college IT support team exports login data from the network. At first, it’s just thousands of rows of timestamps and usernames. By manipulating the data (sorting by user, filtering failed attempts), they quickly see which accounts have repeated login failures.

Splunk and Elastic (ELK Stack) are widely used in cybersecurity to manipulate and search through huge log files, making it easier to spot patterns of suspicious behaviour.
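A minimal Python sketch of that manipulation step (sorting and filtering); the rows are invented examples in the same shape as the table in the task below:

  # Each row: (username, timestamp, status) - sample data for illustration.
  records = [
      ("Sam02",  "02/09/2025 08:03:44", "Success"),
      ("Alex01", "02/09/2025 00:15:12", "Failed"),
      ("Alex01", "02/09/2025 00:15:45", "Failed"),
  ]

  # Manipulating: sort by username, then keep only the failed attempts.
  by_user = sorted(records, key=lambda row: row[0])
  failed_only = [row for row in by_user if row[2] == "Failed"]
  print(failed_only)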

 

Analysing Data


Looking at data in depth to identify patterns, trends, or relationships. Analysing moves beyond just reorganising – it’s about making sense of the information.

After manipulating login records, the IT team analyses them and notices that 80% of failed logins happen between midnight and 3 a.m. This unusual pattern suggests a brute-force attack.

IBM Security QRadar analyses logs from multiple systems (firewalls, servers, apps) to detect cyber threats by identifying unusual traffic patterns.
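The same idea in a short Python sketch: working out what share of failed logins fall in the midnight-to-3 a.m. window (timestamps invented for illustration):

  from datetime import datetime

  failed_times = ["02/09/2025 00:15:12", "02/09/2025 02:50:01", "02/09/2025 14:10:33"]

  # Analysing: what proportion of failures happened between 00:00 and 03:00?
  parsed = [datetime.strptime(t, "%d/%m/%Y %H:%M:%S") for t in failed_times]
  night = [t for t in parsed if t.hour < 3]
  print(f"{len(night) / len(parsed):.0%} of failed logins occurred between midnight and 3 a.m.")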

 

Processing Data

Converting raw data into a different format or structure so it can be used by systems, applications, or people. Processing often involves automation.

A system collects sensor data from a server room (temperature, humidity). This raw data is processed into a dashboard that shows “green, amber, red” warnings. IT staff don’t need to read every number – the processed data tells them instantly if action is needed.

SIEM (Security Information and Event Management) tools like Azure Sentinel automatically process logs from thousands of endpoints and generate alerts for IT teams.
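A minimal sketch of that processing step, with illustrative thresholds (the 25°C “normal” figure matches the sensor example used earlier):

  def room_status(temp_c: float) -> str:
      """Process a raw temperature reading into a traffic-light status."""
      if temp_c < 25:
          return "green"
      if temp_c < 30:
          return "amber"
      return "red"

  for reading in [22.4, 27.9, 35.0]:
      print(f"{reading}°C -> {room_status(reading)}")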

 

You are part of a college IT security team. Below is some raw login data:

                          +----------+---------------------+---------+
                          | Username | Timestamp           | Status  |
                          +----------+---------------------+---------+
                          | Alex01   | 02/09/2025 00:15:12 | Failed  |
                          | Alex01   | 02/09/2025 00:15:45 | Failed  |
                          | Alex01   | 02/09/2025 00:16:20 | Failed  |
                          | Sam02    | 02/09/2025 00:17:02 | Success |
                          | Mia03    | 02/09/2025 00:20:05 | Failed  |
                          | Mia03    | 02/09/2025 00:20:41 | Failed  |
                          | Mia03    | 02/09/2025 00:21:17 | Success |
                          +----------+---------------------+---------+


Task:
Manipulating:

Sort the data by username. What do you notice?

Analysing:
Which accounts show suspicious behaviour? Why?

Processing:
Imagine you are designing a dashboard. How would you present this data (e.g., traffic light system, charts, alerts)?

Extension:
Research one industry tool (Splunk, ELK Stack, QRadar, or Azure Sentinel).
Explain: Does it mainly manipulate, analyse, or process data – or all three?

 

 

 


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 3 T&L Activities:

3.3 Data taxonomy

What is a Taxonomy?

Think of a taxonomy like a family tree, but for data. It’s a way of splitting things into groups so we know what type of data we’re dealing with.

 

3.3.1 Definition of qualitative and quantitative, its purpose, and how data is categorised

Quantitative

Quantitative data basically means numbers: if you can count it or measure it, it’s quantitative.

Quantitative data comes in two types:

Discrete Data

  • Discrete means things you can count in whole numbers.

  • You can’t have half of one: it’s 1, 2, 3… but not 2.5.

  • In IT support/security:

    • How many times a student typed the wrong password.

    • The number of emails flagged as spam.

    • How many viruses an antivirus tool finds.

If you ask “How many login attempts failed this morning?” and the answer is “7”, that’s discrete data.

Continuous Data

  • Continuous means measurements – and you can have decimals.

  • In IT support/security:

    • The server room temperature (22.3°C, 22.4°C, etc.).

    • Bandwidth speed during an esports match (245.6 Mbps).

    • CPU load (%) on a computer.

If you check “What’s the server temperature right now?” and the reading is “23.5°C”, that’s continuous data.

Both are useful, but in different ways:

  • Discrete data is great for counting events – like how many people tried to hack into your system.

  • Continuous data is better for monitoring performance – like spotting if your server is overheating or slowing down.

 

Take Amazon Web Services (AWS): they run thousands of servers worldwide. They use discrete data to count login attempts and block suspicious ones. At the same time, they use continuous data to monitor server performance. If both types spike at once, they know something is wrong.
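In code, the difference simply comes down to the values you store and what you can sensibly do with them; a tiny illustrative sketch:

  failed_logins_today = 7   # discrete: a whole-number count
  server_temp_c = 23.5      # continuous: a measurement with decimals

  temps = [22.3, 22.4, 23.5]
  print(f"Average temperature: {sum(temps) / len(temps):.1f}°C")  # averaging suits continuous data
  print(f"Total failed logins: {failed_logins_today}")            # counting suits discrete data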

 

Qualitative

What is Qualitative Data?

Qualitative data is about descriptions, opinions, and categories rather than numbers.

Types of Qualitative Data:

Categorical (or Nominal) Data

Data that can be sorted into groups, but the groups don’t have a natural order.

  • In Digital Support & Security:

    • Type of cyberattack: phishing, malware, ransomware, brute force.

    • Operating system: Windows, macOS, Linux.

    • User role: student, staff, admin.

It’s like labels – they tell you what “type” something is, but not which one is bigger or better.

 

Ordinal Data

Data that can be put in a ranked order, but the gaps between them aren’t necessarily equal.

  • In Digital Support & Security:

    • Student feedback on password security training (Poor, Okay, Good, Excellent).

    • Cybersecurity risk ratings: Low, Medium, High, Critical.

    • Priority of support tickets: Urgent, Medium, Low.

So ordinal data has a sense of order, but it’s not really about numbers. “High risk” is more serious than “Low risk,” but we can’t say it’s exactly “two times” more serious.

Quantitative data is great for spotting patterns in numbers – but qualitative data adds the human side:

  • What people think

  • How people feel

  • Why something is happening

NCSC (National Cyber Security Centre, UK):

They collect quantitative data about how many phishing emails are reported, but they also collect qualitative data from feedback surveys asking staff how confident they feel spotting phishing emails. By combining the two, they can judge not just how many phishing attempts are happening, but also how well people are prepared to deal with them.


Case Study: College Cybersecurity Awareness

Your college has recently run a campaign to improve cybersecurity awareness among students and staff. The IT support and security team collected both quantitative and qualitative data to see if it worked.

Data Collected:
• Quantitative (numbers):
   - 1,200 phishing emails reported in Term 1.
   - Only 450 phishing emails reported in Term 2.
   - 95% of students logged in successfully without needing password resets.

• Qualitative (opinions/descriptions):
   - “I feel more confident spotting phishing emails now.”
   - “The password rules are still too complicated.”
   - “Training was useful but too short.”
   - Risk ratings given by IT staff: Low, Medium, High.

Task Part 1 – Analysis (20 mins, group work)

Work in small groups and:
1. Identify the quantitative data in the case study.
2. Identify the qualitative data in the case study.
3. Explain how each type of data helps the IT team understand the effectiveness of the campaign.
4. Make a judgement: Do the numbers and opinions show the campaign was successful? Why or why not?

Task Part 2 – Research (Homework or 30 mins independent task)

Each group must research a real-world cybersecurity awareness campaign. Examples:
- NCSC “Cyber Aware” (UK)
- Google Security Checkup
- StaySafeOnline.org (US)
- OR another campaign you find.

For your chosen case:
- Find one example of quantitative data they collected.
- Find one example of qualitative data they used.
- Explain how combining both types of data made their campaign stronger.

Task Part 3 – Group Presentation (15 mins prep + delivery in next lesson)

Prepare a 5-minute presentation to share with the class. Your presentation should include:
1. A short explanation of the difference between quantitative and qualitative data.
2. An analysis of the college case study – was the awareness campaign effective?
3. Findings from your research case study.
4. A recommendation: If you were the IT manager, what would you do next to improve cybersecurity awareness?

Tip: Use visuals like graphs (for quantitative data) and word clouds or quotes (for qualitative data).

Extension / Stretch Task
Design your own mini research survey that collects both quantitative and qualitative data about how safe students feel online. Share 3–5 questions (mix of numerical scales and open-ended questions).

 

3.3.2 Know the definition for structured data, understand its purpose, and understand that quantitative data is structured.

3.3.3 Know the definition for unstructured data, understand its purpose, and understand that qualitative data is unstructured.

3.3.4 Know the definition for each representation and understand the representations of quantitative data:

• discrete values

• continuous values

• categorical values.

3.3.5 Know and understand the properties of qualitative data:

• stored and retrieved only as a single object

• codified into structured data.

3.3.6 Understand the interrelationships between data categories, data structure and transformation, and make judgements about the suitability of data categories, data structure and transformation in digital support and security.

 


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 4 T&L Activities:

3.4 Data types

3.4.1 The definition of common data types, their purpose, and when each is used:

Integer (Whole numbers)

What/why: Whole numbers (no decimal point). Efficient for counting, indexing, quantities, IDs.
When to use: Anything that can’t have fractions: number of users, attempts, port numbers, stock counts.
Gotchas: Watch out for range limits (e.g., 32-bit vs 64-bit) and accidental division that produces decimals.

Example | Suitable Uses            | Not Suitable For
--------+--------------------------+--------------------------------
0       | Counter start            | Currency with pennies
7       | Login attempts           | Temperatures needing decimals
65535   | Network ports (unsigned) | Precise measurements (e.g. cm)
-12     | Temperature differences  |

Real (Floating-point / Decimal)

What/why: Numbers with fractional parts.
When to use: Measurements (temperature, CPU load), ratios, scientific values.
Gotchas: Floating-point rounding error (binary floating point). For money, prefer fixed-point/decimal types.

Example | Suitable Uses      | Notes
--------+--------------------+----------------------------------
3.14    | Maths/geometry     | Stored as float/double
-0.75   | Signal values      | Rounding errors possible
72.5    | CPU temperature °C | Use DECIMAL for money (not float)

Character (Char)

What/why: A single textual symbol (one character).
When to use: Fixed-width codes (Y/N flags), single-letter grades, check digits.
Gotchas: In Unicode, a “character” users see may be multiple code points (accents/emoji). Many systems still treat CHAR as a single byte/letter in a given encoding.

Example | Suitable Uses    | Notes
--------+------------------+------------------------------------
'Y'     | Yes/No flag      | Case sensitivity may matter
'A'     | Grade            | Encoding/locale may affect storage
'#'     | Delimiter symbol |

String (Text)

What/why: Ordered sequence of characters (words, sentences, IDs).
When to use: Names, emails, file paths, JSON blobs-as-text, logs.
Gotchas: Validate length and content; normalise case; be mindful of Unicode, whitespace, and injection risks.

Example            | Suitable Uses  | Validation Ideas
-------------------+----------------+----------------------------
"James Farrington" | Person name    | Trim spaces; allow accents
"user@example.com" | Email address  | Regex format check
"/var/www/index"   | File path      | Disallow control chars
"BE-UK-2025-00017" | Reference code | Check length & pattern

Boolean (True/False)

What/why: Logical truth value with two states.
When to use: Feature flags, on/off, pass/fail, access granted/denied.
Gotchas: In databases and CSVs, Booleans are often stored as 1/0, TRUE/FALSE, Y/N—be consistent when importing/exporting.

Example | Suitable Uses   | Storage Variants
--------+-----------------+---------------------------
TRUE    | MFA enabled?    | TRUE/FALSE, 1/0, or Y/N
FALSE   | Account locked? | Keep consistent across DBs

Date (and Date/Time)

What/why: Calendar date (optionally time and timezone).
When to use: Timestamps for logs, booking dates, certificate expiry, backups.
Gotchas: Time zones and daylight saving; choose UTC for servers, localise only for display. Use proper date types, not strings, for comparisons and indexing.

Example                   | Suitable Uses         | Notes
--------------------------+-----------------------+--------------------------------------
2025-09-02                | Report date           | Use ISO 8601 format
2025-09-02T10:30:00Z      | Audit timestamp (UTC) | Store UTC, display in local timezone
2025-12-31T23:59:59+01:00 | Regional display      | Avoid treating dates as strings

BLOB (Binary Large Object)

What/why: Arbitrary binary data (files) stored as a single value.
When to use: Images, PDFs, compressed archives, firmware, encrypted payloads—when you must keep the bytes intact.
Gotchas: Large size affects backups and query speed; consider storing large files in object storage (S3, Azure Blob) and keep only a URL/metadata in the database.

Example             | Suitable Uses          | Notes
--------------------+------------------------+----------------------------------------------
PNG logo bytes      | Small media in DB      | Mind database size limits
PDF policy document | Immutable file storage | Often better in file/object storage
Encrypted payload   | Secure binary storage  | Store MIME type, size, checksum for integrity
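To see how these types look in practice, here is a short Python sketch (the values are invented examples; Python has no separate character type, so a one-character string stands in):

  from datetime import datetime, timezone

  login_attempts = 7                        # integer: a countable whole number
  cpu_temp_c = 72.5                         # real/float: a measurement with decimals
  grade = "A"                               # character: a single symbol (1-char string in Python)
  username = "James Farrington"             # string: an ordered sequence of characters
  mfa_enabled = True                        # boolean: exactly two states
  audit_time = datetime.now(timezone.utc)   # date/time: store in UTC, localise only for display
  logo_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # BLOB-style raw binary (first bytes of a PNG file)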

 

3.4.2 The interrelationships between structured data, unstructured data and data type.

In today’s digital world, organisations gather information in many different forms – from neatly organised spreadsheets of customer transactions to complex streams of emails, images, and social media posts. To make sense of this, we look at three key concepts: structured data, unstructured data, and data types.

Structured data is highly organised, stored in predefined formats such as rows and columns within a spreadsheet or database. This makes it straightforward to search, filter, and analyse. Examples include account numbers, dates of birth, and purchase amounts.

Structured Data

  • Organised in a predefined format (rows, columns, fields).

  • Easily stored in databases (SQL, relational systems).

  • Examples: customer IDs, transaction amounts, dates, sensor readings.

By contrast, unstructured data has no fixed format or schema, making it harder to process. It includes content such as emails, audio recordings, images, videos, or free-text survey responses. While it carries rich insights, it requires more advanced tools and techniques to interpret.

Unstructured Data

  • No fixed schema or easily searchable structure.

  • Stored in raw formats like documents, images, videos, social media posts.

  • Examples: customer service call recordings, CCTV footage, email bodies.

At the foundation of both lies the concept of data types. A data type defines how a particular piece of information is stored and used – for instance, an integer for whole numbers, a string for text, or a blob for multimedia. Structured systems rely on data types to keep information consistent, while unstructured data is often stored in broader types like text fields or binary objects to preserve its form.

Together, these three elements form the backbone of how data is represented, stored, and ultimately transformed into meaningful information.

Examples in Practice

Scenario             | Structured Data                                 | Unstructured Data                  | Data Types in Play
---------------------+-------------------------------------------------+------------------------------------+-----------------------------
Banking Transactions | Account ID, amount, timestamp                   | Call centre audio logs             | Integer, DateTime, Blob
Healthcare           | Patient ID, diagnosis code, prescription dosage | MRI scans, doctor notes            | String, Decimal, Blob
Social Media         | Username, post date, likes count                | Image posts, videos, captions      | String, Integer, Blob, Text
Cybersecurity        | Login/logout logs, IP addresses                 | Suspicious emails, attached files  | String, Boolean, Blob

 

Case Studies

Case Study 1: Healthcare – NHS Patient Records

  • Structured: Patient demographic data (NHS number, date of birth, appointment dates).

  • Unstructured: Doctor notes, x-ray images, voice dictations.

  • Interrelationship: Structured records (like appointment schedules) link to unstructured evidence (x-rays stored as BLOBs). The combination provides a holistic medical history.

  • Application: AI systems analyse unstructured scans, while SQL systems schedule appointments. Both need data types (integer IDs, date, blob images).

Case Study 2: Cybersecurity – Threat Detection

  • Structured: Firewall logs (IP addresses, timestamps, action taken).

  • Unstructured: Email attachments, phishing attempts, PDF exploits.

  • Interrelationship: Structured logs identify when and where data entered; unstructured payloads (attachments) must be analysed with ML tools. Data types (IP as string, timestamp as date, file as blob) define how each element is stored and processed.

  • Application: SIEM (Security Information and Event Management) platforms like Splunk combine both data types to detect anomalies.

Case Study 3: Retail – Amazon Recommendations

  • Structured: Order history (user ID, product ID, purchase date).

  • Unstructured: Customer reviews, product images, Q&A responses.

  • Interrelationship: Data types underpin storage (strings for reviews, integers for quantities, blobs for images). Machine learning models merge structured purchase histories with unstructured reviews to improve recommendations.

 

3.4.3 Understand the interrelationships between data type and data transformation.

 

3.4.4 Be able to make judgements about the suitability of using structured data, unstructured data, data types, and data transformations in digital support and security.

 


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 5 T&L Activities:

3.5 Data formats

3.5.1 Know the definition of common data formats and understand their purpose and when each is used:

• JSON

• Text file

• CSV

• UTF-8

• ASCII

• XML.
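The sketch below (Python standard library only) writes the same invented record in three of these formats, to show how the structure differs; UTF-8 and ASCII are character encodings that any of these text formats can be saved in:

  import csv, io, json
  import xml.etree.ElementTree as ET

  record = {"user": "Alex01", "attempts": 3, "status": "Failed"}

  # JSON: key-value pairs, widely used by web APIs.
  print(json.dumps(record))

  # CSV: one header row plus one data row.
  buf = io.StringIO()
  writer = csv.DictWriter(buf, fieldnames=record.keys())
  writer.writeheader()
  writer.writerow(record)
  print(buf.getvalue().strip())

  # XML: the same data wrapped in named tags.
  root = ET.Element("login")
  for key, value in record.items():
      ET.SubElement(root, key).text = str(value)
  print(ET.tostring(root, encoding="unicode"))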

 

3.5.2 Understand the interrelationships between data format and data transformation, and make judgements about the suitability of using data formats in digital support and security.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 6 T&L Activities:

3.6 Structures for storing data

3.6.1 Understand the role of metadata in providing descriptions and contexts for data.

 

3.6.2 Know the definition of file-based and directory-based structures and understand their purposes and when they are used.

 

3.6.3 Know the definition of hierarchy-based structure and understand its purpose and when it is used.

 

3.6.4 Understand the interrelationships between storage structures and data transformation.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 7 T&L Activities:

3.7 Data dimensions and maintenance

3.7.1 Know the definitions of the six Vs (dimensions) and understand the six Vs (dimensions) of Big Data and their impact on gathering, storing, maintaining and processing:

• volume

• variety

• variability

• velocity

• veracity

• value.

3.7.2 Know the definition of Big Data and understand that it has multiple dimensions.

3.7.3 Understand the impact of each dimension on how data is gathered and maintained.

3.7.4 Know the definitions of data quality assurance methods and understand their purpose and when each is used:

• validation

• verification

• reliability

• consistency

• integrity

• redundancy.

3.7.5 Know and understand factors that affect how data is maintained:

• time

• skills

• cost.

3.7.6 Understand the interrelationships between the dimensions of data, quality assurance methods and factors that impact how data is maintained and make judgements about the suitability of maintaining, transforming and quality assuring data in digital support and security.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 8 T&L Activities:

3.8 Data systems

3.8.1 Know the definition of data wrangling and understand its purpose and when it is used.

 

3.8.2 Know and understand the purpose of each step of data wrangling:

• structure

• clean

• validate

• enrich

• output.
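A minimal Python sketch of those five steps applied to some invented, messy survey rows:

  # Invented raw rows: "username,attempts" with stray spaces and one bad value.
  raw = ["  Alex01 , 7 ", "Mia03,abc", "Sam02,3"]

  wrangled = []
  for line in raw:
      user, attempts = [part.strip() for part in line.split(",")]  # structure + clean
      if not attempts.isdigit():                                   # validate: reject bad values
          continue
      wrangled.append({"user": user.lower(),                       # clean: normalise case
                       "attempts": int(attempts),
                       "term": "Term 1"})                          # enrich: add context

  print(wrangled)                                                  # output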

 

3.8.3 Know and understand the purpose of each core function of a data system:

• input

• search

• save

• integrate

• organise (index)

• output

• feedback loop.

 

3.8.4 Know the types of data entry errors and understand how and why they occur:

• transcription errors

• transposition errors.

 

3.8.5 Know and understand methods to reduce data entry errors:

• validation of user input

• verification of user input by double entry

• drop-down menus

• pre-filled data entry boxes.
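A minimal sketch of two of these methods, validation and double-entry verification (the format check is deliberately loose and illustrative):

  def looks_like_email(value: str) -> bool:
      """Very loose validation - real systems use stricter checks."""
      return "@" in value and "." in value.split("@")[-1]

  entry_1 = "student@example.com"
  entry_2 = "student@example.com"   # the user types it a second time (double entry)

  if not looks_like_email(entry_1):
      print("Rejected: not a valid email format")
  elif entry_1 != entry_2:
      print("Rejected: the two entries do not match")
  else:
      print("Accepted")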

 

3.8.6 Know and understand the factors that impact implementation of data entry:

• time needed to create the screens

• expertise needed to create screens

• time needed to enter the data.

 

3.8.7 Understand the relationship between factors that impact data entry and data quality and make judgements about the suitability of methods to reduce data entry errors in digital support and security.

 

3.8.8 Understand the relationship between factors that impact implementation of data entry and make judgements about the suitability of implementing data entry in digital support and security.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 9 T&L Activities:

3.9 Data visualisation

3.9.1 Know and understand data visualisation formats and when they are used:

• graphs

• charts

• tables

• reports

• dashboards

• infographics.

3.9.2 Know and understand the benefits and drawbacks of data visualisation formats based on:

• type of data

• intended audience

• brief.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 10 T&L Activities:

3.10 Data models

3.10.1 Know the types of data models and understand how they organise data into structures:

• hierarchical

• network

• relational.

 

3.10.2 Know and understand the factors that impact the selection of data model for organising data:

• efficiency of accessing individual items of data

• efficiency of data storage

• level of complexity in implementation.

 

 

3.10.3 Understand the benefits and drawbacks of different data models and make judgements about the suitability of data models based on efficiency and complexity.

 

3.10.4 Be able to draw and represent data models:

• hierarchical models with blocks, arrows and labels

• network models with blocks, arrows and labels

• relational models with tables, rows, columns and labels.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 11 T&L Activities:

3.11 Data access across platforms

3.11.1 Understand the features, purposes, benefits and drawbacks of accessing data across platforms:

• permissions

o authorisation

o privileges

o access rights

o rules

• access mechanisms:

o role-based access (RBAC)

o rule-based access control (RuBAC)

o Application Programming Interfaces (API).
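A minimal sketch of role-based access control (RBAC); the roles and permissions here are invented for illustration:

  # Each role maps to the set of actions it is permitted to perform.
  PERMISSIONS = {
      "student": {"read_vle"},
      "staff":   {"read_vle", "upload_marks"},
      "admin":   {"read_vle", "upload_marks", "manage_users"},
  }

  def allowed(role: str, action: str) -> bool:
      return action in PERMISSIONS.get(role, set())

  print(allowed("student", "upload_marks"))  # False - students lack this privilege
  print(allowed("admin", "manage_users"))    # True  - admins hold this access right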

 

3.11.2 Know and understand the benefits and drawbacks of methods to access data across platforms.

 

3.11.3 Understand the interrelationships between data access requirements and data access methods and make judgements about the suitability of accessing data in digital support and security.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT
Week 12 T&L Activities:

3.12 Data analysis tools

3.12.1 Know data analysis tools and understand their purpose and when they are used:

• storing Big Data for analysis:

o data warehouse

o data lake

o data mart

• analysis of data:

o data mining

o reporting

• use of business intelligence gained through analysis:

o financial planning and analysis

o customer relationship management (CRM):

– customer data analytics

– communications.

 

3.12.2 Understand the interrelationships between data analysis tools and the scale of data.


Files that support this week

English:

Assessment:


Learning Outcomes:
Awarding Organisation Criteria:
Maths:
Stretch and Challenge:
E&D / BV
Homework / Extension:
ILT