Data science and data privacy work hand-in-hand to improve the world
This blog comes to us from Amber Yandow, a data science subject matter expert, coach, mentor, and instructor who has worked with Cisco to develop the Introduction to Data Science curriculum, now available for free on Skills for All from Cisco Networking Academy. You can learn more about Amber’s journey on the Women Rock-IT replay of “What’s So Exciting About Data Science?”. Amber joins at the 23:14 mark.
As a data scientist, my job is to analyze data to help solve problems. And as the world digitizes—with not only mobile devices, but sensors and other IoT (Internet of Things) technologies like wearables—the quantity of data available is growing exponentially.
International Data Corporation forecasts that by 2025 the global datasphere will be 163 zettabytes, up from 16.1 zettabytes in 2016, and that the average connected person will interact with connected devices nearly 4,800 times a day.
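To put those two forecasts in perspective, a quick calculation helps. Assume smooth compound growth over the nine years from 2016 to 2025 (the forecast itself does not claim this) and assume the interactions are spread evenly across the full 24-hour day. Under those assumptions, the figures imply roughly a tenfold increase, about 29% growth per year, and one interaction every 18 seconds:

```latex
\frac{163\ \text{ZB}}{16.1\ \text{ZB}} \approx 10.1,
\qquad
\left(\frac{163}{16.1}\right)^{1/9} - 1 \approx 0.293 \ (\approx 29\%\ \text{per year}),
\qquad
\frac{86\,400\ \text{s/day}}{4\,800\ \text{interactions/day}} = 18\ \text{s}.
```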
With that quantity of data being generated and collected, data privacy is increasingly important. Data Privacy Day, held on January 28th each year and instituted by the Council of Europe in 2007 as Data Protection Day, is as good a day as any to contemplate what this means.
The Value of Data
Data is generally collected in one of three ways:
- Observations: Scientists, analysts and even marketers observe customer behavior and record it
- Inferences: Data can be inferred from a user’s search history, purchases or social media activity
- Volunteered: People provide data to organizations through surveys and forms
Once data has been collected and collated, it can be used to solve problems and answer questions. Data science relies on data that relates to the problem you are trying to solve or the question you are trying to answer, so personally identifiable information (PII) is often unnecessary. What matters is that the data is representative of the problem you are solving. A data scientist must also be able to recognize when to exclude data, for instance to avoid introducing errors or biases into artificial intelligence and machine learning models.
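One way to put this into practice is to keep only the fields that are relevant to the question, so PII never enters the analysis in the first place. The following is a minimal sketch that assumes records arrive as Python dictionaries. The field names and the `minimize` helper are hypothetical and chosen only for illustration.

```python
from typing import Any, Iterable

# Hypothetical fields needed to answer a question such as
# "how does daily activity vary by age band and region?"
RELEVANT_FIELDS = {"age_band", "region", "daily_steps"}


def minimize(records: Iterable[dict[str, Any]],
             allowed: set[str] = RELEVANT_FIELDS) -> list[dict[str, Any]]:
    """Return copies of the records containing only the allowed fields.

    An allowlist (rather than a blocklist of known PII fields) excludes
    any new or unexpected field by default, identifying or not.
    """
    return [{k: v for k, v in r.items() if k in allowed} for r in records]


raw = [
    {"name": "Ana Silva", "email": "ana@example.com", "age_band": "30-39",
     "region": "North", "daily_steps": 8412},
    {"name": "Ben Okoro", "email": "ben@example.com", "age_band": "40-49",
     "region": "South", "daily_steps": 6120},
]

for row in minimize(raw):
    print(row)  # names and emails are gone; only the analysis fields remain
```

An allowlist is used here because it fails safe: a field that nobody anticipated is dropped rather than silently passed through.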
Why Data Privacy is so Important
Data is typically stored on local servers or in the cloud. Protecting its privacy is a company’s ethical and legal responsibility, and in practice that responsibility often falls within the purview of data engineers or database administrators.
Anonymizing data is one way of ensuring data privacy. This means removing or encrypting the direct identifiers that link records to individuals (the PII, such as a person’s full name, address, email, personal identification number, physical description, or biometric info) and preventing those individuals from being reidentified.
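Here is a minimal sketch of the removal and encryption step in Python. It drops direct identifiers and replaces the record's ID with a keyed hash (HMAC-SHA-256), so records can still be linked for analysis without revealing who they belong to. Strictly speaking, this is pseudonymization rather than full anonymization. Anyone holding the key can recompute the tokens, and the remaining fields (quasi-identifiers such as age and postcode) can still allow reidentification. For that reason, the sketch also coarsens the postcode. The field names, the `pseudonymize` and `token_for` helpers, and the in-process key generation are all hypothetical. A real system would load the key from a secrets manager.

```python
import hashlib
import hmac
import secrets
from typing import Any

# Hypothetical set of direct identifiers to strip from each record.
DIRECT_IDENTIFIERS = {"name", "address", "email", "national_id"}

# Generated here only to keep the example self-contained; a real system
# would load it from a secrets manager. Without the key, a token cannot
# be recomputed from a known identifier.
SECRET_KEY = secrets.token_bytes(32)


def token_for(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable, keyed pseudonym (HMAC-SHA-256, hex)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(record: dict[str, Any], key: bytes = SECRET_KEY) -> dict[str, Any]:
    """Drop direct identifiers, tokenize the user ID and coarsen the postcode."""
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIERS and k != "user_id"}
    out["subject_token"] = token_for(record["user_id"], key)
    if "postcode" in out:
        # Keep only the first three characters to reduce reidentification risk.
        out["postcode"] = str(out["postcode"])[:3]
    return out


record = {"user_id": "u-1001", "name": "Ana Silva", "email": "ana@example.com",
          "postcode": "94105", "age_band": "30-39", "purchases": 7}
print(pseudonymize(record))
```

The same user ID always maps to the same token under a given key, so a person's records can still be grouped together. Rotating or destroying the key breaks that link.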
McKinsey argues that effective regulation of data anonymization is actually an opportunity: it reduces the risks to individuals and organizations while making more data available for analysis.
Data protection laws vary from country to country, but common practices include having a data loss prevention and data discovery strategy; making frequent backups; building in protections such as replication, firewalls, encryption, authorization and authentication; and having erasure and recovery strategies.
The European Union’s General Data Protection Regulation (GDPR) arguably has the widest reach. The EU Charter of Fundamental Rights stipulates that everyone has the right to the protection of their personal data, and in the year to March 2022, 1,031 fines totaling €1.581 billion were issued under the GDPR.
Notably, these fines were levied not against cybercriminals but against well-known corporations. The breaches included an insufficient legal basis for data processing, non-compliance with general data processing principles, and insufficient technical and organizational measures to ensure information security.
As an individual online, what can you do?
There are many things you can do to secure your data. The most basic things are to:
- Use strong passwords at least 11 characters long, with a mixture of upper- and lower-case letters, symbols and numbers. By one estimate, it would take a cybercriminal at least 400 years to crack a password that meets these conditions, and longer passwords are harder still.
- Back up your data
- Don’t open suspicious emails
- Never provide personal data like a government ID number over the phone
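The password advice above can be expressed as a simple check, paired with a rough brute-force estimate. This is only a sketch. The 94-character alphabet (printable ASCII without the space) is an assumption, and the attacker's guess rate is a hypothetical parameter. Real cracking times depend heavily on how the password is stored and on whether it appears in leaked-password lists, so the estimate applies only to randomly chosen passwords.

```python
import string

# Assumed alphabet: upper, lower, digits and ASCII punctuation (94 symbols).
ALPHABET_SIZE = len(string.ascii_letters + string.digits + string.punctuation)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def meets_policy(password: str, min_length: int = 11) -> bool:
    """Check the length and character-mix rules described above."""
    return (
        len(password) >= min_length
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


def worst_case_years(length: int, guesses_per_second: float) -> float:
    """Years needed to try every password of this length from the assumed alphabet.

    Valid only for randomly chosen passwords; the guess rate is a
    hypothetical attacker capability, not a measured figure.
    """
    return ALPHABET_SIZE ** length / guesses_per_second / SECONDS_PER_YEAR


print(meets_policy("Tr1ck&Pony42"))   # True
print(meets_policy("password1234"))   # False: no upper-case letter or symbol
for n in (8, 11, 14):
    # Hypothetical attacker speed of 10**11 guesses per second.
    print(n, f"{worst_case_years(n, 1e11):.3g} years")
```

Each extra character multiplies the search space by the alphabet size. That multiplication is why length adds protection faster than any single rule about character types.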
Josh McCloud, Cisco’s National Cybersecurity Officer in Singapore, has some great cybersecurity tips online. Or you could explore the subject in greater depth by enrolling in Cisco Networking Academy’s free Introduction to Cybersecurity course, designed to make cybersecurity awareness available for all.
If, like me, you are curious about the world around you and have an interest in problem-solving, all the data being collected represents an enormous opportunity to improve communities and organizations in every corner of the globe.
What’s so great about data science?
Introduction to Data Science is a primer course from Cisco Networking Academy that a team of learning scientists and I developed to let anyone get their feet wet in the data science field. You can learn about data science at a high level, in an intuitive and interactive way, for free on our ‘mobile-first’ Skills for All learning platform.
Data science affects our lives in numerous ways:
- In the entertainment industry, data science powers the algorithms that help viewers find videos they like. Based on each viewer’s profile, including the videos they’ve watched and what other customers with similar tastes have watched, these algorithms serve up recommendations.
- The fitness app on your smartphone, or your fitness tracker, collects data that is fed into an application that can provide you with valuable health information. To calculate how many steps you take during a day or the distance you walk, these apps must build a model of your movements to identify what constitutes a step and the distance you cover with each one. Some fitness trackers even use self-learning artificial intelligence (AI) software that can recognize and adapt to a wide variety of movements and can learn new fitness activities based on repetitive, cyclical patterns.
- In agriculture, farmers use cellphones to send researchers images of plant diseases. Image recognition systems use these images to diagnose the diseases. Regression algorithms then combine the diagnoses with environmental data to predict future outbreaks.
- And in medicine, researchers have developed a machine learning model that uses probability to classify breast cancer by examining medical histopathology images. This approach may eventually be capable of detecting cancer subtypes and classifying benign and malignant tissue.
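The step-counting idea from the fitness example can be sketched very simply. Treat each local peak in acceleration magnitude above a threshold as one step, then multiply the count by an estimated stride length. The threshold, the minimum gap between steps, the stride length and the synthetic signal are all hypothetical values. Commercial trackers use far more sophisticated, often learned, models.

```python
import math
from typing import Sequence


def count_steps(magnitudes: Sequence[float], threshold: float = 11.0,
                min_gap: int = 3) -> int:
    """Count local peaks above `threshold` that are at least `min_gap` samples apart.

    `magnitudes` are accelerometer magnitudes in m/s^2 (gravity is about 9.8).
    The threshold and gap are hypothetical tuning values.
    """
    steps = 0
    last_peak = -min_gap
    for i in range(1, len(magnitudes) - 1):
        is_peak = magnitudes[i - 1] < magnitudes[i] >= magnitudes[i + 1]
        if is_peak and magnitudes[i] > threshold and i - last_peak >= min_gap:
            steps += 1
            last_peak = i
    return steps


def distance_m(steps: int, stride_m: float = 0.75) -> float:
    """Estimate the distance walked from a step count and an assumed stride length."""
    return steps * stride_m


# Synthetic signal: gravity plus a sine wave peaking once every 8 samples.
signal = [9.8 + 2.5 * math.sin(2 * math.pi * t / 8) for t in range(80)]
steps = count_steps(signal)
print(steps, "steps,", distance_m(steps), "m")
```

On this synthetic signal the sketch reports 10 steps and 7.5 m. The minimum-gap rule keeps a single jolt that produces two nearby peaks from being counted as two steps.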
Data science is a powerful tool for good, and these are just a few examples of its application. On the surface, the cost of data privacy may appear to be an impediment to the potential advances that data science can bring. However, data privacy gives data scientists the social license to use that tool responsibly. Everybody wins.