Feel free to browse through projects via the blue links provided below. Enjoy this digital experience of problem solving and critical thinking.
Part 1:
Go to Project 1 - Data Engineering
Go to Project 2 - Data Science
Go to Project 3 - Analytics and Visual Output
Part 2:
Go to Project 4 - Cloud Computing
Go to Project 5 - Unit Testing
Go to Project 6 - CyberSecurity
Go to Project 7 - GDPR Obfuscator Draft
Go to Project 8 - Refactoring Machine Learning processes
The projects presented here are all interconnected. The overall task is to build an automated data management system that feeds a Deep Learning Neural Network for predictions. Illustrations are provided along the way for further reference.
You can visit the Git repository for this portfolio website via the following link: Visit the Git repository
In this section an automated ETL (Extract / Transform / Load) pipeline is set up to handle data from a remote database. The pipeline output is an ingestion data lake connected to a local data warehouse where processed data are stored in a STAR schema (relational tables) for further processing in the Data Science project that follows.
Relational databases can be created with PostgreSQL, which can be installed with the following command on Linux:
sudo apt install postgresql
A virtual environment is then created for testing purposes to ensure that the software is robust and always working. Pytest is used for testing, while a Makefile ensures automation during deployment.
A virtual environment can be created with the following command (the venv module ships with Python 3):
python -m venv venv
Loading the virtual environment:
source venv/bin/activate
Exporting the python path to the environment:
export PYTHONPATH=$PWD
Pytest can be installed in the virtual environment with the following command:
pip install pytest
A requirements file is ideal in this case to store the list of packages required for the project so that they can be installed all at once:
pip freeze > requirements.txt
pip install -r requirements.txt
Autopep8 is added to the list for PEP-8 compliance.
Safety and Bandit check for security issues such as vulnerable dependencies and SQL injection risks.
Coverage is a package that checks how much of the code is being tested.
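For reference, once installed these checks can be run from the command line in the document's usual style (the src/ path below is an assumption about this project's layout):
pip install autopep8 safety bandit coverage
autopep8 --in-place --recursive src/
bandit -r src/
safety check -r requirements.txt
coverage run -m pytest && coverage report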
The entire automation procedure is then deployed remotely, essentially by running a Makefile from a Bash shell.
The example dataset is accessed via the following link:
\link>
Those data are static for now, but ideally the management system should eventually handle new sets of data dynamically using a cloud computing platform such as Amazon Web Services (AWS) together with Terraform. You can view a version of this pipeline previously built with AWS and AWS Lambda.
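As a minimal sketch of that idea (the bucket and key names below are hypothetical, not actual AWS resources), new data sets could be pulled dynamically from an S3 ingestion bucket with Boto3:
import boto3

# Hypothetical bucket and key names, used purely for illustration
s3 = boto3.client('s3')
response = s3.get_object(Bucket='ingestion-data-lake', Key='mnist/latest.pkl.gz')
raw_bytes = response['Body'].read()  # raw dataset bytes, ready for the ingestion step below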
The data consist of a training set and a test set, each detailing screen dots with x and y coordinates captured from handwriting on a touch-screen device. Validation data are also included, which makes it possible to assess whether the model's predictions are correct.
The bot must accurately ingest the data and carry out timestamp processing tasks, then store the processed data in a data warehouse for the purposes of the Data Science project, all in order to translate the handwritten letters.
The Pandas library provides robust tools that make it possible to handle pkl (pickle) files and JSON data efficiently. The code for ingestion is fairly straightforward and looks as below:
# Import the libraries needed for ingestion
import gzip
import pickle
import pandas as pd

# Loading the training dataset
# A utils folder may ideally be created later for reused functions
def load_data():
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        f.seek(0)
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    return (training_data, validation_data, test_data)

training_data, validation_data, test_data = load_data()
training_data
# Printing datasets details
print("The feature dataset is:" + str(training_data[0]))
print("The target dataset is:" + str(training_data[1]))
print("The number of examples in the training dataset is:" + str(len(training_data[0])))
print("The number of points in a single input is:" + str(len(training_data[0][1])))
The data warehouse is designed in three stages:
- Concept: Storing loaded training and test datasets in a relational database in a STAR schema. The data warehouse will contain two tables: the training table and the test table.
- Logic: Primary keys and timestamps are assigned to data points from the data sets to allow dynamic data tracking and time series analysis.
- Physical: PostgreSQL will be used to create the database and relational tables via a Python script. The tables will be linked by the timestamp attribute in the Entity Relationship Diagram.
The local database and tables can be created using the Psycopg2 library. A dedicated warehouse_handler.py facilitates debugging while carrying out this task.
Let's create our data warehouse in PostgreSQL using Psycopg2:
import psycopg2
from psycopg2 import sql

# Defining a name for the data warehouse database
database_name = "data_warehouse"

try:
    # Connecting to the default 'postgres' database
    connection = psycopg2.connect(
        dbname="postgres",
        user="hgrv",
        password="hgrv_pass",
        host="localhost",
        port="5432"
    )
    connection.autocommit = True  # Enable autocommit for database creation

    # Creating a cursor object for SQL scripts
    cursor = connection.cursor()

    # Checking whether the database already exists
    cursor.execute(
        sql.SQL("SELECT 1 FROM pg_database WHERE datname = %s"),
        [database_name]
    )
    exists = cursor.fetchone()

    # Creating the database on the first run only
    if not exists:
        cursor.execute(sql.SQL("CREATE DATABASE {}").format(sql.Identifier(database_name)))
        print(f"Database '{database_name}' created successfully!")
    else:
        print(f"Database '{database_name}' already exists.")
except Exception as e:
    print(f"Error: {e}")
    print("Check the debugging console for any error in the SQL script.")
Note: the same result is achievable with Pg8000 (ideal for low-load applications without C dependencies). You can check the specific differences between Pg8000 and Psycopg2 at the following link: https://www.geeksforgeeks.org/python/difference-between-psycopg2-and-pg8000-in-python/
Let's add the tables that will store the following attributes: 'id' (primary key), 'x_coordinates', 'y_coordinates' and a 'timestamp'. The connection and cursor from the previous step are reused:
try:
    # SQL query to create the tables
    create_table_query = """
        CREATE TABLE IF NOT EXISTS test_data (
            id SERIAL PRIMARY KEY,
            x_coordinates NUMERIC(10, 2),
            y_coordinates NUMERIC(10, 2),
            timestamp DATE DEFAULT CURRENT_DATE
        );
        CREATE TABLE IF NOT EXISTS training_data (
            id SERIAL PRIMARY KEY,
            x_coordinates NUMERIC(10, 2),
            y_coordinates NUMERIC(10, 2),
            timestamp DATE
        );
    """
    # Executing the query
    cursor.execute(create_table_query)
    connection.commit()  # Save changes to the database
    print("Tables 'test_data' and 'training_data' created successfully!")
except Exception as error:
    print(f"Error occurred: {error}")
    print("Check for any error in the SQL script.")
At this point, the data warehouse can be updated with the current load of data sets:
try:
    # Inserting x-coordinates, y-coordinates and the timestamp into table 'training_data'
    insert_training_query = """
        INSERT INTO training_data (x_coordinates, y_coordinates, timestamp)
        VALUES (%s, %s, %s)
    """
    for x_item, y_item in zip(train_set_x, train_set_y):
        cursor.execute(insert_training_query, (x_item, y_item, timestamp_current))
    connection.commit()  # Commit the training inserts

    # Inserting x-coordinates, y-coordinates and the timestamp into table 'test_data'
    insert_test_query = """
        INSERT INTO test_data (x_coordinates, y_coordinates, timestamp)
        VALUES (%s, %s, %s)
    """
    for x_item, y_item in zip(test_set_x, test_set_y):
        cursor.execute(insert_test_query, (x_item, y_item, timestamp_current))
    connection.commit()  # Commit the test inserts

    print(f"Data inserted successfully into {database_name}!")
except Exception as error:
    print(f"Error occurred: {error}")
    print("Check for any error in the SQL script.")
finally:
    # Closing the cursor and connection
    cursor.close()
    connection.close()
    print("PostgreSQL connection is now closed.")
These 'try-except-finally' clauses are then put together in a function update_data_base() invoked in the neural_layers.py file. We must also ensure that there are no duplicates due to the random pixel effect of handwriting on a touchscreen or touchpad.
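A minimal sketch of that de-duplication step, assuming the table layout created above, could keep only the earliest row for each repeated (x, y, timestamp) combination:
# Remove duplicate coordinate rows, keeping the row with the lowest id
dedup_query = """
    DELETE FROM training_data a
    USING training_data b
    WHERE a.id > b.id
      AND a.x_coordinates = b.x_coordinates
      AND a.y_coordinates = b.y_coordinates
      AND a.timestamp = b.timestamp;
"""
cursor.execute(dedup_query)
connection.commit()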
In later projects, when setting up the endpoint of an API, these SQL commands must always ensure that no injection can occur during queries.
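With Psycopg2 this is achieved by always passing user-supplied values as bound parameters rather than concatenating them into the SQL string, for example (user_input is a placeholder name):
# Unsafe: string formatting would allow SQL injection
# cursor.execute(f"SELECT * FROM training_data WHERE id = {user_input}")

# Safe: the value is passed separately and escaped by the driver
cursor.execute("SELECT * FROM training_data WHERE id = %s", (user_input,))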
To make the designed platform more interactive, engaging and interesting, an API can be built to prompt the website user to suggest optimisation parameters before running the project scripts, for the purpose of the Data Science project that follows:
import requests

def get_suggestion(number):
    url = "http://numbersapi.com/{}".format(number)
    r = requests.get(url)
    if r.status_code == 200:
        print(r.text)
    else:
        print("An error occurred, code={}".format(r.status_code))
The code can be refactored later in the project to intercept errors with user-friendly messages and hints for conflict resolution. This API can then be run in the corresponding app or module as follows:
import api

api.get_suggestion(input("Enter a number to suggest layers density (Ex: 35): "))
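A possible refactor of get_suggestion() along those lines (a sketch only, not the final project code) could intercept bad input and network failures with friendlier messages:
import requests

def get_suggestion(number):
    try:
        number = int(number)
    except ValueError:
        print("Hint: please enter a whole number, for example 35.")
        return
    try:
        r = requests.get("http://numbersapi.com/{}".format(number), timeout=5)
        r.raise_for_status()
        print(r.text)
    except requests.RequestException as error:
        print("The suggestion service could not be reached: {}".format(error))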
APIs essentially act as simplified communication bridges between various software components and devices.
The orchestration of dependencies is then updated in the Makefile to reflect requirements changes and refresh the virtual environment for processing purposes, deep learning and automation:
# Define the variables for the Python interpreter
VENV = venv
PYTHON = $(VENV)/bin/python3
PIP = $(VENV)/bin/pip

# Declare phony targets with no prerequisites
.PHONY: run run_neural_network clean test pythonpath

# First rule: run the app in the virtual environment
run: venv/bin/activate
	$(PYTHON) src/app.py

# Run the neural network in the virtual environment with the app as prerequisite
run_neural_network: venv/bin/activate run
	$(PYTHON) src/neural_layers.py

# Create the virtual environment with updated dependencies
venv/bin/activate: requirements.txt
	python3 -m venv venv
	$(PIP) install -r requirements.txt

# Set the Python path to the current working directory
pythonpath: venv/bin/activate
	export PYTHONPATH=$$PWD

# Run tests
test: venv/bin/activate
	$(PYTHON) -m unittest discover -s tests

# Clean up .pyc files and refresh
clean:
	find . -name "*.pyc" -delete
	rm -rf __pycache__
	rm -rf venv
Once all the handling modules have been set, a backend neural network for deep learning can be added at the end of the pipeline to generate relevant visual outputs and predictive metrics.
In this section a Neural Network is added with optimised hyper-parameters using the Python API of TensorFlow. TensorFlow is a very popular tool used to build and deploy Neural Networks. Hence, for the purpose of this project, TensorFlow will be the computing framework for batch processing, while compiling will be handled by importing the Keras library.
One great advantage of using TensorFlow is that one does not need to write code for the feedforward and back-propagation steps, but only to set the hyper-parameters, such as the density of the sparse weight matrix, the number of neurons and layers, and the activation function for the last layer.
The chart below illustrates the multilinear neural network for one data point:
Cumulative input = wᵀ*x + b = w1*x1 + w2*x2 + ... + wk*xk + b
The chart below illustrates the batch processing matrix for a given batch size after feedforward and back-propagation for weights calibration:
(Chart: batch processing matrix for weight calibration.)
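As a small numeric illustration of the formulas above (the values and shapes are toy assumptions, not the project's data), the cumulative input for one data point and for a whole batch can be computed with NumPy:
import numpy as np

x = np.array([0.2, 0.5, 0.1])      # one data point with k = 3 features
w = np.array([0.4, -0.3, 0.8])     # weight vector for a single neuron
b = 0.05                           # bias
z = np.dot(w, x) + b               # cumulative input for one data point

X = np.random.rand(10, 3)          # batch of 10 data points
W = np.random.rand(3, 4)           # weights for a layer of 4 neurons
Z = X @ W + b                      # batch processing: one matrix product per layer
print(z, Z.shape)                  # a scalar input and a (10, 4) batch matrix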
There are generally six main steps needed in order to implement a neural network model in Keras: loading the data, pre-processing it, defining the model, compiling it, fitting it, and evaluating it to make predictions:
""" The loading has already been implemented in Project 1 as an ingestion pipeline with Pandas libraries. Let's process the data with a wrapper and one-hot encoding (handling database columns in arrays) and assign the timestamp reference. """ # Importing the Python module Datetime to handle timestamping tasks at the end of each loading phase from datetime import datetime # Setting the one_hot encoding for the target value def one_hot(j): """ - input is the target dataset of shape (1, m) where m is the number of data points - returns a 2 dimensional array of shape (10, m) where each target value is converted to a one hot encoding - Look at the next block of code for a better understanding of one hot encoding """ n = j.shape[0] new_array = np.zeros((10, n)) index = 0 for res in j: new_array[res][index] = 1.0 index = index + 1 return new_array data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) one_hot(data) # Defining the data_wrapper function def data_wrapper(): tr_d, va_d, te_d = load_data() training_inputs = np.array(tr_d[0][:]).T training_results = np.array(tr_d[1][:]) train_set_y = one_hot(training_results) validation_inputs = np.array(va_d[0][:]).T validation_results = np.array(va_d[1][:]) validation_set_y = one_hot(validation_results) test_inputs = np.array(te_d[0][:]).T test_results = np.array(te_d[1][:]) test_set_y = one_hot(test_results) return (training_inputs, train_set_y, validation_inputs, validation_set_y) # Calling the data_wrapper() function and assigning the output to local variables train_set_x, train_set_y, test_set_x, test_set_y = data_wrapper() # Transposing the sets to ensure that the data-Matrix is in the correct shape train_set_x = train_set_x.T train_set_y = train_set_y.T test_set_x = test_set_x.T test_set_y = test_set_y.T # Checking that the sets are in the desired shape print ("train_set_x shape: " + str(train_set_x.shape)) print ("train_set_y shape: " + str(train_set_y.shape)) print ("test_set_x shape: " + str(test_set_x.shape)) print ("test_set_y shape: " + str(test_set_y.shape)) # Getting current timestamp timestamp_current = datetime.now().strftime("%d%m%Y %H%M%S") print('Latest database timestamp: {timestamp_current}') """ Timestamps details: %d: Day of the month (01-31). %m: Month (01-12). %Y: Year in four digits. %H: Hour in 24-hour format (00-23). %M: Minutes (00-59). %S: Seconds (00-59). """
Notice that before batch processing, data are usually pre-processed or cleaned beforehand to fit the dimensions of a processing matrix (in this case dimension (n, m) for n data points and m features per data point) for the overall feedforward and back-propagation.
It is also common practice to timestamp the data to ensure data integrity and to generate time series for analysis, for instance in cases where seasonality appears to be a strong empirical factor in the predictions.
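As a brief sketch of that idea (reusing the cursor from the warehouse handler above, and assuming the training_data table created earlier), the stored timestamps could be turned into a simple daily time series to look for seasonality:
import pandas as pd

# Count how many records were ingested per day
cursor.execute("SELECT timestamp FROM training_data")
timestamps = [row[0] for row in cursor.fetchall()]
daily_counts = pd.Series(1, index=pd.to_datetime(timestamps)).resample('D').sum()
print(daily_counts)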
# Importing the Keras building blocks from the TensorFlow backend
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import regularizers

# Creating first instance of sequential neural model and adding density (layers, activation function, regulariser)
# First instance without layers
nn_model = Sequential()

# Adding a Dropout for the first layer, switching off 30% of neurons at each iteration
# in order to generate a sparse weight matrix after the cumulative input
nn_model.add(Dropout(0.3))

# Initialising the first hidden layer with 35 neurons, 28x28 = 784 components in the input vectors and the 'relu' activation function
# nn_model.add(Dense(35, input_dim=784, activation='relu'))  # refactored to request density level from the user
nn_model.add(Dense(density_level, input_dim=784, activation='relu'))

# Regularising the interconnected neural network
nn_model.add(Dense(21, activation='relu', kernel_regularizer=regularizers.l2(0.01)))

# Setting a last softmax layer with 10 classes
nn_model.add(Dense(10, activation='softmax'))
# Compiling and optimising the predictive model with the categorical cross-entropy loss function
nn_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fitting the model with a minibatch of size 10 and 10 epochs
nn_model.fit(train_set_x, train_set_y, epochs=10, batch_size=10)
# Evaluating the model's scores and printing the accuracy on the training dataset
scores_train = nn_model.evaluate(train_set_x, train_set_y)
print("\n%s: %.2f%%" % (nn_model.metrics_names[1], scores_train[1]*100))
# Setting the predictions on the test dataset
predictions = nn_model.predict(test_set_x)
predictions = np.argmax(predictions, axis=1)
predictions
Note the great advantage of Keras' default use of TensorFlow as backend ("backend": "tensorflow" in the 'keras.json' file).
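A quick way to confirm which backend Keras is using (a two-line check, assuming a standard TensorFlow installation):
from tensorflow import keras

print(keras.backend.backend())  # expected to print: tensorflow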
Note also that batch normalisation in Keras makes use of empirically proven methods: for instance, a small constant (epsilon) is added to the variance to ensure that the standard deviation is never nil during normalisation. Batch normalisation is implemented in Keras with the following code, where 'axis=-1' specifies normalisation over the last (feature) axis:
from tensorflow.keras.layers import BatchNormalization

nn_model.add(BatchNormalization(axis=-1, epsilon=0.001, beta_initializer='zeros', gamma_initializer='ones'))
In this section new data stored in the data warehouse are used to train the neural network continually and produce a visual chart for further monitoring of the data and the model's apparent behaviour.
In Keras, the imported NumPy and Matplotlib modules work dynamically with the TensorFlow backend. A visual output of the data may be implemented as follows:
import matplotlib.pyplot as plt

# Visualising the dataset by index to check correct labelling
index = 1000
k = train_set_x[index, :]
k = k.reshape((28, 28))
plt.title('Label is {label}'.format(label=training_data[1][index]))
plt.imshow(k, cmap='gray')

# Visualising different test cases for assessment against validation data
index = 9997
k = test_set_x[index, :]
k = k.reshape((28, 28))
plt.title('Label is {label}'.format(label=(predictions[index], np.argmax(test_set_y, axis=1)[index])))
plt.imshow(k, cmap='gray')
The model is eventually assessed to determine its current accuracy metrics for the training dataset and the test dataset:
# Evaluating the model's scores and printing the accuracy on the training dataset
scores_train = nn_model.evaluate(train_set_x, train_set_y)
print("\n%s: %.2f%%" % (nn_model.metrics_names[1], scores_train[1]*100))

# Setting the predictions on the test dataset
predictions = nn_model.predict(test_set_x)
predictions = np.argmax(predictions, axis=1)
predictions

# Setting scores and printing the accuracy on the test dataset
scores_test = nn_model.evaluate(test_set_x, test_set_y)
print("\n%s: %.2f%%" % (nn_model.metrics_names[1], scores_test[1]*100))
Relevant local variables and interactive data can also be visualised dynamically with JavaScript, which can display variables from the Python environment. In terms of added functionality, other software engineering frameworks, such as React (a JavaScript library) or Node.js (a JavaScript runtime environment), can also provide highly efficient ways to build similar frontend projects.
In this regard, it is worth noting the power of Python 3 and JavaScript alone for automation and interactive displays, as JavaScript can also effectively be used to run the backend Python file while handling frontend tasks with high efficiency.
In order to illustrate this concept, a request form is added to the HTML file for this webpage and displayed below, prompting the user to suggest a density level between 10 and 100, with a default value of 35:
<form action="process.php" method="POST" class="">
<label for="density" class="">
<p class="">
Enter a density level for your
installed neural network:
</p>
</label><br>
<input class="" type="text" id="density" name="density" value="35"><br>
<input class="" type="submit" value="Submit" disabled>
</form>
Here below is the result after adding that script to the html file for this present webpage at the desired display point:
Notice, however, that classes are also added later to the HTML elements to enable control of size, font, colours and display from the CSS script (updated code in the repository). Let's deploy our designed frontend point:
(Above: Simulation of a display point on the user interface with input-box enabled and 'Submit' button currently disabled.)
Now our frontend point is almost good to go (with the 'Submit' button still deactivated), and a PHP file can specify the actions for the target buttons on the server side. (Note that this can also be achieved effectively with Node.js, Python Flask, Django, React or JavaScript, which are all flexible options; PHP is useful here as it runs on the server before the HTML is delivered, allowing faster displays.)
HTML actions can be set by updating the form parameters with ' action="process.php" method="post" ', while the action itself is specified on the server side in the process.php file:
<?php
// PHP if-statement to trigger the action once the request has been posted
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    // Logic of the action after the setting has been submitted by the user
    echo "Density level submitted successfully! JavaScript executed via PHP script.";
    echo "Get set and ready for the next phase of this presentation!";
}
?>
This process uses JavaScript to trigger the action written in the PHP file. For this purpose, the setting ' onclick="callPHP()" ' is added as a property of the button element, and the HTML file can be updated with the following JavaScript:
<script>
    function callPHP() {
        fetch('process.php', { method: 'POST' })
            .then(response => response.text())
            .then(data => alert(data));
    }
</script>
Let's deploy our resulting user interface on the display point below (with the 'Submit' button enabled this time!) and also update the last pipeline connections to test that the actions are triggered on submission:
('Submit' button enabled after implementing last connections for testing and debugging purposes.)
Keeping in mind that the variable density_level was set when building the API, a visual rendering can be designed from the frontend. Python makes this possible by implementing an update_density module in the API, as you can see in the commented code below:
from flask import Flask, request

app = Flask(__name__)

# Decorating the update_density() function with the form's action and method
@app.route('/submit', methods=['POST'])
def update_density():
    density = request.form['density']  # Accessing the density value from the user's form input
    return density

# Enabling detailed error tracebacks during development on the main module
if __name__ == '__main__':
    app.run(debug=True)
(The simple Flask App script above creates an updating unit module ready for import within the API.)
For presentation purposes, let's use Data Engineering skills to also deploy a 'Start' button using the JavaScript fetch() function to interact indirectly with Make's API via a webhook, in order to prompt the website user to start a demonstration of the project in trial version. Make's API allows workflows to be triggered from JavaScript using automated HTTP requests:
< class="greybox"># The webhook url is set up in Make by creating an account and creating a scenario # that is triggered in Javascript as below: const triggerWebhook = async () => { const response = await fetch('https://hook.make.com/PROJECTS_WEBHOOK_URL', { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ key: 'value' }), }); if (response.ok) { console.log('Webhook triggered successfully!'); } else { console.error('Failed to trigger webhook:', await response.text()); } }; triggerWebhook();
Note that the same result is also achievable with Node.js, which is also a very popular option for frontend applications and software development projects. Typically in such Python web environments, data are extracted from the parsed request data handled by the server via POST and GET requests.
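For illustration (a sketch with a hypothetical '/density' route, separate from the Flask app shown earlier), the parsed data can be read from either a GET query string or a POST form body:
from flask import Flask, request

app = Flask(__name__)

@app.route('/density', methods=['GET', 'POST'])
def density():
    if request.method == 'POST':
        return request.form.get('density', '35')   # value parsed from the POST form body
    return request.args.get('density', '35')       # value parsed from the GET query string

if __name__ == '__main__':
    app.run(debug=True)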
Let's deploy the 'Start' button:
You can try to visualise the output of the whole pipeline yourself with our last settings-box deployed above by clicking the 'Submit' button, which will also trigger the Makefile as well as the related backend modules in the virtual Python environment. Once the project is fully installed, visual outputs of the data should be generated on the frontend user interface by Matplotlib using the TensorFlow framework in Keras.
You should note, however, that our JavaScript handlers are also embedded within the HTML file to ensure that the modules will still be run in browsers unable to handle PHP requests from the server side.
We must also update our JavaScript to trigger the update_density() function from our Flask app, so that the 'Start' button can be ready to run the updated neural network.
- To be updated -
Terraform scripts
Serverless applications with AWS Lambda
Scalability and tags for higher efficiency
Testing with Pytest and Moto
PEP8 testing modules
- To be updated -
Under UK domestic law, the UK General Data Protection Regulation (UK GDPR) is a comprehensive data protection law retained from the EU GDPR, which came into effect on 25 May 2018 alongside the Data Protection Act 2018.
According to www.gov.uk/data-protection, data protection in the UK is mainly governed by the UK GDPR and the Data Protection Act 2018. The UK data protection principles specifically restrict how personal information is used by organisations in order to ensure data protection and privacy for individuals (data subjects).
These ‘data protection principles’ are strict rules requiring that people or entities responsible for using personal data must ensure (unless an exemption applies) the following legal requirements are met:
- Fair, lawful and transparent use of data
- Use of information for specific purposes
- Adequate and relevant use of information limited to only necessary data
- Accuracy and update of information
- Data storage for no longer than is necessary
- Appropriate data integrity and security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage
In addition to those legal requirements, there is also a strong emphasis on the processing of more sensitive data (Ex: race, ethnicity, religion, biometric id, health, background checks etc.) as data subjects carry fundamental rights such as the right to transparency and access to information, right to rectification/erasure, objections and also restrictions on automated decision-making. Therefore personal data must also be handled in accordance with these principles.
The GDPR framework described previously constitutes the rationale for building an Obfuscator to secure sensitive data. In this project we will precisely follow the personal data guidelines as stated in the UK legislation at:
https://www.legislation.gov.uk/eur/2016/679/contents (Principles relating to processing of personal data, Articles 5 to 11A)
Building our Data Obfuscator under GDPR regulation and guidelines ensures that data processing platforms have a viable option at hand, providing up-to-date and secure solutions.
Building a fully tested and automated GDPR Obfuscator for deployment on AWS and third-party software integration.
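As a very early sketch of the obfuscation principle (the field names below are hypothetical, and the final tool will be driven by the requirements listed in the sections that follow), personally identifiable fields can be masked before the data leave the pipeline:
import csv
import io

SENSITIVE_FIELDS = {"name", "email_address"}  # hypothetical PII columns

def obfuscate_csv(raw_text):
    """Return the CSV rows with sensitive fields replaced by '***'."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = []
    for row in reader:
        for field in SENSITIVE_FIELDS & set(row):
            row[field] = "***"
        rows.append(row)
    return rows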
Linux CLI
Python 3
Terraform
Pytest
Boto3
Moto
Make
- System Requirements
- Installation
- AWS Deployment
- Integration
- Note on Terraform deployment and Passkeys
- Note on Unit-Testing and Mock-Tests
- Obfuscation principles
- Moto, Boto3
- Terraform documentation
- Amazon Web Services (AWS)
All commercial rights related to this project are reserved by the owner TechReturners (Copyright 2025).
- To be updated soon -
Empirically proven methods for hyper-parameters optimisation and initialisation
Visualising Neural Networks with JavaScript libraries and Node.js