AI/ML Projects


Final Year Project: Generation of realistic 2D scenes by Text-to-2D Models

Click to view the Research Report

This project aims to improve a text-to-image model called AttnGAN in terms of textual understanding and training efficiency.

  • Developed a novel architecture called Trans_AttnGAN. Employed a pre-trained BERT model as the text encoder to generate more contextually accurate sentence and word embeddings. Designed a novel Soft Alignment Loss that uses a pre-trained image captioning model, BLIP, followed by BERT to generate fine-grained guidance at the sentence and word levels.
  • Conducted extensive fine-tuning experiments and a component analysis of Trans_AttnGAN.
  • Verified that Trans_AttnGAN achieved comparable performance to AttnGAN with roughly half the total training time on the CUB-200 dataset.


Script2Video-SD: Personalized Short Video Generation Model with Stable Diffusion (In Progress)

Click to see the proposal

This model is designed to generate high-quality, personalized short videos from text scripts, addressing the time-consuming nature of video creation. The three major challenges and their proposed solutions are:

  1. Adapting Stable Diffusion, a picture generator, for video generation.
    • Solution: Create a "thick picture" by concatenating sampled frames along the channel dimension as input to the Stable Diffusion model. Each pixel is then represented by a 3N-dimensional vector that encodes its RGB values across the N frames. This thick picture allows for efficient processing and captures the temporal dependencies among consecutive frames.
  2. Adapting Stable Diffusion, a picture generator, for audio generation.
    • Solution: Transform audio signals into spectrograms before feeding them into the model for training. Use a vocoder or waveform synthesizer to convert the generated spectrograms back into continuous audio clips.
  3. Synchronizing the generated video and audio clips.
    • Solution: A reinforcement learning model, guided by reward functions, will iteratively refine the alignment of video and audio, considering their semantic and temporal relationships.
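
The "thick picture" idea in item 1 can be sketched in a few lines. This is only a minimal illustration, not the project's implementation: the function names make_thick_picture and split_thick_picture are hypothetical, frames are plain nested lists of (R, G, B) tuples, and all frames are assumed to share the same height and width.

```python
def make_thick_picture(frames):
    """Concatenate N RGB frames along the channel axis.

    frames: list of N frames, each a height x width grid of (R, G, B) tuples.
    Returns a height x width grid whose pixels are lists of 3*N values.
    Assumes every frame has the same height and width.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    thick = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for frame in frames:
                pixel.extend(frame[y][x])  # append this frame's R, G, B
            row.append(pixel)
        thick.append(row)
    return thick


def split_thick_picture(thick, channels=3):
    """Inverse operation: recover the list of frames from a thick picture."""
    n_frames = len(thick[0][0]) // channels
    return [
        [[tuple(px[i * channels:(i + 1) * channels]) for px in row]
         for row in thick]
        for i in range(n_frames)
    ]


# Two tiny 1x2 frames: each thick pixel holds 3 * 2 = 6 values.
frame_a = [[(1, 2, 3), (4, 5, 6)]]
frame_b = [[(7, 8, 9), (10, 11, 12)]]
thick = make_thick_picture([frame_a, frame_b])
print(thick[0][0])  # [1, 2, 3, 7, 8, 9]
```

Because the stacking is lossless, generated thick pictures can be split back into individual frames with the inverse function.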


Music Genre Classifier Paper

Click to view the paper

Source codes available

Train and tune 9 machine learning models for music genre classification in Python: SVM, Logistic Regression, KNN, Naive Bayes, QDA, Random Forest, MLP, CatBoost, and XGBoost. Evaluate their performance and explain the observations.

  • Group leader
  • Use the FMA music dataset containing metadata, features, and genres for over 100,000 tracks
  • Preprocess data by cleaning, converting strings to numbers using word2vec, filling missing values, and normalizing
  • Select features with chi-square and reduce dimensionality with PCA
  • Use 5-fold cross-validation to tune the hyperparameters of the 9 models
  • Evaluate the best configuration of each of the 9 models on the test set using accuracy and macro F1 score
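
The two evaluation metrics above can behave very differently on imbalanced genres. The following is a minimal pure-Python sketch of both, with hypothetical genre labels; it is not taken from the project's source code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# A classifier that always predicts the majority genre (hypothetical labels).
y_true = ["rock", "rock", "rock", "jazz"]
y_pred = ["rock", "rock", "rock", "rock"]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # 3/7, about 0.43
```

Macro F1 weights every genre equally, so a model that ignores a rare genre is penalized even when its accuracy looks acceptable.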


Movie Review Sentiment Classifier

Click to view the report

Source codes available

The goal of this project was to build an RNN-based model in PyTorch for sentiment classification of movie reviews.

  • Preprocess the "Large Movie Review Dataset": tokenize with spaCy, build the one-hot vocabulary, and batch with BucketIterator.
  • Initial architecture: Embedding layer + RNN + Linear layer. Even after tuning, its performance was unsatisfactory.
  • Modified architecture: Embedding layer + Bi-directional 2-layer LSTM + Linear layer. After tuning, the classification accuracy improved to about 70%.


CNNs for Handwritten Digit Classification

Click to view the report

Source codes available

The goal of this project was to build and modify two CNN models using Keras and PyTorch to classify handwritten digits.

  • Implemented baseline CNN models on the MNIST dataset for digit classification
  • Tuned model architectures by varying number of layers, kernel sizes, and nodes
  • Analyzed the CNNs' architecture and the effects of changes on model accuracy to gain insights into CNN optimization


ResNet and Gradient Vanishing

Click to view the report

Source codes available

This project investigated the gradient vanishing problem and ResNet in TensorFlow and Keras.

  • Demonstrated the gradient vanishing issue in a feedforward network with tanh activation and solved the problem using ResNet.
  • Identified the maximum layers for ffnet (21 layers) and ResNet (63 layers) with tanh activation before gradient vanishing emerges.
  • Showed that switching to ReLU in ffnet and ResNet further increases their resilience to gradient vanishing (22 layers for ffnet, 81 for ResNet).
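
The mechanism behind these observations can be illustrated with a toy scalar model. This is a sketch under strong simplifying assumptions (one unit per layer, a shared weight of 0.5, no training) and does not reproduce the project's networks or its reported layer counts.

```python
import math


def input_gradient(depth, weight=0.5, x=1.0, residual=False):
    """Gradient of the final activation with respect to the input.

    Plain layer:    h_next = tanh(weight * h)
    Residual layer: h_next = h + tanh(weight * h)
    The gradient is the product of the per-layer derivatives (chain rule).
    """
    h = x
    grad = 1.0
    for _ in range(depth):
        a = math.tanh(weight * h)
        local = weight * (1.0 - a * a)  # d/dh of tanh(weight * h)
        if residual:
            h = h + a
            grad *= 1.0 + local  # the skip path contributes the constant 1
        else:
            h = a
            grad *= local
    return grad


for depth in (5, 15, 30):
    print(depth, input_gradient(depth), input_gradient(depth, residual=True))
```

In the plain chain every factor is at most 0.5, so the product shrinks geometrically with depth. In the residual chain every factor is at least 1, so the gradient cannot vanish.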


Analysis of MLP and CNN in Pattern Recognition

Click to view the report

This project analyzed the performance and parameter settings of MLPs and CNNs on the MNIST and CIFAR-10 datasets in MATLAB.

  • Experimented with various MLP structures and parameters on the MNIST dataset.
  • Implemented CNNs on the MNIST and CIFAR-10 datasets, adjusting channels and pooling methods.
  • Tuned the CNN on MNIST to reach a training accuracy of 0.968 and a testing accuracy of 0.9.


A Model of Residual Sugar Content and Volatile Acidity Content in Wine Using MLE

Click to view the paper

Apply maximum likelihood estimation to model the relationship between residual sugar content and volatile acidity content in wine in Python.

  • Based on the chemical equation C6H12O6 + 2O2 = 2CH3COOH + 2CO2 + 2H2O, hypothesize a linear relationship between volatile acidity and residual sugar, i.e., residual sugar (y) and volatile acidity (x) are related by a linear function y = a + bx, where a and b are coefficients.
  • Using MLE, mathematically derive the expressions for the parameters a and b.
  • Use the polyfit function to compute the coefficients, visualize the results, and find that a linear model does not adequately describe the relationship between residual sugar and volatile acidity.
  • Finally, extend the experiment to quadratic and cubic models, compare the three, and find that the cubic model performs best.
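
For the linear case y = a + bx, the MLE has a closed form if one assumes independent Gaussian noise with constant variance. The source does not state the noise model, so that assumption is made here for illustration. Under it, maximizing the likelihood is equivalent to minimizing the sum of squared residuals, which gives b = Sxy / Sxx and a = mean(y) - b * mean(x). A minimal sketch with made-up data:

```python
def mle_linear_fit(xs, ys):
    """Closed-form MLE for y = a + b*x under i.i.d. Gaussian noise (assumed)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # MLE of the noise variance divides by n (not n - 2), so it is biased.
    sigma2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, sigma2


# Made-up points lying exactly on y = 1 + 2x.
a, b, sigma2 = mle_linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b, sigma2)  # 1.0 2.0 0.0
```

The same coefficients are what a degree-1 least-squares polynomial fit returns, which is why polyfit can be used to check the derivation.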


A Comparative Analysis of Machine Learning Techniques for Phishing URL Detection

Click to view the paper

Research and compare two published models for phishing URL detection: "Phishing Websites Classification using Hybrid SVM" and "PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks".

  • Group leader
  • Compare the two models in terms of their dataset, feature extraction process, model architecture, and performance.
  • According to the experiments, both models performed well, but PDRCNN was more accurate, reaching 97% accuracy on a larger real-world dataset compared with 95.8% for the SVM.
  • Based on the performance, propose potential improvements to the two models. For example, for the SVM model, SVM + KNN ensembling could be applied to improve efficiency and accuracy, while for the PDRCNN, we can incorporate additional features like URL-based, page-based, and content-based features.


Gradient Descent based Time Series Prediction Algorithm

Click to see the technical report

Source codes available

This project aims to use gradient descent to predict a data series in Python.

  • Implement gradient descent to optimize prediction model parameters
  • Analyze convergence behavior by plotting cost over iterations
  • Investigate effect of learning rate on convergence properties
  • Test the algorithm on multiple datasets for verification
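
The steps above can be sketched as follows. The source does not say which prediction model was used, so this example assumes a simple linear trend y_t = w*t + c fitted by batch gradient descent on the mean squared error; the function names and the sample series are hypothetical.

```python
def fit_linear_trend(series, learning_rate=0.05, iterations=1000):
    """Fit y_t = w*t + c to a series by batch gradient descent on the MSE.

    Returns (w, c, cost_history), where cost_history[i] is the cost
    measured before the i-th update.
    """
    n = len(series)
    ts = list(range(n))
    w, c = 0.0, 0.0
    cost_history = []
    for _ in range(iterations):
        errors = [w * t + c - y for t, y in zip(ts, series)]
        cost_history.append(sum(e * e for e in errors) / n)
        grad_w = 2.0 / n * sum(e * t for e, t in zip(errors, ts))
        grad_c = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        c -= learning_rate * grad_c
    return w, c, cost_history


def predict_next(series, w, c):
    """Extrapolate the fitted trend one step past the end of the series."""
    return w * len(series) + c


series = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y_t = 2t + 1
w, c, costs = fit_linear_trend(series)
print(round(w, 4), round(c, 4), round(predict_next(series, w, c), 4))
```

Plotting cost_history against the iteration index gives the convergence curve. On this series a learning rate of 0.05 converges, while 0.2 makes the cost grow on every step, which is the kind of learning-rate effect the project investigates.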


Feature Selection using Chi-Square Testing for Phishing URL Detection

Click to see the report

This project identified the most important features using chi-square feature selection on a dataset of phishing and legitimate URLs in Python.

  • Perform chi-square testing to compute feature importance values
  • Determine the optimal number of features for classification using cross-validation
  • Find that the top features were predominantly URL-based; domain-based and popularity features also contributed significantly, while content features had less impact.
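
To make the scoring step concrete, here is a minimal sketch of a Pearson chi-square test of independence between one categorical feature and the class label. It may differ from the exact routine used in the project, and the feature names and data are hypothetical.

```python
def chi_square_score(feature, labels):
    """Pearson chi-square statistic for a categorical feature vs. the label."""
    n = len(feature)
    f_values = sorted(set(feature))
    l_values = sorted(set(labels))
    observed = {(f, l): 0 for f in f_values for l in l_values}
    for f, l in zip(feature, labels):
        observed[(f, l)] += 1
    stat = 0.0
    for f in f_values:
        row_total = sum(observed[(f, l)] for l in l_values)
        for l in l_values:
            col_total = sum(observed[(g, l)] for g in f_values)
            expected = row_total * col_total / n  # count under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def rank_features(features, labels):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: chi_square_score(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate
features = {
    "has_ip_in_url": [1, 1, 0, 0],  # matches the label exactly
    "page_has_form": [1, 0, 1, 0],  # independent of the label
}
print(rank_features(features, labels))
# [('has_ip_in_url', 4.0), ('page_has_form', 0.0)]
```

A higher score means the feature's distribution differs more between phishing and legitimate URLs. Keeping the top k features and choosing k by cross-validation corresponds to the second step above.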


A Comparative Experiment between Neural Network and Random Forest

Click to see the report

This project compared random forests and neural networks on classification tasks in Python, evaluating their training and testing accuracy at varying model complexities and analyzing how the two approaches differ in modeling.

  • Random forest achieved comparable accuracy to neural networks while requiring significantly less training and testing time, making it preferable for lightweight tasks.
  • Hyperparameter tuning was simpler for random forest than neural networks, demonstrating its advantage in ease of optimization over more flexible deep models.


Importance of Feature Normalization for Different ML Models

Click to see the report

This project investigated the impact of feature normalization on the performance of various machine learning models including KNN, Logistic Regression, Decision Tree, Random Forest and SVM in Python.

  • Find that distance-based and margin-based models such as KNN and Logistic Regression benefit directly from normalization, while condition-based models such as decision trees are not sensitive to it.
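
A small example shows why a distance-based model is sensitive to feature scale. This is an illustrative sketch with made-up data and hypothetical helper names, using min-max scaling and a 1-nearest-neighbor rule.

```python
import math


def fit_min_max(rows):
    """Per-column minimum and maximum of the training rows."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def scale(row, mins, maxs):
    """Min-max scale one row to [0, 1] per column (constant columns map to 0)."""
    return [(v - lo) / ((hi - lo) or 1.0) for v, lo, hi in zip(row, mins, maxs)]


def nearest_label(train_rows, train_labels, query):
    """Label of the training row closest to the query (Euclidean distance)."""
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row, query)))
        for row in train_rows
    ]
    return train_labels[distances.index(min(distances))]


# Column 0 spans thousands, column 1 spans single digits (made-up values).
train = [(1000.0, 1.0), (2000.0, 9.0)]
labels = ["x", "y"]
query = (1600.0, 2.0)

raw_prediction = nearest_label(train, labels, query)

mins, maxs = fit_min_max(train)
scaled_train = [scale(row, mins, maxs) for row in train]
scaled_prediction = nearest_label(scaled_train, labels, scale(query, mins, maxs))

print(raw_prediction, scaled_prediction)  # y x
```

Without scaling, the large-valued column dominates the distance and the query is assigned "y". After scaling, both columns contribute and the prediction changes to "x". A decision tree splits on one feature at a time with thresholds, so rescaling a column leaves its splits unchanged.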


Comparative Study of SVM and Neural Networks in Diabetes Classification

Click to see the report

This project compared the performance of support vector machines and neural networks for classification of the Pima Indians Diabetes dataset using Microsoft Azure Machine Learning Studio.

  • Implement SVM and neural network models in Azure ML for binary classification
  • Evaluate model performance on test data using metrics such as accuracy and AUC
  • Explore parameter tuning and different model configurations
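
The two metrics named above can be computed by hand for a binary classifier. The following is a minimal sketch with made-up scores, independent of Azure Machine Learning Studio; AUC is computed as the fraction of positive/negative pairs that the model ranks correctly, with ties counted as half.

```python
def accuracy_at_threshold(y_true, scores, threshold=0.5):
    """Accuracy after turning scores into 0/1 predictions at a threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives vs. negatives."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up model outputs
print(accuracy_at_threshold(y_true, scores))  # 0.75
print(roc_auc(y_true, scores))                # 0.75
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why reporting both gives a fuller comparison of the SVM and the neural network.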