Natural Language Processing (NLP) is the subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data (Wikipedia).

Nowadays, text makes up a huge portion of available data. For instance, a language model like RoBERTa can be trained on a corpus of roughly 58 million tweets, after the pre-processed text is passed through word embeddings to be vectorized. Beyond that, I was intrigued by how this new chapter of Deep Learning can accelerate data mining and model training.
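As a small illustration of that vectorization step, the sketch below tokenizes a sentence with a RoBERTa checkpoint from the Hugging Face Hub and inspects the resulting contextual embeddings. The model id `cardiffnlp/twitter-roberta-base` is only an example of a tweet-trained checkpoint, not necessarily the exact setup behind the 58M figure.

```python
# Minimal sketch: raw text -> token ids -> dense vectors (embeddings).
from transformers import AutoTokenizer, AutoModel
import torch

checkpoint = "cardiffnlp/twitter-roberta-base"  # example tweet-trained RoBERTa
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

text = "NLP turns raw tweets into numbers a model can learn from."
inputs = tokenizer(text, return_tensors="pt")   # token ids + attention mask

with torch.no_grad():
    outputs = model(**inputs)

# Each token is now a dense vector after contextualization.
print(outputs.last_hidden_state.shape)          # e.g. (1, num_tokens, 768)
```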

1. NLP Basics

I organized the fundamental concepts on a Notion page: Notion. The content is written in both English 🇺🇸 and Korean 🇰🇷.

2. HuggingFace

2-1. Code Interpretation

Hugging Face is the best-known language model hub, where users can build, train, and deploy state-of-the-art models powered by the reference open source in machine learning. You can search for models by task (Image Classification, Translation, Image Segmentation, Fill-Mask, Automatic Speech Recognition, Sentence Similarity, Audio Classification), library (PyTorch, TensorFlow, …), and dataset.
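For instance, a model found on the Hub for one of the listed tasks can be loaded in a few lines with the `transformers` pipeline API. This is a minimal sketch assuming the Fill-Mask task with `roberta-base` as an example model id; any other fill-mask checkpoint works the same way.

```python
# Load a Hub model for the Fill-Mask task and query it.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa uses <mask> as its mask token.
for prediction in fill_mask("Hugging Face hosts thousands of <mask> models."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```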

I followed a tutorial presented by Hugging Face. Refer to the following repository:
Github

3. Projects

Most of the LLM projects are delivered as repositories. To actually ship them, I used Gradio for demonstration and deployment of the models (see the sketch below). Here are some projects that I created for demonstration:
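The sketch below shows how a model can be wrapped in a Gradio interface; it assumes a generic sentiment-analysis pipeline just for illustration, while each actual project exposes its own model.

```python
# Minimal Gradio demo wrapping a Hugging Face pipeline.
import gradio as gr
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # placeholder model for the demo

def predict(text: str) -> str:
    result = classifier(text)[0]
    return f"{result['label']} ({result['score']:.2f})"

demo = gr.Interface(fn=predict, inputs="text", outputs="text",
                    title="Sentiment demo")
demo.launch()  # pass share=True for a temporary public link
```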

3-1.