Binit Bhattarai

About me

I'm an ML engineer from Nepal, working on machine learning for audio and speech processing. I enjoy turning complex problems into solutions and finding new ways to tackle them.

I am also involved in research in audio and speech processing, with interests in automatic speech recognition, singing voice synthesis, speech enhancement, and audio classification. In addition, I am working to document datasets for low-resource languages in Nepal for speech and NLP purposes. If you would like to collaborate on collecting this data, please feel free to connect. Find my CV.

Outside of work, I am interested in music, sports, and sports analytics.

What I'm doing

ASR for Low-Resource Languages

Working to create robust ASR systems for low-resource languages in Nepal.

Speech Enhancement

Working on building different architectures for speech enhancement.

Audio Classification

Detecting early mild cognitive impairment (MCI) from speech, and detecting audio manipulation such as splicing and pasting.

Speech Synthesis

Building a robust TTS system for the Nepali language.

Past/Present Affiliations for Research

Resume

Education

Vellore Institute of Technology, Vellore, India
2020 — 2024
• Received the full-ride COMPEX Scholarship from the Embassy of India to Nepal.
• Graduated with a first-class degree and a CGPA of 9.09 out of 10.
• Main courses: Data Structures and Algorithms, Neural Networks, Statistics, DBMS, Operating Systems, Cloud Computing, Cyber Security, Machine Learning, Deep Learning, Computer Vision, Natural Language

Capital Secondary School
2018 — 2020
• Completed my +2 with a GPA of 3.64/4 from the National Examinations Board (Nepal).

Experience

Research Volunteer @ SH-RI, Kathmandu
Aug 2024 — Present
• Working towards ASR and TTS for low-resource languages in Nepal.

Research Student @ A*STAR, Singapore
Feb 2024 — July 2024
• Awarded the SIPGA and worked on automatic speech recognition, singing voice synthesis and conversion, speech enhancement, and audio classification using neural network architectures such as RNNs, GANs, diffusion models, transformers, and encoder-decoder models.
• Worked with large-scale audio datasets, including an Indonesian corpus (18k hours), an ATC corpus, and a singing corpus.
• Pretrained and fine-tuned models such as Wav2Vec2, Whisper, WavLM, and AST for ASR and classification, and Bark and VITS for singing generation and conversion.

Research Student @ Samsung Research Institute, Bangalore
Dec 2022 — Aug 2023
• Selected to work on a Samsung IoT Edge project as part of the PRISM program.
• Worked on the kernel of TizenRT, a Linux-based real-time operating system.
• Implemented a network file system client library on TizenRT and used it on low-storage embedded devices.

Full Stack Developer Intern @ BitsKraft, Kathmandu
May 2023 — July 2023
• Developed an efficient MERN-based Employee Data Management System and contributed to the creation of RESTful APIs for a Drone Delivery System, enhancing operational efficiency.

Publication

VietSing: A High-quality Vietnamese Singing Voice Corpus

Self-Attention Siamese Network for Unsupervised Few-Shot Learning Tasks

Contact
