Attention final-year students
Please see the notice below, which seeks
students to work on a summer project scholarship related to computer
programming. The project details are given below, and the funding is
$400/week for 12 weeks. This applies to students in their final
year. Interested applicants should send a copy of their CV
to me.
Improving Usability and User Interaction with KALDI Open-Source Speech Recogniser
Description: To enable users to access the functionality of KALDI (http://kaldi-asr.org/doc/) without knowledge of a scripting language such as Bash or detailed knowledge of KALDI's internal algorithms. Furthermore, attempts will be made to transcribe live speech continuously.
Project Proposals: The proposal consists of two parts.
The first part focuses on improving usability and user interaction with KALDI through a GUI that has the following features:
§ Availability of a microphone soft ON and OFF switch
§ Minimal scripting knowledge or commands to operate.
§ Provide users the ability to select acoustic and language models of their choice. This can be done by allowing the users either to select one of the pre-trained models or to perform their own acoustic and language model training in order to subsequently use those models.
§ Allow the user to select transcribing from continuous live speech input or from recorded audio. Recording audio from the speaker during live input allows the audio to be played back in order to correct errors in the transcript.
§ Isolating Utterance/Speaker ID and Speaker ID/Utterance pairs from decoded results for later analysis of each user's recognition performance. This process also allows a plain transcript, free from labels and indices, to be produced for each user.
§ A facility whereby a user can improve her/his recognition performance with KALDI through user-adaptive training, i.e. by saving changes to her/his acoustic model after each decoding session.
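As a rough illustration of the Utterance/Speaker ID feature listed above, the sketch below groups decoded lines by speaker and strips the utterance labels to leave a plain transcript per user. It assumes the conventional KALDI text layouts (an utt2spk mapping with one "utterance-id speaker-id" pair per line, and decoded output with the utterance ID followed by the recognised words). The function names and the sample data are hypothetical and are not part of KALDI or of the project specification.

```python
from collections import defaultdict


def parse_utt2spk(lines):
    """Map each utterance ID to its speaker ID (one 'utt spk' pair per line)."""
    utt2spk = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            utt2spk[parts[0]] = parts[1]
    return utt2spk


def invert_to_spk2utt(utt2spk):
    """Build the reverse mapping: speaker ID -> list of utterance IDs."""
    spk2utt = defaultdict(list)
    for utt, spk in utt2spk.items():
        spk2utt[spk].append(utt)
    return dict(spk2utt)


def plain_transcripts(decoded_lines, utt2spk):
    """Group decoded text by speaker, dropping the utterance-ID label."""
    per_speaker = defaultdict(list)
    for line in decoded_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        utt_id, words = parts[0], parts[1:]
        spk = utt2spk.get(utt_id)
        if spk is not None:  # ignore utterances with no known speaker
            per_speaker[spk].append(" ".join(words))
    return dict(per_speaker)


# Hypothetical sample data, for illustration only.
utt2spk_lines = ["alice_001 alice", "alice_002 alice", "bob_001 bob"]
decoded_lines = [
    "alice_001 hello world",
    "bob_001 good morning",
    "alice_002 how are you",
]

utt2spk = parse_utt2spk(utt2spk_lines)
spk2utt = invert_to_spk2utt(utt2spk)
transcripts = plain_transcripts(decoded_lines, utt2spk)

for speaker, sentences in transcripts.items():
    print(speaker, "->", sentences)
```

In a GUI, the per-speaker lists could feed both the plain-transcript view and any later per-user analysis of recognition performance.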
The second part is reporting the project outcomes through:
§ Documenting the developed graphical user interface design and functionality for KALDI including the processes for selecting acoustic and language models, and incorporating online decoding features.
§ Documenting the results of evaluation studies related to the usability of the new GUI design.
§ Presenting the work to interested staff in the Intelligence Analytics Branch of DST Group.
Supervisors: Dr Said Al-Sarawi and Dr Ahmad Hashemi-Sakhtsari (DST Group)
Said Al-Sarawi, PhD
Programs Advisor for BEng(SE&E), BEng(A&ESE), BEng(CSE), BEng(Telecom) and B.E(EEE)
Ingkarni Wardli Building, Level 3, Office 3.39
Director, Centre for Biomedical Engineering (CBME)
Associate Director, The Centre for High Performance Integrated Technologies and Systems (CHiPTec)
School of Electrical and Electronic Engineering
The University of Adelaide, AUSTRALIA 5005
Ph : +61 8 8313 4198
Fax : +61 8 8313 4360