Greetings

I'm a 4th-year Ph.D. candidate in Computer Science at the University of Toronto, working under the supervision of Prof. Khai Truong.

My research centers on human-AI interaction, with an emphasis on accessibility and creativity support, particularly in enhancing "music accessibility" for d/Deaf and hard-of-hearing individuals. One of my main projects involves song signing to support culturally responsive content creation and encourage collaboration between d/Deaf and non-d/Deaf artists. Another aspect of my work focuses on enhancing people's well-being. I am engaged in projects that support individuals with dementia in their out-of-home experiences and encourage mindful eating behaviours among children.

I completed my B.Sc. in Computer Science and Engineering at Ewha Womans University, where I was advised by Prof. Uran Oh (Human-Computer Interaction Lab) and Prof. Hyokyung Bahn (Distributed Computing and Operating System Lab). Additionally, I worked as a research intern at the Samsung AI Centre Toronto under the guidance of Dr. Iqbal Mohomed, and at NAVER AI (HCI group) with Dr. Young-Ho Kim.

Genomics & Informatics: Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0

 https://genominfo.org/journal/view.php?number=620



This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of the Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University in order to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and simultaneously to acquire tangible and transferable skills. The projects proposed during the hackathon harness an internal database containing different versions of the corpus and annotations.
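
To make the idea of semi-automatic correction concrete, here is a minimal Python sketch of a pattern-based cleanup pass. The two error classes it handles (typographic ligatures and words hyphenated across a line break) are assumptions chosen because they are common in PDF-to-text output; the paper's actual patterns are not listed here, and the function names are hypothetical.

```python
import re

# Assumed error class 1: ligature code points left behind by PDF extraction.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def fix_ligatures(text: str) -> str:
    """Replace each ligature code point with its plain-letter spelling."""
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    return text


def join_hyphenated_line_breaks(text: str) -> str:
    """Assumed error class 2: rejoin a word split as 'correc-\\ntion'.

    Only joins when lowercase letters sit on both sides of the break.
    """
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


def correct(text: str) -> str:
    """Apply both pattern-based fixes in sequence."""
    return join_hyphenated_line_breaks(fix_ligatures(text))


if __name__ == "__main__":
    print(correct("classi\ufb01ca-\ntion of genes"))
```

A rule like this also rejoins genuine compounds that happen to break at the hyphen (for example "semi-" followed by "automatic" on the next line), which is one reason a manual check of automatic corrections remains necessary.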


Conclusion

In this paper, we listed issues associated with upgrading the G&I corpus, and discussed methodological strategies to develop the next version of the G&I corpus based on a semi-automatic approach. Beyond manual corrections, pattern matching techniques and machine learning methods produced noteworthy results and greatly improved the error correction rate.
This is a progress report, and the current debate regarding our post-processing procedures focuses on how to ensure the quality of this semi-automatically modified corpus. It is taken as axiomatic that any correction must be confirmed by at least two, and usually more, people acting independently, so that their modification decisions can be compared. We suggest that a couple more rounds of the GIAH hackathon be organized to construct the future G&I 2.0 corpus. A semi-automatic method should be designed to build and improve the corpus, with a diminishing amount of manual checking.
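
The confirmation rule above (a correction counts only when at least two people reach it independently) can be sketched in Python as follows. The data layout, the function name `confirmed_corrections`, and the tie-handling rule are assumptions made for illustration, not the procedure used for the G&I corpus.

```python
from collections import Counter


def confirmed_corrections(proposals, min_agree=2):
    """Return the corrections that enough reviewers agree on.

    proposals maps an error id to a list of (reviewer, correction) pairs.
    A correction is confirmed when at least `min_agree` distinct reviewers
    proposed exactly the same text and no other text has as many votes.
    """
    confirmed = {}
    for error_id, votes in proposals.items():
        # One vote per reviewer: a later proposal overrides an earlier one.
        by_reviewer = dict(votes)
        counts = Counter(by_reviewer.values())
        top = counts.most_common(2)
        if not top:
            continue
        best, n = top[0]
        tied = len(top) > 1 and top[1][1] == n
        if n >= min_agree and not tied:
            confirmed[error_id] = best
    return confirmed


if __name__ == "__main__":
    proposals = {
        "e1": [("alice", "protein"), ("bob", "protein")],
        "e2": [("alice", "gene"), ("bob", "genes")],
        "e3": [("alice", "DNA")],
    }
    print(confirmed_corrections(proposals))
```

In this sketch, unconfirmed items (disagreements, ties, or single proposals) are simply left out of the result, so they can be routed back to another round of manual review.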
