Corpws Cenedlaethol Cymraeg Cyfoes (CorCenCC) - the National Corpus of Contemporary Welsh
                        
                        
                            - Submitting institution
 
                            - 
                                Swansea University / Prifysgol Abertawe
                                
 
                            
 
                            - Unit of assessment
 
                            - 27 - English Language and Literature
 
                            - Output identifier
 
                            - 55093
 
                            - Type
 
                            - S - Research data sets and databases
 
                                - DOI
 
                                - 
                                        10.17035/d.2020.0119878310
                                
 
                                - Location
 
                                - https://www.corcencc.org/
 
                            - Month
 
                            - October
 
                            - Year
 
                            - 2020
 
                            - URL
 
                            - 
                                    
                                        https://www.corcencc.org/
                                    
                            
 
                            - Supplementary information
 
                            - -
 
                            - Request cross-referral to
 
                            - -
 
                            - Output has been delayed by COVID-19
 
                            - No
 
                            - COVID-19 affected output statement
 
                            - -
 
                            - Forensic science
 
                            - No
 
                            - Criminology
 
                            - No
 
                            - Interdisciplinary
 
                            - Yes
 
                            - Number of additional authors
 
                            - 
                                26
                            
 
                            - Research group(s)
 
                            - -
 
                            - Proposed double-weighted
 
                            - Yes
 
                                - Double-weighted statement
 
                                - This submission is the main output from a major AHRC/ESRC-funded project (ES/M011348/1).
It constitutes a data set of 14,338,149 tokens (circa 11.2 million words), collected according to a principled sampling frame and subjected to anonymisation, transcription, semantic tagging (using the bespoke tool CySemTag) and Part-of-Speech (POS) tagging (using the bespoke tool CyTag). In addition to the corpus (the first of its kind for the Welsh language), the output includes supporting documentation and information, including a project report of approximately 19,000 words. All elements of the output are presented bilingually (in English and Welsh).
 
                            - Reserve for an output with double weighting
 
                            - No
 
                            - Additional information
 
                            - The CorCenCC submission (National Corpus of Contemporary Welsh – Corpws Cenedlaethol Cymraeg Cyfoes) includes a data set (c. 11.2 million words); bespoke technical tools (a Part-of-Speech tagset and tagger (CyTag), an adapted semantic tagger (CySemTag), a crowdsourcing app and transcription conventions); the Y Tiwtiadur pedagogic toolkit; the Yr Amliadur word frequency lists; and supporting documentation and information, including a project report. The corpus and associated documentation are accessed through the CorCenCC webpages (https://www.corcencc.org/ and https://www.corcencc.cymru/), and the technical tools are at https://github.com/CorCenCC. These websites are external to Swansea University. CorCenCC can be explored via the website, and the data behind CorCenCC can be requested via a webpage link.
The three co-founders and co-creators of CorCenCC (Fitzpatrick, Knight and Morris) were jointly responsible for decisions around, and strategic management and co-ordination of, all work-packages and the project team of around 40 individuals. In addition to the operational management of the project, Fitzpatrick’s specific role focused on effective communications, creative trouble-shooting, and dynamic and agile decision-making. Besides her key role in the creation and strategic leadership of the project, Fitzpatrick took a leading role on pedagogical and lexical aspects of the research portfolio. She was also a key author of the central project documentation including the main project report, which documents the research process, provides a detailed overview of CorCenCC tools and outputs, and relates the applications of these to a range of user groups. 
 
                            - Author contribution statement
 
                            - -
 
                            - Non-English
 
                            - No
 
                            - English abstract
 
                            - -