Cleaning the data

The first stage in the data improvement process was to fix the citations that had suffered parsing problems on import, which left data inappropriately split across data fields. These fixes were automated using Python scripts. Each category of error had its own particular requirements, and each record was checked manually after splitting to ensure the integrity of the data. Here are some examples of the types of parsing problems worked on:
Cleaning misparsed records that have an edition number in the title or the collation has proven particularly time-consuming. This is because in some cases ‘ed.’ is part of the standard form of the publication title (as in Sp. Pl., ed. 2), and in others it is part of the collation. To be sure of capturing all possible candidates for this clean-up, we had to extract all IPNI records which had ‘ed.’ in either the title field or the collation field (more than 25,000 records). We are in the process of checking and correcting these records on a case-by-case basis. The situation is similar for publications involving series and supplements. Once correctly parsed, the data is ready to undergo intensive processing of the collation.
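
The candidate extraction described above can be sketched roughly as follows. This is only an illustration, not the scripts actually used on IPNI: the record layout (dictionaries with `id`, `title` and `collation` keys), the function name `find_edition_candidates` and the sample records other than the Sp. Pl. title are all assumptions made for the example.

```python
import re

# Match 'ed.' as a standalone abbreviation, not as the tail of a longer
# word such as 'revised.'. Case-insensitive so that 'Ed.' is also caught.
ED_PATTERN = re.compile(r"\bed\.", re.IGNORECASE)


def find_edition_candidates(records):
    """Return the records whose title or collation contains 'ed.'.

    Each returned record gains a hypothetical 'ed_in' key listing the
    field(s) where 'ed.' was found, to help the manual check that follows.
    """
    candidates = []
    for record in records:
        fields = [
            name
            for name in ("title", "collation")
            if ED_PATTERN.search(record.get(name) or "")
        ]
        if fields:
            candidates.append({**record, "ed_in": fields})
    return candidates


# Invented sample records; only the first title comes from the text above.
sample = [
    {"id": 1, "title": "Sp. Pl., ed. 2", "collation": "1: 120"},
    {"id": 2, "title": "Example Title", "collation": "ed. 2, 45"},
    {"id": 3, "title": "Another Title", "collation": "12: 430"},
]

for candidate in find_edition_candidates(sample):
    print(candidate["id"], candidate["ed_in"])
```

A filter like this only gathers candidates. Deciding whether ‘ed.’ belongs to the standard title or to the collation still has to be done record by record, as described above.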