Media Matching Algorithm

STANDARD BIBLIOGRAPHIC NUMBER MATCH

010 - LCCN withCrossreferencing (hyphens and spaces are ignored, upper case letters are normalized)

020 - ISBN withCrossreferencing (match on first occurrence, hyphens and spaces are ignored, upper case letters are normalized)

022 - ISSN withCrossreferencing (match on first occurrence, hyphens and spaces are ignored, upper case letters are normalized)

028 40 - Publisher's number (match on first occurrence)
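The normalization described for the 010, 020, and 022 numbers can be sketched as follows. This is only an illustration, not the program's actual code: the function names are hypothetical, and folding to upper case (rather than lower case) is an assumption, since the text says only that upper case letters are normalized.

```python
def normalize_standard_number(value: str) -> str:
    """Ignore hyphens and spaces, and fold letter case (assumed: to upper case)."""
    return value.replace("-", "").replace(" ", "").upper()


def standard_numbers_match(a: str, b: str) -> bool:
    """Compare two LCCN/ISBN/ISSN strings after normalization."""
    return normalize_standard_number(a) == normalize_standard_number(b)
```

Under these assumptions, "0-306-40615-x" and "030640615X" compare as equal, because the hyphens are dropped and the check character is case-folded.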

Validation of all number matches on the following:

Leader position 6 and 7

245 $a, first 15 characters (upper case letters are normalized and special characters are ignored; field 008 positions 35-37, Language, determine which article to drop when normalizing field 245)
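A minimal sketch of the two validation checks above, under stated assumptions. The article table, the function names, and the choice to keep single spaces between words are all hypothetical; the text does not say which articles are dropped for each language or exactly which characters count as special.

```python
import re

# Hypothetical article table keyed by the MARC language code in 008/35-37.
ARTICLES = {
    "eng": ("the", "an", "a"),
    "fre": ("le", "la", "les", "un", "une"),
    "ger": ("der", "die", "das", "ein", "eine"),
}


def leader_type_and_level(leader: str) -> str:
    """Return leader positions 6 and 7 (positions are counted from 0)."""
    return leader[6:8]


def title_key(title_a: str, language: str, length: int = 15) -> str:
    """Build a comparison key from 245 $a.

    Case is folded, punctuation is removed, a leading article for the
    given language is dropped, and the result is cut to `length` characters.
    """
    words = re.sub(r"[^\w\s]", "", title_a.lower()).split()
    if words and words[0] in ARTICLES.get(language, ()):
        words = words[1:]
    return " ".join(words)[:length]


def number_match_is_valid(leader_a, title_a, lang_a, leader_b, title_b, lang_b):
    """A number match is accepted only if both validation checks agree."""
    return (
        leader_type_and_level(leader_a) == leader_type_and_level(leader_b)
        and title_key(title_a, lang_a) == title_key(title_b, lang_b)
    )
```

With this sketch, "The Hitchhiker's Guide to the Galaxy" (language "eng") yields the key "hitchhikers gui", so two records with that title and the same leader values would pass validation even if one of them omits the leading article.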

TEXT-BASED MATCH

245 $a (all characters) $b 15 characters

Validate leader position 6 and 7

Validate 245 $h 7 characters

Validate 245 $p 15 characters

Validate 260 $b Publisher's name (first 5 words; match any one word), 260 $c Publication date (first 5 dates)
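The publisher-name rule (first 5 words; match any one word) could be sketched like this. The function names and the way words are split are assumptions for illustration; the 260 $c date comparison is left out because the text does not say how the first 5 dates are compared.

```python
import re


def publisher_words(name: str, limit: int = 5) -> set:
    """Return the first `limit` words of a 260 $b value, case-folded."""
    return set(re.findall(r"\w+", name.lower())[:limit])


def publisher_matches(a: str, b: str) -> bool:
    """Valid when the two names share at least one of their first five words."""
    return bool(publisher_words(a) & publisher_words(b))
```

Note that a rule this loose also accepts names that share only a common word such as "press" or "books"; whether the program filters such words is not stated.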

Validate 440 $a 15 characters (normalizeFirstArticle)

Validate 710 $a string of first 15 characters. A valid match is when this set of characters appears in any of a record's 710 fields.

Please Note: This does not mean that "all" 710 fields must match, only that the first 15 characters of one of the 710 fields match.
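As an illustration of the "any one 710 field" rule, a hedged sketch follows. The function names and the case folding are assumptions, and the text does not say what happens when a record has no 710 field at all; this sketch simply returns False in that case.

```python
def corporate_key(value: str, length: int = 15) -> str:
    """First 15 characters of a 710 $a value, case-folded (assumed)."""
    return value.lower()[:length]


def corporate_names_match(fields_a: list, fields_b: list) -> bool:
    """Valid when any 710 $a key from one record appears among the other's."""
    keys_b = {corporate_key(v) for v in fields_b}
    return any(corporate_key(v) in keys_b for v in fields_a)
```

For example, a record with 710 fields "Library of Congress. Cataloging" and "Unesco" matches a record whose only 710 field is "Library of Congress", since both reduce to the key "library of cong".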

NOTES:

  • Validation on 008 positions 23 and 35-37 is optional for TEXT-BASED matches. ELN has chosen not to include these matching criteria.
  • The program uses the 245 filing indicator. To safeguard against incorrect usage, however, the program has a stop list which includes "the" and "an". This stop list is applied to all 245 subfields, excluding $h.
  • The same stop list ("the" and "an") is also applied to the 4xx match.
  • For 245 $b and $p, the absence of a subfield in merging records will not inhibit an otherwise valid match.
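The last note, on missing 245 $b and $p subfields, might be sketched as below. The function name is hypothetical, and the sketch assumes that a subfield missing from either record (or both) is treated as compatible, which is one reading of the note.

```python
from typing import Optional


def optional_subfield_ok(a: Optional[str], b: Optional[str], length: int = 15) -> bool:
    """Compare the first `length` characters of a subfield only when both exist.

    A subfield absent from either record (None) does not block the match.
    """
    if a is None or b is None:
        return True
    return a.lower()[:length] == b.lower()[:length]
```

Under this reading, a record with 245 $b "a novel" still matches a record with no $b, while two records with different $b values do not.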
