Knowledge

User:Rjwilmsi/CiteCompletion

Summary

What CiteCompletion is

CiteCompletion is a script that completes fields within citations to common English-language news sites on the English Knowledge. It works by taking the news article URL from the Knowledge article page, looking up the news page and extracting the missing details of the news article based on per-site rules.

It is written by Rjwilmsi and normally run under the account of RjwilmsiBot as a bot task.

It operates only on sites that it has been specifically configured to work on, see the supported sites list below.

It can complete the following fields in citation templates such as {{cite news}}, {{cite web}} and {{citation}}:

- |title=
- |date=
- |author= (using |last=, |first= etc.)
- |location=
- |accessdate=
- |work=

It will also tag dead links if not already tagged with {{dead link}} using |bot=RjwilmsiBot, and set |deadurl=yes for those within citation templates, still only for those sites on the supported sites list below.

What CiteCompletion is not

- It does not modify or update fields where they are already set.
- It does not handle non-English news sites, nor sites not listed in the supported sites list below.
- It does not modify non-templated manually formatted citations (because it cannot interpret the existing data so may overwrite user-set data).
- It has only been designed for use on the English Knowledge; it may not work anywhere else.

Compatibility

- CiteCompletion is fully compatible with the Harvard referencing system.
- Authors/titles with accented characters are supported.

Availability

CiteCompletion is a Custom module for AWB written in C#. In the future it may be made generally available as a Plugin for AWB.

Detail of functionality

Supported citation types

- Citation templates referencing a URL (e.g. {{cite news}}, {{cite web}}, {{citation}} and {{cite journal}}).
- Bare URLs when within <ref> tags.
- Bare URLs with a bot generated title when within <ref> tags.

Supported template fields

CiteCompletion can complete the following fields:

- |title=
- |date=
- |author= (using |last=, |first= etc.)
- |location=
- |accessdate=
- |work=

Processing logic

Assess citations

Each of the Supported citation types on the Knowledge article is assessed for a URL matching one of the Supported sites. If a match is found a check is made to see if one or more of the |title=, |date= or |author= fields is not specified. If one or more of the fields are missing, the HTML source of the URL is fetched.

- Where the citation matches but it is not templated, it is converted to use {{cite news}}.
- Where the citation uses {{cite web}} it is converted to use {{cite news}}.

Parse HTML source

The HTML source is then parsed using the per-site rules. Supported parsing methods are:

- HTML meta tag content.
- HTML script numbered property (s.prop).
- HTML div id/span class/p class.
- Custom regex (matching a span, heading or script value etc.).

Insert parameter values

When a match is found the source match is tidied up:

- HTML-escaped characters are converted to Unicode.
- Quotes are trimmed from titles (not quotes within the title).
- Smart quotes are converted to straight quotes.
- All UPPERCASE or lowercase titles and author names are converted to Title Case.
- Newlines are replaced with spaces.
- Locations, job titles are removed from author names.
- Authors are split to "Lastname, Firstname" format.
- Publication dates are stripped of timestamps and days of the week and converted to the predominant format used in the Knowledge article (International, American or ISO, falling back to ISO if there is no predominant format).

The tidied up value is then appended to the citation. Values are not updated, they are only added if missing:

- |title= is set as found.
- |date= is set as found.
- |author= is set using |last=, |first= or |last1= and |last2= etc. for multiple authors.
- |location= is set from the XML settings if relevant.
- |accessdate= is set to the current date.
- |work= is set from the XML settings if relevant (first checks that |publisher= etc. is not set).

Date format

The date format used for inserted dates (both |date= and |accessdate=) is either "2011-01-15", or "15 January 2011" or "January 15, 2011". The decision is:

- Follow {{use dmy dates}} or {{use mdy dates}} if present.
- Otherwise count existing date usage in article and use the majority one.
- Otherwise, if no majority default to "2011-01-15" format (avoids accusation of any American/International bias).

Completion

An edit summary is generated with counts of how many fields were completed.

Per-site rules

For each supported site a set of rules are available in an XML settings file. The rules determine how to extract the template fields for each news site supported (e.g. for news.bbc.co.uk |date= is stored under the OriginalPublicationDate meta value).

Supported sites

- news.bbc.co.uk
- nytimes.com
- time.com
- guardian.co.uk
- timesonline.co.uk
- independent.co.uk
- telegraph.co.uk
- thestar.com
- washingtonpost.com
- cnn.com
- usatoday.com
- latimes.com
- newsbank.com
- variety.com
- news.com.au
- smh.com.au
- reuters.com
- findarticles.com
- sfgate.com
- theage.com.au
- pqarchiver.com
- boston.com
- accessmylibrary.com
- post-gazette.com
- cbc.ca
- seattletimes.nwsource.com
- wsj.com
- foxnews.com
- chicagotribune.com
- dailymail.co.uk
- cbsnews.com
- thesun.co.uk
- economist.com
- indiatimes.com
- hindu.com
- bizjournals.com
- forbes.com
- denverpost.com
- theglobeandmail.com
- scotsman.com
- huffingtonpost.com
- nzherald.co.nz
- independent.ie
- irishtimes.com
- hollywoodreporter.com
- rte.ie
- oregonlive.com
- seattlepi.com
- ew.com
- wired.com
- pcmag.com

Others will be added over time.

Settings file

CiteCompletion uses an XML settings file of per-site rules. This file is loaded into memory once per session. The format of the file is:

    <NewsSite>
      <URL>telegraph.co.uk</URL>
      <Work>TheDailyTelegraph</Work>
      <Location>London</Location>
      <Dates>DC.date.issued</Dates>
      <Authors>author</Authors>
      <Titles>title</Titles>
      <Encoding>iso-8859-1</Encoding>
    </NewsSite>

Notes:

- Where there are multiple derivations for the same field, these are separated by commas.
- Where the derivation is a custom regular expression, the derivation starts with character '@'.
- Not all sites have rules for all fields (e.g. news.bbc.co.uk does not specify the article authors).

Issues & limitations

Frequent

- Not all fields are found from all supported sites. CiteCompletion will be improved over time to correctly extract more data.
- Only the Supported sites are supported. CiteCompletion will be improved over time to support more sites.

Infrequent

- Authors with multiple first names or multiple surnames are not supported (script cannot determine whether for 'Name Anothername Surname' Anothername should be part of |first= or |last=). Currently such authors are ignored; solutions for including them are under investigation.

Possible future improvements

The following are ideas that may or may not be implemented in CiteCompletion at some point in the future:

- Release CiteCompletion as an AWB plugin.
- Set the |agency= field where relevant.
- Identify and flag news articles where registration/paid access is required.
- Allow community maintenance of XML settings file.

Alternative & related tools

- WP:REFLINKS - a citation insertion script that supports all sites in a generic way.
  An alternative to CiteCompletion: CiteCompletion handles its supported sites more thoroughly than REFLINKS and can complete existing citations whereas REFLINKS offers all site support in a more generic way (normally does not detect authors etc.) but only for bare URLs (no completion of existing templated citations).
- User:Citation bot - a citation completion script for Scientific Journal cites ({{cite journal}}).
  Specialised for Journal citations. Not an alternative to CiteCompletion as such.
- Knowledge:WikiCite Builder - generates citations for The New York Times etc.
- Ubiquity citation tool - a similar tool that scrapes popular sites with jquery using ubiquity.
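The date format decision described under "Date format" can be sketched as follows. This is an illustrative Python sketch, not the tool's C# code: the function names `choose_date_format` and `format_date`, the regular expressions, and the reading of "majority" as "the single most frequent format" are all assumptions made for the example.

```python
import re
from collections import Counter
from datetime import date

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
MONTHS = "|".join(MONTH_NAMES)

# Hypothetical patterns for the three formats named in the text.
ISO = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")                  # 2011-01-15
DMY = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")    # 15 January 2011
MDY = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")   # January 15, 2011


def choose_date_format(wikitext: str) -> str:
    """Return 'dmy', 'mdy' or 'iso' for dates inserted into this article."""
    lowered = wikitext.lower()
    # Step 1: an explicit {{use dmy dates}} / {{use mdy dates}} tag wins.
    if "{{use dmy dates" in lowered:
        return "dmy"
    if "{{use mdy dates" in lowered:
        return "mdy"
    # Step 2: count existing date usage and take the most frequent format.
    counts = Counter(iso=len(ISO.findall(wikitext)),
                     dmy=len(DMY.findall(wikitext)),
                     mdy=len(MDY.findall(wikitext)))
    ranked = counts.most_common()
    if ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    # Step 3: no clear winner (assumed to include ties), so default to ISO.
    return "iso"


def format_date(value: date, style: str) -> str:
    """Render a date in the chosen style."""
    month = MONTH_NAMES[value.month - 1]
    if style == "dmy":
        return f"{value.day} {month} {value.year}"
    if style == "mdy":
        return f"{month} {value.day}, {value.year}"
    return value.isoformat()
```

For example, `format_date(date(2011, 1, 15), choose_date_format(article_text))` would produce the string to insert into |date= or |accessdate=, where `article_text` is the wikitext of the page being edited.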

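Some of the tidy-up steps listed under "Insert parameter values" can be illustrated with a short Python sketch. It is not the tool's C# implementation: the helper names `tidy_text`, `tidy_title` and `split_author` are hypothetical, the order of the steps is an assumption, and removal of locations and job titles from author names is left out. Names that are not exactly two words are skipped, mirroring the stated limitation that authors with multiple first names or surnames are currently ignored.

```python
import html

# Map curly (smart) quotes to their straight equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def tidy_text(raw: str) -> str:
    """Apply the generic clean-up steps to an extracted value."""
    text = html.unescape(raw)             # HTML-escaped characters to Unicode
    text = text.translate(SMART_QUOTES)   # smart quotes to straight quotes
    text = " ".join(text.split())         # newlines and extra spaces to single spaces
    if text.isupper() or text.islower():  # ALL UPPERCASE or all lowercase only
        text = text.title()               # simplistic Title Case
    return text


def tidy_title(raw: str) -> str:
    """Clean a title and trim quotes that wrap the whole title."""
    text = tidy_text(raw)
    if len(text) >= 2 and text[0] in "\"'" and text[-1] == text[0]:
        text = text[1:-1].strip()         # quotes inside the title are kept
    return text


def split_author(raw: str):
    """Return 'Lastname, Firstname', or None when the split is ambiguous."""
    parts = tidy_text(raw).split()
    if len(parts) != 2:
        return None                       # e.g. 'Name Anothername Surname'
    first, last = parts
    return f"{last}, {first}"
```

Note that `str.title()` is a rough approximation of Title Case (it capitalises the letter after an apostrophe, for instance), so a real implementation would need more careful casing rules.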

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.