Difference between revisions of "Hackathon 2013/Citations"

From TaxonWorks Wiki
(Relevant Code and projects)
Line 117:
* [http://biblio.globalnames.org GN Biblio tools, plug-ins] some jQuery plugins to expose biblio parsing and mark-up
* [http://www.refinder.org/ Pensoft/ViBRANT ReFinder] API docs broken, code unknown
* [http://www.hathitrust.org/htrc HathiTrust Research Center wiki] tools to extract taxon names from literature

== APIs available ==

Revision as of 13:27, 1 October 2013

This pitch covers making sure that citations can flow easily in and out of TaxonWorks. It will do this in the following ways:

  1. Finding an existing data model that can hold citation information, with a standard format for representing it.
  2. Writing Ruby code to read and write that format, possibly just using a standard library.
  3. Writing code to resolve micro-citations. Rod Page has created a working example that links generic names from Nomenclator Zoologicus with references from the Biodiversity Heritage Library (BHL).
    • Micro-citations should be differentiated from verbatim references. A micro-citation is just an author and date (as seen in a full taxon name). A verbatim reference is the full reference as it appears in the documentation, not yet broken into normalized pieces. Both need to be tracked and searchable. We also need a way to list both kinds from TW, so that they can be normalized into full sources.
    • Lists of journals.
    • BHL API to convert BHL URLs into references
  4. Finding out more about a particular citation
    • Getting abstract, keywords from PubMed
    • Pulling in citation from BHL
    • ImpactStory information
  5. Automatically parsing citations into authors, title, etc.
  6. Designing a user interface to make it easy to resolve micro-citations
    • Autocompletion
    • Journal name identification
    • Searching on Google Scholar/Wikipedia for books/authors
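
As a rough illustration of the distinction drawn in item 3, a micro-citation could be recognised with a simple pattern, and anything else kept as a verbatim reference. The `MICRO_CITATION` pattern and the `classify_reference` helper are hypothetical names, and the pattern assumes only the plain "Author, Year" form (with an optional letter after the year); real authority strings are more varied.

```ruby
# Hypothetical sketch: tell a micro-citation (author + year) apart from a
# verbatim reference (a full, not yet normalized reference string).
MICRO_CITATION = /\A(?<authors>[^\d]+?),?\s+(?<year>\d{4})(?<suffix>[a-z]?)\z/

def classify_reference(text)
  cleaned = text.strip
  match = MICRO_CITATION.match(cleaned)
  if match
    { kind: :micro_citation,
      authors: match[:authors].strip,
      year: match[:year].to_i,
      suffix: match[:suffix] }
  else
    # Anything that is not just "authors + year" is kept verbatim.
    { kind: :verbatim, text: cleaned }
  end
end
```

Both kinds keep enough information to be listed and searched later; only the micro-citation is broken into parts at this stage.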


Members

Requirements

  • Must support letters after the year for multiple publications by the same author in the same year.
  • No single field will be required for any given record. This differs from the requirements for valid BibTeX.
  • When importing SF reference data into TW sources, the records will need to be tagged "For review", because BibTeX differentiates types at a finer grain than SF does.
  • Should be able to store abstracts.
  • Should be able to round-trip data (e.g. import a BibTeX file, then output a BibTeX file, and have the two be the same).
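
The round-trip and "no required field" requirements can be sketched in plain Ruby. This is a minimal illustration, not a general BibTeX parser and not the bibtex-ruby API: `Entry`, `to_bibtex` and `from_bibtex` are hypothetical names, and the parser assumes one field per line with no nested braces. The year is kept as a string so that a letter suffix such as "1999a" survives unchanged.

```ruby
# Hypothetical minimal entry: a type, a key, and whatever fields are present.
Entry = Struct.new(:type, :key, :fields)

# Serialise an entry. No field is mandatory, so only present fields are written.
def to_bibtex(entry)
  body = entry.fields.map { |name, value| "  #{name} = {#{value}}" }.join(",\n")
  "@#{entry.type}{#{entry.key},\n#{body}\n}\n"
end

# Parse the output of to_bibtex back into an Entry.
# Assumption: one field per line, values without nested braces.
def from_bibtex(text)
  header = text.match(/\A@(?<type>\w+)\{(?<key>[^,\s]+),/)
  fields = text.scan(/^\s*(\w+)\s*=\s*\{(.*?)\},?$/).to_h
  Entry.new(header[:type], header[:key], fields)
end

entry = Entry.new("article", "smith1999a",
                  { "author" => "Smith, J.", "year" => "1999a",
                    "abstract" => "A short abstract." })
text = to_bibtex(entry)
puts text
puts from_bibtex(text) == entry  # round-trip check
```

A real implementation would more likely delegate parsing to an existing library; the point here is only that equality after export and re-import is something a test can assert directly.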

Coding Notes

The following are notes retrieved from Matt's VUE (Media:Source.JPG) file relating to Sources.

  • Relationships to other objects within TaxonWorks:
    • Sources may have a SourceAuthor and/or SourceEditor (Role of a person)
    • Sources may have a HumanSource (Requires a person with a role of SourceSource)
    • Sources will support the following BibTeX types:
      • Book
      • Article - an article is published in a Serial (Journal)
      • Conference
      • Booklet
      • InBook
      • InCollection
      • MastersThesis
      • InProceedings
      • Misc - this will be used when the only available current source is a URL.
      • PhdThesis
      • TechReport
      • Unpublished - TaxonWorks revisions and other works in progress; may also be used for LepIndex catalogue cards.
      • Manual
    • Sources may be published in a Serial, which will support relationships of Preceding and Succeeding.
      • Serials have:
        • Title
        • Series_year_start
        • Series_year_end
        • editors (text list - different from people with roles)
        • publisher
        • place_published
        • primary_language
      • SerialRelationship is modeled as a type plus two serial IDs
      • SerialRelationshipType will be based on MARC (http://www.oclc.org/bibformats/en/7xx.html - see 780 and 785)
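
The Serial notes above could be sketched in plain Ruby as follows. This is an illustration under stated assumptions rather than the TaxonWorks schema: the attribute names `subject_serial_id` and `object_serial_id`, the `predecessors` helper, and the convention that a `:preceding` relationship means "subject precedes object" are all hypothetical. The MARC field numbers are the two the notes point to (780, Preceding Entry; 785, Succeeding Entry).

```ruby
# Hypothetical plain-Ruby sketch of the Serial model described in the notes.
Serial = Struct.new(:id, :title, :series_year_start, :series_year_end,
                    :editors, :publisher, :place_published, :primary_language,
                    keyword_init: true)

# A relationship is a type plus two serial IDs.
# Assumed convention: for :preceding, the subject serial precedes the object.
SerialRelationship = Struct.new(:type, :subject_serial_id, :object_serial_id,
                                keyword_init: true)

# Relationship types keyed to the MARC linking-entry fields cited in the notes.
SERIAL_RELATIONSHIP_TYPES = {
  preceding: "780",  # MARC 780, Preceding Entry
  succeeding: "785"  # MARC 785, Succeeding Entry
}.freeze

# Return the serials that directly precede the given serial.
def predecessors(serial, serials, relationships)
  ids = relationships
        .select { |r| r.type == :preceding && r.object_serial_id == serial.id }
        .map(&:subject_serial_id)
  serials.select { |s| ids.include?(s.id) }
end
```

Storing the relationship as its own record, rather than as a column on Serial, leaves room for a serial that has several predecessors or successors.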

Open Issues

  • How do we support "Tom in Tom, Dick & Harry" (e.g. Tom is the authority, but the actual journal article is by Tom, Dick & Harry)? Are these separate sources? Dmitry says no: TW will not support ref-in-ref the way SF does. In this case the taxonomic authority string (which is just a text string) would simply not match the author string of the original description source.
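
If the authority string and the source's author string are allowed to differ, it may still be useful to flag whether the authority is at least contained in the source's authors. The sketch below is one hedged way to do that; `split_authors` and `authority_within_source?` are hypothetical helpers, and the splitting rule assumes simple names separated by commas, "&" or "and" (it would not cope with "Surname, Initials" forms).

```ruby
# Hypothetical helper: split an author string such as "Tom, Dick & Harry"
# into individual names. Assumes names contain no commas themselves.
def split_authors(text)
  text.split(/\s*(?:,|&|\band\b)\s*/).map(&:strip).reject(&:empty?)
end

# True when every author named in the authority string also appears among
# the authors of the source, even if the two strings are not identical.
def authority_within_source?(authority, source_authors)
  (split_authors(authority) - split_authors(source_authors)).empty?
end
```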

Deliverables

  1. Use bibtex-ruby to read, write and round-trip bibliographic information.
  2. Write a Rails system for storing citations, and integrate it with bibtex-ruby.
    • Beth
  3. Given an identifier, look for information about it online.
    • Gaurav
  4. Parsing citations in Ruby.

Terms

  • Citation: An individual, unnormalized use of a source.
    • Each citation will have a unique identifier that the rest of the system can reference.
    • It must be possible to have a citation which consists ONLY of a single identifier. We could treat this as a verbatim reference.
  • Source: Something you want to credit for providing data.
    • A person can be a source.
    • TW needs sources to be private or public.
  • Global source: a common pool of sources. These should be published and non-private.
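
The terms above might map onto Ruby objects roughly as follows. All class, attribute and method names here are hypothetical, and the sketch reads "a citation which consists ONLY of a single identifier" as a citation whose verbatim text is nothing more than an external identifier such as a DOI; that reading is an assumption.

```ruby
require "securerandom"

# Hypothetical source record: may be private or public.
Source = Struct.new(:title, :published, :is_private, keyword_init: true) do
  # A source qualifies for the global pool only if published and not private.
  def global_candidate?
    published == true && is_private != true
  end
end

# Hypothetical citation: an individual, unnormalized use of a source.
class Citation
  attr_reader :id, :verbatim, :source

  # Every citation gets its own unique identifier; everything else is
  # optional, so the verbatim text may be just an external identifier.
  def initialize(verbatim: nil, source: nil)
    @id = SecureRandom.uuid
    @verbatim = verbatim
    @source = source
  end

  # True until the citation has been linked to a normalized source.
  def unnormalized?
    source.nil?
  end
end
```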

Members

Datasets we can play with

  • ITIS
  • GNUB
  • UCD

Input/output formats

URLs and identifiers of taxonomic significance

It should be noted that there will be multiple identifiers associated with a single source.

  • ISBN/ISSN
  • BHL URLs
  • PubMed ID/URLs
  • DOI ID/URLs
  • Handle ID/URLs?
  • Mendeley/Zotero/EndNote ID/URLs
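
Because one source can carry several identifiers, a first step could be to guess each identifier's kind from its shape and group them. The patterns below are simplified assumptions rather than full validators (no checksum verification, and the URL forms are only the common ones), and `IDENTIFIER_PATTERNS`, `identifier_kind` and `group_identifiers` are hypothetical names.

```ruby
# Hypothetical sketch: guess the kind of an identifier from its shape.
IDENTIFIER_PATTERNS = {
  doi:    %r{\A(?:doi:|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+\z}i,
  pubmed: %r{\A(?:pmid:\s*\d+|https?://(?:www\.)?ncbi\.nlm\.nih\.gov/pubmed/\d+/?)\z}i,
  bhl:    %r{\Ahttps?://(?:www\.)?biodiversitylibrary\.org/\S+\z}i,
  handle: %r{\Ahttps?://hdl\.handle\.net/\S+\z}i,
  issn:   /\A(?:issn:?\s*)?\d{4}-\d{3}[\dX]\z/i,
  isbn:   /\A(?:isbn:?\s*)?(?:(?:\d[\s-]?){9}[\dX]|(?:\d[\s-]?){12}\d)\z/i
}.freeze

# Return the first kind whose pattern matches, or :unknown.
def identifier_kind(text)
  value = text.strip
  found = IDENTIFIER_PATTERNS.find { |_kind, pattern| pattern.match?(value) }
  found ? found.first : :unknown
end

# A source can carry several identifiers at once, grouped here by kind.
def group_identifiers(values)
  values.group_by { |value| identifier_kind(value) }
end
```

Identifiers that fall through to `:unknown` could simply be stored as verbatim text for later review.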

Relevant Code and projects

Projects and gems that may be of interest

APIs available

Links