Difference between revisions of "Hackathon 2013/Citations"

Revision as of 13:34, 1 October 2013

This pitch covers making sure that citations can flow easily in and out of TaxonWorks. It will do this in the following ways:

Finding an existing data model that can hold citation information, with a standard format for representing it.
- Possibly wikipedia:BibTeX
- Possibly http://bibliontology.com/
Writing code to read and write this format in Ruby, possibly just using a standard library
- Possibly http://rubygems.org/gems/bibtex-ruby
Writing code to resolve microcitations. A working example of this has been created by Rod Page that links generic names from Nomenclator Zoologicus with references from the Biodiversity Heritage Library.
- Micro-citations should be differentiated from verbatim references. A micro-citation is just an author and date (as seen in a full taxon name). A verbatim reference is the full reference as it appears in documentation, not yet broken into normalized pieces. Both versions need to be tracked and searchable. We also need a way to get a listing of both versions out of TW, so they can be normalized into full sources.
- Lists of journals.
- BHL API to convert BHL URLs into references
Finding out more about a particular citation
- Getting abstract, keywords from PubMed
- Pulling in citation from BHL
- ImpactStory information
Automatically parsing citations into authors, title, etc.
- Lots of cool stuff at http://biblio.globalnames.org/
Designing a user interface to make it easy to resolve microcitations
- Autocompletion
- Journal name identification
- Searching on Google Scholar/Wikipedia for books/authors
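
The distinction between micro-citations and verbatim references drawn above could be sketched roughly as follows. This is a minimal illustration, not TaxonWorks code: the `MicroCitation` struct, the `parse_micro_citation` helper and the regular expression are all assumptions made for this example. Text that looks like "author(s), year" is split into parts; anything else returns nil and would be kept as a verbatim reference.

```ruby
# Hypothetical value object; the field names are assumptions, not a TaxonWorks schema.
MicroCitation = Struct.new(:authors, :year, :suffix)

# Matches "Author[, Author & Author][,] YYYY[a-z]" and nothing longer.
MICRO_CITATION = /\A(?<authors>[^\d]+?),?\s*(?<year>\d{4})(?<suffix>[a-z])?\z/

# Returns a MicroCitation, or nil when the text should be stored as a verbatim reference.
def parse_micro_citation(text)
  match = MICRO_CITATION.match(text.to_s.strip)
  return nil unless match

  authors = match[:authors].split(/\s*(?:,|&|\band\b)\s*/).map(&:strip).reject(&:empty?)
  MicroCitation.new(authors, match[:year].to_i, match[:suffix])
end

p parse_micro_citation("Linnaeus, 1758").to_a
p parse_micro_citation("Tom, Dick & Harry, 1900a").authors
p parse_micro_citation("Smith, J. (1999) A revision of the genus. Journal 12: 1-10.")
```

A real resolver would still need to look the parsed author and year up against known sources; this sketch only separates the two kinds of input.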

Members

Requirements (potential test cases/use cases)

Must support letters after the year for multiple publications of the same author in the same year.
No single field will be required for any given record - This is different from the requirements for valid BibTeX.
- The scientist should be able to enter partial or incomplete data (which may be all that is conveniently available at the moment), then return later to complete it. TW must support a workflow that is convenient for the scientist, not the program. (See note below on adding a "review needed" tag on import. This type of tag should be available on manual entry as well.)
- The scientist should be able to copy a verbatim reference from another document and paste it into TW for normalization later.
- The scientist should similarly be able to paste just a URL/URN or other identifier in as a reference for completion later (this will be a source of type "miscellaneous").
When importing SF reference data into TW sources, the sources will need to be tagged "For review", because BibTeX distinguishes more types than SF does.
Should be able to store abstracts.
Should be able to round-trip data (e.g. import a BibTeX file, then output a BibTeX file and have them be the same).
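
Two of the requirements above, year suffixes and records with no required fields, could be sketched like this. Everything here is an assumption for illustration: the helper names, the hash-based record, and the `EXPECTED_FIELDS` table, which is an abbreviated and simplified list rather than the full BibTeX rules. The point is that missing fields produce a "review needed" tag instead of a validation error.

```ruby
# Fields a strict BibTeX entry would normally carry, per type (abbreviated and simplified).
EXPECTED_FIELDS = {
  "article" => %w[author title journal year],
  "book"    => %w[author title publisher year],
  "misc"    => []
}.freeze

# Splits a year such as "1999b" into [1999, "b"]; returns nil if the text is not a year.
def split_year(text)
  match = /\A(\d{4})([a-z])?\z/.match(text.to_s.strip)
  match && [match[1].to_i, match[2]]
end

# Lists expected fields that are still blank. Nothing is rejected: the record is kept as-is.
def missing_fields(record)
  EXPECTED_FIELDS.fetch(record["type"], []).select { |field| record[field].to_s.strip.empty? }
end

# Tags an incomplete record so it can be found and completed later.
def review_tags(record)
  missing_fields(record).empty? ? [] : ["review needed"]
end

draft = { "type" => "article", "author" => "Smith", "year" => "1999b" }
p split_year(draft["year"])
p missing_fields(draft)
p review_tags(draft)
```

A source consisting only of a pasted URL would be a "misc" record here, which expects no fields and so gets no tag from this check.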

Coding Notes

The following are notes retrieved from Matt's VUE (Media:Source.JPG) file relating to Sources.

Relationships to other objects within TaxonWorks:
- Sources may have a SourceAuthor and/or SourceEditor (Role of a person)
- Sources may have a HumanSource (Requires a person with a role of SourceSource)
- Sources will support the following BibTeX types:
  - Book
  - Article - an article is published in a Serial (Journal)
  - Conference
  - Booklet
  - InBook
  - InCollection
  - MastersThesis
  - InProceedings
  - Misc - this will be used when the only available current source is a URL.
  - PhdThesis
  - Techreport
  - Unpublished - TaxonWorks revisions and other works-in-progress, may also be used for LepIndex catalogue cards.
  - Manual
- Sources may be published in a Serial, which will support relationships of Preceding and Succeeding.
  - Serials have:
    - Title
    - Series_year_start
    - Series_year_end
    - editors (text list - different from people with roles)
    - publisher
    - place_published
    - primary_language
  - SerialRelationship is modeled as a type plus two serial IDs
  - SerialRelationshipType will be based on MARC (http://www.oclc.org/bibformats/en/7xx.html - see 780 and 785)
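
The "type plus two serial IDs" model for serial relationships could look something like the sketch below. It uses plain Structs rather than Rails models, and the names `first_serial_id`, `second_serial_id`, `title_history` and the `:succeeding` / `:preceding` symbols are assumptions for this example; the MARC 780/785 fields define finer-grained relationship kinds than the two shown.

```ruby
# Hypothetical in-memory stand-ins for the Serial and SerialRelationship records.
Serial = Struct.new(:id, :title, :series_year_start, :series_year_end)

# `type` describes the role of the second serial relative to the first:
# (:succeeding, a, b) reads "a is succeeded by b"; (:preceding, b, a) reads "b is preceded by a".
SerialRelationship = Struct.new(:type, :first_serial_id, :second_serial_id)

# Follows :succeeding links from a starting serial and returns the chain of titles.
def title_history(start_id, serials, relationships)
  by_id = serials.each_with_object({}) { |serial, index| index[serial.id] = serial }
  titles = []
  visited = {}
  current = start_id
  # Stop on a missing serial or a cycle so that bad data cannot loop forever.
  while current && by_id.key?(current) && !visited[current]
    visited[current] = true
    titles << by_id[current].title
    link = relationships.find { |r| r.type == :succeeding && r.first_serial_id == current }
    current = link && link.second_serial_id
  end
  titles
end

# Made-up serials, purely for illustration.
serials = [Serial.new(1, "Old Journal Title", 1900, 1950), Serial.new(2, "New Journal Title", 1951, nil)]
links   = [SerialRelationship.new(:succeeding, 1, 2), SerialRelationship.new(:preceding, 2, 1)]
p title_history(1, serials, links)
```

Storing both directions, as in the usage lines, is optional; a single row per pair with the inverse derived on lookup would work as well.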

Open Issues

How do we support "Tom in Tom, Dick & Harry" (e.g. Tom is the taxonomic authority, but the actual journal article is by Tom, Dick & Harry)? Are these separate sources? Dmitry says no: TW will not support ref-in-ref the way SF does. In this case, the taxonomic authority string (which is just a text string) would simply not match the author string of the original description source.

Deliverables

Use bibtex-ruby to read, write and round-trip bibliographic information.
Write a Rails system for storing citations, and integrate it with bibtex-ruby.
- Beth
Given an identifier, look for information about it online.
- Gaurav
Parsing citations in Ruby.

Terms

Citation: An individual, unnormalized use of a source.
- Each citation will have a unique identifier that the rest of the system can reference.
- It must be possible to have a citation which consists ONLY of a single identifier. We could treat this as a verbatim reference.
Source: Something you want to credit as providing the data.
- A person can be a source.
- TW needs sources to be private or public.
Global source: a common pool of sources. These should be published and non-private.

Members

Datasets we can play with

ITIS
GNUB
UCD

Input/output formats

BibTeX
RIS?
MARC 21 (Library of Congress standard)
BibJSON

URLs and identifiers of taxonomic significance

It should be noted that there will be multiple identifiers associated with a single source.

ISBN/ISSN
BHL URLs
PubMed ID/URLs
DOI ID/URLs
Handle ID/URLs?
Mendeley/Zotero/EndNote ID/URLs
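
Since a citation may arrive as nothing more than one of the identifiers above, a first step could be guessing which kind it is. The sketch below is an assumption-heavy illustration: `classify_identifier` is a hypothetical helper, the patterns check shape only (no ISBN/ISSN checksum validation), and the URL forms for BHL and PubMed are the ones assumed here rather than an exhaustive list.

```ruby
# Guesses the kind of identifier from its shape alone; returns a symbol.
def classify_identifier(text)
  value   = text.to_s.strip
  compact = value.gsub(/[-\s]/, "") # ISBNs are often written with hyphens or spaces

  return :doi    if value =~ %r{\A(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+\z}i
  return :bhl    if value =~ %r{\Ahttps?://(?:www\.)?biodiversitylibrary\.org/\S+\z}i
  return :pubmed if value =~ %r{\Ahttps?://(?:www\.ncbi\.nlm\.nih\.gov/pubmed|pubmed\.ncbi\.nlm\.nih\.gov)/\d+/?\z}i
  return :url    if value =~ %r{\Ahttps?://\S+\z}i
  return :issn   if value =~ /\A\d{4}-\d{3}[\dXx]\z/
  return :isbn   if compact =~ /\A(?:\d{9}[\dXx]|97[89]\d{10})\z/

  :unknown
end

p classify_identifier("doi:10.1000/182")
p classify_identifier("http://www.biodiversitylibrary.org/page/123")
p classify_identifier("978-0-306-40615-7")
p classify_identifier("Smith 1999")
```

Because one source can carry several identifiers, a result like this would be stored alongside the identifier rather than used as the source's key.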

Relevant Code and projects

Projects and gems that may be of interest

Anystyle Parser: used primarily for parsing citations; might be adapted to also parse microcitations
Citation Style Language Processor: used to process various citation styles
OpenURL gem: used to create, parse and use OpenURL queries, e.g. to BHL or to CrossRef
GN Biblio tools, plug-ins: some jQuery plugins that expose biblio parsing and mark-up
Pensoft/ViBRANT ReFinder: API docs broken, code unknown
HathiTrust Research Center wiki: tools to extract taxon names from literature

APIs available

The BHL API can be used to get bibliographic information on BHL pages and publications.
The Mendeley API can be used to get abstracts and keywords, and to search.
CrossRef Search API: used to get all information about a DOI or to search for a citation; highly recommended
CrossRef OpenURL: send OpenURL requests to CrossRef (also an API)
Open Library API and link to Ruby interface for the API

