Hackathon 2013/Citations
From TaxonWorks Wiki
Revision as of 13:34, 1 October 2013
This pitch covers making sure that citations can flow easily in and out of TaxonWorks. It will do this in the following ways:
- Finding an existing data model that can hold citation information, with a standard format for representing it.
  - Possibly wikipedia:BibTeX
  - Possibly http://bibliontology.com/
- Writing Ruby code to read this format in and write it out, possibly just using a standard library.
  - Possibly http://rubygems.org/gems/bibtex-ruby
- Writing code to resolve microcitations. Rod Page has created a working example that links generic names from Nomenclator Zoologicus with references from the Biodiversity Heritage Library.
  - Microcitations should be differentiated from verbatim references. A microcitation is just an author and date (as seen in a full taxon name). A verbatim reference is the full reference as it appears in documentation and hasn't been broken into normalized pieces yet. Both versions need to be tracked and searchable. We also need a way to get a listing of both versions out of TW, so they can be normalized into full sources.
  - Lists of journals.
  - BHL API to convert BHL URLs into references
- Finding out more about a particular citation
  - Getting abstract, keywords from PubMed
  - Pulling in citation from BHL
  - ImpactStory information
- Automatically parsing citations into authors, title, etc.
  - Lots of cool stuff at http://biblio.globalnames.org/
- Designing a user interface to make it easy to resolve microcitations
  - Autocompletion
  - Journal name identification
  - Searching on Google Scholar/Wikipedia for books/authors
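The split between microcitations and verbatim references could start with a simple shape test. This is a minimal sketch. It assumes a microcitation is author text, an optional comma, a four-digit year and an optional letter, with nothing after it (as in "Say 1824a" or "(Linnaeus, 1758)"). The pattern and method names are hypothetical and not part of TaxonWorks or any gem.

```ruby
# Hypothetical sketch: separate microcitations ("Say 1824a", "(Linnaeus, 1758)")
# from verbatim references that still need to be normalized into full sources.
# Assumption: author text (no digits or parentheses), an optional comma, a
# four-digit year and an optional disambiguating letter, then end of string.
MICROCITATION = /\A(?<authors>[^\d()]+?),?\s+(?<year>\d{4})(?<suffix>[a-z])?\z/

Microcitation = Struct.new(:authors, :year, :suffix)

def parse_microcitation(text)
  # Drop the parentheses used when a species has been moved to another genus.
  cleaned = text.strip.delete_prefix("(").delete_suffix(")")
  m = MICROCITATION.match(cleaned)
  return nil unless m

  Microcitation.new(m[:authors].strip, m[:year].to_i, m[:suffix])
end

# Anything that does not look like a microcitation is kept as a verbatim
# reference, so nothing the scientist pasted in is lost.
def classify_citation(text)
  parse_microcitation(text) ? :microcitation : :verbatim_reference
end
```

A full reference such as "Say, T. 1824. American entomology." falls through to `:verbatim_reference`, because text follows the year.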
Requirements (potential test cases/use cases)
- Must support letters after the year for multiple publications by the same author in the same year.
- No single field will be required for any given record. This is different from the requirements for valid BibTeX.
  - The scientist should be able to enter partial or incomplete data (that may be all that is conveniently available at the moment), then return later to complete it. TW must support a workflow that is convenient for the scientist, not the program. (See the note below on adding a "For review" tag on import. This type of tag should be available on manual entry as well.)
  - The scientist should be able to copy a verbatim reference from another document and paste it into TW for normalization later.
  - The scientist should similarly be able to paste in just a URL/URN or other identifier as a reference for completion later (this will be a source of type "miscellaneous").
- When importing SF reference data into TW sources, the records will need to be tagged "For review", because BibTeX distinguishes reference types at a finer grain than SF does.
- Should be able to store abstracts.
- Should be able to round-trip data (e.g. import a BibTeX file, then export a BibTeX file and have the two be identical).
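The first requirement (letters after the year) can be sketched as follows. This version assigns letters within each group of sources that share the same author string and year, ordered by title. The grouping key and the ordering are both illustrative assumptions. The `Source` struct is a hypothetical stand-in, not the TaxonWorks model.

```ruby
# Hypothetical, minimal stand-in for a source record. No field is required,
# so any of these may be nil.
Source = Struct.new(:authors, :year, :title, keyword_init: true)

# Returns one year label per source, in input order, e.g. "1824a", "1824b".
# Sources are grouped by [authors, year]; a letter is only added when a group
# holds more than one source. Assumes at most 26 sources per group.
def year_labels(sources)
  labels = Array.new(sources.size)
  groups = sources.each_index.group_by { |i| [sources[i].authors, sources[i].year] }
  groups.each_value do |indexes|
    if indexes.size == 1
      labels[indexes.first] = sources[indexes.first].year.to_s
      next
    end
    # Ordering by title is an assumption; a scientist may prefer to set letters by hand.
    indexes.sort_by { |i| sources[i].title.to_s }.each_with_index do |i, n|
      labels[i] = "#{sources[i].year}#{('a'.ord + n).chr}"
    end
  end
  labels
end
```

A record with no year simply gets an empty label, in line with the rule that no single field is required.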
Coding Notes
The following notes relating to Sources are taken from Matt's VUE file (Media:Source.JPG).
- Relationships to other objects within TaxonWorks:
  - Sources may have a SourceAuthor and/or SourceEditor (Role of a person)
  - Sources may have a HumanSource (Requires a person with a role of SourceSource)
- Sources will support the following BibTeX types:
  - Book
  - Article - an article is published in a Serial (Journal)
  - Conference
  - Booklet
  - InBook
  - InCollection
  - MastersThesis
  - InProceedings
  - Misc - this will be used when the only available current source is a URL.
  - PhdThesis
  - Techreport
  - Unpublished - TaxonWorks revisions and other works-in-progress; may also be used for LepIndex catalogue cards.
  - Manual
- Sources may be published in a Serial, which will support relationships of Preceding and Succeeding.
  - Serials have:
    - Title
    - Series_year_start
    - Series_year_end
    - editors (text list - different from people with roles)
    - publisher
    - place_published
    - primary_language
  - SerialRelationship is modeled as type & 2 serial IDs
    - SerialRelationshipType will be based on MARC (http://www.oclc.org/bibformats/en/7xx.html - see 780 and 785)
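Here is a sketch of the SerialRelationship idea (a type plus two serial IDs). It assumes two hypothetical relationship types, loosely modeled on MARC 780 (preceding entry) and 785 (succeeding entry). The struct and method names are illustrative, not the TaxonWorks schema. The code follows a serial through its succeeding titles, for example when a journal changes its name.

```ruby
# Hypothetical stand-ins for the Serial fields listed above.
Serial = Struct.new(:id, :title, :series_year_start, :series_year_end,
                    :editors, :publisher, :place_published, :primary_language)

# A relationship is just a type and two serial IDs.
# :succeeded_by loosely follows MARC 785, :preceded_by loosely follows MARC 780.
SerialRelationship = Struct.new(:type, :from_id, :to_id)

# Normalizes both directions into a "serial ID => succeeding serial ID" map.
def successor_map(relationships)
  relationships.each_with_object({}) do |r, map|
    case r.type
    when :succeeded_by then map[r.from_id] = r.to_id
    when :preceded_by  then map[r.to_id] = r.from_id # from_id is preceded by to_id
    end
  end
end

# Follows succeeding titles from start_id, stopping if the chain loops.
def succession(start_id, relationships)
  successors = successor_map(relationships)
  chain = [start_id]
  while (next_id = successors[chain.last]) && !chain.include?(next_id)
    chain << next_id
  end
  chain
end
```

Storing only one direction and deriving the other, as above, is one way to avoid preceding and succeeding records contradicting each other.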
Open Issues
- How do we support Tom in Tom, Dick & Harry (e.g. Tom is the authority, but the actual journal article is by Tom, Dick & Harry)? Are these separate sources? Dmitry says no: TW will not support ref-in-ref the way SF does. In this case, the taxonomic authority string (which is just a text string) would simply not match the author string in the original description source.
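Because the authority is only a text string, the mismatch Dmitry describes could be detected rather than modeled. This is a minimal sketch. It assumes author strings separate names with commas, "&" or "and". The method names are hypothetical.

```ruby
# Splits an author string such as "Tom, Dick & Harry" into individual names.
# Assumption: names are separated by commas, "&" or the word "and".
def author_names(author_string)
  author_string.split(/\s*(?:,|&|\band\b)\s*/).map(&:strip).reject(&:empty?)
end

# True when the authority string names exactly the source's authors, in order.
# "Tom" vs "Tom, Dick & Harry" is false: the authority simply does not match,
# and no separate ref-in-ref source is created.
def authority_matches_source?(authority, source_authors)
  author_names(authority) == author_names(source_authors)
end
```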
Deliverables
- Use bibtex-ruby to read, write and round-trip bibliographic information.
- Write a Rails system for storing citations, and integrate it with bibtex-ruby.
  - Beth
- Given an identifier, look for information about it online.
  - Gaurav
- Parse citations in Ruby.
Terms
- Citation: an individual, unnormalized use of a source.
  - Each citation will have a unique identifier that the rest of the system can reference.
  - It must be possible to have a citation that consists ONLY of a single identifier. We could treat this as a verbatim reference.
- Source: something you want to credit for providing data.
  - A person can be a source.
  - TW needs to support both private and public sources.
  - Global source: a common pool of sources. These should be published and non-private.
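The two terms can be sketched in code. This version assumes a citation gets a random UUID as its unique identifier and keeps its verbatim text until it is linked to a normalized source. `Citation`, `SourceRecord` and `global_pool` are hypothetical names, and UUIDs are only one possible identifier scheme.

```ruby
require "securerandom"

# A citation keeps whatever text was entered (possibly just an identifier such
# as a DOI) plus an optional link to the normalized source it resolves to.
Citation = Struct.new(:id, :verbatim, :source_id) do
  def self.from_text(text)
    new(SecureRandom.uuid, text, nil)
  end

  # Still a verbatim reference until someone links it to a source.
  def normalized?
    !source_id.nil?
  end
end

# visibility is :public or :private.
SourceRecord = Struct.new(:id, :title, :published, :visibility)

# The global pool holds only published, non-private sources.
def global_pool(sources)
  sources.select { |s| s.published && s.visibility == :public }
end
```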
Members
- Beth Frank
- Gaurav Vaidya
- Mike Maehr
Datasets we can play with
- ITIS
- GNUB
- UCD
Input/output formats
URLs and identifiers of taxonomic significance
A single source may have multiple identifiers associated with it.
- ISBN/ISSN
- BHL URLs
- PubMed ID/URLs
- DOI ID/URLs
- Handle ID/URLs?
- Mendeley/Zotero/EndNote ID/URLs
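A source can carry several identifiers, and a scientist may paste in just one. A first step could therefore be guessing what kind of identifier a string is. This is a minimal sketch. The URL patterns are assumptions (the BHL and PubMed ones in particular). The check-digit rules are the standard ISBN-10, ISBN-13 and ISSN ones. Handles and Mendeley/Zotero/EndNote IDs are left out.

```ruby
# Hypothetical patterns; real identifiers have more variants than these.
DOI_PATTERN    = %r{\A(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/\S+)\z}i
BHL_PATTERN    = %r{biodiversitylibrary\.org/(page|item|bibliography|title|part)/(\d+)}i
PUBMED_PATTERN = %r{\A(?:https?://www\.ncbi\.nlm\.nih\.gov/pubmed/|pmid:\s*)(\d+)\z}i

# "X" stands for 10 in ISBN-10 and ISSN check digits.
def digit_value(ch)
  ch == "X" ? 10 : ch.to_i
end

def isbn10_valid?(code)
  return false unless code.match?(/\A\d{9}[\dX]\z/)
  sum = code.chars.each_with_index.map { |ch, i| digit_value(ch) * (10 - i) }.sum
  (sum % 11).zero?
end

def isbn13_valid?(code)
  return false unless code.match?(/\A\d{13}\z/)
  sum = code.chars.each_with_index.map { |ch, i| ch.to_i * (i.even? ? 1 : 3) }.sum
  (sum % 10).zero?
end

def issn_valid?(code)
  return false unless code.match?(/\A\d{7}[\dX]\z/)
  sum = code.chars.each_with_index.map { |ch, i| digit_value(ch) * (8 - i) }.sum
  (sum % 11).zero?
end

# Returns [kind, normalized value]. :unknown strings can still be stored as a
# verbatim reference (source type "miscellaneous") for completion later.
def classify_identifier(text)
  s = text.strip
  case s
  when DOI_PATTERN    then return [:doi, $1]
  when BHL_PATTERN    then return [:bhl, "#{$1.downcase}/#{$2}"]
  when PUBMED_PATTERN then return [:pubmed, $1]
  end

  code = s.gsub(/[\s-]/, "").upcase.delete_prefix("ISBN").delete_prefix("ISSN")
  return [:isbn, code] if isbn10_valid?(code) || isbn13_valid?(code)
  return [:issn, code] if issn_valid?(code)

  [:unknown, s]
end
```

A bare number that happens to pass a checksum could still be something else, so a classification like this is only a hint for the scientist to confirm.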
Relevant code and projects
Projects and gems that may be of interest
- Anystyle Parser: used primarily for parsing citations; might be adapted to also parse microcitations
- Citation Style Language Processor: used to process various citation styles
- OpenURL gem: used to create, parse and use OpenURL queries, e.g. to BHL or to CrossRef
- GN Biblio tools and plug-ins: jQuery plugins that expose biblio parsing and mark-up
- Pensoft/ViBRANT ReFinder: API docs broken, code unknown
- HathiTrust Research Center wiki: tools to extract taxon names from literature
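The OpenURL entry above mentions queries to BHL or CrossRef. As a hedged illustration without the gem, this code builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article using only Ruby's standard library. The BHL resolver address is an assumption. Whether a given resolver honors every key should be checked against its documentation.

```ruby
require "uri"

# Assumed resolver address; check BHL's documentation before relying on it.
BHL_OPENURL = "https://www.biodiversitylibrary.org/openurl"

# Builds an OpenURL 1.0 (Z39.88-2004) KEV query for a journal article.
# Keys left as nil are dropped, since partial citations are normal in TW.
def openurl_for_article(base, jtitle: nil, volume: nil, spage: nil, date: nil, aulast: nil)
  params = {
    "ctx_ver"     => "Z39.88-2004",
    "rft_val_fmt" => "info:ofi/fmt:kev:mtx:journal",
    "rft.genre"   => "article",
    "rft.jtitle"  => jtitle,
    "rft.volume"  => volume,
    "rft.spage"   => spage,
    "rft.date"    => date,
    "rft.aulast"  => aulast
  }.compact.transform_values(&:to_s)
  "#{base}?#{URI.encode_www_form(params)}"
end
```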
APIs available
- The BHL API can be used to get bibliographic information on BHL pages and publications.
- The Mendeley API can be used to get abstract, keywords, and search.
- CrossRef Search API: used to get all information about a DOI or to search for a citation; highly recommended
- CrossRef OpenURL: send OpenURL requests to CrossRef (also an API)
- Open Library API, and a Ruby interface for the API
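Here is a sketch of building lookup requests for two of the APIs above. The HTTP call is injectable so it can be stubbed. The sketch assumes the CrossRef REST API endpoint `https://api.crossref.org/works/{DOI}`, which may differ from the older Search API listed here. It also assumes the Open Library Books API's `bibkeys`, `format` and `jscmd` parameters. Response fields are not interpreted.

```ruby
require "json"
require "net/http"
require "uri"

# Assumed endpoint: CrossRef REST API work lookup. DOIs containing characters
# that are not valid in a URI path would need escaping first.
def crossref_work_uri(doi)
  URI("https://api.crossref.org/works/#{doi}")
end

# Assumed endpoint: Open Library Books API, keyed by ISBN.
def open_library_uri(isbn)
  query = URI.encode_www_form("bibkeys" => "ISBN:#{isbn}", "format" => "json", "jscmd" => "data")
  URI("https://openlibrary.org/api/books?#{query}")
end

# Fetches and parses JSON. http_get defaults to a real request but can be
# replaced with a stub, e.g. in tests or when working offline.
def fetch_json(uri, http_get: ->(u) { Net::HTTP.get(u) })
  JSON.parse(http_get.call(uri))
end
```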