Skip to main content

(index page)

PIDapalooza – What, Why, When, Who?

audience

PIDapalooza, a community-led conference on persistent identifiers
November 9-10, 2016
Radisson Blu Saga Hotel
pidapalooza.org

PIDapalooza will bring together creators and users of persistent identifiers (PIDs) from around the world to shape the future PID landscape through the development of tools and services for the research community. PIDs support proper attribution and credit, promote collaboration and reuse, enable reproducibility of findings, foster faster and more efficient progress, and facilitate effective sharing, dissemination, and linking of scholarly works.

If you’re doing something interesting with persistent identifiers, or you want to, come to PIDapalooza and share your ideas with a crowd of committed innovators.

Conference themes include:

  1. PID myths. Are PIDs better in our minds than in reality? PID stands for Persistent IDentifier, but what does that mean and does such a thing exist?
  2. Achieving persistence. So many factors affect persistence: mission, oversight, funding, succession, redundancy, governance. Is open infrastructure for scholarly communication the key to achieving persistence?
  3. PIDs for emerging uses. Long-term identifiers are no longer just for digital objects. We have use cases for people, organizations, vocabulary terms, and more. What additional use cases are you working on?
  4. Legacy PIDs. There are of thousands of venerable old identifier systems that people want to continue using and bring into the modern data citation ecosystem. How can we manage this effectively?
  5. The I-word. What would make heterogeneous PID systems “interoperate” optimally? Would standardized metadata and APIs across PID types solve many of the problems, and if so, how would that be achieved? What about standardized link/relation types?
  6. PIDagogy. It’s a challenge for those who provide PID services and tools to engage the wider community. How do you teach, learn, persuade, discuss, and improve adoption? What’s it mean to build a pedagogy for PIDs?
  7. PID stories. Which strategies worked? Which strategies failed? Tell us your horror stories! Share your victories!
  8. Kinds of persistence. What are the frontiers of ‘persistence’? We hear lots about fraud prevention with identifiers for scientific reproducibility, but what about data papers promoting PIDs for long-term access to reliably improving objects (software, pre-prints, datasets) or live data feeds?

PIDapalooza is organized by California Digital Library, Crossref, DataCite, and ORCID.

We believe that bringing together everyone who’s working with PIDs for two days of discussions, demos, workshops, brainstorming, and updates on the state of the art will catalyze the development of PID community tools and services.

And you can help by getting involved!.

Propose a session

Please send us your session ideas by September 18. We will notify you about your proposals in the first week of October.

Register to attend

Registration is now open — come join the festival with a crowd of like-minded innovators. And please help us spread the word about PIDapalooza in your community!

Stay tuned

Keep updated with the latest news at the PIDapalooza website and on Twitter (@PIDapalooza) in the coming weeks.

See you in November!

UC3 to Explore Amazon S3 and Glacier Use for Merritt Storage

The UC Curation Center (UC3) has offered innovative digital content access and preservation services to the UC community for over six years through its Merritt repository.  Merritt was developed by UC3 to address unique needs for high-quality curation services at scale and a low price point.   Recently, UC3 started looking into Amazon’s S3 and Glacier cloud storage products as a way to address cost concerns, fine-tune reliability issues, increase service options, and keep pace with ever-increasing scale in the volume, variety, and velocity of new content contributions.

The current Merritt pricing model, in effect since July 1, 2015, is based on recovering the costs of storage use, currently totally over 73 TB contributed from all 10 UC campuses.  This content is now being replicated in UC private clouds supported by UCLA and UCSD.   Since the closure earlier this year of the UCOP data center, the computational processes underlying Merritt, along with all other CDL services, have been moved to virtual machines in the Amazon AWS cloud.  Collocating storage alongside this computational presence in AWS will provide increased data transfer throughput during Merritt deposit and retrieval.  In addition, the integration of online S3 with near-line Glacier storage offers opportunities to lower storage costs by moving archival materials with no expectation of direct end-user access to Glacier.  The cost for Glacier storage is about one quarter of that for S3, which is comparable with UCLA and UCSD pricing.  Of course, the additional dispersed replication of Merritt-managed data in AWS will also increase overall reliability and long-term preservation assurance.

The integration of S3 and Glacier will supplement Merritt’s existing use of UC storage.  Merritt’s storage function acts as a broker that automatically routes submitted content to the appropriate storage location based on its curatorially-defined access characteristics.  Once Amazon storage has been added to Merritt, content tagged for public access will be routed to S3 for primary storage, from which it will be automatically replicated to a UC cloud.  Retrieval requests for this content will be served from the S3 copy; should these requests fail (for example, if S3 is temporarily non-responsive), Merritt automatically retries from its secondary copy.

The path for content tagged for private access is somewhat different.  It is initially routed to S3 for temporary storage until the replication to a UC cloud completes.  The content is then moved into Glacier for permanent low-cost primary storage.  Retrieval requests will be served from the UC cloud.  In the unlikely event that this retrieval doesn’t success, there is no automatic retry from Glacier, since Glacier, while inexpensive for static storage, is costly for systematic retrieval.  UC3 staff can, however, intervene manually to retrieve from Glacier if it becomes necessary.  In the case of both public and private access, the digital content will continue to be managed with at least five copies spread across independent storage infrastructures and data centers.

The integration of Amazon S3 and Glacier into Merritt’s storage architecture will increase overall reliability and performance, while possibly leading to future reduction in costs.  Once the integration is complete, UC3 will monitor AWS storage usage and associated costs through the end of the current Merritt service year in June 30, 2017, to determine the impact on Merritt pricing.

We’re hiring a new Product Manager!

CDL is recruiting for a new Product Manager.  This position will oversee the product management and outreach activities for the Dash project and service, as well as offer research data management and digital preservation consulting for the UC community.

We are looking for an experienced professional with a full understanding of product/service development and production practices.  This position (officially titled “UC3 Service Manager, Dash”) will focus on the successful development, outreach, and adoption of the Dash service.  A complete revamp of the UI and technical architecture of Dash is nearing completion.  More detail about Dash is available here. A recent presentation on the project is also available here. Because this position will focus on continuous development of Dash, it requires an enthusiastic advocate for research data management best practices, open source community building, and digital curation skills development.

A successful candidate will advocate for the needs of our constituents and translate those needs into detailed enhancements of diverse scope, size, impact, and budget  This Dash Product Manager will have a large support network: the UC3 Director, other UC3 product managers, UC3 development team, other California Digital Library departments, plus the library/IT teams across the 10 UC campuses.  

Learn more and apply here.

What is Dash?

Dash is an open source, online data publication service that makes research data sharing easy.  While Dash gives the appearance of being a full-fledged data repository, it is actually a lightweight overlay layer that sits on top of, and freely interoperates with, standards-compliant repositories supporting common protocols for submission and harvesting.  UC3 has integrated Dash with its Merritt curation repository. The Dash system provides intuitive, easy-to-use interfaces for dataset submission, description, publication, and discovery.  Dash imposes minimal prescriptive eligibility and submission requirements, and automates and hides the mechanical details of DOI assignment, data packaging, and repository deposit from the user.  It features a streamlined, self-service user experience that can be integrated easily and unobtrusively into multifarious scholarly workflows.  

What is UC3?

This position is within the University of California Curation Center (UC3) at the California Digital Library (CDL), an administrative unit of the University of California Office of the President (UCOP).  UC3 works within CDL and across the 10 UC campuses to deliver leading-edge digital curation services.  We plan, create, maintain, enhance, and operate robust services responsive to the evolving needs of UC stakeholders.  UC3’s current initiatives include digital preservation, research data management, data publication, alternative metrics for usage and impact, and web archiving. Reporting to the UC3 Director, this position is responsible for managing the development and maintenance of the Dash service, including playing a key role in promoting  and setting the strategic direction for Dash. As a member of this dynamic team, a successful candidate will be asked to contribute to furthering our work advancing digital curation concepts across the UC community.  More information about UC3 can be found at http://www.cdlib.org/uc3.  

More information about this position can be found here.