Learn more about your collections and help the Internet Archive test new AI and Text and Data Mining pipelines!
Do you have a set of documents, images, audio or video that you’d like to know more about? The Internet Archive's ARCH (Archives Research Compute Hub) can derive actionable data from digital collections, and we have compensation available to help with active testing.
- We are looking for digital collections of all sizes - the bigger the better!
- We are looking for mixed metadata quality - can we help make it better?
- We are looking for people willing to try new analyses - come explore with us!
What do you need to do?
- Let us know more about your collection below.
- Let us know how our funding could help you:
  - Metadata enhancement or completeness?
  - Scanning the missing parts of a collection?
  - Something else?
- Upload your collection (there is no requirement to make it publicly available)
- Use ARCH to get data from your collection:
  - Text recognition from images
  - Speech transcription from audio and video
  - Named entity recognition across extracted text
  - And actively testing image description and expanding datasets
- Meet with us to tell us about your experience using ARCH
That’s it!
Participants will be accepted over the next 4 to 6 weeks, and analysis can continue through July 2025. Internet Archive staff will respond to form submissions with more information and answers to your questions. Next steps include discussing your goals and requirements, signing memoranda of understanding for software service access, and live training.