Abstract
In this digital world, access to information is essential. We need it almost every day, and we turn to Google, Bing, Yahoo, or another major service to find a resource or answer a simple question. SearchBlox, an enterprise Elasticsearch toolkit that boasts a robust and easy-to-use indexing system for an array of MIME types, lacks an essential capability: remoting. Solving this problem requires the ability to efficiently index particularly voluminous, dense, or distributed file systems to a centralized SearchBlox indexing server while keeping intact the services that clients expect from this software. More specifically, the solution must be easy to deploy as a remote agent, provide access to the indexed documents via a central server, remain fault tolerant, and react to changes in the file systems in question.
This work is important for maintaining accurate and up-to-date indexes of information. The problem is challenging because of the sheer amount of information, which is growing at an alarming rate. Given the limits of current hardware capacity, storing this information means more distributed file systems. Users need access, and chances are the documents will not be local. If a solution to this problem is successfully implemented, users will have a streamlined, fault-tolerant path to the information they need.
Our approach stemmed from requirements engineering. We decomposed the end goal into modules and worked to produce a system of Akka actors, each with a job and a role. To achieve the required performance, we implemented worker nodes in the style of an Apache Hadoop cluster. Stylistically, we were agile in that each member of our team had a task to work on, and tasks often depended on one another's modules. We took a bottom-up approach, first building the tools needed to crawl, parse, and build indexes on a remote server concurrently. We then pursued the other required capabilities: reacting to file-system changes and ensuring the availability of files from the indexing server.
We succeeded in streaming a large volume of data to a local SearchBlox server and expect to add reactive, real-time, and deployment capabilities in the near future.
We expect that our agent will benchmark well at the terabyte scale, and we will collect performance metrics on several different hardware platforms to support this claim.
Publication Date
2015
Keywords
computer science, big data indexing
Disciplines
Computer Engineering | Engineering
Faculty Advisor/Mentor
Preetam Ghosh
VCU Capstone Design Expo Posters
Rights
© The Author(s)
Date of Submission
July 2015