What Can a Business do with a Web Index?
- 1. From Trust Flows Understanding: The Deloitte Fast 50 Big Data
Company You Never Heard Of, Until Now.
- 2. @tryMajestic Some Stuff You'll Learn: How we built a search
engine without $30 billion. How you can use it to make lots of:
Predictions, Insights, Money, Data Stories
- 3. @tryMajestic Reaching for the Stars
- 4. @tryMajestic An Inspiration of a Search Engine
- 5. @tryMajestic Majestic is a Specialist Search Engine: Digital
knowledge on a grand scale. Dixon Jones
- 6. @tryMajestic The BIG specialist search engine: Twitter has
500,000,000 Tweets per day on average. In the same day, Majestic
crawls well over 2,000,000,000 NEW URLs (and sees 7 billion).
- 7. @tryMajestic How do they do that? Information Retrieval in
the Zeta age: 1. Data Collection 2. Data Grouping 3. Data Indexing
4. Data Matching
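
The four stages named on slide 7 can be illustrated with a toy pipeline. This is a minimal sketch, not Majestic's implementation: the function names, the sample pages, and the choice of a word-level inverted index grouped by host are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def collect(raw_pages):
    """Stage 1, data collection: keep pages that have a URL and some text."""
    return [(url, text) for url, text in raw_pages if url and text]


def group(pages):
    """Stage 2, data grouping: bucket pages by host name (an assumed grouping)."""
    groups = defaultdict(list)
    for url, text in pages:
        groups[urlparse(url).netloc].append((url, text))
    return dict(groups)


def index(groups):
    """Stage 3, data indexing: build an inverted index from word to URLs."""
    inverted = defaultdict(set)
    for grouped_pages in groups.values():
        for url, text in grouped_pages:
            for word in text.lower().split():
                inverted[word].add(url)
    return inverted


def match(inverted, query):
    """Stage 4, data matching: return URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(inverted.get(words[0], set()))
    for word in words[1:]:
        result &= inverted.get(word, set())
    return result


# Hypothetical sample data; the last entry has no URL and is dropped.
RAW_PAGES = [
    ("http://example.com/a", "web index for business"),
    ("http://example.com/b", "search engine index"),
    ("http://example.org/c", "business data stories"),
    ("", "dropped because it has no url"),
]

pages = collect(RAW_PAGES)
groups = group(pages)
inverted = index(groups)
hits = match(inverted, "business index")
print(sorted(hits))
```

Only `http://example.com/a` contains both query words, so it is the single match. A real crawler-scale system would shard each stage across machines; the sketch only shows how the stages hand data to one another.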
- 8. @tryMajestic How to Collect 7 Billion URLs a Day?
- 9. @tryMajestic How to Analyze 200 Billion URLs a Day?
- 10. @tryMajestic Groups Make Search Much Better: Find a Fact,
Find a Friend, Find a Customer, Finding Anything.
Library of Congress, circa 1940. Research at:
info.majestic.com/groupresearch
- 11. @tryMajestic We Group AND ANALYSE pages Topical Trust Flow
using decay Algorithm ???
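
Slide 11 leaves the algorithm itself as "???", so the following is only a generic illustration of decay-based trust propagation, not Majestic's Topical Trust Flow. It assumes a hand-picked set of trusted seed pages, a link graph stored as a dictionary, and a hypothetical `decay` factor applied once per link hop.

```python
from collections import deque


def propagate_trust(links, seeds, decay=0.85, max_score=100.0):
    """Spread trust outward from seed pages, multiplying by `decay` per hop.

    `links` maps a page to the pages it links to. Each page keeps the
    highest score that reaches it; pages no seed can reach get no score.
    """
    scores = {seed: max_score for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        passed_on = scores[page] * decay
        for target in links.get(page, []):
            # Only revisit a page when a stronger score arrives, which
            # also guarantees termination on graphs with cycles.
            if passed_on > scores.get(target, 0.0):
                scores[target] = passed_on
                queue.append(target)
    return scores


# Hypothetical link graph: "c" links back to the seed, forming a cycle.
LINKS = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["seed"],
    "island": ["a"],
}

scores = propagate_trust(LINKS, seeds=["seed"], decay=0.5)
print(scores)
```

With a decay of 0.5 the seed keeps 100, its direct neighbours `a` and `b` receive 50, and `c`, two hops away, receives 25. The page `island` links out but is never linked to from a trusted page, so it receives no score.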
- 12. @tryMajestic The Index: For every page we know: Its
influence in a simple score; Its Context; Its context by keyword; Its
Influence in Context! In a series of simple 0-100 scores
- 13. @tryMajestic Works best with Universal Data set: Every
signal is small; Individually prone to error or opinion; At scale the
error decreases; Confidence increases.
http://info.majestic.com/universal
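
The claim on slide 13, that individually noisy signals become reliable at scale, can be checked with a small simulation. This sketch assumes each signal is an independent Gaussian-noised reading of one true value; under that assumption the error of the average shrinks roughly as the noise level divided by the square root of the number of signals. The true value, noise level, and sample sizes below are arbitrary choices for illustration.

```python
import random
import statistics


def estimate(true_value, n_signals, noise_sd, rng):
    """Average n noisy signals that each independently miss the true value."""
    signals = [rng.gauss(true_value, noise_sd) for _ in range(n_signals)]
    return statistics.fmean(signals)


def mean_abs_error(true_value, n_signals, noise_sd, trials, rng):
    """Typical distance between the averaged estimate and the true value."""
    errors = [
        abs(estimate(true_value, n_signals, noise_sd, rng) - true_value)
        for _ in range(trials)
    ]
    return statistics.fmean(errors)


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
errors_by_size = {
    n: mean_abs_error(true_value=50.0, n_signals=n, noise_sd=20.0,
                      trials=100, rng=rng)
    for n in (1, 100, 2500)
}

for n, err in errors_by_size.items():
    print(f"{n:>5} signals -> mean absolute error {err:.2f}")
```

The printed error falls as the number of signals grows. Real web signals are neither independent nor Gaussian, so the rate of improvement in practice would differ from this idealised case.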
- 14. @tryMajestic Data Matching
- 15. @tryMajestic Our Data Stack (For the Techies): Crawler: C#
.NET / Mono; NoSQL read-only file system; Java Interrogation; Dynamic
Front End: Perl/Ruby etc.; Hadoop coming soon
- 16. @tryMajestic So we built it. Now Imagine: What COULD you do
with it?
- 17. @tryMajestic 1: Compare Competitor Backlinks
- 18. @tryMajestic 2: Finding influencers. Who is more popular on
Twitter? Lady Gaga? Barack Obama? Trust Flow 74, Trust Flow
70
- 19. @tryMajestic 3: Predicting Elections: Boris v Ken, Obama v
Romney
- 20. @tryMajestic 4: Lobbying Senators
- 21. @tryMajestic 5: Data Art (Profiling Companies)
- 22. @tryMajestic What if we Pivot? Hadoop. Imagine your OWN
version of our web index: A subset of the data, prepopulated for
your needs; Updated Daily / Weekly / Monthly; Stored in Open Source
Hadoop instances, ready for easy interrogation. What could you do
then?
- 23. @tryMajestic Data Store Examples
- 24. @tryMajestic
- 25. @tryMajestic
- 26. @tryMajestic
- 27. @tryMajestic Ways you could segment the web: All domains
hosted in [Choose country or City Here]; Most influential sites
about [Insert 800 Topics Here]; Best Web Pages for [Choose 50
Million Phrases Here]; Spammiest pages about [Insert 800 Topics Here];
Most influential Pages on [Choose any set of sites]; Create a set of
pages with [Choose properties here]. Got a plan? We have the
starting point for web data
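
The segmentation ideas on slide 27 amount to filtering and ranking page records by their properties. The sketch below assumes a hypothetical `PageRecord` shape with a hosting country, a topic label, and a 0-100 Trust Flow score; the field names, the `segment` helper, and the sample rows are invented for illustration and do not reflect Majestic's actual data format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical row in a pre-populated subset of a web index."""
    url: str
    country: str      # where the domain is hosted
    topic: str        # one topical category
    trust_flow: int   # score on a 0-100 scale


def segment(records, *, country=None, topic=None, min_trust_flow=0, limit=None):
    """Return matching records, most influential first.

    Filters left as None are ignored; `limit` caps the result length.
    """
    selected = [
        r for r in records
        if (country is None or r.country == country)
        and (topic is None or r.topic == topic)
        and r.trust_flow >= min_trust_flow
    ]
    selected.sort(key=lambda r: r.trust_flow, reverse=True)
    return selected if limit is None else selected[:limit]


# Invented sample rows, using reserved example domains.
RECORDS = [
    PageRecord("http://example.co.uk/news", "UK", "News", 62),
    PageRecord("http://example.co.uk/shop", "UK", "Shopping", 35),
    PageRecord("http://example.com/news", "US", "News", 71),
    PageRecord("http://example.co.uk/blog", "UK", "News", 18),
]

# "Most influential sites about [topic]" hosted in a chosen country.
uk_news = segment(RECORDS, country="UK", topic="News", min_trust_flow=20)
# "Most influential sites about [topic]" across all countries, top two.
top_news = segment(RECORDS, topic="News", limit=2)

print([r.url for r in uk_news])
print([r.url for r in top_news])
```

The first query returns only the UK news page scoring above the threshold; the second returns the two highest-scoring news pages regardless of country. The same filter-then-rank shape would apply to a "spammiest pages" segment by sorting ascending instead.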
- 28. @tryMajestic Some Takeaways: How we built a search engine
without $30 billion. How you can use it to make lots of:
Predictions, Insights, Money, Data Stories
- 29. @tryMajestic Out of Trust Flows understanding: Real insight
into the world wide web from Majestic, the specialist search
engine
- 30. From Trust Flows Understanding