• Senior Data Scientist

    Job Locations: US-VA-Herndon
    Job ID: 2018-2210
    Category: Accounting/Finance
  • Overview

    By submitting your resume for this position, you understand and agree that ASEC may share your resume, as well as any other related personal information or documentation you provide, with its subsidiaries and affiliated companies for the purpose of considering you for other available positions.

    ASEC is an Equal Opportunity/Affirmative Action Employer.

    We consider applicants without regard to race, color, religion, age, national origin, ancestry, ethnicity, gender, gender identity, gender expression, sexual orientation, marital status, veteran status, disability, genetic information, citizenship status, or membership in any other group protected by federal, state or local law. Equal Opportunity Employer Minorities/Women/Vets/Disabled.

    Candidates must already possess a current and active TS/SCI with Poly clearance to be considered for this position.

    Responsibilities

    • Identify and implement methods for duplicate detection, document categorization, and entity and information extraction using natural language processing and machine learning (a minimal illustrative sketch appears after this list).
    • Assist with the design and implementation of visualizations and reports for business intelligence metrics.
    • Understand and implement methodologies that are consistent with standard techniques in the data science field.
    • Propose, implement, and evaluate content analytic strategies for characterizing and categorizing large data sets of unstructured files and messages using COTS/GOTS/Open Source tools.
    • Develop custom software as required by Sponsor to characterize and categorize large datasets of unstructured files and messages.
    • Oversee construction of annotated data sets for training and evaluation of prototype tools.
    • Serve as a Subject Matter Expert (SME) in discussions with analytic tool developers and enterprise IT management.
    • Partner with information management SMEs to define and refine framework, strategies, and actions for collecting and analyzing unstructured file metadata and content stored in Sponsor's automated systems (e.g. email repositories, databases, shared drives).
    • Implement collection and analysis actions, such as ingesting, indexing, normalizing, and structuring file content and metadata in preparation for analysis using tools in the big data environment (GOTS, COTS, and open source tools such as Hadoop, Hive, Tableau, Spark, Visual Studio, and other emerging technologies).
    • Partner with information management SMEs to determine baselines, analyze patterns and characteristics in file content and metadata, and construct visualizations to share lessons learned.
    • Lead and/or contribute to discussions with Sponsor and Sponsor partners on collection and analysis framework, strategies, processes, and methodologies.
    • Build relationships with stakeholders to negotiate access, security, and storage needs for the unstructured file objects and the features created during the collection and analysis process.
    • Provide recommendations and training to Sponsor and Sponsor partners on techniques and tools in the big data environment.
    • Write MapReduce jobs and Hive queries, as well as Python, Java, Scala, and R programs as appropriate, to perform tasks related to machine learning and data science, including data cleanup, data transformation, data mashing, and algorithm parallelization (see the Spark sketch after this list).
    • Implement algorithms from various sources (academia, federal labs, or other Government Agencies) into parallelized MapReduce jobs.
    • Analyze and correlate large amounts of data.
    • Run machine learning workflows from various platforms (such as Apache NiFi and Jupyter Notebook) on large amounts of data.
    • Administer, configure, and optimize a distributed cluster ecosystem such as Hadoop or Spark.
    • Write and deploy web-based applications using HTML, JavaScript, Java, graph visualization, and similar technologies that expose data and allow end users to view and interact with it.
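
    The duplicate-detection and categorization work in the first responsibility could be prototyped, for example, with a TF-IDF/cosine-similarity pass before any heavier modeling. The sketch below is illustrative only and not part of the posting; the sample documents and the 0.9 threshold are assumptions.

        # Illustrative near-duplicate detection using scikit-learn (a
        # library named under desired skills). The sample documents and
        # the 0.9 cutoff are assumptions for demonstration.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        docs = [
            "Quarterly report on system migration status.",
            "Quarterly report on the system migration status.",
            "Meeting notes: vendor selection for cloud storage.",
        ]

        # Vectorize the documents into TF-IDF feature space.
        tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

        # Pairwise cosine similarity; pairs above the cutoff are flagged
        # as likely duplicates for analyst review.
        sim = cosine_similarity(tfidf)
        THRESHOLD = 0.9  # assumed cutoff
        for i in range(len(docs)):
            for j in range(i + 1, len(docs)):
                if sim[i, j] >= THRESHOLD:
                    print(f"Possible duplicates: {i} and {j} ({sim[i, j]:.2f})")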
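
    Similarly, the data cleanup and transformation tasks in the Spark/Hive bullet might resemble the following PySpark sketch; the column names and input path are hypothetical.

        # Hypothetical PySpark cleanup sketch: normalize file metadata
        # before analysis. Column names and the input path are assumptions.
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("metadata-cleanup").getOrCreate()

        # Read raw metadata records (JSON is one of the transfer formats
        # listed under required experience).
        raw = spark.read.json("hdfs:///data/file_metadata/")  # assumed path

        clean = (
            raw
            # Drop records with no file path; they cannot be correlated.
            .filter(F.col("file_path").isNotNull())
            # Normalize extensions for consistent grouping.
            .withColumn("extension", F.lower(F.col("extension")))
            # Parse the modification timestamp into a timestamp type.
            .withColumn("modified", F.to_timestamp("modified"))
        )

        # Simple trend extraction: file counts per extension per month.
        clean.groupBy(
            F.date_trunc("month", F.col("modified")).alias("month"),
            "extension",
        ).count().show()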

    Qualifications

    Required:


    • Demonstrated on-the-job experience integrating and analyzing large data sets using big-data technologies (GOTS, COTS, or open source), such as Tableau, Visual Studio, Hadoop, and HBase.
    • Demonstrated on-the-job experience with XML, JSON, or other Customer standards for data transfer and metadata management (a brief normalization sketch follows this list).
    • Demonstrated on-the-job experience with Hadoop and Spark.
    • Demonstrated on-the-job experience with Java, Python, and Bash scripting.
    • Demonstrated on-the-job experience proposing, implementing and evaluating content analytic strategies for characterizing and categorizing large data sets.
    • Demonstrated on-the-job experience designing and implementing metadata measurement and trend extraction on large data sets.
    • Ability to communicate technical concepts to a non-technical audience.
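
    As a brief illustration of the XML/JSON data-transfer item above, a normalization step might map both formats onto one record shape; the field names here are hypothetical.

        # Hypothetical sketch: normalize XML and JSON metadata records
        # (two of the transfer formats named above) into a common shape.
        import json
        import xml.etree.ElementTree as ET

        XML_RECORD = "<file><path>/share/a.docx</path><owner>jdoe</owner></file>"
        JSON_RECORD = '{"path": "/share/b.pdf", "owner": "asmith"}'

        def from_xml(text):
            root = ET.fromstring(text)
            # Field names are assumptions for illustration.
            return {"path": root.findtext("path"), "owner": root.findtext("owner")}

        def from_json(text):
            rec = json.loads(text)
            return {"path": rec.get("path"), "owner": rec.get("owner")}

        print(from_xml(XML_RECORD))
        print(from_json(JSON_RECORD))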


    Desired Skills/Experience:

    • Familiarity with Scala and/or R
    • Experience with AWS
    • Familiarity with Linux/Windows
    • Systems administration for AWS, Hadoop, Spark, and databases such as Oracle and MySQL
    • Familiarity with scikit-learn, Gensim, NLTK, spaCy, and the application of these tools to Natural Language Processing
    • Familiarity with Theano, TensorFlow, Torch, Keras, MXNet, Deeplearning4j, and the application of these tools to Natural Language Processing
    • Familiarity with classification, clustering, and dimensionality-reduction algorithms such as LightGBM, XGBoost, Random Forest, Support Vector Machine, K-means, and t-SNE (see the clustering sketch below)
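
    By way of example, the clustering algorithms above could be applied to unstructured text as in this short scikit-learn sketch; the sample texts and cluster count are assumptions.

        # Illustrative K-means clustering of documents with scikit-learn.
        # Sample texts and the number of clusters are assumptions.
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import TfidfVectorizer

        texts = [
            "budget spreadsheet fiscal year totals",
            "fiscal budget projections and totals",
            "server outage incident report",
            "incident report for network outage",
        ]

        features = TfidfVectorizer().fit_transform(texts)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
        for text, label in zip(texts, km.labels_):
            print(label, text)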
