Information Harvesting

Information Harvesting

Information Harvesting (IH) was an early data mining product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachusetts. Wiggins had a background in genetic algorithms and fuzzy logic. IH sought to infer rules from sets of data. It did this first by classifying various input variables into one of a number of bins, thereby putting some structure on the continuous variables in the input. IH then proceeds to generate rules, trading off generalization against memorization, that will infer the value of the prediction variable, possibly creating many levels of rules in the process. It included strategies for checking if overfitting took place and, if so, correcting for it. Because of its strategies for correcting for overfitting by considering more data, and refining the rules based on that data, IH might also be considered to be a form of machine learning. The advantage of IH, as compared with other data mining products of its time and even later, was that it provided a mechanism for finding multiple rules that would classify the data and determining, according to set criteria, the best rules to use.

Syman

SYMAN is an artificial intelligence technology that uses data from social media profiles to identify trends in the job market. SYMAN is designed to organize actionable data for products and services including recruiting, human capital management, CRM, and marketing. SYMAN was developed with a $21 million series B financing round secured by Identified, which was led by VantagePoint Capital Partners and Capricorn Investment Group.

Kalman filter

In statistics and control theory, Kalman filtering (also known as linear quadratic estimation) is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, to produce estimates of unknown variables that tend to be more accurate than those based on a single measurement, by estimating a joint probability distribution over the variables for each time-step. The filter is constructed as a mean squared error minimiser, but an alternative derivation of the filter is also provided showing how the filter relates to maximum likelihood statistics. The filter is named after Rudolf E. Kálmán. Kalman filtering has numerous technological applications. A common application is for guidance, navigation, and control of vehicles, particularly aircraft, spacecraft and ships positioned dynamically. Furthermore, Kalman filtering is much applied in time series analysis tasks such as signal processing and econometrics. Kalman filtering is also important for robotic motion planning and control, and can be used for trajectory optimization. Kalman filtering also works for modeling the central nervous system's control of movement. Due to the time delay between issuing motor commands and receiving sensory feedback, the use of Kalman filters provides a realistic model for making estimates of the current state of a motor system and issuing updated commands. The algorithm works via a two-phase process: a prediction phase and an update phase. In the prediction phase, the Kalman filter produces estimates of the current state variables, including their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some error, including random noise) is observed, these estimates are updated using a weighted average, with more weight given to estimates with greater certainty. The algorithm is recursive. It can operate in real time, using only the present input measurements and the state calculated previously and its uncertainty matrix; no additional past information is required. Optimality of Kalman filtering assumes that errors have a normal (Gaussian) distribution. In the words of Rudolf E. Kálmán, "The following assumptions are made about random processes: Physical random phenomena may be thought of as due to primary random sources exciting dynamic systems. The primary sources are assumed to be independent gaussian random processes with zero mean; the dynamic systems will be linear." Regardless of Gaussianity, however, if the process and measurement covariances are known, then the Kalman filter is the best possible linear estimator in the minimum mean-square-error sense, although there may be better nonlinear estimators. It is a common misconception (perpetuated in the literature) that the Kalman filter cannot be rigorously applied unless all noise processes are assumed to be Gaussian. Extensions and generalizations of the method have also been developed, such as the extended Kalman filter and the unscented Kalman filter which work on nonlinear systems. The basis is a hidden Markov model such that the state space of the latent variables is continuous and all latent and observed variables have Gaussian distributions. Kalman filtering has been used successfully in multi-sensor fusion, and distributed sensor networks to develop distributed or consensus Kalman filtering. == History == The filtering method is named for Hungarian émigré Rudolf E. Kálmán, although Thorvald Nicolai Thiele and Peter Swerling developed a similar algorithm earlier. Richard S. Bucy of the Johns Hopkins Applied Physics Laboratory contributed to the theory, causing it to be known sometimes as Kalman–Bucy filtering. Kalman was inspired to derive the Kalman filter by applying state variables to the Wiener filtering problem. Stanley F. Schmidt is generally credited with developing the first implementation of a Kalman filter. He realized that the filter could be divided into two distinct parts, with one part for time periods between sensor outputs and another part for incorporating measurements. It was during a visit by Kálmán to the NASA Ames Research Center that Schmidt saw the applicability of Kálmán's ideas to the nonlinear problem of trajectory estimation for the Apollo program resulting in its incorporation in the Apollo navigation computer. This digital filter is sometimes termed the Stratonovich–Kalman–Bucy filter because it is a special case of a more general, nonlinear filter developed by the Soviet mathematician Ruslan Stratonovich. In fact, some of the special case linear filter's equations appeared in papers by Stratonovich that were published before the summer of 1961, when Kalman met with Stratonovich during a conference in Moscow. This Kalman filtering was first described and developed partially in technical papers by Swerling (1958), Kalman (1960) and Kalman and Bucy (1961). The Apollo computer used 2k of magnetic core RAM and 36k wire rope [...]. The CPU was built from ICs [...]. Clock speed was under 100 kHz [...]. The fact that the MIT engineers were able to pack such good software (one of the very first applications of the Kalman filter) into such a tiny computer is truly remarkable. Kalman filters have been vital in the implementation of the navigation systems of U.S. Navy nuclear ballistic missile submarines, and in the guidance and navigation systems of cruise missiles such as the U.S. Navy's Tomahawk missile and the U.S. Air Force's Air Launched Cruise Missile. They are also used in the guidance and navigation systems of reusable launch vehicles and the attitude control and navigation systems of spacecraft which dock at the International Space Station. == Overview of the calculation == Kalman filtering uses a system's dynamic model (e.g., physical laws of motion), known control inputs to that system, and multiple sequential measurements (such as from sensors) to form an estimate of the system's varying quantities (its state) that is better than the estimate obtained by using only one measurement alone. As such, it is a common sensor fusion and data fusion algorithm. Noisy sensor data, approximations in the equations that describe the system evolution, and external factors that are not accounted for, all limit how well it is possible to determine the system's state. The Kalman filter deals effectively with the uncertainty due to noisy sensor data and, to some extent, with random external factors. The Kalman filter produces an estimate of the state of the system as an average of the system's predicted state and of the new measurement using a weighted average. The purpose of the weights is that values with better (i.e., smaller) estimated uncertainty are "trusted" more. The weights are calculated from the covariance, a measure of the estimated uncertainty of the prediction of the system's state. The result of the weighted average is a new state estimate that lies between the predicted and measured state, and has a better estimated uncertainty than either alone. This process is repeated at every time step, with the new estimate and its covariance informing the prediction used in the following iteration. This means that Kalman filter works recursively and requires only the last "best guess", rather than the entire history, of a system's state to calculate a new state. The measurements' certainty-grading and current-state estimate are important considerations. It is common to discuss the filter's response in terms of the Kalman filter's gain. The Kalman gain is the weight given to the measurements and current-state estimate, and can be "tuned" to achieve a particular performance. With a high gain, the filter places more weight on the most recent measurements, and thus conforms to them more responsively. With a low gain, the filter conforms to the model predictions more closely. At the extremes, a high gain (close to one) will result in a more jumpy estimated trajectory, while a low gain (close to zero) will smooth out noise but decrease the responsiveness. When performing the actual calculations for the filter (as discussed below), the state estimate and covariances are coded into matrices because of the multiple dimensions involved in a single set of calculations. This allows for a representation of linear relationships between different state variables (such as position, velocity, and acceleration) in any of the transition models or covariances. == Example application == As an example application, consider the problem of determining the precise location of a truck. The truck can be equipped with a GPS unit that provides an estimate of the position within a few meters. The GPS estimate is likely to be noisy; readings 'jump around' rapidly, though remaining within a few meters of the real position. In addition, since the truck is expected to follow the laws of physics, its position can also be estimated by integrating its velocity over time, determined by keeping track of wheel revolutions and the

Bibliotheca Polyglotta

The Bibliotheca Polyglotta is a Norwegian database for Multilingualism project, lingua franca and science per global history at the University of Oslo. The aim of the project is according to pages is "producing a web corpus of Buddhist texts for using in multilingual lexicography. More generally, will the texts used for the study Sanskrit, Chinese and Tibetan."

Maike Osborne

Maike Osborne (born Michael Osborne, 1982) is an Australian academic and scientist who serves as a professor of machine learning at University of Oxford in the Machine Learning Research Group in the Department of Engineering Science. In 2016 she co-founded Mind Foundry, an artificial intelligence company, along with fellow professor Stephen Roberts. == Education == She has a BEng in Mechanical Engineering and a BSc in both Pure Mathematics and Physics from the University of Western Australia. She has a PhD in Machine Learning from the University of Oxford. == Career == Osborne has contributed to over 100 publications, and her work has received over 24,000 citations with an h-index of 46 according to Google Scholar. and has acted as principal or co-investigator for £10.6M of research funding. Her career has focused in particular on Bayesian approaches to AI and machine learning, named after the famous British statistician Thomas Bayes. Osborne's work has contributed to Probabilistic numerics, with Osborne co-authoring the first textbook on the subject. In 2013, Osborne co-authored a paper alongside Swedish-German economist Carl Benedikt Frey called "The Future of Employment: How Susceptible are Jobs to Computerisation?". The paper has received over 13,000 citations and extensive media coverage. In 2023 Osborne gave oral evidence to the UK House of Commons Science and Technology Committee on the subject of the "Governance of Artificial Intelligence". Her testimony received significant coverage around her warnings of the threat of "rogue AI". == Honors == She is also an Official Fellow of Exeter College, and St Peter's College, Oxford, a Fellow of the ELLIS society, and a Faculty Member of the Oxford-Man Institute of Quantitative Finance. She joined the Oxford Martin School as Lead Researcher on the Oxford Martin Programme on Technology and Employment in 2015. She is a Director of the EPSRC Centre for Doctoral Training in Autonomous Intelligent Machines and Systems.

Digital image processing

Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; second, the development of mathematics (especially the creation and improvement of discrete mathematics theory); and third, the demand for a wide range of applications in environment, agriculture, military, industry and medical science has increased. == History == Many of the techniques of digital image processing, or digital picture processing as it often was called, were developed in the 1960s, at Bell Laboratories, the Jet Propulsion Laboratory, Massachusetts Institute of Technology, University of Maryland, and a few other research facilities, with application to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photograph enhancement. The purpose of early image processing was to improve the quality of the image. In image processing, the input is a low-quality image, and the output is an image with improved quality. Common image processing includes image enhancement, restoration, encoding, and compression. The first successful application was the American Jet Propulsion Laboratory (JPL). They used image processing techniques such as geometric correction, gradation transformation, noise removal, etc. on the thousands of lunar photos sent back by the Space Detector Ranger 7 in 1964, taking into account the position of the Sun and the environment of the Moon. The impact of the successful mapping of the Moon's surface map by the computer has been a success. Later, more complex image processing was performed on the nearly 100,000 photos sent back by the spacecraft, so that the topographic map, color map and panoramic mosaic of the Moon were obtained, which achieved extraordinary results and laid a solid foundation for human landing on the Moon. The cost of processing was fairly high, however, with the computing equipment of that era. That changed in the 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. This led to images being processed in real-time, for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they started to take over the role of dedicated hardware for all but the most specialized and computer-intensive operations. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest. === Image sensors === The basis for modern image sensors is metal–oxide–semiconductor (MOS) technology, invented at Bell Labs between 1955 and 1960, This led to the development of digital semiconductor image sensors, including the charge-coupled device (CCD) and later the CMOS sensor. The charge-coupled device was invented by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. MOS image sensors are widely used in optical mouse technology. The first optical mouse, invented by Richard F. Lyon at Xerox in 1980, used a 5 μm NMOS integrated circuit sensor chip. Since the first commercial optical mouse, the IntelliMouse introduced in 1999, most optical mouse devices use CMOS sensors. === Image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression became the basis for JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. Its highly efficient DCT compression algorithm was largely responsible for the wide proliferation of digital images and digital photos, with several billion JPEG images produced every day as of 2015. Medical imaging techniques produce very large amounts of data, especially from CT, MRI and PET modalities. As a result, storage and communications of electronic image data are prohibitive without the use of compression. JPEG 2000 image compression is used by the DICOM standard for storage and transmission of medical images. The cost and feasibility of accessing large image data sets over low or various bandwidths are further addressed by use of another DICOM standard, called JPIP, to enable efficient streaming of the JPEG 2000 compressed image data. === Digital signal processor (DSP) === Electronic signal processing was revolutionized by the wide adoption of MOS technology in the 1970s. MOS integrated circuit technology was the basis for the first single-chip microprocessors and microcontrollers in the early 1970s, and then the first single-chip digital signal processor (DSP) chips in the late 1970s. DSP chips have since been widely used in digital image processing. The discrete cosine transform (DCT) image compression algorithm has been widely implemented in DSP chips, with many companies developing DSP chips based on DCT technology. DCTs are widely used for encoding, decoding, video coding, audio coding, multiplexing, control signals, signaling, analog-to-digital conversion, formatting luminance and color differences, and color formats such as YUV444 and YUV411. DCTs are also used for encoding operations such as motion estimation, motion compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding, variable encoding, and motion vectors, and decoding operations such as the inverse operation between different color formats (YIQ, YUV and RGB) for display purposes. DCTs are also commonly used for high-definition television (HDTV) encoder/decoder chips. == Tasks == Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analogue means. In particular, digital image processing is a concrete application of, and a practical technology based on: Classification Feature extraction Multi-scale signal analysis Pattern recognition Projection Some techniques that are used in digital image processing include: Anisotropic diffusion Hidden Markov models Image editing Image restoration Independent component analysis Linear filtering Neural networks Partial differential equations Pixelation Point feature matching Principal components analysis Self-organizing maps Wavelets == Digital image transformations == === Filtering === Digital filters are used to blur and sharpen digital images. Filtering can be performed by: convolution with specifically designed kernels (filter array) in the spatial domain masking specific frequency regions in the frequency (Fourier) domain The following examples show both methods: ==== Image padding in Fourier domain filtering ==== Images are typically padded before being transformed to the Fourier space, the highpass filtered images below illustrate the consequences of different padding techniques: Notice that the highpass filter shows extra edges when zero padded compared to the repeated edge padding. ==== Filtering code examples ==== MATLAB example for spatial domain highpass filtering. === Affine transformations === Affine transformations enable basic image transformations including scale, rotate, translate, mirror and shear as is shown in the following examples: To apply the affine

Jiaya Jia

Jiaya Jia (Chinese: 贾佳亚) is a Chair Professor of the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology (HKUST). He is an IEEE Fellow, the associate editor-in-chief of one of IEEE’s flagship and premier journals- Transactions on Pattern Analysis and Machine Intelligence (TPAMI), as well as on the editorial board of International Journal of Computer Vision (IJCV). == Early life and education == Jiaya Jia joined CUHK in 2004 as an assistant professor, and was promoted to full professor in 2015. He obtained his PhD degree in computer science jointly from Hong Kong University of Science and Technology and Microsoft Research in 2004. From March 2003 to August 2004, he was a visiting scholar at Microsoft. He conducted collaborative research at Adobe Research in 2007. == Career == Jiaya Jia is a distinguished scientist in the fields of computer vision and artificial intelligence. His research team at HKUST, DV Lab, is one of the largest vision AI research teams in the world and has been making significant contribution to advanced development of computer vision algorithms and technologies with focuses on image/video understanding, detection and segmentation, multi-modal AI, computational imaging, practical optimization, and advanced learning for visual content since 2000. Jiaya Jia has published 200+ top papers and was cited 80,000+ times on Google Scholar with H-Index 110+. 40+ PhDs and fellows from this group are now active in academia and industry, and have become prominent AI tech leaders as professors, directors in major research labs, and founders of several successful startups. Jiaya Jia assumes the position of associate editor-in-chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) since 2021. He is also on the editorial board of International Journal of Computer Vision (IJCV). Jiaya Jia has served as the area chair of ICCV, CVPR, AAAI, ECCV, and several other premium international AI conferences for years. He was on program committees of major conferences in graphics and computational imaging, including ICCP, SIGGRAPH, and SIGGRAPH Asia. == Research == The research areas of Jiaya Jia are computer vision, large X models, and deep learning. Jiaya Jia has made outstanding contributions to computer vision technology, algorithms and engineering, and is among the world's leading experts in the field. His research partners include numerous renowned multinational technology companies, such as Microsoft, Qualcomm, Adobe, Intel, NVIDIA, Amazon, and Lenovo. Jia has cultivated a number of outstanding talents with Master's and PhDs who continue to engage in scientific research and development in computer vision. Many technologies in image analysis and processing developed by Jiaya Jia are still leading in the field worldwide. Wherein, his achievements in image deblurring, filtering, image sparse processing, multi-band image signal fusion and enhancement, large range motion estimation, texture and structure-based layering, etc. have been published in the industry's most influential conferences and publications, and implemented in the real-world applications. These achievements have demonstrated outstanding performance in established systems, and most of which are open source so as to enable wider applications across industries such as aviation, medical imaging, safety management, robotic design, meteorological analysis and many more. == Selected publications == In his over 20 years of research experience, Jiaya Jia has published 200+ top papers that have been cited more than 80,000 times. According to HKUST Website in August 2024, Jiaya Jia has accumulatively published over 200 scientific papers in books, journals and conferences, such as IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), International Journal of Computer Vision (IJCV) "Computer Vision and Pattern Recognition (CVPR)", and "International Conference on Computer Vision (ICCV)". Representative papers include: Jiaya Jia: Mathematical Models and Practical Solvers for Uniform Motion Deblurring (in Motion Deblurring: Algorithms and Systems), Cambridge University Press, ISBN 9781107044364, 2014; Jiaya Jia: “Matte Extraction” Book: Computer Vision - A Reference Guide, Springer, ISBN 9780387307718 Editor-in-chief: Ikeuchi, Katsushi; Jiaya Jia, Chi-Keung Tang:Image Stitching Using Structure Deformation,IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 30, No. 4, 2008; Jiaya Jia, Jian Sun, Chi-Keung Tang, Heung-Yeung Shum:Drag-and-Drop Pasting,ACM Transactions on Graphics (also in SIGGRAPH 2006), Vol. 25, No. 3, 2006. Xiaojuan Qi, Zheng zhe Liu, Renjie Liao, Philip HS Torr, Raquel Urtasun, Jiaya Jia:GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation,IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Accepted. == Selected honors and awards == ACM Fellow. 1st Place of WAD Drivable Area Segmentation Challenge 2018; 1st Place of LSUN'17 Instance and Semantic Segmentation Challenges; 1st Place of COCO Instance Segmentation Challenge 2017; 2nd Place in COCO Detection Challenge 2017; 1st Place of ImageNet Scene Parsing Challenge 2016 with the paper PSPNet presented in CVPR 2017.